-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Config: Switch from jsonschema
to pydantic
#6117
Conversation
Note that this currently is working against |
08f9a61
to
9eb373a
Compare
@danielhollas this would get rid of |
@sphuber sure, I'll take a look, thanks! I was curious and already looked at import time of |
6aae2f4
to
39185eb
Compare
Hi @sphuber, Would you mind merging current main into this branch so that I can do the timings with the just merged import improvements? Note that I am on a workshop this week, so unless there is a rush I'll do it next week. |
39185eb
to
94d19b6
Compare
Done. Thanks a lot. The first commit is against pydantic v1, and the second commit updates to v2. Might be interesting to see the difference between the two. Even though we almost certainly will have to go for v2.
No worries |
Unfortunately, pydantic v2 seems to cause a sizeable import time regression. :-( main branchBenchmark 1: python -c 'import aiida'
Time (mean ± σ): 90.4 ms ± 4.5 ms [User: 80.1 ms, System: 10.2 ms]
Range (min … max): 83.2 ms … 100.3 ms 31 runs
Benchmark 1: python -c 'import aiida.orm'
Time (mean ± σ): 285.7 ms ± 7.0 ms [User: 824.0 ms, System: 712.9 ms]
Range (min … max): 278.6 ms … 299.7 ms 10 runs
Benchmark 1: python -c 'import aiida.cmdline.commands'
Time (mean ± σ): 212.1 ms ± 12.5 ms [User: 191.5 ms, System: 20.1 ms]
Range (min … max): 199.4 ms … 238.2 ms 12 runs pydantic v2(aiida-dev) hollas-atmospec:~/atmospec/aiida-core (feature/pydantic)
17:31 $ hyperfine -w 3 "python -c 'import aiida'"
Benchmark 1: python -c 'import aiida'
Time (mean ± σ): 183.4 ms ± 6.9 ms [User: 166.4 ms, System: 16.8 ms]
Range (min … max): 174.9 ms … 198.2 ms 16 runs
Benchmark 1: python -c 'import aiida.orm'
Time (mean ± σ): 392.0 ms ± 9.1 ms [User: 907.7 ms, System: 733.4 ms]
Range (min … max): 378.3 ms … 405.7 ms 10 runs
Benchmark 1: python -c 'import aiida.cmdline.commands'
Time (mean ± σ): 274.3 ms ± 10.7 ms [User: 253.7 ms, System: 20.0 ms]
Range (min … max): 263.9 ms … 298.0 ms 10 runs pydantic v1Benchmark 1: python -c 'import aiida'
Time (mean ± σ): 135.3 ms ± 7.8 ms [User: 121.8 ms, System: 13.3 ms]
Range (min … max): 112.2 ms … 146.2 ms 21 runs
Benchmark 1: python -c 'import aiida.orm'
Time (mean ± σ): 330.1 ms ± 6.7 ms [User: 857.7 ms, System: 722.0 ms]
Range (min … max): 320.9 ms … 345.7 ms 10 runs
Benchmark 1: python -c 'import aiida.cmdline.commands'
Time (mean ± σ): 219.6 ms ± 14.3 ms [User: 199.2 ms, System: 19.8 ms]
Range (min … max): 196.8 ms … 240.9 ms 13 runs fastjsonschema branchBenchmark 1: python -c 'import aiida'
Time (mean ± σ): 91.4 ms ± 4.3 ms [User: 79.8 ms, System: 11.5 ms]
Range (min … max): 83.6 ms … 98.8 ms 32 runs
(aiida-dev) hollas-atmospec:~/atmospec/aiida-core (fastjsonschema)
17:39 $ hyperfine -w 3 "python -c 'import aiida.orm'"
Benchmark 1: python -c 'import aiida.orm'
Time (mean ± σ): 290.9 ms ± 11.4 ms [User: 814.9 ms, System: 727.7 ms]
Range (min … max): 276.1 ms … 305.5 ms 10 runs
(aiida-dev) hollas-atmospec:~/atmospec/aiida-core (fastjsonschema)
17:39 $ hyperfine -w 3 "python -c 'import aiida.cmdline.commands'"
Benchmark 1: python -c 'import aiida.cmdline.commands'
Time (mean ± σ): 176.9 ms ± 12.3 ms [User: 160.2 ms, System: 16.2 ms]
Range (min … max): 160.2 ms … 196.5 ms 15 runs |
possibly relevant thread pydantic/pydantic#6748 seems to me it makes sense to move this migration to pydantic v2 on hold until they figure out performance |
We are anyway still blocked by materialsproject/emmet#790 |
I see a 90ms difference for importing These timings are not important for running AiiDA scripts, it's always about the responsiveness of the verdi cli |
I found that they added a new option main
pydantic v1
pydantic v2
pydantic v2
|
94d19b6
to
a892996
Compare
Could you please benchmark some relevant cli commands as well, such as In the end we care about responsiveness of the cli. |
Sure, except that the responsiveness of
This switch is not just for developer benefit though, quite the opposite. It is a precursor to a bunch of features that will allow pluginnable |
Ok, fair point for the import times. For the import of aiida.cmdline.commands the timings don't look entirely consistent to me between Daniel's and yours. In any case, the fastjsonschema branch does seem significantly faster...
Ok, that sounds like a valuable feature for plugin developers (still perhaps not most users). For me, the key metric is cli latency (both for commands that involve the DB and those that do not). If we can get away with minimal changes to cli startup times (as you tests would suggest), then that is ok from my perspective. |
Interesting. It looks like lot of people got impacted so there's a good chance things will improve, and it seems they are actively working on it. pydantic/pydantic#7423
Well, if we're comparing worlds with or without pydantic, then we should compare with the fastjsonschema branch as @ltalirz mentioned, and there the difference is almost 100ms for the A somewhat orthogonal question is, why are we automatically parsing the configuration when importing |
This would be great indeed if we could prevent it from loading when not loading. However, I think it might be more tightly coupled then we expect. For example, the |
9a0a97a
to
7b102cd
Compare
ec7a332
to
fe82ae9
Compare
I am planning to merge this soon. @edan-bainglass did you still want to give this a look? @danielhollas I have updated the branch onto the lastest
|
Yes, but doubtful I'll have time. Woke up feverish. Reminiscent of what I had during the coding week 😬 Will be 😴 most of the week. Go ahead if you need to merge. |
@sphuber 50ms sounds acceptable, and hopefully will get better in the future. 🤞 I am confused, why is the SQLAlchemy change included here? Makes it harder to review the changes. |
fe82ae9
to
dffc6b0
Compare
Needed it to check PRs downstream worked correctly that rely on both this PR and the sqlalchemy one. But I removed the commit again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea about pydantic
so just did a quick pass.
dffc6b0
to
858f4dd
Compare
@sphuber could you also benchmark |
|
I'll give it a look later today, not sure I can follow everything but I'll try my best. P.S. Seems everyone in PSI was knocked down by some virus, I got ill last Friday and just recovered yesterday. Guess @mbercx maybe patient zero 🤔 |
1436cc4
to
8ac08cf
Compare
Okay, not great, not terrible. 🤷♂️ It is a bit unfortunate that we're essentially misusing pydantic here: it was clearly designed to be loaded once and than do lots of validations, but here we essentially have a single validation per load.
Interesting. But there is a variation on main as well so I am not too worried about that.
@mbercx What do you have to say for yourself?? You somehow infected me as well, all the way to Oxford! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After 32 conversations, here comes the first review 😉 . This review is on the first commit. Will continue with the second one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sphuber, I only have a tiny question on the second commit.
Damn, hope you didn't spend too much time on this because there is no real point in reviewing them separately. The first commit was against pydantic v1 and the second just updated to pydantic v2. I had kept them separate so I could go back to v1 if that was necessary for any reason. But I think we will definitely want to go with v2 |
I click on the first commit and start to review and after I finish I find there is another, this is life. I didn't spend too long for the second. But some comment for the first is already changed in the second so please just ignore them. |
Thanks a lot for the review @unkcpz . I have addressed the comments and pushed some changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sphuber! All good from my side. I didn't check the performance, but I trust you and @danielhollas.
The configuration of an AiiDA instance is written in JSON format to the
config.json
file. The schema is defined usingjsonschema
to take care of validation, however, some validation, for example of the config options was still happening manually.Other parts of the code want to start using
pydantic
for model definition and configuration purposes, which has become a de-factor standard for these use-cases in the Python ecosystem. Before introducing another dependency, the existingjsonschema
approach is replaced bypydantic
in current code base first.