Config: Switch from jsonschema to pydantic #6117

Merged
merged 3 commits into from
Oct 25, 2023


sphuber
Contributor

@sphuber sphuber commented Sep 7, 2023

The configuration of an AiiDA instance is written in JSON format to the config.json file. The schema is defined using jsonschema to take care of validation; however, some validation, for example of the config options, was still happening manually.

Other parts of the code want to start using pydantic for model definition and configuration purposes, which has become a de facto standard for these use cases in the Python ecosystem. Before introducing another dependency, the existing jsonschema approach is first replaced by pydantic in the current code base.
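
As a rough illustration of the idea (not the code in this PR), a pydantic model can carry both the schema and the per-option validation that previously had to be done by hand. The class and field names below are hypothetical and do not reflect AiiDA's real configuration layout; the sketch assumes pydantic v2 is installed.

```python
import json
from typing import Dict, Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class ProfileOptions(BaseModel):
    """Hypothetical subset of per-profile options; not AiiDA's real schema."""

    # Field constraints replace hand-written checks on option values.
    daemon_timeout: int = Field(default=2, ge=0)
    loglevel: Literal['DEBUG', 'INFO', 'REPORT', 'WARNING', 'ERROR', 'CRITICAL'] = 'REPORT'


class ConfigSketch(BaseModel):
    """Hypothetical top-level layout of a JSON configuration file."""

    default_profile: Optional[str] = None
    profiles: Dict[str, ProfileOptions] = Field(default_factory=dict)


def load_config(text: str) -> ConfigSketch:
    """Parse JSON text and validate it against the model in one step."""
    return ConfigSketch.model_validate(json.loads(text))


def is_valid(text: str) -> bool:
    """Return True if the JSON text satisfies the model, False otherwise."""
    try:
        load_config(text)
    except ValidationError:
        return False
    return True
```

With this shape, an out-of-range `daemon_timeout` or an unknown `loglevel` is rejected by the model itself, so no separate manual validation pass is needed.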

@sphuber
Contributor Author

sphuber commented Sep 7, 2023

Note that this currently is working against pydantic==1.10. Ideally I want to use v2, which has been out for a while, but some indirect dependencies upstream have not migrated yet and so we cannot be compatible with it just yet. But they should be coming close. Probably we will update this PR to use v2 before merging it. This would be nice because pydantic v2 should be a lot faster, which matters since the parsing of the config will happen in almost all code pathways of AiiDA, so performance is important.

@sphuber
Contributor Author

sphuber commented Sep 7, 2023

@danielhollas this would get rid of jsonschema. However, it does introduce pydantic. Note that we want to add this for other reasons than jsonschema putting a load on import and runtime, but I think it would still be good to have an idea of the impact of pydantic. Would you be able to run your benchmark script on this branch and compare to current main? I plan to also add a version of this branch that uses pydantic v2 which should be a lot faster in runtime, but I don't know about import time.

@danielhollas
Collaborator

@sphuber sure, I'll take a look, thanks! I was curious and already looked at import time of pydantic v2, and while it was about 30ms faster than jsonschema it was unfortunately still not negligible. I'll take a closer look.

@sphuber sphuber force-pushed the feature/pydantic branch 5 times, most recently from 6aae2f4 to 39185eb Compare September 12, 2023 10:34
@danielhollas
Collaborator

Hi @sphuber,

Would you mind merging current main into this branch so that I can do the timings with the just merged import improvements?

Note that I am on a workshop this week, so unless there is a rush I'll do it next week.

@sphuber
Contributor Author

sphuber commented Sep 12, 2023

Would you mind merging current main into this branch so that I can do the timings with the just merged import improvements?

Done. Thanks a lot. The first commit is against pydantic v1, and the second commit updates to v2. Might be interesting to see the difference between the two. Even though we almost certainly will have to go for v2.

Note that I am on a workshop this week, so unless there is a rush I'll do it next week.

No worries

@danielhollas
Collaborator

Unfortunately, pydantic v2 seems to cause a sizeable import time regression. :-(

main branch

Benchmark 1: python -c 'import aiida'
  Time (mean ± σ):      90.4 ms ±   4.5 ms    [User: 80.1 ms, System: 10.2 ms]
  Range (min … max):    83.2 ms … 100.3 ms    31 runs

Benchmark 1: python -c 'import aiida.orm'
  Time (mean ± σ):     285.7 ms ±   7.0 ms    [User: 824.0 ms, System: 712.9 ms]
  Range (min … max):   278.6 ms … 299.7 ms    10 runs

Benchmark 1: python -c 'import aiida.cmdline.commands'
  Time (mean ± σ):     212.1 ms ±  12.5 ms    [User: 191.5 ms, System: 20.1 ms]
  Range (min … max):   199.4 ms … 238.2 ms    12 runs

pydantic v2

(aiida-dev) hollas-atmospec:~/atmospec/aiida-core (feature/pydantic)
17:31 $ hyperfine -w 3 "python -c 'import aiida'"
Benchmark 1: python -c 'import aiida'
  Time (mean ± σ):     183.4 ms ±   6.9 ms    [User: 166.4 ms, System: 16.8 ms]
  Range (min … max):   174.9 ms … 198.2 ms    16 runs
 
Benchmark 1: python -c 'import aiida.orm'
  Time (mean ± σ):     392.0 ms ±   9.1 ms    [User: 907.7 ms, System: 733.4 ms]
  Range (min … max):   378.3 ms … 405.7 ms    10 runs
 
Benchmark 1: python -c 'import aiida.cmdline.commands'
  Time (mean ± σ):     274.3 ms ±  10.7 ms    [User: 253.7 ms, System: 20.0 ms]
  Range (min … max):   263.9 ms … 298.0 ms    10 runs

pydantic v1

Benchmark 1: python -c 'import aiida'
  Time (mean ± σ):     135.3 ms ±   7.8 ms    [User: 121.8 ms, System: 13.3 ms]
  Range (min … max):   112.2 ms … 146.2 ms    21 runs

Benchmark 1: python -c 'import aiida.orm'
  Time (mean ± σ):     330.1 ms ±   6.7 ms    [User: 857.7 ms, System: 722.0 ms]
  Range (min … max):   320.9 ms … 345.7 ms    10 runs
 
Benchmark 1: python -c 'import aiida.cmdline.commands'
  Time (mean ± σ):     219.6 ms ±  14.3 ms    [User: 199.2 ms, System: 19.8 ms]
  Range (min … max):   196.8 ms … 240.9 ms    13 runs

fastjsonschema branch

Benchmark 1: python -c 'import aiida'
  Time (mean ± σ):      91.4 ms ±   4.3 ms    [User: 79.8 ms, System: 11.5 ms]
  Range (min … max):    83.6 ms …  98.8 ms    32 runs
 
(aiida-dev) hollas-atmospec:~/atmospec/aiida-core (fastjsonschema)
17:39 $ hyperfine -w 3 "python -c 'import aiida.orm'"
Benchmark 1: python -c 'import aiida.orm'
  Time (mean ± σ):     290.9 ms ±  11.4 ms    [User: 814.9 ms, System: 727.7 ms]
  Range (min … max):   276.1 ms … 305.5 ms    10 runs
 
(aiida-dev) hollas-atmospec:~/atmospec/aiida-core (fastjsonschema)
17:39 $ hyperfine -w 3 "python -c 'import aiida.cmdline.commands'"
Benchmark 1: python -c 'import aiida.cmdline.commands'
  Time (mean ± σ):     176.9 ms ±  12.3 ms    [User: 160.2 ms, System: 16.2 ms]
  Range (min … max):   160.2 ms … 196.5 ms    15 runs

@ltalirz
Member

ltalirz commented Sep 22, 2023

possibly relevant thread pydantic/pydantic#6748

seems to me it makes sense to move this migration to pydantic v2 on hold until they figure out performance

@sphuber
Contributor Author

sphuber commented Sep 22, 2023

seems to me it makes sense to move this migration to pydantic v2 on hold until they figure out performance

We are anyway still blocked by materialsproject/emmet#790
That being said, is a 50 ms difference really a blocker? Especially since it is a one-off when you first import the module. To me doesn't seem like a blocking issue.

@ltalirz
Member

ltalirz commented Sep 22, 2023

I see a 90ms difference for importing aiida, doubling import time.

These timings are not important for running AiiDA scripts, it's always about the responsiveness of the verdi cli
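
The "50 ms" and "90 ms" figures can both be read off danielhollas's `import aiida` timings above, depending on which branch is compared against main. This is one possible reading of the disagreement, not something either commenter states; the sketch below only redoes the arithmetic on the reported means.

```python
from typing import Tuple

# Mean wall-clock times in ms for `python -c 'import aiida'`,
# copied from the hyperfine output posted earlier in this thread.
mean_ms = {'main': 90.4, 'pydantic v1': 135.3, 'pydantic v2': 183.4}


def slowdown(branch: str, baseline: str = 'main') -> Tuple[float, float]:
    """Return (absolute difference in ms, ratio) of `branch` relative to `baseline`."""
    diff = mean_ms[branch] - mean_ms[baseline]
    ratio = mean_ms[branch] / mean_ms[baseline]
    return round(diff, 1), round(ratio, 2)


for name in ('pydantic v1', 'pydantic v2'):
    print(name, slowdown(name))
# pydantic v1 (44.9, 1.5)
# pydantic v2 (93.0, 2.03)
```

So pydantic v1 adds roughly 45 ms to the bare import, while pydantic v2 adds roughly 93 ms, which is about double the time measured on main.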

@sphuber
Contributor Author

sphuber commented Sep 22, 2023

I found that they added a new option defer_build that, when set to True, will cause the models to be built lazily upon construction, and not on definition. This was added in v2.1. I ran the benchmark with this, and it significantly improves the import aiida benchmark, making it equivalent to that of pydantic v1.
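
For readers unfamiliar with the option, a minimal sketch of how defer_build is set on a pydantic v2 model follows. The model names and fields are hypothetical and unrelated to the models in this PR; the only point is where the setting goes.

```python
from pydantic import BaseModel, ConfigDict


class EagerModel(BaseModel):
    """Default behaviour: the validator is built when the class is defined."""

    name: str
    timeout: int = 2


class DeferredModel(BaseModel):
    """Same fields, but building is postponed until the model is first used."""

    model_config = ConfigDict(defer_build=True)

    name: str
    timeout: int = 2


# Both models validate identically; only the moment of construction differs.
eager = EagerModel(name='default')
deferred = DeferredModel(name='default', timeout='5')  # '5' is coerced to int
```

The cost of building the model is not removed, only moved from import time to the first validation, which helps commands that never touch the configuration.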

main

$ hyperfine -w 3 "python -c 'import aiida'"
Benchmark 1: python -c 'import aiida'
  Time (mean ± σ):      70.3 ms ±  11.3 ms    [User: 59.3 ms, System: 5.6 ms]
  Range (min … max):    56.7 ms …  98.1 ms    35 runs
$ hyperfine -w 3 "python -c 'import aiida.orm'"
Benchmark 1: python -c 'import aiida.orm'
  Time (mean ± σ):     346.1 ms ±  27.2 ms    [User: 423.8 ms, System: 522.9 ms]
  Range (min … max):   312.9 ms … 391.5 ms    10 runs
$ hyperfine -w 3 "python -c 'import aiida.cmdline.commands'"
Benchmark 1: python -c 'import aiida.cmdline.commands'
  Time (mean ± σ):     366.5 ms ±  49.8 ms    [User: 310.4 ms, System: 27.7 ms]
  Range (min … max):   318.3 ms … 456.1 ms    10 runs

pydantic v1

$ hyperfine -w 3 "python -c 'import aiida'"
Benchmark 1: python -c 'import aiida'
  Time (mean ± σ):      97.4 ms ±  13.7 ms    [User: 80.2 ms, System: 9.4 ms]
  Range (min … max):    83.7 ms … 127.2 ms    27 runs
$ hyperfine -w 3 "python -c 'import aiida.orm'"
Benchmark 1: python -c 'import aiida.orm'
  Time (mean ± σ):     286.1 ms ±  35.2 ms    [User: 376.4 ms, System: 523.7 ms]
  Range (min … max):   254.3 ms … 350.1 ms    10 runs
$ hyperfine -w 3 "python -c 'import aiida.cmdline.commands'"
Benchmark 1: python -c 'import aiida.cmdline.commands'
  Time (mean ± σ):     319.9 ms ±  42.0 ms    [User: 273.6 ms, System: 21.1 ms]
  Range (min … max):   276.6 ms … 372.3 ms    10 runs

pydantic v2

$ hyperfine -w 3 "python -c 'import aiida'"
Benchmark 1: python -c 'import aiida'
  Time (mean ± σ):     299.0 ms ±  26.3 ms    [User: 258.9 ms, System: 17.6 ms]
  Range (min … max):   267.5 ms … 345.8 ms    10 runs
$ hyperfine -w 3 "python -c 'import aiida.orm'"
Benchmark 1: python -c 'import aiida.orm'
  Time (mean ± σ):     339.4 ms ±  34.8 ms    [User: 432.9 ms, System: 503.0 ms]
  Range (min … max):   309.4 ms … 399.8 ms    10 runs
$ hyperfine -w 3 "python -c 'import aiida.cmdline.commands'"
Benchmark 1: python -c 'import aiida.cmdline.commands'
  Time (mean ± σ):     353.6 ms ±  34.8 ms    [User: 305.8 ms, System: 20.4 ms]
  Range (min … max):   328.4 ms … 417.5 ms    10 runs

pydantic v2 defer_build

$ hyperfine -w 3 "python -c 'import aiida'"
Benchmark 1: python -c 'import aiida'
  Time (mean ± σ):     115.5 ms ±  16.1 ms    [User: 94.5 ms, System: 12.1 ms]
  Range (min … max):    99.3 ms … 152.2 ms    22 runs
$ hyperfine -w 3 "python -c 'import aiida.orm'"
Benchmark 1: python -c 'import aiida.orm'
  Time (mean ± σ):     320.6 ms ±  25.9 ms    [User: 418.4 ms, System: 510.9 ms]
  Range (min … max):   288.6 ms … 364.0 ms    10 runs
$ hyperfine -w 3 "python -c 'import aiida.cmdline.commands'"
Benchmark 1: python -c 'import aiida.cmdline.commands'
  Time (mean ± σ):     367.8 ms ±  46.8 ms    [User: 307.9 ms, System: 30.7 ms]
  Range (min … max):   328.7 ms … 478.6 ms    10 runs

@ltalirz
Member

ltalirz commented Sep 22, 2023

Could you please benchmark some relevant cli commands as well, such as verdi config for example? There is also the load time test that is run as part of CI if I remember well.

In the end we care about responsiveness of the cli.
If that goes from 70ms to 115ms (just taking aiida import time as an example) without a clear benefit to the user (I get the developer benefit of course), I would still advise against the change.

@sphuber
Contributor Author

sphuber commented Sep 22, 2023

In the end we care about responsiveness of the cli.
If that goes from 70ms to 115ms (just taking aiida import time as an example)

Sure, except that the responsiveness of verdi is determined by the import load time of aiida.cmdline where the relative change is a lot less pronounced.

without a clear benefit to the user (I get the developer benefit of course) I would still advise against the change.

This switch is not just for developer benefit though, quite the opposite. It is a precursor to a bunch of features that will allow pluggable verdi commands, such as verdi profile setup for arbitrary storage backends, and that will allow clients to dynamically inspect schemas of various classes.

@ltalirz
Member

ltalirz commented Sep 22, 2023

Ok, fair point for the import times.

For the import of aiida.cmdline.commands the timings don't look entirely consistent to me between Daniel's and yours.
In your case, pydantic v2 is faster than main, and deferring model creation slows things down (?)
Do we need to increase the number of runs?

In any case, the fastjsonschema branch does seem significantly faster...

This switch is not just for developer benefit though, quite the opposite. It is a precursor to a bunch of features that will allow pluggable verdi commands, such as verdi profile setup for arbitrary storage backends, and that will allow clients to dynamically inspect schemas of various classes.

Ok, that sounds like a valuable feature for plugin developers (still perhaps not most users).

For me, the key metric is cli latency (both for commands that involve the DB and those that do not). If we can get away with minimal changes to cli startup times (as your tests would suggest), then that is ok from my perspective.

@danielhollas
Collaborator

possibly relevant thread pydantic/pydantic#6748

Interesting. It looks like a lot of people got impacted so there's a good chance things will improve, and it seems they are actively working on it. pydantic/pydantic#7423

That being said, is a 50 ms difference really a blocker? Especially since it is a one-off when you first import the module. To me doesn't seem like a blocking issue.

Well, if we're comparing worlds with or without pydantic, then we should compare with the fastjsonschema branch as @ltalirz mentioned, and there the difference is almost 100ms for the aiida.cmdline.commands import.

A somewhat orthogonal question is, why are we automatically parsing the configuration when importing aiida.cmdline.commands? Surely I shouldn't need the config if I run verdi --version or if I want to tab-complete a command (for which every 50-100ms is very noticeable). Not sure how difficult or possible it would be to disentangle this. @sphuber if you think it would be possible I can take a closer look.

@sphuber
Contributor Author

sphuber commented Sep 22, 2023

Not sure how difficult or possible it would be to disentangle this. @sphuber if you think it would be possible I can take a closer look.

This would be great indeed if we could prevent it from loading when it is not needed. However, I think it might be more tightly coupled than we expect. For example, the -v/--verbosity option, which is automatically added to all verdi commands, uses a default value that comes from the logging.aiida_loglevel option. We might be able to load this dynamically though when a command is actually called. There might be other things like this example. But I definitely think it is very much worth looking into. I agree that having a CLI that is as snappy as possible is valuable (I for one hate lag with a passion) but I also think that having pydantic will be very valuable for various features in aiida-core.
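
The idea of resolving the verbosity default only when a command is actually called could look roughly like the sketch below. It uses only the standard library; the function names and the flat JSON layout are assumptions made for illustration and are not how verdi or the AiiDA config are implemented. Only the option name logging.aiida_loglevel comes from the comment above.

```python
import functools
import json
import tempfile
from pathlib import Path


@functools.lru_cache(maxsize=None)
def load_config(path: str) -> dict:
    """Read and parse the configuration file once, on first use only."""
    return json.loads(Path(path).read_text())


def default_verbosity(path: str) -> str:
    """Resolve the default log level lazily, when a command is invoked."""
    return load_config(path).get('logging.aiida_loglevel', 'REPORT')


def run_command(path: str, verbosity=None) -> str:
    """Hypothetical command body: touch the config only if no value was given."""
    level = verbosity if verbosity is not None else default_verbosity(path)
    return f'running with verbosity {level}'


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        config_path = Path(tmp) / 'config.json'
        config_path.write_text(json.dumps({'logging.aiida_loglevel': 'DEBUG'}))
        print(run_command(str(config_path), verbosity='INFO'))  # config is not read
        print(run_command(str(config_path)))  # config is read here, once
```

Because nothing is parsed at import time, paths such as tab-completion or printing a version string would never pay for config validation in this arrangement.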

@sphuber
Contributor Author

sphuber commented Oct 23, 2023

I am planning to merge this soon. @edan-bainglass did you still want to give this a look?

@danielhollas I have updated the branch onto the latest main such that it contains your latest changes to the CLI speed. I get the following timings now for loading verdi (made sure to uninstall trogon):

main

(aiida-py311) sph@invader:~/code/aiida/env/dev/aiida-core$ hyperfine -w 3 verdi
Benchmark 1: verdi
  Time (mean ± σ):     107.3 ms ±  26.2 ms    [User: 86.9 ms, System: 12.0 ms]
  Range (min … max):    84.4 ms … 170.4 ms    32 runs

This PR

(aiida-py311) sph@invader:~/code/aiida/env/dev/aiida-core$ hyperfine -w 3 verdi
Benchmark 1: verdi
  Time (mean ± σ):     162.0 ms ±  18.4 ms    [User: 133.4 ms, System: 15.9 ms]
  Range (min … max):   135.8 ms … 204.8 ms    16 runs

There is a ~50 ms slowdown due to the changes. Note though that this PR also contains an update to sqlalchemy v2.0, which may be responsible for some of it, but I think most of it is probably pydantic. I am sorry to be undoing some of the hard work you have been doing on the import times, but I think this slowdown is acceptable for the benefits that it will give us. To me the tab-completion feels snappy enough in any case.

@edan-bainglass
Member

Yes, but doubtful I'll have time. Woke up feverish. Reminiscent of what I had during the coding week 😬 Will be 😴 most of the week. Go ahead if you need to merge.

@sphuber
Contributor Author

sphuber commented Oct 23, 2023

Yes, but doubtful I'll have time. Woke up feverish. Reminiscent of what I had during the coding week 😬 Will be 😴 most of the week. Go ahead if you need to merge.

Sorry to hear, hope you feel better soon. @mbercx @unkcpz could one of you two maybe have a look?

@danielhollas
Collaborator

@sphuber 50ms sounds acceptable, and hopefully will get better in the future. 🤞

I am confused, why is the SQLAlchemy change included here? Makes it harder to review the changes.

@sphuber
Contributor Author

sphuber commented Oct 23, 2023

@sphuber 50ms sounds acceptable, and hopefully will get better in the future. 🤞

I am confused, why is the SQLAlchemy change included here? Makes it harder to review the changes.

Needed it to check PRs downstream worked correctly that rely on both this PR and the sqlalchemy one. But I removed the commit again

Collaborator

@danielhollas danielhollas left a comment


I have no idea about pydantic so just did a quick pass.

aiida/manage/configuration/config.py (outdated, resolved)
tests/manage/configuration/test_options.py (outdated, resolved)
@danielhollas
Collaborator

@sphuber could you also benchmark verdi profile list (or some other command that actually loads the profile config, but does not touch DB)

@sphuber
Contributor Author

sphuber commented Oct 24, 2023

main

(aiida-py311) sph@invader:~/code/aiida/env/dev/aiida-core$ hyperfine -w 3 "verdi profile list"
Benchmark 1: verdi profile list
  Time (mean ± σ):     149.4 ms ±  19.7 ms    [User: 123.4 ms, System: 14.2 ms]
  Range (min … max):   129.8 ms … 195.7 ms    18 runs

PR

(aiida-py311) sph@invader:~/code/aiida/env/dev/aiida-core$ hyperfine -w 3 "verdi profile list"
Benchmark 1: verdi profile list
  Time (mean ± σ):     221.3 ms ±  28.2 ms    [User: 191.8 ms, System: 12.3 ms]
  Range (min … max):   197.6 ms … 281.0 ms    15 runs

Slightly bigger difference ~70 ms. I see quite a bit of variation in timings though, despite the warmup. So not sure how reliable these benchmarks are.
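
One way to get a feel for whether such a gap is larger than the run-to-run noise is sketched below. It mimics what hyperfine does for an import benchmark using only the standard library, and adds a deliberately crude comparison (gap between means versus the sum of the standard deviations). The function names are made up for this sketch and the comparison is a rule of thumb, not a statistical test.

```python
import statistics
import subprocess
import sys
import time


def time_import(module: str, runs: int = 10, warmup: int = 3) -> dict:
    """Time `python -c 'import <module>'` in fresh interpreters, in milliseconds."""
    command = [sys.executable, '-c', f'import {module}']
    for _ in range(warmup):
        subprocess.run(command, check=True)  # warm caches, discard the timing
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True)
        samples.append((time.perf_counter() - start) * 1000)
    return {
        'mean': statistics.mean(samples),
        'stdev': statistics.stdev(samples),
        'min': min(samples),
        'max': max(samples),
        'runs': runs,
    }


def differ_clearly(a: dict, b: dict) -> bool:
    """Crude check: the gap between means exceeds the sum of both standard deviations."""
    return abs(a['mean'] - b['mean']) > a['stdev'] + b['stdev']


# Means and standard deviations for `verdi profile list` as reported above.
main = {'mean': 149.4, 'stdev': 19.7}
pr = {'mean': 221.3, 'stdev': 28.2}
print(differ_clearly(main, pr))  # True: 71.9 ms gap versus 47.9 ms of combined spread
```

For a per-module breakdown rather than a total, CPython's `-X importtime` flag prints the time spent in each import.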

@unkcpz
Member

unkcpz commented Oct 24, 2023

Sorry to hear, hope you feel better soon. @mbercx @unkcpz could one of you two maybe have a look?

I'll give it a look later today, not sure I can follow everything but I'll try my best.

P.S. Seems everyone in PSI was knocked down by some virus, I got ill last Friday and just recovered yesterday. Guess @mbercx maybe patient zero 🤔

@sphuber sphuber force-pushed the feature/pydantic branch 3 times, most recently from 1436cc4 to 8ac08cf Compare October 24, 2023 14:05
@danielhollas
Collaborator

Slightly bigger difference ~70 ms.

Okay, not great, not terrible. 🤷‍♂️ It is a bit unfortunate that we're essentially misusing pydantic here: it was clearly designed to be loaded once and then do lots of validations, but here we essentially have a single validation per load.

I see quite a bit of variation in timings though, despite the warmup. So not sure how reliable these benchmarks are.

Interesting. But there is a variation on main as well so I am not too worried about that.

P.S. Seems everyone in PSI was knocked down by some virus, I got ill last Friday and just recovered yesterday. Guess @mbercx maybe patient zero 🤔

@mbercx What do you have to say for yourself?? You somehow infected me as well, all the way to Oxford!
(in all seriousness, hope you all get better soon! 🤧 )

Member

@unkcpz unkcpz left a comment


After 32 conversations, here comes the first review 😉 . This review is on the first commit. Will continue with the second one.

aiida/common/log.py (resolved)
aiida/manage/configuration/config.py (outdated, resolved)
aiida/manage/configuration/config.py (outdated, resolved)
aiida/manage/configuration/config.py (outdated, resolved)
aiida/manage/configuration/config.py (outdated, resolved)
aiida/manage/configuration/options.py (resolved)
tests/manage/configuration/test_configuration.py (outdated, resolved)
Member

@unkcpz unkcpz left a comment


Thanks @sphuber, I only have a tiny question on the second commit.

aiida/manage/configuration/options.py (resolved)
@sphuber
Contributor Author

sphuber commented Oct 25, 2023

After 32 conversations, here comes the first review 😉 . This review is on the first commit. Will continue with the second one.

Damn, hope you didn't spend too much time on this because there is no real point in reviewing them separately. The first commit was against pydantic v1 and the second just updated to pydantic v2. I had kept them separate so I could go back to v1 if that was necessary for any reason. But I think we will definitely want to go with v2

@unkcpz
Member

unkcpz commented Oct 25, 2023

Damn, hope you didn't spend too much time on this because there is no real point in reviewing them separately.

I clicked on the first commit and started to review, and after I finished I found there was another; this is life. I didn't spend too long on the second. But some comments on the first are already addressed by changes in the second, so please just ignore them.

@sphuber sphuber requested a review from unkcpz October 25, 2023 13:02
@sphuber
Contributor Author

sphuber commented Oct 25, 2023

Thanks a lot for the review @unkcpz . I have addressed the comments and pushed some changes

Member

@unkcpz unkcpz left a comment


Thanks @sphuber! All good from my side. I didn't check the performance, but I trust you and @danielhollas.

@sphuber sphuber merged commit 4203f16 into aiidateam:main Oct 25, 2023
35 checks passed
@sphuber sphuber deleted the feature/pydantic branch October 25, 2023 14:03
@sphuber sphuber mentioned this pull request Oct 25, 2023
5 participants