Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migration to pydantic v2 #1745

Merged
merged 17 commits into from Oct 23, 2023
Merged

Migration to pydantic v2 #1745

merged 17 commits into from Oct 23, 2023

Conversation

ml-evs
Copy link
Member

@ml-evs ml-evs commented Aug 11, 2023

This PR contains my WIP attempts to migrate to pydantic v2.

As we were doing some hacky stuff with pydantic internals, this process is not straightforward.

The three current stumbling blocks are
a) how we define StrictField and OptimadeField to do injection/removal of certain keys from the final JSONSchema/OpenAPI schemas, which is currently completely broken by the new pydantic-core.
b) complicated Class Config instances we currently use (only deprecated at the moment, so can get away with this for a bit longer) -- I've modified some already and they seem to work
c) the standard issues supporting pymatgen as a dependency (waiting for their sub-deps to update)

Outside of that, this PR basically:

  • refactors settings around the new pydantic_settings package
  • switches out regex for pattern
  • runs bump-pydantic to update several validators
  • changes all validators to use the new v2 format
  • defines some types in a more pydantic way, using Annotated

@CasperWA
Copy link
Member

CasperWA commented Sep 22, 2023

As a note, a (definitely) temporary solution to at least support v2 would be to take advantage of the fact that the whole pydantic v1 API exists under the v1 module in pydantic v2, i.e., anything pydantic v1 (like from pydantic import root_validator) can be imported in pydantic v2 under the v1 module (like from pydantic.v1 import root_validator).
So by having a "local" path to import from pydantic in the repo (e.g., optimade._pydantic) one can support both v1 and v2 by having a wild import from pydantic or pydantic.v1 depending on the major version detected in pydantic.VERSION.

The barrier for implementing this is quite low, but of course, it doesn't then actually migrate to v2 (which would automatically deprecate v1 support). But it might be a nice temporary step for packages depending on OPT that would like to bump to pydantic v2 😉

@ml-evs
Copy link
Member Author

ml-evs commented Sep 27, 2023

As a note, a (definitely) temporary solution to at least support v2 would be to take advantage of the fact that the whole pydantic v1 API exists under the v1 module in pydantic v2, i.e., anything pydantic v1 (like from pydantic import root_validator) can be imported in pydantic v2 under the v1 module (like from pydantic.v1 import root_validator). So by having a "local" path to import from pydantic in the repo (e.g., optimade._pydantic) one can support both v1 and v2 by having a wild import from pydantic or pydantic.v1 depending on the major version detected in pydantic.VERSION.

The barrier for implementing this is quite low, but of course, it doesn't then actually migrate to v2 (which would automatically deprecate v1 support). But it might be a nice temporary step for packages depending on OPT that would like to bump to pydantic v2 😉

I tried this when v2 first came out but none of our FieldInfo hacks work in this case, so someone would need to rewrite this section. As Johan is at the end of his contract now I guess it falls solely on me, so there will be some delay until I can find the time.

The only dep that is really forcing us to upgrade is still pymatgen, which, although it has no pydantic dependence itself, refuses to make the Materials Project API an optional dep, which is now forced to use pydantic v2 proper. I'm making a PR now that at least pins MP-API. I've used the pymatgen connection in optimade-python-tools myself but I'm increasingly thinking we can just deprecate the pymatgen adapter entirely, or at least remove it as a specific testing dep as its such a waste of time trying to keep up with them for our use case

@CasperWA
Copy link
Member

@ml-evs I've been working a bit on this. Not yet done, but I'm making progress.

pyproject.toml Show resolved Hide resolved
@ml-evs
Copy link
Member Author

ml-evs commented Oct 11, 2023

@ml-evs I've been working a bit on this. Not yet done, but I'm making progress.

🫡

@CasperWA
Copy link
Member

Ok. So I got it down to 3 failing tests, all regurgitating the same single error (check CI runs).
For some reason, there's a diff in the OpenAPI schema generated in CI than what I'm generating locally using Py3.10?
I think that's about it?

Also. My commits are quite messy. I've done some static typing here and there - quite sporadically, all the while trying this and that (some of 'this' might still be left in the code here and there), but essentially, mainly going from using JSON Schema (where we do that) and use model_fields and the FieldInfo from there. The main issue with that is getting the type - it can be done through the annotation attribute, but it's a bit hacky to make it work right. But other than that, I think this approach is much better.

Concerning the current implementatins for StrictField and OptimadeField, one can consider using Annotated metadata stuff instead, essentially creating Annotated types with dedicated JSON Schema serializers and such. That would be "cleaner" in the context of pydantic v2.
One could in theory go back to using JSON Schema in the code again as well, using helper functions that may be found through this docs page. There are some utility functions for dealing with the $ref key and resolving definitions.

It seems to me there are several improvements that could be done throughout the code. But I don't think any of us have the time to dedicate to an overhaul, unfortunately.

@codecov
Copy link

codecov bot commented Oct 14, 2023

Codecov Report

Merging #1745 (9da64fd) into master (8510ffa) will decrease coverage by 0.40%.
The diff coverage is 95.27%.

@@            Coverage Diff             @@
##           master    #1745      +/-   ##
==========================================
- Coverage   90.68%   90.29%   -0.40%     
==========================================
  Files          74       75       +1     
  Lines        4616     4605      -11     
==========================================
- Hits         4186     4158      -28     
- Misses        430      447      +17     
Flag Coverage Δ
project 90.29% <95.27%> (-0.40%) ⬇️
validator 90.18% <95.27%> (-0.40%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
optimade/adapters/base.py 96.61% <100.00%> (-0.36%) ⬇️
optimade/adapters/structures/ase.py 98.61% <100.00%> (ø)
optimade/adapters/structures/cif.py 84.09% <ø> (ø)
optimade/adapters/structures/proteindatabank.py 89.28% <ø> (ø)
optimade/adapters/structures/pymatgen.py 98.50% <100.00%> (ø)
optimade/adapters/structures/utils.py 80.88% <100.00%> (ø)
optimade/client/cli.py 85.93% <100.00%> (+0.22%) ⬆️
optimade/filtertransformers/base_transformer.py 97.45% <100.00%> (ø)
optimade/filtertransformers/elasticsearch.py 84.57% <100.00%> (ø)
optimade/filtertransformers/mongo.py 96.66% <ø> (ø)
... and 29 more

... and 2 files with indirect coverage changes

@CasperWA
Copy link
Member

CasperWA commented Oct 14, 2023

Huzzah! It all seems to work 🥳
I also removed the nullable field from properties as this is no longer part of the OpenAPI spec (as of v3.1). See also #1814.

It should be noted that b7c5588 also removes dimension_types and lattice_vectors from being "required" in the schema. I couldn't find any good reason for this to be the case in the specification - but if it's the case, the default value can simply be changed to an ellipsis I'd think?

@CasperWA
Copy link
Member

Will implement usage of Annotated, as this is recommended in FastAPI as of v0.95.0 (see these release notes).

@CasperWA
Copy link
Member

CasperWA commented Oct 15, 2023

The values for some unused query parameters are wrong/does not match the expected type. There should probably be some other way of specifying one's paging strategy than what is currently implemented.

@ml-evs
Copy link
Member Author

ml-evs commented Oct 15, 2023

It seems to me there are several improvements that could be done throughout the code. But I don't think any of us have the time to dedicate to an overhaul, unfortunately.

Any issues you could raise for specifics?

Huzzah! It all seems to work 🥳 I also removed the nullable field from properties as this is no longer part of the OpenAPI spec (as of v3.1). See also #1814.

Awesome, thanks so much for this @CasperWA!

Now to think about how to merge/release this... I'll do my best to go through most of the changes now, but @JPBergsma also has several major ongoing additions for OPTIMADE v1.2 open as PRs. After a quick look I think they could be adapted for pydantic v2 fairly easily (they don't use any of the existing hacks we had), so I can devote some time at least to trying that out.

I would suggest we merge this PR, try to rebase @JPBergsma's changes over the next few days/weeks, and try to implement a few more bits using this base (e.g., some more 1.2 features that are in draft PRs atm) and if all is settled/tidied, make it the actual v1.0 release that supports the final v1.1.x OPTIMADE with pydantic v2.

That would leave us in the state where all of OPTIMADE v1.1 is supported in v0.25.3 with pydantic v1, and in v1.0.0 with pydantic v2, potentially even with backports possible for v0.25.x series for those that cannot upgrade. The other option would be to make the current v0.25.3 as v1, then pydantic v2 support in an immediate v2, but not sure there is much of an advantage to that, unless we plan to also backport all OPTIMADE v1.2 features to pydantic v1.

@ml-evs ml-evs marked this pull request as ready for review October 15, 2023 11:25
@ml-evs
Copy link
Member Author

ml-evs commented Oct 15, 2023

Also, I think I'm going to try to break off the non-pydantic upgrade stuff to a separate PR, e.g., running pyupgrade to get rid of all our py38 code (like Type/Tuple/List etc.) will make it much easier to review this one. I'll see how messy the merge would be first...

@CasperWA
Copy link
Member

Any issues you could raise for specifics?

I've forgotten most of it at the moment, but I remember coming up with some here and there during the work here. But I think I've already touched on the major ones, as well as tried to upgrade them. It's mainly about the logic in EntryCollections - the one I've looked at was the retrieval of attributes and such, but handling of the query parameters, I think could be done better perhaps?
... I'll try to create some issues when something more tangible hits me again.

Now to think about how to merge/release this... (...)

I would suggest we merge this PR, try to rebase @JPBergsma's changes over the next few days/weeks, and try to implement a few more bits using this base (e.g., some more 1.2 features that are in draft PRs atm) and if all is settled/tidied, make it the actual v1.0 release that supports the final v1.1.x OPTIMADE with pydantic v2.

This sounds like a good plan, if what you're expecting in @JPBergsma's changes is true 😉

That would leave us in the state where all of OPTIMADE v1.1 is supported in v0.25.3 with pydantic v1, and in v1.0.0 with pydantic v2, potentially even with backports possible for v0.25.x series for those that cannot upgrade. The other option would be to make the current v0.25.3 as v1, then pydantic v2 support in an immediate v2, but not sure there is much of an advantage to that, unless we plan to also backport all OPTIMADE v1.2 features to pydantic v1.

I'd suggest the former option here - no need to make it more grandiose than that.
Moving to pydantic v2 for v1.0.0 seems like a great choice. I'd probably want to clean up some of the Py3.8 things you've suggested as well before relasing, but other than that most of those changes could go in subsequent patch or minor releases - it's not actually important for the functionality, just for the developer experience and code consistency.

One could consider taking a second look at dependency handling in the repository as well. For example, there is no need to depend on ruff, since it's only used as a pre-commit hook. And I'd suggest removing everything pylint as well, as there's no need for duplicate linting software. This will erase a lot of code comments as well 😄

@CasperWA
Copy link
Member

CasperWA commented Oct 15, 2023

Concerning TypedRelationships and the _req_type attribute, one could consider another approach than the one I've taken according to this section in the pydantic docs.

EDIT: Changed to typing.ClassVar. This makes a lot more sense as this is what was intended from the get go for this field.

@ml-evs
Copy link
Member Author

ml-evs commented Oct 15, 2023

I've forgotten most of it at the moment, but I remember coming up with some here and there during the work here. But I think I've already touched on the major ones, as well as tried to upgrade them. It's mainly about the logic in EntryCollections - the one I've looked at was the retrieval of attributes and such, but handling of the query parameters, I think could be done better perhaps? ... I'll try to create some issues when something more tangible hits me again.

Fair enough, this upgrade certainly exposed a few things that were very hacky. I'll need to take a proper look but in some cases I think it is worth keeping the old behavior until after the next major release, e.g., I would like to upgrade my odbx server to pydantic v2 with minimal other code changes and things like the change from ENTRY_INFO_SCHEMAS holding the schema function or the model itself will be pretty annoying.

One could consider taking a second look at dependency handling in the repository as well. For example, there is no need to depend on ruff, since it's only used as a pre-commit hook. And I'd suggest removing everything pylint as well, as there's no need for duplicate linting software. This will erase a lot of code comments as well 😄

Sure, I think the pylint stuff was only kept in for you :P Given that pre-commit does not integrate with any editors (by design), I quite like having a dev environment installable too, but I take your point.

@CasperWA
Copy link
Member

CasperWA commented Oct 15, 2023

Fair enough, this upgrade certainly exposed a few things that were very hacky. I'll need to take a proper look but in some cases I think it is worth keeping the old behavior until after the next major release, e.g., I would like to upgrade my odbx server to pydantic v2 with minimal other code changes and things like the change from ENTRY_INFO_SCHEMAS holding the schema function or the model itself will be pretty annoying.

Right. That change is quite breaking indeed. Hmm. Maybe there's a workaround to be found?
If not, it could, in theory, be reverted, but one has to use the $defs key instead to find referenced models, one cannot "resolve" the $ref value any more, which I don't particularly like.

EDIT: Also, the currently implemented change would need some docs rewriting.

One could consider taking a second look at dependency handling in the repository as well. For example, there is no need to depend on ruff, since it's only used as a pre-commit hook. And I'd suggest removing everything pylint as well, as there's no need for duplicate linting software. This will erase a lot of code comments as well 😄

Sure, I think the pylint stuff was only kept in for you :P Given that pre-commit does not integrate with any editors (by design), I quite like having a dev environment installable too, but I take your point.

No worries - it's fine to keep in as well, when it's in an "extra" and you want dev's to have a specific tool or version of a tool available.

@ml-evs ml-evs mentioned this pull request Oct 15, 2023
@CasperWA
Copy link
Member

Also - I'll stop pushing more commits to this branch now to not mess with the work you're currently doing. Just wanted to get that "fix" for _req_type in 😄

Use pydantic v1 API wih v2 package

Replace `regex` with `pattern`

Use pydantic v2 API for model dumps

Go use pydantic v2 API again

Bump deps in pyproject and adjust for pydantic_settings

Attempt to migrate some pydantic v2 features

Run bump-pydantic

Update deps in requirements

Post pydantic-bump tweak number 3

Post pydantic-bump tweak number 4

Post pydantic-bump tweak number 5

Reintroduce email-validator dep via pydantic extra

Placating mypy

Use lax annotated type rather than validator

Attempt to simplify some tests for pydantic v2

Bump pydantic version in pre-commit

Bump to pydantic 2.3.0
- Reintroduce `http_client` extra as an alias for `http-client`
Still need to pass in a proper REF_TEMPLATE.
Lot's of changes here - mainly in
optimade.server.schemas:retrieve_queryable_properties().
Using the `model_fields` and FieldInfo instead of the generated JSON
Schema. Mainly due to the changes in the generated schema.

This led to a bit of hacking concerning the type (FieldInfo.annotation).
But otherwise, I think this is a better approach. Also implemented for
EntryCollection.get_attribute_fields().

Otherwise, changes here and there to make tests work as well as doing
some static typing sporadically.
Remove mp-api and emmet-core dependencies

Remove emmet-core from requirements file as well

Update mkdocstrings dependency in requirements file
Use the new Python handler for mkdocstrings - updating the configuration
in all_models.md to include all submodules for optimade.models.
Needed to properly ONLY check if annotation was Union or Optional before
using `get_args()`. Otherwise a type required as a list of strings is
unpacked to `str`, which we do not want.
Use enum `.value` for comparison (check if this should be used elsewhere
in the code for any Enum classes/types).
Disable ignoring warning about pydantic v2 deprecations in order to use
pydantic v2 methods in remaining places.

Consider completely removing the special function used for setting
`nullable=true` in JSON schema for 'SHOULD' OPTIMADE support fields. It
seems `nullable` is no longer a valid field in OpenAPI v3.1.0, but also,
the properties/fields this is added to already support the value being
`null`.
This is no longer part of the OpenAPI spec as of v3.1.

Fixes #1814
Also update some static typing here and there.
- Avoid running mypy on tests folder.

- Remove mypy ignore statements from tests

- Revert to having implicit Optional
@ml-evs ml-evs force-pushed the ml-evs/pydantic-v2-migration branch from e0063f3 to 63e3f7c Compare October 15, 2023 17:21
Copy link
Member Author

@ml-evs ml-evs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CasperWA, finally gotten through this, made a few comments that I've fixed along the way, once the tests pass on my changes I will enact the plan described above, i.e.,

  • Merge this PR
  • Treat the new state as v1.0.0 pre-release (either officially or unofficially - I don't think our CI can do pre-releases on PyPI and I don't want to mess with it)
  • Sit on it for a little while to see how well @JPBergsma's current changes slot in
  • Release v1.0.0 with pydantic v2 support, then rebase new features on top of that and begin implementing v1.2 of OPTIMADE

properties=properties,
output_fields_by_format=output_fields_by_format,
schema=CONFIG.schema_url,
# schema=CONFIG.schema_url, # I think this should be removed?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, looks like this snuck in here aswell as just the meta response

Suggested change
# schema=CONFIG.schema_url, # I think this should be removed?

optimade/adapters/structures/adapter.py Outdated Show resolved Hide resolved
optimade/models/optimade_json.py Outdated Show resolved Hide resolved
- Tidy up changes re: `Datatype`

- Revert to using `typing.Callable` due to old Python 3.9 bug

- Move docstring note about dates back to config of Response model

- Revert class name from `Datatype` back to `DataType`

- Remove defunct comment
@ml-evs ml-evs force-pushed the ml-evs/pydantic-v2-migration branch from a551c17 to 9da64fd Compare October 23, 2023 09:26
@ml-evs ml-evs merged commit 379110c into master Oct 23, 2023
11 checks passed
@ml-evs ml-evs deleted the ml-evs/pydantic-v2-migration branch October 23, 2023 10:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants