Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add adapters - Base design + 'structures' (+ 'references'... sort of) #241

Merged
merged 32 commits into from Apr 22, 2020

Conversation

CasperWA
Copy link
Member

@CasperWA CasperWA commented Mar 28, 2020

The goal of the adapters is to contain a conversion or adaptation ability in a single Python class.

The class will be a proxy for optimade.models.EntryResource, where the specific entries will take the place, e.g., StructureResource.

In this way, one can instatiate the adapter class with a JSON resource object matching it, and get all the properties as attributes. E.g., one could get the lattice vectors of a structure then by calling
structures_adapter_class_object.attributes.lattice_vectors. The reason to not have the adapter class be a direct link or subclass of the model class, is to not inherit all that comes with it, i.e., do not inherit pyndantic's BaseModel, etc. While it can trip most IDEs, it is a more elegant solution that opens the doors to other possibilities, in my opinion.

Besides being a handy wrapper for an entity resource object for a client, it also serves as a converter/adapter to other known and widely used libraries for the specific entity type. For a structures resource object, this may be libraries such as pymatgen, ASE, AiiDA, etc...
It may also be nice to be able to convert the structures resource object to a Crystallographic Information File (CIF) file or other. These adapter/conversion functions may be added in a separate file and then referenced in the adapter class.

Thus, the adapter class stays small at first glance, but spreads its tentacles wide throughout both the current package, but also a multitude of other libraries, packages, and data formats.

In general, whenever the adapter class tries to get an attribute, it goes through the following steps:

  1. If the attribute starts with get_, try to look up if this is a known conversion, then perform the conversion and return it as a string or specific package/library Python class.
  2. Try to fetch the OPTIMADE model property.
  3. Raise AttributeError if the attribute is still unknown, i.e., the previous steps failed and didn't raise already.

Missing parts of this PR:

  • Add tests:
    • General tests for the adapter class.
    • Tests for each converter function:
      • AiiDA
      • ASE
      • CIF
      • PDB
      • PDBx/mmCIF
      • pymatgen
  • Generalize the adapter main class and have Structure inherit from it.
  • Add Reference.
    (Currently added as an empty inheritence, meaning there are currently no conversions available for Reference).

Helpful additions/corrections to this PR:

  • Optimize the various conversion functions (especially get_cif and get_pbdx_mmcif).
  • Add more well-known or much used data format converters and packages/libraries.
    (Keeping in mind that some of these libraries have a large quantity of converters already, e.g., ASE already converts to PDB and CIF formats, and so therefore these should only be included if going through, e.g., ASE, may end in a loss of data - I may not have been entirely strict enough with this myself already 😅).

@CasperWA CasperWA added enhancement New feature or request priority/low Issue or PR with a consensus of low priority labels Mar 28, 2020
@codecov
Copy link

codecov bot commented Mar 28, 2020

Codecov Report

Merging #241 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #241   +/-   ##
=======================================
  Coverage   89.32%   89.32%           
=======================================
  Files          54       54           
  Lines        2220     2220           
=======================================
  Hits         1983     1983           
  Misses        237      237           
Flag Coverage Δ
#unittests 89.32% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f88df44...f88df44. Read the comment docs.

Copy link
Member

@ltalirz ltalirz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @CasperWA , I think having these converters in the optimade python tools is very useful.

May I suggest to add both individual extras for the corresponding dependencies + a converters extra that contains them all?

optimade/adapters/structures/__init__.py Outdated Show resolved Hide resolved
"pdb": get_pdb,
"pdbx_mmcif": get_pdbx_mmcif,
"pymatgen": get_pymatgen_structure,
}
Copy link
Member

@ltalirz ltalirz Mar 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, this dictionary might become an instance of a ConverterCollection class that could include checks on the interface of the functions it contains etc.
For the moment, this is fine

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea. And definitely one for the future, i.e., not something I will explore at this time.

@shyamd
Copy link
Contributor

shyamd commented Mar 29, 2020

I'm not fully seeing what this does. How does this plug into the server for instance?

@CasperWA
Copy link
Member Author

CasperWA commented Mar 29, 2020

I'm not fully seeing what this does. How does this plug into the server for instance?

It's not for server use, more for client use. The main goal is to interpret OPTIMADE structures and retrieve their information/convert them to client-usable data formats and constructs.

Edit: Since this repository is meant to host OPTIMADE tools for use in Python, it should/could cover both server side and client side tools, right? :)

@CasperWA
Copy link
Member Author

CasperWA commented Mar 29, 2020

thanks @CasperWA , I think having these converters in the optimade python tools is very useful.

Thanks :) Me too, hence the "fold back" from the client I'm working on.

May I suggest to add both individual extras for the corresponding dependencies + a converters extra that contains them all?

I thought about this, got tired at the idea, and set it aside for a time when I have not more exciting ideas for how the adapter should work. I.e., I will definitely add this to the setup.py 👍

Edit: I have added the dependencies, but named them client to match with the server we already have. However, converters as you originally suggested may be valid.
As the standard "minimum" dependency for a client, I have added only numpy. To get all converters one would now have to either install all or each extra independently - as has always been the case for the server backends. Although, this is quite different, I guess?

@ml-evs
Copy link
Member

ml-evs commented Mar 31, 2020

Not going to review this until everyone else has finished hacking at it (ping me @CasperWA), but think this looks very useful and is definitely in the spirit of "optimade-python-tools".

Once we get this in, I think we should focus for a while on writing docs that showcase how this repo can be used by implementers of servers and clients. Would we even want to provide a simple Python/CLI client implementation in this repo, given that we have most of the tools already? (I guess this would depend on how much this treads on the toes of your fancier client @CasperWA)

@CasperWA CasperWA force-pushed the add_adapters branch 5 times, most recently from 712662d to 42c70f6 Compare April 7, 2020 11:03
@CasperWA CasperWA requested a review from ltalirz April 7, 2020 13:41
@CasperWA CasperWA marked this pull request as ready for review April 7, 2020 13:41
@CasperWA
Copy link
Member Author

CasperWA commented Apr 7, 2020

@ml-evs @ltalirz @shyamd I consider this ready for review now.
I have added a lot of tests, and tried to come up with edge-cases, and what to do for various optional properties. This has hopefully resulted in more robust conversion functions.

Some of the conversion function tests are the same from test file to test file. This may be made more elegant, but in order to properly separate out the test functions if one should not have the specific dependency installed, this is the design I chose.

Note also, the kind of ugly addition to our CI workflows, due to AiiDA.
In order to create a StructureData AiiDA Node, one must have a functioning AiiDA profile loaded, hence all the extra stuff.
I have opened an issue aiidateam/aiida-core#3898 to try and make this more gentle on the lines of code in our workflows. Since this design idea from the side of AiiDA is so deeply integrated into its code base, it is difficult to see how we can get to a state, where it may not be needed to set up an AiiDA profile, just for testing this.

Finally, if you have any optimizations for the various converters, now is the time! :) If you want to add a bunch of new ones (especially to the references adapter), perhaps consider either doing it in another PR or quickly add it to this PR :)
I know the PDBx/mmCIF is a bit of a mess, but I don't have the time now to research a whole new and to me unknown format to get it properly up to speed - sorry. I feel this is better than nothing for now?

@CasperWA CasperWA changed the title Add adapters - First adapter is for 'structures' Add adapters - Base design + 'structures' (+ 'references Apr 7, 2020
@CasperWA CasperWA changed the title Add adapters - Base design + 'structures' (+ 'references Add adapters - Base design + 'structures' (+ 'references'... sort of) Apr 7, 2020
@CasperWA CasperWA force-pushed the add_adapters branch 2 times, most recently from 8aa4017 to d3ccda7 Compare April 7, 2020 19:12
@CasperWA
Copy link
Member Author

Not going to review this until everyone else has finished hacking at it (ping me @CasperWA), but think this looks very useful and is definitely in the spirit of "optimade-python-tools".

Consider yourself "pinged" @ml-evs :) (But not until Tuesday the 14th and onwards).

Once we get this in, I think we should focus for a while on writing docs that showcase how this repo can be used by implementers of servers and clients. (...)

I think that would definitely be the next big "project" for the repo. Also, we should make sure to update to v1.0.0 ASAP (both the package version, but I was thinking mainly of the OPTIMADE API version). Maybe v1 of the package should indeed wait until we have docs in place.

(...) Would we even want to provide a simple Python/CLI client implementation in this repo, given that we have most of the tools already? (I guess this would depend on how much this treads on the toes of your fancier client @CasperWA)

So as soon as this PR goes through, we could (in theory) adopt my client. That way I can update the current version with this adapter. And I don't know how fancy it is... But you can now check it out on binder to judge for yourself 😅
I could even move the repository aiidalab/aiidalab-optimade over to Materials-Consortia? I have discussed this loosely with @giovannipizzi already.
I would still want to create an AiiDA lab-specific edition of the client, but the most recent version, which is about to be merged into the develop branch, is completely AiiDA- and AiiDA lab-agnostic - only depending on the Jupyter environment, really.

But I don't know if we want an OPTIMADE client in Jupyter under the Materials-Consortia umbrella?

In any case! We should probably create an issue for this discussion, or rather pick it up at a consortium meeting.

The goal of the adapters is to contain a conversion or adaptation
ability in a single Python class.

The class will be a proxy for `optimade.models.EntityResource`, where
the specific entity will take the place, e.g., `StructureResource`.

In this way, one can instatiate the adapter class with a JSON resource
object matching it, and get all the properties as attributes.
E.g., one could get the lattice vectors of a structure then by calling
`structures_adapter_class_object.attributes.lattice_vectors`.
The reason to not have the adapter class be a direct link or subclass of
the model class, is to not inherit all that comes with it, i.e., do not
inherit `pyndantic`'s `BaseModel`, etc.
While it can trip most IDEs, it is a more elegant solution that opens
the doors to other possibilities, in my opinion.

Besides being a handy wrapper for an entity resource object for a
client, it also serves as a converter/adapter to other known and widely
used libraries for the specific entity type.
For a structures resource object, this may be libraries such as
pymatgen, ASE, AiiDA, etc...
It may also be nice to be able to convert the structures resource object
to a Crystallographic Information File (CIF) file or other.
These adapter/conversion functions may be added in a separate file and
then referenced in the adapter class.

Thus, the adapter class stays small at first glance, but spreads its
tentacles wide throughout both the current package, but also a multitude
of other libraries, packages, and data formats.

In general, whenever the adapter class tries to get an attribute, it
goes through the following steps:

1. Try to fetch the OPTIMADE model property.
2. If the attribute starts with `get_`, try to look up if this is a
known conversion, then perform the conversion and return it as a string
or specific package/library Python class.
3. Raise AttributeError if the attribute is still unknown, i.e., the
previous steps failed and didn't raise already.
Copy link
Member

@ml-evs ml-evs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great so far @CasperWA, very nice and thorough. I don't have any general comments, except that I agree we should probably split this up into namespace packages as @shyamd suggested, after this PR and before we write the docs.

I've reviewed everything apart from the PDB, CIF and PDBx/mmCIF stuff as I'd like to play around with the results a bit and make sure we're producing valid output files.

.github/workflows/deps_eager.yml Show resolved Hide resolved
.github/workflows/requirements.txt Show resolved Hide resolved
optimade/adapters/base.py Show resolved Hide resolved
optimade/adapters/base.py Outdated Show resolved Hide resolved
optimade/adapters/base.py Outdated Show resolved Hide resolved
optimade/server/config.py Show resolved Hide resolved
setup.py Show resolved Hide resolved
tests/adapters/references/test_references.py Outdated Show resolved Hide resolved
tests/adapters/structures/test_aiida.py Outdated Show resolved Hide resolved
tests/adapters/structures/test_ase.py Outdated Show resolved Hide resolved
Co-Authored-By: Matthew Evans <7916000+ml-evs@users.noreply.github.com>
CasperWA and others added 2 commits April 16, 2020 13:15
Co-Authored-By: Matthew Evans <me388@cam.ac.uk>
For now, add `json` and `dict`, which refer to pydantic's `BaseModel`'s
`.json()` and `.dict()` methods.
See https://pydantic-docs.helpmanual.io/usage/exporting_models for more
information.
Use pytest fixtures to create special structures.
Convert null/None values to float("nan"), which resulted in some
conversion functions working in more edge cases.
@CasperWA
Copy link
Member Author

@ml-evs I haven't addressed all your comments yet, and I don't know how much more I will be able to do today.
However, it is in a much better state now, I think. The tests have been modularized and a clear path to doing some centralized testing of all conversion functions has opened up, but not yet followed.
Even more, is that it seems most of the modules can indeed handle unknown values, as long as it's explicitly converted to float("nan").

Update references tests with pytest fixtures as was done for structures
tests.
@CasperWA
Copy link
Member Author

All right @ml-evs ! I have finished addressing your review comments, it should be ready for re-review.

@CasperWA CasperWA requested a review from ml-evs April 17, 2020 14:58
optimade/adapters/base.py Outdated Show resolved Hide resolved
optimade/adapters/base.py Outdated Show resolved Hide resolved
tests/adapters/references/test_references.py Show resolved Hide resolved
CasperWA and others added 3 commits April 20, 2020 14:39
Properly handle nested attributes for `getattr()` function.

Co-Authored-By: Matthew Evans <me388@cam.ac.uk>
This is to properly get positional coordinates for some viewers, since
SCALE is used to get fractional coordinates.

Co-Authored-By: Matthew Evans <me388@cam.ac.uk>
@CasperWA CasperWA requested a review from ml-evs April 22, 2020 08:53
ml-evs
ml-evs previously approved these changes Apr 22, 2020
Copy link
Member

@ml-evs ml-evs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great to me now, thanks for addressing all the comments @CasperWA!

Copy link
Member

@ml-evs ml-evs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once more for luck...

@CasperWA CasperWA merged commit 9bedc78 into Materials-Consortia:master Apr 22, 2020
@CasperWA CasperWA deleted the add_adapters branch April 22, 2020 15:13
@CasperWA CasperWA mentioned this pull request Apr 22, 2020
@ml-evs ml-evs mentioned this pull request May 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request priority/low Issue or PR with a consensus of low priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants