Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Complex data type and support for numbers.Complex in database #5561

Conversation

janssenhenning
Copy link
Contributor

@janssenhenning janssenhenning commented Jun 11, 2022

See #5340

Serialization/Deserialization of complex numbers is added to the the
psql_dos and sqlite_zip backend. These are stored as objects of the form
{'__complex__': True, 'real': x.real, 'imag': x.imag} with the
__complex__ key being used as the identifier for detecting these
objects for deserializations.

This transformation is not done on the level of clean_value but by
customizing the JSON encoder/decoder of the backend. Since clean_value
can be called multiple times on the same value there is no clean way
to distinguish complex numbers that were already transformed or custom
dictionaries, which contain the __complex__ key (For these a
ValidationError is raised).

Added support for complex attributes in QueryBuilder
complex numbers are compared like objects. For comparisons
attribute.complex.real/imag can be used

Open Questions:

  • The custom JSON encoder/decoder should maybe be moved to a common module. Where?
    (For now I moved it into aiida.storage.json since aiida.common.json is already in use)
  • What are the performance implications of customizing the JSON
    serialization/deserialization for both complex numbers and all other cases?
  • Are there more places, where modifications for complex numbers have to be added?
    • Should the JsonAble node be able to handle complex numbers?
  • Add documentation

@ltalirz
Copy link
Member

ltalirz commented Jun 21, 2022

@sphuber is currently on vacation, but mentioning him here in case he doesn't see this PR

@janssenhenning janssenhenning force-pushed the feature/complex-attributes branch 4 times, most recently from 1139053 to ebf544d Compare July 13, 2022 09:42
Serialization/Deserialization of complex numbers is added to the the
psql_dos and sqlite_zip backend. These are stored as objects of the form
`{'__complex__': True, 'real': x.real, 'imag': x.imag}` with the
`__complex__` key being used as the identifier for detecting these
objects for deserializations.

This transformation is not done on the level of `clean_value` but by
customizing the JSON encoder/decoder of the backend. Since `clean_value`
can be called multiple times on the same value thre is no clean way
to distinguish complex numbers that were already transformed or custom
dictionaries, which contain the `__complex__` key (For these a
ValidationError is raised).

Added support for complex attributes in QueryBuilder
complex numbers are compared like objects. For comparisons
attribute.complex.real/imag can be used

Open Questions:
  - What are the performance implications of customising the json
    serialization/deserialization
  - Should the Jsonable class handle complex numbers?
@janssenhenning janssenhenning marked this pull request as ready for review July 13, 2022 11:59
@sphuber
Copy link
Contributor

sphuber commented Jul 15, 2022

Thanks for the PR @janssenhenning . I agree that it would be very interesting and useful to have support for complex numbers to be stored in node attributes. However, I am not sure whether the current implementation, putting the responsibility of the (de)serialization on the storage backend, is desirable.

Now one could argue that it should actually go there since the task of making sure how to represent certain things in the storage backend should exactly be the responsibility of said backend. However, since storage backends are now pluginnable and can be customized complicates this a bit. It would open the situation where a database that is perfectly valid in the psql_dos backend, could not be transferred to a profile using another if that doesn't happen to support complex numbers.

Of course one can say that this is not the only constraint on the storage backend, and there are other requirements that should be implemented, so having one additional requirement shouldn't matter. However, so far (at least for attributes and extras) the requirement is really simple and its content should simply be JSON serializable. So really we should maybe take care of supporting complex numbers on a storage backend agnostic level.

I think we already discussed putting this in the aiida.orm.implementation.utils.clean_value but the problem there is that this only does serialization. It would be possible to convert a + bj into {'__complex__': True, 'real': a, 'imag': b}, but this would not automatically undone on loading the node/attribute. I wonder if we should have another generic layer that sits between our current ORM backend and the actual storage implementation, which just takes care of serializing/deserializing the attributes/extras of nodes. In this way, at least we can guarantee that all storage backend implementations will automatically benefit from it. I am not sure where to put this though yet and if this would work, or whether I am missing something. @giovannipizzi @chrisjsewell thoughts?

@janssenhenning
Copy link
Contributor Author

@sphuber Thank you for your feedback. I appreciate your point that these kinds of serializations/deserializations have to be more high level in order to ensure consistency. If I understand you correctly this would be a counterpart to clean_value for deserializations.

One thing that I found tricky when trying out the serialization in clean_value directly is that there is no guarantee that it will be called only once on the same value. This makes it difficult to distinguish things that have already been serialized or put in by the user directly. Here I wanted to have an error when the __complex__ key is used and this is not really possible. But maybe in this case, it would also be fine to allow users to enter something like {'__complex__': True, 'real':1, 'imag': 2} and have it treated exactly the same as setting a complex number and only raise an exception if the dictionary does not conform to the expected format.

@janssenhenning
Copy link
Contributor Author

Thinking about this more I agree that the location for this serialzation/deserialization layer is not very clear. My thought was that it would be on the same layer, where clean_value is called now on the BackendNode.

However clean_value itself is not really storage backend agnostic since it is only explicitly inserted into the attribute setting in the psql_dos backend. Since e.g. set_attribute at the moment handles both the serialization/cleaning and the actual setting of the attribute in the database model this cannot be separated cleanly.

I think the set_attribute and related methods should only handle the case of setting json-serializable values and the serialization/deserialization would happen in the NodeAttributes/NodeExtras class for instance. The only thing I'm not quite sure how to handle is the case of an unstored node, since here uncleaned values can propagate deeper at the moment and are only cleaned when calling clean_values() via the store() method. This would then again require that all backends
implement the correct serialization in the clean_values method

janssenhenning added a commit to janssenhenning/aiida-core that referenced this pull request Aug 6, 2022
Alternative implementation of aiidateam#5561

These are stored as objects of the form
`{'__complex__': True, 'real': x.real, 'imag': x.imag}` with the
`__complex__` key being used as the identifier for detecting these
objects for deserializations.

For serialization/deserialization we use clean_value and a new function
deserialize_value. `clean_value` is pulled out of BackendNode and is
now done on the level of NodeAttributes and NodeExtras. The
deserialization is added to the get/get_many and all routines of these
classe

Added support for complex attributes in QueryBuilder
complex numbers are compared like objects. For comparisons
attribute.complex.real/imag can be used

Open Questions:
  - What are the performance implications of the added deserialization?
    (Since clean_value is already called I think there would be no impact
     from that side)
  - Jsonable class should handle complex numbers. Setting attributes
    work, but the initial check json.loads(json.dumps(...)) fails
janssenhenning added a commit to janssenhenning/aiida-core that referenced this pull request Aug 6, 2022
Alternative implementation of aiidateam#5561

These are stored as objects of the form
`{'__complex__': True, 'real': x.real, 'imag': x.imag}` with the
`__complex__` key being used as the identifier for detecting these
objects for deserializations.

For serialization/deserialization we use clean_value and a new function
deserialize_value. `clean_value` is pulled out of BackendNode and is
now done on the level of NodeAttributes and NodeExtras. The
deserialization is added to the get/get_many and all routines of these
classe

Added support for complex attributes in QueryBuilder
complex numbers are compared like objects. For comparisons
attribute.complex.real/imag can be used

Open Questions:
  - What are the performance implications of the added deserialization?
    (Since clean_value is already called I think there would be no impact
     from that side)
  - Jsonable class should handle complex numbers. Setting attributes
    work, but the initial check json.loads(json.dumps(...)) fails
@janssenhenning
Copy link
Contributor Author

@sphuber I wrote a PR with a higher level deserialization #5614. This still has problems with the abstraction, especially when it comes to the QueryBuilder but it is a step forwards

janssenhenning added a commit to janssenhenning/aiida-core that referenced this pull request Nov 21, 2022
Alternative implementation of aiidateam#5561

These are stored as objects of the form
`{'__complex__': True, 'real': x.real, 'imag': x.imag}` with the
`__complex__` key being used as the identifier for detecting these
objects for deserializations.

For serialization/deserialization we use clean_value and a new function
deserialize_value. `clean_value` is pulled out of BackendNode and is
now done on the level of NodeAttributes and NodeExtras. The
deserialization is added to the get/get_many and all routines of these
classe

Added support for complex attributes in QueryBuilder
complex numbers are compared like objects. For comparisons
attribute.complex.real/imag can be used

Open Questions:
  - What are the performance implications of the added deserialization?
    (Since clean_value is already called I think there would be no impact
     from that side)
  - Jsonable class should handle complex numbers. Setting attributes
    work, but the initial check json.loads(json.dumps(...)) fails
@sphuber
Copy link
Contributor

sphuber commented Jun 7, 2024

Thanks again for the PR @janssenhenning . Unfortunately we have decided to not go through with it. It would introduce a significant change in the basic contract that attributes should JSON-serializable. There are already a few subtle type conversions that are done under the hood, but we don't think it is wise to start introducing more as it will make it more difficult to keep consistency across storage backends. Especially since there are still open questions about automatic deserialization in both this implementation and the alternative of #5614 Also we have not had requests for native support for complex numbers since these PRs were opened, so it does not seem to be that critical. Thanks anyway for the effort and discussion!

@sphuber sphuber closed this Jun 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants