Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dependencies: remove simplejson #5391

Merged
merged 2 commits into from
Mar 9, 2022

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Feb 24, 2022

Fixes #5374

This library was used as a drop-in replacement for the json module of
the standard library to provide consistency of JSON (de)serializing
between Python 2 and Python 3. Since we have long since dropped support
for Python 2, we can now simply use the standard library.

The module aiida.common.json provided an interface to simplejson and
was used throughout aiida-core instead of json. This module is now
deprecated and aiida-core just uses json directly.

There are two significant changes that needed to be taken into account:

  • simplejson provided automatic serialization for decimal.Decimal
    but json does not. The support for serializing these types to the
    database is maintained by performing the serialization manually in
    the aiida.orm.implementation.utils.clean_value method. All instances
    of decimal.Decimal are serialized as numbers.Real, e.g. floats,
    so the behavior should remain the same.

  • The aiida.common.json wrapper functions dump and load accepted
    file objects in text and binary mode. However, the json analogues
    only support text files. The wrapper functions therefore had to be
    adapted to decode and encode, respectively, the contents of the file
    handle before passing it to the json function.

  • The code that calls json.dump passing a handle from a temporary file
    generated with tempfile.NamedTemporaryFile had to be updated to
    wrap the handle in codecs.getwriter('utf-8') since the default mode
    for NamedTemporaryFile is w+b and json.dump requires a text
    file handle.

Finally, the aiida.common.json module is deprecated and will emit a
deprecation warning when it is imported.

@chrisjsewell
Copy link
Member

Note, you can see the behaviour of simplejson at: https://github.com/simplejson/simplejson/blob/02221b19672b1b35188080435c7360cd2d6af6fb/simplejson/encoder.py#L109

I checked (in Python 3.8) that the built-in json library still works for:

  • numpy.nan
  • numpy.inf
  • tuples (-> list)
  • namedtuples (-> list)

@@ -9,13 +9,13 @@
###########################################################################
# pylint: disable=invalid-name
"""Utilities for removing legacy workflows."""
import json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked the migration tests still work? (given these tests do not run in PRs any more)

Lower in this file, it passes a 'wb' handle to json.dump, which I assume will fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, but I think even the tests wouldn't catch this because I think that path of the code is uncovered. Will have a look later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should catch it, I added a test specifically for it in #5330

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test specifically for it in

Fantastic, can confirm that it failed, well spotted! I have fixed it and migration tests now run fine. Also removed some remaining uses of aiida.common.json in the tests. Should be good to go now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cheers 👍 will have a look in a bit

@sphuber
Copy link
Contributor Author

sphuber commented Feb 25, 2022

Also pinging @csadorf since this would drop a dependency

@giovannipizzi
Copy link
Member

Sorry to drop in only now. Just wanted to mention that, historically, IIRC this was actually introduced not for py2-py3 but because it was faster.
Can you do a quick check if serialising and deserialising a big json with the stdlib and with simplejson really has a significant performance improvement that merits the complexity? I would say if it's 10% faster it's still better to drop it and keep it simple (and it's possible now the stdlib is very fast).

Note that the docs mention It is pure Python code with no dependencies, but includes an optional C extension for a serious speed boost. so one would need to do a comparison with those

@csadorf
Copy link
Contributor

csadorf commented Mar 4, 2022

Sorry to drop in only now. Just wanted to mention that, historically, IIRC this was actually introduced not for py2-py3 but because it was faster. Can you do a quick check if serialising and deserialising a big json with the stdlib and with simplejson really has a significant performance improvement that merits the complexity? I would say if it's 10% faster it's still better to drop it and keep it simple (and it's possible now the stdlib is very fast).

Note that the docs mention It is pure Python code with no dependencies, but includes an optional C extension for a serious speed boost. so one would need to do a comparison with those

One of my colleagues ran some benchmark tests a while back to test exactly this. If anything simplejson might be slower, however, not sure whether he tested the C-extension.

@sphuber
Copy link
Contributor Author

sphuber commented Mar 4, 2022

I think we should test against the version that most people would install by simply installing with pip or conda. If that is without the C-extension, it is only fair we use the benchmark without it as well for comparison. Do you know what gets installed by default @csadorf ? I have no experience with C-extensions

@csadorf
Copy link
Contributor

csadorf commented Mar 4, 2022

I think we should test against the version that most people would install by simply installing with pip or conda. If that is without the C-extension, it is only fair we use the benchmark without it as well for comparison. Do you know what gets installed by default @csadorf ? I have no experience with C-extensions

I'm pretty sure that they are installed/compiled by default if the environment has the compilers. However, this will only work for CPython of course. I think optional in this case means that simplejson will still work even if the compilation fails.

This library was used as a drop-in replacement for the `json` module of
the standard library to provide consistency of JSON (de)serializing
between Python 2 and Python 3. Since we have long since dropped support
for Python 2, we can now simply use the standard library.

The module `aiida.common.json` provided an interface to `simplejson` and
was used throughout `aiida-core` instead of `json`. This module is now
deprecated and `aiida-core` just uses `json` directly.

There are two significant changes that needed to be taken into account:

 * `simplejson` provided automatic serialization for `decimal.Decimal`
   but `json` does not. The support for serializing these types to the
   database is maintained by performing the serialization manually in
   the `aiida.orm.implementation.utils.clean_value` method. All instances
   of `decimal.Decimal` are serialized as `numbers.Real`, e.g. floats,
   so the behavior should remain the same.

 * The `aiida.common.json` wrapper functions `dump` and `load` accepted
   file objects in text and binary mode. However, the `json` analogues
   only support text files. The wrapper functions therefore had to be
   adapted to decode and encode, respectively, the contents of the file
   handle before passing it to the `json` function.

 * The code that calls `json.dump` passing a handle from a temporary file
   generated with `tempfile.NamedTemporaryFile` had to be updated to
   wrap the handle in `codecs.getwriter('utf-8')` since the default mode
   for `NamedTemporaryFile` is `w+b` and `json.dump` requires a text
   file handle.

Finally, the `aiida.common.json` module is deprecated and will emit a
deprecation warning when it is imported.
@chrisjsewell
Copy link
Member

chrisjsewell commented Mar 8, 2022

Do you know what gets installed by default

There are wheels available (i.e. pre-compiled) for most architectures: https://pypi.org/project/simplejson/#files, this is what pip will install

@chrisjsewell
Copy link
Member

actually, there is another difference:

In [1]: import collections, simplejson, json
In [2]: TupleClass = collections.namedtuple("TupleClass", ("a", "b"))
In [3]: value = TupleClass(1, 2)
In [4]: json.dumps(value)
Out[4]: '[1, 2]'
In [5]: simplejson.dumps(value)
Out[5]: '{"a": 1, "b": 2}'

Don't think this would break anything we do though, and the json behaviour seems perhaps better 🤷

Copy link
Member

@chrisjsewell chrisjsewell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more dependencies dropped, the better 😄

@sphuber sphuber merged commit c50a61a into aiidateam:develop Mar 9, 2022
@sphuber sphuber deleted the fix/5374/drop-simplejson branch March 9, 2022 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace use of simplejson with built-in json module
4 participants