
Extended JSON encode/decode protocol #12796

Closed · 30 commits

Conversation

@nstarman (Member) commented Jan 26, 2022

Signed-off-by: Nathaniel Starkman (@nstarman) nstarkman@protonmail.com

Closes #12793

Improvement suggestions appreciated.

I now have the shape of things, but details need filling in. I like the idea of the registration system and the entry points.
The tests need work: they are repetitive and should be shortened using pytest features.
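
The registration idea could be sketched roughly as below. This is a minimal illustration only, not the PR's actual API: the name `register_json_extended` and the linear `isinstance` lookup are assumptions for demonstration.

```python
import json

# Hypothetical sketch of a registration system for an extended JSON
# encoder: encoder functions are registered per type, and the encoder's
# default() looks them up when it meets a non-JSON-native object.
_ENCODERS = {}

def register_json_extended(type_):
    """Register an encoder function for `type_` (illustrative name)."""
    def decorator(func):
        _ENCODERS[type_] = func
        return func
    return decorator

class ExtendedEncoder(json.JSONEncoder):
    def default(self, obj):
        # Linear scan keeps the sketch simple; a real implementation
        # might dispatch on type or use functools.singledispatch.
        for type_, func in _ENCODERS.items():
            if isinstance(obj, type_):
                return func(obj)
        return super().default(obj)

@register_json_extended(complex)
def encode_complex(z):
    # Tag the object with its qualified class name under "!", matching
    # the convention used in the examples below.
    return {"!": "builtins.complex", "value": [z.real, z.imag]}
```

With this in place, `json.dumps(1 + 2j, cls=ExtendedEncoder)` produces a tagged dict rather than raising `TypeError`, while ordinary JSON-native values pass through untouched.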

The helper below shows how the following examples are produced:

import json
import numpy as np
import astropy.coordinates as coord
import astropy.units as u
from astropy.cosmology import units as cu
from astropy.io.misc.json import JSONExtendedEncoder, JSONExtendedDecoder

def show(val):
    print("val:", repr(val), end="\n")
    serialized = json.dumps(val, cls=JSONExtendedEncoder)
    print("dump:", serialized, end="\n")
    with u.add_enabled_units(cu):
        out = json.loads(serialized, cls=JSONExtendedDecoder)
    print("load:", repr(out))
>>> show(np.dtype(float))
val: dtype('float64')
dump: {"!": "numpy.dtype", "value": "float64"}
load: dtype('float64')
>>> show(np.dtype(np.dtype("float", metadata={"a": 1})))
val: dtype('float64')
dump: {"!": "numpy.dtype", "value": "float64", "metadata": {"a": 1}}
load: dtype('float64')
>>> show(np.dtype("10float64", align=True))
val: dtype(('<f8', (10,)))
dump: {"!": "numpy.dtype", "value": "float64", "shape": [10]}
load: dtype(('<f8', (10,)))
>>> show(np.bool_(True))
val: True
dump: {"!": "numpy.bool_", "value": true}
load: True
>>> show(np.float128("10.0000002"))
val: 10.0000002
dump: {"!": "numpy.float128", "value": "10.0000002000"}
load: 10.0000002
>>> show(np.uintc(10))
val: 10
dump: {"!": "numpy.uint32", "value": "10"}
load: 10
>>> show(np.void(b'abcd'))
val: void(b'\x61\x62\x63\x64')
dump: {"!": "numpy.void", "value": {"!": "builtins.bytes", "value": "abcd"}, "dtype": "|V4"}
load: void(b'\x61\x62\x63\x64')
>>> show(np.array((0, 0.60), dtype=np.dtype([("nu1", float), ("nu2", np.float32)]))[()])
val: (0., 0.6)
dump: {"!": "numpy.void", "value": ["0.0", "0.6"], "dtype": {"value": {"nu1": ["float64", 0], "nu2": ["float32", 8]}, "align": false}}
load: (0., 0.6)
>>> show(np.array(np.float64(10)))
val: array(10.)
dump: {"!": "numpy.ndarray", "value": "10.0", "dtype": "float64"}
load: array(10.)
>>> show(np.array([3], dtype=float))
val: array([3.])
dump: {"!": "numpy.ndarray", "value": ["3.0"], "dtype": "float64"}
load: array([3.])
>>> show(np.array([3], dtype=np.float128))
val: array([3.], dtype=float128)
dump: {"!": "numpy.ndarray", "value": ["3.0"], "dtype": "float128"}
load: array([3.], dtype=float128)
>>> show(np.array((0, 0.6), dtype=np.dtype([("nu1", float), ("nu2", np.float32)])))
val: array((0., 0.6), dtype=[('nu1', '<f8'), ('nu2', '<f4')])
dump: {"!": "numpy.ndarray", "value": {"nu1": {"!": "numpy.ndarray", "value": "0.0", "dtype": "float64"}, "nu2": {"!": "numpy.ndarray", "value": "0.6", "dtype": "float32"}}, "dtype": {"value": {"nu1": ["float64", 0], "nu2": ["float32", 8]}, "align": false}}
load: array([(0., 0.6)], dtype=[('nu1', '<f8'), ('nu2', '<f4')])
>>> show(np.array([(0, 0.6), (1, 1.6)], dtype=np.dtype([("nu1", float), ("nu2", np.float32)])))
val: array([(0., 0.6), (1., 1.6)], dtype=[('nu1', '<f8'), ('nu2', '<f4')])
dump: {"!": "numpy.ndarray", "value": {"nu1": {"!": "numpy.ndarray", "value": ["0.0", "1.0"], "dtype": "float64", "flags": {"align": false}}, "nu2": {"!": "numpy.ndarray", "value": ["0.6", "1.6"], "dtype": "float32"}}, "dtype": {"value": {"nu1": ["float64", 0], "nu2": ["float32", 8]}, "align": false}}
load: array([(0., 0.6), (1., 1.6)], dtype=[('nu1', '<f8'), ('nu2', '<f4')])
>>> show(u.Unit("km"))
val: Unit("km")
dump: {"!": "astropy.units.PrefixUnit", "value": "km"}
load: Unit("km")
>>> show(u.Unit("(km, km, (eV^2, eV))"))
val: Unit("(km, km, (eV2, eV))")
dump: {"!": "astropy.units.StructuredUnit", "value": "(km, km, (eV2, eV))"}
load: Unit("(km, km, (eV2, eV))")
>>> show(u.km * u.eV**2)
val: Unit("eV2 km")
dump: {"!": "astropy.units.CompositeUnit", "value": "eV2 km"}
load: Unit("eV2 km")
>>> show(u.Quantity(10))
val: <Quantity 10.>
dump: {"!": "astropy.units.Quantity", "value": 10.0, "unit": "dimensionless"}
load: <Quantity 10.>
>>> show(u.Quantity((0, 0.6), dtype=np.dtype([("nu1", float), ("nu2", np.float32)]), unit=u.Unit("(eV, eV)")))
val: <Quantity (0., 0.6) (eV, eV)>
dump: {"!": "astropy.units.Quantity", "value": {"!": "numpy.void", "value": ["0.0", "0.6"], "dtype": {"value": {"nu1": ["float64", 0], "nu2": ["float32", 8]}, "align": false}}, "unit": "(eV, eV)"}
load: <Quantity (0., 0.6) (eV, eV)>
>>> show(u.Quantity(np.float128(10)))
val: <Quantity 10.>
dump: {"!": "astropy.units.Quantity", "value": {"!": "numpy.float128", "value": "10.0"}, "unit": "dimensionless"}
load: <Quantity 10.>
>>> show(u.Quantity([(0, 0.6)], dtype=np.dtype([("nu1", float), ("nu2", np.float32)]), unit=u.Unit("(eV, eV)")))
val: <Quantity [(0., 0.6)] (eV, eV)>
dump: {"!": "astropy.units.Quantity", "value": {"!": "numpy.ndarray", "value": {"nu1": {"!": "numpy.ndarray", "value": ["0.0"], "dtype": "float64"}, "nu2": {"!": "numpy.ndarray", "value": ["0.6"], "dtype": "float32"}}, "dtype": {"value": {"nu1": ["float64", 0], "nu2": ["float32", 8]}, "align": false}}, "unit": "(eV, eV)"}
load: <Quantity [(0., 0.6)] (eV, eV)>
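
The decode side of the `"!"`-tag convention seen in the dumps above could be sketched like this. Again a hypothetical minimal version, not the PR's `JSONExtendedDecoder` itself: the `_DECODERS` registry and `_hook` are illustrative names.

```python
import json

# Hypothetical decode-side sketch: the "!" key names the qualified
# class, and a registered hook rebuilds the object from the dict.
_DECODERS = {"builtins.complex": lambda d: complex(*d["value"])}

class ExtendedDecoder(json.JSONDecoder):
    def __init__(self, **kwargs):
        # object_hook is called on every decoded JSON object (dict).
        kwargs.setdefault("object_hook", self._hook)
        super().__init__(**kwargs)

    @staticmethod
    def _hook(d):
        func = _DECODERS.get(d.get("!"))
        # Untagged dicts pass through unchanged.
        return func(d) if func is not None else d
```

A dict without a recognized `"!"` tag is returned as-is, so plain JSON documents round-trip unaffected.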

Checklist for package maintainer(s)

This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see "When to rebase and squash commits".
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the Extra CI label.
  • Is a change log needed? If yes, did the change log check pass? If no, add the no-changelog-entry-needed label. If this is a manual backport, use the skip-changelog-checks label unless special changelog handling is necessary.
  • Is a milestone set? Milestone must be set but astropy-bot check might be missing; do not let the green checkmark fool you.
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate backport-X.Y.x label(s) before merge.

@nstarman nstarman added this to the v5.1 milestone Jan 26, 2022
@github-actions

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?

@nstarman nstarman force-pushed the utils-json-extended branch 20 times, most recently from ae70171 to 60c06c0 Compare January 31, 2022 17:02
@nstarman nstarman requested a review from mhvk January 31, 2022 17:18
@nstarman nstarman force-pushed the utils-json-extended branch 3 times, most recently from 15e355c to 01499f4 Compare February 10, 2022 16:31
@astropy astropy deleted a comment from pep8speaks Feb 10, 2022
@nstarman
Member Author

nstarman commented Feb 11, 2022

@mhvk, I put a number of examples in the opening comment. I think the structured ndarray is still a little too verbose, but other than that, it's looking pretty good. Do you have any comments / suggestions?
In particular, I'm looking to improve the construction of the test suite.

@pep8speaks

pep8speaks commented Feb 11, 2022

Hello @nstarman 👋! It looks like you've made some changes in your pull request, so I've checked the code again for style.

Line 335:15: E999 SyntaxError: invalid syntax

Line 24:13: E999 SyntaxError: invalid syntax

Comment last updated at 2022-02-16 02:57:43 UTC

Signed-off-by: Nathaniel Starkman (@nstarman) <nstarkman@protonmail.com>
@nstarman nstarman force-pushed the utils-json-extended branch 5 times, most recently from 00184d5 to b8cdc7b Compare February 15, 2022 21:21
@nstarman nstarman force-pushed the utils-json-extended branch 2 times, most recently from 5ff1b72 to 9480c84 Compare February 16, 2022 02:14
@mhvk (Contributor) left a comment

I only had time for a very rough look, but I like that we may actually support JSON fully. Two main comments/questions:

  1. How will this deal with larger arrays? My sense is that the binary blob used for yaml may be the best solution for anything that is not a scalar. I do worry that float128 does not seem to roundtrip exactly. I thought the numpy machinery took care of using the right number of significant digits.
  2. I worry about duplication... Would it be possible to share some of the machinery with the yaml encoders/decoders? That would also have the advantage of making the representation uniform. Similarly, there is asdf which also encodes all the astropy classes (and seems based on a combination of yaml and json).

p.s. Small question: do we care about the names of structured units?

@nstarman
Member Author

nstarman commented Feb 21, 2022

Thanks @mhvk.

  1. How will this deal with larger arrays? My sense is that the binary blob used for yaml may be the best solution for anything that is not a scalar. I do worry that float128 does not seem to roundtrip exactly. I thought the numpy machinery took care of using the right number of significant digits.

The float128 is actually round-tripping exactly; the problem is that the input was constructed poorly. I wrote np.float128(10.0000002), but Python's float inexactness meant 10.0000002 was really 10.000000200000000561. I've updated the example to np.float128("10.0000002"), and the round-trip is correct.
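
The inexactness described above can be demonstrated with the stdlib `decimal` module, which exposes the exact binary value a float literal actually stores:

```python
from decimal import Decimal

# The literal 10.0000002 is stored as the nearest binary double,
# which is close to but not exactly 10.0000002.
stored = Decimal(10.0000002)   # exact value of the Python float
exact = Decimal("10.0000002")  # the intended decimal value

# The two differ, which is why np.float128(10.0000002) inherits the
# double-precision error while np.float128("10.0000002") does not.
assert stored != exact
```

Passing the string to `np.float128` lets numpy parse it at extended precision, so the inexact double is never created.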

A binary blob might work nicely for non-scalars. I can use np.ndarray.dumps() to get a very efficient representation and then pickle.loads() to round-trip. While not as pretty as the current output, I agree it will probably scale better.
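
One way a binary-blob representation could look is sketched below. This is an assumption, not the PR's implementation, and it deliberately uses base64-encoded raw bytes plus dtype and shape rather than the `np.ndarray.dumps()`/`pickle.loads()` route mentioned above, since unpickling data from untrusted JSON would be unsafe.

```python
import base64
import json
import numpy as np

# Hypothetical binary-blob encoding for arrays (simple, non-structured
# dtypes only): raw bytes + dtype string + shape.
def encode_array(arr):
    return {"!": "numpy.ndarray",
            "value": base64.b64encode(arr.tobytes()).decode("ascii"),
            "dtype": arr.dtype.str,
            "shape": list(arr.shape)}

def decode_array(d):
    data = base64.b64decode(d["value"])
    return np.frombuffer(data, dtype=d["dtype"]).reshape(d["shape"])

arr = np.arange(6, dtype="<f8").reshape(2, 3)
roundtripped = decode_array(json.loads(json.dumps(encode_array(arr))))
assert np.array_equal(arr, roundtripped)
```

The blob grows linearly with the array size and avoids the per-element stringification seen in the structured-array dumps above, at the cost of human readability.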

  2. I worry about duplication... Would it be possible to share some of the machinery with the yaml encoders/decoders? That would also have the advantage of making the representation uniform. Similarly, there is asdf which also encodes all the astropy classes (and seems based on a combination of yaml and json).

That would be a very good idea! I'll look into it. I fear the YAML code is pretty tailored to the YAML library. Sharing machinery might require a refactor to the YAML code. But the uniformity is a compelling argument. Especially since YAML is a superset of JSON.

p.s. Small question: do we care about the names of structured units?

Yes! I will correct that. I'll need to test what happens when a structured array with dtype names encounters a structured unit with names...

@nstarman
Member Author

I fear the YAML code is pretty tailored to the YAML library. Sharing machinery might require a refactor to the YAML code. But the uniformity is a compelling argument. Especially since YAML is a superset of JSON.

Yes. The YAML code is not easily compatible with JSON. Getting them to a common code base will require a lot of work.

@nstarman
Member Author

I don't think I have the time to refactor the YAML I/O until it can work with JSON.

@nstarman nstarman closed this Mar 24, 2022
Successfully merging this pull request may close these issues.

On a decoder for JsonCustomEncoder