
BUG: fix yaml table serialization compatibility with numpy 2 #16416

Merged


neutrinoceros
Contributor

@neutrinoceros neutrinoceros commented May 8, 2024

Description

Combines the regression test from #16414 and cherry-picks 9cdd95d and 3be8af0 from #15065

This doesn't break any test that I could run locally against numpy 1.26, so it should be backportable, but I'm still keeping it as a draft until I see CI go green, just in case.

Fix #15792

Also resolves ~90% of the failures (340 of 380) in tests that currently rely on the legacy print options set up with

np.set_printoptions(legacy="1.25")
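A minimal sketch of why that global setting masked the bug, assuming the relevant change is NumPy 2's new scalar representation (NEP 51): repr() of a NumPy scalar now includes its type, str() does not, and legacy="1.25" restores the old repr. The helper name scalar_text_forms is hypothetical.

```python
import numpy as np

# True on NumPy 1.x, where scalar repr() never included the type name.
NUMPY_LT_2_0 = int(np.__version__.split(".")[0]) < 2


def scalar_text_forms(value):
    """Return (default repr, repr under legacy="1.25", str) of np.float64(value).

    Hypothetical helper for illustration only.
    """
    x = np.float64(value)
    default_repr = repr(x)
    if NUMPY_LT_2_0:
        # legacy="1.25" only exists on NumPy >= 2; the older repr is already bare.
        legacy_repr = default_repr
    else:
        # np.printoptions restores the previous options when the block exits.
        with np.printoptions(legacy="1.25"):
            legacy_repr = repr(x)
    return default_repr, legacy_repr, str(x)


print(scalar_text_forms(1.5))
# NumPy >= 2: ('np.float64(1.5)', '1.5', '1.5')
# NumPy 1.x:  ('1.5', '1.5', '1.5')
```

On NumPy 2 the default repr is not valid YAML float text, which is what a repr()-based float representer would emit; with the legacy option set globally, tests never see that form.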

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.


github-actions bot commented May 8, 2024

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.


github-actions bot commented May 8, 2024

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?

@neutrinoceros neutrinoceros force-pushed the io.misc/bug/yaml_serialization_numpy2 branch from 08c5a4f to 4444cf3 May 8, 2024 16:31
@neutrinoceros neutrinoceros marked this pull request as ready for review May 8, 2024 16:52
Member

@pllim pllim left a comment

Thanks!

astropy/io/misc/tests/test_yaml.py Outdated
astropy/io/misc/tests/test_yaml.py Outdated
docs/changes/io.misc/16416.bugfix.rst Outdated
@pllim pllim requested review from mhvk and taldcroft May 8, 2024 17:23
@pllim pllim added this to the v6.1.1 milestone May 8, 2024
@pllim pllim added Bug numpy-dev backport-v6.1.x on-merge: backport to v6.1.x labels May 8, 2024
@pllim
Member

pllim commented May 8, 2024

Diff LGTM but I'll let subpackage maintainer(s) review and approve. Thanks!

def represent_float(self, data):
    # Override to change repr(data) to str(data) since otherwise all the
    # numpy scalars fail in not NUMPY_LT_1_20.
    if data != data or (data == 0.0 and data == 1.0):
Member

Just to make sure I understand, why not np.isnan?

Contributor Author

I assume that's for performance reasons, though that's a question for @mhvk
(math.isnan should also be faster than np.isnan for scalars).
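For scalar inputs the self-inequality idiom and math.isnan agree, NumPy scalars included, since math.isnan converts its argument with float(). A minimal check; the function name is hypothetical.

```python
import math

import numpy as np


def is_nan_pyyaml_style(data):
    # NaN is the only float value that compares unequal to itself. The second
    # clause can only be true for a float-like whose == is inconsistent; the
    # idiom then treats that value as NaN as well.
    return bool(data != data or (data == 0.0 and data == 1.0))


samples = [0.0, -1.5, float("nan"), float("inf"),
           np.float64(np.nan), np.float32(2.5), np.float64(-np.inf)]

for value in samples:
    # math.isnan accepts anything convertible with float(), NumPy scalars included.
    assert is_nan_pyyaml_style(value) == math.isnan(value)
```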

Contributor

Hmm, I vaguely recall that I just copied the snippet from how pyyaml does it, except for changing repr(data) to str(data) (I clearly should have added a note about that...). I'm not quite sure why they did it this way; perhaps there are Python float-likes that do not support isnan?

Member

OK, no problem. If we can get things to work with math.isnan, I think that would be preferable.

Contributor Author

For the record here's the line you apparently copied https://github.com/yaml/pyyaml/blob/48838a3c768e3d1bcab44197d800145cfd0719d6/lib/yaml/representer.py#L172
It was written 15 years ago (circa 2009), while math.isnan was introduced in Python 2.6, released in October 2008, so it's possible that the only reason they didn't use it was backward compatibility, and it just never got refactored.
Since the whole function is copied from pyyaml, it seems best to keep it as is, but I'll add a note so this knowledge doesn't get lost.
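A minimal stand-alone sketch of the float formatting discussed above, modeled on pyyaml's represent_float with str() swapped in for repr(). It is not the astropy code: the function name is hypothetical, it returns text instead of a YAML node, and it compares against math.inf rather than pyyaml's own precomputed infinity.

```python
import math

import numpy as np


def float_to_yaml_scalar(data):
    """Format a float-like as YAML 1.1 float text (hypothetical helper).

    Modeled on pyyaml's Representer.represent_float, but uses str() where
    pyyaml uses repr(), so NumPy 2 scalars do not render as "np.float64(...)".
    """
    # pyyaml's NaN test, kept verbatim; for ordinary floats and NumPy scalars
    # math.isnan(data) gives the same answer.
    if data != data or (data == 0.0 and data == 1.0):
        return ".nan"
    if data == math.inf:
        return ".inf"
    if data == -math.inf:
        return "-.inf"
    value = str(data).lower()
    # str(1e17) == "1e+17" has no ".", so YAML 1.1 would not read it as a float;
    # insert ".0" before the exponent, as pyyaml does.
    if "." not in value and "e" in value:
        value = value.replace("e", ".0e", 1)
    return value


for x in [np.float64(0.1), np.float32(2.5), 1e17, np.float64(np.inf), np.float64(np.nan)]:
    print(repr(x), "->", float_to_yaml_scalar(x))
```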

Contributor Author

FYI @mhvk, I also updated your comment. It said that numpy scalars failed on "not NUMPY_LT_1_20", which I assume was a mistake and that you really meant "not NUMPY_LT_2_0".

Contributor

Thanks, @neutrinoceros, that all makes sense.

Contributor

@mhvk mhvk left a comment

Beyond the two in-line comments, shouldn't we also get the next commit, for complex? 6b7916d4e3736eaac73ca25372c341079ae38291
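The contents of that commit are not shown here. Assuming it applies the same repr() to str() swap to complex values, a sketch following the branch structure of pyyaml's represent_complex could look like this; the helper name is hypothetical and it returns text rather than a YAML node.

```python
import numpy as np


def complex_to_yaml_text(data):
    """Format a complex scalar as text for a !!python/complex node (hypothetical helper).

    Follows the branch structure of pyyaml's Representer.represent_complex, but
    formats the parts with str(); with NumPy 2, repr() of np.complex128(...).real
    is "np.float64(...)", which would corrupt the output.
    """
    real, imag = data.real, data.imag
    if imag == 0.0:
        return f"{real!s}"
    if real == 0.0:
        return f"{imag!s}j"
    if imag > 0:
        return f"{real!s}+{imag!s}j"
    # A negative imaginary part already carries its own "-" sign.
    return f"{real!s}{imag!s}j"


print(complex_to_yaml_text(np.complex128(1 - 2j)))  # 1.0-2.0j
```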



def test_serialize_mixin_column(tmp_path):
    # see https://github.com/astropy/astropy/issues/16414
Contributor

I'm confused why our regular tests didn't catch this. We test a whole slew of different SkyCoord for all serializations (see astropy/io/tests/mixin_columns.py). Was the problem specific to frame="galactic"? If so, perhaps add an extra SkyCoord in mixin_columns? (Though note that one then needs to add quite a bit of detail further down in the file...) That said, it is also fine as is: after all, it is the incantation that proved there was a problem.

Contributor Author

The problem is actually not specific to frame="galactic", so I'll drop this argument to avoid confusion.
According to its docstring, astropy/io/tests/mixin_columns.py is used in ascii/tests/test_ecsv.py, fits/tests/test_connect.py, and misc/tests/test_hdf5.py, but not in yaml tests (yet). I'll see if there are any problems with using it here too.

Contributor Author

I wrote a new test based on test_hdf5.py::test_hdf5_mixins_qtable_to_table, which also fails on main and passes with this branch. I ditched the first test; let me know if I should keep both.

@neutrinoceros neutrinoceros force-pushed the io.misc/bug/yaml_serialization_numpy2 branch 2 times, most recently from 9a7c15d to 0f855e8 May 9, 2024 07:45
@neutrinoceros
Contributor Author

> shouldn't we also get the next commit, for complex?

It doesn't hurt and is clearly correct, but I should point out that it's also not tested. I'll push it now.

@@ -707,7 +707,7 @@ def assert_objects_equal(obj1, obj2, attrs, compare_class=True):


@pytest.mark.skipif(not HAS_H5PY, reason="requires h5py")
-def test_hdf5_mixins_qtable_to_table(tmp_path):
+def code(tmp_path):
Member

Why this change?

Contributor Author

Wow, that's a syntactically valid mistake I must have made while trying to open VS Code from another editor and typing in the wrong tab. Thanks for spotting it!

@@ -316,3 +319,57 @@ def test_yaml_load_of_object_arrays_fail():
order: C
shape: !!python/tuple [3]"""
)

Member

This regression test is fine but more importantly there should be YAML-level unit tests covering all the cases in the new code. I.e. serializing a simple dict structure with all the impacted data types and confirming the output and correct round-trip.

Contributor Author

Gotcha. I'll work on it.

Contributor Author

Actually, the unit tests in question already exist; I just needed to add a pytest fixture to compensate for global settings (see #15096).
They also capture the regressions properly, so maybe we should just ditch the more complex integration test, if that's okay with you.
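The actual fixture is in the PR diff and is not reproduced here. A hypothetical sketch of the idea: temporarily switch legacy printing off, so that a session-wide np.set_printoptions(legacy="1.25") cannot hide repr-related regressions. Both names below are illustrative only.

```python
import contextlib

import numpy as np
import pytest


@contextlib.contextmanager
def default_numpy_printing():
    """Temporarily switch off any legacy print mode set elsewhere."""
    # legacy=False is accepted by NumPy 1.x and 2.x; the previous options are
    # restored when the with-block exits.
    with np.printoptions(legacy=False):
        yield


@pytest.fixture
def numpy_default_printing():
    # Hypothetical fixture: tests that request it see the current scalar repr
    # even if the test session configured a legacy print mode globally.
    with default_numpy_printing():
        yield
```

A test would opt in by naming the fixture as an argument, e.g. def test_float_roundtrip(numpy_default_printing): ...; marking it autouse=True would instead apply it to every test in the module.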

Member

Excellent about the tests already being there! And yes, I'm good with dropping the integration test.

Contributor Author

Great, I rebased and greatly simplified the branch!

}


def test_fits_yaml_mixins_qtable_to_table(tmp_path):
Member

I think this test should live in test_mixin.py instead of test_yaml.py (which should be more low-level YAML).

Contributor Author

Putting it here is consistent with how the "inspiration" test lives in test_hdf5.py. Moving that one would be out of scope here, but maybe we can move both in a follow-up PR?

@neutrinoceros neutrinoceros force-pushed the io.misc/bug/yaml_serialization_numpy2 branch 2 times, most recently from 2d8894a to 928098f May 9, 2024 12:56
@neutrinoceros neutrinoceros force-pushed the io.misc/bug/yaml_serialization_numpy2 branch from 928098f to 43a42ae May 9, 2024 14:02
Contributor

@mhvk mhvk left a comment

Ah, thanks, now I understand why the tests didn't have a problem before - it was the use of legacy printing. With this, it looks all great!

@mhvk mhvk merged commit 539115c into astropy:main May 9, 2024
27 of 28 checks passed
meeseeksmachine pushed a commit to meeseeksmachine/astropy that referenced this pull request May 9, 2024
@mhvk
Contributor

mhvk commented May 9, 2024

Sorry, @taldcroft, I merged perhaps too quickly! A problem with trying to do a few things while really not at work; should just have approved and left it. Though hopefully you were OK with the latest push too...

@neutrinoceros neutrinoceros deleted the io.misc/bug/yaml_serialization_numpy2 branch May 9, 2024 15:15
@taldcroft
Member

@mhvk @neutrinoceros - Sometimes codecov seems to give false reports, but here I don't see any coverage of the inf, nan, or 1e17 cases. Those would be good, so a follow-up PR would be a good thing.
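Each of the three cases exercises a separate branch of the float formatting: .nan, .inf/-.inf, and the ".0e" fix-up for values like 1e17. A follow-up test could check the emitted text against YAML 1.1 float syntax. A minimal sketch, with a pattern adapted from pyyaml's implicit float resolver (sexagesimal branch omitted); the helper name is hypothetical.

```python
import re

# Pattern adapted from pyyaml's implicit resolver for the YAML 1.1 !!float tag
# (the sexagesimal "1:30.0" branch is omitted). The mantissa must contain a "."
# and an exponent must carry an explicit sign.
YAML11_FLOAT = re.compile(
    r"""^(?:[-+]?[0-9][0-9_]*\.[0-9_]*(?:[eE][-+][0-9]+)?
        |\.[0-9][0-9_]*(?:[eE][-+][0-9]+)?
        |[-+]?\.(?:inf|Inf|INF)
        |\.(?:nan|NaN|NAN))$""",
    re.VERBOSE,
)


def resolves_as_float(text):
    """Return True if a plain YAML 1.1 scalar with this text would load as a float."""
    return YAML11_FLOAT.match(text) is not None


# One case per branch mentioned in the review, plus two failure modes.
cases = {
    ".nan": True,
    ".inf": True,
    "-.inf": True,
    "1.0e+17": True,           # 1e17 after the ".0e" fix-up
    "1e+17": False,            # str(1e17) before the fix-up
    "np.float64(1.5)": False,  # NumPy 2 repr() of a scalar
}
for text, expected in cases.items():
    assert resolves_as_float(text) is expected, text
```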

@neutrinoceros
Contributor Author

Added to my to-do list for tomorrow!

pllim added a commit that referenced this pull request May 9, 2024
…416-on-v6.1.x

Backport PR #16416 on branch v6.1.x (BUG: fix yaml table serialization compatibility with numpy 2)
@pllim
Member

pllim commented May 9, 2024

Thanks, all! I opened a follow-up issue at #16429

Development

Successfully merging this pull request may close these issues.

Yaml serializer fails with Numpy 2.0
5 participants