Skip to content

apply_ufunc: don't modify attrs on input variables #10330

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 27, 2025

Conversation

erik-mansson
Copy link
Contributor

@erik-mansson erik-mansson commented May 18, 2025

Fixing #8047 (and at least the #8480 duplicate) by copying coordinate-variables before overwriting their .attrs in structure.merge.merge_collected().

In a call to apply_ufunc(), the output coordinates and their attributes are produced in merge_collected() based on the combine_attrs argument (which is "override" or "drop" depending on the keep_attrs argument). However, a side effect of the line merged_vars[name].attrs = merge_attrs(... was that also the .attrs of the coordinate on the input arrays would be changed to the output arrays', i.e. dropped or changed to those of the first input array.

The proposed solution of creating a shallow copy of the input coordinate before assigning attributes works, according to the new tests I added in in test_ufuncs.py and test_computation.py, with a slight performance penalty (3% on an unpublished test I did, didn't install enough things to run the asv performance test suite locally).

Although the last else-clause of merge_collected() has similar code assigning merged_vars[name].attrs, my tests pass without doing a copy there. Feel free to propose test cases that will exercise that code (indexed_elements being empty?) and still cause a bug like #8047 or #8480.

I also extended the tests in test_merge.py to ensure that input attributes aren't affected by merge. These tests passed from the beginning, so they are not strictly related to the bug, but meant as precautions to catch hypothetical regressions.

Erik Månsson added 2 commits May 18, 2025 02:08
The additional tests in test_ufuncs.py and test_computation.py currently fail due to GH8047, because apply_ufunc() relies on merge_coordinates_without_align() which may overwrite .attrs on input coordinates. The additional tests in test_merge.py currently pass, because Dataset.merge() dosen't seem affected by the bug.
Calls to xarray.apply_ufunc() (which is used by for instance xarray.where()) have a call stack of
core.apply_ufunc.apply_ufunc()
core.apply_ufunc.apply_dataarray_vfunc()
core.apply_ufunc.build_output_coords_and_indexes
structure.merge.merge_coordinates_without_align()
structure.merge.merge_collected()
and in merge_collected() the .attrs of a coordinate in an original input array could be overwritten depending on combine_attrs, even if the intent was just to produce the desired attributes for the returned result.

This very simple fix always makes a copy before assigning attributes.
Copy link

welcome bot commented May 18, 2025

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing, thanks!

@dcherian dcherian changed the title Fix #8047 by setting output attrs on a copy of input coordinate apply_ufunc: don't modify attrs on input variables May 27, 2025
@dcherian dcherian enabled auto-merge (squash) May 27, 2025 17:49
@dcherian dcherian merged commit 6abb769 into pydata:main May 27, 2025
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

apply_ufunc drops passed DataArray's attrs when keep_attrs=False
2 participants