-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement literal np.timedelta64
coding
#10101
base: main
Are you sure you want to change the base?
Conversation
@spencerkclark Thanks! I'll try to look into this tomorrow. The major interaction issues are with |
xarray/coding/variables.py
Outdated
if np.issubdtype(variable.data.dtype, np.timedelta64): | ||
dims, data, attrs, encoding = unpack_for_encoding(variable) | ||
resolution, _ = np.datetime_data(variable.dtype) | ||
attrs["dtype"] = f"timedelta64[{resolution}]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about also including units
in attrs?
That would make timedelta64 encoding still specify units in the style of CF conventions, which could make us a little more compatible with non-Xarray tools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be possible, but this would need an additional check inside CFTimedeltaCoder
to prevent premature encoding and decoding if both attributes are attached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is a good idea, though does increase the complexity a little. I gave it a try in my latest push.
xarray/coding/variables.py
Outdated
# overwrite (!) dtype in encoding, and remove from attrs | ||
# needed for correct subsequent encoding | ||
encoding["dtype"] = attrs.pop("dtype") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically we use the pop_to()
helper to do this safely, e.g., dtype = pop_to(encoding, attrs, "dtype", name=name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the heads up! I somewhat blindly inherited this from the BooleanCoder
. I added some better tests around this issue.
Thanks @spencerkclark for taking a look at this! I left a couple of suggestions. Incidentally, I think everyone will be happier the sooner we can relax the nanosecond precision restriction in Xarray :). |
@spencerkclark I do not immediately see any issues with other coders with the current implementation. Taking @shoyer's suggestion into account, we would need to make sure that xarray always gives preference to |
…rk/xarray into timedelta64-encoding
for more information, see https://pre-commit.ci
Looking at the implementation, maybe we don't need the separate new coder class and could keep this all in one objects? I think the behavior could probably fit into a single coder:
|
Thanks @shoyer—indeed it's probably simpler to have this all live in The remaining awkward bit relates to interaction with masking and scaling. Since we are overloading the dtype encoding, we do not have a way of retaining the numerical dtype of the data that was written to disk during a round trip. I pushed a failing test in 503db4a to provide an example. Maybe I am just not thinking about this the proper way, however. |
For now I ended up punting on this in favor of forbidding providing any other encoding parameters when encoding timedeltas this new way. The previous encoding path supports this, and that functionality is still maintained in this PR. Another approach would be to use a different name for the attribute we assign the timedelta dtype to, say So possibly this is closer now—I added some more tests and laid the groundwork for eventually turning off automatic decoding of variables with time-like units attributes (we would set |
This PR implements @shoyer's suggested approach for "literal" coding of
np.timedelta64
values. Accordingly, it provides a pathway for roundtrippingnp.timedelta64
data without aFutureWarning
, and preserves the encoding of variables that were encoded on disk with the previous approach.I still want to reflect a little more on whether we want any more tests, but this seems functional at the moment—i.e. this example runs without a warning1:
@kmuehlbauer let me know if you have any initial thoughts, particularly with respect to possible interaction with other coders.
whats-new.rst
Footnotes
Note I needed to move away from nanosecond resolution here, since creating bytes with
to_netcdf
will attempt to castint64
data toint32
which leads to overflow. ↩