Implement literal `np.timedelta64` coding #10101

spencerkclark · 2025-03-06T13:39:37Z

This PR implements @shoyer's suggested approach for "literal" coding of np.timedelta64 values. Accordingly, it provides a pathway for roundtripping np.timedelta64 data without a FutureWarning, and preserves the encoding of variables that were encoded on disk with the previous approach.

I still want to reflect a little more on whether we want any more tests, but this seems functional at the moment—i.e. this example runs without a warning¹:

>>> import xarray; import numpy as np
>>> deltas = np.array([1, 2, 3], dtype='timedelta64[D]').astype('timedelta64[s]')
>>> ds = xarray.Dataset({'lead_time': deltas})
>>> xarray.open_dataset(ds.to_netcdf())
<xarray.Dataset> Size: 24B
Dimensions:    (lead_time: 3)
Coordinates:
  * lead_time  (lead_time) timedelta64[s] 24B 1 days 2 days 3 days
Data variables:
    *empty*

@kmuehlbauer let me know if you have any initial thoughts, particularly with respect to possible interaction with other coders.

Closes Timedelta64 data cannot be round-tripped to netCDF files without a warning #10099
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst

¹ Note that I needed to move away from nanosecond resolution here, since creating bytes with to_netcdf will attempt to cast int64 data to int32, which leads to overflow.
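The overflow mentioned in the footnote can be checked with NumPy alone. The sketch below is illustrative only: it compares the integer payload of the example values at second and nanosecond resolution against the int32 range, and does not call to_netcdf itself.

```python
import numpy as np

deltas = np.array([1, 2, 3], dtype="timedelta64[D]")

# Integer payloads of the same durations at two resolutions.
as_seconds = deltas.astype("timedelta64[s]").astype("int64")
as_nanoseconds = deltas.astype("timedelta64[ns]").astype("int64")

int32_max = np.iinfo(np.int32).max

# Seconds fit in int32; nanoseconds (8.64e13 per day) do not.
fits_in_seconds = bool((as_seconds <= int32_max).all())
fits_in_nanoseconds = bool((as_nanoseconds <= int32_max).all())

print(fits_in_seconds, fits_in_nanoseconds)
```

One day is 86,400,000,000,000 ns, far above the int32 maximum of 2,147,483,647, which is why the example uses second resolution.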

kmuehlbauer · 2025-03-06T14:38:40Z

@spencerkclark Thanks! I'll try to look into this tomorrow. The major interaction issues are with CFMaskCoder/CFScaleOffsetCoder and CFDatetimeCoder. So I do not expect too many issues with other coders here, but I'll check anyway.

shoyer · 2025-03-07T01:14:33Z

xarray/coding/variables.py

+        if np.issubdtype(variable.data.dtype, np.timedelta64):
+            dims, data, attrs, encoding = unpack_for_encoding(variable)
+            resolution, _ = np.datetime_data(variable.dtype)
+            attrs["dtype"] = f"timedelta64[{resolution}]"


What about also including units in attrs?

That would make timedelta64 encoding still specify units in the style of CF conventions, which could make us a little more compatible with non-Xarray tools.

kmuehlbauer · 2025-03-07

Should be possible, but this would need an additional check inside CFTimedeltaCoder to prevent premature encoding and decoding if both attributes are attached.

spencerkclark · 2025-03-08

I agree this is a good idea, though it does increase the complexity a little. I gave it a try in my latest push.
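As a rough illustration of the suggestion in this thread, the sketch below writes both a literal dtype attribute and a CF-style units attribute when encoding. It is not the PR's implementation: the function name `encode_timedelta_literal` and the `_UNIT_NAMES` mapping are hypothetical, and only `np.datetime_data` and the `timedelta64[{resolution}]` spelling come from the diff above.

```python
import numpy as np

# Hypothetical mapping from NumPy unit codes to CF-style unit names.
_UNIT_NAMES = {
    "D": "days",
    "h": "hours",
    "m": "minutes",
    "s": "seconds",
    "ms": "milliseconds",
    "us": "microseconds",
    "ns": "nanoseconds",
}


def encode_timedelta_literal(data):
    """Return (integer values, attrs) for a timedelta64 array."""
    resolution, _ = np.datetime_data(data.dtype)
    attrs = {
        # Literal dtype, so the exact resolution can be restored on decode.
        "dtype": f"timedelta64[{resolution}]",
        # CF-style units, for compatibility with non-xarray tools.
        "units": _UNIT_NAMES[resolution],
    }
    return data.astype("int64"), attrs


values, attrs = encode_timedelta_literal(
    np.array([1, 2, 3], dtype="timedelta64[s]")
)
print(values, attrs)
```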

shoyer · 2025-03-07T01:17:13Z

xarray/coding/variables.py

+            # overwrite (!) dtype in encoding, and remove from attrs
+            # needed for correct subsequent encoding
+            encoding["dtype"] = attrs.pop("dtype")


Typically we use the pop_to() helper to do this safely, e.g., dtype = pop_to(encoding, attrs, "dtype", name=name)

spencerkclark · 2025-03-08

Thanks for the heads up! I somewhat blindly inherited this from the BooleanCoder. I added some better tests around this issue.
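To show why a helper is safer than a bare `attrs.pop`, here is a simplified stand-in for the pattern. It is written for illustration and is not copied from xarray, so the error message and details are assumptions: the key is moved from one mapping to the other, and an existing value in the destination is never silently overwritten.

```python
def pop_to(source, dest, key, name=None):
    """Simplified stand-in: move source[key] into dest, refusing to overwrite."""
    value = source.pop(key, None)
    if value is not None:
        if key in dest:
            where = f" on variable {name!r}" if name else ""
            raise ValueError(
                f"refusing to overwrite existing key {key!r}{where}"
            )
        dest[key] = value
    return value


attrs = {"dtype": "timedelta64[s]", "units": "seconds"}
encoding = {}

# Decoding direction: the on-disk attribute moves into the encoding.
moved = pop_to(attrs, encoding, "dtype", name="lead_time")
print(moved, attrs, encoding)
```

With the plain assignment from the diff, a dtype already present in encoding would be replaced without notice. Here the same situation raises a ValueError.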

shoyer · 2025-03-07T01:18:44Z

Thanks @spencerkclark for taking a look at this! I left a couple of suggestions.

Incidentally, I think everyone will be happier the sooner we can relax the nanosecond precision restriction in Xarray :).

kmuehlbauer · 2025-03-07T12:48:04Z

> @kmuehlbauer let me know if you have any initial thoughts, particularly with respect to possible interaction with other coders.

@spencerkclark I do not immediately see any issues with other coders with the current implementation. Taking @shoyer's suggestion into account, we would need to make sure that xarray always gives preference to the dtype attribute over units (if both are given).

shoyer · 2025-03-08T22:41:45Z

Looking at the implementation, maybe we don't need the separate new coder class and could keep this all in one object?

I think the behavior could probably fit into a single coder:

Encoding time units (to disk):
- Convert data: timedelta64 -> integer
- Write both units and dtype attributes

Decoding time units (from disk):
- Convert data: integer -> timedelta64
- If dtype != 'timedelta64', issue a future warning.
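The decoding half of this outline could look roughly like the sketch below. It is a sketch under stated assumptions, not the PR's code: `decode_timedelta` and `_NUMPY_UNITS` are hypothetical names, the warning text is invented, and the fallback decodes to the resolution named by units, which may differ from xarray's legacy behavior. The dtype attribute takes preference when both attributes are present.

```python
import warnings

import numpy as np

# Hypothetical mapping from CF-style unit names to NumPy unit codes.
_NUMPY_UNITS = {
    "days": "D",
    "hours": "h",
    "minutes": "m",
    "seconds": "s",
    "milliseconds": "ms",
    "microseconds": "us",
    "nanoseconds": "ns",
}


def decode_timedelta(values, attrs):
    """Decode integers to timedelta64, preferring the dtype attribute."""
    ints = np.asarray(values, dtype="int64")
    dtype = str(attrs.get("dtype", ""))
    if dtype.startswith("timedelta64"):
        # New pathway: the literal dtype attribute wins over units.
        return ints.astype(dtype)
    # Old pathway: only a time-like units attribute is available.
    warnings.warn(
        "decoding timedeltas from 'units' alone may change in the future",
        FutureWarning,
        stacklevel=2,
    )
    return ints.astype(f"timedelta64[{_NUMPY_UNITS[attrs['units']]}]")


decoded = decode_timedelta(
    [1, 2, 3], {"dtype": "timedelta64[s]", "units": "seconds"}
)
print(decoded.dtype)
```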

spencerkclark · 2025-03-09T00:24:35Z

Thanks @shoyer—indeed it's probably simpler to have this all live in CFTimedeltaCoder, and we can re-use some existing code there. I went ahead and made that change.

The remaining awkward bit relates to interaction with masking and scaling. Since we are overloading the dtype encoding, we do not have a way of retaining the numerical dtype of the data that was written to disk during a round trip. I pushed a failing test in 503db4a to provide an example. Maybe I am just not thinking about this the proper way, however.

spencerkclark · 2025-03-22T19:28:27Z

> The remaining awkward bit relates to interaction with masking and scaling. Since we are overloading the dtype encoding, we do not have a way of retaining the numerical dtype of the data that was written to disk during a round trip. I pushed a failing test in 503db4a to provide an example. Maybe I am just not thinking about this the proper way, however.

For now I ended up punting on this in favor of forbidding providing any other encoding parameters when encoding timedeltas this new way. The previous encoding path supports this, and that functionality is still maintained in this PR.

Another approach would be to use a different name for the attribute we assign the timedelta dtype to, say "xarray_dtype" instead of "dtype". That strays from how things are handled for boolean values, though, and I am not sure we are losing much by not supporting these other encoding parameters ("_FillValue", "missing_value", "add_offset", and "scale_factor") in this code path. I'm open to discussion, however.
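The restriction described above amounts to a guard along these lines. This is a minimal sketch, assuming the literal pathway is identified by a timedelta64 dtype in the encoding. The function name and error message are hypothetical, and only the four parameter names come from the comment.

```python
# Encoding parameters named in the comment as unsupported on the new pathway.
CONFLICTING_KEYS = ("_FillValue", "missing_value", "add_offset", "scale_factor")


def check_literal_timedelta_encoding(encoding, name=None):
    """Raise if mask/scale parameters accompany a literal timedelta64 dtype."""
    dtype = str(encoding.get("dtype", ""))
    if not dtype.startswith("timedelta64"):
        return  # Not the literal pathway; the previous pathway still applies.
    conflicts = sorted(key for key in CONFLICTING_KEYS if key in encoding)
    if conflicts:
        where = f" for variable {name!r}" if name else ""
        raise ValueError(
            f"cannot combine dtype {dtype!r} with {conflicts}{where}"
        )


# Accepted: literal dtype with no other encoding parameters.
check_literal_timedelta_encoding({"dtype": "timedelta64[s]"})
```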

So possibly this is closer now—I added some more tests and laid the groundwork for eventually turning off automatic decoding of variables with time-like units attributes (we would set CFTimedeltaCoder.decode_via_units to False by default instead of True). Ultimately though this is a delicate PR that should be carefully discussed / reviewed.

Proof of concept literal timedelta64 coding

063437b

Ensure test_roundtrip_timedelta_data test uses old encoding pathway

03f2988

spencerkclark added 2 commits March 6, 2025 19:45

Remove no longer relevant test

bdb53d7

Merge branch 'main' into timedelta64-encoding

05c3ce6

spencerkclark and others added 11 commits March 8, 2025 09:47

Include units attribute

00d9eaa

Move coder to times.py

b043b45

Merge branch 'main' into timedelta64-encoding

6f4e6e4

Add what's new entry

7f73753

Merge branch 'timedelta64-encoding' of https://github.com/spencerkclark/xarray into timedelta64-encoding

4a8e111

Restore test and reduce diff

9ce2a24

Fix typing

eb6e19a

[pre-commit.ci] auto fixes from pre-commit.com hooks

436e588

for more information, see https://pre-commit.ci

Fix doctests

a305238

Restore original order of encoders

b406c64

Add return types to tests

a21b137

spencerkclark added 3 commits March 8, 2025 18:40

Move everything to CFTimedeltaCoder; reuse code where possible

5108b02

Fix mypy

452968c

Use Kai's offset and scale_factor logic for all encoding

503db4a

spencerkclark added 5 commits March 21, 2025 20:46

Merge branch 'main' into timedelta64-encoding

9aee097

Fix bad merge

56f55e2

Forbid mixing other encoding with literal timedelta64 encoding

c5e7de9

Expose fine-grained control over decoding pathways

d1744af

Rename test

7c7b071

spencerkclark added 7 commits March 22, 2025 12:47

Use consistent dtype spelling

da1edc4

Continue supporting non-timedelta dtype-only encoding

2bb4b99

Fix example attribute in docstring

0220ed5

Update what's new

c83fcb3

Fix typo

d1e8a5e

Complete test

7b94d35

Fix docstring

f269e68
