Fixes to dataset equivalence testing on xarray loads. by pp-mo · Pull Request #195 · SciTools/ncdata

pp-mo · 2026-02-11T13:50:04Z

Found that a recent xarray release (v2026.01.0) was breaking the xarray load "direct vs via ncdata" tests.

Apparently, the exact meaning of Dataset.identical has changed.
From experiment, the problem seems to be that Dataset.identical now checks indexes.
Presumably this is due to pydata/xarray#11035.

This doesn't really affect ncdata behaviour: it was always the case that, e.g., printing the datasets loaded with and without ncdata showed differences (the "via-ncdata" version has extra lazy coords and missing indexes).
Only the means of checking the "equivalence" needed fixing.
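
A minimal sketch of how the index difference can be inspected, assuming a small hand-built dataset rather than the real test files. `Dataset.drop_indexes` is used here only to imitate a dataset with missing indexes; it does not reproduce what loading via ncdata actually does. The result of `identical` on the pair is printed but not asserted, since it depends on the installed xarray version.

```python
import numpy as np
import xarray as xr

# Hand-built stand-in for a "directly loaded" dataset: "x" is a dimension
# coordinate, so xarray creates a default index for it.
ds_direct = xr.Dataset(
    data_vars={"temp": ("x", np.array([1.0, 2.0, 3.0]))},
    coords={"x": [10, 20, 30]},
)

# Imitate a dataset with missing indexes (assumption: this only mimics the
# symptom, not the ncdata load path itself).
ds_no_index = ds_direct.drop_indexes("x")

index_names_direct = sorted(ds_direct.xindexes)
index_names_no_index = sorted(ds_no_index.xindexes)
print(index_names_direct)    # ['x']
print(index_names_no_index)  # []

# The coordinate values themselves are unchanged by dropping the index.
same_values = bool(
    np.array_equal(ds_direct["x"].values, ds_no_index["x"].values)
)
print(same_values)  # True

# Version-dependent result: newer xarray is reported to compare indexes here.
print(ds_direct.identical(ds_no_index))
```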

pp-mo · 2026-02-11T14:00:39Z

tests/integration/test_xarray_load_and_save_equivalence.py


-    # Treat as OK if it passes xarray comparison
+    # Check that datasets are "equal" : but NB this only compares values
+    assert xr_ds.equals(xr_ncdata_ds)


This just proves that A and B were "equal" (whatever that means) before I mangled them.
Do we really need this as well?

pp-mo · 2026-02-12T11:39:52Z

tests/integration/test_xarray_load_and_save_equivalence.py

+    xr_ncdata_ds, xr_ds = equivalence_fix_datasets(
+        ds_from=xr_ncdata_ds, ds_to=xr_ds
+    )
    assert xr_ds.identical(xr_ncdata_ds)


So, Dataset.identical is what just changed, and xarray don't consider that a breaking change, because it's classed as a "FIX": pydata/xarray#11035

They don't state what "identical" actually means, but it is now comparing indexes.
However, it still considers lazy data as "identical" to real data, and I'm still relying on that.
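
As an illustration only, one way to equalise indexes before an `identical` check might look like the sketch below. The helper name `drop_unmatched_indexes` is hypothetical and this is not the PR's `equivalence_fix_datasets`, whose implementation is not shown here; it simply drops any index that the other dataset does not have.

```python
import xarray as xr


def drop_unmatched_indexes(ds_from, ds_to):
    """Hypothetical helper: drop indexes that only one of the datasets has."""
    shared = set(ds_from.xindexes) & set(ds_to.xindexes)
    extra_from = [name for name in ds_from.xindexes if name not in shared]
    extra_to = [name for name in ds_to.xindexes if name not in shared]
    # drop_indexes returns a new Dataset; the inputs are left untouched.
    if extra_from:
        ds_from = ds_from.drop_indexes(extra_from)
    if extra_to:
        ds_to = ds_to.drop_indexes(extra_to)
    return ds_from, ds_to


# Toy data: one dataset with a default index on "x", one without.
ds_full = xr.Dataset({"temp": ("x", [1.0, 2.0])}, coords={"x": [0, 1]})
ds_bare = ds_full.drop_indexes("x")

fixed_bare, fixed_full = drop_unmatched_indexes(ds_from=ds_bare, ds_to=ds_full)
print(sorted(fixed_bare.xindexes), sorted(fixed_full.xindexes))  # [] []
```

This sketch ignores multi-coordinate indexes, where dropping a single coordinate's index may raise an error.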

TBH I'd be much happier if "identical" meant "in all respects": then I could adapt/equalise the datasets in specific ways before testing with "identical".
Unfortunately, xarray is a bit vague about equality testing.
They provide "identical", "equals" and "broadcast_equals":
https://docs.xarray.dev/en/latest/api/dataarray.html#comparisons
But as noted, Dataset.equals only compares data, not metadata (an odd choice), so that really doesn't cover what I want either.
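
A small sketch of the `equals` versus `identical` distinction, using a toy dataset (the variable and attribute names are made up for illustration). Two datasets that differ only in a global attribute pass `equals` but fail `identical`.

```python
import numpy as np
import xarray as xr

ds_a = xr.Dataset(
    {"temp": ("x", np.array([1.0, 2.0, 3.0]))},
    attrs={"title": "original"},
)
# Same values, different global attribute; assign_attrs returns a new object.
ds_b = ds_a.assign_attrs(title="modified")

values_match = ds_a.equals(ds_b)     # attributes are not compared
fully_match = ds_a.identical(ds_b)   # attributes are compared
print(values_match, fully_match)     # True False
```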

So, perhaps I should just write a custom comparison routine here, engineered with exactly the tolerance needed? The problem with that is that I need to be confident that I understand all the possible content components of xarray Datasets, and that, again, isn't made totally clear (it's obviously based on netCDF, but what makes a variable a coordinate is never clearly stated, indexes are an additional thing, etc.).
I don't think I can reasonably use the ncdata dataset comparison, since the point here is to compare xarray datasets.
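
For illustration, a custom comparison along these lines might be sketched as follows. The function name `dataset_differences` and the choice of components (global attributes, coordinate names, index names, variable dims, values and attributes) are assumptions, not a complete account of what a Dataset contains. NaN values compare as unequal in this simple version.

```python
import numpy as np
import xarray as xr


def _attrs_equal(attrs_a, attrs_b):
    """Compare attribute dicts, tolerating array-valued attributes."""
    if set(attrs_a) != set(attrs_b):
        return False
    return all(np.array_equal(attrs_a[k], attrs_b[k]) for k in attrs_a)


def dataset_differences(ds_a, ds_b, check_indexes=True):
    """Hypothetical routine: list the ways in which two Datasets differ."""
    diffs = []
    if not _attrs_equal(ds_a.attrs, ds_b.attrs):
        diffs.append("global attributes differ")
    if set(ds_a.coords) != set(ds_b.coords):
        diffs.append("coordinate names differ")
    if check_indexes and set(ds_a.xindexes) != set(ds_b.xindexes):
        diffs.append("index names differ")
    names_a, names_b = set(ds_a.variables), set(ds_b.variables)
    for name in sorted(names_a ^ names_b, key=str):
        diffs.append(f"variable {name!r} is only in one dataset")
    for name in sorted(names_a & names_b, key=str):
        var_a, var_b = ds_a.variables[name], ds_b.variables[name]
        if var_a.dims != var_b.dims:
            diffs.append(f"{name!r}: dims differ")
        elif not np.array_equal(var_a.values, var_b.values):
            # .values loads lazy data, so lazy and real arrays compare alike.
            diffs.append(f"{name!r}: values differ")
        if not _attrs_equal(var_a.attrs, var_b.attrs):
            diffs.append(f"{name!r}: attributes differ")
    return diffs


ds_1 = xr.Dataset(
    {"temp": ("x", np.array([1.0, 2.0, 3.0]))},
    coords={"x": [10, 20, 30]},
    attrs={"title": "demo"},
)
ds_2 = ds_1.drop_indexes("x").assign_attrs(title="other")

all_diffs = dataset_differences(ds_1, ds_2)
diffs_ignoring_indexes = dataset_differences(ds_1, ds_2, check_indexes=False)
print(all_diffs)               # ['global attributes differ', 'index names differ']
print(diffs_ignoring_indexes)  # ['global attributes differ']
```

The `check_indexes` switch is the kind of explicit tolerance discussed above: the test decides which components must match instead of relying on what `identical` happens to check in a given xarray version.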

> what are all the possible content components of xarray Datasets -- and that, again, isn't made totally clear

OK, we do have this section: https://docs.xarray.dev/en/latest/user-guide/data-structures.html#dataset
So perhaps I was being a bit unfair. But it doesn't mention indexes.

pp-mo · 2026-02-12T11:55:43Z

@chrisbunney I'd value your opinion of this before I merge it!

Since I'm about to migrate the repo to SciTools, it might make a useful test case for permissions there.

Fixes to dataset equivalence testing on xarray loads.

e3b4660

Added towncrier fragment.

58ffa85

pp-mo force-pushed the xr202601_identical_fix branch from db82739 to 58ffa85, February 11, 2026 16:00

[pre-commit.ci] auto fixes from pre-commit.com hooks

2a3b2e8

for more information, see https://pre-commit.ci

scitools-ci bot added this to 🚴 Peloton Feb 12, 2026

pp-mo closed this Feb 12, 2026

pp-mo deleted the xr202601_identical_fix branch February 12, 2026 15:41

github-project-automation bot moved this to Done in 🚴 Peloton Feb 12, 2026

pp-mo mentioned this pull request Feb 12, 2026

Xr202601 identical fix, take 2 #197

Merged

pp-mo wants to merge 3 commits into main from xr202601_identical_fix