Fixes to dataset equivalence testing on xarray loads. #195

Closed
pp-mo wants to merge 3 commits into main from xr202601_identical_fix

@pp-mo pp-mo commented Feb 11, 2026

Found that a later xarray release (v2026.01.0) breaks the xarray-load "direct vs via ncdata" tests.

Apparently, the exact meaning of Dataset.identical has changed.
From experiment, the problem seems to be that Dataset.identical now also checks indexes.
Presumably this is the cause: pydata/xarray#11035

This doesn't really affect ncdata behaviour: it was always the case that, for example, printing the datasets loaded with and without ncdata showed differences (the "via-ncdata" version has extra lazy coords and missing indexes).
Only the means of checking "equivalence" needed fixing.
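
To make the "extra coords and missing indexes" difference concrete, here is a minimal sketch of a structural diff that reports which names appear on only one side. It assumes only that both objects expose the mapping-like attributes `sizes`, `data_vars`, `coords`, `xindexes` and `attrs`, as xarray Datasets do. The helper `structure_diff` and the stand-in objects (including the names `x`, `v` and `aux`) are hypothetical and not part of this PR.

```python
from types import SimpleNamespace


def structure_diff(ds_a, ds_b):
    """Report names present in only one of two Dataset-like objects.

    Returns {component: (only_in_a, only_in_b)} for each component whose
    name sets differ. Only names are compared here, never values.
    """
    # Component label -> attribute holding a mapping keyed by name.
    components = {
        "dims": "sizes",
        "data_vars": "data_vars",
        "coords": "coords",
        "indexes": "xindexes",
        "attrs": "attrs",
    }
    diff = {}
    for label, attr in components.items():
        names_a = set(getattr(ds_a, attr))
        names_b = set(getattr(ds_b, attr))
        if names_a != names_b:
            diff[label] = (names_a - names_b, names_b - names_a)
    return diff


# Stand-in objects, so the sketch runs without xarray installed.
# They mimic a direct load (index on "x") and a via-ncdata load
# (no index on "x", plus one extra coordinate).
direct = SimpleNamespace(
    sizes={"x": 3}, data_vars={"v": None}, coords={"x": None},
    xindexes={"x": None}, attrs={},
)
via_ncdata = SimpleNamespace(
    sizes={"x": 3}, data_vars={"v": None}, coords={"x": None, "aux": None},
    xindexes={}, attrs={},
)
diff = structure_diff(direct, via_ncdata)
print(diff)
```

With real datasets, the same call should show at a glance which components a comparison would need to equalise first.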


# Treat as OK if it passes xarray comparison
# Check that datasets are "equal" : but NB this only compares values
assert xr_ds.equals(xr_ncdata_ds)

This just proves that A and B were "equal" (whatever that means) before I mangled them.
Do we really need this as well?

@pp-mo pp-mo force-pushed the xr202601_identical_fix branch from db82739 to 58ffa85 February 11, 2026 16:00
xr_ncdata_ds, xr_ds = equivalence_fix_datasets(
    ds_from=xr_ncdata_ds, ds_to=xr_ds
)
assert xr_ds.identical(xr_ncdata_ds)

@pp-mo pp-mo Feb 12, 2026

So, Dataset.identical is what just changed, and xarray doesn't consider that a breaking change, because it's classed as a "FIX": pydata/xarray#11035

They don't state what "identical" actually means, but it now compares indexes.
However, it still considers lazy data "identical" to real data, and I'm still relying on that.

TBH I'd be much happier if "identical" meant "in all respects": then I could adapt/equalise the datasets in specific ways before testing with "identical".
Unfortunately, xarray is a bit vague about equality testing.
It provides "identical", "equals" and "broadcast_equals":
https://docs.xarray.dev/en/latest/api/dataarray.html#comparisons
But as noted, Dataset.equals only compares data, not metadata (an odd choice), so that really doesn't cover what I want either.

So, perhaps I should just write a custom comparison routine here, with exactly the necessary tolerance engineered in? The problem with that is, I need to be confident that I have understood what are all the possible content components of xarray Datasets -- and that, again, isn't made totally clear (it's obviously based on netCDF, but what makes a variable a coordinate is never clearly stated, indexes are an additional thing, etc.).
I don't think I can reasonably use the ncdata dataset comparison, since the point here is to compare xarray datasets.
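
For illustration only, a custom routine with an explicit tolerance might look roughly like the sketch below. It is not what this PR does. It assumes each dataset exposes a `variables` mapping whose entries have `dims`, `dtype` and `attrs`, as xarray Datasets do, and it compares metadata only, not values. The function `compare_variables`, its `ignore_attrs` parameter, the stand-in objects and the attribute name `extra_attr` are all hypothetical.

```python
from types import SimpleNamespace


def compare_variables(ds_a, ds_b, ignore_attrs=()):
    """Return a list of per-variable metadata differences (values not compared)."""
    vars_a, vars_b = ds_a.variables, ds_b.variables
    problems = []
    # Variables that exist on one side only.
    for name in sorted(set(vars_a) ^ set(vars_b), key=str):
        problems.append(f"{name}: present in only one dataset")
    # Variables on both sides: compare dims, dtype and attribute names.
    for name in sorted(set(vars_a) & set(vars_b), key=str):
        va, vb = vars_a[name], vars_b[name]
        if tuple(va.dims) != tuple(vb.dims):
            problems.append(f"{name}: dims differ")
        if va.dtype != vb.dtype:
            problems.append(f"{name}: dtype differs")
        # The engineered tolerance: names in ignore_attrs are skipped.
        keys_a = set(va.attrs) - set(ignore_attrs)
        keys_b = set(vb.attrs) - set(ignore_attrs)
        if keys_a != keys_b:
            problems.append(f"{name}: attribute names differ")
    return problems


def _var(dims, dtype, attrs):
    # Stand-in for a variable, so the sketch runs without xarray installed.
    return SimpleNamespace(dims=dims, dtype=dtype, attrs=attrs)


a = SimpleNamespace(variables={
    "v": _var(("x",), "float64", {"units": "K"}),
    "x": _var(("x",), "int64", {}),
})
b = SimpleNamespace(variables={
    "v": _var(("x",), "float64", {"units": "K", "extra_attr": 1}),
    "x": _var(("x",), "int64", {}),
})
strict = compare_variables(a, b)
tolerant = compare_variables(a, b, ignore_attrs=("extra_attr",))
print(strict, tolerant)
```

The point of returning a list of messages rather than a bool is that each tolerated difference has to be named explicitly, which is exactly the understanding of Dataset components that is hard to be sure of.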

@pp-mo pp-mo Feb 12, 2026

> what are all the possible content components of xarray Datasets -- and that, again, isn't made totally clear

OK, we do have this section: https://docs.xarray.dev/en/latest/user-guide/data-structures.html#dataset
So perhaps I was being a bit unfair. But it doesn't mention indexes.

pp-mo commented Feb 12, 2026

@chrisbunney I'd value your opinion on this before I merge it!

Since I'm about to migrate the repo to SciTools, it might make a useful test case for permissions there.

@pp-mo pp-mo closed this Feb 12, 2026
@pp-mo pp-mo deleted the xr202601_identical_fix branch February 12, 2026 15:41
