fix(core): make dataset hash deterministic across pandas versions by lewisjared · Pull Request #741 · Climate-REF/climate-ref

lewisjared · 2026-06-18T11:02:34Z

Description

DatasetCollection / ExecutionDatasetCollection hashed dataset identifiers via pandas.util.hash_pandas_object, whose output varies across pandas releases and platforms. The same committed catalog.yaml therefore hashed to different values in different environments, which let a regression baseline minted on one machine fail the CI coupling gate on another, and could make the solver re-run executions unnecessarily.

This replaces the pandas-based hash with a hashlib SHA1 digest over the sorted slug values, and combines the per-source digests in a fixed, source-type-keyed order, so the result is stable across pandas versions and platforms and independent of row/insertion order.
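A minimal sketch of the approach described above, not the code from this PR. The function names, the newline and colon separators, and the use of a plain dict keyed by source-type string are assumptions for illustration; only the use of `hashlib` SHA1, the sorted slugs, and the fixed source-type-keyed ordering come from the description.

```python
import hashlib


def hash_slugs(slugs):
    """Return a SHA1 hex digest over the sorted slug values.

    Sorting first makes the digest independent of row/insertion order.
    (Hypothetical helper; the separator byte is an assumption.)
    """
    digest = hashlib.sha1()
    for slug in sorted(slugs):
        digest.update(slug.encode("utf-8"))
        # Separator so that ("ab", "c") and ("a", "bc") hash differently.
        digest.update(b"\n")
    return digest.hexdigest()


def hash_collection(slugs_by_source_type):
    """Combine per-source digests in a fixed order, keyed on source type.

    Iterating over sorted keys fixes the order; mixing the source type into
    the digest keeps identical slugs under different types from colliding.
    (Hypothetical helper; the real collection classes are not shown here.)
    """
    combined = hashlib.sha1()
    for source_type in sorted(slugs_by_source_type):
        per_source = hash_slugs(slugs_by_source_type[source_type])
        combined.update(source_type.encode("utf-8"))
        combined.update(b":")
        combined.update(per_source.encode("ascii"))
        combined.update(b"\n")
    return combined.hexdigest()
```

Because the digest depends only on the UTF-8 bytes of the slugs and source-type names, it does not change with the pandas version or platform.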

Because the hash is also the solver's dataset_hash execution-dedup key (solver.py), existing databases will re-run each execution once after upgrading; results are unaffected. The committed example catalog_hash values and the dataset-hash regression snapshots have been regenerated to match.

Checklist

Please confirm that this pull request has done the following:

Tests added
Documentation added (where applicable)
Changelog item added to changelog/

DatasetCollection and ExecutionDatasetCollection hashed dataset identifiers via pandas.util.hash_pandas_object, whose output varies across pandas releases and platforms. A regression baseline minted in one environment then failed the CI coupling gate in another, and the solver could re-run executions needlessly.

Hash the sorted slug values with hashlib instead, combining the per-source digests in a fixed order (and keying on source type to avoid cross-type collisions). Regenerate the committed example catalog_hash values and the dataset-hash regression snapshots to match.

Existing databases will re-run each execution once after upgrading because the dataset_hash changes; results are unaffected.

codecov · 2026-06-18T11:04:59Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

| Flag | Coverage Δ | |
|---|---|---|
| core | `92.56% <100.00%> (+<0.01%)` | ⬆️ |
| providers | `91.82% <ø> (+0.02%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ | |
|---|---|---|
| .../climate-ref-core/src/climate_ref_core/datasets.py | `90.12% <100.00%> (+0.37%)` | ⬆️ |

... and 3 files with indirect coverage changes


… merge PR #741 (merged into main) changed datasets.hash to a pandas-version-independent algorithm. Recompute the catalog_hash for the gpp-fluxnet2015, lai-avh15c1 and mrsos-wangmao cmip6 baselines with the new algorithm so each catalog.yaml _metadata.hash matches its manifest.json catalog_hash and the recomputed value, keeping the coupling gate green now that the algorithm has changed.
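The coupling condition in this note (catalog.yaml `_metadata.hash`, manifest.json `catalog_hash`, and the recomputed value must all agree) can be sketched as below. This is an illustrative check under assumptions, not the project's actual gate: the function name and message strings are hypothetical, and the three hashes are passed in as already-loaded strings rather than read from files.

```python
def coupling_errors(catalog_metadata_hash, manifest_catalog_hash, recomputed_hash):
    """Return a list of mismatches between the stored and recomputed hashes.

    An empty list means the baseline is consistent. (Hypothetical helper;
    the real gate and its file loading are not shown in this PR page.)
    """
    errors = []
    if catalog_metadata_hash != recomputed_hash:
        errors.append("catalog.yaml _metadata.hash does not match the recomputed hash")
    if manifest_catalog_hash != recomputed_hash:
        errors.append("manifest.json catalog_hash does not match the recomputed hash")
    return errors
```

After a change of hash algorithm, both stored values go stale at once, which is why the baselines have to be regenerated rather than edited individually.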

chore: label change as breaking

82b1744

lewisjared merged commit a272d4a into main Jun 18, 2026
25 checks passed

lewisjared deleted the fix/deterministic-dataset-hash branch June 18, 2026 11:25

lewisjared merged 2 commits into main from fix/deterministic-dataset-hash
