feat(solver): validate DataRequirement columns against adapter finalisation#561
feat(solver): validate DataRequirement columns against adapter finalisation#561lewisjared merged 5 commits intomainfrom
Conversation
…sation Adapters now declare which columns require finalisation via columns_requiring_finalisation. The solver validates at solve time that no DataRequirement filters or groups on these columns, raising a clear ValueError instead of silently producing zero results from NaN rows.
Codecov Report❌ Patch coverage is
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR introduces adapter-declared “columns requiring finalisation” and adds solver-time validation to prevent DataRequirement filters/grouping on columns that are unavailable prior to dataset finalisation (avoiding silent empty/incorrect results). It also fixes REF environment variable overrides not being applied on Config initialization and updates docs/tests accordingly.
Changes:
- Add
columns_requiring_finalisationto the adapter interface and implement it for CMIP6; expose it viaDataCatalog.columns_requiring_finalisation. - Validate
DataRequirement.filters/group_byagainstcolumns_requiring_finalisationin the solver, raising a clearValueError. - Fix
Configenv override application by adding the missing post-init hook; update docs + tests.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| packages/climate-ref/src/climate_ref/solver.py | Adds _validate_requirement_columns() and calls it during dataset extraction. |
| packages/climate-ref/src/climate_ref/datasets/base.py | Extends DatasetAdapter protocol with columns_requiring_finalisation (default empty). |
| packages/climate-ref/src/climate_ref/datasets/cmip6.py | Declares the CMIP6 set of columns that require file-open finalisation. |
| packages/climate-ref/src/climate_ref/data_catalog.py | Adds columns_requiring_finalisation property delegating to the adapter. |
| packages/climate-ref/src/climate_ref/config.py | Ensures env var overrides are applied on init via __attrs_post_init__. |
| packages/climate-ref/tests/unit/test_solver.py | Adds validation unit tests and updates mocks for new catalog attribute. |
| packages/climate-ref/tests/unit/test_data_catalog.py | Adds tests for DataCatalog.columns_requiring_finalisation. |
| packages/climate-ref/tests/unit/test_config.py | Adds coverage for REF_CMIP6_PARSER env override. |
| packages/climate-ref/tests/integration/test_lazy_solver_parity.py | Derives NaN/restore column set from adapter rather than hardcoding. |
| docs/gen_config_stubs.py | Formats environment variable names as inline code in generated docs. |
| docs/configuration.md | Minor formatting cleanup. |
| changelog/561.fix.md | Changelog entry for the Config env override fix. |
| changelog/561.feature.md | Changelog entry for solver validation feature. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
* origin/main: chore: clean up chore: add fix changelog entry for PR #561 chore: add changelog entry for PR #561 chore: Support env variables for parser feat(solver): validate DataRequirement columns against adapter finalisation test: improve test coverage test: add an integration test comparing the lazy loading to complete runs chore: fix regression output test: Fix failing test fix: invalidate catalog cache after finalisation and handle pre-parsed datetimes refactor(cmip6): simplify finalise_datasets fix(tests): remove unused config variable from test_db fixture and related tests docs: use British English spelling conventions docs: update ingestion documentation for lazy finalization docs: add changelog for lazy dataset ingestion test: add tests for lazy finalization and fix datetime conversion bug feat: add DataCatalog wrapper and integrate lazy finalization into solver feat: add FinaliseableDatasetAdapterMixin and implement on CMIP6 adapter feat: infer frequency from table_id in DRS parser refactor: remove unused config param from register_dataset # Conflicts: # packages/climate-ref/tests/unit/test_solver.py
* origin/main: (80 commits) chore: Update comment chore: upgrade pins for ilamb fix: revert compat=override on open_mfdataset docs: add changelog for #565 chore: Upgrade lockfile and fix some errors chore: add coverage chore: add default separator in alembic fix: time_coder warning chore: Pin to use tas fix(solver): preserve DataCatalog wrapper in apply_dataset_filters fix(tests): use to_frame() when accessing DataCatalog in solver tests docs: Changelog chore: run the finalise in threads chore: clean up chore: add fix changelog entry for PR #561 feat(cli): add --dataset-filter option to datasets list command chore: add changelog entry for PR #561 feat(solver): add --dataset-filter option to filter input datasets when solving chore: Support env variables for parser feat(solver): add optional --limit flag to solve command ...
* origin/main: (86 commits) Bump version: 0.11.0 → 0.11.1 docs: add changelog for #567 fix: reduce memory during ingestion and add solve logging fix: prevent DRS re-ingestion from regressing finalised datasets Bump version: 0.10.0 → 0.11.0 chore: Update comment chore: upgrade pins for ilamb fix: revert compat=override on open_mfdataset docs: add changelog for #565 chore: Upgrade lockfile and fix some errors chore: add coverage chore: add default separator in alembic fix: time_coder warning chore: Pin to use tas fix(solver): preserve DataCatalog wrapper in apply_dataset_filters fix(tests): use to_frame() when accessing DataCatalog in solver tests docs: Changelog chore: run the finalise in threads chore: clean up chore: add fix changelog entry for PR #561 ...
Summary
columns_requiring_finalisationclass attribute onDatasetAdapter(default: empty frozenset)CMIP6DatasetAdapteroverrides this with the 20 columns that require opening netCDF files (realm, units, branch times, etc.)DataRequirementfilters or groups on these columns, raising a clearValueErrorinstead of silently producing zero results from NaN rowsTest plan
DataCatalog.columns_requiring_finalisationpropertytest_lazy_solver_parity) passes with derived columns