Skip to content

Comments

feat(solver): validate DataRequirement columns against adapter finalisation#561

Merged
lewisjared merged 5 commits intomainfrom
check-column-finalisation
Feb 23, 2026
Merged

feat(solver): validate DataRequirement columns against adapter finalisation#561
lewisjared merged 5 commits intomainfrom
check-column-finalisation

Conversation

@lewisjared
Copy link
Contributor

@lewisjared lewisjared commented Feb 23, 2026

Summary

  • Adapters now declare which columns require finalisation via a columns_requiring_finalisation class attribute on DatasetAdapter (default: empty frozenset)
  • CMIP6DatasetAdapter overrides this with the 20 columns that require opening netCDF files (realm, units, branch times, etc.)
  • The solver validates at solve time that no DataRequirement filters or groups on these columns, raising a clear ValueError instead of silently producing zero results from NaN rows
  • Integration test now derives from the adapter instead of hardcoding the column set

Test plan

  • Unit tests for validation (filter, group_by, skip for raw DataFrame, skip for empty set)
  • Unit tests for DataCatalog.columns_requiring_finalisation property
  • Integration test (test_lazy_solver_parity) passes with derived columns

…sation

Adapters now declare which columns require finalisation via
columns_requiring_finalisation. The solver validates at solve time that
no DataRequirement filters or groups on these columns, raising a clear
ValueError instead of silently producing zero results from NaN rows.
@codecov
Copy link

codecov bot commented Feb 23, 2026

Codecov Report

❌ Patch coverage is 95.83333% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
packages/climate-ref/src/climate_ref/solver.py 92.30% 0 Missing and 1 partial ⚠️
Flag Coverage Δ
core 93.19% <95.83%> (+0.01%) ⬆️
providers 89.65% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
packages/climate-ref/src/climate_ref/config.py 97.96% <100.00%> (+0.03%) ⬆️
...ckages/climate-ref/src/climate_ref/data_catalog.py 89.74% <100.00%> (+1.50%) ⬆️
...kages/climate-ref/src/climate_ref/datasets/base.py 98.43% <100.00%> (+0.02%) ⬆️
...ages/climate-ref/src/climate_ref/datasets/cmip6.py 95.57% <100.00%> (+0.03%) ⬆️
packages/climate-ref/src/climate_ref/solver.py 99.02% <92.30%> (-0.46%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces adapter-declared “columns requiring finalisation” and adds solver-time validation to prevent DataRequirement filters/grouping on columns that are unavailable prior to dataset finalisation (avoiding silent empty/incorrect results). It also fixes REF environment variable overrides not being applied on Config initialization and updates docs/tests accordingly.

Changes:

  • Add columns_requiring_finalisation to the adapter interface and implement it for CMIP6; expose it via DataCatalog.columns_requiring_finalisation.
  • Validate DataRequirement.filters / group_by against columns_requiring_finalisation in the solver, raising a clear ValueError.
  • Fix Config env override application by adding the missing post-init hook; update docs + tests.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
packages/climate-ref/src/climate_ref/solver.py Adds _validate_requirement_columns() and calls it during dataset extraction.
packages/climate-ref/src/climate_ref/datasets/base.py Extends DatasetAdapter protocol with columns_requiring_finalisation (default empty).
packages/climate-ref/src/climate_ref/datasets/cmip6.py Declares the CMIP6 set of columns that require file-open finalisation.
packages/climate-ref/src/climate_ref/data_catalog.py Adds columns_requiring_finalisation property delegating to the adapter.
packages/climate-ref/src/climate_ref/config.py Ensures env var overrides are applied on init via __attrs_post_init__.
packages/climate-ref/tests/unit/test_solver.py Adds validation unit tests and updates mocks for new catalog attribute.
packages/climate-ref/tests/unit/test_data_catalog.py Adds tests for DataCatalog.columns_requiring_finalisation.
packages/climate-ref/tests/unit/test_config.py Adds coverage for REF_CMIP6_PARSER env override.
packages/climate-ref/tests/integration/test_lazy_solver_parity.py Derives NaN/restore column set from adapter rather than hardcoding.
docs/gen_config_stubs.py Formats environment variable names as inline code in generated docs.
docs/configuration.md Minor formatting cleanup.
changelog/561.fix.md Changelog entry for the Config env override fix.
changelog/561.feature.md Changelog entry for solver validation feature.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@lewisjared lewisjared merged commit 5a2eff0 into main Feb 23, 2026
30 checks passed
@lewisjared lewisjared deleted the check-column-finalisation branch February 23, 2026 03:54
lewisjared added a commit that referenced this pull request Feb 23, 2026
* origin/main:
  chore: clean up
  chore: add fix changelog entry for PR #561
  chore: add changelog entry for PR #561
  chore: Support env variables for parser
  feat(solver): validate DataRequirement columns against adapter finalisation
  test: improve test coverage
  test: add an integration test comparing the lazy loading to complete runs
  chore: fix regression output
  test: Fix failing test
  fix: invalidate catalog cache after finalisation and handle pre-parsed datetimes
  refactor(cmip6): simplify finalise_datasets
  fix(tests): remove unused config variable from test_db fixture and related tests
  docs: use British English spelling conventions
  docs: update ingestion documentation for lazy finalization
  docs: add changelog for lazy dataset ingestion
  test: add tests for lazy finalization and fix datetime conversion bug
  feat: add DataCatalog wrapper and integrate lazy finalization into solver
  feat: add FinaliseableDatasetAdapterMixin and implement on CMIP6 adapter
  feat: infer frequency from table_id in DRS parser
  refactor: remove unused config param from register_dataset

# Conflicts:
#	packages/climate-ref/tests/unit/test_solver.py
lewisjared added a commit that referenced this pull request Feb 24, 2026
* origin/main: (80 commits)
  chore: Update comment
  chore: upgrade pins for ilamb
  fix: revert compat=override on open_mfdataset
  docs: add changelog for #565
  chore: Upgrade lockfile and fix some errors
  chore: add coverage
  chore: add default separator in alembic
  fix: time_coder warning
  chore: Pin to use tas
  fix(solver): preserve DataCatalog wrapper in apply_dataset_filters
  fix(tests): use to_frame() when accessing DataCatalog in solver tests
  docs: Changelog
  chore: run the finalise in threads
  chore: clean up
  chore: add fix changelog entry for PR #561
  feat(cli): add --dataset-filter option to datasets list command
  chore: add changelog entry for PR #561
  feat(solver): add --dataset-filter option to filter input datasets when solving
  chore: Support env variables for parser
  feat(solver): add optional --limit flag to solve command
  ...
lewisjared added a commit that referenced this pull request Feb 25, 2026
* origin/main: (86 commits)
  Bump version: 0.11.0 → 0.11.1
  docs: add changelog for #567
  fix: reduce memory during ingestion and add solve logging
  fix: prevent DRS re-ingestion from regressing finalised datasets
  Bump version: 0.10.0 → 0.11.0
  chore: Update comment
  chore: upgrade pins for ilamb
  fix: revert compat=override on open_mfdataset
  docs: add changelog for #565
  chore: Upgrade lockfile and fix some errors
  chore: add coverage
  chore: add default separator in alembic
  fix: time_coder warning
  chore: Pin to use tas
  fix(solver): preserve DataCatalog wrapper in apply_dataset_filters
  fix(tests): use to_frame() when accessing DataCatalog in solver tests
  docs: Changelog
  chore: run the finalise in threads
  chore: clean up
  chore: add fix changelog entry for PR #561
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant