feat(esmvaltool): add CMIP7 support to all ESMValTool diagnostics#519

Merged
lewisjared merged 13 commits into main from esmval-requirements
Feb 16, 2026

@lewisjared
Contributor

@lewisjared lewisjared commented Feb 9, 2026

Summary

  • Add CMIP7 as an alternative data source for all 16 ESMValTool diagnostics in data_requirements
  • Add CMIP7 to recipe infrastructure for CMIP7 replacing the deprecated drs attribute
  • Add get_cmip_source_type() helper so update_recipe methods dynamically select CMIP6 or CMIP7 input files
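The selection done by the helper in the last bullet can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: the enum is a stand-in with only the members needed here, and the assumption that the inputs arrive as a mapping keyed by source type is mine.

```python
from enum import Enum


class SourceDatasetType(Enum):
    """Stand-in enum for illustration; not the real climate_ref_core class."""

    CMIP6 = "cmip6"
    CMIP7 = "cmip7"
    obs4MIPs = "obs4mips"


def get_cmip_source_type(input_files: dict) -> SourceDatasetType:
    """Return whichever CMIP source type is present in the inputs.

    Hypothetical logic: with OR-ed requirements only one of the two
    should be present, so the order of the checks rarely matters.
    """
    if SourceDatasetType.CMIP7 in input_files:
        return SourceDatasetType.CMIP7
    if SourceDatasetType.CMIP6 in input_files:
        return SourceDatasetType.CMIP6
    raise ValueError("No CMIP6 or CMIP7 input files found")
```

An `update_recipe` method could then look up `input_files[get_cmip_source_type(input_files)]` instead of hardcoding the CMIP6 key.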

Per-diagnostic pattern

Each CMIP7 DataRequirement mirrors its CMIP6 counterpart with:

  • frequency filter instead of table_id
  • variant_label instead of member_id in group_by
  • SourceDatasetType.CMIP7 for AddSupplementaryDataset
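The mirroring described above can be sketched with a simplified stand-in for a data requirement. The `Requirement` dataclass and `mirror_to_cmip7` function are hypothetical and exist only to show the facet swaps; the real `DataRequirement` API is not reproduced here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Requirement:
    """Simplified stand-in for a DataRequirement (illustration only)."""

    source_type: str
    filters: dict
    group_by: tuple


def mirror_to_cmip7(cmip6: Requirement, frequency: str) -> Requirement:
    """Derive a CMIP7 requirement from a CMIP6 one using the swaps listed above."""
    # table_id is replaced by a frequency filter
    filters = {k: v for k, v in cmip6.filters.items() if k != "table_id"}
    filters["frequency"] = frequency
    # member_id is replaced by variant_label in group_by
    group_by = tuple("variant_label" if f == "member_id" else f for f in cmip6.group_by)
    return replace(cmip6, source_type="cmip7", filters=filters, group_by=group_by)


CMIP6_EXAMPLE = Requirement(
    source_type="cmip6",
    filters={"variable_id": "tas", "experiment_id": "historical", "table_id": "Amon"},
    group_by=("source_id", "member_id", "grid_label"),
)
CMIP7_EXAMPLE = mirror_to_cmip7(CMIP6_EXAMPLE, frequency="mon")

# The two alternatives are OR-ed: a diagnostic runs if either one is satisfied.
data_requirements = (CMIP6_EXAMPLE, CMIP7_EXAMPLE)
```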

@codecov

codecov bot commented Feb 9, 2026

Codecov Report

❌ Patch coverage is 71.21212% with 76 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ef-esmvaltool/src/climate_ref_esmvaltool/recipe.py | 50.00% | 16 Missing and 6 partials ⚠️ |
| ...esmvaltool/diagnostics/climate_drivers_for_fire.py | 48.00% | 12 Missing and 1 partial ⚠️ |
| ..._ref_esmvaltool/diagnostics/sea_ice_sensitivity.py | 50.00% | 10 Missing and 1 partial ⚠️ |
| ...tool/src/climate_ref_esmvaltool/diagnostics/ecs.py | 50.00% | 8 Missing and 1 partial ⚠️ |
| ...e_ref_esmvaltool/diagnostics/cloud_scatterplots.py | 74.07% | 6 Missing and 1 partial ⚠️ |
| ...limate-ref-core/src/climate_ref_core/esgf/cmip7.py | 85.71% | 3 Missing and 1 partial ⚠️ |
| ...ool/src/climate_ref_esmvaltool/diagnostics/base.py | 77.77% | 2 Missing and 2 partials ⚠️ |
| ...climate-ref-core/src/climate_ref_core/providers.py | 60.00% | 1 Missing and 1 partial ⚠️ |
| ...ol/diagnostics/climate_at_global_warming_levels.py | 80.00% | 1 Missing and 1 partial ⚠️ |
| ...ages/climate-ref/src/climate_ref/models/dataset.py | 77.77% | 2 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
|---|---|
| ...te-ref-core/src/climate_ref_core/cmip6_to_cmip7.py | 92.85% <100.00%> (ø) |
| ...imate-ref-core/src/climate_ref_core/constraints.py | 96.61% <ø> (ø) |
| ...-esmvaltool/src/climate_ref_esmvaltool/__init__.py | 100.00% <100.00%> (ø) |
| ..._esmvaltool/diagnostics/cloud_radiative_effects.py | 100.00% <100.00%> (ø) |
| ...ool/src/climate_ref_esmvaltool/diagnostics/enso.py | 100.00% <100.00%> (ø) |
| .../src/climate_ref_esmvaltool/diagnostics/example.py | 100.00% <100.00%> (ø) |
| ...valtool/diagnostics/regional_historical_changes.py | 95.53% <100.00%> (ø) |
| ...e_ref_esmvaltool/diagnostics/sea_ice_area_basic.py | 100.00% <100.00%> (ø) |
| ...tool/src/climate_ref_esmvaltool/diagnostics/tcr.py | 100.00% <100.00%> (ø) |
| ...ool/src/climate_ref_esmvaltool/diagnostics/tcre.py | 100.00% <100.00%> (ø) |

... and 13 more

Contributor

Copilot AI left a comment

Pull request overview

This PR extends the climate-ref-esmvaltool provider to support running all ESMValTool diagnostics against CMIP7 datasets (alongside existing CMIP6/obs4MIPs support), mainly by adding CMIP7 facet mappings, CMIP7-aware recipe generation, and OR-ed data_requirements.

Changes:

  • Add CMIP7 facet mapping + ESMValTool config entries (drs/rootpath) and CMIP7 path preparation.
  • Update diagnostics to accept CMIP6 or CMIP7 via OR-ed data_requirements, and select the active CMIP source dynamically in update_recipe.
  • Add/adjust CMIP7 test cases for regional diagnostics and update the base config test expectation.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| packages/climate-ref-esmvaltool/tests/unit/diagnostics/test_base.py | Update expected rootpath entries after adding CMIP7. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/recipe.py | Add CMIP7 facet mapping; tweak prepare_climate_data handling. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/base.py | Add get_cmip_source_type() helper; add CMIP7 to ESMValTool config and selector extraction. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/zec.py | Add CMIP7 alternative requirement + dynamic recipe update source. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/tcre.py | Add CMIP7 alternative requirement + dynamic recipe update source. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/tcr.py | Add CMIP7 alternative requirement + dynamic recipe update source. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/sea_ice_sensitivity.py | Add CMIP7 alternative requirement + dynamic recipe update source. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/sea_ice_area_basic.py | Add CMIP7 alternative requirement + dynamic recipe update source. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/regional_historical_changes.py | Add CMIP7 data requirement alternatives + CMIP7 TestCases. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/example.py | Add CMIP7 alternative requirement + dynamic recipe update source. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/enso.py | Add CMIP7 alternative requirements + dynamic recipe update source. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/ecs.py | Add CMIP7 alternative requirement + dynamic recipe update source. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/cloud_scatterplots.py | Generalize CMIP requirements (CMIP6/CMIP7) + make plot title project-aware. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/cloud_radiative_effects.py | Add CMIP7 alternative requirement + dynamic recipe update source. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/climate_drivers_for_fire.py | Add CMIP7 alternative requirement + dynamic recipe update source. |
| packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/climate_at_global_warming_levels.py | Add CMIP7 alternative requirement + CMIP7-specific matching/grouping facets. |

@lewisjared
Contributor Author

@bouweandela I was able to run both the cmip6/cmip7 test cases for the historical annual cycle locally.

Let me know if there are any obvious mistakes before I start fleshing out more of the test cases.

@bouweandela
Contributor

bouweandela commented Feb 9, 2026

Looks like a great start @lewisjared!

Regarding the CMIP7 data requirements:

  • the CMIP6 table_id consisted of both the realm and the frequency, so you may want to put entries for realm in the data requirements too
  • the ESMValTool diagnostics have been developed for global data, so I doubt that anything other than region: glb will work

I noticed some failing tests because facets are different per project (e.g. obs4MIPs doesn't have ensemble members) but that is not reflected in the current implementation, see https://github.com/Climate-REF/climate-ref/actions/runs/21719510099/job/62644893377#logs

@bouweandela
Contributor

We may also want to check with the diagnostic authors which branding_suffix should be used for the variables in the data requirements. All of them? Only one? Different per variable?

@lewisjared
Contributor Author

> the CMIP6 table_id consisted of both the realm and the frequency, so you may want to put entries for realm in the data requirements too

According to the global attributes doc "Note that "realm" may be assigned multiple realms, separated by a single space, with the first one listed considered primary.".
I think we currently compare exactly. I'll make an issue and follow up on this.

Do we have any information about how the external_variables files will be named? I can't find anything on the CMIP7 guidance site.

@bouweandela
Contributor

> According to the global attributes doc "Note that "realm" may be assigned multiple realms, separated by a single space, with the first one listed considered primary.".

I used the CMIP6 table_id instead of just frequency in the data requirements for the ESMValTool diagnostics because I noticed that some of the diagnostics failed to run if the realm was different from expected. ESMValTool recipes will expect the value of mip, which I believe translates to realm in CMIP7 and table_id in CMIP6, to have the value for table_id (minus the "Table " bit) used at the top of the CMOR tables. For CMIP7, the table_id in the CMOR tables appears to be the same as realm: https://github.com/WCRP-CMIP/cmip7-cmor-tables/blob/6737d39d5424ad20550ad117f28512cf69fa2901/tables/CMIP7_aerosol.json#L15, while for CMIP6 it was usually a composite of the first (few) letter(s) of the realm followed by the frequency https://github.com/PCMDI/cmip6-cmor-tables/blob/087fe45d21c082e28723e0f930e4266abe91b853/Tables/CMIP6_Amon.json#L5.
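To illustrate the relationship described above, here is a small lookup from a few CMIP6 table_id values to the realm and frequency they typically combine. The dictionary and function names are hypothetical, the table list is deliberately not exhaustive, and in CMIP6 the realm is strictly a per-variable attribute, so treat the realm column as the usual case rather than a rule.

```python
# Typical (realm, frequency) pairs for a few monthly CMIP6 tables.
# Illustrative subset only; not a complete or authoritative mapping.
CMIP6_TABLE_TO_REALM_FREQUENCY = {
    "Amon": ("atmos", "mon"),
    "Omon": ("ocean", "mon"),
    "Lmon": ("land", "mon"),
    "SImon": ("seaIce", "mon"),
}


def split_table_id(table_id: str) -> tuple:
    """Return the (realm, frequency) pair for a known CMIP6 table_id."""
    try:
        return CMIP6_TABLE_TO_REALM_FREQUENCY[table_id]
    except KeyError:
        raise KeyError(f"No realm/frequency mapping known for table_id {table_id!r}") from None
```

For example, a CMIP6 filter on `table_id="SImon"` would correspond to CMIP7 filters on `realm="seaIce"` and `frequency="mon"`.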

> Do we have any information about how the external_variables files will be named? I can't find anything on the CMIP7 guidance site.

They will follow the same naming scheme as other variables. Here is an example of the areacella variable in the CMOR tables: https://github.com/WCRP-CMIP/cmip7-cmor-tables/blob/6737d39d5424ad20550ad117f28512cf69fa2901/tables/CMIP7_atmos.json#L54
and with the CMIP6 as CMIP7 CMORizer example notebook that results in something like MIP-DRS7/CMIP7/CMIP/PCMDI/PCMDI-test-1-0/historical/r1i1p1f3/glb/fx/areacella/ti-u-hxy-u/gn/v20260109/areacella_ti-u-hxy-u_fx_glb_gn_PCMDI-test-1-0_historical_r1i1p1f3.nc
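The example path above can be reproduced from its facets. The sketch below infers the directory and filename order purely from that one example; the facet names (`drs_specs`, `region`, and so on) and both functions are my assumptions, not taken from the CMORizer notebook. The example is a fixed field (`fx`), so no time range appears in the filename.

```python
from pathlib import PurePosixPath

# Facets read off the example path above.
EXAMPLE_FACETS = {
    "drs_specs": "MIP-DRS7",
    "mip_era": "CMIP7",
    "activity_id": "CMIP",
    "institution_id": "PCMDI",
    "source_id": "PCMDI-test-1-0",
    "experiment_id": "historical",
    "variant_label": "r1i1p1f3",
    "region": "glb",
    "frequency": "fx",
    "variable_id": "areacella",
    "branding_suffix": "ti-u-hxy-u",
    "grid_label": "gn",
    "version": "v20260109",
}

DIRECTORY_ORDER = (
    "drs_specs", "mip_era", "activity_id", "institution_id", "source_id",
    "experiment_id", "variant_label", "region", "frequency", "variable_id",
    "branding_suffix", "grid_label", "version",
)
FILENAME_ORDER = (
    "variable_id", "branding_suffix", "frequency", "region", "grid_label",
    "source_id", "experiment_id", "variant_label",
)


def cmip7_filename(facets: dict) -> str:
    """Join the filename facets with underscores (fixed fields only)."""
    return "_".join(facets[name] for name in FILENAME_ORDER) + ".nc"


def cmip7_path(facets: dict) -> PurePosixPath:
    """Build the directory from the facets and append the filename."""
    directory = PurePosixPath(*(facets[name] for name in DIRECTORY_ORDER))
    return directory / cmip7_filename(facets)
```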

Copilot AI review requested due to automatic review settings February 12, 2026 14:41
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 1 comment.


… handling

- Add CMIP7 data requirements to all ESMValTool diagnostics
- Add CMIP7 recipe handling with branded variable support
- Add DReq mappings and region/branded variable metadata
- Add branded_variable_name as derived column in CMIP7 data catalog
- Filter CMIP7-only facets from CMIP6 ESGF searches
- Add CMIP7 test data specifications to all diagnostics
- Use CMIP7-style filenames for converted datasets
- Install dev ESMValCore for CMIP7 support
- Unify DReq enrichment and clean up CMIP7 data requirements
…so-characteristics

The hardcoded TROPFLUX dataset entry was missing the "mip": "Omon"
facet that the upstream ESMValTool recipe requires, causing a frequency
mapping error during CMIP7 test case execution.

When the CMIP7 conversion cache filenames change (e.g. adding time
ranges), stale symlinks from previous test runs remain in the
climate_data directory. ESMValCore's glob matches these dangling
symlinks and fails with FileNotFoundError.

Now removes dangling symlinks from each target directory before
creating new ones, preventing stale references from confusing
ESMValCore's file discovery.
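
The cleanup described in this commit message can be sketched as follows. This is a minimal illustration rather than the code from the commit, and the function name is hypothetical.

```python
from pathlib import Path


def remove_dangling_symlinks(directory: Path) -> list:
    """Delete symlinks in `directory` whose targets no longer exist.

    Returns the paths that were removed.
    """
    removed = []
    for entry in directory.iterdir():
        # Path.exists() follows symlinks, so it is False for a broken link,
        # while is_symlink() inspects the link itself.
        if entry.is_symlink() and not entry.exists():
            entry.unlink()
            removed.append(entry)
    return removed
```

Calling this on each target directory before creating the new links leaves valid links and regular files untouched.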
…atasets

rewrite_mip_for_cmip7 only checked top-level and diagnostic-level
datasets, missing recipes that place CMIP7 datasets at the variable
level (climate-drivers-for-fire, ecs, sea-ice-sensitivity). Refactored
detection into _iter_recipe_datasets and mip rewriting into
_rewrite_variable_mip for clarity. The rewrite also preserves the
original CMIP6 mip on non-CMIP7 additional_datasets (e.g. OBS).
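
A rough picture of the three levels this commit message refers to, assuming the usual ESMValTool recipe layout (`datasets` at the top, `additional_datasets` under a diagnostic or under a variable). The function bodies are my sketch, not the refactored `_iter_recipe_datasets`.

```python
from collections.abc import Iterator


def iter_recipe_datasets(recipe: dict) -> Iterator:
    """Yield dataset entries from the top, diagnostic, and variable levels."""
    yield from recipe.get("datasets") or []
    for diagnostic in (recipe.get("diagnostics") or {}).values():
        yield from diagnostic.get("additional_datasets") or []
        for variable in (diagnostic.get("variables") or {}).values():
            # A variable entry may be empty (None) in a recipe.
            yield from (variable or {}).get("additional_datasets") or []


def uses_cmip7(recipe: dict) -> bool:
    """True if any dataset at any level declares project CMIP7."""
    return any(d.get("project") == "CMIP7" for d in iter_recipe_datasets(recipe))
```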
…7 conversion

CMIP6 activity_id can contain multiple activities separated by spaces
(e.g. "C4MIP CDRMIP") which breaks directory paths and glob patterns.
Use only the first activity for CMIP7 DRS paths and metadata.

Also add version and sanitized activity_id as global attributes on
converted CMIP7 NetCDF files so that parse_cmip7_file can construct
correct instance_ids with version directories.
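
Taking only the first activity is a one-line operation; the helper below is a hypothetical sketch of it, with a guard for empty values that the commit message does not mention.

```python
def first_activity(activity_id: str) -> str:
    """Return the first space-separated activity, e.g. for use in a DRS path."""
    parts = activity_id.split()
    if not parts:
        raise ValueError("activity_id is empty")
    return parts[0]
```

With the example from the commit message, `first_activity("C4MIP CDRMIP")` gives `"C4MIP"`, which contains no spaces and is safe to use as a directory name.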
…le CMIP7 per-variable datasets

- Add missing realm facet to all CMIP7 FacetFilter definitions
  (atmos, ocean, seaIce, land as appropriate)
- Change test_data_spec source_ids where ESGF data was unavailable:
  ZEC: CanESM5 -> ACCESS-ESM1-5, TCRE: CanESM5 -> MPI-ESM1-2-LR,
  cloud-scatterplots-cli-ta: CanESM5 -> CESM2
- Add table_id filter to cloud-scatterplots-cli-ta CMIP7Request to
  avoid ESGF returning ta from wrong table (AERmonZ instead of Amon)
- Fix cloud_scatterplots update_recipe to remove diagnostic-level
  additional_datasets when using CMIP7 per-variable datasets
…cessor

ESMValCore changed annual_statistics to drop the year coordinate by
default (keep_group_coordinates=False). Several diagnostic scripts
still rely on it, so patch recipes at runtime to set
keep_group_coordinates=True on every annual_statistics step.
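
The runtime patch might look roughly like this, assuming the recipe has already been loaded into a dict with the standard `preprocessors` section. The function name is hypothetical and the sketch mutates the recipe in place.

```python
def keep_year_coordinate(recipe: dict) -> dict:
    """Set keep_group_coordinates=True on every annual_statistics step."""
    for steps in (recipe.get("preprocessors") or {}).values():
        if not steps or "annual_statistics" not in steps:
            continue
        # The step may be declared without any settings (None in YAML).
        settings = steps["annual_statistics"] or {}
        settings["keep_group_coordinates"] = True
        steps["annual_statistics"] = settings
    return recipe
```
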
Use max() instead of next() when globbing the executions directory
so the most recent timestamped recipe run is always selected.
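
The difference between the two calls matters because glob order is not guaranteed. A minimal sketch, assuming run directories end in a sortable timestamp so that the lexicographically largest name is the newest; the function name and the default pattern are hypothetical.

```python
from pathlib import Path


def latest_run_dir(executions: Path, pattern: str = "recipe_*") -> Path:
    """Return the most recent timestamped run directory under `executions`."""
    candidates = list(executions.glob(pattern))
    if not candidates:
        raise FileNotFoundError(f"No directories matching {pattern!r} in {executions}")
    # next(executions.glob(pattern)) would return an arbitrary match;
    # max() on the name picks the latest timestamp.
    return max(candidates, key=lambda path: path.name)
```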
@lewisjared
Contributor Author

lewisjared commented Feb 16, 2026

CMIP7 ESMValTool Diagnostic Status

Tested all 18 ESMValTool diagnostics against CMIP7 data. 9 pass, 8 fail, 1 disabled.

Passing (9)

  • cloud-scatterplots-clwvi-pr
  • enso-basic-climatology
  • global-mean-timeseries
  • regional-historical-annual-cycle
  • regional-historical-timeseries
  • regional-historical-trend
  • transient-climate-response
  • transient-climate-response-emissions
  • zero-emission-commitment

Failing — blocked on ESMValCore upstream issues (8)

CMIP7 CMOR tables missing (4 diagnostics):
Variables like lwcre, swcre, and rtnt can't be resolved when project='CMIP7' because the mip value doesn't map to a loadable CMOR table. ESMValGroup/ESMValCore#2980.

  • climate-drivers-for-fire (tasmax / atmos)
  • cloud-radiative-effects (lwcre / atmos)
  • cloud-scatterplots-clivi-lwcre (lwcre / atmos)
  • cloud-scatterplots-clt-swcre (swcre / atmos)
  • equilibrium-climate-sensitivity (rtnt / atmos)

The other set of issues which are yet to be resolved is due to the fixes not being applied for these CMIP7 datasets. @bouweandela Is there a gold standard source that we could use that doesn't require fixes?

CoordinateMultiDimError on curvilinear ocean grids (3 diagnostics):
area_statistics fails on multi-dimensional coordinates (e.g., CanESM5 tripolar ocean grid gn) because try_adding_calculated_cell_area runs before using the provided areacello.

  • enso-characteristics (CanESM5 tos)
  • sea-ice-area-basic (CanESM5 siconc)
  • sea-ice-sensitivity (CanESM5 siconc)

CMORCheckError on vertical coordinate metadata (1 diagnostic):
alevel coordinate's standard_name check fails for the cli variable.

  • cloud-scatterplots-cli-ta (CESM2 cli)

Disabled (1)

  • climate-at-global-warming-levels — needs scenario data, not yet available in CMIP7

* origin/main:
  chore: rename changelog to match PR number
  feat: add file dimensions to all ESMValTool diagnostics
  feat: add file dimensions to ESMValTool diagnostics
@lewisjared lewisjared mentioned this pull request Feb 16, 2026
42 tasks
… rename to branded_variable

- Add SQLAlchemy hybrid_property on CMIP7Dataset for branded_variable
  (variable_id + "_" + branding_suffix), eliminating the standalone
  _add_branded_variable_name() function and load_catalog() override
- Rename branded_variable_name to branded_variable across the codebase
  to match the CMIP7 file attribute name
- Add variant_label to diagnostic facets for TCR, TCRE, ZEC, and
  regional historical trend so CMEC bundle validation passes for both
  CMIP6 (member_id) and CMIP7 (variant_label) test cases
      operator: mean
    annual_statistics:
      operator: mean
      keep_group_coordinates: true
Contributor Author

@bouweandela Did the default for this change recently?

Contributor

Contributor Author

At least I'm not going crazy!

@lewisjared lewisjared merged commit fb80c2e into main Feb 16, 2026
24 of 25 checks passed
@lewisjared lewisjared deleted the esmval-requirements branch February 16, 2026 03:41
@bouweandela
Contributor

bouweandela commented Feb 16, 2026

> The other set of issues which are yet to be resolved is due to the fixes not being applied for these CMIP7 datasets. @bouweandela Is there a gold standard source that we could use that doesn't require fixes?

You could load the CMIP6 data using esmvalcore.dataset.Dataset, this will apply the fixes and will give you an iris.cube.Cube. You could save that using iris.save or convert it to xarray using ncdata and then you can do the conversion to CMIP7.

@lewisjared
Contributor Author

I was trying to avoid the dependency on esmvalcore unless truly needed. The conversions don't just happen in a single standalone script.

@bouweandela
Contributor

It depends on what needs to be fixed, but most fixes are fairly straightforward, so it may not be much work to copy them from ESMValCore. This one is applied to all datasets: https://github.com/ESMValGroup/ESMValCore/blob/da81d5f67158f3d2603831b56ab6b4fb8a388d86/esmvalcore/cmor/_fixes/fix.py#L360 and then there are project/dataset/variable specific fixes in https://github.com/ESMValGroup/ESMValCore/blob/main/esmvalcore/cmor/_fixes/, organized as project.dataset.Variable.
