Skip to content

refactor!: rename ignore datasets to grey list and decouple fetching#622

Open
lewisjared wants to merge 7 commits intomainfrom
refactor/grey-list-decouple
Open

refactor!: rename ignore datasets to grey list and decouple fetching#622
lewisjared wants to merge 7 commits intomainfrom
refactor/grey-list-decouple

Conversation

@lewisjared
Copy link
Copy Markdown
Contributor

@lewisjared lewisjared commented Apr 12, 2026

Description

Rename ignore_datasets -> grey_list across the codebase. The shipped file moves from default_ignore_datasets.yaml at the repo root to config/default_grey_list.yaml, the Config.ignore_datasets_file field becomes Config.grey_list_file, and the REF_IGNORE_DATASETS_* env vars become REF_GREY_LIST_*. The IgnoreFacets constraint class keeps its name — it's a generic facet-exclusion mechanism; the grey list is one consumer of it.

Decouple fetching from Config construction. Config.default() no longer performs network I/O as a side effect (it previously triggered a download via a field factory). The fetch now happens lazily in ExecutionSolver.build_from_config, via a new refresh_grey_list_file() helper. A new Config.grey_list_url field (overridable via REF_GREY_LIST_URL) controls where the grey list is fetched from; setting it empty disables fetching entirely, which is what offline / air-gapped HPC deployments need.

Default cache location moved from the platformdirs user cache (often read-only on containers / HPC) to $REF_CONFIGURATION/grey_list.yaml, alongside ref.toml and the database. A missing grey list file is now tolerated as an empty list rather than raising.

Tests now use the in-tree config/default_grey_list.yaml deterministically (with fetching disabled), so a PR editing the in-tree file will correctly succeed or fail the regression tests instead of silently picking up the cached main-branch copy — the original #590 bug.

Adds a "Grey list" section to docs/configuration.md covering both config fields, the lazy fetch behaviour, and an explicit offline / HPC workflow.

Breaking change: Anyone setting REF_IGNORE_DATASETS_* env vars or referencing Config.ignore_datasets_file / default_ignore_datasets.yaml needs to update to the new names. The default cache file location also changed, so stale caches under the platformdirs user cache will be ignored.

Checklist

Please confirm that this pull request has done the following:

  • Tests added
  • Documentation added (where applicable)
  • Changelog item added to changelog/

…etching

The grey list is now described by two independent fields:

- `Config.grey_list_file` controls where the file lives (override e.g. for
  k8s pods where the user cache is read-only).
- `Config.grey_list_url` controls where it is fetched from. Set to "" to
  disable fetching for offline/air-gapped environments.

`Config.default()` no longer performs network I/O as a side effect; the
download moves to an explicit `refresh_grey_list_file` call inside
`ProviderRegistry.build_from_config`. This also fixes #590 — solver
regression tests now read the in-tree `default_grey_list.yaml` instead of
silently picking up the cached main-branch copy.

BREAKING CHANGE: `default_ignore_datasets.yaml`,
`Config.ignore_datasets_file`, and `REF_IGNORE_DATASETS_*` env vars are
renamed to their `grey_list` equivalents.
…h only on solve

Three fixes from the Codex review of the prior commit, plus user-facing
documentation:

- `Config.grey_list_file` now uses `converter=Path`, so
  `REF_GREY_LIST_FILE` overrides actually produce a `Path` instead of a
  bare string and downstream `read_text()` / `exists()` calls keep working.
- `DiagnosticProvider.configure` treats a missing grey list file as an
  empty list rather than raising `FileNotFoundError`. This unblocks
  offline runs where the user disables fetching with `grey_list_url=""`
  before the file has been seeded.
- The `refresh_grey_list_file` call moves from
  `ProviderRegistry.build_from_config` to
  `ExecutionSolver.build_from_db`, so read-only commands like
  `ref providers list` and `ref datasets list` no longer touch the
  network. Fetching only happens at solve time.

Also adds a "Grey list" section to `docs/configuration.md` covering the
two configuration values, the offline / air-gapped HPC workflow, and how
the lazy fetch behaves.
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag Coverage Δ
core 93.40% <100.00%> (+0.02%) ⬆️
providers 91.85% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...climate-ref-core/src/climate_ref_core/providers.py 94.14% <100.00%> (+0.05%) ⬆️
packages/climate-ref/src/climate_ref/config.py 98.11% <100.00%> (+0.12%) ⬆️
packages/climate-ref/src/climate_ref/solver.py 98.36% <100.00%> (+0.01%) ⬆️

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…EF_CONFIGURATION

- Relocate `default_grey_list.yaml` from the repo root to
  `config/default_grey_list.yaml` so the repo root stays uncluttered;
  this is also a step toward eventually owning the official version in
  the climate-ref-aft repo.
- Default `Config.grey_list_file` now resolves to
  `${REF_CONFIGURATION}/grey_list.yaml` instead of the user cache
  directory. The configuration dir is reliably writable on container/HPC
  deployments where `~/.cache` often is not.
- Update the default fetch URL, the test fixture path, and the docs
  accordingly.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR renames the “ignore datasets” mechanism to a “grey list” across the codebase, and removes network I/O side effects from Config.default() by making grey list fetching lazy and solver-triggered.

Changes:

  • Rename ignore_datasets_filegrey_list_file and introduce grey_list_url + corresponding REF_GREY_LIST_* env vars.
  • Move grey list refresh logic into ExecutionSolver.build_from_db() (lazy fetch) and update provider configuration to tolerate missing grey list files.
  • Update docs/tests to use the in-tree config/default_grey_list.yaml deterministically and validate “no network I/O” for read-only flows.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
packages/climate-ref/src/climate_ref/config.py Introduces grey list config fields + refresh helper; removes fetch-on-config-load behavior.
packages/climate-ref/src/climate_ref/solver.py Refreshes the grey list lazily at solve construction time when grey_list_url is set.
packages/climate-ref/src/climate_ref/conftest_plugin.py Ensures tests use in-tree grey list and disables fetching for determinism/offline.
packages/climate-ref-core/src/climate_ref_core/providers.py Switches provider configuration to read grey_list_file and treat missing file as empty.
docs/configuration.md Adds “Grey list” documentation, including offline/HPC workflow and lazy-fetch semantics.
config/default_grey_list.yaml Renamed/updated default grey list file and header comments.
packages/climate-ref/tests/unit/test_config.py Updates config tests for new fields; adds refresh/no-fetch regression coverage.
packages/climate-ref/tests/unit/test_solver.py Adds solver tests to ensure grey list refresh happens (or is skipped) as expected.
packages/climate-ref/tests/unit/test_provider_registry.py Adds coverage ensuring provider registry build is network-free (read-only commands).
packages/climate-ref-core/tests/unit/test_providers.py Updates provider tests to use grey list and verifies missing-file behavior.
packages/climate-ref-pmp/tests/unit/test_provider.py Updates PMP provider tests for renamed config field.
changelog/622.breaking.md Adds breaking-change note documenting the rename and behavior change.
Comments suppressed due to low confidence (1)

config/default_grey_list.yaml:20

  • The header comment says to disable fetching by setting the URL to None, but Config.grey_list_url is a string and the docs/code use an empty string (e.g. REF_GREY_LIST_URL= / grey_list_url = "") to disable fetching. Please update this comment to match the actual configuration behavior so offline instructions aren’t misleading.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/climate-ref/src/climate_ref/config.py Outdated
Comment thread changelog/622.breaking.md Outdated
- Preserve existing grey list file mtime on download failure
- Fix grammar in changelog breaking change note
…safe

Loading a config that still uses `ignore_datasets_file` or
`REF_IGNORE_DATASETS_FILE` now raises a `ValueError` with migration
instructions instead of silently dropping the value and falling back to
the new defaults. Marked as a TODO to remove the shim before 1.0.0.

Grey list refresh no longer creates an empty placeholder when the
download fails: a cached file is reused if present, otherwise a new
`GreyListRefreshError` is raised so a transient network failure cannot
silently disable grey list protections for 6 hours.

Also rewrap the new prose (changelog, docs section, docstrings) to
follow semantic line breaks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants