Tiered CI workflows for regression native baselines #733

Merged: lewisjared merged 6 commits into main from feat/regression-ci-workflows on Jun 18, 2026

lewisjared commented Jun 17, 2026

Description

PR-4 of RFC 0005. Wires the regression-baseline lifecycle verbs (ci-gate,
replay, mint, sync) — shipped in the earlier stack PRs — into CI, split along
the trust boundary so that write credentials live in exactly one manually-gated step.

Stacked on feat/regression-r2-backend (PR-5); review/merge that first.

Three workflows

| Workflow | Trigger | Credentials | What it does |
| --- | --- | --- | --- |
| regression-pr-gate.yaml | every pull request | none | Runs `ref test-cases ci-gate`, fails on `fail`, replays each `replay` case against the public baseline, flags `execute`. Safe on fork PRs. |
| regression-mint.yaml | manual dispatch | R2 write | Gated behind the native-baselines Environment; mints native baselines and commits the regenerated manifest back to the dispatched feature branch. |
| regression-drift.yaml | nightly + manual | none | sync + replay to catch baselines that no longer reproduce within tolerance. |

The PR gate's replay fan-out lives in scripts/ci/regression-pr-gate.sh. It defers all
data download until a case actually needs replaying, so a PR that touches no baselines
(the common case) completes without any downloads.
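
As a rough illustration of the deferred-download idea, here is a minimal Python sketch of the routing. The real fan-out lives in the shell script; the JSON shape (`cases`, `id`, `outcome`) and the `fetch_data` / `replay` callables below are hypothetical stand-ins, not the actual `ci-gate` schema or script internals.

```python
import json
from typing import Callable


def run_gate(
    gate_json: str,
    fetch_data: Callable[[], None],
    replay: Callable[[str], bool],
) -> int:
    """Route gate outcomes and return a process-style exit code (0 = pass)."""
    # Hypothetical schema: {"cases": [{"id": ..., "outcome": ...}, ...]}
    cases = json.loads(gate_json)["cases"]

    # A `fail` outcome fails the gate before any data is touched.
    if any(case["outcome"] == "fail" for case in cases):
        return 1

    # `execute` is only flagged, never run on the public runner.
    for case in cases:
        if case["outcome"] == "execute":
            print(f"warning: {case['id']} needs a native run (execute)")

    to_replay = [case["id"] for case in cases if case["outcome"] == "replay"]
    if not to_replay:
        return 0  # common case: nothing to replay, so nothing is downloaded

    fetch_data()  # deferred until at least one case needs replaying
    # Build the full list first so every case is replayed, even after a failure.
    results = [replay(case_id) for case_id in to_replay]
    return 0 if all(results) else 1
```

Keeping the download behind the `to_replay` check is what lets a PR that touches no baselines finish without any downloads.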

Documentation

  • docs/background/regression-baselines.md: a new Continuous integration section.
  • docs/how-to-guides/testing-diagnostics.md: a diagnostic-developer pull request
    workflow section with a mermaid diagram, a per-outcome action table, and how to
    publish native baselines via the gated mint workflow.

Required repository configuration (before mint works)

Create a native-baselines Environment (Settings → Environments) with required
reviewers, and add two environment secrets holding an object-scoped R2 token:

  • R2_ACCESS_KEY_IDREF_NATIVE_STORE_ACCESS_KEY_ID
  • R2_SECRET_ACCESS_KEYREF_NATIVE_STORE_SECRET_ACCESS_KEY

The PR gate and nightly drift need no setup.

Verification

  • actionlint + shellcheck clean on all four files; pre-commit passes.
  • ref test-cases ci-gate --json emits clean, ANSI-free, jq-parseable JSON.
  • The exact replay the gate issues for the example case reproduces the committed
    bundle against live R2 (4 native files materialised, 3 bundle files compared).
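
The "clean, ANSI-free, jq-parseable" property can be checked mechanically. The sketch below is one possible way to do that in Python and is not the check used in this PR; `is_clean_json` is a hypothetical helper, and the sample payloads are made up.

```python
import json
import re

# Matches CSI escape sequences such as "\x1b[31m" (colour) or "\x1b[0m" (reset).
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def is_clean_json(text: str) -> bool:
    """Return True if `text` has no ANSI escapes and parses as JSON."""
    if ANSI_CSI.search(text):
        return False
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_clean_json('{"cases": []}'))                   # plain JSON
print(is_clean_json('\x1b[31m{"cases": []}\x1b[0m'))    # colourised output
```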

Scope

Covers the example provider (the only one migrated so far). Provider migrations
(PR-6) will add each provider to the drift/replay lists; real-provider mint will need
the self-hosted runner rather than ubuntu-latest.

Checklist

Please confirm that this pull request has done the following:

  • Tests added — n/a; CI configuration, validated with actionlint/shellcheck and a live ci-gate/replay run
  • Documentation added (where applicable)
  • Changelog item added to changelog/

codecov Bot commented Jun 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

| Flag | Coverage Δ |
| --- | --- |
| core | 92.54% <100.00%> (+<0.01%) ⬆️ |
| providers | 91.80% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...e-ref-core/src/climate_ref_core/regression/gate.py | 100.00% <100.00%> (ø) |
lewisjared force-pushed the feat/regression-r2-backend branch from d88ceb1 to e7ca61e on June 18, 2026 04:14

Wire the regression-baseline lifecycle verbs into CI along the trust
boundary, with credentials confined to a single manually-gated step:

- regression-pr-gate.yaml: on every pull request, run `ref test-cases
  ci-gate` and replay each case it routes to `replay`; fail on `fail`,
  warn on `execute`. Public runner, no credentials, safe on fork PRs.
  The replay fan-out lives in scripts/ci/regression-pr-gate.sh, which
  defers data download until a case actually needs replaying.
- regression-mint.yaml: manually dispatched, gated behind the
  `native-baselines` Environment; mints native baselines and commits the
  regenerated manifest back to the dispatched branch.
- regression-drift.yaml: nightly (and on-demand) sync + replay to catch
  baselines that no longer reproduce within tolerance.
- background/regression-baselines.md: new "Continuous integration"
  section describing the three workflow tiers and the GitHub Environment
  + R2 secrets the mint job requires.
- how-to-guides/testing-diagnostics.md: a diagnostic-developer "pull
  request workflow" section with a mermaid diagram, a per-outcome action
  table, the test_case_version bump rule, and how to publish native
  baselines via the gated mint workflow; refresh the committed-bundle
  layout to show manifest.json and the two-layer model.

Trim duplication and tangents from the regression-baseline documentation:

- testing-diagnostics: collapse the test-data directory tree (previously
  drawn three times) to a single canonical layout, merge the two adjacent
  regression sections into one, condense the ESGF/HPC caching note, and
  lean on the background page for the two-layer model rather than
  re-explaining it.
- background: tighten the continuous-integration section wording.
- workflows + gate script: reflow the inline comments to semantic line
  breaks (comment-only; no behaviour change).

Address code-review findings on the tiered CI workflows:

- distinguish a real coupling `fail` from a hard `ci-gate` error, so a
  misconfigured base ref no longer reports as an unauthorised baseline change
- export NO_COLOR in the gate script so its JSON stays parseable off-CI
- use a provider matrix in the drift workflow as the single source of truth
  for which providers are migrated, instead of two hardcoded spots
- note that the mint commit (pushed with GITHUB_TOKEN) does not re-trigger
  the PR gate, in both the workflow and the baseline docs
- printf over echo for jq input, and fix a comment typo
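
The first two findings above (separating a coupling `fail` from a hard `ci-gate` error, and exporting NO_COLOR) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the gate script's logic: the JSON shape (`cases`, `outcome`) is hypothetical, and treating unparseable output as a hard error is an assumed rule.

```python
import json
import os


def gate_env() -> dict[str, str]:
    """Environment for the gate call; NO_COLOR keeps the JSON free of ANSI codes."""
    return {**os.environ, "NO_COLOR": "1"}


def classify(returncode: int, stdout: str) -> str:
    """Return "pass", "fail" (a real coupling failure), or "error" (the gate itself broke)."""
    try:
        # Hypothetical schema: {"cases": [{"id": ..., "outcome": ...}, ...]}
        cases = json.loads(stdout)["cases"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # No usable JSON (e.g. a misconfigured base ref): a hard error,
        # not an unauthorised baseline change.
        return "error"
    if any(case.get("outcome") == "fail" for case in cases):
        return "fail"
    return "pass" if returncode == 0 else "error"
```

Keeping `classify` a pure function of the exit code and captured output makes the distinction easy to exercise without invoking the CLI.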

lewisjared force-pushed the feat/regression-ci-workflows branch from b8a85d0 to 3d5a0bb on June 18, 2026 04:27
Base automatically changed from feat/regression-r2-backend to main on June 18, 2026 04:28
lewisjared merged commit 2975157 into main on Jun 18, 2026 (23 of 24 checks passed)
lewisjared deleted the feat/regression-ci-workflows branch on June 18, 2026 04:34