Tiered CI workflows for regression native baselines #733
Merged
Codecov Report: ✅ All modified and coverable lines are covered by tests.
d88ceb1 to e7ca61e
Wire the regression-baseline lifecycle verbs into CI along the trust boundary, with credentials confined to a single manually-gated step:
- regression-pr-gate.yaml: on every pull request, run `ref test-cases ci-gate` and replay each case it routes to `replay`; fail on `fail`, warn on `execute`. Public runner, no credentials, safe on fork PRs. The replay fan-out lives in scripts/ci/regression-pr-gate.sh, which defers data download until a case actually needs replaying.
- regression-mint.yaml: manually dispatched, gated behind the `native-baselines` Environment; mints native baselines and commits the regenerated manifest back to the dispatched branch.
- regression-drift.yaml: nightly (and on-demand) sync + replay to catch baselines that no longer reproduce within tolerance.
- background/regression-baselines.md: new "Continuous integration" section describing the three workflow tiers and the GitHub Environment + R2 secrets the mint job requires.
- how-to-guides/testing-diagnostics.md: a diagnostic-developer "pull request workflow" section with a mermaid diagram, a per-outcome action table, the test_case_version bump rule, and how to publish native baselines via the gated mint workflow; refresh the committed-bundle layout to show manifest.json and the two-layer model.
Trim duplication and tangents from the regression-baseline documentation:
- testing-diagnostics: collapse the test-data directory tree (previously drawn three times) to a single canonical layout, merge the two adjacent regression sections into one, condense the ESGF/HPC caching note, and lean on the background page for the two-layer model rather than re-explaining it.
- background: tighten the continuous-integration section wording.
- workflows + gate script: reflow the inline comments to semantic line breaks (comment-only; no behaviour change).
Address code-review findings on the tiered CI workflows:
- distinguish a real coupling `fail` from a hard `ci-gate` error, so a misconfigured base ref no longer reports as an unauthorised baseline change
- export NO_COLOR in the gate script so its JSON stays parseable off-CI
- use a provider matrix in the drift workflow as the single source of truth for which providers are migrated, instead of two hardcoded spots
- note that the mint commit (pushed with GITHUB_TOKEN) does not re-trigger the PR gate, in both the workflow and the baseline docs
- printf over echo for jq input, and fix a comment typo
b8a85d0 to 3d5a0bb
Description
PR-4 of RFC 0005. Wires the regression-baseline lifecycle verbs (`ci-gate`, `replay`, `mint`, `sync`), shipped in the earlier stack PRs, into CI, split along the trust boundary so that write credentials live in exactly one manually-gated step.
Stacked on `feat/regression-r2-backend` (PR-5); review/merge that first.
Three workflows
- `regression-pr-gate.yaml`: on every pull request, runs `ref test-cases ci-gate`, fails on `fail`, replays each `replay` case against the public baseline, and flags `execute`. Safe on fork PRs.
- `regression-mint.yaml`: manually dispatched and gated behind the `native-baselines` Environment; mints native baselines and commits the regenerated manifest back to the dispatched feature branch.
- `regression-drift.yaml`: nightly (and on-demand) `sync` + `replay` to catch baselines that no longer reproduce within tolerance.
The PR gate's replay fan-out lives in `scripts/ci/regression-pr-gate.sh`. It defers all data download until a case actually needs replaying, so a PR that touches no baselines (the common case) completes without any downloads.
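The routing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the contents of `scripts/ci/regression-pr-gate.sh`: the entry keys `case` and `outcome`, the `GateResult` type, and the `fetch_data` / `replay` callables are all hypothetical, and the sketch assumes a single lazy download shared by all replayed cases.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class GateResult:
    """Summary of one gate run (hypothetical shape, for illustration only)."""

    exit_code: int = 0
    replayed: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
    failures: list[str] = field(default_factory=list)


def run_gate(
    cases: Iterable[dict],
    fetch_data: Callable[[], None],
    replay: Callable[[str], bool],
) -> GateResult:
    """Route each case by its gate outcome.

    `cases` holds dicts with the assumed keys "case" and "outcome".
    `fetch_data` stands in for the data download and `replay` for the
    replay verb; `replay` returns True when the baseline reproduces.
    """
    result = GateResult()
    data_ready = False
    for entry in cases:
        name, outcome = entry["case"], entry["outcome"]
        if outcome == "fail":
            # Hard failure reported by the gate itself.
            result.failures.append(name)
        elif outcome == "replay":
            # Download only once, and only when a replay is really needed.
            if not data_ready:
                fetch_data()
                data_ready = True
            result.replayed.append(name)
            if not replay(name):
                result.failures.append(name)
        elif outcome == "execute":
            # Flagged for attention, but does not fail the gate.
            result.warnings.append(name)
        # Any other outcome is assumed to need no action.
    result.exit_code = 1 if result.failures else 0
    return result
```

With this shape, a pull request whose cases never route to `replay` never calls `fetch_data`, which mirrors the "no downloads in the common case" behaviour described above.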
Documentation
- `docs/background/regression-baselines.md`: a new Continuous integration section.
- `docs/how-to-guides/testing-diagnostics.md`: a diagnostic-developer pull request workflow section with a mermaid diagram, a per-outcome action table, and how to publish native baselines via the gated mint workflow.
Required repository configuration (before `mint` works)
Create a `native-baselines` Environment (Settings → Environments) with required reviewers, and add two environment secrets holding an object-scoped R2 token:
- `R2_ACCESS_KEY_ID` → `REF_NATIVE_STORE_ACCESS_KEY_ID`
- `R2_SECRET_ACCESS_KEY` → `REF_NATIVE_STORE_SECRET_ACCESS_KEY`
The PR gate and nightly drift need no setup.
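The secret-to-variable mapping above can be written down as a small preflight check. In the workflow itself the mapping would presumably live in the mint step's environment configuration; the Python below is just a sketch of the same mapping, and the function name `native_store_env` is hypothetical.

```python
from __future__ import annotations

# Mapping taken from the list above: Environment secret -> variable the mint step reads.
SECRET_TO_ENV = {
    "R2_ACCESS_KEY_ID": "REF_NATIVE_STORE_ACCESS_KEY_ID",
    "R2_SECRET_ACCESS_KEY": "REF_NATIVE_STORE_SECRET_ACCESS_KEY",
}


def native_store_env(secrets: dict[str, str]) -> dict[str, str]:
    """Translate Environment secrets into the variables for the mint step.

    Raises ValueError naming every secret that is missing or empty, so a
    misconfigured Environment is reported before any minting starts.
    """
    missing = sorted(name for name in SECRET_TO_ENV if not secrets.get(name))
    if missing:
        raise ValueError("missing environment secrets: " + ", ".join(missing))
    return {env_name: secrets[name] for name, env_name in SECRET_TO_ENV.items()}
```

Failing early on a missing secret keeps a half-configured `native-baselines` Environment from surfacing as an opaque storage error later in the job.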
Verification
- `actionlint` + `shellcheck` clean on all four files; `pre-commit` passes.
- `ref test-cases ci-gate --json` emits clean, ANSI-free, jq-parseable JSON.
- The `replay` the gate issues for the example case reproduces the committed bundle against live R2 (4 native files materialised, 3 bundle files compared).
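The "ANSI-free, parseable JSON" property can be checked mechanically. The sketch below is one way to do it in Python, not the verification actually run for this PR; the helper name `parse_clean_json` and the sample payload in the usage note are hypothetical, since the real `ci-gate --json` schema is not shown here.

```python
import json
import re

# Matches ANSI CSI sequences such as "\x1b[31m" (colour) or "\x1b[0m" (reset).
ANSI_ESCAPE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def parse_clean_json(text: str):
    """Parse `text` as JSON, rejecting output that contains ANSI escapes.

    Colour codes usually make a JSON parser fail anyway; checking for them
    first gives a clearer error about what went wrong.
    """
    if ANSI_ESCAPE.search(text):
        raise ValueError("output contains ANSI escape sequences")
    return json.loads(text)
```

For example, `parse_clean_json('{"cases": []}')` returns the parsed dictionary, while the same payload wrapped in `\x1b[31m` and `\x1b[0m` raises `ValueError`. Exporting `NO_COLOR` in the gate script, as one of the commits above does, is what keeps such sequences out of the output when the script runs outside CI.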
Scope
Covers the `example` provider (the only one migrated so far). Provider migrations (PR-6) will add each provider to the drift/replay lists; real-provider mint will need the self-hosted runner rather than `ubuntu-latest`.
Checklist
Please confirm that this pull request has done the following:
- `ci-gate`/`replay` run
- `changelog/`