Coverage-based test selection (done right) — shadow mode#1461
Conversation
Claude Code ReviewHead SHA: 6d068b0 Files changed:
Findings:
changed_fpp = {f for f in changed_files if f.endswith(".fpp")}
if not changed_fpp:
return [], list(cases), "rung7: no Fortran source changed"When The spec's first invariant is "every uncertainty resolves to run". An empty explicit list ( The minimal fix is to treat |
|
Good catch — fixed in f7a692f. An empty/whitespace |
…sts via rung 5 (soundness)
Fix 1: in build_coverage_map Phase 2, a test with non-empty failures produced only partial .gcda files (crash mid-pipeline). Previously those tests were still added to test_results, and their truncated coverage was cached. A later .fpp change that ran only in the missing stage would be silently skipped. Fix: when failures is non-empty, record in all_failures and continue without adding to test_results — absent entries are conservatively included by select_tests (rung 5). Fix 2: in _parse_gcov_json_output, a mid-stream json.JSONDecodeError returned the partial result set, which is untrustworthy (the truncated JSON stream may be missing coverage for .fpp files that were not yet serialised). Fix: return None on that error path so the caller omits the test from the map entirely. Fix 6 (comment): correct the post_process comment (~line 347) — the regular suite runs post_process only under --test-all (which CI sets), not never.
…ching
Fix 3: ALWAYS_RUN_ALL_EXACT + prefixes enumerated only a handful of toolchain
files, missing case.py, build.py, common.py, state.py, sched.py, etc. Any
toolchain/mfc/*.py change (except cases.py) affects every test's generation
or execution, so under-enumeration was unsound. Replace with a catch-all:
any(f.startswith('toolchain/mfc/') and f.endswith('.py') and f != CASES_PY).
Drop the now-redundant individual file entries and toolchain/mfc/params/ and
toolchain/mfc/run/ prefixes (all subsumed). Keep CMakeLists.txt,
toolchain/cmake/, toolchain/bootstrap/, and src include rules.
Fix 4: rung 3 matched only .f90 and .f, missing .F90, .F95, .F03, .F08, .FOR
and all other uppercase/mixed variants. Changed files ending in those extensions
under src/ would fall through to per-test selection against a coverage map that
only tracks .fpp, causing silent under-inclusion. Fix: case-insensitive match
against the full tuple (.f90, .f, .f95, .f03, .f08, .for).
Tests: add three new unit tests covering the above fixes.
… caveat Add 'permissions: contents: write' at the workflow top level so the coverage-refresh job is authorized to commit and push the updated coverage_map.json.gz back to master. Without this, the GITHUB_TOKEN has only read permissions in newer default permission settings. Also add a comment on the git push step noting that branch protection may still reject the default GITHUB_TOKEN and that a PAT or GitHub App with bypass-branch-protection permission may be needed.
There was a problem hiding this comment.
Pull request overview
Re-introduces execution-coverage-based test selection for ./mfc.sh test --only-changes in shadow mode (prints what would run on PRs while still executing the full suite), along with tooling to build/refresh the committed coverage map and workflows to keep it healthy.
Changes:
- Adds coverage-map-driven selection logic (
param_hash-keyed), integrates it into thetestcommand, and wires PR CI to run the selector in shadow mode. - Adds a gcov-based coverage map builder and scheduled workflows to refresh the committed map and fail loudly if it becomes stale/under-covering.
- Adds unit tests covering the selection ladder, map I/O, and changed-file detection behavior.
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| toolchain/mfc/test/test.py | Integrates --only-changes selection (shadow/enforce) and adds --build-coverage-map entrypoint. |
| toolchain/mfc/test/test_coverage_unit.py | New unit tests for coverage selection, hashing, map I/O, and changed-file detection. |
| toolchain/mfc/test/coverage.py | Implements param-hash keying, selection ladder, changed-file detection, summary formatting, and map health checks. |
| toolchain/mfc/test/coverage_build.py | Adds gcov-based builder to generate tests/coverage_map.json.gz from per-test execution coverage. |
| toolchain/mfc/test/case.py | Adds coverage_key() to cases/builders to support map lookups by param_hash. |
| toolchain/mfc/cli/commands.py | Adds CLI flags: --only-changes, --select-enforce, --changed-files, --changes-branch, --build-coverage-map. |
| .github/workflows/test.yml | Enables shadow-mode selector on PR CI and passes paths-filter output as --changed-files. |
| .github/workflows/coverage-refresh.yml | Scheduled / on-push workflow to rebuild and commit refreshed coverage map. |
| .github/workflows/coverage-health.yml | Scheduled workflow to fail if the committed map is stale or under-covers current tests. |
| .github/workflows/common/test.sh | Enables shadow-mode selector in the common SLURM test wrapper on PRs. |
| .github/workflows/common/coverage-refresh.sh | SLURM-side script to build with gcov and run the map builder. |
| .github/scripts/check_coverage_map_health.py | Health-check script used by the coverage-health workflow. |
| ALWAYS_RUN_ALL_EXACT = frozenset( | ||
| [ | ||
| "CMakeLists.txt", | ||
| ] | ||
| ) | ||
| ALWAYS_RUN_ALL_PREFIXES = ( | ||
| "src/common/include/", # GPU/Fypp macro & include files (CPU map can't line-attribute) | ||
| "toolchain/cmake/", # build system | ||
| "toolchain/bootstrap/", # build/run scripts | ||
| ) |
| def load_map(path: Path) -> Tuple[Optional[dict], Optional[dict]]: | ||
| """Return (entries_without_meta, meta), or (None, None) if missing/corrupt.""" | ||
| if not Path(path).exists(): | ||
| return None, None | ||
| try: | ||
| with gzip.open(path, "rt", encoding="utf-8") as f: | ||
| data = json.load(f) | ||
| except (OSError, gzip.BadGzipFile, json.JSONDecodeError, UnicodeDecodeError): | ||
| return None, None | ||
| if not isinstance(data, dict) or "_meta" not in data: | ||
| return None, None | ||
| meta = data.pop("_meta") | ||
| return data, meta |
| Argument(name="changed-files", dest="changed_files", type=str, default=None, help="Newline- or comma-separated changed-file list (from CI paths-filter). Overrides git detection."), | ||
| Argument(name="changes-branch", dest="changes_branch", type=str, default="master", help="Branch to diff against for --only-changes."), |
| SELECT="" | ||
| [ "${{ github.event_name }}" = "pull_request" ] && SELECT="--only-changes" | ||
| /bin/bash mfc.sh test -v --max-attempts 3 -j $(nproc) $SELECT --changed-files "$CHANGED_FILES" $TEST_ALL $TEST_PCT $PRECISION |
| # MPI-compiled binaries must be launched via an MPI launcher (even ppn=1). | ||
| # Use --bind-to none to avoid binding issues with concurrent launches. | ||
| if shutil.which("mpirun"): | ||
| mpi_cmd = ["mpirun", "--bind-to", "none", "-np", str(ppn)] | ||
| elif shutil.which("srun"): | ||
| mpi_cmd = ["srun", "--ntasks", str(ppn)] | ||
| else: | ||
| raise MFCException("No MPI launcher found (mpirun or srun). MFC binaries require an MPI launcher.\n On Ubuntu: sudo apt install openmpi-bin\n On macOS: brew install open-mpi") |
…validate map shape Fix 1: Replace the allowlist of run-all files with an inverted design: define a small, conservative allowlist of provably test-irrelevant files (docs/*.md, LICENSE, etc.) and treat any changed file that is not .fpp, not cases.py, and not in that allowlist as unattributable -> run all. This closes the gap where mfc.sh, .github/**, tests/**, toolchain/pyproject.toml, and similar files would silently fall through to rung-7 (skip-all) under --select-enforce. Fix 2: In load_map, validate that every entry is str -> list[str] after popping _meta. A malformed entry returns (None, None) so the caller runs the full suite rather than silently misrouting tests. Tests: expand test_docs_only_still_skips_all to cover docs/, LICENSE, .claude/; add test_unattributable_nonsource_change_runs_all for mfc.sh / pyproject.toml / tests/** / .github/workflows; add test_load_map_rejects_malformed_entry.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1461 +/- ##
=======================================
Coverage 61.31% 61.31%
=======================================
Files 72 72
Lines 19771 19771
Branches 2852 2852
=======================================
Hits 12123 12123
Misses 5699 5699
Partials 1949 1949 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
What
Re-introduces execution-coverage-based test selection (
./mfc.sh test --only-changes), replacing the gcov coverage-cache removed in #1460. It ships in shadow mode (PR jobs print what they would select but still run the full suite); enforcement is a separate later change, gated on shadow evidence.Design spec:
docs/superpowers/specs/2026-05-29-coverage-test-selection-design.md. Plan:docs/superpowers/plans/2026-05-29-coverage-test-selection.md.Why this one is sound where the old one wasn't
The old cache rotted invisibly (UUID-keyed, never auto-refreshed, silent fallback). The post-mortem established that only execution coverage is sound — static labels miss transitive side effects. This rebuild fixes the root causes:
param_hash(SHA-256 of the test's full resolved params), notCRC32(trace)— stable to cosmetic edits, invalidates on real behavior changes, decoupled from the golden-file UUID.tests/coverage_map.json.gz), refreshed by a bot on master — freshness is a git-visible fact, not an evictable cache.coverage-healthworkflow fails red if the map is stale/under-covers (kept off the PR path so it can't wedge PRs).The conservative ladder (soundness)
A test runs if any rung holds: changed-file detection failed → all · macro/codegen/build/
src/**/include/input → all · changed.f90/.f→ all · changed.fppcovered by zero tests → all · test param_hash absent → run it · coverage overlap → run it · else skip. Every uncertainty resolves to run.Validated end-to-end against a real gcov map (574 tests)
Known gcov caveat (handled)
gcov rolls
#:include'd.fppinto the parent compilation unit, so include files aren't reliably attributed. Closed by rule: anysrc/**/include/change forces a full run, so the attribution gap can never cause under-inclusion. (Map built on macOS gfortran 15.2; CI's Linux build may attribute differently — the map rebuilds there, and the include-rule keeps it sound regardless.)Test plan
--gcovbuild +--build-coverage-mapover all tests (collector validated at runtime)--select-enforce