Enforce benchmark manifests in eCPS comparisons #244

Merged
MaxGhenis merged 1 commit into main from codex/enforce-comparison-benchmark-manifest-20260606 on Jun 6, 2026

Conversation

@MaxGhenis
Contributor

Summary

Make the official eCPS replacement comparison fail before writing a verdict JSON unless it is tied to a pre-existing, frozen production-eCPS benchmark manifest. This addresses a release-process failure in which a freshly recomputed or patched comparison surface could stand in for the canonical eCPS baseline.

Changes:

  • add --benchmark-manifest to microplex-us-ecps-replacement-comparison
  • compare the freshly generated frozen eCPS baseline certificate against manifest evidence for the baseline H5 SHA, target DB SHA, scorer checkout, PolicyEngine-US version, target surface metadata, and scoring config
  • include the matched manifest descriptor in the comparison payload
  • add pass/fail tests for manifest enforcement

This does not change the loss calculation. It makes the comparison refuse to publish a verdict unless the scoring surface matches the pinned baseline evidence.
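
For readers who have not opened the diff, the enforcement pattern can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the field names in `PINNED_FIELDS`, the `BenchmarkManifestMismatch` exception, and the `enforce_benchmark_manifest` and `write_verdict` helpers are all hypothetical, and the real certificate and manifest schemas may be shaped differently. The point it shows is the ordering: the manifest check runs before any verdict file is written.

```python
import json
from pathlib import Path

# Hypothetical keys: the PR names the evidence categories, not the actual schema.
PINNED_FIELDS = (
    "baseline_h5_sha256",
    "target_db_sha256",
    "scorer_checkout",
    "policyengine_us_version",
    "target_surface",
    "scoring_config",
)


class BenchmarkManifestMismatch(ValueError):
    """Raised when the fresh baseline certificate disagrees with the manifest."""


def enforce_benchmark_manifest(certificate: dict, manifest: dict) -> dict:
    """Return the matched manifest descriptor, or raise on any mismatch."""
    mismatches = {
        field: {
            "manifest": manifest.get(field),
            "certificate": certificate.get(field),
        }
        for field in PINNED_FIELDS
        # A field missing from the manifest counts as a mismatch, not a pass.
        if field not in manifest or manifest.get(field) != certificate.get(field)
    }
    if mismatches:
        raise BenchmarkManifestMismatch(json.dumps(mismatches, sort_keys=True))
    return {field: manifest[field] for field in PINNED_FIELDS}


def write_verdict(
    certificate: dict, manifest: dict, verdict: dict, out_path: Path
) -> None:
    """Write the verdict JSON only after the manifest check has passed."""
    # Enforcement runs first, so a mismatch leaves no verdict file behind.
    descriptor = enforce_benchmark_manifest(certificate, manifest)
    payload = {**verdict, "benchmark_manifest": descriptor}
    out_path.write_text(json.dumps(payload, indent=2, sort_keys=True))
```

Under these assumptions, a certificate whose target DB hash differs from the manifest raises before `out_path` is touched, while a matching certificate produces a verdict payload that carries the matched descriptor alongside the unchanged loss results.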

Verification

  • uv run ruff format src/microplex_us/pipelines/ecps_replacement_comparison.py tests/pipelines/test_ecps_replacement_comparison.py
  • uv run ruff check src/microplex_us/pipelines/ecps_replacement_comparison.py tests/pipelines/test_ecps_replacement_comparison.py
  • uv run --extra dev --extra policyengine python -m pytest -q tests/pipelines/test_ecps_replacement_comparison.py -k "benchmark_manifest or gate_contract" -> 3 passed, 20 deselected
  • uv run --extra dev --extra policyengine python -m pytest -q tests/pipelines/test_ecps_replacement_comparison.py -> 23 passed

@MaxGhenis MaxGhenis merged commit aba8903 into main Jun 6, 2026
5 checks passed
@MaxGhenis MaxGhenis deleted the codex/enforce-comparison-benchmark-manifest-20260606 branch June 6, 2026 08:07