Add progressive degradation fixture levels #127
Code Review
This pull request introduces two new deterministic degraded fixtures, mild_v1 and moderate_v1, to the coding workflow replay-validation suite, designed to isolate and combine specific contract failures related to recovery reachability and causality. The changes include comprehensive fixture data, updated benchmark documentation, and expanded test coverage for score monotonicity. Review feedback identifies that the expected_layer_scores in the new fixture metadata are incorrect and should be updated to reflect the relational layer failures. Additionally, the reviewer noted an inconsistency in the reconstructed traces, which include an extra setup_workspace event not found in the original traces.
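The score-monotonicity coverage mentioned in the summary can be pictured with a small check. This sketch is not the repository's test: the function name and the score values are hypothetical, and it assumes each fixture level yields one aggregate score that should never rise as degradation gets worse.

```python
from typing import Sequence


def is_monotonic_non_increasing(scores: Sequence[float]) -> bool:
    """Return True if no score is higher than the one before it."""
    return all(later <= earlier for earlier, later in zip(scores, scores[1:]))


# Hypothetical scores, ordered from the intact fixture to the most degraded one.
curve = [1.0, 0.9, 0.8, 0.0]
assert is_monotonic_non_increasing(curve)

# A curve where a more degraded fixture scores higher is rejected.
assert not is_monotonic_non_increasing([1.0, 0.8, 0.9, 0.0])
```

Ties are allowed here on purpose; whether the real suite requires strictly decreasing scores is not stated in the PR.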
    "expected_layer_scores": {
      "structural": 1.0,
      "relational": 1.0,
      "operational": 1.0,
      "governance": 1.0
    },
The expected_layer_scores in this fixture metadata are incorrect. Since this fixture is designed to fail the recovery_path_available contract (which belongs to the relational layer), the expected relational score should be 0.6666666666666666 (2 out of 3 relational contracts passing), not 1.0. This inconsistency makes the fixture ground truth misleading.
Suggested change:
    - "expected_layer_scores": {
    -   "structural": 1.0,
    -   "relational": 1.0,
    -   "operational": 1.0,
    -   "governance": 1.0
    - },
    + "expected_layer_scores": {
    +   "structural": 1.0,
    +   "relational": 0.6666666666666666,
    +   "operational": 1.0,
    +   "governance": 1.0
    + },
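The arithmetic behind the suggested value can be checked directly. This is a minimal sketch that assumes a layer score is the fraction of that layer's contracts that pass, as the "2 out of 3" wording in the comment implies. The function `layer_score` and the two placeholder contract names are hypothetical; only `recovery_path_available` comes from the review.

```python
def layer_score(contract_results: dict[str, bool]) -> float:
    """Fraction of contracts in one layer that passed."""
    if not contract_results:
        raise ValueError("a layer needs at least one contract")
    return sum(contract_results.values()) / len(contract_results)


# Relational layer for the mild fixture: one failing contract out of three.
relational = {
    "recovery_path_available": False,  # named in the review
    "relational_contract_b": True,     # placeholder name (assumption)
    "relational_contract_c": True,     # placeholder name (assumption)
}
print(layer_score(relational))  # 0.6666666666666666
```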
    @@ -0,0 +1,14 @@
    {
      "events": [
        {"action": "setup_workspace", "step": 0},
The reconstructed trace contains a setup_workspace event at step 0 that is not present in the original/trace.json. While this doesn't break the current ordering contracts (which check for subsequences), it introduces an inconsistency between the ground truth and the reconstructed data that should be avoided in deterministic fixtures.
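To make the distinction concrete, here is a sketch of the two checks involved: a subsequence test of the kind the comment attributes to the ordering contracts, and a stricter comparison that flags events present only in the reconstruction. Both function names and the sample traces are hypothetical; only `setup_workspace` is taken from the diff under review.

```python
from collections import Counter
from typing import Iterable, Sequence


def is_subsequence(required: Sequence[str], actions: Iterable[str]) -> bool:
    """True if `required` appears in `actions` in order, gaps allowed."""
    remaining = iter(actions)
    # `step in remaining` consumes the iterator up to the first match,
    # so later steps can only match later positions.
    return all(step in remaining for step in required)


def extra_actions(original: Sequence[str], reconstructed: Sequence[str]) -> list[str]:
    """Actions that occur more often in the reconstruction than in the original."""
    surplus = Counter(reconstructed) - Counter(original)
    return sorted(surplus.elements())


# Illustrative traces (assumption), not the fixture contents.
original = ["open_pr", "run_tests", "merge"]
reconstructed = ["setup_workspace", "open_pr", "run_tests", "merge"]

print(is_subsequence(original, reconstructed))  # True
print(extra_actions(original, reconstructed))   # ['setup_workspace']
```

The subsequence check passes even though the traces differ, which is why an extra event can go unnoticed unless a separate comparison is made.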
    "expected_layer_scores": {
      "structural": 1.0,
      "relational": 1.0,
      "operational": 1.0,
      "governance": 1.0
    },
The expected_layer_scores in this fixture metadata are incorrect. This fixture fails both recovery_path_available and security_causal_block (both in the relational layer). Therefore, the expected relational score should be 0.3333333333333333 (1 out of 3 relational contracts passing), not 1.0.
Suggested change:
    - "expected_layer_scores": {
    -   "structural": 1.0,
    -   "relational": 1.0,
    -   "operational": 1.0,
    -   "governance": 1.0
    - },
    + "expected_layer_scores": {
    +   "structural": 1.0,
    +   "relational": 0.3333333333333333,
    +   "operational": 1.0,
    +   "governance": 1.0
    + },
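A mismatch like this one could be caught mechanically. The sketch below assumes the fixture metadata lists its expected failure codes next to `expected_layer_scores`; the `expected_failure_codes` key, the `FAILURE_LAYER` mapping, and the function name are hypothetical. The two codes are the ones named in the PR description, and assigning both to the relational layer follows the review comments rather than anything confirmed in the repository.

```python
# Assumed mapping from failure code to the layer whose score it should lower.
FAILURE_LAYER = {
    "RECOVERY_PATH_INVALID": "relational",
    "CAUSAL_DEPENDENCY_LOSS": "relational",
}


def inconsistent_layers(metadata: dict) -> list[str]:
    """Layers that expect a failure but still claim a perfect score."""
    scores = metadata["expected_layer_scores"]
    failing = {FAILURE_LAYER[code] for code in metadata["expected_failure_codes"]}
    return sorted(layer for layer in failing if scores[layer] >= 1.0)


moderate = {
    "expected_failure_codes": ["RECOVERY_PATH_INVALID", "CAUSAL_DEPENDENCY_LOSS"],
    "expected_layer_scores": {
        "structural": 1.0, "relational": 1.0, "operational": 1.0, "governance": 1.0,
    },
}
print(inconsistent_layers(moderate))  # ['relational']

# After applying the suggested relational score, nothing is flagged.
moderate["expected_layer_scores"]["relational"] = 1 / 3
print(inconsistent_layers(moderate))  # []
```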
    @@ -0,0 +1,14 @@
    {
      "events": [
        {"action": "setup_workspace", "step": 0},
Motivation
Description
- Adds fixtures/coding_workflow_pr_review_mild_v1 and fixtures/coding_workflow_pr_review_moderate_v1, each containing original/, reconstructed/, expected/, and README.md. Both reuse the original artifact sources from coding_workflow_pr_review_v1/original/ and mutate only the reconstructed side.
- mild_v1: mutates test_failure → (rollback, escalate_to_human) to break recovery reachability only; preserves ordering, causality, and the no-orphan invariant; expects RECOVERY_PATH_INVALID and expected_admissible: false.
- moderate_v1: mutates security_scan_failed → deploy_blocked to break causality; preserves ordering and (where possible) the no-orphan invariant; expects RECOVERY_PATH_INVALID and CAUSAL_DEPENDENCY_LOSS, with expected_admissible: false.
- tests/test_degradation_curve_generator.py expanded to reference the two new fixtures, with added monotonicity and per-fixture failure assertions; regenerated artifacts/layered_admissibility_results.json (4-point curve) and updated docs/benchmarks/layered_admissibility.md to include the full curve and interpretation.
Testing
- pytest tests/test_degradation_curve_generator.py -q: 12 passed.
- pytest tests/test_admissibility_scorer.py -q: 10 passed.
- pytest tests/test_dependency_graph_comparator.py -q: 10 passed.
- pytest tests/test_contract_validator.py -q: 10 passed.
- pytest tests/test_fixture_contract_bundle.py -q: 1 passed.
- pytest tests/test_negative_fixture_contract_bundle.py -q: 1 passed.
- pytest -q: 177 passed total.
- npm run check: layout/typecheck/validate/build/test pipeline completed successfully.
Summary:
- New fixtures (coding_workflow_pr_review_mild_v1, coding_workflow_pr_review_moderate_v1), updated tests, regenerated artifact and docs.
Changed files:
- fixtures/coding_workflow_pr_review_mild_v1/** (new)
- fixtures/coding_workflow_pr_review_moderate_v1/** (new)
- tests/test_degradation_curve_generator.py (updated)
- artifacts/layered_admissibility_results.json (regenerated)
- docs/benchmarks/layered_admissibility.md (updated)
Risks:
Next:
Codex Task