test: harden mock contract + multi-source E2E + opt-in concurrency (#210)#211
Merged
test: harden mock contract + multi-source E2E + opt-in concurrency (#210)#211
Conversation
#207 was a real contract bug — the per-file bake never wrote release-manifest.json — that mocks didn't catch because they only asserted the happy path (textures + catalog + sentinel). The JS/shell/Rust clients hard-failed against a fresh bake; only live anvil-dev runs surfaced it. Tightening the unit-test contract so a future regression can't ship. Adds two test classes (7 new tests): TestManifestEmission (substrate-contract path set): - commit history must include {release-manifest.json, <src>.json, <src>/<tier>/.tier_complete} - manifest commit bundled atomically with catalog (single commit) - manifest commit carries `parent_commit` for the CAS lock - sentinel is strictly the LAST commit TestCasRetryOnManifestCommit (the previously-untested retry loop): - one 412 → fresh manifest re-fetch → retry → success - continuous 412 → exactly 6 attempts before propagating - non-412 (401 auth error) → propagates immediately, no retry Net: 5 → 12 tests on the bake path.
…t A) Mirror of the bake-side hardening. Locks down the derive path's manifest emission contract and the previously-untested 412-retry loop added in #209. TestDeriveManifestEmission: - commit history must include {release-manifest.json, <src>.json, <src>/<target_tier>/.tier_complete} - manifest body's sources.<src>.tiers.<target_tier> = {complete: true} (catches a regression that drops the tier-update from the merge) TestDeriveCasRetryOnManifestCommit: - one 412 → retry → success (asserts 2 attempts + 2 manifest fetches) - continuous 412 → exactly 6 attempts then propagate (locks the budget) - 401 (non-412) → propagates immediately, no retry Net: 13 → 18 tests on the derive path.
Two new E2E classes that catch contract slips mocks can't: TestMultiSourceManifestMerge (always-on under MAT_VIS_E2E=1): - Bakes polyhaven THEN ambientcg sequentially into the same release tag, asserts the manifest accumulates both sources. A regression that clobbers the second bake's manifest write would only show one source here. Lives nightly via e2e.yml from #205. TestConcurrentBakesShareTag (opt-in MAT_VIS_E2E_CONCURRENCY=N): - Spawns N=3 multiprocessing.Process workers (default) baking polyhaven/ambientcg/gpuopen at tier 1k into ONE release tag. The CAS retry loop is the only thing keeping their manifest writes from clobbering each other. - Threading.Thread is NOT used — Python's GIL serializes live HTTP and never fires the retry path; verified during the #207 fix. - Asserts every worker succeeds AND the final manifest carries every source. A "race lost" outcome would manifest as a missing source entry. Retry exhaustion would surface as a worker returning the 412 in its `error` field. Both clean up their throwaway tags on teardown. The opt-in test serves as the trigger for the Option-B "fragments + final merge" escalation path: if it ever fails repeatedly in nightly CI, that's the data-driven trigger to redesign. Net: 7 → 9 E2E test classes.
6 tasks
This was referenced Apr 28, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Lands the bulk of #210 — the test-hardening response to #207's bake-side bug that all four mocked client suites missed. Three commits:
bake-side: 7 new tests in `test_hf_bake_per_file.py`. Asserts the substrate-contract path set (manifest + catalog + sentinel), commit bundling, sentinel-last ordering, and the previously-untested CAS retry loop (412 retry, exhaustion, non-412 not retried). Net: 5 → 12 tests.
derive-side: 5 new tests in `test_hf_derive_per_file.py` mirroring the bake hardening. Net: 13 → 18 tests.
E2E: 2 new classes in `tests/e2e/test_per_file_roundtrip.py`:
Why this matters
#207 (the bug that motivated #210) shipped through every mocked test suite because they only asserted the happy path. JS/shell/Rust hard-failed against a fresh bake; Python's tree-fallback masked the gap. The CAS retry loop added in #208 was completely untested — a regression dropping `parent_commit` from the commit kwargs would have silently re-introduced the multi-source race.
What's still open (deferred to follow-up)
Test plan
Architectural decision (chat thread)
User asked about Option B (per-shard fragments + final merge job) vs current Option A (CAS retry). Decided to stay with A for v0.6.0 and use #210's opt-in concurrency test as the data-driven trigger for B if A ever fails under load. Documented in the issue comments.