Skip to content

test: harden mock contract + multi-source E2E + opt-in concurrency (#210)#211

Merged
gerchowl merged 3 commits intodevfrom
test/210-hardening
Apr 28, 2026
Merged

test: harden mock contract + multi-source E2E + opt-in concurrency (#210)#211
gerchowl merged 3 commits intodevfrom
test/210-hardening

Conversation

@gerchowl
Copy link
Copy Markdown
Contributor

Summary

Lands the bulk of #210 — the test-hardening response to #207's bake-side bug that all four mocked client suites missed. Three commits:

  1. bake-side: 7 new tests in `test_hf_bake_per_file.py`. Asserts the substrate-contract path set (manifest + catalog + sentinel), commit bundling, sentinel-last ordering, and the previously-untested CAS retry loop (412 retry, exhaustion, non-412 not retried). Net: 5 → 12 tests.

  2. derive-side: 5 new tests in `test_hf_derive_per_file.py` mirroring the bake hardening. Net: 13 → 18 tests.

  3. E2E: 2 new classes in `tests/e2e/test_per_file_roundtrip.py`:

    • `TestMultiSourceManifestMerge` (always-on under `MAT_VIS_E2E=1`) — sequential polyhaven + ambientcg bakes into one tag, asserts manifest accumulates both sources. Catches future regressions to the merge logic during nightly E2E.
    • `TestConcurrentBakesShareTag` (opt-in `MAT_VIS_E2E_CONCURRENCY=N`) — N `multiprocessing.Process` workers (default N=3) baking different sources at tier 1k into one release tag. The CAS retry is the only thing keeping their manifest writes from clobbering. Threading is explicitly NOT used (GIL serializes live HTTP, never fires the retry path).

Why this matters

#207 (the bug that motivated #210) shipped through every mocked test suite because they only asserted the happy path. JS/shell/Rust hard-failed against a fresh bake; Python's tree-fallback masked the gap. The CAS retry loop added in #208 was completely untested — a regression dropping `parent_commit` from the commit kwargs would have silently re-introduced the multi-source race.

What's still open (deferred to follow-up)

  • Workflow matrix expansion in `e2e.yml`: fan out JS/shell/Rust client jobs after the Python bake. Requires orchestration redesign (the Python E2E auto-cleans its tag via the fixture, so a multi-job workflow needs a non-cleaning bake step + a separate teardown job). Filing as a small follow-up — it's pure YAML/dagger plumbing, not test logic.

Test plan

  • `just test-fast` — 339 + 211 = 550 pass, 16+22 skipped
  • New tests use `expect_raises` helper to verify exception propagation cleanly
  • CAS retry tests assert exact retry budget (6 attempts) — regression that bumps the budget will fail visibly
  • CI: matrix tests + integration test
  • Optional manual: `MAT_VIS_E2E=1 MAT_VIS_E2E_CONCURRENCY=3 uv run pytest tests/e2e/test_per_file_roundtrip.py::TestConcurrentBakesShareTag -v` against a personal HF token

Architectural decision (chat thread)

User asked about Option B (per-shard fragments + final merge job) vs current Option A (CAS retry). Decided to stay with A for v0.6.0 and use #210's opt-in concurrency test as the data-driven trigger for B if A ever fails under load. Documented in the issue comments.

#207 was a real contract bug — the per-file bake never wrote
release-manifest.json — that mocks didn't catch because they only
asserted the happy path (textures + catalog + sentinel). The
JS/shell/Rust clients hard-failed against a fresh bake; only live
anvil-dev runs surfaced it.

Tightening the unit-test contract so a future regression can't ship.

Adds two test classes (7 new tests):

TestManifestEmission (substrate-contract path set):
- commit history must include {release-manifest.json, <src>.json,
  <src>/<tier>/.tier_complete}
- manifest commit bundled atomically with catalog (single commit)
- manifest commit carries `parent_commit` for the CAS lock
- sentinel is strictly the LAST commit

TestCasRetryOnManifestCommit (the previously-untested retry loop):
- one 412 → fresh manifest re-fetch → retry → success
- continuous 412 → exactly 6 attempts before propagating
- non-412 (401 auth error) → propagates immediately, no retry

Net: 5 → 12 tests on the bake path.
…t A)

Mirror of the bake-side hardening. Locks down the derive path's
manifest emission contract and the previously-untested 412-retry
loop added in #209.

TestDeriveManifestEmission:
- commit history must include {release-manifest.json, <src>.json,
  <src>/<target_tier>/.tier_complete}
- manifest body's sources.<src>.tiers.<target_tier> = {complete: true}
  (catches a regression that drops the tier-update from the merge)

TestDeriveCasRetryOnManifestCommit:
- one 412 → retry → success (asserts 2 attempts + 2 manifest fetches)
- continuous 412 → exactly 6 attempts then propagate (locks the budget)
- 401 (non-412) → propagates immediately, no retry

Net: 13 → 18 tests on the derive path.
Two new E2E classes that catch contract slips mocks can't:

TestMultiSourceManifestMerge (always-on under MAT_VIS_E2E=1):
- Bakes polyhaven THEN ambientcg sequentially into the same release
  tag, asserts the manifest accumulates both sources. A regression
  that clobbers the second bake's manifest write would only show one
  source here. Lives nightly via e2e.yml from #205.

TestConcurrentBakesShareTag (opt-in MAT_VIS_E2E_CONCURRENCY=N):
- Spawns N=3 multiprocessing.Process workers (default) baking
  polyhaven/ambientcg/gpuopen at tier 1k into ONE release tag. The
  CAS retry loop is the only thing keeping their manifest writes
  from clobbering each other.
- Threading.Thread is NOT used — Python's GIL serializes live HTTP
  and never fires the retry path; verified during the #207 fix.
- Asserts every worker succeeds AND the final manifest carries
  every source. A "race lost" outcome would manifest as a missing
  source entry. Retry exhaustion would surface as a worker returning
  the 412 in its `error` field.

Both clean up their throwaway tags on teardown. The opt-in test
serves as the trigger for the Option-B "fragments + final merge"
escalation path: if it ever fails repeatedly in nightly CI, that's
the data-driven trigger to redesign.

Net: 7 → 9 E2E test classes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:testing Test infrastructure, BATS, pytest

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant