feat: ADR-0011 commit + Dagger safety rail + round-trip test + vocab map (#178)
Merged
Captures the two-layer index-record design accepted in /spike 152 after four-lens adversarial review: data-modeling, consumer-DX, ops-burden, storage/cost.
Decision (clean break at v0.6.0, no back-compat aliases):
- Layer 1 `mat_vis`: curated, unified, semver-stable. Only surface exposed through client search() / index() / filter(). Missing fields are null, never absent.
- Layer 2 `upstream.raw`: verbatim upstream payload + per-source license. Escape hatch for bespoke consumers; schema drift on this side is acceptable.
Partly reshapes ADR-0001's record surface; leaves ADR-0007/8/9/10 (substrate + pipeline) unchanged.
The code behind this ADR already landed via:
- PR #166 (Phase A: mat_vis block + v3 index schema)
- PR #170 (Phases B+C: populate mat_vis curated fields + upstream.raw mirror + CI schema-diff gate)
This commit closes the loop by committing the decision doc itself so future readers can see the locked design alongside the code.
Refs issue #152, v0.6.0 milestone.
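A minimal sketch of the two-layer record shape, assuming the Layer 1 key set named in the round-trip test description and the `{source, schema_version, raw}` upstream block. The helper name `make_record`, its signature, and the default `schema_version` are hypothetical and are not taken from the baker's code; the per-source license is left out because its location in the record is not stated here.

```python
from typing import Any, Optional

# Layer 1 keys as listed for the mat_vis block; the flat nesting is an assumption.
MAT_VIS_KEYS = (
    "name", "category", "tags", "description", "physical",
    "pbr", "attribution", "dates", "upstream_id",
)


def make_record(
    curated: dict[str, Any],
    source: str,
    raw: Optional[dict[str, Any]] = None,
    schema_version: int = 1,  # hypothetical default
) -> dict[str, Any]:
    """Hypothetical helper: build one two-layer index record."""
    unknown = set(curated) - set(MAT_VIS_KEYS)
    if unknown:
        raise ValueError(f"unknown mat_vis keys: {sorted(unknown)}")
    # Missing curated fields become None (JSON null); the key is never dropped.
    record: dict[str, Any] = {
        "mat_vis": {key: curated.get(key) for key in MAT_VIS_KEYS}
    }
    if raw is not None:
        # Layer 2: the upstream payload is mirrored verbatim, not normalized.
        record["upstream"] = {
            "source": source,
            "schema_version": schema_version,
            "raw": raw,
        }
    return record
```

Filling every key up front is what lets consumers index `record["mat_vis"]["category"]` without guarding against a `KeyError`.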
Every Dagger write fn (bake, derive, derive-ktx2, merge-shards) now:
- Defaults `--repo-id` to `gerchowl/mat-vis-tst` (the public scratch dataset). Smoke tests, feature-branch trials, and new-contributor dispatches all land there by default, never on the canonical `gerchowl/mat-vis`.
- Gains `--allow-prod` (default false). Calls targeting any non-*-tst repo without this flag raise a clear ValueError before any HF request is made.
Added `_guard_prod_target(repo_id, allow_prod)` as the single check point; this matches the pattern of the existing shard/unshard guards. The error message explicitly names the default and the escape hatch so operators who actually want prod don't have to spelunk.
Rationale: the whole point of Dagger parity was "same module runs everywhere", which means production writes are now one flag away from any laptop, anvil-dev shell, or CI run. The switch from "default prod / opt into scratch" to "default scratch / opt into prod" flips the blast radius of a mistake from "overwrote the release catalog" to "wrote to the same scratch we already test on".
Refs ADR-0010, ADR-0011.
New docs/development/running-on-anvil-dev.md walks through the "daily-driver baker" path:
- One-time bootstrap (install Dagger to ~/.local/bin, which avoids the multi-user nix daemon wedge on this NixOS VM; enable rootless podman.socket for Dagger to talk to)
- Smoke bake against gerchowl/mat-vis-tst (default target, no --allow-prod needed)
- Production bake against gerchowl/mat-vis (requires --allow-prod per the ADR-0010 safety rail)
- Full-matrix loop with tailnet-OTLP monitoring
The whole point of Dagger parity is that this same `dagger call bake …` works identically on a laptop, anvil-dev, and GH Actions. Anvil-dev is just the daily driver because (a) no 6 h runner cap, (b) no 20-slot org concurrency budget, (c) same tailnet as the OTel collector. GH workflow stays in the repo as the reproducible / public-contributor path.
Refs ADR-0010, ADR-0011.
Validates the two-layer index record contract survives the full
baker → atomic HF commit → catalog JSON path. Asserts:
- Every record carries a mat_vis block with the full ADR-0011 key
set (name, category, tags, description, physical, pbr, attribution,
dates, upstream_id).
- Curated fields are populated (name, SPDX license, upstream_id).
- Category normalization does not collapse every record to 'other'.
- Key set is identical across sources (ADR-0011 §"missing values are
null, never absent").
- When present, the upstream block has {source, schema_version, raw}.
Reads catalogs from gerchowl/mat-vis-tst@v0.0.2-smoke-adr0011 via
raw HTTP — the current MatVisClient hardcodes gerchowl/mat-vis, so
client-level round-trip assertions wait on a separate repo= override
parameter (out of scope for this PR).
HF_INTEGRATION-gated. Default-skipped in the regular pytest run.
Refs ADR-0011, #152.
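The assertions listed above can be sketched as a pure function over already-downloaded catalog records. This is an illustration only: `contract_violations` and its input shape (records grouped by source name) are hypothetical, the SPDX license check is omitted because the field's location inside the block is not stated here, and the real test module may be organized differently.

```python
from typing import Any

EXPECTED_KEYS = frozenset({
    "name", "category", "tags", "description", "physical",
    "pbr", "attribution", "dates", "upstream_id",
})
UPSTREAM_KEYS = frozenset({"source", "schema_version", "raw"})


def contract_violations(
    records_by_source: dict[str, list[dict[str, Any]]]
) -> list[str]:
    """Return human-readable contract violations; an empty list means pass."""
    problems: list[str] = []
    for source, records in records_by_source.items():
        categories = set()
        for i, record in enumerate(records):
            block = record.get("mat_vis")
            if not isinstance(block, dict):
                problems.append(f"{source}[{i}]: missing mat_vis block")
                continue
            # Identical key set everywhere: null values allowed, absent keys not.
            if set(block) != EXPECTED_KEYS:
                problems.append(f"{source}[{i}]: key set differs from ADR-0011")
            for field in ("name", "upstream_id"):
                if not block.get(field):
                    problems.append(f"{source}[{i}]: {field} not populated")
            categories.add(block.get("category"))
            upstream = record.get("upstream")
            if upstream is not None and not UPSTREAM_KEYS <= set(upstream):
                problems.append(f"{source}[{i}]: incomplete upstream block")
        if categories == {"other"}:
            problems.append(f"{source}: every record normalized to 'other'")
    return problems
```

Returning a list instead of asserting on the first failure makes it easier to see all broken records from one run.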
Captures the full category + top-100 tag vocabulary for all four upstream sources as observed on 2026-04-21:
- docs/sources/metadata-vocabulary.md: counts, top-15 categories per source, which tokens intentionally fall through to 'other'.
- docs/sources/metadata-vocabulary.json: machine-readable sidecar for diff'able regression detection on upstream drift.
- scripts/probe-metadata-vocab.py: re-probes + rewrites the JSON.
Source counts:
  ambientcg        1993  (95 categories, displayCategory)
  polyhaven         754  (54 categories, list)
  gpuopen           454  (22 categories, UUID → title)
  physicallybased    86  ( 7 categories)
Purpose: this is the input for the normalize_category mapping work (mat-vis#150, #151, future tickets). Every PR that expands the keyword map should re-run the probe so drift between "what upstream serves" and "what the baker knows about" is one git diff away.
README gets a new "Upstream metadata vocabulary" section.
Refs ADR-0011, #152, #178, #150, #151.
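To illustrate what "one git diff away" means in practice, here is a small sketch that compares two vocabulary snapshots. The JSON layout is an assumption (`{source: {"categories": {name: count}}}`); the actual schema of metadata-vocabulary.json is not shown in this PR, and `category_drift` is a hypothetical helper.

```python
from typing import Any


def category_drift(
    old: dict[str, Any], new: dict[str, Any]
) -> dict[str, dict[str, list[str]]]:
    """Report categories added or removed per source between two snapshots."""
    drift: dict[str, dict[str, list[str]]] = {}
    for source in sorted(set(old) | set(new)):
        # Assumed layout: each source maps to {"categories": {name: count}}.
        before = set(old.get(source, {}).get("categories", {}))
        after = set(new.get(source, {}).get("categories", {}))
        added = sorted(after - before)
        removed = sorted(before - after)
        if added or removed:
            drift[source] = {"added": added, "removed": removed}
    return drift
```

An empty result means the upstream category vocabulary is unchanged since the committed snapshot; any other result points at tokens the keyword map may need to learn.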
The earlier v0.0.2 slice baked 3 materials per source. For gpuopen the first 3 all turned out to be Wallpapers, which intentionally fall through normalize_category to 'other' (see ADR-0011 doc + the existing comment block above normalize_category). That tripped test_category_not_universally_other on a sampling artifact, not a normalizer bug.
v0.0.3 bakes 30 materials per source on anvil-dev via Dagger, giving the normalizer enough variety to show real categories per source. All 5 round-trip tests now pass cleanly against live HF data.
No code change beyond the TAG constant + a longer rationale comment. Scripts to reproduce sit in docs/development/running-on-anvil-dev.md.
Four polish items from the /land review round, all non-blockers:
1. _guard_prod_target: namespace-scoped match (.dagger/.../main.py).
Previous `endswith("-tst")` admitted "anyone/whatever-tst".
Now checks the repo name (after the last /) matches mat-vis-tst
exactly, or mat-vis-<suffix>-tst. Error message updated to name
the actual convention.
2. Round-trip test fixture: pytest.skip when the scratch tag 404s or
the tree is empty (tests/test_mat_vis_block_roundtrip.py).
Contributors flipping HF_INTEGRATION=1 without first running a
smoke bake now get a clear "re-bake the smoke slice" message
pointing at the anvil-dev doc, instead of a bare AssertionError.
3. Vocab doc command: the probe script writes the JSON itself;
don't redirect stdout (would truncate to empty)
(docs/sources/metadata-vocabulary.md).
4. anvil-dev doc: flake-provided `nix develop` is now the primary
dagger install path. curl|sh falls back when the multi-user nix
daemon is wedged, with the diagnostic error explicitly named so
operators know which branch to take
(docs/development/running-on-anvil-dev.md).
Tests unchanged in number (310 + 5) — the skip path only triggers
when the fixture data is missing, which isn't the case now.
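Polish item 1 can be sketched as a name predicate. The regular expression and the helper name `is_scratch_repo` are assumptions, as is the set of characters allowed in `<suffix>`; following the description above, only the part after the last `/` is inspected, so the owner is not checked in this sketch.

```python
import re

# mat-vis-tst exactly, or mat-vis-<suffix>-tst (suffix character set is assumed).
_SCRATCH_NAME = re.compile(r"mat-vis(?:-[A-Za-z0-9._-]+)?-tst")


def is_scratch_repo(repo_id: str) -> bool:
    """True when the repo name (after the last '/') follows the scratch convention."""
    name = repo_id.rsplit("/", 1)[-1]
    return _SCRATCH_NAME.fullmatch(name) is not None
```

With this predicate, `anyone/whatever-tst` no longer counts as scratch, which is the hole the plain suffix test left open.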
Dev hosts like 'anvil'/'anvil-dev' are private infra no contributor has access to. Renaming / generalizing the committed docs so references point at ``<your-remote-host>`` placeholders instead.
- docs/development/running-on-anvil-dev.md → running-bakes-on-a-remote-host.md, with the body rewritten to describe "any Linux host with podman + a shell account", nix-develop as the primary install path, curl|sh as the fallback.
- docs/observability/README.md + docker-compose.yml: replace the three remaining ``anvil`` references with generic placeholders.
No functional change. The anvil-specific runbook lived in a single earlier commit on this branch (not merged to dev yet); this commit cleans it up before the PR squash lands anything public.
See PR #178 body for full rationale. 320 tests pass (+10 for new cases).
/land on "a full working bake process" — the locked design per ADRs 0007–0011.
What
8 commits, each independently reviewable:
- `--allow-prod` safety rail: all 4 write fns (bake/derive/derive-ktx2/merge-shards) default to `gerchowl/mat-vis-tst`; any non-`*-tst` target needs explicit opt-in. Prevents feature-branch runs from landing in the public catalog.
How it was verified
Review round outcome (/land step 6)
Fresh-agent review: no blockers, 4 polish items — all folded into commit 7.
Follow-up candidates worth filing separately (not pulled into scope)
Test plan
Closes #152 (via ADR doc commit). Refs ADRs 0007/0008/0009/0010/0011.