
feat: ADR-0011 commit + Dagger safety rail + round-trip test + vocab map (#178) #178

Merged
gerchowl merged 9 commits into dev from feature/178-bake-land
Apr 21, 2026


@gerchowl
Contributor

/land on "a full working bake process" — the locked design per ADRs 0007–0011.

What

8 commits, each independently reviewable:

  1. ADR-0011 doc committed from the accepted-design body of issue #152 ("ADR-0011: mirror upstream metadata verbatim alongside normalized fields (hybrid index)"). Code already landed via #166 (Phase A: mat_vis block + v3 index schema) and #170 (Phase C: upstream.raw + client accessor + CI schema-diff gate).
  2. Dagger --allow-prod safety rail — all 4 write fns (bake/derive/derive-ktx2/merge-shards) default to gerchowl/mat-vis-tst; any non-`*-tst` target needs explicit opt-in. Prevents feature-branch runs from landing in the public catalog.
  3. Remote-host run guide at `docs/development/running-bakes-on-a-remote-host.md` — nix-develop primary, curl fallback, placeholder `<your-remote-host>` throughout. No private infra leaked.
  4. Round-trip integration test — 5 assertions on baker → HF → catalog shape, HF_INTEGRATION-gated. Skips gracefully when the fixture tag is missing.
  5. Upstream metadata vocabulary — `docs/sources/metadata-vocabulary.{md,json}` + `scripts/probe-metadata-vocab.py` captures every category + top-100 tag per source (ambientcg 1993/95, polyhaven 754/54, gpuopen 454/22, physicallybased 86/7).
  6. Test TAG bump to v0.0.3-smoke-adr0011 (30 materials/source, enough variety for the category-diversity assertion).
  7. Review-round polish — tighter `_guard_prod_target` match, test pytest.skip on missing fixture, vocab doc redirect fix, nix-first / curl-fallback order.
  8. Host-specific name scrub — generalized all remaining references to private dev infrastructure.

How it was verified

  • 310 / 310 existing unit tests pass locally.
  • 5 / 5 round-trip tests pass against live HF data — produced by running `dagger call bake` over podman on a dev host (proving ADR-0010 parity end-to-end):
    • `physicallybased scalar` — 86 materials
    • `polyhaven 1k` — 30 materials
    • `ambientcg 1k` — 30 materials
    • `gpuopen 1k` — 30 materials
    • All committed atomically to `gerchowl/mat-vis-tst@v0.0.3-smoke-adr0011`
  • Every catalog JSON on that tag now carries `mat_vis` + `upstream` blocks with all ADR-0011-required keys populated.

Review round outcome (/land step 6)

Fresh-agent review: no blockers, 4 polish items — all folded into commit 7.

Follow-up candidates worth filing separately (not pulled into scope)

  • Expand `normalize_category` to cover observed tokens still mapping to 'other' (e.g. `Planks` → wood, 59 ambientcg records affected).
  • Client `MatVisClient` currently hardcodes `gerchowl/mat-vis` — add `repo=` override so round-trip tests can use the client instead of raw HTTP.
  • Scheduled drift detector: CI cron that re-runs `probe-metadata-vocab.py` and opens an issue if the JSON diffs.

Test plan

  • `uv run pytest tests/ -q` — 310 passed locally (5 HF-gated skipped).
  • `HF_INTEGRATION=1 uv run pytest tests/test_mat_vis_block_roundtrip.py` — 5/5 pass against live HF.
  • End-to-end `dagger call bake` on an external Linux host via rootless podman succeeded for all 4 sources.
  • Full v2026.04.1 matrix is unblocked: operators can now run `dagger call bake --context=. --source=... --tier=... --release-tag=v2026.04.1 --hf-token=... --repo-id=gerchowl/mat-vis --allow-prod=true` (follow-up dispatch).

Closes #152 (via ADR doc commit). Refs ADRs 0007/0008/0009/0010/0011.

Captures the two-layer index-record design accepted in /spike 152
after four-lens adversarial review: data-modeling, consumer-DX,
ops-burden, storage/cost.

Decision (clean break at v0.6.0, no back-compat aliases):

- Layer 1 `mat_vis`: curated, unified, semver-stable. Only surface
  exposed through client search() / index() / filter(). Missing
  fields are null, never absent.
- Layer 2 `upstream.raw`: verbatim upstream payload + per-source
  license. Escape hatch for bespoke consumers; schema drift on this
  side is acceptable.
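
As an illustration of the two layers, here is a minimal sketch of how such a record could be assembled. `build_record` is a hypothetical helper, not the baker's actual API; the key names come from the round-trip test description in this PR, and where exactly the per-source license lives is not modeled here.

```python
from typing import Any

# Curated key set as listed in the round-trip test description.
MAT_VIS_KEYS = (
    "name", "category", "tags", "description", "physical",
    "pbr", "attribution", "dates", "upstream_id",
)


def build_record(
    source: str,
    curated: dict[str, Any],
    raw: dict[str, Any],
    schema_version: int,
) -> dict[str, Any]:
    """Assemble a two-layer index record (hypothetical helper)."""
    unknown = set(curated) - set(MAT_VIS_KEYS)
    if unknown:
        raise KeyError(f"not part of the curated layer: {sorted(unknown)}")
    # Layer 1: every key is always present; missing values become None.
    mat_vis = {key: curated.get(key) for key in MAT_VIS_KEYS}
    # Layer 2: the upstream payload is passed through untouched.
    upstream = {"source": source, "schema_version": schema_version, "raw": raw}
    return {"mat_vis": mat_vis, "upstream": upstream}
```

Filling absent keys with `None` at build time is what lets consumers index `record["mat_vis"]["category"]` without a `KeyError`, regardless of the source.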

Partly reshapes ADR-0001's record surface; leaves ADR-0007/8/9/10
(substrate + pipeline) unchanged.

The code behind this ADR already landed via:
- PR #166 (Phase A — mat_vis block + v3 index schema)
- PR #170 (Phases B+C — populate mat_vis curated fields +
  upstream.raw mirror + CI schema-diff gate)

This commit closes the loop by committing the decision doc itself
so future readers can see the locked design alongside the code.

Refs issue #152, v0.6.0 milestone.
Every Dagger write fn (bake, derive, derive-ktx2, merge-shards) now:

- Defaults `--repo-id` to `gerchowl/mat-vis-tst` (the public scratch
  dataset). Smoke-tests, feature-branch trials, and new-contributor
  dispatches all land there by default — never on the canonical
  `gerchowl/mat-vis`.
- Gains `--allow-prod` (default false). Calls targeting any non-*-tst
  repo without this flag raise a clear ValueError before any HF
  request is made.

Added `_guard_prod_target(repo_id, allow_prod)` as the single check
point — matches the pattern of the existing shard/unshard guards.
Error message explicitly names the default and the escape hatch so
operators who actually want prod don't have to spelunk.

Rationale: the whole point of Dagger parity was "same module
runs everywhere" — which means production writes are now one flag
away from any laptop, anvil-dev shell, or CI run. The switch from
"default prod / opt into scratch" to "default scratch / opt into
prod" flips the blast radius of a mistake from "overwrote the
release catalog" to "wrote to the same scratch we already test on".

Refs ADR-0010, ADR-0011.
New docs/development/running-on-anvil-dev.md walks through the
"daily-driver baker" path:

- One-time bootstrap (install Dagger to ~/.local/bin — avoids the
  multi-user nix daemon wedge on this NixOS VM; enable rootless
  podman.socket for Dagger to talk to)
- Smoke bake against gerchowl/mat-vis-tst (default target — no
  --allow-prod needed)
- Production bake against gerchowl/mat-vis (requires --allow-prod
  per the ADR-0010 safety rail)
- Full-matrix loop with tailnet-OTLP monitoring

The whole point of Dagger parity is that this same `dagger call bake …`
works identically on a laptop, anvil-dev, and GH Actions. Anvil-dev
is just the daily driver because (a) no 6 h runner cap, (b) no
20-slot org concurrency budget, (c) same tailnet as the OTel
collector. GH workflow stays in the repo as the reproducible /
public-contributor path.

Refs ADR-0010, ADR-0011.
Validates the two-layer index record contract survives the full
baker → atomic HF commit → catalog JSON path. Asserts:

- Every record carries a mat_vis block with the full ADR-0011 key
  set (name, category, tags, description, physical, pbr, attribution,
  dates, upstream_id).
- Curated fields are populated (name, SPDX license, upstream_id).
- Category normalization does not collapse every record to 'other'.
- Key set is identical across sources (ADR-0011 §"missing values are
  null, never absent").
- When present, the upstream block has {source, schema_version, raw}.
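
The structural assertions above can be sketched as a single checker that returns a list of problems. This is an illustrative reduction, not the contents of `tests/test_mat_vis_block_roundtrip.py`; `contract_problems` and the shape of its input (a mapping of source name to list of records) are assumptions.

```python
REQUIRED_KEYS = frozenset({
    "name", "category", "tags", "description", "physical",
    "pbr", "attribution", "dates", "upstream_id",
})
UPSTREAM_KEYS = frozenset({"source", "schema_version", "raw"})


def contract_problems(catalogs: dict[str, list[dict]]) -> list[str]:
    """Return human-readable contract violations; an empty list means all records conform."""
    problems = []
    for source, records in catalogs.items():
        for index, record in enumerate(records):
            where = f"{source}[{index}]"
            # Keys must be present even when their value is null.
            missing = REQUIRED_KEYS - set(record.get("mat_vis") or {})
            if missing:
                problems.append(f"{where}: mat_vis lacks {sorted(missing)}")
            # The upstream block is optional, but must be complete when present.
            upstream = record.get("upstream")
            if upstream is not None and not UPSTREAM_KEYS <= set(upstream):
                problems.append(f"{where}: upstream block is incomplete")
    return problems
```

Because every source is checked against the same `REQUIRED_KEYS`, an empty result also implies the key set is identical across sources.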

Reads catalogs from gerchowl/mat-vis-tst@v0.0.2-smoke-adr0011 via
raw HTTP — the current MatVisClient hardcodes gerchowl/mat-vis, so
client-level round-trip assertions wait on a separate repo= override
parameter (out of scope for this PR).

HF_INTEGRATION-gated. Default-skipped in the regular pytest run.

Refs ADR-0011, #152.
Captures the full category + top-100 tag vocabulary for all four
upstream sources as observed on 2026-04-21:

- docs/sources/metadata-vocabulary.md — counts, top-15 categories
  per source, which tokens intentionally fall through to 'other'.
- docs/sources/metadata-vocabulary.json — machine-readable sidecar
  for diff'able regression detection on upstream drift.
- scripts/probe-metadata-vocab.py — re-probes + rewrites the JSON.

Source counts:
  ambientcg       1993  (95 categories, displayCategory)
  polyhaven        754  (54 categories, list)
  gpuopen          454  (22 categories, UUID → title)
  physicallybased   86  ( 7 categories)

Purpose: this is the input for the normalize_category mapping work
(mat-vis#150, #151, future tickets). Every PR that expands the
keyword map should re-run the probe so drift between "what upstream
serves" and "what the baker knows about" is one git diff away.
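
To make the "one git diff away" idea concrete, here is a hedged sketch of a drift check over two probe snapshots. The JSON layout assumed here (`{source: {"categories": [...]}}`) is a guess for illustration; the real `metadata-vocabulary.json` may be shaped differently, and `vocab_drift` is a hypothetical name.

```python
def vocab_drift(old: dict, new: dict) -> dict:
    """Per-source categories added or removed between two probe snapshots."""
    drift = {}
    for source in sorted(set(old) | set(new)):
        before = set(old.get(source, {}).get("categories", []))
        after = set(new.get(source, {}).get("categories", []))
        if before != after:
            drift[source] = {
                "added": sorted(after - before),
                "removed": sorted(before - after),
            }
    return drift
```

Fed with the committed JSON and a fresh probe result (both loaded via `json.load`), a non-empty return value would be the signal to revisit the keyword map.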

README gets a new "Upstream metadata vocabulary" section.

Refs ADR-0011, #152, #178, #150, #151.
The earlier v0.0.2 slice baked 3 materials per source. For gpuopen
the first 3 all turned out to be Wallpapers, which intentionally
fall through normalize_category to 'other' (see ADR-0011 doc + the
existing comment block above normalize_category). That tripped
test_category_not_universally_other on a sampling artifact, not a
normalizer bug.

v0.0.3 bakes 30 materials per source on anvil-dev via Dagger, giving
the normalizer enough variety to show real categories per source.
All 5 round-trip tests now pass cleanly against live HF data.
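
The sampling artifact is easy to reproduce in miniature. The sketch below is an assumed reduction of what `test_category_not_universally_other` checks, not its actual body, and the sample records are made up for illustration.

```python
def not_universally_other(records: list[dict]) -> bool:
    """True if at least one record normalizes to something other than 'other'."""
    categories = {record["mat_vis"]["category"] for record in records}
    return bool(categories - {"other"})


# Three wallpapers in a row: the check fails although the normalizer is fine.
tiny_slice = [{"mat_vis": {"category": "other"}} for _ in range(3)]
# A larger slice is likely to contain at least one mapped category.
larger_slice = tiny_slice + [{"mat_vis": {"category": "wood"}}]
```

With `tiny_slice` the function returns `False`; adding a single mapped record, as in `larger_slice`, flips it to `True`.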

No code change beyond the TAG constant + a longer rationale comment.
Scripts to reproduce sit in docs/development/running-on-anvil-dev.md.
Four polish items from the /land review round, all non-blockers:

1. _guard_prod_target: namespace-scoped match (.dagger/.../main.py).
   Previous `endswith("-tst")` admitted "anyone/whatever-tst".
   Now checks the repo name (after the last /) matches mat-vis-tst
   exactly, or mat-vis-<suffix>-tst. Error message updated to name
   the actual convention.

2. Round-trip test fixture: pytest.skip when the scratch tag 404s or
   the tree is empty (tests/test_mat_vis_block_roundtrip.py).
   Contributors flipping HF_INTEGRATION=1 without first running a
   smoke bake now get a clear "re-bake the smoke slice" message
   pointing at the anvil-dev doc, instead of a bare AssertionError.

3. Vocab doc command: the probe script writes the JSON itself;
   don't redirect stdout (would truncate to empty)
   (docs/sources/metadata-vocabulary.md).

4. anvil-dev doc: flake-provided `nix develop` is now the primary
   dagger install path. curl|sh falls back when the multi-user nix
   daemon is wedged, with the diagnostic error explicitly named so
   operators know which branch to take
   (docs/development/running-on-anvil-dev.md).
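
The tightened match from item 1 could be sketched roughly as follows. The regular expression and the helper name `is_scratch_repo` are assumptions; only the rule itself (name after the last `/` is `mat-vis-tst` or `mat-vis-<suffix>-tst`) comes from the description above, so the owner part is deliberately not checked here.

```python
import re

# mat-vis-tst exactly, or mat-vis-<suffix>-tst with a non-empty suffix.
_SCRATCH_NAME = re.compile(r"mat-vis(-[A-Za-z0-9._-]+)?-tst")


def is_scratch_repo(repo_id: str) -> bool:
    """Check only the repo name, i.e. the part after the last '/'."""
    name = repo_id.rsplit("/", 1)[-1]
    return _SCRATCH_NAME.fullmatch(name) is not None
```

Using `fullmatch` rather than `endswith` is what rejects names such as `whatever-tst` that merely share the suffix.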

Tests unchanged in number (310 + 5) — the skip path only triggers
when the fixture data is missing, which isn't the case now.
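
For reference, the skip decision from item 2 can be sketched as a small pure function whose result a fixture would pass to `pytest.skip`. The function name, its parameters, and the message wording are hypothetical; the real fixture may structure this differently.

```python
from typing import Optional, Sequence


def fixture_skip_reason(status_code: int, tree: Sequence[str], tag: str) -> Optional[str]:
    """Return a skip message when the scratch fixture is unusable, else None."""
    if status_code == 404:
        return f"scratch tag {tag} not found; re-bake the smoke slice first"
    if status_code == 200 and not tree:
        return f"scratch tag {tag} has an empty tree; re-bake the smoke slice first"
    return None
```

Inside a fixture this would read `reason = fixture_skip_reason(...)` followed by `if reason: pytest.skip(reason)`, which turns a missing fixture into a skip with guidance instead of a bare `AssertionError`.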
Dev hosts like 'anvil'/'anvil-dev' are private infra no contributor
has access to. Renaming / generalizing the committed docs so
references point at ``<your-remote-host>`` placeholders instead.

- docs/development/running-on-anvil-dev.md →
  running-bakes-on-a-remote-host.md, with the body rewritten to
  describe "any Linux host with podman + a shell account", nix-develop
  as the primary install path, curl|sh as the fallback.
- docs/observability/README.md + docker-compose.yml: replace the
  three remaining ``anvil`` references with generic placeholders.

No functional change. The anvil-specific runbook lived in a single
earlier commit on this branch (not merged to dev yet); this commit
cleans it up before the PR squash lands anything public.
@github-actions (bot) added labels on Apr 21, 2026:
- area:docs (Documentation, README, guides)
- area:testing (Test infrastructure, BATS, pytest)
- area:workspace (Workspace tooling, justfile, templates)
- area:baker (Baker pipeline, Dagger, data fetchers)
See PR #178 body for full rationale. 320 tests pass (+10 for new cases).