test(recipe): enforce per-intent validation phase floor on overlay leaves#989
Conversation
Walk each leaf overlay's spec.base chain, resolve the merged ValidationConfig per pkg/recipe/metadata.go Merge semantics, classify by intent/service/platform, and assert the resolved validation contains the required phases. Per-intent floor: - Training (non-Kind) : deployment + conformance - Inference Dynamo/NIM (non-Kind) : deployment + conformance - Inference (plain) : deployment + conformance - Kind (any intent) : deployment + conformance - Performance is warn-only globally until #969 closes the data gap and Azure/OCI testbeds exist; AICR_VALIDATION_FLOOR_STRICT=1 promotes it to required. A knownGaps allowlist tracks the 7 overlays that fail today so the contract can land independently of #969. New overlays added without the floor will fail because they are not allowlisted. Criteria-wildcard overlays (intent: any or service: any) are excluded — they are cross-cutting fragments, not user-selectable leaves. Refs #970 Refs #969
This comment was marked as resolved.
This comment was marked as resolved.
Coverage Report ✅
Coverage BadgeMerging this branch will decrease overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. |
Address review feedback on #989: - Track which knownGaps entries actually downgrade a failure during the run. Fail the test if any entry never fires — that means the overlay has been fixed but the allowlist entry was not removed, which would silently mask future regressions. - Hard-fail when discoverLeafOverlays returns zero leaves. Without this guard, a regression in discovery (e.g., refactor breaks leaf detection) would let the gate pass trivially. Verified by injecting a bogus knownGaps entry — test fails with the expected 'stale knownGaps entries' message and lists the entry to remove. Refs #970
yuanchen8911
left a comment
There was a problem hiding this comment.
Cross-review by Claude Code + Codex + CodeRabbit. Three reviewers plus an integration-analysis pass converged on the same set of findings; cross-review verified each one against the PR head independently. Inline comments below cover the test-file findings. The headline issue is a production bug exposed by this test — since pkg/recipe/metadata.go is outside the diff, I'm raising it here.
1. Production bug: RecipeMetadataSpec.Merge aliases other.Validation and writes phase pointers through that alias
pkg/recipe/metadata.go#L460-L477
When s.Validation == nil, the function assigns s.Validation = other.Validation (pointer alias), then on later iterations the per-phase writes (s.Validation.Deployment = other.Validation.Deployment, etc.) go through that alias. Because the metadata store is cached and reused across recipe builds (metadata_store.go:65, BuildRecipeResult → mergeOverlayChains at metadata_store.go:674), resolving one recipe mutates an ancestor overlay for every later resolution.
Reproduction: building h100-gke-cos-inference-dynamo mutates cached gke-cos validation from conformance to deployment, performance, conformance. That pollution masks the missing deployment phase on h100-gke-cos-training-kubeflow, which is not in knownGaps. Running just go test -run 'TestOverlayValidationPhaseFloor/h100-gke-cos-training-kubeflow' shows the unpolluted result: performance, conformance — failing the floor.
Suggested fix: in Merge, deep-copy other.Validation (or copy each non-nil phase pointer into a newly allocated ValidationConfig) instead of aliasing. Fixing this in production removes the test order dependence as a side effect.
Severity ordering
- Blocker: production
Mergealiasing bug (above) - Should-fix in this PR: chain-only resolver (L128), global leaf discovery (L148), per-overlay
knownGapsgranularity (L222) - Latent: wildcard filter narrowness (L164)
yuanchen8911
left a comment
There was a problem hiding this comment.
There are some issues. PTAL
…llution
RecipeMetadataSpec.Merge aliased other.Validation when the destination
was nil, then later wrote phase pointers through that alias. Across
successive calls on the cached MetadataStore, this mutated whichever
overlay's cached ValidationConfig the alias happened to point at — so
`BuildRecipeResult` was order-dependent, and an overlay that genuinely
lacked a phase could appear compliant after a prior query planted that
phase via pollution.
Fix:
- Add cloneValidationConfig + cloneValidationPhase helpers that deep-
copy phases and their backing slices / NodeSelection.
- Clone in Merge whenever a phase pointer is assigned (both the
nil-destination branch and the per-phase overwrite branch).
- Clone in initBaseMergedSpec so the merged spec never aliases
s.Base.Spec.Validation.
Regression test in metadata_test.go: two sequential merges into a fresh
destination must not mutate the first source. Without the fix the test
detects Deployment leaking from the second merge into the first source's
ValidationConfig.
Refs #970
Address review feedback on PR #989: - Use store.BuildRecipeResult (the production code path) instead of an ad-hoc base-chain walk. The previous helper missed wildcard overlay contributions and mixins, and its leaf detection excluded concrete intermediate overlays (h100-gke-cos-training, eks-training, etc.) that production returns as maximal-leaf candidates for queries that do not narrow further. - Gateable set expanded from 17 to 56 overlays. Per-overlay queries exercise the same resolver path the CLI/API use. Wildcard-fragment overlays (criteria.intent or criteria.service set to 'any') remain excluded because their criteria are not meaningful user queries. - knownGaps is now map[string]map[string]bool keyed by (overlay, phase). An overlay allowlisted for one missing phase no longer silently suppresses a regression in a different phase. - Allowlist repopulated under production semantics: 39 (overlay, phase) pairs missing deployment. The companion fix in 5d4f4cad eliminates cross-subtest pollution, so the failure set is stable across full and isolated runs. Refs #970 Refs #969
yuanchen8911
left a comment
There was a problem hiding this comment.
All 6 review points are addressed with concrete code changes that go beyond the minimum.
Summary
Adds a Go test that walks each leaf overlay's
spec.basechain, resolves the mergedValidationConfigperpkg/recipe/metadata.goMerge semantics, and asserts the resolved validation contains the per-intent required phases. Closes the loophole that let leaf overlays drift to conformance-only without a CI gate.Motivation / Context
Today there is no test that asserts a leaf overlay's resolved validation meets a per-intent contract — so a new overlay can inherit only the service-root
conformance:block and ship with no GPU operator-health check, no NCCL gate, and noinference-perfgate. 27 of 41 GPU overlays already drifted this way.This is the static gate from #970. The runtime gate (signed evidence bundles from #751) only proves the declared validators ran — it can't catch a missing declaration. #970 forces the declaration to be complete; #751 forces it to actually run.
Fixes: #970
Related: #969, #751
Type of Change
Component(s) Affected
pkg/recipe)Implementation Notes
Per-intent floor enforced today:
inference-perf)Strict mode:
AICR_VALIDATION_FLOOR_STRICT=1globally promotes the recommended performance phase from warn-only to required. Default OFF until #969 closes the data gap and Azure/OCI performance testbeds land. When ready, flip this in CI to tighten enforcement.Known-gap allowlist: the test ships with a
knownGapsmap listing the 7 overlays that fail the floor today, so the contract can land independently of the data fix in #969. Entries downgradeErrorfto aLogfprefixed withKNOWN GAP (#969):. As #969 lands per-overlay fixes, drain this map. New overlays added without the floor will fail because they are not allowlisted.Wildcard fragment exclusion: overlays with
criteria.intent == anyorcriteria.service == any(b200-any-training,gb200-any-training,monitoring-hpa) are cross-cutting fragments — not user-selectable leaves — and are excluded from the gate. Seedocs/contributor/data.md#criteria-wildcard-overlays.No new CI wiring needed: runs under
make test, whichmake qualifyalready invokes; CI's existingpkg/recipe/...test job picks it up on every PR.Deferred (per scope):
validation: skip-deploymentescape hatch — wait for first legitimate casemetadata.annotationsopt-out — would require modifyingRecipeMetadataHeader;knownGapsserves the same purpose for nowaicr recipe lintCLI surface — not requestedTesting
Coverage:
pkg/recipe: 89.8% → 89.8% (test-only addition).Risk Assessment
knownGapsallowlist absorbs the existing 7 failures so CI stays green; the gate triggers only on new overlays that ship below the floor.Rollout notes: N/A. To enable strict performance enforcement after #969 closes and testbeds exist, set
AICR_VALIDATION_FLOOR_STRICT=1in the CI test step.Checklist
make testwith-race)make lint)conformance_test.goand the leaf-discovery pattern atmetadata_store_test.go:512)git commit -S)