Skip to content

fix(recipe): reject ComponentRefs with incoherent type/field combinations#1585

Merged
yuanchen8911 merged 1 commit into
NVIDIA:mainfrom
yuanchen8911:fix/1584-componentref-coherence
Jul 2, 2026
Merged

fix(recipe): reject ComponentRefs with incoherent type/field combinations#1585
yuanchen8911 merged 1 commit into
NVIDIA:mainfrom
yuanchen8911:fix/1584-componentref-coherence

Conversation

@yuanchen8911

@yuanchen8911 yuanchen8911 commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Summary

Reject ComponentRefs whose deployment-shape fields are incoherent (e.g. Type: Helm that also carries a Kustomize tag/path) at every boundary that produces a RecipeResult — criteria resolution, file load, and external adoption.

Motivation / Context

Recipe resolution merges each ComponentRef field (Type, Version, Tag, Path, Source, Chart) independently and applies registry defaults without checking the result is coherent. The deployers do not trust the declared Type — Helm/Helmfile/ArgoCD drop it, and localformat.classify treats any ref carrying a tag/path as Kustomize — so an incoherent ref silently deploys as a different type than authored, and the signed attestation BOM (BuildAutoBOM, which reads the declared Type) records metadata that does not match what deploys. The primary exposure is externally-authored hydrated recipes (aicr bundle/validate -r recipe.yaml, POST /v1/bundle), which never pass through the resolver.

Surfaced during the cross-review of #1580.

Fixes: #1584
Related: #1580

Type of Change

  • Bug fix (non-breaking change that fixes an issue)

Component(s) Affected

  • Recipe engine / data (pkg/recipe)
  • pkg/client/v1 (adopt boundary)
  • Bundlers (pkg/bundler) — public DefaultBundler.Make boundary
  • API contract (api/aicr/v1/server.yaml)
  • Docs (docs/user/cli-reference.md)

Implementation Notes

  • New RecipeResult.ValidateCoherence() (pkg/recipe/metadata.go) is invoked at all four boundaries that produce or consume a RecipeResult, so no path is a bypass:
    • finalizeRecipeResult (criteria resolution), after applyRegistryDefaults populates Type;
    • LoadFromFileWithProvider (pkg/recipe/loader.go) — a hydrated recipe.yaml read from disk (covers aicr bundle/validate -r); and
    • the client adoptRecipe path (pkg/client/v1/bundle.go) — a decoded RecipeResult from POST /v1/bundle.
  • ComponentRef.coherenceProblem() encodes the rules, mirroring pkg/bundler/deployer/localformat:
    • a Helm ref must not carry tag/path;
    • a Kustomize ref needs a path;
    • a Kustomize tag needs a source;
    • a Kustomize ref must not also declare post-manifests (manifestFiles) — preManifestFiles remain supported;
    • an unsupported Type fails closed.
  • Type is matched case-insensitively and canonicalized. A lowercase helm/kustomize (the REST wire form / hand-authored recipes) is accepted and normalized to the exact ComponentType constant at each boundary before registry defaulting and before the result reaches the deployers — so it is not newly rejected AND does not deploy inconsistently (Flux rejects a lowercase helm; ArgoCD-Helm mis-handles a lowercase kustomize; the registry-defaults switch is exact-match).
  • Only enabled refs are checked (disabled stubs are excluded from the bundle and skipped); offenders are aggregated into one ErrCodeInvalidRequest; the receiver is nil-safe.
  • OpenAPI contract (api/aicr/v1/server.yaml): the componentRefs schema previously omitted type/source/chart/tag/path — added them (with the type enum [Helm, Kustomize] and a coherence-rule note), require only name (type is back-filled from the registry for known components, so requiring it in the schema would reject payloads the server accepts), and dropped the non-existent repository and phantom order fields (the decoded Go field is source; per-ref order is ignored — ordering is the top-level deploymentOrder). Fixed all example componentRefs to use source/chart and a canonical type. Also added type to the hydrated-recipe example in docs/user/cli-reference.md.

An earlier revision validated only at finalizeRecipeResult; cross-review correctly flagged that the load/adopt boundaries — the actual external surface — bypassed it (and that the stricter type check would reject the documented lowercase wire form), so the check now runs at all four (resolve, load, adopt, and the public bundler) and is case-insensitive.

Testing

go test -race ./pkg/recipe/... ./pkg/client/...   # ok
golangci-lint run -c .golangci.yaml ./pkg/recipe/... ./pkg/client/...   # 0 issues
yamllint api/aicr/v1/server.yaml   # ok

No in-tree recipe trips the new check (zero Kustomize components in the registry today), confirmed by the full pkg/recipe suite and the bundler/client resolution consumers still resolving cleanly. New tests cover the per-ref predicate (each rejected + coherent combination, post/pre manifests, empty/unknown/lowercase type), the aggregating validator (enabled-only, nil-safe, error code, names all offenders), and a REST/adopt-boundary test (adoptRecipe rejects an incoherent ref and canonicalizes a coherent lowercase-typed one), an HTTP decode-to-bundle test (POST /v1/bundle with an incoherent ref returns 400), and a direct-bundler rejection test (DefaultBundler.Make rejects an incoherent ref without mutating the caller's recipe).

Risk Assessment

  • Medium — Adds a hard failure at four boundaries (resolve, load, adopt, and the public DefaultBundler.Make). No in-tree recipe is incoherent (confirmed by the full suite), so no shipping recipe is affected; case-insensitive typing means the documented lowercase wire form still works; a genuinely incoherent external recipe that previously resolved will now fail closed with a clear message naming the offending ref — which is the intent.

Rollout notes: Fail-closed by design; the error names each offender and field combination. Easy to revert (one predicate + one method + four call sites + the OpenAPI schema additions).

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed (OpenAPI componentRefs schema + example)
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

@yuanchen8911 yuanchen8911 added the theme/recipes Recipe expansion, overlays, mixins, and component registry label Jul 1, 2026
@github-actions github-actions Bot added the size/M label Jul 1, 2026
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Recipe evidence check

No leaf overlays affected by this PR.

This gate is warning-only and never blocks merge.

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This change adds coherence validation for ComponentRef deployment-shape fields. coherenceProblem() checks Type, Tag, Path, Source, and ManifestFiles combinations, and RecipeResult.ValidateCoherence() aggregates enabled refs into one ErrCodeInvalidRequest. The validation now runs during recipe finalization, file loading, and client adoption. The OpenAPI schema and examples, plus CLI docs, were updated to reflect the typed component-ref shape, and tests cover the helper and boundary behavior.

Estimated code review effort: 3 (Moderate) | ~20 minutes

Related issues: 1584
Suggested labels: area/recipes, size/M

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly states the main change: rejecting incoherent ComponentRef type/field combinations.
Linked Issues check ✅ Passed The changes match #1584 by validating enabled refs at all RecipeResult boundaries and covering the required field rules.
Out of Scope Changes check ✅ Passed The docs, schema, and tests all support the same coherence-validation change, with no unrelated additions visible.
Description check ✅ Passed The description accurately matches the code changes, boundary validation, schema updates, and added tests.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
pkg/recipe/componentref_coherence_test.go (1)

25-91: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Missing coverage for ComponentTypeDirect (or any third supported type).

All negative test cases target Helm/Kustomize field misuse plus an empty/lowercase-string type. If ComponentTypeDirect is a genuinely supported type (see the companion comment on pkg/recipe/metadata.go's coherenceProblem default case), a coherent Direct-typed ref should be asserted as wantBad: false here — its current absence means this test suite would not have caught the potential regression.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/recipe/componentref_coherence_test.go` around lines 25 - 91, The
coherence test for ComponentRef is missing coverage for the supported
ComponentTypeDirect case, so add a coherent Direct-typed entry to
TestComponentRefCoherenceProblem and assert wantBad is false. Update the table
in TestComponentRefCoherenceProblem to include a valid ComponentRef using
ComponentTypeDirect (or the third supported type referenced by coherenceProblem)
so the test verifies the default-case behavior stays accepted.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@pkg/recipe/componentref_coherence_test.go`:
- Around line 25-91: The coherence test for ComponentRef is missing coverage for
the supported ComponentTypeDirect case, so add a coherent Direct-typed entry to
TestComponentRefCoherenceProblem and assert wantBad is false. Update the table
in TestComponentRefCoherenceProblem to include a valid ComponentRef using
ComponentTypeDirect (or the third supported type referenced by coherenceProblem)
so the test verifies the default-case behavior stays accepted.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 3639459e-1bd4-45ad-80b6-e977fd365709

📥 Commits

Reviewing files that changed from the base of the PR and between c0ab56e and c843ed0.

📒 Files selected for processing (5)
  • pkg/client/v1/bundle.go
  • pkg/recipe/componentref_coherence_test.go
  • pkg/recipe/loader.go
  • pkg/recipe/metadata.go
  • pkg/recipe/metadata_store.go

@yuanchen8911 yuanchen8911 force-pushed the fix/1584-componentref-coherence branch from c843ed0 to fce65a5 Compare July 1, 2026 21:47

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/aicr/v1/server.yaml (1)

943-984: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

fullRecipe example wasn't updated alongside simpleRecipe; it would now fail the new coherence check if POSTed as-is.

The simpleRecipe example at line 951 was updated to add type: Helm, but the sibling fullRecipe example's componentRefs entry (lines 980-984) still has no type field. Since adoptRecipe now calls ValidateCoherence() on any RecipeResult submitted to POST /v1/bundle, and an enabled ref with an empty Type fails closed with ErrCodeInvalidRequest, this documented example would be rejected if used literally as a request body.

📝 Proposed fix: add `type` to the fullRecipe example
                   componentRefs:
                     - name: gpu-operator
                       version: v25.3.3
+                      type: Helm
                       order: 1
                       repository: https://helm.ngc.nvidia.com/nvidia
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/aicr/v1/server.yaml` around lines 943 - 984, The fullRecipe example in
the server.yaml response body is missing the same componentRef type update that
was added to simpleRecipe, so it will fail ValidateCoherence() when used as a
POST /v1/bundle request. Update the fullRecipe section’s componentRefs entry in
the recipe example to include the Helm type, matching the gpu-operator reference
already shown in simpleRecipe, so adoptRecipe and its RecipeResult validation
accept the documented payload.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/aicr/v1/server.yaml`:
- Around line 1464-1471: The schema for the deployment `type` field in
`server.yaml` is too strict because the `enum` only allows exact-case values
while the contract is case-insensitive. Update the `type` definition to stop
using a case-sensitive `enum` and instead document the accepted values in the
description, keeping it aligned with `coherenceProblem` in
`pkg/recipe/metadata.go` and the `simpleRecipe` examples/tests that accept
lowercase forms.

---

Outside diff comments:
In `@api/aicr/v1/server.yaml`:
- Around line 943-984: The fullRecipe example in the server.yaml response body
is missing the same componentRef type update that was added to simpleRecipe, so
it will fail ValidateCoherence() when used as a POST /v1/bundle request. Update
the fullRecipe section’s componentRefs entry in the recipe example to include
the Helm type, matching the gpu-operator reference already shown in
simpleRecipe, so adoptRecipe and its RecipeResult validation accept the
documented payload.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 1fcc9d05-89af-4745-a3b9-33763e38e6b8

📥 Commits

Reviewing files that changed from the base of the PR and between c843ed0 and fce65a5.

📒 Files selected for processing (7)
  • api/aicr/v1/server.yaml
  • pkg/client/v1/aicr_internal_test.go
  • pkg/client/v1/bundle.go
  • pkg/recipe/componentref_coherence_test.go
  • pkg/recipe/loader.go
  • pkg/recipe/metadata.go
  • pkg/recipe/metadata_store.go

Comment thread api/aicr/v1/server.yaml Outdated
@yuanchen8911 yuanchen8911 force-pushed the fix/1584-componentref-coherence branch from fce65a5 to fef6655 Compare July 1, 2026 22:04
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
api/aicr/v1/server.yaml (1)

1472-1480: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Case-sensitive enum still contradicts the case-insensitive type contract (unresolved from prior review).

The description now explicitly says type is "Accepted case-insensitively and normalized to the canonical form," matching coherenceProblem()'s strings.EqualFold checks and the simpleRecipe/fullRecipe examples. But the enum: [Helm, Kustomize] is still case-sensitive — a strict JSON Schema validator or generated client would reject a lowercase "helm"/"kustomize" payload that the server actually accepts.

📝 Proposed fix: drop the case-sensitive enum
               type:
                 type: string
-                enum: [Helm, Kustomize]
                 description: >-
                   Deployment type. Accepted case-insensitively and normalized to
                   the canonical form. A Helm ref must not carry `tag`/`path`; a
                   Kustomize ref requires `path` (and, when it sets `tag`, a
                   `source`). Incoherent or missing types on an enabled ref are
                   rejected with 400.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/aicr/v1/server.yaml` around lines 1472 - 1480, The deployment type schema
in the server spec still advertises a case-sensitive enum, which conflicts with
the case-insensitive behavior in coherenceProblem() and the documented contract.
Update the type field definition in server.yaml to remove the strict enum
constraint (or replace it with a validation approach that accepts lowercase
variants) so generated clients and schema validators do not reject values the
API accepts; keep the description aligned with the Helm/Kustomize normalization
behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@api/aicr/v1/server.yaml`:
- Around line 1472-1480: The deployment type schema in the server spec still
advertises a case-sensitive enum, which conflicts with the case-insensitive
behavior in coherenceProblem() and the documented contract. Update the type
field definition in server.yaml to remove the strict enum constraint (or replace
it with a validation approach that accepts lowercase variants) so generated
clients and schema validators do not reject values the API accepts; keep the
description aligned with the Helm/Kustomize normalization behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: a095ad76-8406-4c82-bb19-1237b2750b7a

📥 Commits

Reviewing files that changed from the base of the PR and between fce65a5 and fef6655.

📒 Files selected for processing (9)
  • api/aicr/v1/server.yaml
  • docs/user/cli-reference.md
  • pkg/client/v1/aicr_internal_test.go
  • pkg/client/v1/bundle.go
  • pkg/recipe/componentref_coherence_test.go
  • pkg/recipe/loader.go
  • pkg/recipe/metadata.go
  • pkg/recipe/metadata_store.go
  • pkg/server/bundle_handler_test.go

@yuanchen8911 yuanchen8911 force-pushed the fix/1584-componentref-coherence branch from fef6655 to c008641 Compare July 1, 2026 22:11
@yuanchen8911 yuanchen8911 marked this pull request as ready for review July 1, 2026 22:15
@yuanchen8911 yuanchen8911 requested a review from a team as a code owner July 1, 2026 22:15
@yuanchen8911 yuanchen8911 requested a review from mchmarny July 1, 2026 22:15

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/recipe/metadata.go`:
- Around line 297-333: The canonicalize-then-validate sequence is duplicated
across finalizeRecipeResult, LoadFromFileWithProvider, and adoptRecipe, which
makes it easy for a future boundary to call only one step or call
ValidateCoherence before CanonicalizeComponentTypes. Add a small helper that
performs both operations in order on RecipeResult, and update those call sites
to use it so the pairing is enforced consistently and no future edit can bypass
the canonicalization step.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 553ce33b-8832-4cac-a717-0dea6f3873f7

📥 Commits

Reviewing files that changed from the base of the PR and between fef6655 and c008641.

📒 Files selected for processing (9)
  • api/aicr/v1/server.yaml
  • docs/user/cli-reference.md
  • pkg/client/v1/aicr_internal_test.go
  • pkg/client/v1/bundle.go
  • pkg/recipe/componentref_coherence_test.go
  • pkg/recipe/loader.go
  • pkg/recipe/metadata.go
  • pkg/recipe/metadata_store.go
  • pkg/server/bundle_handler_test.go

Comment thread pkg/recipe/metadata.go Outdated
@yuanchen8911 yuanchen8911 force-pushed the fix/1584-componentref-coherence branch from c008641 to de654ea Compare July 1, 2026 23:03
@yuanchen8911 yuanchen8911 enabled auto-merge (squash) July 1, 2026 23:04

@mchmarny mchmarny left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid fix — the core premise checks out: localformat.classify returns Kustomize whenever Tag/Path is set and ignores the declared Type, so an incoherent Helm+tag/path ref really would deploy as the wrong type and mismatch the attestation BOM. Enforcing at all three RecipeResult boundaries (resolve/load/adopt), case-insensitive canonicalization before defaulting, enabled-only + aggregated errors, and the OpenAPI/repositorysource cleanup are all well done. Test coverage is thorough (per-ref table + REST/adopt + HTTP decode).

One medium point worth confirming before merge (inline on loader.go): the load/adopt boundaries don't run registry defaulting, so this effectively makes type required for hand-authored -r/POST /v1/bundle recipes — a type-less ref that resolves fine via criteria is rejected here. Likely intended given the OpenAPI required: [name, type] change, but flagging so it's a conscious call vs. back-filling empty type from the registry.

Non-blocking. CI checks were still in_progress at review time — confirm green before merge.

Comment thread pkg/recipe/loader.go
// unchecked, and a lowercase type would deploy inconsistently. See #1584.
canonicalizeComponentTypes(rec.ComponentRefs)
if err := rec.ValidateCoherence(); err != nil {
return nil, err

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gate makes type effectively required on every enabled ref at load (same in adoptRecipe) — which is broader than rejecting incoherent refs. Unlike finalizeRecipeResult, this boundary never runs applyRegistryDefaults, so a hand-authored -r recipe.yaml with - name: gpu-operator (no type/tag/path) — today classified as UpstreamHelm by localformat.classify and bundled fine — now hits coherenceProblem's default case and is rejected as "unsupported type". The same refs resolved via criteria get Type: Helm back-filled from the registry (ApplyRegistryDefaults), so identical content is valid on one path and rejected on another.

If requiring type on hand-authored recipes is intended, this is fine — OpenAPI now marks it required and the docs example adds it. If not, consider back-filling empty type via the bound DataProvider's registry defaults here (and in adoptRecipe) before ValidateCoherence, so the gate only fails on genuine incoherence (Helm + tag/path) and preserves back-compat for type-less hand-authored recipes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — you're right, and it was breaking E2E (a type-less gpu-operator ref rejected on the -r load path). Fixed in 80cd94e by back-filling an empty type from the registry at the load and adopt boundaries (new RecipeResult.BackfillComponentTypes, mirroring ApplyRegistryDefaults on the resolve path) before ValidateCoherence. So a type-less registry ref now resolves the same on all three paths, and the gate only fails on genuine incoherence (Helm + tag/path) or a type-less non-registry component. Kept type: required in the OpenAPI schema as the documented contract.

@yuanchen8911 yuanchen8911 force-pushed the fix/1584-componentref-coherence branch 3 times, most recently from e5af077 to e30b951 Compare July 2, 2026 00:57
@github-actions github-actions Bot added size/XL and removed size/L labels Jul 2, 2026
@yuanchen8911 yuanchen8911 force-pushed the fix/1584-componentref-coherence branch 3 times, most recently from 39ce101 to 40426e4 Compare July 2, 2026 01:32
@yuanchen8911 yuanchen8911 force-pushed the fix/1584-componentref-coherence branch from 40426e4 to 29fe635 Compare July 2, 2026 01:45
@yuanchen8911 yuanchen8911 requested a review from mchmarny July 2, 2026 01:57
…ions (NVIDIA#1584)

Recipe resolution merged each ComponentRef field independently and applied
registry defaults without checking the resulting deployment-shape fields
were coherent, so a recipe could carry e.g. Type: Helm that also sets a
Kustomize tag/path. The deployers do not trust the declared Type —
Helm/Helmfile/ArgoCD drop it and localformat classifies any ref with a
tag/path as Kustomize — so such a ref silently deploys as a different type
than authored, and the signed attestation BOM (which reads the declared
Type) records metadata that does not match what deploys.

Validate coherence at every boundary that produces a RecipeResult, via a
shared RecipeResult.ValidateCoherence():
  - finalizeRecipeResult (criteria resolution), after registry defaults;
  - LoadFromFileWithProvider (a hydrated recipe.yaml read from disk); and
  - the client adoptRecipe path (POST /v1/bundle decodes a RecipeResult).
The load and adopt boundaries are the primary external surface — a
hand-authored or decoded hydrated recipe never passes through the resolver.

Rules mirror the deployer requirements in pkg/bundler/deployer/localformat:
a Helm ref must not carry tag/path; a Kustomize ref needs a path; a tag is
only meaningful with a source; a Kustomize ref must not also declare
post-manifests (manifestFiles) — pre-manifests remain supported. An
unsupported Type fails closed. Only enabled refs are checked; disabled
stubs are skipped. Offenders are aggregated into one ErrCodeInvalidRequest.

Types are matched case-insensitively AND canonicalized to the exact
ComponentType constant at each boundary (canonicalizeComponentTypes), so a
lowercase "helm"/"kustomize" from the REST wire form or a hand-authored
recipe is normalized before registry defaulting (exact-type switch) and
before the result reaches the deployers (Flux rejects a lowercase "helm";
ArgoCD-Helm mis-handles a lowercase "kustomize") — no downstream
inconsistency.

Documents the componentRefs wire shape in the OpenAPI spec
(api/aicr/v1/server.yaml): adds type (enum Helm/Kustomize), source, chart,
tag, path; marks name/type required; drops the non-existent "repository"
field; fixes the examples to use source/chart and a canonical type. Updates
the hydrated-recipe example in docs/user/cli-reference.md to include type.

No in-tree recipe trips the check (zero Kustomize components in the registry
today); the full pkg/recipe suite and the bundler/client resolution
consumers all still resolve cleanly. Adds table-driven tests for the per-ref
predicate (each rejected + coherent combination, post/pre manifests,
empty/unknown/lowercase type), the aggregating validator, a REST/adopt-
boundary test (adoptRecipe rejects an incoherent ref and canonicalizes a
lowercase-typed one), and an HTTP decode-to-bundle test (POST /v1/bundle
with an incoherent ref returns 400).

Back-fill missing types from the registry at the load and adopt boundaries
(RecipeResult.BackfillComponentTypes) before validating. Those boundaries do
not run ApplyRegistryDefaults, so a type-less ref for a registry component —
valid before this change, since the deployers derive the type from the ref's
fields — was wrongly rejected as "unsupported type" while the same ref
resolved via criteria got its type back-filled. Registry components now
resolve consistently on all paths; only genuinely incoherent refs (and
type-less non-registry components) fail closed.

Boundary hardening (cross-review):
- adoptRecipe now registers as an in-flight cache user (inflight.Add under
  the read lock, defer Done) because PrepareAndValidate reads the component
  registry; without it a concurrent Close could evict the cache and adoption
  would repopulate it and return a result owned by a closed Client.
- BackfillComponentTypes pre-scans for missing types (skipping the registry
  entirely when none need back-filling) and propagates a registry
  load/parse/timeout failure as ErrCodeInternal through PrepareAndValidate,
  instead of swallowing it into a misleading non-retryable unsupported-type
  rejection.
- OpenAPI: document that a missing type on a known registry component is
  back-filled (unknown component still 400).
- Adds a deterministic Close/adopt drain test and a registry-error
  propagation test.

The back-fill pre-scan and loop also skip disabled refs (Type=="" && IsEnabled): ValidateCoherence skips disabled refs, so a type-less disabled legacy stub must not force a registry load that could fail an otherwise-usable recipe on a registry error. Covered by a failing-provider test.

Also guard the public pkg/bundler entry point: DefaultBundler.Make now
validates a provider-preserving defensive copy of the RecipeResult before
generating, so a direct bundler caller (a documented public API whose Quick
Start calls Make directly) cannot bypass the coherence gate the CLI/server
boundaries apply. Adds a direct-bundler rejection test, a back-fill-loop
disabled-skip test, and corrects the PrepareAndValidate/CanonicalizeComponentTypes
and BackfillComponentTypes godoc (enabled-only back-fill; adoption calls the
composite helper).

OpenAPI accuracy (cross-review): the componentRefs item schema requires only
name, not type — the server back-fills a missing type for a known registry
component, so requiring type would make a spec-strict client reject a payload
the server accepts. Also drop the phantom per-ref order field from the schema
and examples: ComponentRef has no order member (ordering is the top-level
deploymentOrder), so a submitted order is silently discarded.

Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
@yuanchen8911 yuanchen8911 force-pushed the fix/1584-componentref-coherence branch from 29fe635 to dbab371 Compare July 2, 2026 02:15
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jul 2, 2026
…oc (NVIDIA#1588)

Follow-ups surfaced by the cross-review of NVIDIA#1585.

ComponentRef.Patches is carried through resolution (deep-copied, overlay-
merged) but applied by NO deployer — localformat's Component has no patches
field and every generator omits it — so a recipe that declares `patches:`
silently produces an unpatched bundle. Reject any ref that declares patches
in coherenceProblem (fail loud) rather than dropping them silently, and mark
the field's godoc as unapplied. No recipe uses `patches` in-tree or in the
internal recipes repo, so this only turns a latent silent-drop into a clear
error; implementing patch application (or removing the field) is the
alternative, tracked in NVIDIA#1588.

Also correct the coherenceProblem godoc: it over-generalized that "the
deployers do not trust the declared Type." That holds for the
field-classifying deployers (localformat and the Helm/Helmfile/ArgoCD
generators built on it), but the Flux generator is Helm-only and switches on
Type (it already rejects a non-Helm ref with a clear error). Clarify this.

Fixes: NVIDIA#1588
Related: NVIDIA#1585

Document the rejected field for authors: add patches to the OpenAPI
componentRefs schema (marked unsupported/rejected) and a note in
docs/integrator/recipe-development.md, and clarify the Go field comment that
only ENABLED refs are rejected (disabled are skipped). Also finish the
type-handling wording: the godoc and the Helm-ref error no longer over-claim
that all deployers classify by tag/path — the field-classifying deployers do
(building a mismatched Helm ref as Kustomize), while the Flux deployer
switches on canonical Type and rejects a non-Helm ref outright.

Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jul 2, 2026
…oc (NVIDIA#1588)

Follow-ups surfaced by the cross-review of NVIDIA#1585.

ComponentRef.Patches is carried through resolution (deep-copied, overlay-
merged) but applied by NO deployer — localformat's Component has no patches
field and every generator omits it — so a recipe that declares `patches:`
silently produces an unpatched bundle. Reject any ref that declares patches
in coherenceProblem (fail loud) rather than dropping them silently, and mark
the field's godoc as unapplied. No recipe uses `patches` in-tree or in the
internal recipes repo, so this only turns a latent silent-drop into a clear
error; implementing patch application (or removing the field) is the
alternative, tracked in NVIDIA#1588.

Also correct the coherenceProblem godoc: it over-generalized that "the
deployers do not trust the declared Type." That holds for the
field-classifying deployers (localformat and the Helm/Helmfile/ArgoCD
generators built on it), but the Flux generator is Helm-only and switches on
Type (it already rejects a non-Helm ref with a clear error). Clarify this.

Fixes: NVIDIA#1588
Related: NVIDIA#1585

Document the rejected field for authors: add patches to the OpenAPI
componentRefs schema (marked unsupported/rejected) and a note in
docs/integrator/recipe-development.md, and clarify the Go field comment that
only ENABLED refs are rejected (disabled are skipped). Also finish the
type-handling wording: the godoc and the Helm-ref error no longer over-claim
that all deployers classify by tag/path — the field-classifying deployers do
(building a mismatched Helm ref as Kustomize), while the Flux deployer
switches on canonical Type and rejects a non-Helm ref outright.

Review nits: correct the componentRefs schema description (order is the
top-level deploymentOrder, not per-ref), add chart to the cli-reference Helm
example for parity with the API examples, and make the blocking test
DataProvider honor context cancellation.

Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jul 2, 2026
…oc (NVIDIA#1588)

Follow-ups surfaced by the cross-review of NVIDIA#1585.

ComponentRef.Patches is carried through resolution (deep-copied, overlay-
merged) but applied by NO deployer — localformat's Component has no patches
field and every generator omits it — so a recipe that declares `patches:`
silently produces an unpatched bundle. Reject any ref that declares patches
in coherenceProblem (fail loud) rather than dropping them silently, and mark
the field's godoc as unapplied. No recipe uses `patches` in-tree or in the
internal recipes repo, so this only turns a latent silent-drop into a clear
error; implementing patch application (or removing the field) is the
alternative, tracked in NVIDIA#1588.

Also correct the coherenceProblem godoc: it over-generalized that "the
deployers do not trust the declared Type." That holds for the
field-classifying deployers (localformat and the Helm/Helmfile/ArgoCD
generators built on it), but the Flux generator is Helm-only and switches on
Type (it already rejects a non-Helm ref with a clear error). Clarify this.

Fixes: NVIDIA#1588
Related: NVIDIA#1585

Document the rejected field for authors: add patches to the OpenAPI
componentRefs schema (marked unsupported/rejected) and a note in
docs/integrator/recipe-development.md, and clarify the Go field comment that
only ENABLED refs are rejected (disabled are skipped). Also finish the
type-handling wording: the godoc and the Helm-ref error no longer over-claim
that all deployers classify by tag/path — the field-classifying deployers do
(building a mismatched Helm ref as Kustomize), while the Flux deployer
switches on canonical Type and rejects a non-Helm ref outright.

Review nits: correct the componentRefs schema description (order is the
top-level deploymentOrder, not per-ref), add chart to the cli-reference Helm
example for parity with the API examples, and make the blocking test
DataProvider honor context cancellation.

Declare the top-level deploymentOrder property in the RecipeResponse schema
(array of component names) and include it in the /v1/bundle example, so a
GET->POST round trip through a generated client preserves deployment
sequencing instead of dropping it. Correct the remaining Flux wording: a
Helm ref carrying tag/path builds as Kustomize under the field-classifying
deployers but as Helm under Flux (which switches on the declared Type) — it
is not rejected by Flux; only an explicitly non-Helm type is. Fixed in the
godoc bullet, the Helm-ref error, and the case-insensitive/default-branch
comments.

Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jul 2, 2026
…oc (NVIDIA#1588)

Follow-ups surfaced by the cross-review of NVIDIA#1585.

ComponentRef.Patches is carried through resolution (deep-copied, overlay-
merged) but applied by NO deployer — localformat's Component has no patches
field and every generator omits it — so a recipe that declares `patches:`
silently produces an unpatched bundle. Reject any ref that declares patches
in coherenceProblem (fail loud) rather than dropping them silently, and mark
the field's godoc as unapplied. No recipe uses `patches` in-tree or in the
internal recipes repo, so this only turns a latent silent-drop into a clear
error; implementing patch application (or removing the field) is the
alternative, tracked in NVIDIA#1588.

Also correct the coherenceProblem godoc: it over-generalized that "the
deployers do not trust the declared Type." That holds for the
field-classifying deployers (localformat and the Helm/Helmfile/ArgoCD
generators built on it), but the Flux generator is Helm-only and switches on
Type (it already rejects a non-Helm ref with a clear error). Clarify this.

Fixes: NVIDIA#1588
Related: NVIDIA#1585

Document the rejected field for authors: add patches to the OpenAPI
componentRefs schema (marked unsupported/rejected) and a note in
docs/integrator/recipe-development.md, and clarify the Go field comment that
only ENABLED refs are rejected (disabled are skipped). Also finish the
type-handling wording: the godoc and the Helm-ref error no longer over-claim
that all deployers classify by tag/path — the field-classifying deployers do
(building a mismatched Helm ref as Kustomize), while the Flux deployer
switches on canonical Type and rejects a non-Helm ref outright.

Review nits: correct the componentRefs schema description (order is the
top-level deploymentOrder, not per-ref), add chart to the cli-reference Helm
example for parity with the API examples, and make the blocking test
DataProvider honor context cancellation.

Declare the top-level deploymentOrder property in the RecipeResponse schema
(array of component names) and include it in the /v1/bundle example, so a
GET->POST round trip through a generated client preserves deployment
sequencing instead of dropping it. Correct the remaining Flux wording: a
Helm ref carrying tag/path builds as Kustomize under the field-classifying
deployers but as Helm under Flux (which switches on the declared Type) — it
is not rejected by Flux; only an explicitly non-Helm type is. Fixed in the
godoc bullet, the Helm-ref error, and the case-insensitive/default-branch
comments.

Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jul 2, 2026
…oc (NVIDIA#1588)

Follow-ups surfaced by the cross-review of NVIDIA#1585.

ComponentRef.Patches is carried through resolution (deep-copied, overlay-
merged) but applied by NO deployer — localformat's Component has no patches
field and every generator omits it — so a recipe that declares `patches:`
silently produces an unpatched bundle. Reject any ref that declares patches
in coherenceProblem (fail loud) rather than dropping them silently, and mark
the field's godoc as unapplied. No recipe uses `patches` in-tree or in the
internal recipes repo, so this only turns a latent silent-drop into a clear
error; implementing patch application (or removing the field) is the
alternative, tracked in NVIDIA#1588.

Also correct the coherenceProblem godoc: it over-generalized that "the
deployers do not trust the declared Type." That holds for the
field-classifying deployers (localformat and the Helm/Helmfile/ArgoCD
generators built on it), but the Flux generator is Helm-only and switches on
Type (it already rejects a non-Helm ref with a clear error). Clarify this.

Fixes: NVIDIA#1588
Related: NVIDIA#1585

Document the rejected field for authors: add patches to the OpenAPI
componentRefs schema (marked unsupported/rejected) and a note in
docs/integrator/recipe-development.md, and clarify the Go field comment that
only ENABLED refs are rejected (disabled are skipped). Also finish the
type-handling wording: the godoc and the Helm-ref error no longer over-claim
that all deployers classify by tag/path — the field-classifying deployers do
(building a mismatched Helm ref as Kustomize), while the Flux deployer
switches on canonical Type and rejects a non-Helm ref outright.

Review nits: correct the componentRefs schema description (order is the
top-level deploymentOrder, not per-ref), add chart to the cli-reference Helm
example for parity with the API examples, and make the blocking test
DataProvider honor context cancellation.

Declare the top-level deploymentOrder property in the RecipeResponse schema
(array of component names) and include it in the /v1/bundle example, so a
GET->POST round trip through a generated client preserves deployment
sequencing instead of dropping it. Correct the remaining Flux wording: a
Helm ref carrying tag/path builds as Kustomize under the field-classifying
deployers but as Helm under Flux (which switches on the declared Type) — it
is not rejected by Flux; only an explicitly non-Helm type is. Fixed in the
godoc bullet, the Helm-ref error, and the case-insensitive/default-branch
comments.

Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
@yuanchen8911 yuanchen8911 merged commit 1ee6eeb into NVIDIA:main Jul 2, 2026
160 checks passed
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request Jul 2, 2026
…oc (NVIDIA#1588)

Follow-ups surfaced by the cross-review of NVIDIA#1585.

ComponentRef.Patches is carried through resolution (deep-copied, overlay-
merged) but applied by NO deployer — localformat's Component has no patches
field and every generator omits it — so a recipe that declares `patches:`
silently produces an unpatched bundle. Reject any ref that declares patches
in coherenceProblem (fail loud) rather than dropping them silently, and mark
the field's godoc as unapplied. No recipe uses `patches` in-tree or in the
internal recipes repo, so this only turns a latent silent-drop into a clear
error; implementing patch application (or removing the field) is the
alternative, tracked in NVIDIA#1588.

Also correct the coherenceProblem godoc: it over-generalized that "the
deployers do not trust the declared Type." That holds for the
field-classifying deployers (localformat and the Helm/Helmfile/ArgoCD
generators built on it), but the Flux generator is Helm-only and switches on
Type (it already rejects a non-Helm ref with a clear error). Clarify this.

Fixes: NVIDIA#1588
Related: NVIDIA#1585

Document the rejected field for authors: add patches to the OpenAPI
componentRefs schema (marked unsupported/rejected) and a note in
docs/integrator/recipe-development.md, and clarify the Go field comment that
only ENABLED refs are rejected (disabled are skipped). Also finish the
type-handling wording: the godoc and the Helm-ref error no longer over-claim
that all deployers classify by tag/path — the field-classifying deployers do
(building a mismatched Helm ref as Kustomize), while the Flux deployer
switches on canonical Type and rejects a non-Helm ref outright.

Review nits: correct the componentRefs schema description (order is the
top-level deploymentOrder, not per-ref), add chart to the cli-reference Helm
example for parity with the API examples, and make the blocking test
DataProvider honor context cancellation.

Declare the top-level deploymentOrder property in the RecipeResponse schema
(array of component names) and include it in the /v1/bundle example, so a
GET->POST round trip through a generated client preserves deployment
sequencing instead of dropping it. Correct the remaining Flux wording: a
Helm ref carrying tag/path builds as Kustomize under the field-classifying
deployers but as Helm under Flux (which switches on the declared Type) — it
is not rejected by Flux; only an explicitly non-Helm type is. Fixed in the
godoc bullet, the Helm-ref error, and the case-insensitive/default-branch
comments.

Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/api area/bundler area/docs size/XL theme/recipes Recipe expansion, overlays, mixins, and component registry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Resolver does not reject ComponentRefs with incoherent type/field combinations

2 participants