Skip to content

feat: os-talos mixin + bundler preManifestFiles support#856

Merged
mchmarny merged 21 commits into
mainfrom
feat/talos_namespace_mixin
May 13, 2026
Merged

feat: os-talos mixin + bundler preManifestFiles support#856
mchmarny merged 21 commits into
mainfrom
feat/talos_namespace_mixin

Conversation

@ayuskauskas
Copy link
Copy Markdown
Contributor

Summary

Adds an opt-in recipes/mixins/os-talos.yaml mixin that overrides install namespaces to privileged-<component> and bundles PSA-privileged Namespace manifests for five components, plus the bundler change that makes those Namespace manifests apply before the chart instead of after.

Motivation / Context

Talos Linux enforces Pod Security Admission (PSA) restricted by default. AICR components that need privileged access (host filesystem, host network, root user) get rejected. The Talos collector PR (#762) addressed the snapshot agent's pod-shape; this PR addresses the downstream components.

Fixes part of #565.
Related: #762 (Talos local-test harness + snapshot agent OS=talos branch).

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • Build/CI/tooling

Component(s) Affected

  • CLI (cmd/aicr, pkg/cli)
  • API server (cmd/aicrd, pkg/api, pkg/server)
  • Recipe engine / data (pkg/recipe)
  • Bundlers (pkg/bundler, pkg/component/*)
  • Collectors / snapshotter (pkg/collector, pkg/snapshotter)
  • Validator (pkg/validator)
  • Core libraries (pkg/errors, pkg/k8s)
  • Docs/examples (docs/, examples/)
  • Other: recipes/components/<name>/talos/namespace.yaml × 5, recipes/mixins/os-talos.yaml, site/.vitepress/config.ts

Implementation Notes

recipes/mixins/os-talos.yaml (new)

A new mixin (kind: RecipeMixin) that recipe authors opt into via spec.mixins: [os-talos]. For each of five components — gpu-operator, network-operator, nvsentinel, nvidia-dra-driver-gpu, nodewright-operator — it overrides the install namespace to privileged-<name> and attaches a per-component talos/namespace.yaml via the new preManifestFiles field. Constraint OS.release.ID == talos guards against accidental non-Talos installs.

Excluded by design (deferred follow-up tracked under #565): cloud-specific drivers (aws-ebs-csi-driver, aws-efa, gke-nccl-tcpxo), nodewright-customizations (per-node privileged story not settled), kube-prometheus-stack (only node-exporter needs privileged; whole-chart move is too coarse).

Per-component Namespace manifests

Five new files at recipes/components/<name>/talos/namespace.yaml. Each carries pod-security.kubernetes.io/{enforce,audit,warn}=privileged plus AICR-managed selectors (app.kubernetes.io/managed-by: aicr, app.kubernetes.io/component: <name>, aicr.nvidia.com/os: talos) so fleet-wide audits can identify them.

Bundler preManifestFiles support

Mechanical extension of the existing ManifestFiles post-injection path:

  • Schema (pkg/recipe/metadata.go): new ComponentRef.PreManifestFiles []string field; additive merge mirrors the existing ManifestFiles merge.
  • Manifest collection (pkg/bundler/bundler.go): a phase-parameterized collectComponentManifestsByPhase shared by collectComponentManifests (post, preserved as the original entry point) and the new collectComponentPreManifests. One body, two thin wrappers.
  • Bundler plumbing (pkg/bundler/bundler.go + pkg/bundler/deployer/{argocd,argocdhelm,helm}/): each Generator now carries ComponentPreManifests and ComponentPostManifests. Pre/post error paths are differentiated and route through errors.PropagateOrWrap.
  • Writer (pkg/bundler/deployer/localformat/writer.go): new Options.injectAuxiliaryFolder(idx, c, phase) helper used by both pre and post emission — single writeLocalHelmFolder call site for auxiliary folders. Pre-injection fires before classify(...) so it works regardless of primary kind (upstream-helm, local-helm, kustomize, or vendored). Folder upper-bound raised from 2*N to 3*N. Collision check covers both <name>-pre and <name>-post. The pre path coexists with VendorCharts; the post path is skipped under vendoring (matches the pre-existing "vendored mode collapses mixed into one folder" contract).

Documentation

  • New page docs/integrator/talos-integration.md documents what the mixin does, the five privileged namespaces, the PSA label set, apply ordering, and what's out of scope.
  • One-sentence cross-link added to docs/integrator/recipe-development.md so recipe authors find the new page when they pick os: talos.
  • Sidebar entry in site/.vitepress/config.ts (make check-docs-sidebar enforces).

Testing

unset GITLAB_TOKEN
go build ./...                                                  # clean
go test -race -count=1 ./pkg/recipe/... ./pkg/bundler/...       # 16 packages, all pass
golangci-lint run -c .golangci.yaml ./pkg/recipe/... ./pkg/bundler/...
make lint-yaml
make check-docs-sidebar
make check-agents-sync

Golden coverage for the new code path:

  • pkg/bundler/deployer/helm/testdata/mixed_with_pre/preManifestFiles-only scenario. Bundle = 001-foo-pre/ (local-helm wrapping Namespace) + 002-foo/ (upstream chart). Locks in lexicographic ordering for the deploy.sh [0-9][0-9][0-9]-*/ glob.
  • pkg/bundler/deployer/helm/testdata/mixed_gpu_operator/ — extended scenario covering pre + primary + post in one bundle (001-gpu-operator-pre/, 002-gpu-operator/, 003-gpu-operator-post/).
  • pkg/bundler/deployer/argocd/testdata/mixed_with_pre/ — Argo mirror. Sync-waves 0, 1, 2 for pre/primary/post; verifies the wave-from-folder-index allocation in argocd.go.

Existing 5 helm and 5 argocd-style goldens remain byte-identical: pre-injection is a no-op for any recipe that doesn't set PreManifestFiles.

Risk Assessment

  • Low — Isolated change, well-tested, easy to revert
  • Medium — Touches multiple components or has broader impact
  • High — Breaking change, affects critical paths, or complex rollout

Rollout notes: Pure additive. No existing recipe sets preManifestFiles, so every shipping bundle's output is byte-identical (verified by the unchanged 10 pre-existing goldens). The mixin is opt-in via leaf-overlay spec.mixins: [os-talos]; no leaf overlay is changed in this PR (recipe-author territory). Revert is a single git revert of the merge commit.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

ManifestFiles applies after the primary chart (sync-wave N+1).
PreManifestFiles applies before (sync-wave N-1) so resources the chart
depends on — like a Namespace with PSS-privileged labels — can land
ahead of the pods that need them.

Merge semantics mirror ManifestFiles exactly: union the base and
overlay lists, deduplicate by path, preserve order. Used by the
os-talos mixin (follow-up commits) to attach per-component
talos/namespace.yaml files in front of each privileged chart.
Introduce manifestPhase enum (pre/post) and a shared
collectComponentManifestsByPhase body so adding pre-manifest support
in follow-up commits doesn't fork the loader. Existing
collectComponentManifests is now a thin wrapper over the post path;
all current call sites stay identical. The new
collectComponentPreManifests companion is unused in this commit and
gets wired into the bundler entry points in the next commit.
localformat.Options, the three deployer Generators (helm, argocd,
argocdhelm), and bundler.go's three call sites all now distinguish
ComponentPreManifests from ComponentPostManifests. The writer keeps
reading only from the post map for now — Task 4 wires the pre-injection
branch — so output is unchanged because the pre map stays empty for
every existing recipe (none sets PreManifestFiles).

Also folds in two cleanups from the Task 2 review:

- iota order swapped so phasePostManifests is the zero value
  (defensive: a future unspecified phase falls through to legacy
  post behavior, not pre).
- A default arm on the switch in collectComponentManifestsByPhase
  returns ErrCodeInternal rather than silently empty results.
- Removed the //nolint:unused on collectComponentPreManifests —
  it's now wired into all three deployer call sites.
Address two findings from the Task 3 code review:

* Both pre- and post-manifest collection paths used the same error
  message ("failed to collect component manifests"), so a stack trace
  couldn't tell which phase failed. Distinct messages now name the
  phase ("...pre-manifests" / "...post-manifests").

* The six error-wrap blocks across the three deployer call sites in
  buildDeployer reproduced what pkg/errors.PropagateOrWrap already
  does. Collapsed each onto a single PropagateOrWrap call. stderrors
  is no longer imported by bundler.go where applicable.

Pure refactor of the error-handling shape — no behavior change beyond
the error message text.
…lder

Add an injectionPhase enum (pre/post) and an Options.injectAuxiliaryFolder
helper used by both phases. Refactored the existing post-injection block
inside KindUpstreamHelm to use the helper (byte-identical output verified
via existing goldens) and added the parallel pre-injection call before
the primary folder emission in every per-component iteration.

The collision check is extended: a component with preManifestFiles
cannot coexist with an explicitly-declared <name>-pre component
(symmetric with the existing <name>-post check). The folder upper
bound rises from 2*N to 3*N to account for the new third folder per
component.

Sync-wave allocation in argocd.go is unchanged — wave = folder index —
so an emitted pre folder lands at wave N-1, primary at N, post at N+1
for free without any argocd code change. Existing recipes don't set
PreManifestFiles, so all current bundles are byte-identical.
…e collision

Address two optional one-line suggestions from the Task 4 code review:

- Note on injectionPhase: explain why it's string-typed (folder-name +
  error-message embedding) given that the sibling manifestPhase in
  pkg/bundler/bundler.go uses int-iota.
- Note on the pre collision check: spell out why there's no
  Repository-guard there (pre injection runs regardless of primary
  kind), so the asymmetry with the post block is intentional rather
  than missing.
TestBundleGolden_MixedWithPre constructs a single-component Generator
with ComponentPreManifests set (a Namespace YAML with PSS-privileged
labels) and an upstream Helm chart. Verifies the bundle emits:

  001-foo-pre/   (local-helm wrapping the namespace manifest)
  002-foo/       (upstream-helm primary)

Locks in the invariant that the pre folder lands BEFORE the primary
(opposite of -post). Pairs with the writer change in commit 1b290883
which added the pre-injection branch via injectAuxiliaryFolder.

Existing 5 helm goldens (upstream_helm_only, manifest_only,
mixed_gpu_operator, kai_scheduler_present, nodewright_present) remain
byte-identical — pre-only scenarios don't perturb them.
…st doc

The helm deployer sequences via filesystem lexicographic order, not
Argo CD sync-waves. Code-quality review of commit 6d1a374b flagged
the cross-deployer 'sync-wave' framing as potentially misleading in
this file. One-line trim of the doc comment on
TestBundleGolden_MixedWithPre.
Extend TestBundleGolden_MixedGPUOperator to also set
ComponentPreManifests (a privileged-gpu-operator Namespace YAML),
so the scenario now exercises the full three-phase emission:

  001-gpu-operator-pre/    (Namespace manifest, NEW)
  002-gpu-operator/         (upstream chart, index shifted from 001)
  003-gpu-operator-post/    (dcgm-exporter, index shifted from 002)

Locks in the lexicographic ordering that deploy.sh's
[0-9][0-9][0-9]-*/ glob iterates at install time. The Argo deployer
will see the same shift via folder-index → sync-wave; that mirror
scenario lands in the next commit.

Existing 5 helm goldens stay byte-identical — only this scenario's
expected output changes.
TestBundleGolden_MixedWithPre freezes the Argo CD bundle output for a
mixed component with both ComponentPreManifests and
ComponentPostManifests set. The bundler emits three folders:

  001-gpu-operator-pre/    (Namespace manifest, sync-wave 0)
  002-gpu-operator/         (upstream chart, sync-wave 1)
  003-gpu-operator-post/    (dcgm-exporter config, sync-wave 2)

Sync-wave comes from folder index (argocd.go:472), so the pre folder
naturally lands at wave 0 ahead of the chart at wave 1 — no argocd
deployer change needed. Locks in the wave-from-index invariant against
future writer refactors.
Five new Namespace manifests under <component>/talos/namespace.yaml,
each carrying pod-security.kubernetes.io/{enforce,audit,warn}=privileged
plus AICR-managed selectors. Referenced from the upcoming os-talos
mixin via componentRefs[].preManifestFiles so they land at
sync-wave N-1 ahead of each chart at wave N.

Components covered: gpu-operator, network-operator, nvsentinel,
nvidia-dra-driver-gpu, nodewright-operator. Out of scope: cloud-
specific drivers (aws-*, gke-*) and nodewright-customizations —
deferred per docs/superpowers/specs/2026-05-11-talos-namespace-mixin-design.md.
Redirects gpu-operator, network-operator, nvsentinel,
nvidia-dra-driver-gpu, and nodewright-operator to their
privileged-<name> namespaces and bundles each component's
talos/namespace.yaml via preManifestFiles. Opted into via a leaf
overlay's spec.mixins: [os-talos]; no leaf overlay change in this
commit (recipe-author territory).
Standalone page documenting what the mixin does, the five
privileged-<name> namespaces it generates, the PSA label set,
apply ordering, and what's out of scope. Cross-linked from
recipe-development.md so recipe authors find it when picking
os: talos. Sidebar entry added in site/.vitepress/config.ts so
check-docs-sidebar passes.
…loy.sh template

The undeploy.sh template on main grew new pre-flight extra-CRD cleanup
logic after this branch's cherry-picks were authored, so the
mixed_with_pre golden's bundled undeploy.sh was 16 lines short. Pure
regenerated-golden delta; no test or production code changed.
…name

Renaming ComponentManifests → ComponentPostManifests in the Test_Mixed
struct literal threw the column alignment off; gofmt straightens it.
Pure formatting; no behavior change.
@coderabbitai

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 12, 2026

Coverage Report ✅

Metric Value
Coverage 77.0%
Threshold 75%
Status Pass
Coverage Badge
![Coverage](https://img.shields.io/badge/coverage-77.0%25-green)

Merging this branch will increase overall coverage

Impacted Packages Coverage Δ 🤖
github.com/NVIDIA/aicr/pkg/bundler 62.36% (+1.04%) 👍
github.com/NVIDIA/aicr/pkg/bundler/deployer/argocd 80.30% (ø)
github.com/NVIDIA/aicr/pkg/bundler/deployer/argocdhelm 80.85% (ø)
github.com/NVIDIA/aicr/pkg/bundler/deployer/helm 84.21% (ø)
github.com/NVIDIA/aicr/pkg/bundler/deployer/localformat 73.83% (+0.97%) 👍
github.com/NVIDIA/aicr/pkg/recipe 89.50% (+0.11%) 👍

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/NVIDIA/aicr/pkg/bundler/bundler.go 58.35% (+1.46%) 413 (+7) 241 (+10) 172 (-3) 👍
github.com/NVIDIA/aicr/pkg/bundler/deployer/argocd/argocd.go 80.30% (ø) 132 106 26
github.com/NVIDIA/aicr/pkg/bundler/deployer/argocdhelm/argocdhelm.go 80.85% (ø) 355 287 68
github.com/NVIDIA/aicr/pkg/bundler/deployer/helm/helm.go 84.21% (ø) 114 96 18
github.com/NVIDIA/aicr/pkg/bundler/deployer/localformat/doc.go 0.00% (ø) 0 0 0
github.com/NVIDIA/aicr/pkg/bundler/deployer/localformat/local_helm.go 73.21% (ø) 56 41 15
github.com/NVIDIA/aicr/pkg/bundler/deployer/localformat/vendor_folder.go 70.53% (ø) 95 67 28
github.com/NVIDIA/aicr/pkg/bundler/deployer/localformat/writer.go 74.11% (+3.64%) 197 (+48) 146 (+41) 51 (+7) 👍
github.com/NVIDIA/aicr/pkg/recipe/metadata.go 96.55% (+0.12%) 232 (+8) 224 (+8) 8 👍
github.com/NVIDIA/aicr/pkg/recipe/metadata_store.go 84.80% (+0.49%) 342 (+17) 290 (+16) 52 (+1) 👍

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Copy link
Copy Markdown
Member

@mchmarny mchmarny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely structured refactor — the phase-parameterized collector and the shared injectAuxiliaryFolder helper keep the writer footprint small, and the goldens lock the lexicographic install order. Mixin authoring is clean, and the docs page is well-scoped.

One blocking issue: the helm-deployer pre-folder pattern (helm install … --namespace privileged-<x> --create-namespace against a chart whose only template is Namespace/privileged-<x>) is a known Helm 3 install conflict — Helm rejects the rendered Namespace as "already exists, missing managed-by=Helm". Every Talos component will hit this on first install via the helm deployer. Argo CD path may tolerate it; Helm path will not. See the comment on 001-foo-pre/install.sh for fix options.

Secondary asks: an e2e test that resolves the os-talos mixin end-to-end, a vendor + pre combination test, and stronger coupling between ComponentRef.namespace and the pre-manifest's Namespace.metadata.name.

CI is green and the goldens cover what they claim — the gap is install-time, not file-shape. Marking as request-changes because the production scenario is the one the goldens don't exercise.

Comment thread pkg/bundler/deployer/localformat/writer.go
Comment thread recipes/mixins/os-talos.yaml
Comment thread recipes/mixins/os-talos.yaml
Comment thread pkg/bundler/deployer/localformat/writer.go
Comment thread pkg/recipe/metadata.go
… only

ADR-005 originally rejected any mixin componentRef whose name already
exists in the inheritance chain. That blocked the os-talos mixin from
contributing namespace + preManifestFiles overrides to components like
gpu-operator that are already declared upstream — exactly what the
mixin needs to do for OS-conditional PSA bookkeeping on Talos.

Relax the rule to a field-scoped carve-out: a mixin componentRef whose
name collides with the chain is allowed if and only if the mixin sets
nothing beyond the additive set {Namespace, ManifestFiles,
PreManifestFiles}. Identity / sourcing fields (Chart, Type, Source,
Version, Tag, Path, ValuesFile, Overrides, Patches, DependencyRefs,
Cleanup, ExpectedResources, HealthCheckAsserts) still produce a hard
error — those are the fields ADR-005's 'Silent constraint override'
risk row was protecting, and the mitigation stays intact for them.

Implementation:
- New mixinComponentRefSafeForMerge() helper inspects an incoming
  mixin entry and returns the first offending identity field, so the
  resolver's error message names the violation precisely.
- mergeMixins() drops the blanket component-name collision check and
  calls the helper on name collisions; constraint name collision check
  is unchanged.

Tests:
- TestMixinComponentRefSafeForMerge: 17 cases covering every additive-
  safe field combination plus every identity field individually.
- TestMixinOSTalos_AppliesPrivilegedNamespacesAndPreManifests: e2e
  check that loading the os-talos mixin from the embedded data and
  applying it to a synthetic spec with concrete chart identities
  produces the five privileged-<name> namespaces and the matching
  talos/namespace.yaml entries on each component's PreManifestFiles.

Docs:
- ADR-005 Phase 3 step 7, the exit criteria, and the Risk Table all
  updated to spell out the carve-out and preserve the rationale.
…>/manifests/

Move the five per-component talos namespace manifests from
recipes/components/<n>/talos/namespace.yaml to
recipes/components/<n>/manifests/talos-namespace.yaml. Aligns with the
established 'manifests/' convention for component-attached YAML and
reuses the existing components/*/manifests/*.yaml embed glob, dropping
the components/*/talos/*.yaml entry I added earlier in the branch.

The flat 'talos-namespace.yaml' naming (rather than a 'manifests/talos/'
sub-directory) avoids growing the embed surface until there's actually
more than one Talos-conditional manifest per component to justify it.

Updated:
- 5 file moves (git tracks as renames; content unchanged)
- recipes/data.go embed glob (drops components/*/talos/*.yaml)
- recipes/mixins/os-talos.yaml (5 preManifestFiles paths)
- docs/integrator/talos-integration.md (manifest-source column + prose)
- pkg/recipe/metadata_store_test.go (TestMixinOSTalos_* path expectations)
- pkg/bundler/deployer/helm/helm_test.go (golden scenarios)
- pkg/bundler/deployer/argocd/argocd_test.go (golden scenario assertion path)
- All three golden trees (mixed_with_pre helm + argocd, extended
  mixed_gpu_operator) regenerated.

Verified end-to-end with the local mydata leaf overlay:
  aicr recipe --data ./mydata --service eks --accelerator h100     --intent inference --os talos -o /tmp/talos.yaml   # 16 components
  aicr bundle -r /tmp/talos.yaml -o bundle             # 111 files
  ls bundle/ | grep gpu-operator
    016-gpu-operator-pre   # Namespace + PSS labels (sync-wave N-1)
    017-gpu-operator        # upstream chart
    018-gpu-operator-post   # dcgm-exporter / etc.
…uard, undeploy -pre mapping

Addresses PR #856 review feedback.

Pre-folder install.sh no longer passes --create-namespace to helm.
The pre-injection chart's Namespace template owns namespace creation;
when --create-namespace ran first the API created the namespace
without Helm-ownership annotations and helm 3 refused to import it
("missing app.kubernetes.io/managed-by=Helm"). The install template
now conditionally emits --create-namespace via a new CreateNamespace
flag; pre folders pass false, primary and post pass true. Vendored
primary folders thread the flag too.

Add bundle-time drift guard: any Namespace doc in a pre-manifest must
target ComponentRef.Namespace. Without this, a typo in the mixin
manifest produces a bundle that installs cleanly past type checks but
blows up at helm install with an opaque "namespace not found".

Wire the helm-deployer undeploy.sh template to map "<name>-pre"
folders to their parent namespace. Previously only primary and
"-post" had mappings, so foo-pre's ns would resolve to "" and the
release was silently skipped, leaving the privileged-namespace
release orphaned.

Generalize the ServerSideApply rationale comment in the Argo CD
Application template so it covers Applications that don't need the
262144-byte annotation-cap workaround (pre-folder Namespaces,
post-folder ConfigMaps) — keeping it uniform across every
Application avoids per-Application template branching.

Other review items:
- collectComponentManifestsByPhase switches to errors.PropagateOrWrap
  so a future provider returning a more specific code (e.g.
  ErrCodeNotFound from a layered provider) isn't masked.
- PreManifestFiles godoc calls out the ".." traversal guarantees from
  the data provider so a recipe author understands the path contract.
- Stale "the writer threads the map through but does not yet emit
  pre-phase folders" comment removed from localformat/doc.go.
- talos-integration.md drops the leading HTML license comments (Fern
  CI rejects them) and labels its layout fence as `text`.
- recipe-development.md clarifies that os-talos appends to spec.mixins
  rather than replaces, and explains that the mixin supplies
  Namespaces via componentRefs[].preManifestFiles.

Tests:
- TestWrite_PreFolderInstallOmitsCreateNamespace pins the install.sh
  flag contract for pre / primary / post folders.
- TestWrite_VendorCharts_PreManifests covers VendorCharts=true with
  ComponentPreManifests so the pre folder is emitted ahead of the
  vendored primary and not recorded in the vendor audit.
- TestWrite_PreManifestNamespaceDrift exercises the new drift guard.
- mixed_gpu_operator fixture namespace updated to
  privileged-gpu-operator to match its Namespace metadata.name; all
  helm/argocd/argocdhelm goldens regenerated.

make qualify passes locally.
coderabbitai[bot]

This comment was marked as resolved.

…S-mixin example

Addresses additional CodeRabbit comments on PR #856.

writer.go: replace the 3*len(Components) > 999 fail-fast with a precise
count of emitted folders using the same pre/primary/post rules the main
loop applies. The old bound rejected valid bundles once a recipe had
more than 333 components even when most components produced only their
primary folder — e.g., 500 plain upstream-Helm components emit 500
NNN-* directories but the old check tripped before any work happened.
Add TestWrite_FolderLimit_CountsEmissionsNotComponents pinning that 400
primary-only components stay under the cap and 1000 exceed it.

recipe-development.md: drop the misleading "[os-ubuntu, os-talos]"
example. OS-scoped mixins are mutually exclusive — combining two is a
recipe authoring error, not a supported composition — so the example
now uses "[platform-kubeflow, os-talos]" alongside an explicit note
that os-ubuntu + os-talos in one overlay is not valid.
coderabbitai[bot]

This comment was marked as resolved.

Addresses CodeRabbit comments on PR #856.

writer_test.go: TestWrite_PreManifestNamespaceDrift's docstring was
sitting above TestWrite_FolderLimit_CountsEmissionsNotComponents after
that test got inserted between the comment and its function. Move the
comment back to its function.

writer.go: validatePreManifestNamespace breaks the inner YAML decode
loop on parse error, which skips any subsequent docs in the same
multi-doc file. Document this in the function comment alongside the
escape hatch (swap break for continue) so a future change to allow
multiple Namespace docs per pre-manifest file has the breadcrumb.
@mchmarny mchmarny merged commit 7f052d3 into main May 13, 2026
57 checks passed
@mchmarny mchmarny deleted the feat/talos_namespace_mixin branch May 13, 2026 10:03
mchmarny added a commit that referenced this pull request May 13, 2026
…pace

KWOK e2e for the GB200 recipes (#868) surfaced a latent bug in PR #856:
pre-injection install.sh unconditionally omits --create-namespace, which
deadlocks any recipe that ships a bare-resource manifest (ConfigMap,
ResourceQuota, etc.) in preManifestFiles. The target namespace doesn't
exist yet — the primary chart's helm install --create-namespace hasn't
run — so the pre-pass install fails with 'namespaces "<n>" not found'.

The original 'pre = no --create-namespace' rule was correct for the
Talos case (chart ships a Namespace template; --create-namespace would
collide with Helm 3 ownership annotations) but wrong as a blanket rule.

Move the decision into writeLocalHelmFolder: scan rendered manifests
for a top-level 'kind: Namespace'; if one is present, downgrade
createNamespace to false; otherwise honor the caller's request. Callers
(injectAuxiliaryFolder, the primary local-helm path) now uniformly ask
for createNamespace=true and let detection decide.

Both call shapes are exercised:
- Talos: ships a Namespace → detection downgrades → install.sh omits
  --create-namespace (unchanged behavior; existing goldens pass)
- GB200 kernel-module-params: ships only a ConfigMap → detection leaves
  it true → install.sh creates 'gpu-operator' namespace before the
  ConfigMap, ahead of the primary chart's own --create-namespace which
  becomes a no-op against the existing namespace

Refs #859, follow-up to #856
@ayuskauskas ayuskauskas mentioned this pull request May 13, 2026
1 task
@coderabbitai coderabbitai Bot mentioned this pull request May 19, 2026
25 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants