fix(bundler): synthesize GKE ResourceQuota for critical-priority pods #921
GKE Standard ships a kube-system ResourceQuota scoped to system-*-critical PriorityClasses. Per the Kubernetes spec, once any cluster-wide quota scopes by PriorityClass for those values, pods that request a matching priority class can only be created in namespaces that have a matching quota. AICR bundles emit gpu-operator (and any other chart that defaults priorityClassName to system-node-critical) into per-component namespaces, so admission blocks every pod and `helmfile apply --wait` times out after ~10 minutes with an opaque "Replicas: 0/1" error.

Fix: add a `gkeCriticalPriority` field to the registry schema. When the recipe's criteria.service is "gke" and a referenced component declares it, the bundler synthesizes a permissive ResourceQuota (pods = max(criteria.Nodes × 32, 32)) into the component's namespace via PreManifestFiles so the quota exists before the chart's pods attempt admission.

Marks gpu-operator today; supersedes the overlay-driven manifestFiles entry in gke-cos.yaml (deleted).

Demo cuj1-gke-config.md: drops the manual kubectl apply workaround.

Fixes #915
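As a rough illustration of the fix described above, the sizing rule and the shape of the synthesized manifest might look like the sketch below. The function names `quotaPods` and `renderQuota`, the quota name `gke-critical-pods`, and the `gpu-operator` namespace are assumptions made for this example, not the identifiers used in `pkg/bundler`; the YAML is a hand-written template that follows the core/v1 `ResourceQuota` schema.

```go
package main

import "fmt"

// quotaPods applies the sizing rule from the PR description:
// pods = max(nodes * 32, 32). The floor covers recipes with no node count.
func quotaPods(nodes int) int {
	const perNode, floor = 32, 32
	if pods := nodes * perNode; pods > floor {
		return pods
	}
	return floor
}

// renderQuota returns a ResourceQuota manifest scoped to the two
// system-*-critical PriorityClasses. The metadata.name is hypothetical.
func renderQuota(namespace string, nodes int) string {
	return fmt.Sprintf(`apiVersion: v1
kind: ResourceQuota
metadata:
  name: gke-critical-pods
  namespace: %s
spec:
  hard:
    pods: "%d"
  scopeSelector:
    matchExpressions:
      - scopeName: PriorityClass
        operator: In
        values:
          - system-node-critical
          - system-cluster-critical
`, namespace, quotaPods(nodes))
}

func main() {
	// An 8-node recipe yields a cap of 256 pods in the component namespace.
	fmt.Print(renderQuota("gpu-operator", 8))
}
```

The cap is deliberately generous: the quota only needs to exist in the namespace for admission to succeed, so the number acts as an allowlist rather than a capacity limit.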
Actionable comments posted: 1
Inline comments:
In `@pkg/bundler/bundler.go`:
- Around line 1305-1330: The current renderGKECriticalPriorityQuota builds the
quota map and calls yaml.Marshal(quota) which yields non-deterministic YAML;
replace that call with serializer.MarshalYAMLDeterministic(quota) and update
imports to include the serializer package used elsewhere in the repo (add the
serializer import to the file's import block). Keep the same quota variable and
return signature so only the call site and imports change.
📒 Files selected for processing (7)
- demos/cuj1-gke-config.md
- pkg/bundler/bundler.go
- pkg/bundler/bundler_gke_quota_test.go
- pkg/recipe/components.go
- recipes/components/gpu-operator/manifests/gke-resource-quota.yaml
- recipes/overlays/gke-cos.yaml
- recipes/registry.yaml
💤 Files with no reviewable changes (2)
- recipes/components/gpu-operator/manifests/gke-resource-quota.yaml
- demos/cuj1-gke-config.md
Per CodeRabbit review on #921. Switches the synthesized ResourceQuota from yaml.Marshal to serializer.MarshalYAMLDeterministic so the bytes are stable across runs — the manifest lands in the bundle artifact (checksummed and optionally attested), and yaml.v3 walks randomized Go map order. Adds TestRenderGKECriticalPriorityQuota_Deterministic to guard against accidental regression.
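To illustrate why stable bytes matter for a checksummed artifact, here is a toy sketch of deterministic rendering: keys are sorted before emission so the output never depends on Go's randomized map iteration order. This is not the repository's `serializer.MarshalYAMLDeterministic`, whose implementation is not shown in this thread; `renderSorted` is a hypothetical helper that only handles flat string maps and does no YAML quoting.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderSorted emits "key: value" lines in sorted key order, so repeated
// calls on the same map always produce identical bytes.
func renderSorted(m map[string]string) string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s: %s\n", k, m[k])
	}
	return b.String()
}

func main() {
	fields := map[string]string{"kind": "ResourceQuota", "apiVersion": "v1", "name": "example"}

	// Same idea as a determinism regression test: render repeatedly and
	// fail if any run differs from the first.
	first := renderSorted(fields)
	for i := 0; i < 100; i++ {
		if renderSorted(fields) != first {
			panic("output changed between runs")
		}
	}
	fmt.Print(first)
}
```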
…al-aware dependsOn

The flux deployer never consumed `ComponentPreManifests`, so the synthesized GKE critical-priority `ResourceQuota` from PR NVIDIA#921 was silently dropped on `--deployer flux` bundles and the original symptom from NVIDIA#915 reproduced. The same gap also blocked the os-talos mixin from working on flux.

Add `ComponentPreManifests` to `flux.Generator`, wire it through `bundler.buildDeployer`, and emit a `<name>-pre` HelmRelease ahead of the primary when pre-manifests exist. Rewire the primary's `dependsOn` to point at `<name>-pre` so the chain is `previous → <name>-pre → <name> → <name>-post → next`.

Also fix a pre-existing ordering bug: the next component used to depend on the previous component's primary name, not its terminal (`<prev>-post` for mixed components). New `terminalReleaseNameFor` helper resolves the correct tail; `buildPrimaryDependsOn` and the README renderer both use it. Without this fix, Flux could reconcile the next component in parallel with the previous component's post manifests.

Add a `<name>-pre` collision guard mirroring the rule in `pkg/bundler/deployer/localformat/writer.go`.

Tests: pre-only, pre+post with terminal-aware next-component dependsOn (regression guard), collision rejection, and a refreshed `TestBuildPrimaryDependsOn` covering mixed, manifest-only, and chart-only previous components.

Docs: `pkg/bundler/deployer/flux/doc.go` describes the chain and the example tree now includes `<name>-pre/`; `docs/user/cli-reference.md` updates the Flux Deployment Order bullet.

Fixes: NVIDIA#923
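The dependency chain described in this commit can be sketched as follows. This is a simplified model under stated assumptions: the `component` struct, `terminalRelease`, `primaryDependsOn`, and `releaseChain` are hypothetical names loosely modeled on the helpers mentioned above, the component names are illustrative, and manifest-only components are not modeled.

```go
package main

import (
	"fmt"
	"strings"
)

// component is a hypothetical, reduced view of a bundle component.
type component struct {
	Name    string
	HasPre  bool // pre-manifests exist, so a <name>-pre release is emitted
	HasPost bool // post-manifests exist, so a <name>-post release is emitted
}

// terminalRelease returns the last release a component produces. The next
// component must wait on this name, not on the primary.
func terminalRelease(c component) string {
	if c.HasPost {
		return c.Name + "-post"
	}
	return c.Name
}

// primaryDependsOn returns the release that the primary of components[i]
// waits on, or "" when it has no dependency.
func primaryDependsOn(components []component, i int) string {
	if components[i].HasPre {
		return components[i].Name + "-pre"
	}
	if i == 0 {
		return ""
	}
	return terminalRelease(components[i-1])
}

// releaseChain lists every release in reconcile order.
func releaseChain(components []component) []string {
	var chain []string
	for _, c := range components {
		if c.HasPre {
			chain = append(chain, c.Name+"-pre")
		}
		chain = append(chain, c.Name)
		if c.HasPost {
			chain = append(chain, c.Name+"-post")
		}
	}
	return chain
}

func main() {
	comps := []component{
		{Name: "cert-manager"},
		{Name: "gpu-operator", HasPre: true, HasPost: true},
		{Name: "network-operator"},
	}
	fmt.Println(strings.Join(releaseChain(comps), " → "))
	fmt.Println("network-operator waits on:", primaryDependsOn(comps, 2))
}
```

Resolving the tail through one helper is what closes the ordering bug: a component that follows a mixed component waits on `<prev>-post` rather than on `<prev>`.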
Summary
GKE Standard ships a kube-system `ResourceQuota` scoped to `system-*-critical` PriorityClasses. Per the Kubernetes spec, once any cluster-wide quota scopes by `PriorityClass` for those values, pods that request a matching priority class can only be created in namespaces that have a matching quota. AICR bundles emit `gpu-operator` (and any chart that defaults `priorityClassName` to `system-node-critical`) into per-component namespaces, so admission blocks every pod and `helmfile apply --wait` times out after ~10 minutes with an opaque `Replicas: 0/1` error.

Fixes: #915
Related: N/A
Type of Change
Component(s) Affected
- `pkg/bundler`, `pkg/component/*`
- `pkg/recipe`
- `docs/`, `examples/`

Implementation Notes
- `gkeCriticalPriority` field on registry component entries (`pkg/recipe/components.go`). gpu-operator is marked today; future components that emit `system-*-critical` pods opt in by setting the same field.
- `pkg/bundler/bundler.go::injectGKECriticalPriorityQuotas` runs inside `collectComponentPreManifests`, so every deployer (helm, helmfile, argocd, argocd-helm, flux) benefits without per-deployer branching. The synthesized YAML is added under filename `aicr/synthesized/gke-critical-pods-quota.yaml` in the per-component pre-manifest map; the directory prefix prevents collision with any real `PreManifestFiles` path declared by an overlay.
- Pod cap: `max(criteria.Nodes × 32, 32)`. The 32-per-node multiplier covers gpu-operator's ~8–10 critical-priority DaemonSets per GPU node (driver, toolkit, device-plugin, GFD, DCGM, DCGM exporter, MIG manager, validator) plus the controller Deployment, with headroom for rolling-update churn. The 32-pod floor handles recipes that did not declare `--nodes` (e.g., demo configs). For a 2,000-node cluster the cap is 64,000, which is still an admission allowlist, not a real capacity cap.
- The previous `recipes/components/gpu-operator/manifests/gke-resource-quota.yaml` was in `ManifestFiles`, applied AFTER the chart. That is why the demo workaround had users `kubectl apply` the quota manually before `helmfile apply`.
- `injectGKECriticalPriorityQuotas` short-circuits when `criteria.Service != gke`, so EKS / AKS / OKE / bare-metal / kind bundles never see the synthesized manifest.
- Removed `recipes/components/gpu-operator/manifests/gke-resource-quota.yaml` (file) and the corresponding `manifestFiles` entry in `recipes/overlays/gke-cos.yaml`. The bundler now handles this for every GKE overlay (including future `gke-ubuntu` etc.), not just gke-cos.

Testing
New tests in `pkg/bundler/bundler_gke_quota_test.go`:
- `TestComputeGKECriticalPriorityQuotaPods` (6 sub-cases): floor for 0/negative, identity at 1 node, computed for 8/100/2000 nodes.
- `TestRenderGKECriticalPriorityQuota`: shape pin on apiVersion/kind/metadata/spec/scopeSelector.
- `TestInjectGKECriticalPriorityQuotas` (6 sub-cases): gke+marked+nodes → cap; gke+marked+zero → floor; non-gke+marked → no-op; gke+unmarked → no-op; gke+marked+empty-namespace → skipped with warning; gke+mixed → only marked component synthesized.
- `TestInjectGKECriticalPriorityQuotas_CoexistsWithExistingPreManifests`: additive merge with existing `PreManifestFiles`.
- `TestInjectGKECriticalPriorityQuotas_NilInputs`: nil-tolerant contract.

Coverage on `pkg/bundler`: stable, full project coverage 76.4% (over 75% threshold).

Risk Assessment
Rollout notes: Additive on non-GKE recipes (short-circuits). For GKE recipes, the synthesized quota is applied before the chart, fixing the previously-broken default flow. No new CLI flags, no API surface changes. Existing GKE-COS overlays no longer need the manual `manifestFiles` entry, which is removed in the same PR.

Checklist
- `make test` with `-race`
- `make lint`
- `git commit -S`
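For reference, the injection behaviour described in the Implementation Notes and Rollout notes (short-circuit on non-GKE services, opt-in per component, additive merge under a prefixed filename) can be sketched as below. Only the filename `aicr/synthesized/gke-critical-pods-quota.yaml` comes from this PR; the `component` struct, the `injectQuotas` signature, and the placeholder manifest body are assumptions for illustration, and the sketch skips empty namespaces silently where the real code is described as logging a warning.

```go
package main

import "fmt"

// Filename taken from the PR; the directory prefix avoids collisions with
// overlay-declared pre-manifest paths.
const synthesizedQuotaFile = "aicr/synthesized/gke-critical-pods-quota.yaml"

// component is a hypothetical, reduced view of a registry entry.
type component struct {
	Name, Namespace     string
	GKECriticalPriority bool
}

// injectQuotas adds a placeholder quota manifest to the pre-manifest map
// (component name → filename → content) for marked components on GKE.
func injectQuotas(service string, nodes int, comps []component,
	pre map[string]map[string][]byte) map[string]map[string][]byte {
	if service != "gke" {
		return pre // non-GKE bundles are left untouched
	}
	if pre == nil {
		pre = map[string]map[string][]byte{}
	}
	pods := nodes * 32
	if pods < 32 {
		pods = 32
	}
	for _, c := range comps {
		if !c.GKECriticalPriority || c.Namespace == "" {
			continue // unmarked, or no namespace to place the quota in
		}
		if pre[c.Name] == nil {
			pre[c.Name] = map[string][]byte{}
		}
		// Additive: existing files for this component are preserved.
		pre[c.Name][synthesizedQuotaFile] = []byte(
			fmt.Sprintf("# ResourceQuota for %s, pods: %d\n", c.Namespace, pods))
	}
	return pre
}

func main() {
	comps := []component{
		{Name: "gpu-operator", Namespace: "gpu-operator", GKECriticalPriority: true},
		{Name: "cert-manager", Namespace: "cert-manager"},
	}
	for name, files := range injectQuotas("gke", 8, comps, nil) {
		for file, body := range files {
			fmt.Printf("%s/%s: %s", name, file, body)
		}
	}
}
```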