Skip to content

feat(bundler): configurable parent Application name (--app-name)#1036

Merged
mchmarny merged 4 commits into
mainfrom
feat/configurable-app-name-1011
May 26, 2026
Merged

feat(bundler): configurable parent Application name (--app-name)#1036
mchmarny merged 4 commits into
mainfrom
feat/configurable-app-name-1011

Conversation

@mchmarny
Copy link
Copy Markdown
Member

Summary

Operators can now deploy multiple non-overlapping AICR bundles to the same Argo CD namespace by passing a distinct --app-name (or app-name query parameter) per bundle. The flag flows through CLI, API, and config-file surfaces and is honored by both the argocd and argocd-helm deployers.

Motivation / Context

The argocd and argocd-helm deployers hardcoded the parent Argo Application name (nvidia-stack and aicr-stack). Two bundles installed into the same Argo CD namespace would each create a parent Application with the same name — the second overwrote the first, orphaning every child Application that the first had been managing. This blocks the multi-bundle topology intended for clusters that consume more than one recipe.

Fixes: #1011

Type of Change

  • New feature (non-breaking change that adds functionality)

Component(s) Affected

  • CLI (cmd/aicr, pkg/cli)
  • API server (cmd/aicrd, pkg/api, pkg/server)
  • Bundlers (pkg/bundler, pkg/component/*)
  • Docs/examples (docs/, examples/)

Implementation Notes

Plumbing: appName flows through pkg/bundler/config.Config (private field, AppName() getter, WithAppName option) and pkg/config/DeploymentSpec.AppName (YAML/JSON appName). BundleSpec.Resolve validates the value as a DNS-1123 subdomain at the wire-to-typed boundary.

Validation: config.ValidateAppName (in pkg/bundler/config) returns ErrCodeInvalidRequest for non-DNS-1123 input. Empty is allowed and means "use the deployer's default." CLI/API both call it before constructing the bundler config.

Deployer behavior:

  • argocd-helm: parent App template reads {{ .Values.appName | default "aicr-stack" | quote }}. When --app-name is supplied, the value is written into the bundle's root values.yaml so it is the chart default; helm install --set appName=... still wins. This matches the URL-portable contract that argocd-helm already follows for repoURL/targetRevision.
  • argocd: the value is threaded into AppOfAppsData.AppName and ReadmeData.AppName, baked into the rendered app-of-apps.yaml and the README's argocd app get/sync examples at bundle time. This deployer materializes a static manifest, not a chart, so the choice cannot be deferred to apply time.

Strictness: --app-name / app-name is rejected on --deployer helm and --deployer flux (CLI returns ErrCodeInvalidRequest; API returns 400). Silently accepting it would mislead operators expecting their flag to take effect.

Template filename: kept fixed at templates/aicr-stack.yaml for argocd-helm even when --app-name is overridden. The rendered metadata.name is dynamic via .Values.appName; the filename is bundle-internal and stable across install-time --set overrides.

Out of scope (per #1011 thread):

Testing

make qualify   # tests + lint + e2e + scan + license headers — all passed

New tests:

  • config.TestWithAppName, config.TestValidateAppName — option round-trip + DNS-1123 validation table
  • argocdhelm.TestHelmTemplate_AppNameOverride — live helm template covering default, bundle-time AppName, install-time --set appName, and bundle-time + install-time-override combo
  • argocd.TestGenerate_AppName — default vs explicit AppName flows into both app-of-apps.yaml and README
  • cli.TestParseBundleCmdOptions_AppName — flag parsing across argocd, argocd-helm, helm, flux deployers + invalid DNS rejection
  • bundler.TestBundleRequestAppNameParam — API query-param accepted on argocd/argocd-helm, 400 on helm + invalid names
  • Argocdhelm goldens regenerated for the new .Values.appName template line

Risk Assessment

  • Low — Defaults preserve existing behavior verbatim (aicr-stack / nvidia-stack still in the rendered output for any caller that omits --app-name). The flag is gated by deployer kind so a typo on a helm/flux bundle fails loudly. Easy to revert.

Rollout notes: No migration required. Existing recipes and config files continue to bundle identically. Adopters with two bundles in one Argo CD namespace can now set distinct --app-name values.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed (CLI reference, API reference, OpenAPI spec)
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

The argocd and argocd-helm deployers hardcoded the parent Argo
Application name (`aicr-stack` and `nvidia-stack`), so two AICR bundles
deployed to the same Argo CD namespace silently overwrote each other —
the second bundle replaced the first parent Application and orphaned
the first bundle's children. This blocks the multi-bundle topology
intended for clusters that consume more than one recipe.

This change adds a new `appName` plumbing surface that flows through
the bundler config:

  - CLI: `aicr bundle --app-name <name>`
  - API: `POST /v1/bundle?app-name=<name>`
  - Config: `spec.bundle.deployment.appName`

Defaults (`aicr-stack` for argocd-helm, `nvidia-stack` for argocd) are
preserved for backward compatibility. The name is validated as a
DNS-1123 subdomain at the parse boundary so an invalid value fails fast
in `aicr bundle` / 400 from the API, not at apiserver admission time.
Rejected on `--deployer helm` / `--deployer flux` so a misplaced flag
fails loudly.

For `--deployer argocd-helm`, the bundle-time value is written into
the chart's root values.yaml and the parent App template reads from
`{{ .Values.appName | default "aicr-stack" | quote }}`, so an operator
can still override at install time:

    helm install gpu-stack . --set appName=ops-runtime ...

For `--deployer argocd`, the value is baked into the rendered
app-of-apps.yaml — that deployer materializes a static manifest, not a
chart, so the choice cannot be deferred to apply time. The generated
README's `argocd app get/sync` examples are templated to match.

Out of scope (per #1011 thread):
  - Flux template parent-Kustomization name (separate doc-only fix)
  - argocd-helm chart name `aicr-bundle` (already fixed in #1032)

Fixes #1011
@github-actions
Copy link
Copy Markdown
Contributor

@coderabbitai

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

Coverage Report ✅

Metric Value
Coverage 77.0%
Threshold 75%
Status Pass
Coverage Badge
![Coverage](https://img.shields.io/badge/coverage-77.0%25-green)

Merging this branch changes the coverage (1 decrease, 5 increase)

Impacted Packages Coverage Δ 🤖
github.com/NVIDIA/aicr/pkg/bundler 69.34% (+1.50%) 👍
github.com/NVIDIA/aicr/pkg/bundler/config 97.69% (+0.09%) 👍
github.com/NVIDIA/aicr/pkg/bundler/deployer/argocd 81.21% (+0.71%) 👍
github.com/NVIDIA/aicr/pkg/bundler/deployer/argocdhelm 81.82% (+0.20%) 👍
github.com/NVIDIA/aicr/pkg/cli 65.20% (+0.22%) 👍
github.com/NVIDIA/aicr/pkg/config 92.68% (-0.77%) 👎

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/NVIDIA/aicr/pkg/bundler/bundler.go 67.24% (+1.16%) 519 349 (+6) 170 (-6) 👍
github.com/NVIDIA/aicr/pkg/bundler/config/config.go 97.31% (+0.12%) 186 (+8) 181 (+8) 5 👍
github.com/NVIDIA/aicr/pkg/bundler/deployer/argocd/argocd.go 81.21% (+0.71%) 165 (+6) 134 (+6) 31 👍
github.com/NVIDIA/aicr/pkg/bundler/deployer/argocdhelm/argocdhelm.go 81.82% (+0.20%) 374 (+4) 306 (+4) 68 👍
github.com/NVIDIA/aicr/pkg/bundler/handler.go 76.92% (+2.47%) 143 (+6) 110 (+8) 33 (-2) 👍
github.com/NVIDIA/aicr/pkg/cli/bundle.go 47.31% (+2.58%) 167 (+6) 79 (+7) 88 (-1) 👍
github.com/NVIDIA/aicr/pkg/config/config.go 0.00% (ø) 0 0 0
github.com/NVIDIA/aicr/pkg/config/resolve.go 94.79% (-1.48%) 192 (+4) 182 (+1) 10 (+3) 👎

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

@mchmarny mchmarny enabled auto-merge (squash) May 26, 2026 19:45
CodeRabbit review on #1036:

1. cli-reference.md `--app-name` entry omitted `helmfile` from the
   "rejected on" list — corrected to mention all non-Argo deployers
   (matches the API docs' "Rejected on other deployers" wording).

2. argocd.Generator.Generate / argocdhelm.Generator.Generate now
   validate AppName at the deployer boundary via
   bundlercfg.ValidateAppName. Direct library callers (bypassing the
   CLI/API validation layer) can no longer ship a manifest whose name
   would only be rejected at apiserver admission. Empty still resolves
   to the deployer's default. New TestGenerate_AppNameValidatedAtBoundary
   pins this in both deployers.

3. `--app-name` was missing from runBundleCmd's validateSingleValueFlags
   list — added so repeated `--app-name first --app-name second` is
   rejected. New chainsaw step in tests/chainsaw/cli/duplicate-flags
   exercises the integration path end-to-end.
@yuanchen8911 yuanchen8911 self-requested a review May 26, 2026 19:58
@mchmarny mchmarny requested a review from lalitadithya May 26, 2026 20:00
@mchmarny mchmarny merged commit 971024e into main May 26, 2026
33 checks passed
@mchmarny mchmarny deleted the feat/configurable-app-name-1011 branch May 26, 2026 20:04
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request May 26, 2026
The argocd-helm-oci wrapper script was passing the FULL bundle URL to
`helm install --set repoURL=…` (including the per-recipe chart name
at the end). That matched the pre-PR-NVIDIA#1032 contract where the parent
Application's `source.chart` was hardcoded to `aicr-bundle`.

PR NVIDIA#1032 (and NVIDIA#1035's reinforcement) changed the parent App template
to expect the parent-namespace-only repoURL and to append .Chart.Name
itself via the separate `source.chart` field. The wrapper script
wasn't updated to match. Result on every PR with argocd-helm-oci
Tier-1 KWOK coverage: the parent App resolves to
`oci://registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>`,
the OCI artifact lookup 404s, gpu-operator-post's Application can
never sync, and the whole stack times out on
`GitOps sync timeout strike 1/3`.

The failure was masked on `main` because the most-recent KWOK
Cluster Validation run on `main` (#26469449378 at 0d3e62d, success)
ran *before* PR NVIDIA#1035 merged. After NVIDIA#1035 / NVIDIA#1036 / NVIDIA#1038 all landed
on main, no fresh KWOK run has triggered on `main` yet — but the
next one will fail the same way every open PR's argocd-helm-oci
Tier-1 jobs are currently failing.

Fix is a one-line drop of the per-recipe suffix from
OCI_IN_CLUSTER_REF in the argocd-helm-oci branch of generate_bundle.
The flux branch keeps the per-recipe suffix because flux's
OCIRepository CR consumes the FULL artifact URL (recipe segment
included). Updated the surrounding comment to point at the
post-NVIDIA#1032 contract so the next reader understands the asymmetry.

End-to-end check (verified from PR NVIDIA#1030's debug artifact at
b3f2296): repo-server log shows
`registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>: not found`,
caused by the same double-append. With the recipe suffix dropped,
Argo's resolution `<repoURL>/<chart>:<tag>` aligns with the pushed
artifact at `oci://…/aicr/<recipe>:<tag>`.

Refs PR NVIDIA#1030 (where this surfaced), PR NVIDIA#1032 (contract change),
PR NVIDIA#1035 (parent App template enforcement).
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request May 26, 2026
The argocd-helm-oci wrapper script was passing the FULL bundle URL to
`helm install --set repoURL=…` (including the per-recipe chart name
at the end). That matched the pre-PR-NVIDIA#1032 contract where the parent
Application's `source.chart` was hardcoded to `aicr-bundle`.

PR NVIDIA#1032 (and NVIDIA#1035's reinforcement) changed the parent App template
to expect the parent-namespace-only repoURL and to append .Chart.Name
itself via the separate `source.chart` field. The wrapper script
wasn't updated to match. Result on every PR with argocd-helm-oci
Tier-1 KWOK coverage: the parent App resolves to
`oci://registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>`,
the OCI artifact lookup 404s, gpu-operator-post's Application can
never sync, and the whole stack times out on
`GitOps sync timeout strike 1/3`.

The failure was masked on `main` because the most-recent KWOK
Cluster Validation run on `main` (#26469449378 at 0d3e62d, success)
ran *before* PR NVIDIA#1035 merged. After NVIDIA#1035 / NVIDIA#1036 / NVIDIA#1038 all landed
on main, no fresh KWOK run has triggered on `main` yet — but the
next one will fail the same way every open PR's argocd-helm-oci
Tier-1 jobs are currently failing.

Fix is a one-line drop of the per-recipe suffix from
OCI_IN_CLUSTER_REF in the argocd-helm-oci branch of generate_bundle.
The flux branch keeps the per-recipe suffix because flux's
OCIRepository CR consumes the FULL artifact URL (recipe segment
included). Updated the surrounding comment to point at the
post-NVIDIA#1032 contract so the next reader understands the asymmetry.

End-to-end check (verified from PR NVIDIA#1030's debug artifact at
b3f2296): repo-server log shows
`registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>: not found`,
caused by the same double-append. With the recipe suffix dropped,
Argo's resolution `<repoURL>/<chart>:<tag>` aligns with the pushed
artifact at `oci://…/aicr/<recipe>:<tag>`.

Refs PR NVIDIA#1030 (where this surfaced), PR NVIDIA#1032 (contract change),
PR NVIDIA#1035 (parent App template enforcement).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(bundler): configurable parent Application name for multi-bundle clusters

3 participants