fix(bundler): fix bugs - assemble full OCI artifact in path-based child apps; quote chart name (#1034)#1035
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: 📒 Files selected for processing (6)
📝 WalkthroughWalkthroughThis PR fixes two correctness bugs in the argocd-helm bundler's OCI publishing path. First, it updates the templating logic for path-based child Applications to append Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
…ote chart name (NVIDIA#1034) Two argocd-helm OCI publishing bugs surfaced by Codex review against the post-NVIDIA#1032 contract. P1 — Path-based child Applications resolve to a non-existent artifact NVIDIA#1032 changed the --set repoURL contract so callers pass only the parent namespace (e.g., oci://reg/org); the parent Application appends .Chart.Name via its separate source.chart field. The parent-App template at parentAppTemplate / line 407 implements this correctly, but the path-based child-App template in injectValuesIntoSingleSource at line 703 was left emitting only .Values.repoURL. Argo CD's generic OCI source (used by path-based children since they have no source.chart) treats spec.source.repoURL as the full artifact reference and resolves it directly, so under the new contract a child source pointed at oci://reg/org:tag — an artifact that doesn't exist — and the child Application failed to sync. Fix: append /{{ .Chart.Name }} to the rendered repoURL value so the assembled URL matches the artifact the parent App's repoURL/chart:tag triple resolves to. The error-message text in the required directive is updated to say "this template appends .Chart.Name" (the path-based template is doing the appending, not the parent App). P2 — Unquoted name and version in generated Chart.yaml writeChartYAML emitted "name: %s" / "version: %s" with raw fmt.Fprintf. (*Reference).ChartName() returns path.Base(Repository), so a valid OCI artifact path whose last segment is a YAML reserved scalar — "null", "true", "false", "yes", "no", numeric strings — produced an unquoted YAML reserved word as the chart's name. Helm's loader then parsed name: null as YAML null, chart.Metadata.Name became empty, and helm package / helm push rejected the chart with "chart.metadata.name is required". Same trap for the version field ("1.0" parses as float). Fix: emit both via %q so the values round-trip as YAML strings. OCI artifact path segments are constrained to printable ASCII by the docker reference grammar, which is the safe charset for Go's %q. Tests - TestInjectValuesIntoSingleSource_AppendsChartName — asserts the rendered path-based child source.repoURL appends /{{ .Chart.Name }} after the user-supplied .Values.repoURL. - TestWriteChartYAML_QuotesYAMLReservedScalarsAsName — table-driven case across null / true / false / numeric / yes / normal name; for each, writes Chart.yaml and verifies yaml.Unmarshal round-trips name back as the expected string. - TestHelmTemplate_RendersWithSetRepoURL — updated to exercise the new contract: --set repoURL=oci://reg/org (parent namespace only). Asserts the parent App's repoURL equals the parent namespace and the child App's repoURL equals parent-namespace + /aicr-bundle. - TestGenerate_CustomChartName and the existing Chart.yaml-version assertion updated to expect quoted scalars. - Golden templates and Chart.yaml fixtures regenerated. Closes NVIDIA#1034 Related: NVIDIA#1032 (the contract change this PR completes), NVIDIA#1019, NVIDIA#965
54709ad to
882d91a
Compare
mchmarny
left a comment
There was a problem hiding this comment.
Both fixes look correct and the test coverage is thorough.
P1 (path-based child repoURL): Append of /{{ .Chart.Name }} matches the artifact the parent App's repoURL/chart:tag triple resolves to (g.chartName() flows into both the parent's source.chart and the chart's Chart.yaml, so they can't drift). The updated required message accurately describes child-template behavior. The TestHelmTemplate_RendersWithSetRepoURL change is the right canary — it now exercises the new contract end-to-end via real helm template.
P2 (Chart.yaml quoting): %q is safe here because OCI artifact path segments are constrained to printable ASCII by the docker reference grammar, which is exactly Go's %q safe charset. Defense-in-depth quoting for version is also sound — Helm's semver validator would reject most reserved scalars downstream, but failing at Chart.yaml parse time with a clear YAML round-trip is friendlier than a Helm push error.
Two nits inline (trailing-slash robustness on the rendered template, and adding a version case to the reserved-scalar table) — neither blocks merge. All 22 chainsaw tests and the rest of CI are green.
| Tag: yamlStringTag, | ||
| Style: yaml.SingleQuotedStyle, | ||
| Value: `{{ required "repoURL is required: pass --set repoURL=<parent namespace> (e.g., oci://<registry>/<path>) — do NOT include the chart name; the parent App appends .Chart.Name to assemble the full OCI artifact reference" .Values.repoURL }}`, | ||
| Value: `{{ required "repoURL is required: pass --set repoURL=<parent namespace> (e.g., oci://<registry>/<path>) — do NOT include the chart name; this template appends .Chart.Name to assemble the full OCI artifact reference" .Values.repoURL }}/{{ .Chart.Name }}`, |
There was a problem hiding this comment.
nit: if a user passes --set repoURL=oci://reg/org/ (trailing slash, easy to copy-paste from a registry UI), this renders oci://reg/org//aicr-bundle — Argo CD will fail to sync with no obvious hint that the slash is the culprit. Cheap fix:
Value: `{{ required "..." .Values.repoURL | trimSuffix "/" }}/{{ .Chart.Name }}`
Same robustness for the parent App's source.chart/source.repoURL pair would be worth a follow-up if it doesn't already do this. Not a blocker.
There was a problem hiding this comment.
Addressed in d8adac9 — the rendered value is now {{ required "..." .Values.repoURL | trimSuffix "/" }}/{{ .Chart.Name }}. So --set repoURL=oci://reg/org/ (with trailing slash) renders cleanly as oci://reg/org/aicr-bundle instead of the double-slash artifact path. Goldens regenerated.
Good catch on the parent App's source.chart / source.repoURL pair too — I'll file that as a follow-up issue since the parent template would need its own trimSuffix treatment and the same caveat applies (silent Argo-CD sync failure with no operator hint).
| } | ||
| }) | ||
| } | ||
| } |
There was a problem hiding this comment.
nit: the table covers the "name is a YAML reserved scalar" case but the same risk applies to version — a user-supplied version like "1.0" (no patch) renders as the YAML float 1 once Helm reparses, which is what makes version: "%q" defense-in-depth and not just cosmetic. Worth adding a case here (e.g., chartName: "aicr-bundle", version: "1.0") so the version branch of writeChartYAML is exercised by the same test that documents the rationale.
There was a problem hiding this comment.
Addressed in d8adac9 — restructured the table to take an explicit (chartName, chartVersion) pair and added rows for float-looking ("1.0") and numeric-looking ("123") versions alongside the existing reserved-scalar name rows. The assertion now round-trips both fields through yaml.Unmarshal and checks each against its exact input string, so the version branch of writeChartYAML's %q wrap is exercised by the same test that documents the rationale.
Two non-blocking nits from Mark's review:
Nit 1 — trailing-slash robustness on path-based child repoURL
--set repoURL=oci://reg/org/ (easy to copy-paste from a registry
UI) used to render oci://reg/org//aicr-bundle, which Argo CD fails
to sync with no obvious operator-facing hint that the trailing
slash is the culprit. Pipe .Values.repoURL through trimSuffix "/"
before appending /{{ .Chart.Name }}. Golden fixtures regenerated.
Nit 2 — exercise the version branch of writeChartYAML
The reserved-scalar table only covered name; a user-supplied
version like "1.0" reparses as the YAML float 1 without the %q
wrap, so the version-side quoting is load-bearing, not cosmetic.
Restructured the table to take an explicit (chartName,
chartVersion) pair; added float-looking and numeric-looking
version rows that round-trip the version field too. The
assertion now checks each input round-trips back to its exact
string form on YAML unmarshal.
Drive-by: relaxed two pre-existing substring assertions that
hardcoded `.Values.repoURL }}` to just `.Values.repoURL` so they
survive the new trimSuffix pipeline insertion without losing the
load-bearing intent (they verify the directive references the
caller's value; the exact pipeline shape is documented elsewhere).
Refs NVIDIA#1034, PR NVIDIA#1035
The argocd-helm-oci wrapper script was passing the FULL bundle URL to `helm install --set repoURL=…` (including the per-recipe chart name at the end). That matched the pre-PR-NVIDIA#1032 contract where the parent Application's `source.chart` was hardcoded to `aicr-bundle`. PR NVIDIA#1032 (and NVIDIA#1035's reinforcement) changed the parent App template to expect the parent-namespace-only repoURL and to append .Chart.Name itself via the separate `source.chart` field. The wrapper script wasn't updated to match. Result on every PR with argocd-helm-oci Tier-1 KWOK coverage: the parent App resolves to `oci://registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>`, the OCI artifact lookup 404s, gpu-operator-post's Application can never sync, and the whole stack times out on `GitOps sync timeout strike 1/3`. The failure was masked on `main` because the most-recent KWOK Cluster Validation run on `main` (#26469449378 at 0d3e62d, success) ran *before* PR NVIDIA#1035 merged. After NVIDIA#1035 / NVIDIA#1036 / NVIDIA#1038 all landed on main, no fresh KWOK run has triggered on `main` yet — but the next one will fail the same way every open PR's argocd-helm-oci Tier-1 jobs are currently failing. Fix is a one-line drop of the per-recipe suffix from OCI_IN_CLUSTER_REF in the argocd-helm-oci branch of generate_bundle. The flux branch keeps the per-recipe suffix because flux's OCIRepository CR consumes the FULL artifact URL (recipe segment included). Updated the surrounding comment to point at the post-NVIDIA#1032 contract so the next reader understands the asymmetry. End-to-end check (verified from PR NVIDIA#1030's debug artifact at b3f2296): repo-server log shows `registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>: not found`, caused by the same double-append. With the recipe suffix dropped, Argo's resolution `<repoURL>/<chart>:<tag>` aligns with the pushed artifact at `oci://…/aicr/<recipe>:<tag>`. Refs PR NVIDIA#1030 (where this surfaced), PR NVIDIA#1032 (contract change), PR NVIDIA#1035 (parent App template enforcement).
Codex's P2 note on the previous wrapper-script fix: dropping the
per-recipe suffix from OCI_IN_CLUSTER_REF unconditionally broke the
argocd-oci lane. The two argocd-* deployers consume that value
differently:
- argocd-oci bakes it into each flat Application via `--repo`. The
rendered Applications use `spec.source.repoURL` as the FULL OCI
artifact location and never append .Chart.Name. They need the
per-recipe suffix in the value.
- argocd-helm-oci passes it through to `helm install --set repoURL`
at install time. The wrapper chart's parent Application appends
.Chart.Name via its separate `source.chart` field, and the
path-based child template appends /{{ .Chart.Name }} as a string,
so both halves of the bundle would double-resolve the recipe
segment if it were in the value already.
Fix: keep OCI_IN_CLUSTER_REF as the FULL artifact URL (its
generate_bundle-side contract, used by argocd-oci) and strip the
trailing "/<recipe>" only at the argocd-helm-oci helm-install site
via parameter expansion (helm_repo_url="${OCI_IN_CLUSTER_REF%/*}").
Documented the asymmetry inline at both sites so the next reader sees
why the two consumers disagree.
Refs PR NVIDIA#1030, PR NVIDIA#1032 (contract change), PR NVIDIA#1035 (parent App
template enforcement). Restores argocd-oci correctness while keeping
the post-NVIDIA#1032 contract fix for argocd-helm-oci.
The argocd-helm-oci wrapper script was passing the FULL bundle URL to `helm install --set repoURL=…` (including the per-recipe chart name at the end). That matched the pre-PR-NVIDIA#1032 contract where the parent Application's `source.chart` was hardcoded to `aicr-bundle`. PR NVIDIA#1032 (and NVIDIA#1035's reinforcement) changed the parent App template to expect the parent-namespace-only repoURL and to append .Chart.Name itself via the separate `source.chart` field. The wrapper script wasn't updated to match. Result on every PR with argocd-helm-oci Tier-1 KWOK coverage: the parent App resolves to `oci://registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>`, the OCI artifact lookup 404s, gpu-operator-post's Application can never sync, and the whole stack times out on `GitOps sync timeout strike 1/3`. The failure was masked on `main` because the most-recent KWOK Cluster Validation run on `main` (#26469449378 at 0d3e62d, success) ran *before* PR NVIDIA#1035 merged. After NVIDIA#1035 / NVIDIA#1036 / NVIDIA#1038 all landed on main, no fresh KWOK run has triggered on `main` yet — but the next one will fail the same way every open PR's argocd-helm-oci Tier-1 jobs are currently failing. Fix is a one-line drop of the per-recipe suffix from OCI_IN_CLUSTER_REF in the argocd-helm-oci branch of generate_bundle. The flux branch keeps the per-recipe suffix because flux's OCIRepository CR consumes the FULL artifact URL (recipe segment included). Updated the surrounding comment to point at the post-NVIDIA#1032 contract so the next reader understands the asymmetry. End-to-end check (verified from PR NVIDIA#1030's debug artifact at b3f2296): repo-server log shows `registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>: not found`, caused by the same double-append. With the recipe suffix dropped, Argo's resolution `<repoURL>/<chart>:<tag>` aligns with the pushed artifact at `oci://…/aicr/<recipe>:<tag>`. Refs PR NVIDIA#1030 (where this surfaced), PR NVIDIA#1032 (contract change), PR NVIDIA#1035 (parent App template enforcement).
Summary
Two argocd-helm OCI publishing bugs against the post-#1032 contract:
spec.source.repoURLtemplate never appended.Chart.Name. Under fix(bundler): derive argocd-helm chart name from OCI artifact path #1032's contract (--set repoURL=<parent namespace>), the parent App appends.Chart.Namevia its separatesource.chartfield, but path-based children have nochartfield — Argo CD's generic OCI source resolvesrepoURLdirectly. Bundles with manifest-only / mixed-local-chart children silently failed to sync.writeChartYAMLemittedname:andversion:unquoted. Valid OCI artifact paths whose last segment is a YAML reserved scalar (null,true,false,yes,no, numeric strings) producedname: nulletc., which Helm's YAML parser interprets as the non-string scalar, failing the chart with "chart.metadata.name is required".Motivation / Context
Found via Codex review of
origin/mainat51eb4b5f. P1 is a regression introduced today by #1032 (which landed the parent-App + bundle-chart-name fix for #1019 but left the path-based child template on the old--set repoURL=<full bundle URL>contract). P2 is a latent corner case that the same review surfaced.Fixes: #1034
Related: #1032 (the parent-App contract change this PR completes), #1019 (original chart-name bug), #965 (DRA stale-NVML mitigation, unrelated but in the same OCI chain)
Type of Change
Component(s) Affected
pkg/bundler,pkg/component/*) —pkg/bundler/deployer/argocdhelm/argocdhelm.goImplementation Notes
P1 —
injectValuesIntoSingleSource(line 703). Append/{{ .Chart.Name }}to the renderedrepoURLvalue so the assembled URL matches the artifact the parent'srepoURL/chart:tagtriple resolves to. Updated therequirederror message to reflect that the path-based template itself is doing the chart-name appending (the parent App also appends it, but via the separatesource.chartfield; the wording was misleading on the child).P2 —
writeChartYAML(line 944). Emitnameandversionviafmt.Fprintfwith%q. OCI artifact path segments are constrained to printable ASCII by the docker reference grammar, which is exactly the safe charset for Go's%q(so the rendered YAML is always a clean double-quoted string).Testing
Tests added:
TestInjectValuesIntoSingleSource_AppendsChartName— asserts the rendered childsource.repoURLvalue is…/{{ .Chart.Name }}(with the originalrequireddirective intact).TestWriteChartYAML_QuotesYAMLReservedScalarsAsName— table-driven case acrossnull,true,false,123,yes, and a normal name. For each, writesChart.yamland assertsyaml.Unmarshalround-tripsnameback as the expected string.Tests updated:
TestHelmTemplate_RendersWithSetRepoURL— exercises the new contract (--set repoURL=oci://reg/org, parent namespace only) end-to-end via realhelm template. Asserts the parent App'sspec.source.repoURLequals the parent namespace and the path-based child'sspec.source.repoURLequals parent-namespace +/aicr-bundle. This test was the canary: it was passing under the old contract (full bundle URL) because the parent App'ssource.chartwas being ignored at install time; my fix surfaces the contract drift.TestGenerate_CustomChartNameand the existingversion: 2.5.0assertion updated to expect quoted scalars.testdata/helm_and_manifest_only/andtestdata/mixed_component/) regenerated: childrepoURLlines andChart.yamlname/versionlines now reflect the fix.Risk Assessment
/{{ .Chart.Name }}only takes effect under the new--set repoURL=<parent namespace>contract).Rollout notes: Anyone who picked up #1032's contract change since today (2026-05-26) needs this fix before publishing argocd-helm bundles with raw-manifest children to OCI. Existing bundles already pushed to OCI work as-is; the fix takes effect on the next
aicr bundle --output oci://…run.Checklist
make testwith-race) — viamake qualifymake lint) — viamake qualifyrequireddirective is the only "doc" touchedgit commit -S)