fix(bundler): fix bugs - assemble full OCI artifact in path-based child apps; quote chart name (#1034) #1035

Merged
mchmarny merged 1 commit into NVIDIA:main from yuanchen8911:fix/1034-argocd-helm-oci-bugs on May 26, 2026

Conversation

@yuanchen8911
Contributor

Summary

Two argocd-helm OCI publishing bugs against the post-#1032 contract:

  1. P1 — Path-based child Applications were resolving to a non-existent artifact because their spec.source.repoURL template never appended .Chart.Name. Under #1032's contract (--set repoURL=<parent namespace>), the parent App appends .Chart.Name via its separate source.chart field, but path-based children have no chart field, so Argo CD's generic OCI source resolves repoURL directly. Bundles with manifest-only / mixed-local-chart children silently failed to sync.
  2. P2 — writeChartYAML emitted name: and version: unquoted. Valid OCI artifact paths whose last segment is a YAML reserved scalar (null, true, false, yes, no, numeric strings) produced name: null and similar, which Helm's YAML parser interprets as a non-string scalar, failing the chart with "chart.metadata.name is required".

Motivation / Context

Found via Codex review of origin/main at 51eb4b5f. P1 is a regression introduced today by #1032 (which landed the parent-App + bundle-chart-name fix for #1019 but left the path-based child template on the old --set repoURL=<full bundle URL> contract). P2 is a latent corner case that the same review surfaced.

Fixes: #1034
Related: #1032 (the parent-App contract change this PR completes), #1019 (original chart-name bug), #965 (DRA stale-NVML mitigation, unrelated but in the same OCI chain)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)

Component(s) Affected

  • Bundlers (pkg/bundler, pkg/component/*) — pkg/bundler/deployer/argocdhelm/argocdhelm.go

Implementation Notes

P1 — injectValuesIntoSingleSource (line 703). Append /{{ .Chart.Name }} to the rendered repoURL value so the assembled URL matches the artifact the parent's repoURL/chart:tag triple resolves to. Updated the required error message to reflect that the path-based template itself is doing the chart-name appending (the parent App also appends it, but via the separate source.chart field; the wording was misleading on the child).
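
To make the P1 change concrete, here is a minimal sketch of how the fixed child value renders. It uses Go's text/template rather than Helm itself, so the `required` function below is a local stand-in (an assumption, not Helm's implementation), and `renderChildRepoURL`, the shortened error message, and the sample values are hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"text/template"
)

// childRepoURLTemplate mirrors the shape of the fixed path-based child value:
// the caller-supplied parent namespace, then "/" and the chart name.
// The error message is shortened for the example.
const childRepoURLTemplate = `{{ required "repoURL is required" .Values.repoURL }}/{{ .Chart.Name }}`

// required is a local stand-in for Helm's `required` template function,
// which text/template does not provide. It fails on nil or empty strings.
func required(msg string, v any) (any, error) {
	if v == nil {
		return nil, errors.New(msg)
	}
	if s, ok := v.(string); ok && s == "" {
		return nil, errors.New(msg)
	}
	return v, nil
}

// renderChildRepoURL renders the template with Helm-like .Values and .Chart data.
func renderChildRepoURL(repoURL, chartName string) (string, error) {
	tmpl, err := template.New("child").
		Funcs(template.FuncMap{"required": required}).
		Parse(childRepoURLTemplate)
	if err != nil {
		return "", err
	}
	data := map[string]any{
		"Values": map[string]any{"repoURL": repoURL},
		"Chart":  map[string]any{"Name": chartName},
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderChildRepoURL("oci://reg/org", "aicr-bundle")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // prints oci://reg/org/aicr-bundle
}
```

With the pre-fix template (no `/{{ .Chart.Name }}` suffix) the same input would render as `oci://reg/org`, which is the parent namespace rather than an artifact.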

P2 — writeChartYAML (line 944). Emit name and version via fmt.Fprintf with %q. OCI artifact path segments are constrained to printable ASCII by the docker reference grammar, which is exactly the safe charset for Go's %q (so the rendered YAML is always a clean double-quoted string).
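
As a rough illustration of the P2 change, the sketch below writes the two fields with `%q`. `renderChartYAML` is a hypothetical, simplified stand-in for writeChartYAML (the real function presumably emits more fields); only the `fmt.Fprintf` with `%q` pattern is taken from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// renderChartYAML is a simplified stand-in for writeChartYAML. It emits only
// the two fields discussed here, each wrapped in double quotes via %q.
func renderChartYAML(name, version string) string {
	var sb strings.Builder
	fmt.Fprintf(&sb, "name: %q\n", name)
	fmt.Fprintf(&sb, "version: %q\n", version)
	return sb.String()
}

func main() {
	// "null" and "1.0" would be read as YAML null and a float if left unquoted.
	fmt.Print(renderChartYAML("null", "1.0"))
	// Output:
	// name: "null"
	// version: "1.0"
}
```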

Testing

unset GITLAB_TOKEN ; make qualify
# All 22 chainsaw tests pass.
# Coverage: 76.9% > 75% floor.
# Vulnerability scan: clean.

Tests added:

  • TestInjectValuesIntoSingleSource_AppendsChartName — asserts the rendered child source.repoURL value is …/{{ .Chart.Name }} (with the original required directive intact).
  • TestWriteChartYAML_QuotesYAMLReservedScalarsAsName — table-driven case across null, true, false, 123, yes, and a normal name. For each, writes Chart.yaml and asserts yaml.Unmarshal round-trips name back as the expected string.

Tests updated:

  • TestHelmTemplate_RendersWithSetRepoURL — exercises the new contract (--set repoURL=oci://reg/org, parent namespace only) end-to-end via real helm template. Asserts the parent App's spec.source.repoURL equals the parent namespace and the path-based child's spec.source.repoURL equals parent-namespace + /aicr-bundle. This test was the canary: it was passing under the old contract (full bundle URL) because the parent App's source.chart was being ignored at install time; my fix surfaces the contract drift.
  • TestGenerate_CustomChartName and the existing version: 2.5.0 assertion updated to expect quoted scalars.
  • Golden fixtures (testdata/helm_and_manifest_only/ and testdata/mixed_component/) regenerated: child repoURL lines and Chart.yaml name/version lines now reflect the fix.

Risk Assessment

  • Low — narrow scope, two-line code change plus assertion updates. No new external dependencies, no surface-area expansion. Bundles produced by this PR are correct for the new contract; bundles produced by the pre-#1032 codepath are unaffected (the child template change is additive: /{{ .Chart.Name }} only takes effect under the new --set repoURL=<parent namespace> contract).

Rollout notes: Anyone who picked up #1032's contract change since today (2026-05-26) needs this fix before publishing argocd-helm bundles with raw-manifest children to OCI. Existing bundles already pushed to OCI work as-is; the fix takes effect on the next aicr bundle --output oci://… run.

Checklist

  • Tests pass locally (make test with -race) — via make qualify
  • Linter passes (make lint) — via make qualify
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed — no user-facing surface change; the error-message text inside the required directive is the only "doc" touched
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

@coderabbitai

coderabbitai Bot commented May 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 53b4eb4d-c587-4d28-8b15-632f1ef02662

📥 Commits

Reviewing files that changed from the base of the PR and between 54709ad and 882d91a.

📒 Files selected for processing (6)
  • pkg/bundler/deployer/argocdhelm/argocdhelm.go
  • pkg/bundler/deployer/argocdhelm/argocdhelm_test.go
  • pkg/bundler/deployer/argocdhelm/testdata/helm_and_manifest_only/Chart.yaml
  • pkg/bundler/deployer/argocdhelm/testdata/helm_and_manifest_only/templates/nodewright-customizations.yaml
  • pkg/bundler/deployer/argocdhelm/testdata/mixed_component/Chart.yaml
  • pkg/bundler/deployer/argocdhelm/testdata/mixed_component/templates/gpu-operator-post.yaml

📝 Walkthrough

This PR fixes two correctness bugs in the argocd-helm bundler's OCI publishing path. First, it updates the templating logic for path-based child Applications to append .Chart.Name to the user-provided repoURL, ensuring they resolve to the correct OCI artifact path when the bundle is published. Second, it modifies the Chart.yaml generation to quote the name and version fields, preventing YAML reserved scalars from being misparsed by Helm. The changes include new unit tests validating both fixes and updates to existing tests and testdata to reflect the corrected behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • NVIDIA/aicr#1032: Introduced the path-based child Application .Chart.Name contract; this PR completes the implementation for the child Application template alongside Chart.yaml generation.

Suggested labels

size/L

Suggested reviewers

  • mchmarny
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the two main bugs fixed: appending chart name to path-based child apps' OCI artifact URLs and quoting chart names in YAML output to prevent YAML reserved-scalar parsing errors. |
| Description check | ✅ Passed | The description comprehensively explains both P1 and P2 bugs, their root causes, reproduction context, implementation details, testing approach, and risk assessment, all directly related to the changeset. |
| Linked Issues check | ✅ Passed | The code changes fully address all acceptance criteria from #1034: injectValuesIntoSingleSource appends /{{ .Chart.Name }} for path-based children, writeChartYAML quotes name and version via %q, and tests verify both behaviors including YAML reserved-scalar handling. |
| Out of Scope Changes check | ✅ Passed | All changes are narrowly scoped to the two bugs in #1034: the argocdhelm.go logic fix, test coverage, and golden fixture updates. No unrelated changes or scope creep detected. |


@yuanchen8911 requested review from lockwobr and mchmarny on May 26, 2026 19:10
@yuanchen8911 changed the title from "fix(bundler): assemble full OCI artifact in path-based child apps; quote chart name (#1034)" to "fix(bundler): fix bugs - assemble full OCI artifact in path-based child apps; quote chart name (#1034)" on May 26, 2026
…ote chart name (NVIDIA#1034)

Two argocd-helm OCI publishing bugs surfaced by Codex review against
the post-NVIDIA#1032 contract.

P1 — Path-based child Applications resolve to a non-existent artifact

NVIDIA#1032 changed the --set repoURL contract so callers pass only the
parent namespace (e.g., oci://reg/org); the parent Application
appends .Chart.Name via its separate source.chart field. The
parent-App template at parentAppTemplate / line 407 implements this
correctly, but the path-based child-App template in
injectValuesIntoSingleSource at line 703 was left emitting only
.Values.repoURL. Argo CD's generic OCI source (used by path-based
children since they have no source.chart) treats spec.source.repoURL
as the full artifact reference and resolves it directly, so under
the new contract a child source pointed at oci://reg/org:tag — an
artifact that doesn't exist — and the child Application failed to
sync.

Fix: append /{{ .Chart.Name }} to the rendered repoURL value so the
assembled URL matches the artifact the parent App's repoURL/chart:tag
triple resolves to. The error-message text in the required directive
is updated to say "this template appends .Chart.Name" (the path-based
template is doing the appending, not the parent App).

P2 — Unquoted name and version in generated Chart.yaml

writeChartYAML emitted "name: %s" / "version: %s" with raw
fmt.Fprintf. (*Reference).ChartName() returns path.Base(Repository),
so a valid OCI artifact path whose last segment is a YAML reserved
scalar — "null", "true", "false", "yes", "no", numeric strings —
produced an unquoted YAML reserved word as the chart's name. Helm's
loader then parsed name: null as YAML null, chart.Metadata.Name
became empty, and helm package / helm push rejected the chart with
"chart.metadata.name is required". Same trap for the version field
("1.0" parses as float).

Fix: emit both via %q so the values round-trip as YAML strings. OCI
artifact path segments are constrained to printable ASCII by the
docker reference grammar, which is the safe charset for Go's %q.

Tests

- TestInjectValuesIntoSingleSource_AppendsChartName — asserts the
  rendered path-based child source.repoURL appends /{{ .Chart.Name }}
  after the user-supplied .Values.repoURL.
- TestWriteChartYAML_QuotesYAMLReservedScalarsAsName — table-driven
  case across null / true / false / numeric / yes / normal name; for
  each, writes Chart.yaml and verifies yaml.Unmarshal round-trips
  name back as the expected string.
- TestHelmTemplate_RendersWithSetRepoURL — updated to exercise the
  new contract: --set repoURL=oci://reg/org (parent namespace only).
  Asserts the parent App's repoURL equals the parent namespace and
  the child App's repoURL equals parent-namespace + /aicr-bundle.
- TestGenerate_CustomChartName and the existing Chart.yaml-version
  assertion updated to expect quoted scalars.
- Golden templates and Chart.yaml fixtures regenerated.

Closes NVIDIA#1034
Related: NVIDIA#1032 (the contract change this PR completes), NVIDIA#1019, NVIDIA#965
@yuanchen8911 force-pushed the fix/1034-argocd-helm-oci-bugs branch from 54709ad to 882d91a on May 26, 2026 19:12
Member

@mchmarny mchmarny left a comment

Both fixes look correct and the test coverage is thorough.

P1 (path-based child repoURL): Append of /{{ .Chart.Name }} matches the artifact the parent App's repoURL/chart:tag triple resolves to (g.chartName() flows into both the parent's source.chart and the chart's Chart.yaml, so they can't drift). The updated required message accurately describes child-template behavior. The TestHelmTemplate_RendersWithSetRepoURL change is the right canary — it now exercises the new contract end-to-end via real helm template.

P2 (Chart.yaml quoting): %q is safe here because OCI artifact path segments are constrained to printable ASCII by the docker reference grammar, which is exactly Go's %q safe charset. Defense-in-depth quoting for version is also sound — Helm's semver validator would reject most reserved scalars downstream, but failing at Chart.yaml parse time with a clear YAML round-trip is friendlier than a Helm push error.

Two nits inline (trailing-slash robustness on the rendered template, and adding a version case to the reserved-scalar table) — neither blocks merge. All 22 chainsaw tests and the rest of CI are green.

  Tag: yamlStringTag,
  Style: yaml.SingleQuotedStyle,
- Value: `{{ required "repoURL is required: pass --set repoURL=<parent namespace> (e.g., oci://<registry>/<path>) — do NOT include the chart name; the parent App appends .Chart.Name to assemble the full OCI artifact reference" .Values.repoURL }}`,
+ Value: `{{ required "repoURL is required: pass --set repoURL=<parent namespace> (e.g., oci://<registry>/<path>) — do NOT include the chart name; this template appends .Chart.Name to assemble the full OCI artifact reference" .Values.repoURL }}/{{ .Chart.Name }}`,
Member

nit: if a user passes --set repoURL=oci://reg/org/ (trailing slash, easy to copy-paste from a registry UI), this renders oci://reg/org//aicr-bundle — Argo CD will fail to sync with no obvious hint that the slash is the culprit. Cheap fix:

Value: `{{ required "..." .Values.repoURL | trimSuffix "/" }}/{{ .Chart.Name }}`

Same robustness for the parent App's source.chart/source.repoURL pair would be worth a follow-up if it doesn't already do this. Not a blocker.

Contributor Author

Addressed in d8adac9 — the rendered value is now {{ required "..." .Values.repoURL | trimSuffix "/" }}/{{ .Chart.Name }}. So --set repoURL=oci://reg/org/ (with trailing slash) renders cleanly as oci://reg/org/aicr-bundle instead of the double-slash artifact path. Goldens regenerated.

Good catch on the parent App's source.chart / source.repoURL pair too — I'll file that as a follow-up issue since the parent template would need its own trimSuffix treatment and the same caveat applies (silent Argo-CD sync failure with no operator hint).
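
The trailing-slash behavior discussed in this thread can be sketched in plain Go. This models the string handling only; it assumes the template's `trimSuffix "/"` behaves like `strings.TrimSuffix`, and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveChildRepoURL models the template before the nit was addressed:
// the caller's value is joined to the chart name without normalization.
func naiveChildRepoURL(repoURL, chartName string) string {
	return repoURL + "/" + chartName
}

// trimmedChildRepoURL models the `trimSuffix "/"` pipeline: one trailing
// slash is dropped from the caller's value before the chart name is appended.
func trimmedChildRepoURL(repoURL, chartName string) string {
	return strings.TrimSuffix(repoURL, "/") + "/" + chartName
}

func main() {
	fmt.Println(naiveChildRepoURL("oci://reg/org/", "aicr-bundle"))   // oci://reg/org//aicr-bundle
	fmt.Println(trimmedChildRepoURL("oci://reg/org/", "aicr-bundle")) // oci://reg/org/aicr-bundle
	fmt.Println(trimmedChildRepoURL("oci://reg/org", "aicr-bundle"))  // oci://reg/org/aicr-bundle
}
```

Under that assumption, only a single trailing slash is removed, so an input ending in `//` would still produce a doubled slash.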

}
})
}
}
Member

nit: the table covers the "name is a YAML reserved scalar" case but the same risk applies to version — a user-supplied version like "1.0" (no patch) renders as the YAML float 1 once Helm reparses, which is what makes version: "%q" defense-in-depth and not just cosmetic. Worth adding a case here (e.g., chartName: "aicr-bundle", version: "1.0") so the version branch of writeChartYAML is exercised by the same test that documents the rationale.

Contributor Author

Addressed in d8adac9 — restructured the table to take an explicit (chartName, chartVersion) pair and added rows for float-looking ("1.0") and numeric-looking ("123") versions alongside the existing reserved-scalar name rows. The assertion now round-trips both fields through yaml.Unmarshal and checks each against its exact input string, so the version branch of writeChartYAML's %q wrap is exercised by the same test that documents the rationale.
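
A table of (chartName, chartVersion) pairs with a round-trip check might look roughly like the sketch below. It is not the PR's test: the real test round-trips through yaml.Unmarshal, whereas this stdlib-only version substitutes `strconv.Unquote` to recover the quoted values, and `chartCase` and `roundTrip` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// chartCase is one table row: an explicit name/version pair.
type chartCase struct{ chartName, chartVersion string }

// roundTrip renders both fields with %q, then reads them back.
// strconv.Unquote stands in for a YAML parser here (an assumption that holds
// for the printable-ASCII inputs this sketch uses).
func roundTrip(c chartCase) (name, version string, err error) {
	doc := fmt.Sprintf("name: %q\nversion: %q\n", c.chartName, c.chartVersion)
	for _, line := range strings.Split(strings.TrimSpace(doc), "\n") {
		key, quoted, ok := strings.Cut(line, ": ")
		if !ok {
			return "", "", fmt.Errorf("malformed line %q", line)
		}
		val, uerr := strconv.Unquote(quoted)
		if uerr != nil {
			return "", "", uerr
		}
		switch key {
		case "name":
			name = val
		case "version":
			version = val
		}
	}
	return name, version, nil
}

func main() {
	cases := []chartCase{
		{"null", "2.5.0"},     // reserved-scalar name
		{"true", "2.5.0"},     // reserved-scalar name
		{"aicr-bundle", "1.0"}, // float-looking version
		{"aicr-bundle", "123"}, // numeric-looking version
	}
	for _, c := range cases {
		name, version, err := roundTrip(c)
		ok := err == nil && name == c.chartName && version == c.chartVersion
		fmt.Printf("name=%s version=%s round-trips: %v\n", c.chartName, c.chartVersion, ok)
	}
}
```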

@mchmarny mchmarny merged commit d638d00 into NVIDIA:main May 26, 2026
30 checks passed
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request May 26, 2026
Two non-blocking nits from Mark's review:

Nit 1 — trailing-slash robustness on path-based child repoURL
  --set repoURL=oci://reg/org/ (easy to copy-paste from a registry
  UI) used to render oci://reg/org//aicr-bundle, which Argo CD fails
  to sync with no obvious operator-facing hint that the trailing
  slash is the culprit. Pipe .Values.repoURL through trimSuffix "/"
  before appending /{{ .Chart.Name }}. Golden fixtures regenerated.

Nit 2 — exercise the version branch of writeChartYAML
  The reserved-scalar table only covered name; a user-supplied
  version like "1.0" reparses as the YAML float 1 without the %q
  wrap, so the version-side quoting is load-bearing, not cosmetic.
  Restructured the table to take an explicit (chartName,
  chartVersion) pair; added float-looking and numeric-looking
  version rows that round-trip the version field too. The
  assertion now checks each input round-trips back to its exact
  string form on YAML unmarshal.

Drive-by: relaxed two pre-existing substring assertions that
hardcoded `.Values.repoURL }}` to just `.Values.repoURL` so they
survive the new trimSuffix pipeline insertion without losing the
load-bearing intent (they verify the directive references the
caller's value; the exact pipeline shape is documented elsewhere).

Refs NVIDIA#1034, PR NVIDIA#1035
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request May 26, 2026
The argocd-helm-oci wrapper script was passing the FULL bundle URL to
`helm install --set repoURL=…` (including the per-recipe chart name
at the end). That matched the pre-PR-NVIDIA#1032 contract where the parent
Application's `source.chart` was hardcoded to `aicr-bundle`.

PR NVIDIA#1032 (and NVIDIA#1035's reinforcement) changed the parent App template
to expect the parent-namespace-only repoURL and to append .Chart.Name
itself via the separate `source.chart` field. The wrapper script
wasn't updated to match. Result on every PR with argocd-helm-oci
Tier-1 KWOK coverage: the parent App resolves to
`oci://registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>`,
the OCI artifact lookup 404s, gpu-operator-post's Application can
never sync, and the whole stack times out on
`GitOps sync timeout strike 1/3`.

The failure was masked on `main` because the most-recent KWOK
Cluster Validation run on `main` (#26469449378 at 0d3e62d, success)
ran *before* PR NVIDIA#1035 merged. After NVIDIA#1035 / NVIDIA#1036 / NVIDIA#1038 all landed
on main, no fresh KWOK run has triggered on `main` yet — but the
next one will fail the same way every open PR's argocd-helm-oci
Tier-1 jobs are currently failing.

Fix is a one-line drop of the per-recipe suffix from
OCI_IN_CLUSTER_REF in the argocd-helm-oci branch of generate_bundle.
The flux branch keeps the per-recipe suffix because flux's
OCIRepository CR consumes the FULL artifact URL (recipe segment
included). Updated the surrounding comment to point at the
post-NVIDIA#1032 contract so the next reader understands the asymmetry.

End-to-end check (verified from PR NVIDIA#1030's debug artifact at
b3f2296): repo-server log shows
`registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>: not found`,
caused by the same double-append. With the recipe suffix dropped,
Argo's resolution `<repoURL>/<chart>:<tag>` aligns with the pushed
artifact at `oci://…/aicr/<recipe>:<tag>`.

Refs PR NVIDIA#1030 (where this surfaced), PR NVIDIA#1032 (contract change),
PR NVIDIA#1035 (parent App template enforcement).
yuanchen8911 added a commit to yuanchen8911/aicr that referenced this pull request May 26, 2026
Codex's P2 note on the previous wrapper-script fix: dropping the
per-recipe suffix from OCI_IN_CLUSTER_REF unconditionally broke the
argocd-oci lane. The two argocd-* deployers consume that value
differently:

  - argocd-oci bakes it into each flat Application via `--repo`. The
    rendered Applications use `spec.source.repoURL` as the FULL OCI
    artifact location and never append .Chart.Name. They need the
    per-recipe suffix in the value.
  - argocd-helm-oci passes it through to `helm install --set repoURL`
    at install time. The wrapper chart's parent Application appends
    .Chart.Name via its separate `source.chart` field, and the
    path-based child template appends /{{ .Chart.Name }} as a string,
    so both halves of the bundle would double-resolve the recipe
    segment if it were in the value already.

Fix: keep OCI_IN_CLUSTER_REF as the FULL artifact URL (its
generate_bundle-side contract, used by argocd-oci) and strip the
trailing "/<recipe>" only at the argocd-helm-oci helm-install site
via parameter expansion (helm_repo_url="${OCI_IN_CLUSTER_REF%/*}").
Documented the asymmetry inline at both sites so the next reader sees
why the two consumers disagree.

Refs PR NVIDIA#1030, PR NVIDIA#1032 (contract change), PR NVIDIA#1035 (parent App
template enforcement). Restores argocd-oci correctness while keeping
the post-NVIDIA#1032 contract fix for argocd-helm-oci.
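
The double-append and the effect of stripping the trailing "/<recipe>" can be sketched in Go. This is a model under assumptions: `resolveParent` encodes the `<repoURL>/<chart>:<tag>` resolution described in these commit messages, `parentNamespace` mimics the shell expansion `${OCI_IN_CLUSTER_REF%/*}`, and the registry host, recipe name, and tag are made-up sample values.

```go
package main

import (
	"fmt"
	"strings"
)

// parentNamespace drops the final "/<segment>" from a reference, like the
// shell parameter expansion ${ref%/*}. A reference without "/" is unchanged.
func parentNamespace(ref string) string {
	if i := strings.LastIndex(ref, "/"); i >= 0 {
		return ref[:i]
	}
	return ref
}

// resolveParent models the parent Application's lookup: repoURL and chart are
// separate fields that are combined as <repoURL>/<chart>:<tag>.
func resolveParent(repoURL, chart, tag string) string {
	return repoURL + "/" + chart + ":" + tag
}

func main() {
	full := "oci://registry.example/aicr/my-recipe" // full artifact URL (sample value)

	// Passing the full URL as repoURL appends the recipe segment twice.
	fmt.Println(resolveParent(full, "my-recipe", "v1"))
	// prints oci://registry.example/aicr/my-recipe/my-recipe:v1

	// Passing only the parent namespace lines up with the pushed artifact.
	fmt.Println(resolveParent(parentNamespace(full), "my-recipe", "v1"))
	// prints oci://registry.example/aicr/my-recipe:v1
}
```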
Development

Successfully merging this pull request may close these issues.

argocd-helm OCI bundle: path-based child Applications miss .Chart.Name; Chart.yaml emits unquoted name

2 participants