fix(bundler): tolerate trailing slash on repoURL; quote Chart.yaml version#1038
Conversation
…rsion Follow-up to PR NVIDIA#1035 (merged) addressing two non-blocking nits from the review that weren't included in the squash: 1. Trailing-slash robustness on path-based child Applications --set repoURL=oci://reg/org/ (an easy copy-paste from a registry UI) used to render as oci://reg/org//aicr-bundle, which Argo CD silently fails to sync against — the double slash hits the OCI resolver as an empty path segment with no operator-facing hint. Pipe .Values.repoURL through trimSuffix "/" before appending /{{ .Chart.Name }} in injectValuesIntoSingleSource so the rendered value is always a clean single-slash artifact reference. Golden fixtures regenerated. 2. Exercise the version branch of writeChartYAML's %q quoting The TestWriteChartYAML_QuotesYAMLReservedScalarsAsName table only covered chart-name reserved scalars. A user-supplied version like "1.0" reparses as the YAML float 1 without the %q wrap on version, so the version-side quoting is load-bearing, not cosmetic. Restructured the table to take an explicit (chartName, chartVersion) pair and added float-looking ("1.0") and numeric-looking ("123") version rows alongside the existing reserved-scalar name rows. Each row now round-trips both fields through yaml.Unmarshal and asserts the exact input string is recovered, so writeChartYAML's version branch is exercised by the same test that documents the rationale. Drive-by: relaxed two pre-existing substring assertions in TestInjectValuesIntoSingleSource that hardcoded `.Values.repoURL }}` to just `.Values.repoURL` so they survive the new trimSuffix pipeline insertion without losing the load-bearing intent (they verify the directive references the caller's value; the exact pipeline shape is documented elsewhere). Related: NVIDIA#1034 (the bug the original NVIDIA#1035 fixed), NVIDIA#1035 (where Mark's review suggested these nits)
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR trims trailing slashes from user-supplied Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
The argocd-helm-oci wrapper script was passing the FULL bundle URL to `helm install --set repoURL=…` (including the per-recipe chart name at the end). That matched the pre-PR-NVIDIA#1032 contract where the parent Application's `source.chart` was hardcoded to `aicr-bundle`. PR NVIDIA#1032 (and NVIDIA#1035's reinforcement) changed the parent App template to expect the parent-namespace-only repoURL and to append .Chart.Name itself via the separate `source.chart` field. The wrapper script wasn't updated to match. Result on every PR with argocd-helm-oci Tier-1 KWOK coverage: the parent App resolves to `oci://registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>`, the OCI artifact lookup 404s, gpu-operator-post's Application can never sync, and the whole stack times out on `GitOps sync timeout strike 1/3`. The failure was masked on `main` because the most-recent KWOK Cluster Validation run on `main` (#26469449378 at 0d3e62d, success) ran *before* PR NVIDIA#1035 merged. After NVIDIA#1035 / NVIDIA#1036 / NVIDIA#1038 all landed on main, no fresh KWOK run has triggered on `main` yet — but the next one will fail the same way every open PR's argocd-helm-oci Tier-1 jobs are currently failing. Fix is a one-line drop of the per-recipe suffix from OCI_IN_CLUSTER_REF in the argocd-helm-oci branch of generate_bundle. The flux branch keeps the per-recipe suffix because flux's OCIRepository CR consumes the FULL artifact URL (recipe segment included). Updated the surrounding comment to point at the post-NVIDIA#1032 contract so the next reader understands the asymmetry. End-to-end check (verified from PR NVIDIA#1030's debug artifact at b3f2296): repo-server log shows `registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>: not found`, caused by the same double-append. With the recipe suffix dropped, Argo's resolution `<repoURL>/<chart>:<tag>` aligns with the pushed artifact at `oci://…/aicr/<recipe>:<tag>`. Refs PR NVIDIA#1030 (where this surfaced), PR NVIDIA#1032 (contract change), PR NVIDIA#1035 (parent App template enforcement).
The argocd-helm-oci wrapper script was passing the FULL bundle URL to `helm install --set repoURL=…` (including the per-recipe chart name at the end). That matched the pre-PR-NVIDIA#1032 contract where the parent Application's `source.chart` was hardcoded to `aicr-bundle`. PR NVIDIA#1032 (and NVIDIA#1035's reinforcement) changed the parent App template to expect the parent-namespace-only repoURL and to append .Chart.Name itself via the separate `source.chart` field. The wrapper script wasn't updated to match. Result on every PR with argocd-helm-oci Tier-1 KWOK coverage: the parent App resolves to `oci://registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>`, the OCI artifact lookup 404s, gpu-operator-post's Application can never sync, and the whole stack times out on `GitOps sync timeout strike 1/3`. The failure was masked on `main` because the most-recent KWOK Cluster Validation run on `main` (#26469449378 at 0d3e62d, success) ran *before* PR NVIDIA#1035 merged. After NVIDIA#1035 / NVIDIA#1036 / NVIDIA#1038 all landed on main, no fresh KWOK run has triggered on `main` yet — but the next one will fail the same way every open PR's argocd-helm-oci Tier-1 jobs are currently failing. Fix is a one-line drop of the per-recipe suffix from OCI_IN_CLUSTER_REF in the argocd-helm-oci branch of generate_bundle. The flux branch keeps the per-recipe suffix because flux's OCIRepository CR consumes the FULL artifact URL (recipe segment included). Updated the surrounding comment to point at the post-NVIDIA#1032 contract so the next reader understands the asymmetry. End-to-end check (verified from PR NVIDIA#1030's debug artifact at b3f2296): repo-server log shows `registry.aicr-registry.svc.cluster.local:5000/aicr/<recipe>/<recipe>:<tag>: not found`, caused by the same double-append. With the recipe suffix dropped, Argo's resolution `<repoURL>/<chart>:<tag>` aligns with the pushed artifact at `oci://…/aicr/<recipe>:<tag>`. Refs PR NVIDIA#1030 (where this surfaced), PR NVIDIA#1032 (contract change), PR NVIDIA#1035 (parent App template enforcement).
Summary
Follow-up to PR #1035 (merged at
d638d009) addressing two non-blocking review nits from @mchmarny that weren't included in the squash:injectValuesIntoSingleSourcenow pipes.Values.repoURLthroughtrimSuffix "/"before appending/{{ .Chart.Name }}, so--set repoURL=oci://reg/org/(easy copy-paste from a registry UI) renders asoci://reg/org/aicr-bundleinstead of a silently-brokenoci://reg/org//aicr-bundle.TestWriteChartYAML_QuotesYAMLReservedScalarsAsNamenow takes an explicit(chartName, chartVersion)pair and exercises the version branch with"1.0"and"123"rows — proves the%qwrap onversionis load-bearing (without it"1.0"reparses as the YAML float1), not just cosmetic.Motivation / Context
Mark's inline review on PR #1035 flagged both as non-blocking nits; they're small but worth the durability. The original #1035 was approved and merged before these landed, so they need a separate PR.
Fixes: N/A (cosmetic / defense-in-depth)
Related: #1034 (the bug #1035 fixed), #1035 (where the nits were raised)
Type of Change
Component(s) Affected
pkg/bundler,pkg/component/*) —pkg/bundler/deployer/argocdhelm/argocdhelm.goImplementation Notes
Nit 1 — trailing-slash robustness. Single one-line template change. Goldens (
nodewright-customizations.yaml,gpu-operator-post.yaml) regenerated to reflect the new| trimSuffix "/" }}/{{ .Chart.Name }}shape.Nit 2 — exercise the version branch. Restructured the table from
(name, chartName, wantUnmarsh)(which assumedwantUnmarsh == chartNameand hardcoded version"1.0.0") to(name, chartName, chartVersion)with the assertion checking each input round-trips back to its exact string viayaml.Unmarshal. Added two version-side rows ("1.0","123").Drive-by. Relaxed two pre-existing assertions in
TestInjectValuesIntoSingleSourcethat hardcoded.Values.repoURL }}(the closing brace) to just.Values.repoURLso they survive the newtrimSuffixpipeline insertion. The load-bearing intent — the directive references the caller's value — is preserved; the exact pipeline shape is documented elsewhere.Testing
Risk Assessment
--set repoURLvalues (existing renders pass throughtrimSuffix "/"unchanged); only operator-supplied trailing slashes are normalized. Bundles generated before this PR are unaffected; bundles generated after produce slightly more forgiving rendered output.Rollout notes: None — the change is additive within the template pipeline and goldens.
Checklist
make testwith-race) — viamake qualifymake lint) — viamake qualifygit commit -S)