Skip to content

feat(validator): replace helm CLI subprocess with Helm Go SDK for chart rendering#186

Merged
mchmarny merged 4 commits intoNVIDIA:mainfrom
xdu31:feat/helm-sdk
Feb 23, 2026
Merged

feat(validator): replace helm CLI subprocess with Helm Go SDK for chart rendering#186
mchmarny merged 4 commits intoNVIDIA:mainfrom
xdu31:feat/helm-sdk

Conversation

@xdu31
Copy link
Contributor

@xdu31 xdu31 commented Feb 23, 2026

Summary

Replace the helm template CLI subprocess with the Helm Go SDK (helm.sh/helm/v3) for chart rendering in the validator's expected resource auto-discovery. This eliminates the external binary dependency, enabling the validator to run in distroless/Ko images without a helm binary or shell.

Motivation / Context

The validator auto-discovers expected workload resources (Deployments, DaemonSets, StatefulSets) from Helm charts using helm template. The previous implementation shelled out to the helm CLI as a subprocess, which required:

  • The helm binary installed on the machine or in the container image
  • A shell to execute the subprocess
  • Temp file I/O to pass values via -f flag

The Helm Go SDK provides the same rendering capability as a library call, removing all three requirements. This is critical for the API server path (built with Ko/distroless) and simplifies the credential story — the SDK honors HELM_REGISTRY_CONFIG identically to the CLI.

Follows up on PR #164 which introduced the two-phase discovery.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • Build/CI/tooling

Component(s) Affected

  • CLI (cmd/aicr, pkg/cli)
  • API server (cmd/aicrd, pkg/api, pkg/server)
  • Recipe engine / data (pkg/recipe)
  • Bundlers (pkg/bundler, pkg/component/*)
  • Collectors / snapshotter (pkg/collector, pkg/snapshotter)
  • Validator (pkg/validator)
  • Core libraries (pkg/errors, pkg/k8s)
  • Docs/examples (docs/, examples/)
  • Other

Implementation Notes

Modified: pkg/validator/resource_discovery.go

Removed:

  • helmCommand constant and exec.LookPath pre-check for helm binary
  • executeSubprocess() — generic CLI runner via exec.CommandContext
  • writeValuesToTempFile() — marshal values to temp YAML file for -f flag
  • os, os/exec, stderrors imports

Added:

  • Helm Go SDK imports: helm.sh/helm/v3/pkg/action, chart/loader, cli, registry
  • renderHelmTemplate() rewritten using SDK:
    • registry.NewClient() for OCI auth (honors HELM_REGISTRY_CONFIG)
    • action.NewInstall with DryRun: true, ClientOnly: true (no cluster connection)
    • install.RunWithContext(ctx, chart, values) — values passed as Go map directly
  • locateChart() — resolves chart reference via install.LocateChart() for both OCI and HTTP repos

Changed:

  • resolveExpectedResources() — removed needsHelm / exec.LookPath pre-check, added ctx.Err() check in the component loop so context cancellation propagates as a hard error
  • Individual component render failures remain warnings (non-blocking)

Modified: pkg/validator/resource_discovery_test.go

  • Removed TestWriteValuesToTempFile (temp file no longer used)
  • Removed TestResolveExpectedResources_ErrorOnMissingCLI (no CLI binary to check)

Modified: go.mod

  • Added helm.sh/helm/v3 v3.20.0 as direct dependency
  • ~50 new indirect dependencies (containerd, OCI, cobra, etc.)

Behavioral Changes

Aspect Before (CLI) After (SDK)
Helm binary Required, hard error if missing Not needed
Values passing Temp YAML file + -f flag Go map passed directly
Credentials HELM_REGISTRY_CONFIG via helm CLI Same, via registry.NewClient()
Chart download helm template --repo / OCI install.LocateChart()
Render failure Warning per component Warning per component (same)
Context cancel Not checked in loop Hard error on ctx.Err()
Container image Needs helm binary + shell Works in distroless/Ko

Testing

# Unit tests — all pass with race detector
go test -race ./pkg/validator/... -count=1

# Lint — 0 issues with go1.25.7
GOTOOLCHAIN=go1.25.7 golangci-lint -c .golangci.yaml run ./pkg/validator/

Risk Assessment

  • Low — Isolated refactoring, same rendering behavior, no interface changes
  • Medium — Touches multiple components or has broader impact
  • High — Breaking change, affects critical paths, or complex rollout

Rollout notes: Pure refactoring — the SDK produces identical rendered output to the CLI subprocess. The credential mechanism (HELM_REGISTRY_CONFIG) is unchanged. The only behavioral difference is that context cancellation is now checked per-component in the discovery loop.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S) — GPG signing info

@xdu31 xdu31 requested a review from a team as a code owner February 23, 2026 08:32
Copy link
Member

@mchmarny mchmarny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean refactoring — the motivation (distroless/Ko compatibility) is solid and the implementation follows project patterns well. The behavioral equivalence table in the description is thorough.

One item to fix before merge: the downloaded chart archive from locateChart() is never cleaned up, which leaks temp files in the long-running API server path. See inline comment on lines 186-189.

The remaining comments are non-blocking nits and suggestions (context cancellation gap in LocateChart, credential options, error code precision, test coverage for the new SDK path).

@xdu31 xdu31 requested a review from mchmarny February 23, 2026 18:22
Copy link
Member

@mchmarny mchmarny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@mchmarny mchmarny merged commit 29b512b into NVIDIA:main Feb 23, 2026
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants