feat: add two-phase expected resource auto-discovery to validator #164

Merged
mchmarny merged 5 commits into NVIDIA:main from xdu31:feat/check-helm-deployment
Feb 20, 2026

Conversation

Contributor

@xdu31 xdu31 commented Feb 20, 2026

Summary

Add two-phase expected resource auto-discovery to the validator: Phase 1 renders Helm charts via a CLI subprocess (helm template), and Phase 2 renders component manifest files via the shared pkg/manifest.Render(). Extract shared manifest rendering into pkg/manifest to eliminate duplication between the bundler and the validator. Manually specified expectedResources are merged with the auto-discovered ones and override them on conflict.

Motivation / Context

Validators need to know which Kubernetes resources (Deployments, DaemonSets, StatefulSets) a component should produce so they can check deployment health. Previously this required manually listing expectedResources in every recipe component.

Now the validator auto-discovers workload resources from two sources:

  1. Helm chart rendering via CLI subprocess (helm template) — discovers main workload resources from charts
  2. Manifest file rendering via pkg/manifest.Render() — discovers supplementary resources from component manifest files (same logic as the bundler)

Missing helm CLI is a hard error when Helm components with chart coordinates exist. Helm is included in the validator image (Dockerfile.validator).

The bundler and validator previously duplicated Go-template rendering logic. This is now extracted to a shared pkg/manifest package.

Fixes: N/A
Related: N/A

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • Build/CI/tooling

Component(s) Affected

  • CLI (cmd/eidos, pkg/cli)
  • API server (cmd/eidosd, pkg/api, pkg/server)
  • Recipe engine / data (pkg/recipe)
  • Bundlers (pkg/bundler, pkg/component/*)
  • Collectors / snapshotter (pkg/collector, pkg/snapshotter)
  • Validator (pkg/validator)
  • Core libraries (pkg/errors, pkg/k8s)
  • Docs/examples (docs/, examples/)
  • Other: pkg/manifest (new shared package), pkg/defaults

Implementation Notes

New: pkg/manifest — shared manifest rendering

  • Extracted Render(), HelmFuncMap(), RenderInput from the bundler's unexported helpers
  • Uses gopkg.in/yaml.v3 for toYaml to maintain behavioral compatibility with the bundler's existing output (avoids JSON-intermediate marshaling differences from sigs.k8s.io/yaml)
  • Both pkg/bundler/deployer/helm and pkg/validator now import from one place
  • Tests migrated from bundler; 92% coverage
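
As a rough picture of what a shared rendering helper looks like, the sketch below renders a Go template with a small function map. It is not the pkg/manifest API: `renderInput`, `funcMap`, and `render` are hypothetical names, and the two helpers (`quote`, `default`) are stdlib-only stand-ins. The real package also provides toYaml backed by gopkg.in/yaml.v3, which is omitted here to keep the example free of third-party imports.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
	"text/template"
)

// renderInput is a hypothetical analogue of the package's RenderInput.
type renderInput struct {
	Name   string
	Values map[string]any
}

// funcMap returns a small, illustrative subset of Helm-style helpers.
func funcMap() template.FuncMap {
	return template.FuncMap{
		// quote wraps the printed value in double quotes.
		"quote": func(v any) string { return strconv.Quote(fmt.Sprint(v)) },
		// default returns def when the piped value is missing or empty.
		"default": func(def, v any) any {
			if v == nil || v == "" {
				return def
			}
			return v
		},
	}
}

// render parses and executes tmpl against in. Both the bundler and the
// validator could call one function like this instead of keeping copies.
func render(tmpl string, in renderInput) (string, error) {
	t, err := template.New(in.Name).Funcs(funcMap()).Parse(tmpl)
	if err != nil {
		return "", fmt.Errorf("parse %s: %w", in.Name, err)
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, in); err != nil {
		return "", fmt.Errorf("execute %s: %w", in.Name, err)
	}
	return buf.String(), nil
}

func main() {
	tmpl := "replicas: {{ .Values.replicas | default 1 }}\nimage: {{ .Values.image | quote }}\n"
	out, err := render(tmpl, renderInput{Name: "demo", Values: map[string]any{"image": "nginx:1.27"}})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```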

New: pkg/validator/resource_discovery.go — two-phase discovery

  • resolveExpectedResources() runs before validation in the CLI process
  • Phase 1: Renders Helm charts (helm template) via CLI subprocess to discover main workload resources. Requires network access for chart downloads (HTTP repos via --repo, OCI registries via oci:// prefix)
  • Phase 2: Renders component manifest files using pkg/manifest.Render() to discover supplementary resources
  • Missing helm CLI is a hard error when Helm components with chart coordinates exist
  • mergeExpectedResources() combines auto-discovered + manual, with manual taking precedence
  • renderManifestFiles() accepts context.Context and checks ctx.Done() for cancellation
  • Timeout constant ComponentRenderTimeout lives in pkg/defaults per project conventions
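
The merge rule (auto-discovered plus manual, with manual taking precedence) can be sketched as below. This is an assumption-laden illustration rather than the PR's mergeExpectedResources(): the `expectedResource` fields, the `Source` tag, and the kind/namespace/name identity key are all hypothetical choices made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// expectedResource is a hypothetical shape for an expected workload.
type expectedResource struct {
	Kind      string
	Namespace string
	Name      string
	Source    string // "auto" or "manual"; illustrative only
}

// key identifies a resource for de-duplication (assumed identity rule).
func key(r expectedResource) string {
	return r.Kind + "/" + r.Namespace + "/" + r.Name
}

// mergeExpected combines both lists. Manual entries are written last, so
// they replace auto-discovered entries that share the same key.
func mergeExpected(auto, manual []expectedResource) []expectedResource {
	byKey := make(map[string]expectedResource, len(auto)+len(manual))
	for _, r := range auto {
		byKey[key(r)] = r
	}
	for _, r := range manual {
		byKey[key(r)] = r
	}
	keys := make([]string, 0, len(byKey))
	for k := range byKey {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output order
	merged := make([]expectedResource, 0, len(keys))
	for _, k := range keys {
		merged = append(merged, byKey[k])
	}
	return merged
}

func main() {
	auto := []expectedResource{{Kind: "Deployment", Namespace: "demo", Name: "nginx", Source: "auto"}}
	manual := []expectedResource{
		{Kind: "Deployment", Namespace: "demo", Name: "nginx", Source: "manual"},
		{Kind: "Deployment", Namespace: "demo", Name: "nonexistent-deploy", Source: "manual"},
	}
	for _, r := range mergeExpected(auto, manual) {
		fmt.Printf("%s (%s)\n", key(r), r.Source)
	}
}
```

Under this rule a manual entry for a resource that does not exist in the cluster is still kept in the merged list, which is the situation E2E Test 4 exercises.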

Modified: pkg/bundler/deployer/helm/helm.go

  • Removed duplicated rendering code, now imports pkg/manifest

Modified: pkg/validator/phases.go

  • Calls resolveExpectedResources at the start of validation; intentionally mutates recipeResult before serialization so the check pod sees the full expected resources list
  • Fixed 9 staticcheck QF1012 issues (WriteString(Sprintf) → Fprintf)
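
For context on the QF1012 fixes, the pattern looks roughly like this. The function names and format string are made up for illustration; only the before/after shape reflects the staticcheck rule.

```go
package main

import (
	"fmt"
	"strings"
)

// before builds the string first and then writes it; staticcheck reports
// this shape as QF1012.
func before(name string, n int) string {
	var sb strings.Builder
	sb.WriteString(fmt.Sprintf("%s: %d resources\n", name, n))
	return sb.String()
}

// after formats directly into the builder, avoiding the temporary string.
func after(name string, n int) string {
	var sb strings.Builder
	fmt.Fprintf(&sb, "%s: %d resources\n", name, n)
	return sb.String()
}

func main() {
	fmt.Print(before("validator", 9))
	fmt.Print(after("validator", 9))
}
```

Both functions produce identical output, so the change is purely mechanical.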

Modified: pkg/defaults/timeouts.go

  • Added ComponentRenderTimeout (60s) for helm template and manifest file rendering

Modified: pkg/component/helpers.go

  • Fixed 3 staticcheck QF1012 issues

Modified: tests/e2e/run.sh

  • Added Test 3: Manual expectedResources matching a helm-installed nginx Deployment
  • Added Test 4: Manual expectedResources merge (real + fake resource) — asserts failure when nonexistent-deploy is not found

Testing

# Unit tests — all pass with race detector, coverage >70%
go test -race -coverprofile=cover.out ./pkg/manifest/... ./pkg/validator/... ./pkg/defaults/... -count=1
# pkg/manifest: 92.0%  |  pkg/validator: 75.3%

# Lint — 0 issues with go1.25.7
GOTOOLCHAIN=go1.25.7 golangci-lint -c .golangci.yaml run ./pkg/manifest/ ./pkg/validator/ ./pkg/defaults/

# E2E — new tests exercise real helm install on Kind
./tools/e2e

Risk Assessment

  • Low — Isolated change, well-tested, easy to revert
  • Medium — Touches multiple components or has broader impact
  • High — Breaking change, affects critical paths, or complex rollout

Rollout notes: Discovery is additive — components without chart sources or manifest files simply skip discovery. Manual expectedResources still work exactly as before.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S) — GPG signing info

@xdu31 xdu31 force-pushed the feat/check-helm-deployment branch from 7ceb158 to 7cd6a2f on February 20, 2026 07:22
Member

@mchmarny mchmarny left a comment

The extraction of manifest rendering into pkg/manifest is a good refactor — sharing rendering logic between bundler and validator avoids drift. The resource discovery design (two-phase: helm template + manifest file rendering, with merge semantics) is well-structured, and the test coverage is thorough (784 lines of tests for 391 lines of implementation).

Note: The PR template is not filled out — no summary, type of change, risk assessment, or component checkboxes are checked. For a +1627/-237 change touching 9 files across bundler, validator, manifest, and e2e, this context would help reviewers.

Main concerns:

  1. YAML library behavioral change (gopkg.in/yaml.v3 → sigs.k8s.io/yaml): This changes toYaml output for the bundler too (not just the validator), since helm.go now calls the shared manifest.Render(). The libraries have different marshaling paths (direct YAML vs JSON→YAML intermediate), which can produce different output.

  2. Network dependency introduced: helm template --repo downloads charts from remote repositories. Validators that were previously offline-capable now require network access and the helm CLI when recipes have chart coordinates. The helm CLI check is a hard error (line 76), while individual chart failures are warnings; this asymmetry could be confusing.

  3. E2E Test 4 always passes — The merge test passes regardless of whether the expected-resources check fails or succeeds, defeating the purpose of including nonexistent-deploy.

Minor concerns:

  1. Timeout constant should be in pkg/defaults per project conventions
  2. renderManifestFiles missing context.Context parameter (inconsistent with project rules and with renderHelmTemplate)
  3. Recipe mutation before serialization should be documented as intentional

…dd context to renderManifestFiles, fix e2e assertion
@xdu31 xdu31 requested a review from mchmarny February 20, 2026 21:21
@xdu31 xdu31 marked this pull request as ready for review February 20, 2026 21:21
Member

@mchmarny mchmarny left a comment

/lgtm

@mchmarny mchmarny merged commit f176162 into NVIDIA:main Feb 20, 2026
14 checks passed
@mchmarny mchmarny deleted the feat/check-helm-deployment branch February 20, 2026 22:51