Skip to content

fix(recipe): deep-copy validation config during merge#992

Closed
kiwigitops wants to merge 2 commits into
NVIDIA:mainfrom
kiwigitops:fix-validation-merge-aliasing
Closed

fix(recipe): deep-copy validation config during merge#992
kiwigitops wants to merge 2 commits into
NVIDIA:mainfrom
kiwigitops:fix-validation-merge-aliasing

Conversation

@kiwigitops
Copy link
Copy Markdown

Summary

  • clone validation configs/phases when merging recipe metadata instead of assigning overlay pointers directly
  • deep-copy validation slices and node-selection maps/slices
  • add regression coverage for readiness, deployment, performance, and conformance phases

Closes #984.

Testing

  • git diff --check

I could not run go test ./pkg/recipe locally because this Windows environment does not have go on PATH.

@kiwigitops kiwigitops requested a review from a team as a code owner May 20, 2026 19:03
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Copy link
Copy Markdown
Contributor

Welcome to AICR, @kiwigitops! Thanks for your first pull request.

Before review, please ensure:

  • All commits are signed off per the DCO
  • CI checks pass (tests, lint, security scan)
  • The PR description explains the why behind your changes

A maintainer will review this soon.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 7e4c8986-075f-4d8a-9de5-b98c063942d1

📥 Commits

Reviewing files that changed from the base of the PR and between 6ff1291 and 911b8ec.

📒 Files selected for processing (2)
  • pkg/recipe/metadata.go
  • pkg/recipe/metadata_test.go

📝 Walkthrough

Walkthrough

This PR adds private helpers — cloneValidationConfig, cloneValidationPhase, and cloneNodeSelection — that deep-copy ValidationConfig and its nested fields (constraints, checks, node-selection maps/slices). RecipeMetadataSpec.Merge now uses these helpers instead of pointer-assignment for validation phases. A new table-driven test mutates the overlay after merging and asserts the merged result is unchanged, confirming there is no aliasing of slices/maps between overlay and merged copies.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • NVIDIA/aicr#989: Also modifies pkg/recipe/metadata.go to deep-clone ValidationConfig/validation phases and adds regression tests to prevent pointer aliasing.

Suggested labels

size/L, area/recipes

Suggested reviewers

  • mchmarny
  • yuanchen8911
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix(recipe): deep-copy validation config during merge' directly and concisely summarizes the main change: implementing deep-copying of validation configs instead of pointer aliasing during the merge operation.
Description check ✅ Passed The description clearly relates to the changeset, explaining the deep-cloning approach for validation configs/phases, mentioning the specific slices/maps affected, and noting regression test coverage for all four phases.
Linked Issues check ✅ Passed The PR implements all coding requirements from issue #984: adds three private clone helpers (cloneValidationConfig, cloneValidationPhase, cloneNodeSelection) that deep-copy slices and maps, updates Merge() to use them for all four phases, and includes comprehensive table-driven regression tests verifying mutations don't alias.
Out of Scope Changes check ✅ Passed All changes are directly in-scope: adding clone helpers, updating Merge() calls, and adding regression tests per issue #984 requirements; cloning ComponentRefs/top-level Constraints and cache refactoring were explicitly marked out-of-scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Member

@mchmarny mchmarny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for digging into #984. The branch was opened before the fix landed on main as 05b6ae37, which already solves the same root cause: it adds cloneValidationConfig / cloneValidationPhase in pkg/recipe/validation.go and wires them into Merge. After a rebase onto main, the additions to metadata.go in this PR become duplicate declarations and the package won't compile (verified locally with go build ./pkg/recipe/...).

The PR description mentioned go wasn't on PATH in the Windows environment — that's why this slipped through; CI didn't run either because the gate workflow appears to require maintainer approval for new external contributors.

Recommendation: rebase onto main, remove the helper additions in metadata.go (they're already in validation.go), and keep only the per-phase aliasing test in metadata_test.go — that subtest is the one genuinely useful improvement here, since current main only exercises two of the four phases. Inline comments mark each location.

If a rebase ends up trivial enough that nothing remains beyond the test, closing this PR in favor of a small follow-up scoped to just the test addition would also work.

Comment thread pkg/recipe/metadata.go
Comment on lines +620 to +641
func cloneValidationConfig(in *ValidationConfig) *ValidationConfig {
if in == nil {
return nil
}
return &ValidationConfig{
Readiness: cloneValidationPhase(in.Readiness),
Deployment: cloneValidationPhase(in.Deployment),
Performance: cloneValidationPhase(in.Performance),
Conformance: cloneValidationPhase(in.Conformance),
}
}

func cloneValidationPhase(in *ValidationPhase) *ValidationPhase {
if in == nil {
return nil
}
out := *in
out.Constraints = append([]Constraint(nil), in.Constraints...)
out.Checks = append([]string(nil), in.Checks...)
out.NodeSelection = cloneNodeSelection(in.NodeSelection)
return &out
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both cloneValidationConfig and cloneValidationPhase already exist on main since 05b6ae37 (in pkg/recipe/validation.go, lines 69 and 84), which fixed #984 directly. Adding them here is a redeclaration — go build ./pkg/recipe/... on this branch fails:

pkg/recipe/validation.go:69:6: cloneValidationConfig redeclared in this block
    pkg/recipe/metadata.go:620:6: other declaration of cloneValidationConfig
pkg/recipe/validation.go:84:6: cloneValidationPhase redeclared in this block
    pkg/recipe/metadata.go:632:6: other declaration of cloneValidationPhase

The reason CI didn't catch this is that the gate workflow likely requires maintainer approval for first-time external contributors — the only checks that ran here were Label PRs and the skipped auto-merge job.

Please rebase onto main; once you do, you'll see the helpers (and the s.Validation = cloneValidationConfig(...) call sites in Merge) are already in place. Both helpers in this hunk should then be removed.

Comment thread pkg/recipe/metadata.go
Comment on lines +643 to +656
func cloneNodeSelection(in *NodeSelection) *NodeSelection {
if in == nil {
return nil
}
out := *in
if in.Selector != nil {
out.Selector = make(map[string]string, len(in.Selector))
for key, value := range in.Selector {
out.Selector[key] = value
}
}
out.ExcludeNodes = append([]string(nil), in.ExcludeNodes...)
return &out
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cloneNodeSelection is a new helper, but main's cloneValidationPhase (validation.go:84) already clones NodeSelection inline — both the Selector map and ExcludeNodes slice. Extracting it as a named function isn't a bad idea on its own, but it should live in validation.go next to its siblings, not metadata.go. After rebasing, if you still want this extraction, open it as a small refactor PR against the already-fixed code.

Comment on lines +1015 to +1078
t.Run("merged validation does not alias overlay", func(t *testing.T) {
phases := []struct {
name string
set func(*ValidationConfig, *ValidationPhase)
get func(*ValidationConfig) *ValidationPhase
}{
{
name: "readiness",
set: func(config *ValidationConfig, phase *ValidationPhase) { config.Readiness = phase },
get: func(config *ValidationConfig) *ValidationPhase { return config.Readiness },
},
{
name: "deployment",
set: func(config *ValidationConfig, phase *ValidationPhase) { config.Deployment = phase },
get: func(config *ValidationConfig) *ValidationPhase { return config.Deployment },
},
{
name: "performance",
set: func(config *ValidationConfig, phase *ValidationPhase) { config.Performance = phase },
get: func(config *ValidationConfig) *ValidationPhase { return config.Performance },
},
{
name: "conformance",
set: func(config *ValidationConfig, phase *ValidationPhase) { config.Conformance = phase },
get: func(config *ValidationConfig) *ValidationPhase { return config.Conformance },
},
}

for _, tt := range phases {
t.Run(tt.name, func(t *testing.T) {
base := RecipeMetadataSpec{}
overlay := RecipeMetadataSpec{Validation: &ValidationConfig{}}
tt.set(overlay.Validation, &ValidationPhase{
Constraints: []Constraint{{Name: testK8sVersionConstant, Value: ">= 1.30"}},
Checks: []string{"expected-resources"},
NodeSelection: &NodeSelection{
Selector: map[string]string{"accelerator": "h100"},
MaxNodes: 2,
ExcludeNodes: []string{"node-a"},
},
})
base.Merge(&overlay)

overlayPhase := tt.get(overlay.Validation)
overlayPhase.Constraints[0].Value = ">= 1.31"
overlayPhase.Checks[0] = "check-nvidia-smi"
overlayPhase.NodeSelection.Selector["accelerator"] = "b200"
overlayPhase.NodeSelection.ExcludeNodes[0] = "node-b"

mergedPhase := tt.get(base.Validation)
if mergedPhase.Constraints[0].Value != ">= 1.30" {
t.Error("constraints should not alias overlay")
}
if mergedPhase.Checks[0] != "expected-resources" {
t.Error("checks should not alias overlay")
}
if mergedPhase.NodeSelection.Selector["accelerator"] != "h100" {
t.Error("node selector should not alias overlay")
}
if mergedPhase.NodeSelection.ExcludeNodes[0] != "node-a" {
t.Error("excluded nodes should not alias overlay")
}
})
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the salvageable part of the PR — the per-phase aliasing matrix (readiness/deployment/performance/conformance) is genuinely better coverage than what landed with 05b6ae37, which only exercises the deployment+conformance paths in t.Run("merge does not alias source validation", ...) at the bottom of TestMergeValidationConfig. After dropping the duplicate helpers in metadata.go, this table-driven subtest is worth keeping. Consider trimming the focused PR to just this test addition.

@kiwigitops
Copy link
Copy Markdown
Author

Closing this because main now includes the same validation deep-copy fix via #989. That should cover the production aliasing issue without keeping duplicate PRs open.

@kiwigitops kiwigitops closed this May 21, 2026
@kiwigitops kiwigitops deleted the fix-validation-merge-aliasing branch May 22, 2026 21:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(recipe): deep-clone ValidationConfig phases in Merge to prevent cache corruption

2 participants