feat(cli): add aicr diff for configuration drift detection by sanjeevrg89 · Pull Request #582 · NVIDIA/aicr

sanjeevrg89 · 2026-04-15T03:59:22Z

Adds aicr diff — compares a snapshot against a recipe to surface constraint violations and component version drift.

What's included:

Constraint evaluation: flags mismatches between expected recipe values and actual snapshot state
Component drift: detects version skew between recipe chart versions and deployed images
Prometheus metrics for drift tracking (counters, gauges, histograms)
Table output for human-readable reports
Full test coverage (190 tests, race-clean)

Scoped to just the diff feature per @xdu31's feedback on #464 — node-validate and node-scoped constraints will follow as separate PRs.

Summary by CodeRabbit

New Features
- Added a new diff subcommand to compare two snapshots and report field-level changes; supports table and serialized outputs and a fail-on-drift option.
- Integrates snapshot source selection and kubeconfig-aware snapshot loading with CLI timeout handling.
Documentation
- Added guidance describing planned drift-detection use cases and future integration notes.
Tests
- New unit tests covering CLI flags, snapshot diff behavior, table output, and metrics recording.

Add snapshot-vs-recipe comparison that detects constraint violations and component version drift. Includes Prometheus metrics, table output, and comprehensive test coverage. New packages: pkg/diff (core engine, metrics, table renderer) New command: pkg/cli/diff.go registered in root.go

copy-pr-bot · 2026-04-15T03:59:27Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

mchmarny

Welcome — solid first contribution. Good test coverage (190 tests), clean pkg/cli / pkg/diff separation, and smart reuse of constraints.Evaluate. The tactical items (Close errors, missing timeout, dead metrics code) are straightforward fixes — see inline comments.

mchmarny

The bigger questions are architectural:

Parallel constraint evaluation path — evaluateConstraint reimplements the loop from validator.checkReadiness(). Consider extracting a shared function so both commands stay in sync.
Union result type — Result serves both modes via optional fields keyed on .Mode. Separate types per mode would be type-safe and easier to schema-validate for CI consumers.
Component drift matching is heuristic — string-matching ComponentRef.Name against image keys will produce false "not-observed" in real clusters where image names don't exactly match component names.
Scope — Snapshot-vs-snapshot comparison is genuinely novel. Recipe-vs-snapshot is a read-only subset of what validate already does. Would it be cleaner to ship snapshot-vs-snapshot here, and add recipe drift as aicr validate --drift-only (skip job execution, report constraint state)? That keeps constraint evaluation in one code path.

yuanchen8911

Two regressions remain in the GPU workflow changes.

The new trigger/filter lists no longer include pkg/validator/checks/conformance/** or pkg/validator/checks/deployment/**. That means a PR that only changes validator checks will stop triggering these GPU workflows on pull-request/* pushes, creating a silent CI gap for the main end-to-end validator path.

The conformance evidence collection step now requires validate-conformance to have run with a success or failure outcome. If chainsaw test or expected-resource verification fails first, validate-conformance is skipped and evidence collection never runs. Previously we still uploaded conformance evidence after those earlier failures, which was useful for debugging common GPU CI breakages. The same regression exists in both the inference and training workflows.

yuanchen8911 · 2026-04-15T18:31:23Z

I think this is really two different features. Snapshot-vs-snapshot feels like a natural diff command. Recipe-vs-snapshot feels more like validate in an offline/read-only mode than an extension of snapshot.

That also reinforces my earlier concern about the union Result type: one command is currently carrying two distinct concepts and output schemas behind a .Mode switch.

My preference would be to keep diff focused on snapshot comparison, and put recipe drift under validate so the constraint-evaluation path stays unified.

ArangoGutierrez

Welcome, and thanks for the contribution -- configuration drift detection is a useful feature. Mark's review covers the key architectural questions well. The main one to address first is the scope question: whether recipe-vs-snapshot belongs here or under validate --drift-only, since that drives the type design (union Result vs separate types) and avoids the parallel constraint evaluation path.

Once you have a chance to respond to the review feedback, the tactical items (Close errors, missing timeout, unused metrics registration) should be quick fixes. Looking forward to seeing the next iteration.

Address review feedback on NVIDIA#582 by removing recipe-vs-snapshot mode from aicr diff. Reviewers agreed recipe drift belongs under validate --drift-only to keep constraint evaluation in one code path. Changes: - Remove RecipeVsSnapshot path, union Result type, component drift matching - Remove pkg/diff/metrics.go (dead code — promauto was registering gauges at init that nothing populated) - Add context.WithTimeout on CLI I/O using defaults.CLISnapshotTimeout - Fix writable f.Close() and serializer closer.Close() to capture errors Preserved in pkg/diff/_future/ with a README documenting the use cases that justify recipe-vs-snapshot drift detection (airgapped audit, historical compliance, fleet-scale monitoring, CI/CD gating) and a reuse plan for validate --drift-only. Net: -1,004 lines. 176 tests pass with -race. go vet clean.

sanjeevrg89 · 2026-04-16T11:30:07Z

Thanks @mchmarny, @yuanchen8911, and @ArangoGutierrez for the detailed feedback — all well taken. Pushed an update that scopes this PR down to snapshot-vs-snapshot only, which I think addresses the architectural concerns.

What changed in this push:

Stripped RecipeVsSnapshot, the union Result type, component drift matching, and the parallel constraint evaluation loop from pkg/diff. aicr diff is now just snapshot-vs-snapshot — one concept, one type, one code path.
Removed pkg/diff/metrics.go. Agreed it was dead code in this PR — promauto was registering gauges at init that nothing populated.
Added context.WithTimeout(ctx, defaults.CLISnapshotTimeout) around the CLI I/O path.
Fixed the writable f.Close() and the serializer closer.Close() to capture and log errors instead of discarding them.

Net change is -1,004 lines on the feature. 196 tests pass with -race, go vet clean.

On recipe-vs-snapshot:

I agree with the suggestion to put it under aicr validate --drift-only so constraint evaluation stays in one place. I preserved the recipe-mode code in pkg/diff/_future/ (outside Go's build graph) with a README documenting the use cases that I think justify keeping it as a first-class capability even if it moves:

Airgapped/disconnected audit — auditor has snapshot + recipe, no cluster access
Historical compliance — "was cluster X compliant on 2026-03-15?" given that day's snapshot
Fleet-scale drift monitoring — analyze 500 snapshots offline instead of round-tripping validate
CI/CD drift gating — sub-second drift check on a recipe change without spinning up a cluster
Pre-flight before validate — cheap read-only check before the heavy Job-deploying run
Evidence/attestation trail — reproducible signed drift report from recipe hash + snapshot hash

The reviewers' concern was that constraint evaluation should live in one path — they're right. The suggested validate --drift-only design unifies the code path while keeping both use cases first-class. Happy to take that on as a follow-up PR, with the constraint evaluator extracted to pkg/constraints as a shared function and a structured ComponentRef→image mapping to replace the heuristic matching.

Let me know if the scope on this PR now feels right.

issues addressed

coderabbitai

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/cli/diff_test.go`:
- Around line 39-51: The variable name requiredFlags in pkg/cli/diff_test.go is
misleading because some entries are optional; rename it to expectedFlags and
update all references in the test (the slice definition and the for-range that
iterates it) so the intent is clear, keeping the rest of the logic (loop over
cmd.Flags and hasFlag) unchanged; ensure the test error message that uses
flagName remains the same.

In `@pkg/cli/diff.go`:
- Around line 138-141: The drift-detection branch currently returns
errors.New(..., errors.ErrCodeInternal) which is the wrong code; update the
error creation in the block that checks cmd.Bool("fail-on-drift") &&
result.HasDrift() to use errors.ErrCodeInvalidRequest (or add a new constant
like ErrCodeConfigurationMismatch in the central error codes and use that)
instead of ErrCodeInternal, and ensure the new/changed error code maps to the
intended exit code (exit code 2) in the error handling/mapping logic.

In `@pkg/diff/_future/cli_diff.go.bak`:
- Around line 251-259: The table branch currently treats any non-empty output as
a filesystem path (creating a file for "--output -"), so in the block guarded by
outFormat == serializer.FormatTable replace the manual os.Create logic with the
same normalization used elsewhere: call serializer.NewFileWriterOrStdout(output)
to obtain the writer, propagate any error, assign the returned writer to w, and
if the returned writer implements io.Closer defer Close() to avoid leaking file
descriptors; reference the symbols outFormat, serializer.FormatTable, output, w
and serializer.NewFileWriterOrStdout when locating and changing the code.
- Around line 258-272: The JSON/YAML output path currently ignores Close()
errors on the serializer (ser) just like the earlier file branch where defer
f.Close() drops errors; change both close paths to capture and propagate Close()
failures: for the file branch replace defer f.Close() with a deferred closure
that checks f.Close() error and returns or wraps it into the function's error
return, and for the serializer branch change the anonymous defer to call
closer.Close(), check its error, and if non-nil wrap and return it (preserve
existing error wrapping style used around serializer.NewFileWriterOrStdout);
reference the variables/functions ser, f, serializer.NewFileWriterOrStdout and
ensure the function returns the Close() error instead of discarding it.

In `@pkg/diff/_future/README.md`:
- Line 1: Replace the current H2 heading "## Recipe-vs-Snapshot Drift Detection
(deferred)" with a top-level H1 heading so the README begins with a single
top-level heading; update the line in pkg/diff/_future/README.md that contains
"Recipe-vs-Snapshot Drift Detection (deferred)" to use an H1 (single leading
hash) instead of H2 to satisfy MD041.

In `@pkg/diff/_future/recipe_diff.go.bak`:
- Around line 267-281: The current loop in the block that assigns
result.ValidationPhases = checkValidationPhases(rec, snap) is aggregating
phase-level constraint outcomes (iterating vp.ConstraintResults) into
result.Summary.Constraints* while those constraint rows are not present in
result.ConstraintResults, causing inconsistent counts. Fix by either (A) keeping
separate phase counters on result (e.g., result.PhaseSummary.* or new fields)
and increment those instead of result.Summary, or (B) append/flatten each
vp.ConstraintResults into result.ConstraintResults before computing
result.Summary.Total and the ConstraintsPassed/Failed/Error counts; update the
code paths that compute result.Summary (referencing result.ValidationPhases,
vp.ConstraintResults, result.ConstraintResults, and result.Summary) so top-level
summary only reflects entries actually present in result.ConstraintResults.
- Around line 230-236: Guard against nil inputs at the start of
RecipeVsSnapshot: check if rec==nil or snap==nil before using rec.Constraints or
dereferencing snap, and return a safe empty Result (or an explicit error Result)
instead of letting the function panic. Concretely, compute a constraintsLen :=
0; if rec != nil { constraintsLen = len(rec.Constraints) } and use that when
allocating ConstraintResults, and if snap == nil or rec == nil return the
initialized Result (Mode "recipe-vs-snapshot") immediately or set an error field
so downstream code in RecipeVsSnapshot, and any uses of rec or snap, do not
dereference nil values.

In `@pkg/diff/diff.go`:
- Around line 86-92: The docstring for Snapshots contradicts its behavior:
instead of requiring non-nil inputs, the function currently returns an empty
*Result when baseline or target is nil; update the comment for Snapshots to
state it is nil-safe and that when either baseline or target is nil the function
returns an empty Result (with Changes initialized empty), referencing the Result
and Change return values and the snapshotter.Snapshot parameters so callers know
they don't need to pre-validate; do not change the function signature.

In `@pkg/diff/table.go`:
- Around line 56-60: The else branch printing "NO DRIFT" is unreachable because
the function returns when len(result.Changes) == 0 and by the time we reach this
block result.HasDrift() will always be true; remove the dead else branch and
simplify the block to always print the drift message (e.g., replace the if/else
using result.HasDrift() with a single fmt.Fprintln(w, "DRIFT DETECTED")). Ensure
references to result.Changes, result.Summary.Total (populated by Snapshots()),
and result.HasDrift() remain correct elsewhere.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 392d5693-49cb-4486-be78-91531e3ee884

📥 Commits

Reviewing files that changed from the base of the PR and between 03b82bd and dc011a8.

📒 Files selected for processing (13)

pkg/cli/diff.go
pkg/cli/diff_test.go
pkg/cli/root.go
pkg/diff/_future/README.md
pkg/diff/_future/cli_diff.go.bak
pkg/diff/_future/metrics.go.bak
pkg/diff/_future/metrics_test.go.bak
pkg/diff/_future/recipe_diff.go.bak
pkg/diff/_future/recipe_diff_test.go.bak
pkg/diff/_future/table.go.bak
pkg/diff/diff.go
pkg/diff/diff_test.go
pkg/diff/table.go

coderabbitai · 2026-04-16T14:52:12Z

+	if outFormat == serializer.FormatTable {
+		w := os.Stdout
+		if output != "" {
+			f, err := os.Create(output)
+			if err != nil {
+				return errors.Wrap(errors.ErrCodeInternal, "failed to create output file", err)
+			}
+			defer f.Close()
+			w = f


⚠️ Potential issue | 🟠 Major

Honor stdout sentinels in the table path.

The table branch treats any non-empty --output as a filesystem path, so --output - (and the stdout URI) will create a file instead of writing to stdout. Reuse the same output normalization as serializer.NewFileWriterOrStdout here.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/diff/_future/cli_diff.go.bak` around lines 251 - 259, The table branch currently treats any non-empty output as a filesystem path (creating a file for "--output -"), so in the block guarded by outFormat == serializer.FormatTable replace the manual os.Create logic with the same normalization used elsewhere: call serializer.NewFileWriterOrStdout(output) to obtain the writer, propagate any error, assign the returned writer to w, and if the returned writer implements io.Closer defer Close() to avoid leaking file descriptors; reference the symbols outFormat, serializer.FormatTable, output, w and serializer.NewFileWriterOrStdout when locating and changing the code.

coderabbitai · 2026-04-16T14:52:12Z

+			defer f.Close()
+			w = f
+		}
+		return diff.WriteTable(w, result)
+	}
+
+	// JSON/YAML use standard serializer
+	ser, err := serializer.NewFileWriterOrStdout(outFormat, output)
+	if err != nil {
+		return errors.Wrap(errors.ErrCodeInternal, "failed to create output writer", err)
+	}
+	defer func() {
+		if closer, ok := ser.(interface{ Close() error }); ok {
+			_ = closer.Close()
+		}


⚠️ Potential issue | 🟠 Major

Propagate close failures from output writers.

Both close paths drop errors. That can silently lose the last buffered bytes and still return success. The future copy should keep the same close/error pattern as the live CLI before it is brought back.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/diff/_future/cli_diff.go.bak` around lines 258 - 272, The JSON/YAML output path currently ignores Close() errors on the serializer (ser) just like the earlier file branch where defer f.Close() drops errors; change both close paths to capture and propagate Close() failures: for the file branch replace defer f.Close() with a deferred closure that checks f.Close() error and returns or wraps it into the function's error return, and for the serializer branch change the anonymous defer to call closer.Close(), check its error, and if non-nil wrap and return it (preserve existing error wrapping style used around serializer.NewFileWriterOrStdout); reference the variables/functions ser, f, serializer.NewFileWriterOrStdout and ensure the function returns the Close() error instead of discarding it.

coderabbitai · 2026-04-16T14:52:12Z

@@ -0,0 +1,75 @@
+## Recipe-vs-Snapshot Drift Detection (deferred)


⚠️ Potential issue | 🟡 Minor

Add top-level heading per markdown standards.

Static analysis flags MD041: first line should be a top-level heading. The current line 1 is ## Recipe-vs-Snapshot Drift Detection (deferred) which is H2.

📝 Suggested fix

+# Deferred Implementation + ## Recipe-vs-Snapshot Drift Detection (deferred)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

## Recipe-vs-Snapshot Drift Detection (deferred)

# Deferred Implementation

## Recipe-vs-Snapshot Drift Detection (deferred)

🧰 Tools

🪛 markdownlint-cli2 (0.22.0)

[warning] 1-1: First line in a file should be a top-level heading

(MD041, first-line-heading, first-line-h1)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/diff/_future/README.md` at line 1, Replace the current H2 heading "## Recipe-vs-Snapshot Drift Detection (deferred)" with a top-level H1 heading so the README begins with a single top-level heading; update the line in pkg/diff/_future/README.md that contains "Recipe-vs-Snapshot Drift Detection (deferred)" to use an H1 (single leading hash) instead of H2 to satisfy MD041.

coderabbitai · 2026-04-16T14:52:12Z

+func RecipeVsSnapshot(rec *recipe.RecipeResult, snap *snapshotter.Snapshot) *Result {
+	result := &Result{
+		Mode:              "recipe-vs-snapshot",
+		ConstraintResults: make([]ConstraintResult, 0, len(rec.Constraints)),
+		ComponentDrifts:   make([]ComponentDrift, 0),
+		ValidationPhases:  make([]ValidationPhaseSummary, 0),
+	}


⚠️ Potential issue | 🟠 Major

Guard nil recipe/snapshot inputs before constructing the result.

len(rec.Constraints) panics when rec is nil, and a nil snap is dereferenced later during constraint and component evaluation. For a reusable package API, this should return a safe empty result or an explicit error path instead of crashing. Based on learnings: Functional Packages (pkg/oci, pkg/bundler, pkg/recipe, pkg/collector, pkg/validator) must be self-contained, reusable business logic usable independently without CLI/API.

Possible hardening

func RecipeVsSnapshot(rec *recipe.RecipeResult, snap *snapshotter.Snapshot) *Result { + if rec == nil || snap == nil { + return &Result{ + Mode: "recipe-vs-snapshot", + ConstraintResults: []ConstraintResult{}, + ComponentDrifts: []ComponentDrift{}, + ValidationPhases: []ValidationPhaseSummary{}, + } + } + result := &Result{ Mode: "recipe-vs-snapshot", ConstraintResults: make([]ConstraintResult, 0, len(rec.Constraints)),

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func RecipeVsSnapshot(rec *recipe.RecipeResult, snap *snapshotter.Snapshot) *Result {

result := &Result{

Mode: "recipe-vs-snapshot",

ConstraintResults: make([]ConstraintResult, 0, len(rec.Constraints)),

ComponentDrifts: make([]ComponentDrift, 0),

ValidationPhases: make([]ValidationPhaseSummary, 0),

}

func RecipeVsSnapshot(rec *recipe.RecipeResult, snap *snapshotter.Snapshot) *Result {

if rec == nil || snap == nil {

return &Result{

Mode: "recipe-vs-snapshot",

ConstraintResults: []ConstraintResult{},

ComponentDrifts: []ComponentDrift{},

ValidationPhases: []ValidationPhaseSummary{},

}

}

result := &Result{

Mode: "recipe-vs-snapshot",

ConstraintResults: make([]ConstraintResult, 0, len(rec.Constraints)),

ComponentDrifts: make([]ComponentDrift, 0),

ValidationPhases: make([]ValidationPhaseSummary, 0),

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/diff/_future/recipe_diff.go.bak` around lines 230 - 236, Guard against nil inputs at the start of RecipeVsSnapshot: check if rec==nil or snap==nil before using rec.Constraints or dereferencing snap, and return a safe empty Result (or an explicit error Result) instead of letting the function panic. Concretely, compute a constraintsLen := 0; if rec != nil { constraintsLen = len(rec.Constraints) } and use that when allocating ConstraintResults, and if snap == nil or rec == nil return the initialized Result (Mode "recipe-vs-snapshot") immediately or set an error field so downstream code in RecipeVsSnapshot, and any uses of rec or snap, do not dereference nil values.

coderabbitai · 2026-04-16T14:52:13Z

+	// 3. Summarize validation phase configuration and count phase-level constraints
+	result.ValidationPhases = checkValidationPhases(rec, snap)
+	for _, vp := range result.ValidationPhases {
+		for _, cr := range vp.ConstraintResults {
+			if cr.Error != "" {
+				result.Summary.ConstraintsError++
+			} else if cr.Passed {
+				result.Summary.ConstraintsPassed++
+			} else {
+				result.Summary.ConstraintsFailed++
+			}
+		}
+	}
+
+	result.Summary.Total = len(result.ConstraintResults)


⚠️ Potential issue | 🟠 Major

Don't mix phase constraint totals into the top-level constraint summary.

This loop folds phase-level constraint outcomes into Summary.Constraints*, but those rows are not added to ConstraintResults. Any consumer that renders ConstraintResults with Summary will show inconsistent counts versus rows. Keep separate phase counters, or flatten phase constraint results into the main slice before summarizing.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/diff/_future/recipe_diff.go.bak` around lines 267 - 281, The current loop in the block that assigns result.ValidationPhases = checkValidationPhases(rec, snap) is aggregating phase-level constraint outcomes (iterating vp.ConstraintResults) into result.Summary.Constraints* while those constraint rows are not present in result.ConstraintResults, causing inconsistent counts. Fix by either (A) keeping separate phase counters on result (e.g., result.PhaseSummary.* or new fields) and increment those instead of result.Summary, or (B) append/flatten each vp.ConstraintResults into result.ConstraintResults before computing result.Summary.Total and the ConstraintsPassed/Failed/Error counts; update the code paths that compute result.Summary (referencing result.ValidationPhases, vp.ConstraintResults, result.ConstraintResults, and result.Summary) so top-level summary only reflects entries actually present in result.ConstraintResults.

mchmarny · 2026-04-16T14:52:23Z

@sanjeevrg89 looks like some linting issues remain, otherwise LGTM. Sounds like coderabbit though has some other ideas ;)

Address review feedback on NVIDIA#582 by narrowing aicr diff to just snapshot-vs-snapshot comparison. Reviewers agreed recipe-vs-snapshot drift belongs under validate --drift-only to keep constraint evaluation in one code path. Architectural changes: - Remove RecipeVsSnapshot path, union Result type, component drift matching, and the parallel constraint evaluation loop - Remove pkg/diff/metrics.go (dead code — promauto was registering gauges at init that nothing populated) Tactical review fixes: - Add context.WithTimeout on CLI I/O using defaults.CLISnapshotTimeout - Capture and log errors from writable f.Close() and serializer closer.Close() instead of discarding them - Use ErrCodeInvalidRequest (exit 2) instead of ErrCodeInternal (exit 8) when fail-on-drift triggers — drift is an expected outcome, not an internal failure - Document nil-safe behavior on Snapshots() instead of claiming both inputs must be non-nil - Remove unreachable "NO DRIFT" branch in table.go (early return handles the empty-changes path) - Rename requiredFlags to expectedFlags in diff_test.go Lint fixes: - gocritic unlambda: drop wrapper closure around runDiffCmd - govet shadow: reassign err instead of shadowing at initDataProvider - prealloc: preallocate changes slice in addedMeasurement and removedMeasurement Preserved in pkg/diff/_future/ with a README documenting the use cases (airgapped audit, historical compliance, fleet-scale monitoring, CI/CD gating, pre-flight validate, evidence trail) that justify recipe-vs-snapshot drift detection as a first-class capability under validate --drift-only. golangci-lint v2.10.1: 0 issues. 196 tests pass with -race.

sanjeevrg89 · 2026-04-16T18:16:56Z

Thanks @mchmarny — linting's green now. Just pushed a fix.

Resolved all 4 golangci-lint failures from tests / Lint:

pkg/cli/diff.go:54 (gocritic unlambda) — dropped the wrapper closure, use Action: runDiffCmd directly
pkg/cli/diff.go:105 (govet shadow) — reassign outer err at initDataProvider instead of shadowing
pkg/diff/diff.go:220,228 (prealloc) — preallocate the changes slice in addedMeasurement and removedMeasurement using len(m.Subtypes)

While I was in there I also took the valid CodeRabbit nits on the live code:

pkg/cli/diff.go:136 — use ErrCodeInvalidRequest (exit 2) instead of ErrCodeInternal (exit 8) when --fail-on-drift triggers. Drift is an expected outcome, not an internal failure — this also makes the exit code useful for CI scripting.
pkg/diff/diff.go — Snapshots() docstring now documents the nil-safe behavior instead of claiming both inputs must be non-nil (the implementation was already nil-safe).
pkg/diff/table.go — removed the unreachable NO DRIFT branch; the empty-changes path returns early above it.
pkg/cli/diff_test.go — renamed requiredFlags → expectedFlags since the slice includes optional flags like --output and --format.

Skipped the CodeRabbit suggestions against files in pkg/diff/_future/ — those .bak files aren't in the Go build graph or linted, they're archived preservation for when someone picks up validate --drift-only. The README H1 nit is also there; markdown isn't in the repo's lint pipeline.

Verified locally with golangci-lint v2.10.1 (matches CI): 0 issues. 196 tests pass with -race.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

pkg/cli/diff.go (1)

153-161: ⚠️ Potential issue | 🟠 Major

Return Close() failures for writable outputs instead of only logging them.

Close() is where buffered I/O errors often surface. Logging these failures still lets aicr diff exit 0 after a truncated output file. Make writeDiffResult use a named err return and promote close failures when no earlier error exists.

💾 Suggested pattern

-func writeDiffResult(ctx context.Context, cmd *cli.Command, outFormat serializer.Format, result *diff.Result) error {
+func writeDiffResult(ctx context.Context, cmd *cli.Command, outFormat serializer.Format, result *diff.Result) (err error) {
 	output := cmd.String("output")
 ...
 			defer func() {
-				if closeErr := f.Close(); closeErr != nil {
-					slog.Error("failed to close output file", "error", closeErr)
+				if closeErr := f.Close(); closeErr != nil && err == nil {
+					err = errors.Wrap(errors.ErrCodeInternal, "failed to close output file", closeErr)
 				}
 			}()
 ...
 	defer func() {
 		if closer, ok := ser.(interface{ Close() error }); ok {
-			if closeErr := closer.Close(); closeErr != nil {
-				slog.Error("failed to close serializer", "error", closeErr)
+			if closeErr := closer.Close(); closeErr != nil && err == nil {
+				err = errors.Wrap(errors.ErrCodeInternal, "failed to close serializer", closeErr)
 			}
 		}
 	}()

As per coding guidelines "Always capture and check Close() error on writable file handles (files opened with Create or OpenFile with write modes)".

Also applies to: 172-178

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@pkg/cli/diff.go` around lines 153 - 161, The defer that closes the created
file (variable f) must promote Close() failures to the function return instead
of only logging them: change writeDiffResult to use a named return variable
(err) and in the defer check closeErr := f.Close(); if closeErr != nil && err ==
nil { err = errors.Wrap(errors.ErrCodeInternal, "failed to close output file",
closeErr) } so Close() errors are returned; apply the same pattern for the other
writable output file close at the second occurrence (lines around the other
defer) to ensure buffered-write errors are propagated.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/diff/diff.go`:
- Around line 152-156: indexMeasurements currently dereferences m.Type without
checking for nil, causing a panic if measurements contains nil entries; update
the function (indexMeasurements in pkg/diff/diff.go) to skip nil measurement
pointers by adding a guard (if m == nil { continue }) before accessing m.Type so
nil entries are ignored (or handled) instead of panicking, ensuring the map
creation remains safe on invalid input.

---

Duplicate comments:
In `@pkg/cli/diff.go`:
- Around line 153-161: The defer that closes the created file (variable f) must
promote Close() failures to the function return instead of only logging them:
change writeDiffResult to use a named return variable (err) and in the defer
check closeErr := f.Close(); if closeErr != nil && err == nil { err =
errors.Wrap(errors.ErrCodeInternal, "failed to close output file", closeErr) }
so Close() errors are returned; apply the same pattern for the other writable
output file close at the second occurrence (lines around the other defer) to
ensure buffered-write errors are propagated.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 6efb79ea-a4e0-4d31-8e33-44cff404500f

📥 Commits

Reviewing files that changed from the base of the PR and between dc011a8 and ca9e24f.

📒 Files selected for processing (4)

pkg/cli/diff.go
pkg/cli/diff_test.go
pkg/diff/diff.go
pkg/diff/table.go

Three fixes from review: - pkg/diff/table.go: wrap writes to the output Writer with an errWriter helper so the first write failure (broken pipe, full disk) surfaces as a structured pkg/errors.Wrap(ErrCodeInternal, ...) return value instead of a silent exit 0. Also wraps the bare err from tw.Flush() that CLAUDE.md's "never return bare err" rule forbids. - pkg/cli/diff.go: convert writeDiffResult to a named-return function and promote Close() failures on the output file and serializer into the return value when no earlier error exists. Buffered-write failures on Close were previously logged-and-lost, allowing aicr diff to exit 0 after a truncated output file. - pkg/diff/diff.go: guard against nil *measurement.Measurement entries in indexMeasurements. A snapshot loaded from malformed YAML could contain nil entries and panic the dereference on m.Type; now skipped cleanly. Added tests for the new failure paths: TestSnapshots_NilMeasurementEntries exercises the nil guard, TestWriteTable_PropagatesWriteErrors exercises both the empty and non-empty output branches with a failing writer. golangci-lint v2.10.1: 0 issues. 200 tests pass with -race.

sanjeevrg89 · 2026-04-16T18:39:00Z

Good catches from CodeRabbit — pushed fixes for all three:

1. pkg/diff/table.go write propagation. Wrote an errWriter helper that captures the first io.Writer error and turns subsequent writes into no-ops. WriteTable now wraps any failure with errors.Wrap(errors.ErrCodeInternal, "failed to write diff table output", err). This covers the broken-pipe / full-disk scenario and also addresses the bare tw.Flush() return that was violating the "never return bare err" rule.

2. pkg/cli/diff.go Close() error propagation. Converted writeDiffResult to a named-return function (err error) and promote Close() failures on both the output file (f.Close()) and the serializer (closer.Close()) into the return value when no earlier error exists. Previously these were logged and dropped — meaning aicr diff could exit 0 after a truncated output file.

3. pkg/diff/diff.go nil guard on indexMeasurements. Skip nil entries instead of panicking on m.Type. Real snapshots from the snapshotter won't contain nil entries, but malformed YAML deserialization can produce them and we shouldn't crash on invalid input.

Added two tests for the new failure paths: TestSnapshots_NilMeasurementEntries exercises the nil guard, TestWriteTable_PropagatesWriteErrors uses a failingWriter to verify both the empty-changes and with-changes paths return wrapped errors when the target writer fails.

golangci-lint v2.10.1: 0 issues. 200 tests pass with -race.

sanjeevrg89 requested a review from a team as a code owner April 15, 2026 03:59

github-actions bot added area/cli size/XL labels Apr 15, 2026

mchmarny assigned sanjeevrg89 Apr 15, 2026

mchmarny self-requested a review April 15, 2026 17:25

mchmarny added the enhancement New feature or request label Apr 15, 2026

mchmarny reviewed Apr 15, 2026

View reviewed changes

Comment thread pkg/cli/diff.go

Comment thread pkg/cli/diff.go Outdated

Comment thread pkg/cli/diff.go Outdated

Comment thread pkg/diff/_future/metrics.go.bak

Comment thread pkg/diff/diff.go Outdated

Comment thread pkg/diff/diff.go

Comment thread pkg/diff/diff.go Outdated

mchmarny previously requested changes Apr 15, 2026

View reviewed changes

Merge branch 'main' into feat/aicr-diff

2359231

yuanchen8911 reviewed Apr 15, 2026

View reviewed changes

yuanchen8911 requested review from mchmarny and yuanchen8911 April 15, 2026 18:17

ArangoGutierrez reviewed Apr 16, 2026

View reviewed changes

Merge branch 'main' into feat/aicr-diff

dc011a8

This comment was marked as resolved.

Sign in to view

coderabbitai bot reviewed Apr 16, 2026

View reviewed changes

Comment thread pkg/diff/diff.go

Comment thread pkg/diff/table.go Outdated

		@@ -0,0 +1,75 @@
		## Recipe-vs-Snapshot Drift Detection (deferred)

Conversation

sanjeevrg89 commented Apr 15, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Apr 15, 2026

Uh oh!

mchmarny left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mchmarny left a comment

Choose a reason for hiding this comment

Uh oh!

yuanchen8911 left a comment

Choose a reason for hiding this comment

Uh oh!

yuanchen8911 commented Apr 15, 2026

Uh oh!

ArangoGutierrez left a comment

Choose a reason for hiding this comment

Uh oh!

sanjeevrg89 commented Apr 16, 2026

Uh oh!

This comment was marked as resolved.

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

mchmarny commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sanjeevrg89 commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

sanjeevrg89 commented Apr 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

sanjeevrg89 commented Apr 15, 2026 •

edited by coderabbitai bot

Loading

mchmarny left a comment •

edited

Loading

mchmarny commented Apr 16, 2026 •

edited

Loading

sanjeevrg89 commented Apr 16, 2026 •

edited

Loading