fix: context deadline exceeded shouldn't throw an error#203
Conversation
Signed-off-by: Amber Xue <ambermingxin@nvidia.com>
📝 WalkthroughWalkthroughAdds an immediate one-time export when the exporter runs in OfflineMode; introduces Server loop lifecycle state (cancelable loop context + waitgroup) and a wait helper with tests; and adds optional Helm enrollment metadata flags, template wiring, and docs. ChangesOffline exporter initial export
Server loop lifecycle and shutdown
Helm enrollment flags and docs
Sequence Diagram(s)sequenceDiagram
participant ComponentA
participant ComponentB
ComponentA->>ComponentB: observable interaction
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Signed-off-by: Amber Xue <ambermingxin@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
internal/server/server.go (1)
223-224:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't suppress joined loop failures just because the context ended.
Once
ctx.Err()is set, this drops any error that matchescontext.Canceledorcontext.DeadlineExceeded, including joined/wrapped errors that also contain a real loop failure. That still masks the scenario called out earlier instead of only silencing the expected shutdown path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/server/server.go` around lines 223 - 224, The current condition suppresses any error that wraps context.Canceled/DeadlineExceeded; change the check to only suppress when the error is exactly the cancellation error (not wrapped): in the block referencing ctx and err, replace errors.Is(err, context.Canceled)/errors.Is(err, context.DeadlineExceeded) with direct equality checks (err == context.Canceled || err == context.DeadlineExceeded) while still requiring ctx != nil && ctx.Err() != nil, so wrapped/joined loop failures are not accidentally dropped.
🧹 Nitpick comments (1)
internal/exporter/exporter_test.go (1)
343-346: ⚡ Quick winConsider adding a brief sleep after Start() for test reliability.
The existing test at line 287-305 includes a 100ms sleep after calling
Start()before reading the directory. For consistency and to guard against file system buffering delays, consider adding the same safety margin here.Suggested addition
err = exporter.Start() require.NoError(t, err) + +// Brief wait to ensure file system operations complete +time.Sleep(100 * time.Millisecond) entries, err := os.ReadDir(tmpDir)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/exporter/exporter_test.go` around lines 343 - 346, Add a short sleep after calling exporter.Start() to match the other test's safety margin and avoid flaky reads: after invoking exporter.Start() (the call to exporter.Start() that precedes reading tmpDir with os.ReadDir), insert a brief pause (e.g., 100ms) before calling os.ReadDir(tmpDir) so the exporter has time to write files and the filesystem buffers settle; keep the delay minimal and document it with a one-line comment near the exporter.Start() call.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@deployments/helm/fleet-intelligence-agent/values.yaml`:
- Around line 64-67: The values.yaml defaults for enrollment metadata use empty
strings for nodeGroup and computeZone which causes the template checks (see
daemonset.yaml checks using "ne nil" at the node-group/compute-zone flag
generation) to always pass and emit --node-group "" / --compute-zone ""; change
the defaults for the nodeGroup and computeZone keys in values.yaml from "" to
null so the template's nil checks behave correctly and preserve existing values
(also ensure README.md table remains consistent with the null defaults).
---
Duplicate comments:
In `@internal/server/server.go`:
- Around line 223-224: The current condition suppresses any error that wraps
context.Canceled/DeadlineExceeded; change the check to only suppress when the
error is exactly the cancellation error (not wrapped): in the block referencing
ctx and err, replace errors.Is(err, context.Canceled)/errors.Is(err,
context.DeadlineExceeded) with direct equality checks (err == context.Canceled
|| err == context.DeadlineExceeded) while still requiring ctx != nil &&
ctx.Err() != nil, so wrapped/joined loop failures are not accidentally dropped.
---
Nitpick comments:
In `@internal/exporter/exporter_test.go`:
- Around line 343-346: Add a short sleep after calling exporter.Start() to match
the other test's safety margin and avoid flaky reads: after invoking
exporter.Start() (the call to exporter.Start() that precedes reading tmpDir with
os.ReadDir), insert a brief pause (e.g., 100ms) before calling
os.ReadDir(tmpDir) so the exporter has time to write files and the filesystem
buffers settle; keep the delay minimal and document it with a one-line comment
near the exporter.Start() call.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 3ce52339-623f-4ed1-bcb8-b89bf6475581
📒 Files selected for processing (7)
deployments/helm/fleet-intelligence-agent/README.mddeployments/helm/fleet-intelligence-agent/templates/daemonset.yamldeployments/helm/fleet-intelligence-agent/values.yamldocs/install-helm.mdinternal/exporter/exporter_test.gointernal/server/server.gointernal/server/server_test.go
✅ Files skipped from review due to trivial changes (1)
- deployments/helm/fleet-intelligence-agent/README.md
Signed-off-by: Amber Xue <ambermingxin@nvidia.com>
Signed-off-by: Amber Xue <ambermingxin@nvidia.com>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
internal/server/server.go (1)
495-496:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDifferentiate expected context timeout exits from real loop failures.
Line 495 and Line 532 now suppress only
context.Canceled. In timeout-driven shutdown, these loops can legitimately exit withcontext.DeadlineExceeded, which will now be logged as an error and conflicts with the PR objective. SuppressDeadlineExceededonly when it comes from the loop context (ctx.Err()), and still log real internal deadline failures.Proposed patch
@@ s.loopWG.Add(1) go func() { defer s.loopWG.Done() - if err := manager.Run(ctx); err != nil && !errors.Is(err, context.Canceled) { - log.Logger.Errorw("inventory loop manager exited", "error", err) - } + if err := manager.Run(ctx); err != nil { + expectedCtxExit := errors.Is(err, context.Canceled) || + (errors.Is(err, context.DeadlineExceeded) && errors.Is(ctx.Err(), context.DeadlineExceeded)) + if !expectedCtxExit { + log.Logger.Errorw("inventory loop manager exited", "error", err) + } + } }() } @@ s.loopWG.Add(1) go func() { defer s.loopWG.Done() - if err := manager.Run(ctx); err != nil && !errors.Is(err, context.Canceled) { - log.Logger.Errorw("attestation loop exited", "error", err) - } + if err := manager.Run(ctx); err != nil { + expectedCtxExit := errors.Is(err, context.Canceled) || + (errors.Is(err, context.DeadlineExceeded) && errors.Is(ctx.Err(), context.DeadlineExceeded)) + if !expectedCtxExit { + log.Logger.Errorw("attestation loop exited", "error", err) + } + } }() }Also applies to: 532-533
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/server/server.go` around lines 495 - 496, The current error suppression only filters context.Canceled causing context.DeadlineExceeded from internal failures to be swallowed; update the error-checking around manager.Run(ctx) (and the other loop Run call at the later occurrence) to suppress DeadlineExceeded only when it originated from the loop context by checking ctx.Err() — i.e., keep the existing suppression for context.Canceled, and add logic so that errors.Is(err, context.DeadlineExceeded) is ignored only if ctx.Err() == context.DeadlineExceeded; otherwise treat DeadlineExceeded as a real error and log it with log.Logger.Errorw.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@internal/server/server.go`:
- Around line 495-496: The current error suppression only filters
context.Canceled causing context.DeadlineExceeded from internal failures to be
swallowed; update the error-checking around manager.Run(ctx) (and the other loop
Run call at the later occurrence) to suppress DeadlineExceeded only when it
originated from the loop context by checking ctx.Err() — i.e., keep the existing
suppression for context.Canceled, and add logic so that errors.Is(err,
context.DeadlineExceeded) is ignored only if ctx.Err() ==
context.DeadlineExceeded; otherwise treat DeadlineExceeded as a real error and
log it with log.Logger.Errorw.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ed80e414-67a0-4341-8abb-f96029360639
📒 Files selected for processing (2)
internal/server/server.gointernal/server/server_test.go
💤 Files with no reviewable changes (1)
- internal/server/server_test.go
Description
Checklist
Summary by CodeRabbit
New Features
Improvements
Documentation
Tests