refactor: simplify inventory and attestation loops#184
Conversation
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
📝 WalkthroughWalkthroughRemoves attestation InitialInterval and per-loop Timeouts, replaces per-loop JitterEnabled/InitialInterval with explicit StartupJitter and package-level MinInventoryInterval/MinAttestationInterval (5m), updates loop run logic (runAttempt, startup jitter, retry behavior), surfaces loop enablement/intervals in agent DTOs and docs, and drops FLEETINT_ATTESTATION_INITIAL_INTERVAL env usage. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Server as Server
participant Manager as Manager
participant Collector as Collector
participant Sink as Sink
participant Backend as Backend
Server->>Manager: start loop(Interval, RetryInterval, StartupJitter, Timeout)
Manager->>Manager: sleep(random up to StartupJitter)
loop periodic
Manager->>Manager: runAttempt()
Manager->>Collector: CollectOnce(with timeout)
Collector-->>Manager: snapshot / error
Manager->>Sink: Export(snapshot)
Sink->>Backend: send Export
Backend-->>Sink: ack / ErrNotReady / error
alt Export succeeds
Sink-->>Manager: success
Manager->>Manager: schedule next = Interval
else Export fails or ErrNotReady
Sink-->>Manager: error
Manager->>Manager: schedule next = RetryInterval (if >0)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Review rate limit: 9/10 reviews remaining, refill in 6 minutes. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: efdf70ad60
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/attestation/manager_test.go (1)
267-289:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winThis test doesn't actually prove the retry-interval path.
With a 25ms parent deadline,
mgr.Run(ctx)returnscontext.DeadlineExceededwhether the loop sleeps 5ms or 1h, so the elapsed-time assertions can still pass ifnextIntervalregresses back toInterval. Count attempts on thenodeUUIDProviderclosure and assert that the manager retried before the deadline.Possible fix
func TestManagerRunUsesRetryIntervalWhenNotEnrolled(t *testing.T) { + attempts := 0 mgr := NewManager( - func(context.Context) (string, error) { return "", ErrNotEnrolled }, + func(context.Context) (string, error) { + attempts++ + return "", ErrNotEnrolled + }, &testJWTProvider{jwt: "jwt-token"}, &testNonceProvider{nonce: "abc123"}, &testEvidenceCollector{resp: &SDKResponse{ResultCode: 200}}, @@ ctx, cancel := context.WithTimeout(context.Background(), 25*time.Millisecond) defer cancel() - start := time.Now() err := mgr.Run(ctx) - elapsed := time.Since(start) require.ErrorIs(t, err, context.DeadlineExceeded) - require.GreaterOrEqual(t, elapsed, 15*time.Millisecond) - require.Less(t, elapsed, 100*time.Millisecond) + require.GreaterOrEqual(t, attempts, 2) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/attestation/manager_test.go` around lines 267 - 289, The test fails to prove the retry-path because the UUID provider always returns ErrNotEnrolled without proving multiple attempts; change the first argument to NewManager in TestManagerRunUsesRetryIntervalWhenNotEnrolled to a closure that increments a counter (use an atomic counter or channel) each time it's called and returns ("", ErrNotEnrolled), run mgr.Run(ctx) as before, then assert the counter shows multiple attempts (e.g., attempts >= 2) to prove the retry interval path was exercised; keep existing context timeout and other providers the same.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@internal/attestation/manager_test.go`:
- Around line 267-289: The test fails to prove the retry-path because the UUID
provider always returns ErrNotEnrolled without proving multiple attempts; change
the first argument to NewManager in
TestManagerRunUsesRetryIntervalWhenNotEnrolled to a closure that increments a
counter (use an atomic counter or channel) each time it's called and returns
("", ErrNotEnrolled), run mgr.Run(ctx) as before, then assert the counter shows
multiple attempts (e.g., attempts >= 2) to prove the retry interval path was
exercised; keep existing context timeout and other providers the same.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8cf0d2dc-56a0-477e-9f77-8760affdd35f
📒 Files selected for processing (17)
cmd/fleetint/run.godeployments/helm/fleet-intelligence-agent/README.mddeployments/helm/fleet-intelligence-agent/values.yamldeployments/packages/systemd/fleetint.envdocs/configuration.mdinternal/attestation/manager.gointernal/attestation/manager_test.gointernal/attestation/types.gointernal/config/config.gointernal/config/config_test.gointernal/config/default.gointernal/inventory/manager.gointernal/inventory/manager_run_test.gointernal/inventory/manager_test.gointernal/inventory/types.gointernal/server/server.gointernal/server/server_test.go
💤 Files with no reviewable changes (2)
- deployments/packages/systemd/fleetint.env
- deployments/helm/fleet-intelligence-agent/values.yaml
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/enrollment/enrollment.go (1)
156-177:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftUse the effective config here, not
config.Default().These new
Inventory*/Attestation*fields are populated from the default config loaded above, so the post-enroll sync now reports default loop settings rather than the user's actual settings. If inventory is disabled in the real config, the backend can be left with permanently incorrect agent-config metadata after enrollment. Please plumb the resolved config into this path, or omit these fields from the enroll-time sync until you have the real values.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/enrollment/enrollment.go` around lines 156 - 177, The enroll-time agent config is using values from the default config instead of the resolved/user config; update the call that builds the inventory AgentConfig in inventorysource.NewMachineInfoSourceWithAgentConfig (the InventoryEnabled/InventoryIntervalSeconds/AttestationEnabled/AttestationIntervalSeconds fields currently set from variables derived from config.Default()) to use the resolved config values (the already-loaded cfg or the resolved config variable) returned earlier rather than config.Default(), or remove those Inventory*/Attestation* fields from the payload until real values are available; ensure you reference the existing functions inventoryLoopAgentConfig/attestationLoopAgentConfig on the resolved cfg (or the resolved cfg fields) when populating the AgentConfig.
🧹 Nitpick comments (1)
internal/server/server.go (1)
472-499: Consider scaling startup jitter with the loop interval.A fixed one-minute
loopStartupJitterkeeps the code simple, but after a fleet-wide restart it can still bunch first-run inventory and especially attestation work into the same minute. Deriving jitter from each loop's interval, or making it configurable, would spread startup load more effectively.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/server/server.go` around lines 472 - 499, The startup jitter is a constant (loopStartupJitter) causing potential thundering-herd on startAttestationLoop; change it to be derived from the per-loop interval (obtained via getAttestationInterval) or make it configurable and pass that computed value into attestation.AttestationConfig.StartupJitter instead of the fixed loopStartupJitter so each server uses a jitter scaled to its attestation interval (update creation in startAttestationLoop and any related config parsing to expose the new jitter option).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@internal/enrollment/enrollment.go`:
- Around line 156-177: The enroll-time agent config is using values from the
default config instead of the resolved/user config; update the call that builds
the inventory AgentConfig in inventorysource.NewMachineInfoSourceWithAgentConfig
(the
InventoryEnabled/InventoryIntervalSeconds/AttestationEnabled/AttestationIntervalSeconds
fields currently set from variables derived from config.Default()) to use the
resolved config values (the already-loaded cfg or the resolved config variable)
returned earlier rather than config.Default(), or remove those
Inventory*/Attestation* fields from the payload until real values are available;
ensure you reference the existing functions
inventoryLoopAgentConfig/attestationLoopAgentConfig on the resolved cfg (or the
resolved cfg fields) when populating the AgentConfig.
---
Nitpick comments:
In `@internal/server/server.go`:
- Around line 472-499: The startup jitter is a constant (loopStartupJitter)
causing potential thundering-herd on startAttestationLoop; change it to be
derived from the per-loop interval (obtained via getAttestationInterval) or make
it configurable and pass that computed value into
attestation.AttestationConfig.StartupJitter instead of the fixed
loopStartupJitter so each server uses a jitter scaled to its attestation
interval (update creation in startAttestationLoop and any related config parsing
to expose the new jitter option).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f4a21423-8889-483f-8f8c-bb3d77807219
📒 Files selected for processing (9)
internal/backendclient/types.gointernal/config/config.gointernal/config/config_test.gointernal/enrollment/enrollment.gointernal/inventory/mapper/backend.gointernal/inventory/mapper/backend_test.gointernal/inventory/source/source_test.gointernal/inventory/types.gointernal/server/server.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/config/config.go
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/fleetint/enroll.go`:
- Around line 109-120: Load the env-file defaults before snapshotting the
config: call config.LoadEnvFileDefaults(fleetintEnvFilePath) prior to invoking
config.Default(ctx) so the returned cfg includes env-file-backed settings; then
continue to call configureLoopConfigFromEnv(cfg) to ensure the four loop fields
are set and proceed to performEnrollWorkflow(ctx, baseEndpoint, sakToken, cfg);
this preserves other env-backed values used later (e.g.,
cfg.InventoryAgentConfig).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 54380090-974d-4a05-849f-3ed512afe836
📒 Files selected for processing (12)
cmd/fleetint/enroll.gocmd/fleetint/enroll_test.gointernal/attestation/defaults.gointernal/backendclient/types.gointernal/config/env_file.gointernal/config/env_file_test.gointernal/enrollment/enrollment.gointernal/enrollment/enrollment_test.gointernal/inventory/defaults.gointernal/inventory/mapper/backend.gointernal/inventory/mapper/backend_test.gointernal/server/server.go
✅ Files skipped from review due to trivial changes (2)
- internal/inventory/defaults.go
- internal/attestation/defaults.go
🚧 Files skipped from review as they are similar to previous changes (3)
- internal/inventory/mapper/backend.go
- internal/backendclient/types.go
- internal/server/server.go
Description
Checklist
Summary by CodeRabbit
Configuration Changes
Behavioral Changes
API / Agent Reporting
New Features
Documentation