refactor: simplify inventory and attestation loops #184

Merged
jingxiang-z merged 7 commits into main from feat/simplify-loop-scheduling on May 1, 2026

@jingxiang-z (Collaborator) commented Apr 30, 2026

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Summary by CodeRabbit

  • Configuration Changes

    • Removed attestation initial/startup-interval; inventory and attestation intervals now enforce 5m minimums; deployment charts/values and defaults updated.
  • Behavioral Changes

    • StartupJitter replaces prior initial-interval/jitter behavior. Export failures now surface as failures and trigger retry logic with a 5m retry baseline.
  • API / Agent Reporting

    • Agent payloads include inventory/attestation enable flags and interval seconds.
  • New Features

    • Loads defaults from an env-file and passes effective loop config into enrollment workflows.
  • Documentation

    • Docs and Helm README/values updated to reflect env and timing changes.

Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>

coderabbitai Bot commented Apr 30, 2026

Walkthrough

Removes the attestation InitialInterval and the per-loop Timeouts. Replaces per-loop JitterEnabled/InitialInterval with an explicit StartupJitter and package-level MinInventoryInterval/MinAttestationInterval constants (5m). Updates the loop run logic (runAttempt, startup jitter, retry behavior), surfaces loop enablement and intervals in agent DTOs and docs, and drops use of the FLEETINT_ATTESTATION_INITIAL_INTERVAL environment variable.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Type Definitions: internal/attestation/types.go, internal/inventory/types.go, internal/backendclient/types.go | Removed InitialInterval and JitterEnabled; added StartupJitter; expanded AgentConfig with Inventory/Attestation enabled flags and interval-seconds fields; JSON tag omitempty removals for some fields. |
| Configuration Core & Defaults: internal/config/config.go, internal/config/default.go, internal/config/config_test.go | Removed per-loop Timeout and attestation InitialInterval; added MinInventoryInterval/MinAttestationInterval = 5m; refactored validation and added helpers exposing loop enablement/interval seconds; tests updated. |
| Managers (Inventory & Attestation): internal/inventory/manager.go, internal/inventory/manager_test.go, internal/inventory/manager_run_test.go, internal/attestation/manager.go, internal/attestation/manager_test.go | Centralized startup jitter sleep, replaced collectOnceForRun with runAttempt, removed capped jitter helpers, use RetryInterval on failures, and propagate ErrNotReady from sinks; tests adapted to new APIs and behavior. |
| Package Defaults: internal/inventory/defaults.go, internal/attestation/defaults.go | Added package-level constants: DefaultRetryInterval = 5m and DefaultStartupJitter = 1m. |
| Server & Startup: internal/server/server.go, internal/server/server_test.go, internal/enrollment/enrollment.go | Resolve and log startup_jitter/retry_interval/timeout; obtain loop enabled flags and interval seconds via new helpers; pass effective loop config into agent configs; stop relying on attestation initial interval/jitter from cfg. |
| Mapping & Backend: internal/inventory/mapper/backend.go, internal/inventory/mapper/backend_test.go | Map new inventory/attestation enablement and interval fields into backend AgentConfig; added helper to clone string slices; tests assert JSON includes explicit zero/default values. |
| Inventory Collection Behavior: internal/inventory/manager.go, internal/inventory/manager_test.go | CollectOnce now returns ErrNotReady when sink is not ready; run scheduling and non-overlap locking moved into runAttempt. |
| CLI / Env / Deployments: cmd/fleetint/run.go, cmd/fleetint/enroll.go, cmd/fleetint/enroll_test.go, deployments/helm/fleet-intelligence-agent/README.md, deployments/helm/fleet-intelligence-agent/values.yaml, deployments/packages/systemd/fleetint.env, docs/configuration.md | Stop reading/documenting FLEETINT_ATTESTATION_INITIAL_INTERVAL; enforce minimums (5m) for inventory/attestation intervals in env parsing and docs; enroll command now loads config, applies env-file defaults and passes cfg into enrollment; helm/systemd values/docs updated. |
| Env file loader: internal/config/env_file.go, internal/config/env_file_test.go | Added DefaultEnvFilePath and LoadEnvFileDefaults(path) to read export KEY=VALUE style env-files without overwriting existing env vars; parsing helpers and tests added. |
| Enrollment & Sync: internal/enrollment/enrollment.go, internal/enrollment/enrollment_test.go | Added EnrollWithConfig(ctx, baseEndpoint, sakToken, cfg *config.Config) and updated sync-after-enroll to accept cfg; tests added to verify config passing. |
| Tests & Examples: internal/inventory/source/source_test.go, various *_test.go files | Tests updated to construct/expect new agent config fields, use runAttempt where appropriate, and remove assertions around removed jitter-cap/timeout fields. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Server as Server
    participant Manager as Manager
    participant Collector as Collector
    participant Sink as Sink
    participant Backend as Backend

    Server->>Manager: start loop(Interval, RetryInterval, StartupJitter, Timeout)
    Manager->>Manager: sleep(random up to StartupJitter)
    loop periodic
        Manager->>Manager: runAttempt()
        Manager->>Collector: CollectOnce(with timeout)
        Collector-->>Manager: snapshot / error
        Manager->>Sink: Export(snapshot)
        Sink->>Backend: send Export
        Backend-->>Sink: ack / ErrNotReady / error
        alt Export succeeds
            Sink-->>Manager: success
            Manager->>Manager: schedule next = Interval
        else Export fails or ErrNotReady
            Sink-->>Manager: error
            Manager->>Manager: schedule next = RetryInterval (if >0)
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I twitched my nose at StartupJitter bright,
Five-minute bounds kept my circuits light,
Retries bounce steady when exports delay,
Jitter at dawn, then steady play,
The rabbit hops — loops hum all night.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 8.93% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'refactor: simplify inventory and attestation loops' accurately describes the primary change: refactoring the inventory and attestation loop scheduling logic to simplify complexity. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: efdf70ad60

Comment thread internal/server/server.go

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/attestation/manager_test.go (1)

267-289: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

This test doesn't actually prove the retry-interval path.

With a 25ms parent deadline, mgr.Run(ctx) returns context.DeadlineExceeded whether the loop sleeps 5ms or 1h, so the elapsed-time assertions can still pass if nextInterval regresses back to Interval. Count attempts on the nodeUUIDProvider closure and assert that the manager retried before the deadline.

Possible fix
 func TestManagerRunUsesRetryIntervalWhenNotEnrolled(t *testing.T) {
+	attempts := 0
 	mgr := NewManager(
-		func(context.Context) (string, error) { return "", ErrNotEnrolled },
+		func(context.Context) (string, error) {
+			attempts++
+			return "", ErrNotEnrolled
+		},
 		&testJWTProvider{jwt: "jwt-token"},
 		&testNonceProvider{nonce: "abc123"},
 		&testEvidenceCollector{resp: &SDKResponse{ResultCode: 200}},
@@
 	ctx, cancel := context.WithTimeout(context.Background(), 25*time.Millisecond)
 	defer cancel()
-	start := time.Now()
 	err := mgr.Run(ctx)
-	elapsed := time.Since(start)
 
 	require.ErrorIs(t, err, context.DeadlineExceeded)
-	require.GreaterOrEqual(t, elapsed, 15*time.Millisecond)
-	require.Less(t, elapsed, 100*time.Millisecond)
+	require.GreaterOrEqual(t, attempts, 2)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/attestation/manager_test.go` around lines 267 - 289, The test does
not prove the retry path: the UUID provider always returns ErrNotEnrolled, but
nothing asserts that it was called more than once. Change the first argument to
NewManager in TestManagerRunUsesRetryIntervalWhenNotEnrolled to a closure that
increments a counter (use an atomic counter or channel) each time it is called
and returns ("", ErrNotEnrolled), run mgr.Run(ctx) as before, then assert that
the counter shows multiple attempts (e.g., attempts >= 2) to prove the retry
interval path was exercised; keep the existing context timeout and other
providers the same.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8cf0d2dc-56a0-477e-9f77-8760affdd35f

📥 Commits

Reviewing files that changed from the base of the PR and between f1e5740 and efdf70a.

📒 Files selected for processing (17)
  • cmd/fleetint/run.go
  • deployments/helm/fleet-intelligence-agent/README.md
  • deployments/helm/fleet-intelligence-agent/values.yaml
  • deployments/packages/systemd/fleetint.env
  • docs/configuration.md
  • internal/attestation/manager.go
  • internal/attestation/manager_test.go
  • internal/attestation/types.go
  • internal/config/config.go
  • internal/config/config_test.go
  • internal/config/default.go
  • internal/inventory/manager.go
  • internal/inventory/manager_run_test.go
  • internal/inventory/manager_test.go
  • internal/inventory/types.go
  • internal/server/server.go
  • internal/server/server_test.go
💤 Files with no reviewable changes (2)
  • deployments/packages/systemd/fleetint.env
  • deployments/helm/fleet-intelligence-agent/values.yaml

Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/enrollment/enrollment.go (1)

156-177: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Use the effective config here, not config.Default().

These new Inventory*/Attestation* fields are populated from the default config loaded above, so the post-enroll sync now reports default loop settings rather than the user's actual settings. If inventory is disabled in the real config, the backend can be left with permanently incorrect agent-config metadata after enrollment. Please plumb the resolved config into this path, or omit these fields from the enroll-time sync until you have the real values.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/enrollment/enrollment.go` around lines 156 - 177, The enroll-time
agent config is using values from the default config instead of the
resolved/user config; update the call that builds the inventory AgentConfig in
inventorysource.NewMachineInfoSourceWithAgentConfig (the
InventoryEnabled/InventoryIntervalSeconds/AttestationEnabled/AttestationIntervalSeconds
fields currently set from variables derived from config.Default()) to use the
resolved config values (the already-loaded cfg or the resolved config variable)
returned earlier rather than config.Default(), or remove those
Inventory*/Attestation* fields from the payload until real values are available;
ensure you reference the existing functions
inventoryLoopAgentConfig/attestationLoopAgentConfig on the resolved cfg (or the
resolved cfg fields) when populating the AgentConfig.
🧹 Nitpick comments (1)
internal/server/server.go (1)

472-499: Consider scaling startup jitter with the loop interval.

A fixed one-minute loopStartupJitter keeps the code simple, but after a fleet-wide restart it can still bunch first-run inventory and especially attestation work into the same minute. Deriving jitter from each loop's interval, or making it configurable, would spread startup load more effectively.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/server.go` around lines 472 - 499, The startup jitter is a
constant (loopStartupJitter) causing potential thundering-herd on
startAttestationLoop; change it to be derived from the per-loop interval
(obtained via getAttestationInterval) or make it configurable and pass that
computed value into attestation.AttestationConfig.StartupJitter instead of the
fixed loopStartupJitter so each server uses a jitter scaled to its attestation
interval (update creation in startAttestationLoop and any related config parsing
to expose the new jitter option).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f4a21423-8889-483f-8f8c-bb3d77807219

📥 Commits

Reviewing files that changed from the base of the PR and between efdf70a and f9e6adc.

📒 Files selected for processing (9)
  • internal/backendclient/types.go
  • internal/config/config.go
  • internal/config/config_test.go
  • internal/enrollment/enrollment.go
  • internal/inventory/mapper/backend.go
  • internal/inventory/mapper/backend_test.go
  • internal/inventory/source/source_test.go
  • internal/inventory/types.go
  • internal/server/server.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/config/config.go

Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
@jingxiang-z jingxiang-z requested a review from mukilsh April 30, 2026 22:43

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/fleetint/enroll.go`:
- Around line 109-120: Load the env-file defaults before snapshotting the
config: call config.LoadEnvFileDefaults(fleetintEnvFilePath) prior to invoking
config.Default(ctx) so the returned cfg includes env-file-backed settings; then
continue to call configureLoopConfigFromEnv(cfg) to ensure the four loop fields
are set and proceed to performEnrollWorkflow(ctx, baseEndpoint, sakToken, cfg);
this preserves other env-backed values used later (e.g.,
cfg.InventoryAgentConfig).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 54380090-974d-4a05-849f-3ed512afe836

📥 Commits

Reviewing files that changed from the base of the PR and between 2365b4c and 671c821.

📒 Files selected for processing (12)
  • cmd/fleetint/enroll.go
  • cmd/fleetint/enroll_test.go
  • internal/attestation/defaults.go
  • internal/backendclient/types.go
  • internal/config/env_file.go
  • internal/config/env_file_test.go
  • internal/enrollment/enrollment.go
  • internal/enrollment/enrollment_test.go
  • internal/inventory/defaults.go
  • internal/inventory/mapper/backend.go
  • internal/inventory/mapper/backend_test.go
  • internal/server/server.go
✅ Files skipped from review due to trivial changes (2)
  • internal/inventory/defaults.go
  • internal/attestation/defaults.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • internal/inventory/mapper/backend.go
  • internal/backendclient/types.go
  • internal/server/server.go

Comment thread cmd/fleetint/enroll.go
@jingxiang-z jingxiang-z merged commit 18af139 into main May 1, 2026
9 checks passed
@jingxiang-z jingxiang-z deleted the feat/simplify-loop-scheduling branch May 1, 2026 19:48