Skip to content

feat: add backend client and inventory attestation call#176

Merged
jingxiang-z merged 31 commits into
mainfrom
feat/agent-backend-client-skeleton
Apr 28, 2026
Merged

feat: add backend client and inventory attestation call#176
jingxiang-z merged 31 commits into
mainfrom
feat/agent-backend-client-skeleton

Conversation

@jingxiang-z
Copy link
Copy Markdown
Collaborator

@jingxiang-z jingxiang-z commented Apr 15, 2026

Summary

Implements the agent-side client and runtime loops for the new backend-facing inventory and attestation architecture described in the design doc.

This PR:

  • Adds internal/backendclient — typed HTTP client for all five agent endpoints (/v1/agent/enroll, /v1/agent/nodes/{nodeUUID}, /v1/agent/nodes/{nodeUUID}/nonce, /v1/agent/nodes/{nodeUUID}/attestation, /v1/agent/token)
  • Adds internal/agentstate — centralized persistent state access (JWT, SAK, backend base URL, node ID) backed by the existing SQLite metadata database, with legacy endpoint fallback
  • Refactors internal/attestation — replaces the monolithic attestation.go with split manager, collector, backend, nonce, and types packages using backendclient and agentstate
  • Adds internal/inventory — source, sink, and manager for periodic node inventory upsert via PUT /v1/agent/nodes/{nodeUUID}
  • Wires enrollment to use backendclient.Enroll and performs a best-effort inventory sync immediately after enrollment
  • Wires both inventory and attestation loops into the server startup path (startInventoryLoop, startAttestationLoop)
  • Removes machine info fields from OTLP resource attributes — inventory fields now go exclusively through PUT /v1/agent/nodes/{nodeUUID}
  • Moves AttestationConfig out of HealthExporterConfig to top-level Config and adds InventoryConfig with Enabled/Interval/InitialInterval controls and env var overrides
  • Deletes the legacy internal/attestation/attestation.go monolith and its associated tests
  • Bumps Go toolchain to 1.26.2

Validation

go test -race ./internal/backendclient/... ./internal/agentstate/... ./internal/attestation/... ./internal/inventory/... ./internal/enrollment/... ./internal/config/... ./cmd/fleetint/...

Follow-up

  • Attach X-GPUHealth-Agent-Mode: direct-inventory-v1 header on OTLP metrics/logs requests to suppress legacy node extraction on the backend
  • Add serial, vendor, model fields to DiskInfo.BlockDevice per the inventory field classification in the design doc

Summary by CodeRabbit

  • New Features

    • Periodic local inventory collection with change-only backend sync and initial post-enroll sync.
    • Modular attestation flow with nonce/evidence providers, CLI evidence collector, and backend submitter.
    • Local agent state persistence for backend URL, JWT, SAK, and node ID.
  • Configuration

    • New loop controls for inventory and attestation (enable, intervals, initial-interval, timeouts); defaults provided.
  • Other

    • Go toolchain bumped to 1.26.2; UI/CSV label changed to "Agent Version"; backend endpoint validation allows localhost/loopback HTTP.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 15, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • ✅ Review completed - (🔄 Check again to review again)
📝 Walkthrough

Walkthrough

Adds agent-state persistence, a backend HTTP client, inventory collection and mapping, and a modular attestation subsystem; enrollment now uses the backend client and persists metadata to SQLite; server starts inventory/attestation loops; CI, deployment, and config defaults/validation updated.

Changes

Cohort / File(s) Summary
Agent State
internal/agentstate/state.go, internal/agentstate/sqlite.go, internal/agentstate/sqlite_test.go
New exported State interface and SQLite-backed implementation for persisted metadata keys (backend_base_url, sak_token, JWT, node ID) with context-aware getters (value, ok, err) and setters that persist and secure state.
Backend Client
internal/backendclient/client.go, internal/backendclient/types.go, internal/backendclient/errors.go, internal/backendclient/client_test.go, internal/backendclient/errors_test.go
New HTTP backend client and DTOs with constructors, JSON request helper, response-size limiting, redirect-rejection, HTTPStatusError type, error mapping for enroll, and comprehensive tests.
Inventory Subsystem
internal/inventory/types.go, internal/inventory/manager.go, internal/inventory/hash.go, internal/inventory/mapper/backend.go, internal/inventory/sink/backend.go, internal/inventory/*_test.go
New inventory model, Manager for periodic collection with jitter/retry, hash-based export suppression, mapper to backend upsert request, backend sink that reads agent state to create clients, and many tests for scheduling, hashing, mapping, and export behavior.
Attestation (Modularized)
internal/attestation/types.go, internal/attestation/manager.go, internal/attestation/collector.go, internal/attestation/nonce.go, internal/attestation/backend.go, internal/attestation/*_test.go
Replace monolith with modular interfaces (NonceProvider, EvidenceCollector, Submitter, Manager), CLI evidence collector, backend adapters that resolve clients from agent state, manager orchestration and tests; old monolithic implementation and its tests removed.
Enrollment & CLI
internal/enrollment/enrollment.go, internal/enrollment/enrollment_test.go, cmd/fleetint/enroll.go, cmd/fleetint/enroll_test.go
Enrollment workflow now validates/normalizes base endpoint, uses backend client Enroll, persists SAK/JWT/backend_base_url to SQLite (secureing state file) and triggers best-effort inventory sync; CLI wiring updated to call new workflow and tests adjusted.
Server & Background Loops
internal/server/server.go, internal/server/server_test.go, cmd/fleetint/run.go
Server now starts inventory and attestation managers using state-backed providers and CLI evidence collector; new env/config loop handling added and loop-interval helpers introduced.
Exporter & Collector
internal/exporter/collector/collector.go, internal/exporter/collector_test.go, internal/exporter/exporter.go, internal/exporter/*
Removed attestation collection/config entries from health collector; HealthData now includes GPU UUID→index mapping; JWT refresh uses backend client with backend_base_url; tests and converters updated to reflect removed attestation and machine-info changes.
Config API & Defaults
internal/config/*
Moved attestation/inventory loop configs to top-level Config, added Inventory config, updated defaults (inventory 1h, attestation initial 5m/24h), and strengthened validation (interval and initial_interval constraints).
Endpoint Validation
internal/endpoint/endpoint.go, internal/endpoint/endpoint_test.go
ValidateBackendEndpoint permits http only for loopback hosts; added DeriveBackendBaseURL to extract scheme+host from legacy endpoints and used as fallback.
CLI Status / Unenroll
cmd/fleetint/status.go, cmd/fleetint/unenroll.go, cmd/fleetint/unenroll_test.go
Status now derives metrics/logs endpoints from backend_base_url with legacy fallback; unenroll deletes backend metadata using agentstate constants; tests updated.
Machine Info Rename
internal/machineinfo/machineinfo.go, internal/machineinfo/machineinfo_test.go
Renamed FleetintVersionAgentVersion (field, JSON key, display label); tests adapted.
Build & Deployment
.github/workflows/ci.yml, .github/workflows/release.yml, .golangci.yml, deployments/*, docs/*, SECURITY.md
Bumped Go toolchain to 1.26.2 across CI/lint; replaced attestation jitter env with inventory/attestation scheduling envs in deployment values and docs; SECURITY.md attestation paths updated.

Sequence Diagram(s)

sequenceDiagram
    participant Manager as Attestation Manager
    participant State as agentstate.State
    participant Backend as backendclient.Client
    participant Collector as EvidenceCollector

    Manager->>State: GetNodeID(ctx)
    State-->>Manager: nodeID
    Manager->>State: GetJWT(ctx)
    State-->>Manager: jwt
    Manager->>Backend: GetNonce(ctx, nodeID, jwt)
    Backend-->>Manager: nonce, refreshTS, refreshedJWT
    alt refreshedJWT present
        Manager->>State: SetJWT(ctx, refreshedJWT)
        State-->>Manager: ok
    end
    Manager->>Collector: Collect(ctx, nonce)
    Collector-->>Manager: SDKResponse
    Manager->>Backend: SubmitAttestation(ctx, nodeID, mappedRequest, jwt)
    Backend-->>Manager: success / error
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰 I hopped through state and backend lines,

I hashed the snapshots, saved the signs,
I chased a nonce through HTTP,
Collected evidence from CLI,
New loops now hum beneath the pines. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.75% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'feat: add backend client and inventory attestation call' accurately describes the main changes: it adds internal/backendclient for HTTP communication and integrates inventory/attestation workflows. The title captures the primary feature additions.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/agent-backend-client-skeleton

Comment @coderabbitai help to get the list of available commands and usage tips.

@jingxiang-z jingxiang-z changed the title chore: add backend client and inventory skeletons feat: add backend client and inventory skeletons Apr 15, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (3)
internal/agentstate/state.go (1)

21-34: State interface should follow the existing interface segregation pattern or clarify its purpose relative to existing StateStore interfaces.

The State interface is currently unused in the codebase. However, the codebase already demonstrates interface segregation: JWTProvider (GetJWT/SetJWT only) is used by the attestation manager, and package-specific StateStore interfaces exist in both internal/attestationloop/types.go and internal/inventory/types.go. If State is intended as a unified interface, consider splitting it into narrower, focused interfaces (e.g., JWTStore, SAKStore, BackendURLStore, NodeIDStore) to reduce coupling, or clarify how it relates to the existing StateStore pattern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agentstate/state.go` around lines 21 - 34, The State interface
currently groups unrelated getters/setters (GetBackendBaseURL/SetBackendBaseURL,
GetJWT/SetJWT, GetSAK/SetSAK, GetNodeID/SetNodeID) and conflicts with the
existing interface segregation pattern (see JWTProvider and package StateStore
types in internal/attestationloop/types.go and internal/inventory/types.go); fix
by splitting State into focused interfaces (e.g., JWTStore with GetJWT/SetJWT,
SAKStore with GetSAK/SetSAK, BackendURLStore with
GetBackendBaseURL/SetBackendBaseURL, NodeIDStore with GetNodeID/SetNodeID) and
update any consumers to depend on the narrow interface(s) they actually need, or
alternatively add clear package-level comments explaining State is a unified
facade and ensure it is actually used instead of the existing
JWTProvider/StateStore types.
internal/store/memory.go (1)

49-55: Note: Large struct parameters are dictated by interface contracts.

Static analysis flags snap (496 bytes) and result (136 bytes) as heavy parameters. These signatures are constrained by the inventory.StateStore and attestationloop.StateStore interfaces which define value semantics.

If copy overhead becomes a concern, consider updating the interfaces to use pointer parameters (*inventory.Snapshot, *attestationloop.Result). For this skeleton, the current approach is acceptable.

Also applies to: 77-83

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/store/memory.go` around lines 49 - 55, The review notes that
large-value parameters (inventory.Snapshot in MemoryStore.PutInventory and
attestationloop.Result elsewhere) are expensive to copy but are required by the
current interfaces; to address this either (A) leave the implementation as-is
for this skeleton, or (B) change the method signatures on the interfaces
inventory.StateStore and attestationloop.StateStore to accept pointer types
(*inventory.Snapshot, *attestationloop.Result) and update all implementations
(e.g., MemoryStore.PutInventory) and callers to use pointers to avoid copies;
locate uses by searching for MemoryStore.PutInventory, inventory.Snapshot,
attestationloop.Result, and the StateStore interfaces and consistently switch
signatures and call sites if you opt for the pointer change.
internal/backendclient/client.go (1)

90-98: Consider: Large request structs passed by value.

Static analysis flags NodeUpsertRequest (456 bytes) and AttestationRequest (96 bytes) as heavy parameters. Since these are immediately marshaled to JSON in doJSON, the copy overhead is negligible compared to serialization and network I/O.

If you prefer pointer semantics for consistency with Go conventions on large structs, update the interface signatures. Otherwise, value semantics are acceptable here.

Also applies to: 118-126

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backendclient/client.go` around lines 90 - 98, The NodeUpsertRequest
(456 bytes) and AttestationRequest (96 bytes) are large structs—switch the
public method signatures to pointer semantics for consistency: change func (c
*client) UpsertNode(ctx context.Context, nodeID string, req NodeUpsertRequest,
jwt string) error to use *NodeUpsertRequest, and the analogous
Attest/Attestation method (the one referenced at lines 118-126) to use
*AttestationRequest; update all internal callers to pass pointers, add a nil
check for the pointer params (return an error if nil), and pass the pointer
directly into c.doJSON (no other marshaling changes needed). Ensure any
interface or tests that call these methods are updated to use &struct literals
or existing pointers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/agentstate/state.go`:
- Around line 22-34: The getters on the State interface (GetBackendBaseURL,
GetJWT, GetSAK, GetNodeID) currently return (string, error) which makes "unset"
indistinguishable from an empty persisted value; change each getter signature to
return an explicit presence flag, e.g. (string, bool, error) (or equivalent like
(*string, error) if preferred), update the State interface method signatures
accordingly, and update all implementations of State to return the boolean
presence flag (true when a value exists, false when not persisted) so callers
can reliably differentiate unset vs empty string without relying on errors.

In `@internal/attestationloop/sink/backend.go`:
- Around line 40-45: In Export (method backendSink.Export) add nil-check guards
for the JWT and client dependencies: after calling s.jwt(ctx) validate the
returned jwt is not nil and return a clear error if it is; likewise ensure
s.client is not nil before calling s.client.SubmitAttestation and return an
explicit error if missing; keep the existing call to
s.client.SubmitAttestation(result.NodeID, mapper.ToAttestationRequest(result),
jwt) only after both checks succeed to avoid panics.

In `@internal/backendclient/client_test.go`:
- Around line 42-44: The test assigns c via New(server.URL) and then immediately
overwrites it with NewWithHTTPClient, making the first call dead code; either
remove the first assignment to only test NewWithHTTPClient, or split into two
assertions/tests so both constructors are exercised: keep the New(server.URL)
call in a separate subtest or add checks after each constructor (using variable
names like c1/c2 or separate t.Run blocks) to validate New and NewWithHTTPClient
(with mustParseURL and server.Client()) independently.

In `@internal/backendclient/client.go`:
- Around line 33-36: The file containing the constants userAgent and
maxResponseBodyBytes has gofmt formatting issues; run gofmt -w on that source
file to apply standard Go formatting, then re-stage the file and ensure the
constants block (and surrounding code) are correctly formatted before pushing
the change.

In `@internal/backendclient/errors.go`:
- Around line 32-36: The HTTPStatusError.Error method currently returns the
entire e.Body which can leak sensitive data and create huge logs; modify
HTTPStatusError.Error to sanitize and cap the body fragment before including it
(e.g., trim whitespace, replace newlines, limit to a safe length like 200
characters and append "…(truncated)" if longer) and fall back to a status-only
message when the sanitized fragment is empty; update references to the
HTTPStatusError struct/fields (Error(), Body, StatusCode) so the error string
never exposes unbounded raw response content.

In `@internal/inventory/mapper/backend.go`:
- Around line 42-52: The NodeResources construction (backendclient.NodeResources
in the inventory mapper) currently only maps CPUInfo and MemoryInfo and drops
GPUInfo, DiskInfo, and NICInfo from s.Resources; update the mapping in the
function that builds backendclient.NodeResources to include GPUInfo, DiskInfo,
and NICInfo by mapping s.Resources.GPUInfo, s.Resources.DiskInfo and
s.Resources.NICInfo into the corresponding backendclient types (preserving any
slice elements or nested fields), ensuring any necessary conversions for element
types or fields are implemented alongside CPUInfo and MemoryInfo so no resource
inventory is omitted.

In `@internal/inventory/sink/backend.go`:
- Around line 40-45: The Export method may panic when dependencies are nil:
check the result of s.jwt(ctx) for a nil jwt (in addition to err) and return a
descriptive error instead of proceeding, and also guard s.client (and/or
s.client.UpsertNode) before calling UpsertNode; if s.client is nil return a
clear error (e.g., using fmt.Errorf) so Export fails gracefully rather than
panicking. Ensure these checks are added inside backendSink.Export around the
calls to s.jwt and s.client.UpsertNode and keep error messages referencing
Export, jwt, and client for clarity.

In `@internal/store/memory.go`:
- Around line 32-36: The file internal/store/memory.go has gofmt/formatting
issues around the struct fields (inventory, hasInventory, lastInventoryHash,
lastInventorySyncTS); run the formatter to fix whitespace and alignment by
executing `gofmt -w internal/store/memory.go` (or apply equivalent editor Go
formatting) so the struct and surrounding code conform to gofmt standards.

---

Nitpick comments:
In `@internal/agentstate/state.go`:
- Around line 21-34: The State interface currently groups unrelated
getters/setters (GetBackendBaseURL/SetBackendBaseURL, GetJWT/SetJWT,
GetSAK/SetSAK, GetNodeID/SetNodeID) and conflicts with the existing interface
segregation pattern (see JWTProvider and package StateStore types in
internal/attestationloop/types.go and internal/inventory/types.go); fix by
splitting State into focused interfaces (e.g., JWTStore with GetJWT/SetJWT,
SAKStore with GetSAK/SetSAK, BackendURLStore with
GetBackendBaseURL/SetBackendBaseURL, NodeIDStore with GetNodeID/SetNodeID) and
update any consumers to depend on the narrow interface(s) they actually need, or
alternatively add clear package-level comments explaining State is a unified
facade and ensure it is actually used instead of the existing
JWTProvider/StateStore types.

In `@internal/backendclient/client.go`:
- Around line 90-98: The NodeUpsertRequest (456 bytes) and AttestationRequest
(96 bytes) are large structs—switch the public method signatures to pointer
semantics for consistency: change func (c *client) UpsertNode(ctx
context.Context, nodeID string, req NodeUpsertRequest, jwt string) error to use
*NodeUpsertRequest, and the analogous Attest/Attestation method (the one
referenced at lines 118-126) to use *AttestationRequest; update all internal
callers to pass pointers, add a nil check for the pointer params (return an
error if nil), and pass the pointer directly into c.doJSON (no other marshaling
changes needed). Ensure any interface or tests that call these methods are
updated to use &struct literals or existing pointers.

In `@internal/store/memory.go`:
- Around line 49-55: The review notes that large-value parameters
(inventory.Snapshot in MemoryStore.PutInventory and attestationloop.Result
elsewhere) are expensive to copy but are required by the current interfaces; to
address this either (A) leave the implementation as-is for this skeleton, or (B)
change the method signatures on the interfaces inventory.StateStore and
attestationloop.StateStore to accept pointer types (*inventory.Snapshot,
*attestationloop.Result) and update all implementations (e.g.,
MemoryStore.PutInventory) and callers to use pointers to avoid copies; locate
uses by searching for MemoryStore.PutInventory, inventory.Snapshot,
attestationloop.Result, and the StateStore interfaces and consistently switch
signatures and call sites if you opt for the pointer change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: c225851a-de17-4aed-b5a7-ab146dd9e0d7

📥 Commits

Reviewing files that changed from the base of the PR and between 07f3216 and c7fdf10.

📒 Files selected for processing (16)
  • internal/agentstate/state.go
  • internal/attestationloop/manager.go
  • internal/attestationloop/mapper/backend.go
  • internal/attestationloop/sink/backend.go
  • internal/attestationloop/source/source.go
  • internal/attestationloop/types.go
  • internal/backendclient/client.go
  • internal/backendclient/client_test.go
  • internal/backendclient/errors.go
  • internal/backendclient/types.go
  • internal/inventory/manager.go
  • internal/inventory/mapper/backend.go
  • internal/inventory/sink/backend.go
  • internal/inventory/source/source.go
  • internal/inventory/types.go
  • internal/store/memory.go

Comment thread internal/agentstate/state.go
Comment thread internal/attestationloop/sink/backend.go Outdated
Comment thread internal/backendclient/client_test.go Outdated
Comment thread internal/backendclient/client.go
Comment thread internal/backendclient/errors.go
Comment thread internal/inventory/mapper/backend.go
Comment thread internal/inventory/sink/backend.go Outdated
Comment thread internal/store/memory.go Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (11)
cmd/fleetint/unenroll.go (1)

67-78: Consider generating placeholders dynamically to avoid count mismatch.

The addition of "backend_base_url" is correct and aligns with the enrollment changes. However, manually maintaining the placeholder count in the query string is error-prone.

♻️ Proposed refactor to generate placeholders dynamically
 	keysToDelete := []string{
 		pkgmetadata.MetadataKeyToken,
 		"sak_token",
 		"backend_base_url",
 		"enroll_endpoint",
 		"metrics_endpoint",
 		"logs_endpoint",
 		"nonce_endpoint",
 	}

 	// Build batch delete query
-	query := "DELETE FROM gpud_metadata WHERE key IN (?, ?, ?, ?, ?, ?, ?)"
+	placeholders := make([]string, len(keysToDelete))
+	for i := range placeholders {
+		placeholders[i] = "?"
+	}
+	query := "DELETE FROM gpud_metadata WHERE key IN (" + strings.Join(placeholders, ", ") + ")"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/fleetint/unenroll.go` around lines 67 - 78, The hard-coded query
placeholder count can go out of sync with keysToDelete; instead, generate the
IN-list dynamically from len(keysToDelete) (e.g., build a slice of "?" repeated
len(keysToDelete) and strings.Join it with commas) and then set query =
fmt.Sprintf("DELETE FROM gpud_metadata WHERE key IN (%s)", placeholdersJoined);
update the code around keysToDelete and query to use this generated placeholder
string and import strings/fmt as needed.
internal/agentstate/sqlite.go (1)

56-62: Consider defining a constant for "sak_token".

The string literal "sak_token" is used here and in cmd/fleetint/enroll.go and cmd/fleetint/unenroll.go. A shared constant would prevent typo-related bugs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agentstate/sqlite.go` around lines 56 - 62, Introduce a shared
constant for the metadata key currently hardcoded as "sak_token" and replace the
literal in sqliteState.GetSAK and sqliteState.SetSAK with that constant; update
the other usages in cmd/fleetint/enroll.go and cmd/fleetint/unenroll.go to
import or reference the same constant so all places (including the getMetadata
and setMetadata calls in GetSAK and SetSAK) use the single source of truth
(e.g., define SAKMetadataKey or similar and use it in GetSAK, SetSAK, enroll,
and unenroll).
internal/inventory/manager_run_test.go (1)

57-75: Potential flakiness with time.Sleep for synchronization.

The test relies on time.Sleep(25 * time.Millisecond) to ensure at least one export occurs before cancellation. While this should work given the 10ms interval, sleep-based synchronization can be flaky under CI load. Consider using a channel signal from the sink when an export occurs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/inventory/manager_run_test.go` around lines 57 - 75, The test
TestManagerRunStopsOnContextCancel uses time.Sleep for synchronization which can
be flaky; modify the fakeSink used by the test to include a notification channel
(e.g., add a field like exportedCh chan struct{} to the fakeSink and close/send
on it inside the sink's Export method), then in
TestManagerRunStopsOnContextCancel wait on that channel with a timeout instead
of sleeping (select on exportedCh and a time.After), then cancel the context and
assert NewManager(...).Run(ctx) returns context.Canceled and that sink.exported
is not empty; update fakeSink and the test to use exportedCh to
deterministically detect the first export.
cmd/fleetint/enroll.go (1)

169-171: Use a shared constant for "backend_base_url" to avoid duplication across files.

The hardcoded string "backend_base_url" appears in multiple locations (enroll.go:169 and unenroll.go:70) but a matching constant metadataKeyBackendBaseURL already exists in internal/agentstate/sqlite.go:29. Either export this constant from agentstate and use it consistently, or define a shared constant in a more appropriate location.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/fleetint/enroll.go` around lines 169 - 171, Replace the hardcoded
"backend_base_url" string used in pkgmetadata.SetMetadata calls with a single
shared constant; either export the existing metadataKeyBackendBaseURL from
internal/agentstate (rename to MetadataKeyBackendBaseURL) and import that
constant into cmd/fleetint/enroll.go and cmd/fleetint/unenroll.go, or define a
shared constant in a common package (e.g., pkgmetadata) and update both
enroll.go and unenroll.go to use that constant instead of the literal; update
imports and any references to the old unexported name accordingly.
cmd/fleetint/enroll_test.go (1)

79-90: Assert performInventorySync call behavior explicitly.

These tests stub performInventorySync but don’t verify whether it was invoked. Adding explicit assertions will catch regressions in the post-enroll flow.

✅ Suggested test hardening
 func TestEnrollCommandBlocksOnFailedPrecheck(t *testing.T) {
@@
 	enrollmentCalled := false
+	inventorySyncCalled := false
@@
-	performInventorySync = func(context.Context) error { return nil }
+	performInventorySync = func(context.Context) error {
+		inventorySyncCalled = true
+		return nil
+	}
@@
 	assert.Contains(t, out.String(), "Enrollment skipped: precheck failed")
 	assert.False(t, enrollmentCalled)
+	assert.False(t, inventorySyncCalled)
 }

 func TestEnrollCommandForceBypassesFailedPrecheck(t *testing.T) {
@@
 	enrollmentCalled := false
+	inventorySyncCalled := false
@@
-	performInventorySync = func(context.Context) error { return nil }
+	performInventorySync = func(context.Context) error {
+		inventorySyncCalled = true
+		return nil
+	}
@@
 	require.NoError(t, err)
 	assert.True(t, enrollmentCalled)
+	assert.True(t, inventorySyncCalled)
 }

Also applies to: 120-129

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/fleetint/enroll_test.go` around lines 79 - 90, The test stubs
performInventorySync but never asserts whether it was called; update the enroll
tests (the block around the existing performInventorySync stub and the similar
block at lines 120-129) to replace the noop stub with a spy that sets a boolean
(e.g., inventorySyncCalled) or increments a counter when invoked, and add
assertions (require.True/False or assert.Equal) after app.Run to explicitly
verify that performInventorySync was (or was not) called as expected; reference
the performInventorySync symbol and the existing enrollmentCalled boolean to
implement the spy and assertions.
internal/inventory/source/source_test.go (1)

37-114: Add explicit error-path tests for source guards.

Happy-path mapping looks good, but this file should also cover: nil collector, collector error, and nil machine-info returns to lock in Collect guard behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/inventory/source/source_test.go` around lines 37 - 114, Add tests
exercising Collect's error/guard paths: create three new subtests that call
NewMachineInfoSource with (1) a nil collector and assert Collect returns an
error, (2) a fakeMachineInfoCollector whose Collect returns an error and assert
that error is propagated, and (3) a fakeMachineInfoCollector that returns (nil,
nil) for MachineInfo and assert Collect returns an error or empty snapshot per
the current guard behavior; reference NewMachineInfoSource,
fakeMachineInfoCollector and its Collect method, and the source.Collect call to
locate where to add these assertions.
internal/inventory/sink/backend_test.go (1)

120-135: Verify base URL is passed from state into clientFactory.

TestBackendSinkExportUsesState checks JWT usage but currently does not assert the baseURL argument used to create the backend client.

🔍 Suggested assertion addition
 func TestBackendSinkExportUsesState(t *testing.T) {
 	client := &fakeClient{}
+	var gotBaseURL string
 	s := &backendSink{
 		state: fakeState{
 			baseURL: "https://example.com",
 			jwt:     "jwt-token",
 		},
-		clientFactory: func(string) (backendclient.Client, error) {
+		clientFactory: func(rawBaseURL string) (backendclient.Client, error) {
+			gotBaseURL = rawBaseURL
 			return client, nil
 		},
 	}
@@
 	require.NoError(t, err)
+	require.Equal(t, "https://example.com", gotBaseURL)
 	require.Equal(t, "node-1", client.nodeID)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/inventory/sink/backend_test.go` around lines 120 - 135, Update the
test TestBackendSinkExportUsesState to verify the baseURL passed into the
clientFactory matches the sink state’s backend endpoint: modify the
clientFactory closure used in the test (the func(string) (backendclient.Client,
error) assigned to clientFactory) to capture its string argument into a local
variable (e.g., gotBaseURL) and after calling s.Export with the
inventory.Snapshot, assert that gotBaseURL equals the expected backend endpoint
from the sink state; keep the existing assertions for client.nodeID, client.jwt
and client.req.
internal/backendclient/client.go (1)

213-239: Consider wrapping errors to preserve diagnostic context.

The mapped errors replace the original HTTPStatusError, discarding potentially useful diagnostic information (e.g., response body). For debugging production issues, consider wrapping instead:

return fmt.Errorf("the token used in the enrollment is not in the correct format: %w", statusErr)

This preserves the ability to inspect the original status code and body via errors.As.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backendclient/client.go` around lines 213 - 239, The mapped errors
in mapEnrollError replace the original HTTPStatusError (statusErr) and lose
diagnostic context; update each returned error to wrap the original statusErr
using the Go %w wrapping (e.g., return fmt.Errorf("<message>: %w", statusErr))
so callers can still inspect the underlying HTTPStatusError (status code, body)
via errors.As; apply this change to every case in mapEnrollError while leaving
the default branch returning err unchanged.
internal/attestationloop/manager.go (3)

115-122: Clarify semantics of lastResult on submission failure.

lastResult is updated before submission (line 117), so it reflects the last attempted collection even if submission fails (line 119). If consumers expect lastResult to represent successfully submitted results only, consider updating it after successful submission instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/attestationloop/manager.go` around lines 115 - 122, The code sets
m.lastResult before calling m.submitter.Submit so lastResult reflects attempted
collections even when Submit fails; move the clone-and-assign of m.lastResult to
after successful submission (i.e., only assign m.lastResult = &cloned after
m.submitter.Submit(ctx, result, jwt) returns nil), keeping the same clone
behavior and mutex (use m.mu.Lock()/Unlock() around the assignment), and leave
Submit called with the original result; this ensures lastResult represents only
successfully submitted results.

51-68: Consider validating required dependencies in constructor.

Dependencies are validated in CollectOnce (line 78), but deferring validation until runtime delays detection of misconfiguration. For production, consider validating non-nil dependencies in NewManager to fail fast during initialization.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/attestationloop/manager.go` around lines 51 - 68, NewManager should
fail fast by validating required dependencies instead of deferring to
CollectOnce; update the constructor (NewManager) to check that nodeIDProvider,
jwtProvider, nonceProvider, collector, submitter and interval are non-nil/valid
and return an error (or panic consistently with project conventions) when any
required dependency is missing, and update callers to handle the changed
signature if you switch to returning (Manager, error); this ensures
misconfiguration is detected at initialization rather than in CollectOnce.

115-118: Shallow copy shares underlying slice data in SDKResponse.Evidences.

cloned := *result creates a shallow copy of the Result struct. Since SDKResponse contains Evidences []EvidenceItem (a slice, which is a reference type), both the cloned result and the original share the same underlying array. If the original result is modified after this call, m.lastResult will reflect those changes, potentially causing data corruption in concurrent scenarios.

Either implement a deep copy, or document that result must not be modified after this call.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/attestationloop/manager.go` around lines 115 - 118, The shallow copy
using "cloned := *result" leaves SDKResponse.Evidences shared between the
original and m.lastResult; change to a deep copy: allocate a new SDKResponse (or
Result) instance, copy primitive fields, and create a new Evidences slice then
copy or deep-clone each EvidenceItem into it before assigning to m.lastResult
(ensure you deep-copy any reference fields inside EvidenceItem if present);
update the assignment that currently uses "cloned := *result" and "m.lastResult
= &cloned" to store the fully independent copy instead (alternatively document
immutability of result if you prefer not to copy).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/agentstate/sqlite.go`:
- Around line 102-107: The code silently ignores an error from s.stateFileFn(),
which can mask path resolution failures even if openReadWrite() succeeded;
update the block around stateFile, err := s.stateFileFn() to handle err != nil
by returning or propagating it (e.g., return fmt.Errorf("stateFileFn: %w", err))
instead of swallowing it, and then call
config.SecureStateFilePermissions(stateFile) only when err == nil; reference the
s.stateFileFn function and config.SecureStateFilePermissions to locate and fix
the logic.

In `@internal/backendclient/client_test.go`:
- Around line 34-40: The test HTTP handler passed to httptest.NewTLSServer in
client_test.go uses require.* inside the handler goroutine which can call
t.FailNow() and is unsafe; change these checks to either capture request values
(method, path, Authorization, User-Agent, etc.) into local variables and perform
require/assert checks in the main test goroutine after the client request
returns, or replace require.* with assert.* inside the handler (e.g., replace
require.Equal calls for http.MethodPost, "/v1/agent/enroll", headers with
assert.Equal). Update all similar handlers noted (the handler in the
NewTLSServer call and the other handler blocks referenced around lines 62-69,
83-86, 104-107, 120-127) so assertions run safely from the main test goroutine
or use assert.*.

---

Nitpick comments:
In `@cmd/fleetint/enroll_test.go`:
- Around line 79-90: The test stubs performInventorySync but never asserts
whether it was called; update the enroll tests (the block around the existing
performInventorySync stub and the similar block at lines 120-129) to replace the
noop stub with a spy that sets a boolean (e.g., inventorySyncCalled) or
increments a counter when invoked, and add assertions (require.True/False or
assert.Equal) after app.Run to explicitly verify that performInventorySync was
(or was not) called as expected; reference the performInventorySync symbol and
the existing enrollmentCalled boolean to implement the spy and assertions.

In `@cmd/fleetint/enroll.go`:
- Around line 169-171: Replace the hardcoded "backend_base_url" string used in
pkgmetadata.SetMetadata calls with a single shared constant; either export the
existing metadataKeyBackendBaseURL from internal/agentstate (rename to
MetadataKeyBackendBaseURL) and import that constant into cmd/fleetint/enroll.go
and cmd/fleetint/unenroll.go, or define a shared constant in a common package
(e.g., pkgmetadata) and update both enroll.go and unenroll.go to use that
constant instead of the literal; update imports and any references to the old
unexported name accordingly.

In `@cmd/fleetint/unenroll.go`:
- Around line 67-78: The hard-coded query placeholder count can go out of sync
with keysToDelete; instead, generate the IN-list dynamically from
len(keysToDelete) (e.g., build a slice of "?" repeated len(keysToDelete) and
strings.Join it with commas) and then set query = fmt.Sprintf("DELETE FROM
gpud_metadata WHERE key IN (%s)", placeholdersJoined); update the code around
keysToDelete and query to use this generated placeholder string and import
strings/fmt as needed.

In `@internal/agentstate/sqlite.go`:
- Around line 56-62: Introduce a shared constant for the metadata key currently
hardcoded as "sak_token" and replace the literal in sqliteState.GetSAK and
sqliteState.SetSAK with that constant; update the other usages in
cmd/fleetint/enroll.go and cmd/fleetint/unenroll.go to import or reference the
same constant so all places (including the getMetadata and setMetadata calls in
GetSAK and SetSAK) use the single source of truth (e.g., define SAKMetadataKey
or similar and use it in GetSAK, SetSAK, enroll, and unenroll).

In `@internal/attestationloop/manager.go`:
- Around line 115-122: The code sets m.lastResult before calling
m.submitter.Submit so lastResult reflects attempted collections even when Submit
fails; move the clone-and-assign of m.lastResult to after successful submission
(i.e., only assign m.lastResult = &cloned after m.submitter.Submit(ctx, result,
jwt) returns nil), keeping the same clone behavior and mutex (use
m.mu.Lock()/Unlock() around the assignment), and leave Submit called with the
original result; this ensures lastResult represents only successfully submitted
results.
- Around line 51-68: NewManager should fail fast by validating required
dependencies instead of deferring to CollectOnce; update the constructor
(NewManager) to check that nodeIDProvider, jwtProvider, nonceProvider,
collector, submitter and interval are non-nil/valid and return an error (or
panic consistently with project conventions) when any required dependency is
missing, and update callers to handle the changed signature if you switch to
returning (Manager, error); this ensures misconfiguration is detected at
initialization rather than in CollectOnce.
- Around line 115-118: The shallow copy using "cloned := *result" leaves
SDKResponse.Evidences shared between the original and m.lastResult; change to a
deep copy: allocate a new SDKResponse (or Result) instance, copy primitive
fields, and create a new Evidences slice then copy or deep-clone each
EvidenceItem into it before assigning to m.lastResult (ensure you deep-copy any
reference fields inside EvidenceItem if present); update the assignment that
currently uses "cloned := *result" and "m.lastResult = &cloned" to store the
fully independent copy instead (alternatively document immutability of result if
you prefer not to copy).

In `@internal/backendclient/client.go`:
- Around line 213-239: The mapped errors in mapEnrollError replace the original
HTTPStatusError (statusErr) and lose diagnostic context; update each returned
error to wrap the original statusErr using the Go %w wrapping (e.g., return
fmt.Errorf("<message>: %w", statusErr)) so callers can still inspect the
underlying HTTPStatusError (status code, body) via errors.As; apply this change
to every case in mapEnrollError while leaving the default branch returning err
unchanged.

In `@internal/inventory/manager_run_test.go`:
- Around line 57-75: The test TestManagerRunStopsOnContextCancel uses time.Sleep
for synchronization which can be flaky; modify the fakeSink used by the test to
include a notification channel (e.g., add a field like exportedCh chan struct{}
to the fakeSink and close/send on it inside the sink's Export method), then in
TestManagerRunStopsOnContextCancel wait on that channel with a timeout instead
of sleeping (select on exportedCh and a time.After), then cancel the context and
assert NewManager(...).Run(ctx) returns context.Canceled and that sink.exported
is not empty; update fakeSink and the test to use exportedCh to
deterministically detect the first export.

In `@internal/inventory/sink/backend_test.go`:
- Around line 120-135: Update the test TestBackendSinkExportUsesState to verify
the baseURL passed into the clientFactory matches the sink state’s backend
endpoint: modify the clientFactory closure used in the test (the func(string)
(backendclient.Client, error) assigned to clientFactory) to capture its string
argument into a local variable (e.g., gotBaseURL) and after calling s.Export
with the inventory.Snapshot, assert that gotBaseURL equals the expected backend
endpoint from the sink state; keep the existing assertions for client.nodeID,
client.jwt and client.req.

In `@internal/inventory/source/source_test.go`:
- Around line 37-114: Add tests exercising Collect's error/guard paths: create
three new subtests that call NewMachineInfoSource with (1) a nil collector and
assert Collect returns an error, (2) a fakeMachineInfoCollector whose Collect
returns an error and assert that error is propagated, and (3) a
fakeMachineInfoCollector that returns (nil, nil) for MachineInfo and assert
Collect returns an error or empty snapshot per the current guard behavior;
reference NewMachineInfoSource, fakeMachineInfoCollector and its Collect method,
and the source.Collect call to locate where to add these assertions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: e5a983be-90ba-4bca-9be9-31348de6a041

📥 Commits

Reviewing files that changed from the base of the PR and between c7fdf10 and c90aee0.

📒 Files selected for processing (28)
  • .github/workflows/ci.yml
  • .github/workflows/release.yml
  • .golangci.yml
  • cmd/fleetint/enroll.go
  • cmd/fleetint/enroll_test.go
  • cmd/fleetint/unenroll.go
  • internal/agentstate/sqlite.go
  • internal/agentstate/sqlite_test.go
  • internal/agentstate/state.go
  • internal/attestationloop/backend.go
  • internal/attestationloop/manager.go
  • internal/attestationloop/types.go
  • internal/backendclient/client.go
  • internal/backendclient/client_test.go
  • internal/backendclient/errors.go
  • internal/backendclient/errors_test.go
  • internal/inventory/hash.go
  • internal/inventory/hash_test.go
  • internal/inventory/manager.go
  • internal/inventory/manager_run_test.go
  • internal/inventory/manager_test.go
  • internal/inventory/mapper/backend.go
  • internal/inventory/mapper/backend_test.go
  • internal/inventory/sink/backend.go
  • internal/inventory/sink/backend_test.go
  • internal/inventory/source/source.go
  • internal/inventory/source/source_test.go
  • internal/inventory/types.go
✅ Files skipped from review due to trivial changes (7)
  • .golangci.yml
  • .github/workflows/release.yml
  • .github/workflows/ci.yml
  • internal/inventory/mapper/backend_test.go
  • internal/agentstate/state.go
  • internal/inventory/types.go
  • internal/backendclient/errors_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • internal/attestationloop/types.go
  • internal/inventory/manager.go
  • internal/backendclient/errors.go

Comment thread internal/agentstate/sqlite.go
Comment thread internal/backendclient/client_test.go
@jingxiang-z jingxiang-z changed the title feat: add backend client and inventory skeletons feat: add backend client and inventory attestation call Apr 21, 2026
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
@jingxiang-z jingxiang-z force-pushed the feat/agent-backend-client-skeleton branch from b014d75 to 5a8c3a1 Compare April 21, 2026 21:00
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
@jingxiang-z jingxiang-z force-pushed the feat/agent-backend-client-skeleton branch from 5a8c3a1 to 8d55dd9 Compare April 21, 2026 21:01
@jingxiang-z jingxiang-z marked this pull request as ready for review April 21, 2026 21:01
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d55dd99e0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backendclient/client.go
Comment thread internal/agentstate/sqlite.go Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (7)
internal/exporter/converter/csv_test.go (1)

180-183: Add an explicit assertion for Agent Version CSV output.

You set AgentVersion in fixtures, but the tests don’t currently verify the corresponding emitted row. Adding that assertion will protect this rename from regressions.

✅ Suggested assertion addition
 	assert.Greater(t, len(records), 5)
 	assert.Equal(t, []string{"attribute_name", "attribute_value"}, records[0])
+	assert.Contains(t, records, []string{"Agent Version", "0.1.5"})
 	assert.Contains(t, records, []string{"DCGM Version", "4.2.3"})

Also applies to: 282-282

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/exporter/converter/csv_test.go` around lines 180 - 183, Add an
explicit assertion that the CSV output contains the "Agent Version" row
corresponding to the fixture's AgentVersion ("0.1.5"). Locate the test that
builds the fixtures (uses the AgentVersion field) and the variable holding
produced CSV output (the test's CSV string/rows), then add an assertion that a
row exists with the expected header/label (e.g., "Agent Version") and value
"0.1.5" — mirror existing assertions for DCGMVersion/OSImage/KernelVersion so
the test fails if that mapping is renamed or removed.
internal/attestation/nonce_test.go (1)

28-50: Assert the forwarded nodeID and JWT too.

This test currently passes even if backendNonceProvider.GetNonce swaps or drops the request arguments, because the stub ignores its inputs and only returns a canned response.

Proposed test hardening
 type testNonceClient struct {
-	resp *backendclient.NonceResponse
+	resp      *backendclient.NonceResponse
+	gotNodeID string
+	gotJWT    string
 }
 
-func (c *testNonceClient) GetNonce(context.Context, string, string) (*backendclient.NonceResponse, error) {
+func (c *testNonceClient) GetNonce(_ context.Context, nodeID, jwt string) (*backendclient.NonceResponse, error) {
+	c.gotNodeID = nodeID
+	c.gotJWT = jwt
 	return c.resp, nil
 }
 
 func TestBackendNonceProvider(t *testing.T) {
 	refreshTS := time.Now().UTC()
-	provider := NewBackendNonceProvider(&testNonceClient{
+	client := &testNonceClient{
 		resp: &backendclient.NonceResponse{
 			Nonce:                 "abc123",
 			NonceRefreshTimestamp: refreshTS,
 			JWTAssertion:          "new-jwt",
 		},
-	})
+	}
+	provider := NewBackendNonceProvider(client)
 
 	nonce, ts, jwt, err := provider.GetNonce(context.Background(), "node-1", "jwt-token")
 	require.NoError(t, err)
 	require.Equal(t, "abc123", nonce)
 	require.Equal(t, refreshTS, ts)
 	require.Equal(t, "new-jwt", jwt)
+	require.Equal(t, "node-1", client.gotNodeID)
+	require.Equal(t, "jwt-token", client.gotJWT)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/attestation/nonce_test.go` around lines 28 - 50, The test uses a
canned testNonceClient that ignores its inputs, so TestBackendNonceProvider
doesn't verify that NewBackendNonceProvider forwards nodeID and JWT; update
testNonceClient (and its GetNonce method) to capture the provided nodeID and jwt
arguments (or validate them inline) and have the test assert the captured values
after calling provider.GetNonce; reference testNonceClient.GetNonce,
TestBackendNonceProvider, and NewBackendNonceProvider to locate and change the
stub so it fails the test if nodeID or JWT are not the expected "node-1" and
"jwt-token".
internal/exporter/converter/otlp_test.go (1)

313-335: Widen this regression check beyond dcgmVersion.

The renamed test only proves one machine-info attribute disappeared. If another removed hardware field starts leaking back into the OTLP resource, this still passes. I'd tighten this with either a small allowlist for resource keys or explicit checks for the other machine-info attributes that were intentionally removed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/exporter/converter/otlp_test.go` around lines 313 - 335, Update the
TestOTLPConverter_Convert_IgnoresMachineInfoInResource test to assert that all
known MachineInfo fields are absent from the OTLP resource instead of only
checking "dcgmVersion"; locate the test function
TestOTLPConverter_Convert_IgnoresMachineInfoInResource and the loop over
rm.Resource.Attributes produced by NewOTLPConverter().Convert, build a set/list
of the MachineInfo keys that were intentionally removed (e.g., dcgmVersion and
the other MachineInfo field names used in machineinfo.MachineInfo) and assert
that no attribute.Key matches any of those names, or alternatively implement a
tight allowlist of acceptable resource keys and assert every attribute.Key is in
that allowlist; this ensures any leaking hardware fields from
machineinfo.MachineInfo are caught.
internal/enrollment/enrollment.go (1)

80-82: Redundant SecureStateFilePermissions call.

SecureStateFilePermissions is called twice: once on line 80 before metadata operations, and again on lines 96-98 after. The second call appears redundant since the file permissions were already secured.

♻️ Proposed fix to remove redundant call
 	if err := pkgmetadata.SetMetadata(ctx, dbRW, "backend_base_url", baseURL); err != nil {
 		return fmt.Errorf("failed to set backend base URL: %w", err)
 	}
-	if err := config.SecureStateFilePermissions(stateFile); err != nil {
-		return fmt.Errorf("failed to secure state database permissions: %w", err)
-	}
 	return nil
 }

Also applies to: 96-98

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/enrollment/enrollment.go` around lines 80 - 82, Remove the duplicate
SecureStateFilePermissions(stateFile) invocation: keep the initial call that
already secures the state file and delete the subsequent call and its associated
error handling block (the second SecureStateFilePermissions(stateFile) and its
fmt.Errorf return). Ensure no other logic depends on the second invocation and
preserve existing error handling from the first SecureStateFilePermissions call
so stateFile remains secured exactly once.
internal/inventory/manager_run_test.go (1)

57-75: Time-based synchronization may cause flaky tests in CI.

Using time.Sleep(25 * time.Millisecond) for synchronization before cancellation can be flaky under resource contention. Consider using a synchronization primitive or increasing the margin.

♻️ Proposed fix using a channel to signal readiness
 func TestManagerRunStopsOnContextCancel(t *testing.T) {
 	src := &fakeSource{
 		snapshots: []*Snapshot{{MachineID: "machine-1", Hostname: "host-a"}},
 	}
 	sink := &fakeSink{}
 	ctx, cancel := context.WithCancel(context.Background())
 
 	done := make(chan error, 1)
 	go func() {
 		done <- NewManager(src, sink, InventoryConfig{Interval: 10 * time.Millisecond}).Run(ctx)
 	}()
 
-	time.Sleep(25 * time.Millisecond)
+	// Allow sufficient iterations before canceling
+	time.Sleep(50 * time.Millisecond)
 	cancel()
 
 	err := <-done
 	require.ErrorIs(t, err, context.Canceled)
 	require.NotEmpty(t, sink.exported)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/inventory/manager_run_test.go` around lines 57 - 75, The test uses
time.Sleep to wait for the manager to perform work and then cancels, which is
flaky; modify the test to use a synchronization channel instead: add a ready
signal (e.g., a chan struct{}) to the fakeSource or fakeSink that
NewManager/Manager.Run will cause to be closed/sent when the first
snapshot/export occurs, then in TestManagerRunStopsOnContextCancel wait on that
ready channel (with a sensible timeout) before calling cancel; update the
fakeSource/fakeSink helpers to send on the ready channel when they handle the
snapshot so the test deterministically proceeds without time.Sleep.
internal/inventory/source/source.go (1)

140-150: Consider documenting the NIC interface selection criteria.

NetPrivateIP is set from the first private interface in the slice. The selection order depends on the underlying collector's ordering, which may not be deterministic across reboots.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/inventory/source/source.go` around lines 140 - 150, NetPrivateIP is
currently assigned from the first element of info.NICInfo.PrivateIPInterfaces
which relies on collector ordering; update the code and/or comments to make the
selection deterministic and explicit: implement a clear selection rule (e.g.,
prefer non-loopback, non-empty IPs, prefer interface named "eth0" or sort by
Interface/MAC and pick the first) in the block that populates
snap.Resources.NICInfo.PrivateIPInterfaces and then set snap.NetPrivateIP using
that deterministic choice, and add a brief comment above this logic describing
the selection criteria; reference the variables/functions NetPrivateIP,
info.NICInfo.PrivateIPInterfaces, and snap.Resources.NICInfo when making the
change.
internal/exporter/collector/collector.go (1)

87-87: Consider removing the unused parameter if this API is internal-only; otherwise, maintain backward compatibility.

The _ any parameter is unused and only has one call site (exporter.go:82) where nil is passed. If collector.New is an internal-only API not exposed to external packages, removing this parameter is safe and clean. However, if external consumers depend on this exported function, the removal would be a breaking change. The current _ any pattern is acceptable for maintaining API compatibility during a transition away from the attestation manager feature.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/exporter/collector/collector.go` at line 87, collector.New currently
declares an unused parameter `_ any`; if this API is internal-only, remove the
`_ any` parameter from the collector.New function signature and from its sole
call site (the call in exporter.go) and update any related docs/tests to reflect
the simplified signature; if the function is part of the public API, keep the
parameter to preserve compatibility but add a clear comment on the collector.New
declaration explaining the parameter is intentionally unused (for attestation
manager compatibility) so future maintainers know why it remains.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/fleetint/status.go`:
- Around line 173-176: The call to endpoint.ValidateBackendEndpoint(baseURL) in
readEnrollmentStatus currently returns an error that aborts the entire fleetint
status flow; change this so a malformed backend_base_url produces a warning (log
via the same logger used in readEnrollmentStatus or fmt.Fprintf(os.Stderr, ...))
instead of returning, then fall back to the legacy metadata path or treat the
enrollment as "not enrolled" so status still prints; specifically, replace the
current return on err after ValidateBackendEndpoint with a warning that includes
the raw baseURL and err, and ensure the rest of readEnrollmentStatus proceeds
using the legacy keys or a nil/empty validated value to determine enrollment.

In `@internal/attestation/backend.go`:
- Around line 102-112: The wrapper is not providing the node UUID so submissions
are unscoped; update the state-backed path to resolve and include the node
identifier before forwarding. Specifically, in NewStateBackendSubmitter /
stateSubmitter.Submit flow, read the node UUID from the provided
agentstate.State (via the state accessor on stateBackendClientFactory or
directly from the state object) and ensure the backend client you obtain
(stateBackendClientFactory.client) is created or scoped using that node UUID, or
attach the node UUID into the Result payload before calling
NewBackendSubmitter(client).Submit; modify either
stateBackendClientFactory.client to accept/derive the node ID or add a step in
stateSubmitter.Submit to fetch the node ID (e.g., state.NodeID() or equivalent)
and pass it into the backend client or result so the attestation request is
node-scoped.

In `@internal/attestation/collector.go`:
- Around line 63-66: The current logic calls cmd.Run() then only returns the
execution error if stdout is empty, which can mask process failures when partial
stdout exists; change the flow around cmd.Run()/stdout/stderr so that if
cmd.Run() returns a non-nil err you immediately return a descriptive execution
error (including stderr and the wrapped err) — i.e., treat err from cmd.Run() as
authoritative and do not attempt to parse stdout when err != nil; only attempt
JSON parsing of stdout when cmd.Run() succeeded. Refer to the cmd.Run(), stdout,
stderr variables and the existing "attestation CLI execution failed" error
message when implementing this change.

In `@internal/attestation/manager_test.go`:
- Around line 98-115: Update the test
TestCollectOnceCollectorFailureStillSubmitsFailureResult to assert that
manager.CollectOnce returns an error when the evidence collector fails while
still verifying the failure result was submitted by testSubmitter (check
submitter.submitted.result and its Success=false); similarly update the Run
retry test to assert that Collect/Submit were invoked more than once by using
the testSubmitter invocation count (or a counter on testEvidenceCollector) and
verify that Run honors RetryInterval by observing multiple submissions rather
than just a single failed collect then sleep; locate references to CollectOnce,
Run, RetryInterval, Collect, Submit, submitter/testSubmitter and update
assertions accordingly to expect an error return and multiple submits.

In `@internal/attestation/manager.go`:
- Around line 145-168: The collector error (err returned from
m.collector.Collect) is recorded in result but not propagated; change the flow
in Run so that after calling m.submitter.Submit(ctx, result, jwt) you return the
original collection error when present. Concretely, save the collector error
(err) from m.collector.Collect, still populate and submit the Result, then if
that saved error is non-nil return nil and that error (instead of nil),
otherwise return the successful result; refer to m.collector.Collect, result,
err, and m.submitter.Submit to locate the code to adjust.

In `@internal/backendclient/client.go`:
- Around line 63-70: NewWithHTTPClient currently allows baseURL == nil which
later causes a panic when code calls baseURL.JoinPath; change NewWithHTTPClient
to validate baseURL and return a typed error instead of constructing a client
with a nil URL. Update the function signature to return (Client, error), add a
package-level error (e.g., ErrNilBaseURL) or fmt.Errorf("nil baseURL"), check if
baseURL == nil and return nil, ErrNilBaseURL, and otherwise return &client{...},
nil; do the same defensive change for the other constructor that accepts baseURL
(the function around lines 156-160) and update all call sites to handle the new
(Client, error) return.

In `@internal/exporter/exporter.go`:
- Around line 349-367: In refreshJWTToken the code trusts backend_base_url and
treats a malformed legacy endpoint from readLegacyBackendBaseURL as fatal;
change the logic to validate URLs before using them (use net/url parsing) and to
tolerate bad entries: after calling pkgmetadata.ReadMetadata and getting
baseURL, parse and accept it only if valid; when iterating legacy endpoints
inside readLegacyBackendBaseURL (and the similar block around lines 396-409),
skip/log malformed entries instead of returning an error and continue to the
next candidate; only return an error if no valid URL candidates remain; ensure
newBackendClient and client.Enroll are only invoked with a validated, parsed
baseURL.
- Around line 257-283: The current logic leaves metricsEndpoint/logsEndpoint
empty when pkgmetadata.ReadMetadata returns a non-empty but invalid
backend_base_url; update the branch handling validateErr (and per-endpoint
joinErr) to fall back to legacy endpoints by calling
e.readValidatedEndpoint(ctx, "metrics_endpoint") and
e.readValidatedEndpoint(ctx, "logs_endpoint") so that
endpoint.ValidateBackendEndpoint and endpoint.JoinPath failures don't disable
HTTP export; refer to pkgmetadata.ReadMetadata,
endpoint.ValidateBackendEndpoint, endpoint.JoinPath, e.readValidatedEndpoint,
and the metricsEndpoint/logsEndpoint variables when making the change.

In `@internal/inventory/manager.go`:
- Around line 97-113: CollectOnce can trigger duplicate exports when called
concurrently because the shouldExport check occurs outside m.mu and
lastExportedHash is updated later; fix by serializing export operations: add an
export mutex (e.g., m.exportMu) and, when shouldExport is true, acquire
m.exportMu around the call to m.sink.Export(ctx, snap) and the update to
m.lastExportedHash (use m.mu only for snapshot/lastSnapshot access), so two
goroutines cannot export the same hash concurrently; alternatively, you can set
m.lastExportedHash under m.mu before releasing it and then call Export, but if
you choose that approach ensure you revert the hash on non-ErrNotReady
failures—prefer the export mutex to keep semantics simple.

---

Nitpick comments:
In `@internal/attestation/nonce_test.go`:
- Around line 28-50: The test uses a canned testNonceClient that ignores its
inputs, so TestBackendNonceProvider doesn't verify that NewBackendNonceProvider
forwards nodeID and JWT; update testNonceClient (and its GetNonce method) to
capture the provided nodeID and jwt arguments (or validate them inline) and have
the test assert the captured values after calling provider.GetNonce; reference
testNonceClient.GetNonce, TestBackendNonceProvider, and NewBackendNonceProvider
to locate and change the stub so it fails the test if nodeID or JWT are not the
expected "node-1" and "jwt-token".

In `@internal/enrollment/enrollment.go`:
- Around line 80-82: Remove the duplicate SecureStateFilePermissions(stateFile)
invocation: keep the initial call that already secures the state file and delete
the subsequent call and its associated error handling block (the second
SecureStateFilePermissions(stateFile) and its fmt.Errorf return). Ensure no
other logic depends on the second invocation and preserve existing error
handling from the first SecureStateFilePermissions call so stateFile remains
secured exactly once.

In `@internal/exporter/collector/collector.go`:
- Line 87: collector.New currently declares an unused parameter `_ any`; if this
API is internal-only, remove the `_ any` parameter from the collector.New
function signature and from its sole call site (the call in exporter.go) and
update any related docs/tests to reflect the simplified signature; if the
function is part of the public API, keep the parameter to preserve compatibility
but add a clear comment on the collector.New declaration explaining the
parameter is intentionally unused (for attestation manager compatibility) so
future maintainers know why it remains.

In `@internal/exporter/converter/csv_test.go`:
- Around line 180-183: Add an explicit assertion that the CSV output contains
the "Agent Version" row corresponding to the fixture's AgentVersion ("0.1.5").
Locate the test that builds the fixtures (uses the AgentVersion field) and the
variable holding produced CSV output (the test's CSV string/rows), then add an
assertion that a row exists with the expected header/label (e.g., "Agent
Version") and value "0.1.5" — mirror existing assertions for
DCGMVersion/OSImage/KernelVersion so the test fails if that mapping is renamed
or removed.

In `@internal/exporter/converter/otlp_test.go`:
- Around line 313-335: Update the
TestOTLPConverter_Convert_IgnoresMachineInfoInResource test to assert that all
known MachineInfo fields are absent from the OTLP resource instead of only
checking "dcgmVersion"; locate the test function
TestOTLPConverter_Convert_IgnoresMachineInfoInResource and the loop over
rm.Resource.Attributes produced by NewOTLPConverter().Convert, build a set/list
of the MachineInfo keys that were intentionally removed (e.g., dcgmVersion and
the other MachineInfo field names used in machineinfo.MachineInfo) and assert
that no attribute.Key matches any of those names, or alternatively implement a
tight allowlist of acceptable resource keys and assert every attribute.Key is in
that allowlist; this ensures any leaking hardware fields from
machineinfo.MachineInfo are caught.

In `@internal/inventory/manager_run_test.go`:
- Around line 57-75: The test uses time.Sleep to wait for the manager to perform
work and then cancels, which is flaky; modify the test to use a synchronization
channel instead: add a ready signal (e.g., a chan struct{}) to the fakeSource or
fakeSink that NewManager/Manager.Run will cause to be closed/sent when the first
snapshot/export occurs, then in TestManagerRunStopsOnContextCancel wait on that
ready channel (with a sensible timeout) before calling cancel; update the
fakeSource/fakeSink helpers to send on the ready channel when they handle the
snapshot so the test deterministically proceeds without time.Sleep.

In `@internal/inventory/source/source.go`:
- Around line 140-150: NetPrivateIP is currently assigned from the first element
of info.NICInfo.PrivateIPInterfaces which relies on collector ordering; update
the code and/or comments to make the selection deterministic and explicit:
implement a clear selection rule (e.g., prefer non-loopback, non-empty IPs,
prefer interface named "eth0" or sort by Interface/MAC and pick the first) in
the block that populates snap.Resources.NICInfo.PrivateIPInterfaces and then set
snap.NetPrivateIP using that deterministic choice, and add a brief comment above
this logic describing the selection criteria; reference the variables/functions
NetPrivateIP, info.NICInfo.PrivateIPInterfaces, and snap.Resources.NICInfo when
making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9a779fe1-8ff8-4fe1-845c-d5e6cffa8521

📥 Commits

Reviewing files that changed from the base of the PR and between c90aee0 and 8d55dd9.

📒 Files selected for processing (66)
  • .github/workflows/ci.yml
  • .github/workflows/release.yml
  • .golangci.yml
  • SECURITY.md
  • cmd/fleetint/enroll.go
  • cmd/fleetint/enroll_test.go
  • cmd/fleetint/run.go
  • cmd/fleetint/status.go
  • cmd/fleetint/unenroll.go
  • cmd/fleetint/unenroll_test.go
  • deployments/helm/fleet-intelligence-agent/README.md
  • deployments/helm/fleet-intelligence-agent/values.yaml
  • deployments/packages/systemd/fleetint.env
  • docs/configuration.md
  • internal/agentstate/sqlite.go
  • internal/agentstate/sqlite_test.go
  • internal/agentstate/state.go
  • internal/attestation/attestation.go
  • internal/attestation/attestation_test.go
  • internal/attestation/backend.go
  • internal/attestation/backend_test.go
  • internal/attestation/collector.go
  • internal/attestation/collector_test.go
  • internal/attestation/manager.go
  • internal/attestation/manager_test.go
  • internal/attestation/nonce.go
  • internal/attestation/nonce_test.go
  • internal/attestation/types.go
  • internal/backendclient/client.go
  • internal/backendclient/client_test.go
  • internal/backendclient/errors.go
  • internal/backendclient/errors_test.go
  • internal/backendclient/types.go
  • internal/config/config.go
  • internal/config/config_test.go
  • internal/config/default.go
  • internal/endpoint/endpoint.go
  • internal/endpoint/endpoint_test.go
  • internal/enrollment/enrollment.go
  • internal/enrollment/enrollment_test.go
  • internal/exporter/collector/collector.go
  • internal/exporter/collector/collector_test.go
  • internal/exporter/converter/csv.go
  • internal/exporter/converter/csv_test.go
  • internal/exporter/converter/otlp.go
  • internal/exporter/converter/otlp_test.go
  • internal/exporter/exporter.go
  • internal/exporter/exporter_test.go
  • internal/exporter/options.go
  • internal/exporter/options_test.go
  • internal/inventory/hash.go
  • internal/inventory/hash_test.go
  • internal/inventory/manager.go
  • internal/inventory/manager_run_test.go
  • internal/inventory/manager_test.go
  • internal/inventory/mapper/backend.go
  • internal/inventory/mapper/backend_test.go
  • internal/inventory/sink/backend.go
  • internal/inventory/sink/backend_test.go
  • internal/inventory/source/source.go
  • internal/inventory/source/source_test.go
  • internal/inventory/types.go
  • internal/machineinfo/machineinfo.go
  • internal/machineinfo/machineinfo_test.go
  • internal/server/server.go
  • internal/server/server_test.go
💤 Files with no reviewable changes (4)
  • internal/exporter/options_test.go
  • internal/exporter/options.go
  • internal/attestation/attestation_test.go
  • internal/attestation/attestation.go
✅ Files skipped from review due to trivial changes (14)
  • .github/workflows/release.yml
  • .golangci.yml
  • SECURITY.md
  • deployments/helm/fleet-intelligence-agent/README.md
  • internal/inventory/hash.go
  • internal/inventory/hash_test.go
  • internal/backendclient/errors.go
  • internal/backendclient/errors_test.go
  • internal/inventory/manager_test.go
  • internal/inventory/mapper/backend.go
  • internal/backendclient/client_test.go
  • internal/agentstate/state.go
  • internal/inventory/mapper/backend_test.go
  • internal/backendclient/types.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • .github/workflows/ci.yml
  • cmd/fleetint/enroll.go
  • cmd/fleetint/unenroll.go
  • internal/agentstate/sqlite_test.go
  • cmd/fleetint/enroll_test.go
  • internal/inventory/types.go

Comment thread cmd/fleetint/status.go Outdated
Comment thread internal/attestation/backend.go
Comment thread internal/attestation/collector.go
Comment thread internal/attestation/manager_test.go
Comment thread internal/attestation/manager.go
Comment thread internal/backendclient/client.go
Comment thread internal/exporter/exporter.go
Comment thread internal/exporter/exporter.go
Comment thread internal/inventory/manager.go
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
internal/backendclient/client.go (1)

66-78: Avoid mutating the caller’s *http.Client.

NewWithHTTPClient rewrites CheckRedirect on the supplied client, so reusing that same client elsewhere will silently change its redirect behavior globally. A shallow copy keeps the policy local to this backend client.

💡 Suggested fix
 func NewWithHTTPClient(baseURL *url.URL, httpClient *http.Client) Client {
 	if httpClient == nil {
 		httpClient = &http.Client{Timeout: 30 * time.Second}
 	}
-	if httpClient.CheckRedirect == nil {
-		httpClient.CheckRedirect = func(*http.Request, []*http.Request) error {
+	clientCopy := *httpClient
+	if clientCopy.CheckRedirect == nil {
+		clientCopy.CheckRedirect = func(*http.Request, []*http.Request) error {
 			return errRedirectNotAllowed
 		}
 	}
 	return &client{
-		httpClient: httpClient,
+		httpClient: &clientCopy,
 		baseURL:    baseURL,
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backendclient/client.go` around lines 66 - 78, NewWithHTTPClient
currently mutates the caller-provided *http.Client by overwriting CheckRedirect;
instead, if httpClient != nil make a shallow copy (copy := *httpClient;
httpClient = &copy) and then set CheckRedirect on that copy so only the backend
client's instance is modified; keep the existing behavior when httpClient is nil
(create a new &http.Client{Timeout:...}) and return a client struct that uses
the copied httpClient, referencing NewWithHTTPClient, the httpClient parameter,
CheckRedirect, and errRedirectNotAllowed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/enrollment/enrollment.go`:
- Around line 45-60: The code currently uses the full validated URL (which may
include legacy paths) when creating the backend client and persisting config;
update Enroll so after calling endpoint.ValidateBackendEndpoint you derive a
normalized backend base URL containing only scheme and host (and port if
present) — e.g., build a URL from baseURL.Scheme + "://" + baseURL.Host — then
pass that normalized base to newBackendClient(...) and to
storeConfigInMetadata(...) (and use that same normalized base when constructing
the client), keeping the rest of the logic (client.Enroll, jwtToken handling)
unchanged; reference functions/entities: Enroll, ValidateBackendEndpoint,
newBackendClient, storeConfigInMetadata, and the validated baseURL variable.

In `@internal/inventory/manager.go`:
- Around line 56-79: The Run method currently calls m.CollectOnce() and then
sleeps, which causes an immediate second collection and skips the intended
interval; modify manager.Run so the sleep happens before each subsequent
CollectOnce: after the initial collection (and initial jitter when
m.config.JitterEnabled), enter the loop that computes nextInterval (using
m.config.Interval, m.config.RetryInterval and calculateJitter/retryJitterCap
based on the previous error), call sleepWithContext(ctx, nextInterval) first,
then call m.CollectOnce(ctx) and update the error for the next iteration; keep
use of sleepWithContext, calculateJitter, initialJitterCap, retryJitterCap, and
the retry interval logic intact but moved to execute prior to the collection.
- Around line 72-73: The retry backoff always applies
calculateJitter(retryJitterCap(...)) even when JitterEnabled is false, making
m.config.JitterEnabled ineffective; update the conditional around the retry
sleep calculation (the block that assigns nextInterval using
m.config.RetryInterval and calculateJitter/retryJitterCap) to only add
calculateJitter(...) when m.config.JitterEnabled is true, otherwise set
nextInterval deterministically to m.config.RetryInterval (or the capped value)
so the "without jitter" path remains deterministic.

---

Nitpick comments:
In `@internal/backendclient/client.go`:
- Around line 66-78: NewWithHTTPClient currently mutates the caller-provided
*http.Client by overwriting CheckRedirect; instead, if httpClient != nil make a
shallow copy (copy := *httpClient; httpClient = &copy) and then set
CheckRedirect on that copy so only the backend client's instance is modified;
keep the existing behavior when httpClient is nil (create a new
&http.Client{Timeout:...}) and return a client struct that uses the copied
httpClient, referencing NewWithHTTPClient, the httpClient parameter,
CheckRedirect, and errRedirectNotAllowed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: f30bca91-cf03-4568-aefa-f7eec7bbb5e9

📥 Commits

Reviewing files that changed from the base of the PR and between 44938f0 and 47c3759.

📒 Files selected for processing (18)
  • cmd/fleetint/status.go
  • cmd/fleetint/unenroll.go
  • internal/agentstate/sqlite.go
  • internal/agentstate/sqlite_test.go
  • internal/agentstate/state.go
  • internal/attestation/backend.go
  • internal/attestation/backend_test.go
  • internal/attestation/manager.go
  • internal/attestation/manager_test.go
  • internal/attestation/nonce_test.go
  • internal/backendclient/client.go
  • internal/backendclient/client_test.go
  • internal/enrollment/enrollment.go
  • internal/inventory/manager.go
  • internal/inventory/manager_run_test.go
  • internal/inventory/manager_test.go
  • internal/inventory/mapper/backend_test.go
  • internal/inventory/source/source_test.go
✅ Files skipped from review due to trivial changes (4)
  • internal/inventory/mapper/backend_test.go
  • internal/agentstate/state.go
  • internal/backendclient/client_test.go
  • internal/attestation/manager_test.go
🚧 Files skipped from review as they are similar to previous changes (8)
  • cmd/fleetint/unenroll.go
  • internal/attestation/nonce_test.go
  • internal/inventory/manager_test.go
  • internal/inventory/source/source_test.go
  • internal/agentstate/sqlite.go
  • internal/attestation/backend_test.go
  • cmd/fleetint/status.go
  • internal/attestation/manager.go

Comment thread internal/enrollment/enrollment.go
Comment thread internal/inventory/manager.go
Comment thread internal/inventory/manager.go Outdated
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
NodeResources on the backend read path uses string for these fields
(matching the existing OTel/Kafka write path). Sending integers caused
json.Unmarshal to fail silently, leaving the entire resources block empty
in the node detail API response.

Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
InventoryHash is agent-internal state used for change detection only.
The backend never stores or acts on it, so there is no reason to send it.

Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
The agent never collected a public IP, so this field was always empty.
Remove it from the snapshot type and the backend wire contract.

Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signals to the backend receiver that inventory is written directly via
the agent API, so legacy node extraction from OTLP resource attributes
can be suppressed.

Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Log loop startup intervals, nvattest execution results, inventory
export outcomes (sent, unchanged, not enrolled), to make the agent's
periodic workflows observable without debug-level logging.

Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
@jingxiang-z jingxiang-z self-assigned this Apr 22, 2026
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Copy link
Copy Markdown
Collaborator

@ambermingxin ambermingxin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is a big MR and a lot of refactoring, I recommend a throughout agent test in all installations we support, including different attestation scenarios.

Comment thread internal/enrollment/enrollment.go
Comment thread internal/enrollment/enrollment.go
Comment thread internal/agentstate/sqlite.go
Comment thread internal/exporter/collector/collector.go
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (2)
internal/agentstate/sqlite.go (1)

138-143: ⚠️ Potential issue | 🟡 Minor

Don’t silently skip permission hardening when stateFileFn() fails here.

openReadWrite() already succeeded, so swallowing a second stateFileFn() failure hides an unexpected path-resolution problem and skips the security check on the write path.

Suggested fix
 	stateFile, err := s.stateFileFn()
-	if err == nil {
-		if err := config.SecureStateFilePermissions(stateFile); err != nil {
-			return fmt.Errorf("secure state file permissions: %w", err)
-		}
+	if err != nil {
+		return fmt.Errorf("get state file path for permissions: %w", err)
+	}
+	if err := config.SecureStateFilePermissions(stateFile); err != nil {
+		return fmt.Errorf("secure state file permissions: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/agentstate/sqlite.go` around lines 138 - 143, The code currently
swallows errors from s.stateFileFn() and skips calling
config.SecureStateFilePermissions; change the logic in the block using
stateFileFn() (the call to stateFileFn and the subsequent
SecureStateFilePermissions) to treat a non-nil err from stateFileFn() as a real
error and return it (wrap with context) instead of continuing, so that after
openReadWrite() succeeds you always resolve the state file and enforce
SecureStateFilePermissions; reference s.stateFileFn() and
config.SecureStateFilePermissions when making this change.
internal/enrollment/enrollment.go (1)

62-71: ⚠️ Potential issue | 🟠 Major

Post-enroll inventory sync is still a silent no-op without a node UUID.

This flow only persists the SAK/JWT/backend URL, but internal/inventory/sink/backend.go also requires nodeUUID before it can export. On a fresh enroll, the sink returns inventory.ErrNotReady, and internal/inventory/manager.go converts that into success, so this block never actually syncs inventory and never logs why it skipped.

Either persist nodeUUID before calling syncInventoryAfterEnroll, or explicitly skip/log this path until nodeUUID is available.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/enrollment/enrollment.go` around lines 62 - 71, The post-enroll sync
call is a no-op when node UUID is missing because storeConfigInMetadata only
persists SAK/JWT/URL but not nodeUUID, so syncInventoryAfterEnroll returns
ErrNotReady and is treated as success; fix by ensuring nodeUUID is persisted
before invoking runWithContext(syncCtx, func() error { return
syncInventoryAfterEnroll(syncCtx) }) — either extend storeConfigInMetadata to
accept and save nodeUUID (and call it with the node UUID obtained during
enrollment), or add an explicit pre-check before calling
syncInventoryAfterEnroll that reads the node UUID from metadata and if absent
logs a clear warning (use log.Logger.Warnw) and skips the sync; keep references
to storeConfigInMetadata, syncInventoryAfterEnroll, runWithContext and
inventory.ErrNotReady in your changes.
🧹 Nitpick comments (7)
internal/backendclient/client.go (1)

201-227: Consider preserving the original error for debugging.

mapEnrollError discards the original HTTPStatusError (including the response body) when mapping to user-friendly messages. For debugging/logging purposes, consider wrapping the original error.

💡 Example using error wrapping
 	switch statusErr.StatusCode {
 	case http.StatusBadRequest:
-		return fmt.Errorf("the token used in the enrollment is not in the correct format")
+		return fmt.Errorf("the token used in the enrollment is not in the correct format: %w", statusErr)
 	case http.StatusUnauthorized:
-		return fmt.Errorf("the token used in the enrollment is incorrect")
+		return fmt.Errorf("the token used in the enrollment is incorrect: %w", statusErr)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/backendclient/client.go` around lines 201 - 227, mapEnrollError
currently returns new fmt.Errorf messages that discard the original
HTTPStatusError and its context; update each case in mapEnrollError to wrap the
original error (statusErr) using Go error wrapping (e.g.,
fmt.Errorf("user-friendly msg: %w", statusErr)) so the original HTTPStatusError
and response body remain available for logging/inspection while still returning
a friendly message; ensure the default case still returns the original err.
internal/exporter/collector/collector.go (1)

50-59: HealthData.MachineInfo is referenced downstream but never populated.

The field is declared (line 54) but not populated in Collect(). Downstream code in internal/exporter/converter/csv.go (lines 87, 274) does reference it and checks for nil before use. However, since Collect() never assigns a value to this field, it will always be nil at runtime, making the downstream check ineffective. Either remove the field if machine info collection is no longer needed, or populate it in Collect() if it should be included.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/exporter/collector/collector.go` around lines 50 - 59,
HealthData.MachineInfo is declared but never set in Collect(), so either remove
the field or populate it; to fix, update the Collect() function to call the
machineinfo package to build a *machineinfo.MachineInfo and assign it to
HealthData.MachineInfo (handle and log any error and leave nil on failure),
ensure the machineinfo package is imported, and keep the downstream nil checks
in internal/exporter/converter/csv.go working; alternatively, if machine info is
no longer needed, remove the MachineInfo field from the HealthData struct and
delete related uses.
internal/attestation/nonce_test.go (1)

40-58: Consider adding test coverage for error scenarios.

The happy path is well-tested, but the test suite would benefit from cases covering:

  • Client returning an error
  • Client returning a nil response (which should trigger the "nonce response is nil" error in the provider)
💡 Example error case test
func TestBackendNonceProvider_ClientError(t *testing.T) {
	client := &testNonceClient{
		resp: nil, // or add an err field to testNonceClient
	}
	provider := NewBackendNonceProvider(client)

	_, _, _, err := provider.GetNonce(context.Background(), "node-1", "jwt-token")
	require.Error(t, err)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/attestation/nonce_test.go` around lines 40 - 58, Add tests to cover
error scenarios for BackendNonceProvider: extend or modify testNonceClient to
simulate an RPC error (e.g., an err field) and add a
TestBackendNonceProvider_ClientError that asserts GetNonce returns an error when
the client returns an error; add another test
TestBackendNonceProvider_NilResponse where testNonceClient returns (nil, nil) or
resp==nil and assert GetNonce returns the "nonce response is nil" error. Use
NewBackendNonceProvider to create the provider and call provider.GetNonce(ctx,
"node-1", "jwt-token") in both tests; assert errors via require.Error and verify
no panic.
cmd/fleetint/enroll_test.go (1)

130-136: Deadline assertion may be slightly fragile under CI load.

The assertion require.Greater(t, time.Until(deadline), 55*time.Second) assumes the test reaches the mock within 5 seconds of context creation. Under heavy CI load, this could occasionally fail.

Consider using a slightly more generous lower bound (e.g., 50 seconds) or restructuring to check the deadline duration relative to a captured start time.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/fleetint/enroll_test.go` around lines 130 - 136, The test's deadline
assertion in performEnrollWorkflow is fragile under CI because it requires
time.Until(deadline) > 55s; relax this to a safer lower bound (e.g., 50s) or
compute remaining time relative to a captured start time to avoid flaky
failures—update the performEnrollWorkflow closure to assert require.Greater(t,
time.Until(deadline), 50*time.Second) (or change to compare deadline.Sub(start)
against defaultEnrollTimeout) while keeping the other deadline checks and
defaultEnrollTimeout reference intact.
internal/inventory/manager_test.go (1)

116-149: Good concurrency test, but consider adding a timing element.

The test verifies concurrent CollectOnce calls result in a single export, which validates the exportMu mutex. However, without any artificial delay in the sink, all goroutines may serialize naturally. Consider adding a small sleep in fakeSink.Export to increase the likelihood of actual concurrent contention.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/inventory/manager_test.go` around lines 116 - 149, The test
TestManagerCollectOnceConcurrentExportSingleHash should simulate real contention
by introducing a small artificial delay in the sink; modify the fakeSink.Export
implementation to sleep briefly (e.g., time.Sleep(5-20 * time.Millisecond))
before recording the export to ensure concurrent goroutines hit the exportMu,
then run the same concurrent calls to CollectOnce to verify only one export
occurs; update any test imports to include time if needed.
internal/server/server.go (2)

466-467: Consider making retry intervals configurable.

The hardcoded retry intervals (attestationRetryInterval = 5 * time.Minute, inventoryRetryInterval = 1 * time.Minute) work for typical scenarios, but making them configurable would allow operators to tune retry behavior for their environments (e.g., faster retries in development, slower in production to reduce load during outages).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/server.go` around lines 466 - 467, Replace the hardcoded
constants attestationRetryInterval and inventoryRetryInterval with configurable
time.Duration values: expose them as configurable fields (e.g., in the server
config struct) and/or read from flags or environment variables (use default
values of 5m and 1m) parsing via time.ParseDuration; update any references to
these constants in functions that perform attestation retries and inventory
retries to use the new configurable fields so operators can tune retry intervals
without changing code.

451-451: Consider sharing a single agentstate.NewSQLite() instance between loops.

Both startInventoryLoop (line 451) and startAttestationLoop (line 479) create separate agentstate.NewSQLite() instances, which both reference the same file (config.DefaultStateFile). While SQLite handles concurrent file access, using a single shared instance would reduce redundant connections and make the shared state relationship explicit.

Also applies to: 479-479

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/server.go` at line 451, Both loops create independent
agentstate.NewSQLite() instances pointing at the same file; instead initialize
one shared state := agentstate.NewSQLite() and pass that same instance into
inventorysink.NewBackendSink(...) and into whatever the attestation loop
constructs (the call in startInventoryLoop and startAttestationLoop). Update
startInventoryLoop and startAttestationLoop to accept or capture the shared
agentstate instance rather than calling agentstate.NewSQLite() twice so both
loops use the same SQLite backend and avoid redundant connections.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/fleetint/status.go`:
- Around line 166-208: The code treats pkgmetadata.ReadMetadata errors only as
sql.ErrNoRows; update readEnrollmentStatus and readLegacyEndpoint to also
consider the SQLite "no such table" error as a non-fatal "missing metadata" case
(i.e., treat it like sql.ErrNoRows and continue with empty values).
Specifically, in readEnrollmentStatus (the ReadMetadata call for
agentstate.MetadataKeyBackendBaseURL) and in readLegacyEndpoint (the generic
ReadMetadata call), detect and swallow the SQLite absent-table error (e.g., by
checking errors.Is(err, sql.ErrNoRows) || strings.Contains(err.Error(), "no such
table") or by calling the existing helper from internal/agentstate/sqlite.go if
available), and only return an error for other failures.

In `@internal/agentstate/sqlite.go`:
- Around line 49-74: The code currently returns a corrupt
MetadataKeyBackendBaseURL value as-is and stops the legacy-key scan on the first
malformed legacy value; change the logic so that when reading
MetadataKeyBackendBaseURL you validate it via endpoint.DeriveBackendBaseURL
(call DeriveBackendBaseURL on the retrieved value and only return it if that
succeeds), and in the legacy keys loop treat a DeriveBackendBaseURL error as a
malformed value that is skipped (continue scanning the remaining keys) rather
than aborting with an error; use the existing pkgmetadata.ReadMetadata,
MetadataKeyBackendBaseURL, and endpoint.DeriveBackendBaseURL symbols to locate
and implement these checks.

In `@internal/attestation/backend.go`:
- Around line 111-123: When result.NodeUUID is empty, avoid dereferencing a nil
s.factory.state: first check if s.factory.state == nil and if so return
fmt.Errorf("%w: node UUID not available in agent state", ErrNotEnrolled)
(matching the existing error style); only call s.factory.state.GetNodeUUID(ctx)
if the state is non-nil, then proceed with the existing ok/err checks and
cloning logic. This preserves the ErrNotEnrolled-like behavior instead of
panicking and keeps the rest of the flow (cloning result and calling
s.factory.client(ctx)) unchanged.

In `@internal/attestation/manager.go`:
- Around line 131-153: The goroutine currently defers cancel(), which can race
with the select and cause runCtx.Done() to be triggered after the goroutine
sends a successful result; remove the worker-side defer cancel() inside the
goroutine so only the caller-owned cancel returned by context.WithTimeout is
responsible for cancelling; keep the defer m.runMu.Unlock() in the goroutine and
preserve the outer cancel usage in the select branch that handles runCtx.Done()
so timeouts still work correctly; locate the goroutine that calls m.CollectOnce,
the runCtx/cancel pair created by context.WithTimeout, and the done channel to
apply this change.

In `@internal/inventory/manager_run_test.go`:
- Around line 171-195: The test TestManagerRunWaitsIntervalBeforeSecondCollect
currently only asserts "not before 20ms" and never verifies a second collection
occurs; update the second wait to expect a second event after
InventoryConfig{Interval: 50 * time.Millisecond} elapses by waiting for at least
Interval plus a small margin (e.g., 50ms + 20ms) on countingSource.collectCh and
failing if no event arrives, then cancel the context and assert Run returned
context.Canceled; locate the logic around NewManager(...,
InventoryConfig{Interval: ...}).Run(ctx) and the second select that reads from
src.collectCh and adjust its timeout/expectation accordingly.

In `@internal/inventory/manager.go`:
- Around line 87-109: The goroutine's deferred cancel() is racing with the
select because after sending the result to done it also cancels runCtx, making
runCtx.Done() ready and causing the select to possibly return a timeout error;
remove the worker-side cancel() in the goroutine (i.e., drop the deferred
cancel() inside the anonymous goroutine) and keep the cancel() only in the outer
scope (or call cancel() only on the timeout/exit path), leaving the channel send
and deferred m.runMu.Unlock() and the CollectOnce call unchanged so that runCtx
is only cancelled by the caller's deferred cancel rather than by the worker.

---

Duplicate comments:
In `@internal/agentstate/sqlite.go`:
- Around line 138-143: The code currently swallows errors from s.stateFileFn()
and skips calling config.SecureStateFilePermissions; change the logic in the
block using stateFileFn() (the call to stateFileFn and the subsequent
SecureStateFilePermissions) to treat a non-nil err from stateFileFn() as a real
error and return it (wrap with context) instead of continuing, so that after
openReadWrite() succeeds you always resolve the state file and enforce
SecureStateFilePermissions; reference s.stateFileFn() and
config.SecureStateFilePermissions when making this change.

In `@internal/enrollment/enrollment.go`:
- Around line 62-71: The post-enroll sync call is a no-op when node UUID is
missing because storeConfigInMetadata only persists SAK/JWT/URL but not
nodeUUID, so syncInventoryAfterEnroll returns ErrNotReady and is treated as
success; fix by ensuring nodeUUID is persisted before invoking
runWithContext(syncCtx, func() error { return syncInventoryAfterEnroll(syncCtx)
}) — either extend storeConfigInMetadata to accept and save nodeUUID (and call
it with the node UUID obtained during enrollment), or add an explicit pre-check
before calling syncInventoryAfterEnroll that reads the node UUID from metadata
and if absent logs a clear warning (use log.Logger.Warnw) and skips the sync;
keep references to storeConfigInMetadata, syncInventoryAfterEnroll,
runWithContext and inventory.ErrNotReady in your changes.

---

Nitpick comments:
In `@cmd/fleetint/enroll_test.go`:
- Around line 130-136: The test's deadline assertion in performEnrollWorkflow is
fragile under CI because it requires time.Until(deadline) > 55s; relax this to a
safer lower bound (e.g., 50s) or compute remaining time relative to a captured
start time to avoid flaky failures—update the performEnrollWorkflow closure to
assert require.Greater(t, time.Until(deadline), 50*time.Second) (or change to
compare deadline.Sub(start) against defaultEnrollTimeout) while keeping the
other deadline checks and defaultEnrollTimeout reference intact.

In `@internal/attestation/nonce_test.go`:
- Around line 40-58: Add tests to cover error scenarios for
BackendNonceProvider: extend or modify testNonceClient to simulate an RPC error
(e.g., an err field) and add a TestBackendNonceProvider_ClientError that asserts
GetNonce returns an error when the client returns an error; add another test
TestBackendNonceProvider_NilResponse where testNonceClient returns (nil, nil) or
resp==nil and assert GetNonce returns the "nonce response is nil" error. Use
NewBackendNonceProvider to create the provider and call provider.GetNonce(ctx,
"node-1", "jwt-token") in both tests; assert errors via require.Error and verify
no panic.

In `@internal/backendclient/client.go`:
- Around line 201-227: mapEnrollError currently returns new fmt.Errorf messages
that discard the original HTTPStatusError and its context; update each case in
mapEnrollError to wrap the original error (statusErr) using Go error wrapping
(e.g., fmt.Errorf("user-friendly msg: %w", statusErr)) so the original
HTTPStatusError and response body remain available for logging/inspection while
still returning a friendly message; ensure the default case still returns the
original err.

In `@internal/exporter/collector/collector.go`:
- Around line 50-59: HealthData.MachineInfo is declared but never set in
Collect(), so either remove the field or populate it; to fix, update the
Collect() function to call the machineinfo package to build a
*machineinfo.MachineInfo and assign it to HealthData.MachineInfo (handle and log
any error and leave nil on failure), ensure the machineinfo package is imported,
and keep the downstream nil checks in internal/exporter/converter/csv.go
working; alternatively, if machine info is no longer needed, remove the
MachineInfo field from the HealthData struct and delete related uses.

In `@internal/inventory/manager_test.go`:
- Around line 116-149: The test TestManagerCollectOnceConcurrentExportSingleHash
should simulate real contention by introducing a small artificial delay in the
sink; modify the fakeSink.Export implementation to sleep briefly (e.g.,
time.Sleep(5-20 * time.Millisecond)) before recording the export to ensure
concurrent goroutines hit the exportMu, then run the same concurrent calls to
CollectOnce to verify only one export occurs; update any test imports to include
time if needed.

In `@internal/server/server.go`:
- Around line 466-467: Replace the hardcoded constants attestationRetryInterval
and inventoryRetryInterval with configurable time.Duration values: expose them
as configurable fields (e.g., in the server config struct) and/or read from
flags or environment variables (use default values of 5m and 1m) parsing via
time.ParseDuration; update any references to these constants in functions that
perform attestation retries and inventory retries to use the new configurable
fields so operators can tune retry intervals without changing code.
- Line 451: Both loops create independent agentstate.NewSQLite() instances
pointing at the same file; instead initialize one shared state :=
agentstate.NewSQLite() and pass that same instance into
inventorysink.NewBackendSink(...) and into whatever the attestation loop
constructs (the call in startInventoryLoop and startAttestationLoop). Update
startInventoryLoop and startAttestationLoop to accept or capture the shared
agentstate instance rather than calling agentstate.NewSQLite() twice so both
loops use the same SQLite backend and avoid redundant connections.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 36681a09-b63c-45c2-b01d-9cb184c83709

📥 Commits

Reviewing files that changed from the base of the PR and between 44938f0 and fec6618.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (43)
  • cmd/fleetint/enroll.go
  • cmd/fleetint/enroll_test.go
  • cmd/fleetint/status.go
  • cmd/fleetint/unenroll.go
  • go.mod
  • internal/agentstate/sqlite.go
  • internal/agentstate/sqlite_test.go
  • internal/agentstate/state.go
  • internal/attestation/backend.go
  • internal/attestation/backend_test.go
  • internal/attestation/collector.go
  • internal/attestation/collector_test.go
  • internal/attestation/manager.go
  • internal/attestation/manager_test.go
  • internal/attestation/nonce.go
  • internal/attestation/nonce_test.go
  • internal/attestation/types.go
  • internal/backendclient/client.go
  • internal/backendclient/client_test.go
  • internal/backendclient/types.go
  • internal/config/config.go
  • internal/config/config_test.go
  • internal/config/default.go
  • internal/enrollment/enrollment.go
  • internal/enrollment/enrollment_test.go
  • internal/exporter/collector/collector.go
  • internal/exporter/collector/collector_test.go
  • internal/exporter/exporter.go
  • internal/exporter/exporter_test.go
  • internal/exporter/options.go
  • internal/exporter/options_test.go
  • internal/exporter/writer/http.go
  • internal/inventory/manager.go
  • internal/inventory/manager_run_test.go
  • internal/inventory/manager_test.go
  • internal/inventory/mapper/backend.go
  • internal/inventory/mapper/backend_test.go
  • internal/inventory/sink/backend.go
  • internal/inventory/sink/backend_test.go
  • internal/inventory/source/source_test.go
  • internal/inventory/types.go
  • internal/server/server.go
  • internal/server/server_test.go
💤 Files with no reviewable changes (2)
  • internal/exporter/options_test.go
  • internal/exporter/options.go
✅ Files skipped from review due to trivial changes (4)
  • internal/exporter/writer/http.go
  • internal/server/server_test.go
  • internal/backendclient/types.go
  • internal/attestation/collector_test.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • internal/inventory/mapper/backend.go
  • internal/inventory/sink/backend_test.go
  • internal/agentstate/state.go
  • internal/attestation/collector.go
  • internal/inventory/types.go
  • internal/exporter/exporter.go

Comment thread cmd/fleetint/status.go
Comment thread internal/agentstate/sqlite.go
Comment thread internal/attestation/backend.go
Comment thread internal/attestation/manager.go
Comment thread internal/inventory/manager_run_test.go
Comment thread internal/inventory/manager.go
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Copy link
Copy Markdown
Contributor

@mukilsh mukilsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM given we've tested

Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
@jingxiang-z jingxiang-z merged commit f1e5740 into main Apr 28, 2026
9 checks passed
@jingxiang-z jingxiang-z deleted the feat/agent-backend-client-skeleton branch April 28, 2026 21:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants