Skip to content

Fix zone 1.3#212

Closed
jingxiang-z wants to merge 4 commits into
mainfrom
fix-zone-1.3
Closed

Fix zone 1.3#212
jingxiang-z wants to merge 4 commits into
mainfrom
fix-zone-1.3

Conversation

@jingxiang-z
Copy link
Copy Markdown
Collaborator

@jingxiang-z jingxiang-z commented May 28, 2026

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Summary by CodeRabbit

Release Notes

  • Documentation

    • Updated system requirements: RAM increased to <500MB.
    • Expanded GPU support documentation to include newer NVIDIA architectures (Ampere, Ada Lovelace, Hopper, Blackwell, Rubin).
    • Clarified that documentation links are relative to the currently viewed branch/tag.
  • New Features

    • Added validation for enrollment metadata fields with character restrictions and length limits.
  • Removed Features

    • Removed NVIDIA GPU per-process monitoring component.

Review Change Stack

jingxiang-z and others added 4 commits May 26, 2026 14:20
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Amber Xue <ambermingxin@nvidia.com>
Signed-off-by: Amber Xue <ambermingxin@nvidia.com>
Signed-off-by: Amber Xue <ambermingxin@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Co-authored-by: Jingxiang Zhang <jingzhang@nvidia.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ad258737-5477-4281-99a2-4d4c9f47a4cd

📥 Commits

Reviewing files that changed from the base of the PR and between fd3575b and 3b01032.

📒 Files selected for processing (21)
  • README.md
  • cmd/fleetint/enroll.go
  • cmd/fleetint/enroll_test.go
  • deployments/helm/fleet-intelligence-agent/values.yaml
  • docs/configuration.md
  • docs/usage.md
  • internal/agentstate/sqlite_test.go
  • internal/agentstate/state.go
  • internal/enrollment/enrollment.go
  • internal/exporter/exporter.go
  • internal/inventory/sink/backend.go
  • internal/registry/registry.go
  • internal/registry/registry_test.go
  • third_party/fleet-intelligence-sdk/components/accelerator/nvidia/processes/component.go
  • third_party/fleet-intelligence-sdk/components/accelerator/nvidia/processes/component_test.go
  • third_party/fleet-intelligence-sdk/components/accelerator/nvidia/processes/metrics.go
  • third_party/fleet-intelligence-sdk/components/accelerator/nvidia/processes/process_env.go
  • third_party/fleet-intelligence-sdk/components/accelerator/nvidia/processes/processes.go
  • third_party/fleet-intelligence-sdk/components/accelerator/nvidia/processes/processes_test.go
  • third_party/fleet-intelligence-sdk/pkg/metadata/metadata.go
  • third_party/fleet-intelligence-sdk/pkg/metadata/metadata_test.go

📝 Walkthrough

Walkthrough

This PR enforces stricter validation for enrollment metadata flags, normalizes metadata storage key naming, removes the NVIDIA GPU processes component, and updates documentation to reflect expanded GPU support and adjusted resource thresholds.

Changes

Fleet Intelligence Agent Multi-Feature Update

Layer / File(s) Summary
Enrollment metadata validation and parsing
cmd/fleetint/enroll.go, cmd/fleetint/enroll_test.go
Adds reserved sentinel "Unassigned" and regex-based format validation for --node-group and --compute-zone flags via new validatedOptionalMetadataFlagValue helper. Enforces max 255 chars, alphanumeric start, allowed special chars, and rejects reserved names. Comprehensive unit and integration tests verify explicit empty strings, reserved name rejection, invalid characters, length limits, and independent field updates.
Metadata key constant and usage updates
internal/agentstate/state.go, internal/enrollment/enrollment.go, internal/exporter/exporter.go, internal/inventory/sink/backend.go, internal/agentstate/sqlite_test.go
MetadataKeyNodeGroup constant updated from "nodegroup" to "node_group". All related log messages, error strings, and test assertions updated to use underscore notation. SQLite state round-trip test adds verification that persisted metadata uses updated key.
Metadata storage optimization with SQL upsert
third_party/fleet-intelligence-sdk/pkg/metadata/metadata.go, metadata_test.go
SetMetadata rewritten to use single SQL INSERT ... ON CONFLICT(key) DO UPDATE upsert instead of read-before-write. Metric timing preserved around database execution. New test verifies empty-to-non-empty-to-non-empty transitions preserve final value.
Remove accelerator-nvidia-processes component
internal/registry/registry.go, internal/registry/registry_test.go, docs/configuration.md
Registry stops importing and registering componentsprocesses component (removed from All() list). Registry test updated to assert removed component absent. Component list documentation updated to remove references.
Documentation and configuration updates
README.md, docs/usage.md, deployments/helm/fleet-intelligence-agent/values.yaml
README expanded GPU support list (Ampere, Ada Lovelace added) and updated RAM threshold from <100MB to <500MB. Usage docs reordered GPU architectures and updated RAM threshold. Helm values updated enrollment metadata field documentation with guidance on clearing via empty strings.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • NVIDIA/fleet-intelligence-agent#206: Both PRs modify cmd/fleetint/enroll.go and cmd/fleetint/enroll_test.go to add/extend strict validation for --node-group/--compute-zone including the reserved "Unassigned" name, max length, and regex-based normalization.
  • NVIDIA/fleet-intelligence-agent#207: Both PRs update README.md/docs to change the RAM threshold to <500MB, expand/reorder supported GPU architectures (including Ampere and Ada Lovelace), and adjust documentation linking guidance.
  • NVIDIA/fleet-intelligence-agent#209: Both PRs directly remove/disable the accelerator-nvidia-processes component from the registry and delete its third_party/fleet-intelligence-sdk/components/accelerator/nvidia/processes/* implementation with corresponding docs/tests updates.

Suggested reviewers

  • ebalduf

Poem

🐰 Hops with glee at validation strict,
Metadata keys now underscore-prefixed,
Processes gone, docs shining bright,
GPU gardens now properly typed!

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-zone-1.3

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3b010324cd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

MetadataKeySAKToken = "sak_token"
MetadataKeyEnrolledAt = "enrolled_at"
MetadataKeyNodeGroup = "nodegroup"
MetadataKeyNodeGroup = "node_group"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve existing nodegroup metadata on upgrade

When upgrading an already enrolled node that has nodegroup stored by the previous release, this constant now makes every reader/writer use only node_group and there is no fallback or migration for the old key. As a result, inventory exports and telemetry resource attributes silently drop the node group until the node is re-enrolled with --node-group; consider reading/migrating the legacy nodegroup value before treating the field as absent.

Useful? React with 👍 / 👎.

@jingxiang-z jingxiang-z deleted the fix-zone-1.3 branch May 28, 2026 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants