chore: remove unused gpu-count flag and improve configuration docs #146

Merged
jingxiang-z merged 8 commits into main from docs/configuration-run-reference
Mar 31, 2026

Conversation

Collaborator

@jingxiang-z jingxiang-z commented Mar 30, 2026

Summary

  • document the configurable environment variables used by fleetint run
  • document the supported fleetint run flags and their defaults
  • explain how to configure these settings on bare metal and in Kubernetes

Testing

  • not run (docs only)

Summary by CodeRabbit

  • Documentation

    • Restructured configuration guide to focus on the fleetint run command with platform-aware bare-metal (systemd) and Kubernetes (Helm) examples; consolidated startup env/flags, added DCGM variables, duration syntax, offline/testing flags, and clarified component-selection semantics; updated verification steps.
  • Chores

    • Added DCGM environment variables and a configurable retentionPeriod (with runtime flag) for the agent.
    • Removed the gpu-count flag from the scan command and related default GPU-count handling in run.


coderabbitai Bot commented Mar 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough


Refocused configuration docs on fleetint run with platform-specific (bare‑metal systemd and Kubernetes Helm) guidance; added DCGM env vars to systemd env; introduced Helm retentionPeriod and wired it into the DaemonSet; removed gpu-count from the scan command and deleted GPU-count defaulting logic from run.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Configuration docs**<br>`docs/configuration.md` | Rewrote to center on fleetint run: added bare‑metal (/etc/default/fleetint + systemd ExecStart) and Kubernetes (Helm env:/values) sections, consolidated env‑var table (including DCGM entries and platform mapping), expanded flags table (--log-file, --retention-period, --gpu-count, --offline-mode, --enable-fault-injection), documented hidden test IB flags, clarified --components semantics, and updated verification commands. |
| **Systemd env additions**<br>`deployments/packages/systemd/fleetint.env` | Added DCGM environment variables: DCGM_URL="localhost" and DCGM_URL_IS_UNIX_SOCKET="false". |
| **Helm chart — values, template, docs**<br>`deployments/helm/fleet-intelligence-agent/values.yaml`, `deployments/helm/fleet-intelligence-agent/templates/daemonset.yaml`, `deployments/helm/fleet-intelligence-agent/README.md` | Added retentionPeriod: 24h to values.yaml, wired --retention-period={{ .Values.retentionPeriod }} into the DaemonSet container args, and documented retentionPeriod in the chart README. |
| **CLI changes**<br>`cmd/fleetint/root.go`, `cmd/fleetint/run.go` | Removed the gpu-count CLI flag from the scan command (root.go) and removed reading/logic that set default expected GPU counts in run.go (deleted SetDefaultExpectedGPUCounts(...) and related log). No replacement logic added. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hopped through docs and daemon dreams,
Tucked DCGM keys in systemd seams,
Helm keeps retention snug and small,
GPU-counts leapt out the hall,
Fleetint runs tidy — I nibble a beam.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Title check | ✅ Passed | The title refers to removing a gpu-count flag and improving docs, both of which are present in the changeset, but does not capture the main documentation improvements for fleetint run configuration. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch docs/configuration-run-reference

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/configuration.md`:
- Around line 201-205: Replace the unsafe full-file dump command "sudo cat
/etc/default/fleetint" in the verification docs with a redacted/key-only check
to avoid exposing secrets; update the snippet so it uses a targeted grep (for
example "sudo grep -E '^(FLEETINT_|HTTP_PROXY|HTTPS_PROXY)'
/etc/default/fleetint") or another pattern that lists configured keys without
printing full raw values, and keep the subsequent commands ("sudo systemctl
status fleetintd" and "sudo journalctl -u fleetintd -f") unchanged.
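
Note that a plain `grep` still prints the full matching lines, values included. A minimal sketch of a value-redacting variant follows. It assumes the env file uses simple `KEY=value` lines; the sample file contents, the `redact_env` helper name, and the `REDACTED` placeholder are hypothetical and not part of this PR.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical sample standing in for /etc/default/fleetint.
env_file="$(mktemp)"
cat > "$env_file" <<'EOF'
# proxy settings
HTTPS_PROXY="http://user:secret@proxy.example:3128"
FLEETINT_FLAGS="--log-level=warn"
DCGM_URL="localhost"
UNRELATED=1
EOF

# Print only keys with the listed prefixes, replacing every value
# with a fixed placeholder so no raw value reaches the terminal.
redact_env() {
  sed -nE 's/^((FLEETINT_|DCGM_|HTTP_PROXY|HTTPS_PROXY)[A-Za-z0-9_]*)=.*/\1=REDACTED/p' "$1"
}

redacted="$(redact_env "$env_file")"
printf '%s\n' "$redacted"
rm -f "$env_file"
```

On a real host the same `sed` expression could be run with `sudo` against `/etc/default/fleetint`, followed by the unchanged `systemctl status` and `journalctl -u` commands.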
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f5520e42-cf47-46be-9ed0-f2df14b22a6f

📥 Commits

Reviewing files that changed from the base of the PR and between 2fa2d5d and 67fac69.

📒 Files selected for processing (1)
  • docs/configuration.md

Comment thread docs/configuration.md

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 67fac697d8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread docs/configuration.md Outdated
Comment thread docs/configuration.md

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
docs/configuration.md (1)

208-208: ⚠️ Potential issue | 🟠 Major

Avoid dumping the full runtime env file in verification commands.

sudo cat /etc/default/fleetint can leak sensitive values (for example proxy credentials). Please switch to a key-only or redacted check in this snippet.
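
For a strictly key-only check, one option is to cut each matching line at the first `=` so that only variable names are shown. This is a sketch under the assumption of simple `KEY=value` lines; the sample contents and the `list_env_keys` helper name are hypothetical.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical sample standing in for /etc/default/fleetint.
env_file="$(mktemp)"
cat > "$env_file" <<'EOF'
HTTP_PROXY="http://user:secret@proxy.example:3128"
DCGM_URL_IS_UNIX_SOCKET="false"
OTHER_SETTING=1
EOF

# List only the names of matching variables; everything from the
# first "=" onward is dropped before anything is printed.
list_env_keys() {
  grep -E '^(FLEETINT_|HTTP_PROXY|HTTPS_PROXY|DCGM_)' "$1" | cut -d= -f1
}

keys="$(list_env_keys "$env_file")"
printf '%s\n' "$keys"
rm -f "$env_file"
```

This shows which variables are set without revealing what they are set to, at the cost of hiding non-sensitive values such as the log level.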

Suggested doc fix
```diff
-sudo cat /etc/default/fleetint
+sudo grep -E '^(FLEETINT_|HTTP_PROXY|HTTPS_PROXY|DCGM_)' /etc/default/fleetint
 sudo systemctl status fleetintd
 sudo journalctl -u fleetintd -f
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/configuration.md at line 208, Replace the risky "sudo cat
/etc/default/fleetint" check in the docs with a key-only or redacted
verification command: instead of dumping the whole file, run a grep that only
prints lines matching the FLEETINT_, HTTP_PROXY, HTTPS_PROXY or DCGM_ prefixes
(or show keys with values redacted) so secrets are not exposed; update the
snippet that currently contains "sudo cat /etc/default/fleetint" to use this
filtered/redacted approach and leave the subsequent "systemctl status" and
"journalctl -u" lines unchanged.


</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @docs/configuration.md:

  • Line 64: Update the heading text "Bare Metal Example" to use the hyphenated
    compound modifier: change the heading string "Bare Metal Example" to "Bare-Metal
    Example" in the docs where the heading appears (look for the heading "Bare Metal
    Example").

Duplicate comments:
In @docs/configuration.md:

  • Line 208: Replace the risky "sudo cat /etc/default/fleetint" check in the docs
    with a key-only or redacted verification command: instead of dumping the whole
    file, run a grep that only prints lines matching the FLEETINT_, HTTP_PROXY,
    HTTPS_PROXY or DCGM_ prefixes (or show keys with values redacted) so secrets are
    not exposed; update the snippet that currently contains "sudo cat
    /etc/default/fleetint" to use this filtered/redacted approach and leave the
    subsequent "systemctl status" and "journalctl -u" lines unchanged.

</details>

<details>
<summary>🪄 Autofix (Beta)</summary>

Fix all unresolved CodeRabbit comments on this PR:

- [ ] <!-- {"checkboxId": "4b0d0e0a-96d7-4f10-b296-3a18ea78f0b9"} --> Push a commit to this branch (recommended)
- [ ] <!-- {"checkboxId": "ff5b1114-7d8c-49e6-8ac1-43f82af23a33"} --> Create a new PR with the fixes

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `3bd3b853-a3af-44f6-9b11-12a3059fcd6f`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 67fac697d8ba93a5ae6769d511d9c406e5128ff8 and 9cee72988c14c3d9ddd33086b6a929684bc3bf3a.

</details>

<details>
<summary>📒 Files selected for processing (2)</summary>

* `deployments/packages/systemd/fleetint.env`
* `docs/configuration.md`

</details>

<details>
<summary>✅ Files skipped from review due to trivial changes (1)</summary>

* deployments/packages/systemd/fleetint.env

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment thread docs/configuration.md

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
docs/configuration.md (2)

64-64: ⚠️ Potential issue | 🟡 Minor

Use hyphenated form "Bare-Metal Example" for grammatical correctness.

The compound adjective "Bare-Metal" should be hyphenated when modifying "Example".

📝 Proposed fix
-### Bare Metal Example
+### Bare-Metal Example
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/configuration.md` at line 64, Update the heading text "Bare Metal
Example" to the hyphenated form "Bare-Metal Example" in the docs file (look for
the heading string "Bare Metal Example" in configuration.md) so the compound
adjective is grammatically correct.

207-211: ⚠️ Potential issue | 🟠 Major

Avoid printing full runtime env files to prevent exposing sensitive values.

Line 208's sudo cat /etc/default/fleetint command may expose sensitive values such as proxy credentials during troubleshooting. Prefer documenting a redacted check that shows configured keys without printing full raw values.

🔒 Proposed fix
```diff
-sudo cat /etc/default/fleetint
+# Review configured keys without printing full raw values
+sudo grep -E '^(FLEETINT_|DCGM_|HTTP_PROXY|HTTPS_PROXY)' /etc/default/fleetint
 sudo systemctl status fleetintd
 sudo journalctl -u fleetintd -f
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/configuration.md around lines 207 - 211, Replace the risky literal
"sudo cat /etc/default/fleetint" line with a redacted-check that only shows
configured keys (match prefixes like FLEETINT_, DCGM_, HTTP_PROXY, HTTPS_PROXY)
and does not print raw secret values; implement this by filtering the file for
those keys and emitting either only the key names or the key names with values
replaced by a fixed placeholder (e.g., "=REDACTED") so troubleshooting still
shows which variables are set without exposing secrets in the docs.


</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🧹 Nitpick comments (2)</summary><blockquote>

<details>
<summary>docs/configuration.md (2)</summary><blockquote>

`115-115`: **Consider simplifying verbose default descriptions for readability.**

The default description "unset by CLI; packaged bare-metal default is `warn` via `FLEETINT_FLAGS`" is quite detailed and makes the table harder to scan quickly. Consider simplifying to just the CLI default and moving deployment-specific details to notes.



<details>
<summary>♻️ Suggested simplification</summary>

```diff
-| `--log-level` | Log level: `debug`, `info`, `warn`, `error`. | unset by CLI; packaged bare-metal default is `warn` via `FLEETINT_FLAGS` | `FLEETINT_FLAGS="--log-level=..."` | `logLevel` |
+| `--log-level` | Log level: `debug`, `info`, `warn`, `error`. | none | `FLEETINT_FLAGS="--log-level=..."` | `logLevel` |
```

Then add a note after the table:

> **Note:** The packaged bare-metal systemd service sets `--log-level=warn` by default via `FLEETINT_FLAGS` in `/etc/default/fleetint`.

Similarly for line 119:

-| `--components` | Comma-separated component selection. Use `all`, `*`, explicit names, and `-name` exclusions. | empty flag value, which means enable all components by default | `FLEETINT_FLAGS="--components=..."` | `components` |
+| `--components` | Comma-separated component selection. Use `all`, `*`, explicit names, and `-name` exclusions. | empty (enables all) | `FLEETINT_FLAGS="--components=..."` | `components` |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/configuration.md` at line 115, Simplify the table cell for the
`--log-level` / `logLevel` entry to show only the CLI default (e.g., "unset by
CLI") and remove the deployment-specific phrase; then add a short note after the
table containing the deployment detail (for example: "Note: The packaged
bare-metal systemd service sets `--log-level=warn` by default via
`FLEETINT_FLAGS` in `/etc/default/fleetint`.") so the deployment-specific
default is preserved but moved out of the main table for readability.

42-42: Consider simplifying the Default column for better table readability.

The platform-specific defaults for DCGM_URL make this row's Default column notably longer than others. For consistency, consider moving the platform distinctions to the Description column or adding a note below the table.

♻️ Suggested refactor

Option 1: Move platform details to Description

-| `DCGM_URL` | DCGM HostEngine address used by the agent for DCGM-backed components. | bare metal: `localhost`, Helm chart: `nvidia-dcgm.gpu-operator.svc:5555` | `/etc/default/fleetint` | `env.DCGM_URL` |
+| `DCGM_URL` | DCGM HostEngine address used by the agent for DCGM-backed components. Bare metal default: `localhost`. Helm chart default: `nvidia-dcgm.gpu-operator.svc:5555`. | platform-dependent | `/etc/default/fleetint` | `env.DCGM_URL` |

Option 2: Add a note below the table

-| `DCGM_URL` | DCGM HostEngine address used by the agent for DCGM-backed components. | bare metal: `localhost`, Helm chart: `nvidia-dcgm.gpu-operator.svc:5555` | `/etc/default/fleetint` | `env.DCGM_URL` |
+| `DCGM_URL` | DCGM HostEngine address used by the agent for DCGM-backed components.* | see note* | `/etc/default/fleetint` | `env.DCGM_URL` |

Then add after line 57:

\* `DCGM_URL` defaults to `localhost` on bare metal and `nvidia-dcgm.gpu-operator.svc:5555` in the Helm chart.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/configuration.md` at line 42, The Default column for DCGM_URL is too
long; shorten it by moving the platform-specific defaults into the Description
column or by keeping the Default as a single canonical value and adding a short
note under the table. Update the table row for DCGM_URL so the Default cell is
concise (e.g., `localhost` or `platform-specific`) and append a footnote or
sentence below the table explaining that DCGM_URL defaults to localhost on bare
metal and nvidia-dcgm.gpu-operator.svc:5555 in the Helm chart; ensure you update
any inline reference `DCGM_URL` text in the Description to reflect the moved
platform details.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/configuration.md`:
- Around line 121-125: The bare-metal example strings for flags
(`--offline-mode`, `--path`, `--duration`, `--format`,
`--enable-fault-injection`) have a space before the trailing ellipsis; remove
that space and place the ellipsis immediately after the closing quote (e.g.,
change FLEETINT_FLAGS="--offline-mode ..." to
FLEETINT_FLAGS="--offline-mode"...), updating each table cell in the rows for
those flags so the ellipsis directly follows the closing quote for consistency.

---

Duplicate comments:
In `@docs/configuration.md`:
- Line 64: Update the heading text "Bare Metal Example" to the hyphenated form
"Bare-Metal Example" in the docs file (look for the heading string "Bare Metal
Example" in configuration.md) so the compound adjective is grammatically
correct.
- Around line 207-211: Replace the risky literal "sudo cat
/etc/default/fleetint" line with a redacted-check that only shows configured
keys (match prefixes like FLEETINT_, DCGM_, HTTP_PROXY, HTTPS_PROXY) and does
not print raw secret values; implement this by filtering the file for those keys
and emitting either only the key names or the key names with values replaced by
a fixed placeholder (e.g., "=REDACTED") so troubleshooting still shows which
variables are set without exposing secrets in the docs.

---

Nitpick comments:
In `@docs/configuration.md`:
- Line 115: Simplify the table cell for the `--log-level` / `logLevel` entry to
show only the CLI default (e.g., "unset by CLI") and remove the
deployment-specific phrase; then add a short note after the table containing the
deployment detail (for example: "Note: The packaged bare-metal systemd service
sets `--log-level=warn` by default via `FLEETINT_FLAGS` in
`/etc/default/fleetint`.") so the deployment-specific default is preserved but
moved out of the main table for readability.
- Line 42: The Default column for DCGM_URL is too long; shorten it by moving the
platform-specific defaults into the Description column or by keeping the Default
as a single canonical value and adding a short note under the table. Update the
table row for DCGM_URL so the Default cell is concise (e.g., `localhost` or
`platform-specific`) and append a footnote or sentence below the table
explaining that DCGM_URL defaults to localhost on bare metal and
nvidia-dcgm.gpu-operator.svc:5555 in the Helm chart; ensure you update any
inline reference `DCGM_URL` text in the Description to reflect the moved
platform details.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dc42143d-b0c2-43d1-bead-887d80ae99a8

📥 Commits

Reviewing files that changed from the base of the PR and between 9cee729 and 00d1335.

📒 Files selected for processing (1)
  • docs/configuration.md

Comment thread docs/configuration.md

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (3)
docs/configuration.md (3)

64-64: ⚠️ Potential issue | 🟡 Minor

Use hyphenated compound modifier in the heading.

Line 64 should be ### Bare-Metal Example for consistency and grammar.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/configuration.md` at line 64, Update the markdown heading "### Bare
Metal Example" to use a hyphenated compound modifier by changing it to "###
Bare-Metal Example"; locate the heading text "### Bare Metal Example" in the
docs/configuration.md content and replace it with "### Bare-Metal Example" to
ensure consistent grammar and style.

200-203: ⚠️ Potential issue | 🟠 Major

Avoid full env-file dumps in troubleshooting steps.

Line 201 still uses sudo cat /etc/default/fleetint, which can leak sensitive values (for example proxy credentials). Prefer a key-focused/redacted check instead.

Suggested doc patch
```diff
-sudo cat /etc/default/fleetint
+# Review configured keys without printing full raw values
+sudo grep -E '^(FLEETINT_|HTTP_PROXY|HTTPS_PROXY)' /etc/default/fleetint
 sudo systemctl status fleetintd
 sudo journalctl -u fleetintd -f
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/configuration.md around lines 200 - 203, Replace the full env-file dump
in the troubleshooting snippet: in docs/configuration.md change the command
"sudo cat /etc/default/fleetint" to a key-focused/redacted check such as "sudo
grep -E '^(FLEETINT_|HTTP_PROXY|HTTPS_PROXY)' /etc/default/fleetint" so the doc
shows only relevant keys (avoid printing full sensitive values) and keep the
surrounding code-block formatting intact.


</details>

---

`122-125`: _⚠️ Potential issue_ | _🟡 Minor_

**Normalize ellipsis spacing in bare-metal flag examples.**

Lines 122-125 still include a space before `...` inside the quoted `FLEETINT_FLAGS` examples; use a consistent format without that extra space.

<details>
<summary>Suggested doc patch</summary>

```diff
-| `--offline-mode` | Disable the HTTP API server and write telemetry to files instead. | `false` | `FLEETINT_FLAGS="--offline-mode ..."` | not exposed by chart by default |
-| `--path` | Output directory for offline mode. Required with `--offline-mode`. | empty | `FLEETINT_FLAGS="--path=/path ..."` | not exposed by chart by default |
-| `--duration` | Offline-mode collection duration in `HH:MM:SS` format. Required with `--offline-mode`. | empty | `FLEETINT_FLAGS="--duration=00:05:00 ..."` | not exposed by chart by default |
-| `--format` | Offline-mode output format: `json` or `csv`. | `json` | `FLEETINT_FLAGS="--format=csv ..."` | not exposed by chart by default |
+| `--offline-mode` | Disable the HTTP API server and write telemetry to files instead. | `false` | `FLEETINT_FLAGS="--offline-mode..."` | not exposed by chart by default |
+| `--path` | Output directory for offline mode. Required with `--offline-mode`. | empty | `FLEETINT_FLAGS="--path=/path..."` | not exposed by chart by default |
+| `--duration` | Offline-mode collection duration in `HH:MM:SS` format. Required with `--offline-mode`. | empty | `FLEETINT_FLAGS="--duration=00:05:00..."` | not exposed by chart by default |
+| `--format` | Offline-mode output format: `json` or `csv`. | `json` | `FLEETINT_FLAGS="--format=csv..."` | not exposed by chart by default |
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/configuration.md` around lines 122 - 125, Doc examples for the
bare-metal FLEETINT_FLAGS contain inconsistent spacing before the ellipsis;
update the quoted examples for the flags (--offline-mode, --path, --duration,
--format) so the ellipsis is attached directly (e.g.
"FLEETINT_FLAGS=\"--offline-mode...\"", "FLEETINT_FLAGS=\"--path=/path...\"",
etc.) by removing the space before ... in each table cell to normalize
formatting across the four rows.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@docs/configuration.md`:
- Line 64: Update the markdown heading "### Bare Metal Example" to use a
hyphenated compound modifier by changing it to "### Bare-Metal Example"; locate
the heading text "### Bare Metal Example" in the docs/configuration.md content
and replace it with "### Bare-Metal Example" to ensure consistent grammar and
style.
- Around line 200-203: Replace the full env-file dump in the troubleshooting
snippet: in docs/configuration.md change the command "sudo cat
/etc/default/fleetint" to a key-focused/redacted check such as "sudo grep -E
'^(FLEETINT_|HTTP_PROXY|HTTPS_PROXY)' /etc/default/fleetint" so the doc shows
only relevant keys (avoid printing full sensitive values) and keep the
surrounding code-block formatting intact.
- Around line 122-125: Doc examples for the bare-metal FLEETINT_FLAGS contain
inconsistent spacing before the ellipsis; update the quoted examples for the
flags (--offline-mode, --path, --duration, --format) so the ellipsis is attached
directly (e.g. "FLEETINT_FLAGS=\"--offline-mode...\"",
"FLEETINT_FLAGS=\"--path=/path...\"", etc.) by removing the space before ... in
each table cell to normalize formatting across the four rows.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c5d279d9-d225-4541-bdc1-debe1cde52f9

📥 Commits

Reviewing files that changed from the base of the PR and between 00d1335 and b8e3a8b.

📒 Files selected for processing (4)
  • deployments/helm/fleet-intelligence-agent/README.md
  • deployments/helm/fleet-intelligence-agent/templates/daemonset.yaml
  • deployments/helm/fleet-intelligence-agent/values.yaml
  • docs/configuration.md
✅ Files skipped from review due to trivial changes (3)
  • deployments/helm/fleet-intelligence-agent/README.md
  • deployments/helm/fleet-intelligence-agent/templates/daemonset.yaml
  • deployments/helm/fleet-intelligence-agent/values.yaml

@jingxiang-z jingxiang-z requested a review from ebalduf March 30, 2026 20:40
@jingxiang-z jingxiang-z self-assigned this Mar 30, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (3)
docs/configuration.md (3)

64-64: ⚠️ Potential issue | 🟡 Minor

Use hyphenated heading: “Bare-Metal Example”.

This repeats prior feedback and remains unresolved on Line 64.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/configuration.md` at line 64, Heading text "Bare Metal Example" should
be changed to the hyphenated form "Bare-Metal Example"; locate the heading in
docs/configuration.md (the section currently titled "Bare Metal Example") and
update it to "Bare-Metal Example" so it matches the requested style and resolves
the duplicate feedback.

190-193: ⚠️ Potential issue | 🟠 Major

Avoid printing full /etc/default/fleetint in verification steps.

Line 191 (sudo cat /etc/default/fleetint) can expose sensitive values (proxy creds/tokens) in terminal history or shared logs. Use a key-only/redacted check instead. This repeats prior unresolved feedback.

Suggested doc fix
```diff
-sudo cat /etc/default/fleetint
+sudo grep -E '^(FLEETINT_|HTTP_PROXY|HTTPS_PROXY|DCGM_URL|DCGM_URL_IS_UNIX_SOCKET)' /etc/default/fleetint
 sudo systemctl status fleetintd
 sudo journalctl -u fleetintd -f
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/configuration.md around lines 190 - 193, Replace the unsafe full-file
print command that uses "sudo cat /etc/default/fleetint" with a
redaction/key-only check that exposes only variable names; update the doc steps
where the command appears to instead run a grep-like check for keys such as
FLEETINT_, HTTP_PROXY, HTTPS_PROXY, DCGM_URL, DCGM_URL_IS_UNIX_SOCKET (i.e.,
remove the "cat" invocation and use a pattern match to list only matching keys),
leaving the remaining verification commands (sudo systemctl status fleetintd and
sudo journalctl -u fleetintd -f) unchanged.


</details>

---

`121-124`: _⚠️ Potential issue_ | _🟡 Minor_

**Normalize ellipsis spacing in bare-metal flag examples.**

The examples still include a space before `...` (e.g., `--offline-mode ...`). Keep one consistent style without the extra space. This is the same unresolved formatting issue previously raised.

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/configuration.md around lines 121 - 124, The flag example lines for
--offline-mode, --path, --duration, and --format contain a space before
the ellipsis in the FLEETINT_FLAGS examples (e.g., --offline-mode ...); update
each example to remove the extra space so ellipses are --offline-mode...,
--path=/path..., --duration=00:05:00..., --format=csv... for consistent
formatting and apply the same change across all occurrences in this table.


</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @docs/configuration.md:

  • Around line 114-126: Docs table is missing the --gpu-count flag which is still
    registered by the CLI in cmd/fleetint/root.go; update the flags table in
    docs/configuration.md to add a row for --gpu-count (describe it, default, and
    how to set it in bare-metal and k8s) or explicitly mark it as
    undocumented/deprecated if you intend to remove it, ensuring the table entry
    references the same behavior implemented by the flag registration in
    cmd/fleetint/root.go.
  • Around line 38-43: Update the wording in the configuration docs for DCGM_URL
    and DCGM_URL_IS_UNIX_SOCKET to avoid implying they are parsed directly by
    cmd/fleetint/run.go; change the phrase "read by fleetint run at startup" to
    something like "used by components started by fleetint run" (or similar).
    Mention the specific env vars DCGM_URL and DCGM_URL_IS_UNIX_SOCKET and note
    that their read path lives in the nvidia-query/dcgm instance code (instance.go)
    so the docs reflect ownership by the DCGM-backed components rather than direct
    parsing in the run command.

Duplicate comments:
In @docs/configuration.md:

  • Line 64: Heading text "Bare Metal Example" should be changed to the hyphenated
    form "Bare-Metal Example"; locate the heading in docs/configuration.md (the
    section currently titled "Bare Metal Example") and update it to "Bare-Metal
    Example" so it matches the requested style and resolves the duplicate feedback.
  • Around line 190-193: Replace the unsafe full-file print command that uses
    "sudo cat /etc/default/fleetint" with a redaction/key-only check that exposes
    only variable names; update the doc steps where the command appears to instead
    run a grep-like check for keys such as FLEETINT_, HTTP_PROXY, HTTPS_PROXY,
    DCGM_URL, DCGM_URL_IS_UNIX_SOCKET (i.e., remove the "cat" invocation and use a
    pattern match to list only matching keys), leaving the remaining verification
    commands (sudo systemctl status fleetintd and sudo journalctl -u fleetintd -f)
    unchanged.
  • Around line 121-124: The flag example lines for --offline-mode, --path,
    --duration, and --format contain a space before the ellipsis in the
    FLEETINT_FLAGS examples (e.g., --offline-mode ...); update each example to
    remove the extra space so ellipses are --offline-mode..., --path=/path...,
    --duration=00:05:00..., --format=csv... for consistent formatting and apply
    the same change across all occurrences in this table.

</details>

<details>
<summary>🪄 Autofix (Beta)</summary>

Fix all unresolved CodeRabbit comments on this PR:

- [ ] <!-- {"checkboxId": "4b0d0e0a-96d7-4f10-b296-3a18ea78f0b9"} --> Push a commit to this branch (recommended)
- [ ] <!-- {"checkboxId": "ff5b1114-7d8c-49e6-8ac1-43f82af23a33"} --> Create a new PR with the fixes

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `b28c6cac-9031-49d2-9f6f-fa3d1f8cfd28`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between b8e3a8b78d9752164acd7ced5cd88feeaca3e0d3 and 051320b7d64a93c67f73da6c177196aa04db2259.

</details>

<details>
<summary>📒 Files selected for processing (3)</summary>

* `cmd/fleetint/root.go`
* `cmd/fleetint/run.go`
* `docs/configuration.md`

</details>

<details>
<summary>💤 Files with no reviewable changes (2)</summary>

* cmd/fleetint/run.go
* cmd/fleetint/root.go

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment thread docs/configuration.md
Comment thread docs/configuration.md
Collaborator

@ebalduf ebalduf left a comment


Looks good to me.

@jingxiang-z jingxiang-z changed the title from "docs: document fleetint run configuration" to "chore: remove unused gpu-count flag and improve configuration docs" Mar 30, 2026
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
@jingxiang-z jingxiang-z force-pushed the docs/configuration-run-reference branch from c276b1a to 3446477 Compare March 31, 2026 16:53
@jingxiang-z jingxiang-z merged commit 4b2faa4 into main Mar 31, 2026
9 checks passed
@jingxiang-z jingxiang-z deleted the docs/configuration-run-reference branch March 31, 2026 17:03
jingxiang-z added a commit that referenced this pull request Apr 14, 2026
)

Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
jingxiang-z added a commit that referenced this pull request Apr 14, 2026
)

Signed-off-by: Jingxiang Zhang <jingzhang@nvidia.com>
