
docs: update spark-install guide with real-world DGX Spark findings #304

Open
cluster2600 wants to merge 4 commits into NVIDIA:main from cluster2600:docs/spark-install-cleanup

Conversation


@cluster2600 cluster2600 commented Mar 18, 2026

Summary

  • Remove WIP notice — the guide is complete and tested on a real DGX Spark
  • Add one-command install via nvidia.com/nemoclaw.sh
  • Clarify hardware details (aarch64, Grace CPU + GB10 GPU, 128 GB unified memory)
  • Add known issues discovered during real deployments:
    • pip system packages conflict
    • Port 3000 conflict with AI Workbench
    • Network policy blocking NVIDIA cloud inference API
  • Add section on running local LLMs with llama.cpp (sm_121 build, GGUF models)
  • Clarify NIM arm64 compatibility: some NIM containers (e.g., Nemotron-3-Super-120B) have native arm64 images, but many are amd64-only
  • Update architecture diagram
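The arm64 compatibility check mentioned above can be done without pulling the image; a hedged sketch (the image reference below is a placeholder, not a real NIM tag):

```shell
# List the architectures an image manifest advertises before pulling.
# The image name is illustrative -- substitute the NIM image you plan to run.
docker manifest inspect nvcr.io/nim/example/example-model:latest \
  | grep -o '"architecture": *"[^"]*"' | sort -u
# If only "amd64" is listed, the container will fail on the Spark's aarch64
# host with `exec format error`.
```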

Context

Based on hands-on experience setting up NemoClaw on a DGX Spark (GB10, Ubuntu 24.04, Driver 580.126.09, CUDA 13.0). Tested with Nemotron-3-Super-120B (arm64 NIM), Nemotron-70B-Instruct (amd64-only NIM, failed with exec format error), Mistral Small 4 (GGUF via llama.cpp), and Nemotron-3-Nano-30B (GGUF via llama.cpp).

Test plan

  • All commands verified on actual DGX Spark hardware
  • Known issues confirmed and workarounds tested
  • Verified arm64 NIM works for Nemotron-3-Super-120B-A12B
  • Confirmed amd64-only NIMs fail with exec format error

Summary by CodeRabbit

  • New Features

    • Reviewed one-command Spark install flow and explicit Spark quick-start steps
    • Local LLM support guide for building/running llama.cpp-style models and optional local model server
  • Documentation

    • Platform specs and compatibility (Ubuntu 24.04, aarch64/GB10, Docker 28–29) and reduced k8s emphasis
    • Web Dashboard usage and connectivity notes (127.0.0.1 constraint, token location)
    • Expanded Known Issues: Python/package install quirks, AI Workbench port conflict, sandbox egress and image architecture caveats


coderabbitai bot commented Mar 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Replaced WIP installer with a reviewed download-and-run installer flow; added nemoclaw setup-spark onboarding and Spark quick-start; documented Ubuntu 24.04 aarch64 (Grace GB10) and Docker 28/29; added Known Issues, local llama.cpp/llama-server guidance, and revised DGX Spark architecture notes.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Spark Installation & Configuration Docs**<br>`spark-install.md` | Replaced OpenShell curl |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant Installer as "Downloaded installer\n(nemoclaw.sh)"
    participant CLI as "nemoclaw CLI"
    participant Gateway as "Gateway Web UI"
    participant Docker as "Docker"
    participant Spark as "Spark cluster"
    participant Sandbox as "Sandbox / Workbench"
    participant Llama as "local llama-server / llama.cpp"

    User->>Installer: inspect (less) and run (`sudo bash`)
    Installer->>CLI: install binaries & add commands
    User->>CLI: `nemoclaw setup-spark`
    CLI->>Docker: pull/start containers (gateway, sandbox, spark services)
    Docker->>Spark: launch Spark services
    CLI->>Gateway: surface Web Dashboard (token at ~/.openclaw)
    alt user provides local LLM
        User->>Llama: build/run llama.cpp (sm_121) or llama-server
        Llama->>Sandbox: register local model endpoint (openclaw.json)
        Sandbox->>Gateway: expose model via gateway
    end
    Sandbox->>Spark: route inference requests via gateway
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I nibble docs beneath Noble skies,
I hide the script, then give it wise tries.
Spark sparks, Llamas hum on GB10 ground,
Tokens, ports, and dashboards all around.
Hop, inspect, run — the cluster’s unbound!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: updating documentation for a Spark installation guide based on real-world DGX Spark testing and deployment findings. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
spark-install.md (2)

108-108: Prioritize venv over --break-system-packages in the workaround.

While --break-system-packages is mentioned as a workaround, it can break system tools that rely on pip-managed packages. The venv approach is safer and should be emphasized as the preferred solution.
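A minimal sketch of the venv-first flow the comment recommends (the paths and the package name are illustrative):

```shell
# Preferred: keep pip installs isolated from the system site-packages.
python3 -m venv "$HOME/.venvs/sandbox"
. "$HOME/.venvs/sandbox/bin/activate"
python -m pip install requests   # example package; lands in the venv only

# Last resort (can break apt-managed tools that rely on pip-managed packages):
# python -m pip install --break-system-packages requests
```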

📝 Suggested clarification
```diff
-| `pip install` fails with system packages | Known | Use `--break-system-packages` or a venv for Python packages inside the sandbox |
+| `pip install` fails with system packages | Known | Use a venv (recommended) or `--break-system-packages` (not recommended) for Python packages inside the sandbox |
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spark-install.md` at line 108, Update the workaround guidance for the table
row with "`pip install` fails with system packages" to prioritize the venv
approach over mentioning `--break-system-packages`; change wording so the
preferred solution is to create and use a Python virtual environment (venv) for
sandbox installs, and present `--break-system-packages` only as a last-resort
option with a warning that it can break system tools that depend on pip-managed
packages.

6-8: The installation script URL is valid and from an official source; consider the curl | bash security pattern as an advisory.

The URL correctly resolves to the official NVIDIA/NemoClaw GitHub repository install script. While the curl | bash pattern with sudo is generally worth being cautious about, it is the standard approach for this installation method. If you wish to emphasize security best practices, consider adding a note suggesting users review the script first, but this is an optional improvement rather than a required fix.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spark-install.md` around lines 6 - 8, The README currently shows a one-line
"curl | sudo bash" install command which can be risky; update the installation
section to advise users to review the fetched script before executing it and
provide an alternative safer instruction: suggest downloading the script first
(e.g., via curl -O or curl to a file), inspecting its contents, then running it
with sudo if acceptable, and optionally include a note about verifying the
source (official NVIDIA/NemoClaw repo) and checksum/signature verification;
reference the existing one-command install line containing the curl -fsSL
https://nvidia.com/nemoclaw.sh | sudo bash so reviewers can replace it with the
safer two-step guidance.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@spark-install.md`:
- Line 144: Update the documentation to include the exact JSON fields and
example to set the local provider URL in ~/.openclaw/openclaw.json: instruct
users to add a top-level "provider" object with "url" (the base local endpoint,
e.g. http://host.docker.internal:8000/v1) and "type" (e.g. "openai-compatible")
and show the full JSON structure so it's clear which keys to modify.
- Line 150: Add a language identifier to the fenced code block that contains the
ASCII line "DGX Spark (Ubuntu 24.04, aarch64, cgroup v2, 128 GB unified
memory)"; update the opening fence from ``` to ```text so the block is declared
as plaintext (or `text`/`plaintext`) for proper rendering and accessibility.

---

Nitpick comments:
In `@spark-install.md`:
- Line 108: Update the workaround guidance for the table row with "`pip install`
fails with system packages" to prioritize the venv approach over mentioning
`--break-system-packages`; change wording so the preferred solution is to create
and use a Python virtual environment (venv) for sandbox installs, and present
`--break-system-packages` only as a last-resort option with a warning that it
can break system tools that depend on pip-managed packages.
- Around line 6-8: The README currently shows a one-line "curl | sudo bash"
install command which can be risky; update the installation section to advise
users to review the fetched script before executing it and provide an
alternative safer instruction: suggest downloading the script first (e.g., via
curl -O or curl to a file), inspecting its contents, then running it with sudo
if acceptable, and optionally include a note about verifying the source
(official NVIDIA/NemoClaw repo) and checksum/signature verification; reference
the existing one-command install line containing the curl -fsSL
https://nvidia.com/nemoclaw.sh | sudo bash so reviewers can replace it with the
safer two-step guidance.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3f6dd974-7f58-4591-907c-209f384cd4fe

📥 Commits

Reviewing files that changed from the base of the PR and between 1e23347 and 885eaf8.

📒 Files selected for processing (1)
  • spark-install.md


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
spark-install.md (2)

144-144: ⚠️ Potential issue | 🟡 Minor

Add explicit openclaw.json fields and example JSON.

This is still underspecified for users; please document exact keys/shape for the local provider URL config.

📝 Suggested doc snippet
````diff
-Then configure your sandbox to use the local model by updating `~/.openclaw/openclaw.json` inside the sandbox with the local provider URL.
+Then configure your sandbox to use the local model by updating `~/.openclaw/openclaw.json` inside the sandbox:
+
+```json
+{
+  "provider": {
+    "url": "http://host.docker.internal:8000/v1",
+    "type": "openai-compatible"
+  }
+}
+```
````
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spark-install.md` at line 144, The docs mention updating
~/.openclaw/openclaw.json but don't specify the expected JSON shape; update the
text around the sentence to include the exact keys and an example showing the
provider object with "provider.url" (the local provider base URL) and
"provider.type" (e.g., "openai-compatible"), and show a full sample JSON payload
so users can copy/paste into openclaw.json; reference the keys "provider.url"
and "provider.type" and ensure the example uses the same host format mentioned
(e.g., host.docker.internal) to match the earlier sentence.

150-156: ⚠️ Potential issue | 🟡 Minor

Specify a language for the fenced architecture block.

The code fence is still missing a language identifier (MD040).

✅ Minimal fix

````diff
-```
+```text
 DGX Spark (Ubuntu 24.04, aarch64, cgroup v2, 128 GB unified memory)
   └── Docker (28.x/29.x, cgroupns=host)
   │    └── OpenShell gateway container (k3s embedded)
   │         └── nemoclaw sandbox pod
   │              └── OpenClaw agent + NemoClaw plugin
   └── llama-server (optional, local inference on GB10 GPU)
````

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `spark-install.md` around lines 150 - 156, the fenced code block containing the architecture tree (starting with "DGX Spark (Ubuntu 24.04, aarch64, cgroup v2, 128 GB unified memory)" and the tree lines) lacks a language identifier (MD040); fix it by changing the opening fence to include a language token such as `text` (or `console`) and leave the block content unchanged to satisfy the linter.

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @spark-install.md:

  • Around line 6-10: Replace the unsafe implicit pipe-to-shell example ("curl
    -fsSL https://nvidia.com/nemoclaw.sh | sudo bash") with a safer default flow:
    show how to download the installer to a file (referencing the current curl URL),
    verify its integrity via checksum or GPG signature (include where to fetch the
    expected checksum/signature and a one-line verification step), then execute the
    verified file with sudo; keep the original pipe-to-shell command only as an
    explicitly labeled "quick/dev-only" option. Also update the git clone step ("git
    clone https://github.com/NVIDIA/NemoClaw.git") to mention verifying release
    artifacts (checksums/signatures or signed commits/tags) after cloning and before
    running install scripts. Ensure the docs include links/placeholders for
    checksum/signature locations and a short note about supply-chain risk.

Duplicate comments:
In @spark-install.md:

  • Line 144: The docs mention updating ~/.openclaw/openclaw.json but don't
    specify the expected JSON shape; update the text around the sentence to include
    the exact keys and an example showing the provider object with "provider.url"
    (the local provider base URL) and "provider.type" (e.g., "openai-compatible"),
    and show a full sample JSON payload so users can copy/paste into openclaw.json;
    reference the keys "provider.url" and "provider.type" and ensure the example
    uses the same host format mentioned (e.g., host.docker.internal) to match the
    earlier sentence.
  • Around line 150-156: The fenced code block containing the architecture tree
    (starting with "DGX Spark (Ubuntu 24.04, aarch64, cgroup v2, 128 GB unified
    memory)" and the tree lines) lacks a language identifier (MD040); fix it by
    changing the opening fence to include a language token such as `text` (or
    `console`), and leave the block content unchanged to satisfy the linter.

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `7fffb78c-e794-472b-a333-79167406af31`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 885eaf881da3bcf8180fa8cfd0a9178a49aa465a and d64dde5399e546968bb4b16bcaaa3ecae0a537db.

</details>

<details>
<summary>📒 Files selected for processing (1)</summary>

* `spark-install.md`

</details>

</details>


spark-install.md Outdated
Comment on lines 6 to 10
```shell
# One-command install
curl -fsSL https://nvidia.com/nemoclaw.sh | sudo bash

# Or clone and install manually
git clone https://github.com/NVIDIA/NemoClaw.git
```


⚠️ Potential issue | 🟠 Major

Avoid piping remote script directly into sudo bash.

This install path is high-risk for supply-chain compromise. Please add a safer default (download + checksum/signature verification) and keep pipe-to-shell as an explicit “quick/dev-only” option.

🔐 Suggested documentation change
```diff
-# One-command install
-curl -fsSL https://nvidia.com/nemoclaw.sh | sudo bash
+# Quick install (verify before running as root)
+curl -fsSL https://nvidia.com/nemoclaw.sh -o /tmp/nemoclaw.sh
+# Optional: verify checksum/signature published by NVIDIA
+# sha256sum /tmp/nemoclaw.sh
+sudo bash /tmp/nemoclaw.sh
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spark-install.md` around lines 6 - 10, Replace the unsafe implicit
pipe-to-shell example ("curl -fsSL https://nvidia.com/nemoclaw.sh | sudo bash")
with a safer default flow: show how to download the installer to a file
(referencing the current curl URL), verify its integrity via checksum or GPG
signature (include where to fetch the expected checksum/signature and a one-line
verification step), then execute the verified file with sudo; keep the original
pipe-to-shell command only as an explicitly labeled "quick/dev-only" option.
Also update the git clone step ("git clone
https://github.com/NVIDIA/NemoClaw.git") to mention verifying release artifacts
(checksums/signatures or signed commits/tags) after cloning and before running
install scripts. Ensure the docs include links/placeholders for
checksum/signature locations and a short note about supply-chain risk.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
spark-install.md (1)

6-8: ⚠️ Potential issue | 🟠 Major

Avoid pipe-to-shell as the default install path.

Line 7 still recommends curl ... | sudo bash as the primary flow. Please make verified download + integrity check the default, and keep pipe-to-shell as an explicit quick/dev-only option.

🔐 Suggested documentation update
```diff
-# One-command install
-curl -fsSL https://nvidia.com/nemoclaw.sh | sudo bash
+# Recommended install (verify before running as root)
+curl -fsSL https://nvidia.com/nemoclaw.sh -o /tmp/nemoclaw.sh
+# Verify checksum/signature from NVIDIA release channel
+# sha256sum /tmp/nemoclaw.sh
+sudo bash /tmp/nemoclaw.sh
+
+# Quick/dev-only
+# curl -fsSL https://nvidia.com/nemoclaw.sh | sudo bash
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spark-install.md` around lines 6 - 8, The "One-command install" section
currently promotes `curl -fsSL https://nvidia.com/nemoclaw.sh | sudo bash` as
the primary flow — change this so the default path is a verified download +
integrity check: show commands to curl/download to a file (e.g., save to
nemoclaw.sh), fetch and verify a published checksum/signature or GPG key,
validate the file, and only then instruct to run `sudo bash nemoclaw.sh`; retain
the original pipe-to-shell `curl ... | sudo bash` but move it to a clearly
labeled "Quick/dev-only" or "One-liner (not recommended)" subsection and add a
warning about security risks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@spark-install.md`:
- Around line 164-166: The blockquote contains a blank line that breaks the
Markdown blockquote (MD028); remove the empty line and ensure every line of the
quoted notes remains prefixed with ">" so the two consecutive notes stay inside
the same blockquote. Edit the block that starts with "> **Note**: The sandbox
egress proxy..." and the following "> **Note**: Some NIM containers..." so there
is no bare blank line between them and each line (including the separator) keeps
the ">" prefix.

---

Duplicate comments:
In `@spark-install.md`:
- Around line 6-8: The "One-command install" section currently promotes `curl
-fsSL https://nvidia.com/nemoclaw.sh | sudo bash` as the primary flow — change
this so the default path is a verified download + integrity check: show commands
to curl/download to a file (e.g., save to nemoclaw.sh), fetch and verify a
published checksum/signature or GPG key, validate the file, and only then
instruct to run `sudo bash nemoclaw.sh`; retain the original pipe-to-shell `curl
... | sudo bash` but move it to a clearly labeled "Quick/dev-only" or "One-liner
(not recommended)" subsection and add a warning about security risks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e05e8e7a-9372-4a96-aca5-19f506b01ffd

📥 Commits

Reviewing files that changed from the base of the PR and between d64dde5 and f7b65b4.

📒 Files selected for processing (1)
  • spark-install.md

Comment on lines +164 to +166
> **Note**: The sandbox egress proxy blocks direct access to the host network. Use `inference.local` with `"apiKey": "openshell-managed"` if your model is configured via NIM or `nemoclaw setup-spark`.

> **Note**: Some NIM containers (e.g., Nemotron-3-Super-120B-A12B) ship native arm64 images and run on the Spark. However, many NIM images are amd64-only and will fail with `exec format error`. Check the image architecture before pulling. GGUF models with llama.cpp are a reliable alternative for models without arm64 NIM support.


⚠️ Potential issue | 🟡 Minor

Fix blockquote spacing to satisfy markdownlint MD028.

There is a blank line inside a blockquote around Line 165. Keep the separator line prefixed with >.

📝 Suggested fix
```diff
 > **Note**: The sandbox egress proxy blocks direct access to the host network. Use `inference.local` with `"apiKey": "openshell-managed"` if your model is configured via NIM or `nemoclaw setup-spark`.
-
+>
 > **Note**: Some NIM containers (e.g., Nemotron-3-Super-120B-A12B) ship native arm64 images and run on the Spark. However, many NIM images are amd64-only and will fail with `exec format error`. Check the image architecture before pulling. GGUF models with llama.cpp are a reliable alternative for models without arm64 NIM support.
```
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 165-165: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spark-install.md` around lines 164 - 166, The blockquote contains a blank
line that breaks the Markdown blockquote (MD028); remove the empty line and
ensure every line of the quoted notes remains prefixed with ">" so the two
consecutive notes stay inside the same blockquote. Edit the block that starts
with "> **Note**: The sandbox egress proxy..." and the following "> **Note**:
Some NIM containers..." so there is no bare blank line between them and each
line (including the separator) keeps the ">" prefix.

@wscurran wscurran added the Platform: DGX Spark, documentation, and Getting Started labels Mar 18, 2026

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
spark-install.md (2)

178-180: ⚠️ Potential issue | 🟡 Minor

Remove the blank line between consecutive blockquote notes.

There is a blank line in the blockquote sequence (MD028). Keep the separator line prefixed with > so both notes remain in the same blockquote.

Suggested fix
```diff
 > **Note**: The sandbox egress proxy blocks direct access to the host network. Use `inference.local` with `"apiKey": "openshell-managed"` if your model is configured via NIM or `nemoclaw setup-spark`.
-
+>
 > **Note**: Some NIM containers (e.g., Nemotron-3-Super-120B-A12B) ship native arm64 images and run on the Spark. However, many NIM images are amd64-only and will fail with `exec format error`. Check the image architecture before pulling. GGUF models with llama.cpp are a reliable alternative for models without arm64 NIM support.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spark-install.md` around lines 178 - 180, The markdown contains two
consecutive blockquote notes separated by a blank line which triggers MD028;
merge them into a single contiguous blockquote by removing the empty line so the
second line also starts with ">" (i.e., update the two note lines so both begin
with ">" with no intervening blank line), ensuring both notes remain in the same
blockquote sequence.

134-136: ⚠️ Potential issue | 🟡 Minor

Add a language identifier to the fenced URL block.

Line 134 uses a fenced code block without a language, which triggers MD040. Please switch it to `text` (or `plaintext`).

Suggested fix

````diff
-```
+```text
 http://127.0.0.1:18789/#token=<your-gateway-token>
````

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `spark-install.md` around lines 134 - 136, the fenced code block that contains the URL "http://127.0.0.1:18789/#token=<your-gateway-token>" is missing a language identifier which triggers MD040; update the block by changing the opening fence to include a language token such as `text` (or `plaintext`); no other content changes are required.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@spark-install.md`:
- Around line 178-180: The markdown contains two consecutive blockquote notes
separated by a blank line which triggers MD028; merge them into a single
contiguous blockquote by removing the empty line so the second line also starts
with ">" (i.e., update the two note lines so both begin with ">" with no
intervening blank line), ensuring both notes remain in the same blockquote
sequence.
- Around line 134-136: The fenced code block that contains the URL
"http://127.0.0.1:18789/#token=<your-gateway-token>" is missing a language
identifier which triggers MD040; update that block to use a plain text language
identifier by changing the opening fence to "```text" (or "```plaintext") so it
reads, for example, ```text followed by the URL and then ``` to close; no other
content changes are required.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fb7605a8-0161-4ccd-a206-209c9ed1f9ee

📥 Commits

Reviewing files that changed from the base of the PR and between f7b65b4 and b8f4386.

📒 Files selected for processing (1)
  • spark-install.md

@psorensen-nvidia

@cluster2600 Thank you for the recommendations, they are under review by engineering for incorporation via our documentation update process to improve DGX Spark experience.

@cluster2600
Author

> @cluster2600 Thank you for the recommendations, they are under review by engineering for incorporation via our documentation update process to improve DGX Spark experience.

@psorensen-nvidia happy to help! Happy to help even more if there are any opportunities within NVIDIA 😁

@psorensen-nvidia psorensen-nvidia self-assigned this Mar 19, 2026
@psorensen-nvidia psorensen-nvidia added the priority: medium Issue that should be addressed in upcoming releases label Mar 19, 2026
@psorensen-nvidia psorensen-nvidia removed their assignment Mar 19, 2026
@kjw3 kjw3 requested review from lbliii and miyoungc March 23, 2026 17:28
Contributor

kjw3 commented Mar 23, 2026

@cluster2600 can you rebase on main and get your changes integrated nicely with the current state of the spark install guide? tag me back when ready.

@kjw3 kjw3 assigned kjw3 and unassigned miyoungc Mar 23, 2026
Contributor

@cv cv left a comment


Useful real-world content — the llama.cpp build instructions, known issues table entries, and architecture clarifications are valuable for DGX Spark users.

Stale against main

spark-install.md has been modified multiple times since this PR was opened. Main has a different section structure (added Prerequisites, reorganized Quick Start). This will conflict.

Content concerns

  1. curl -fsSL https://nvidia.com/nemoclaw.sh | sudo bash — the Quick Start adds a curl-pipe-sudo-bash pattern. PR #696 (ericksoa) is specifically about eliminating these patterns from the codebase. The download-then-review approach shown below it is better, but the sudo bash nemoclaw-install.sh line also needs scrutiny — the installer shouldn't need root for everything.

  2. The llama.cpp section is helpful but may go stale. Build flags (-DCMAKE_CUDA_ARCHITECTURES=121), model filenames, and memory estimates will change. Consider linking to llama.cpp docs rather than inlining the full build instructions.

  3. The openclaw.json manual editing instructions conflict with the gateway config flow. #114 and #721 both change how gateway config is managed — manual editing may not survive a gateway restart.

  4. The note about inference.local vs host.containers.internal is confusing — it gives two contradictory approaches without clear guidance on when to use which.
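On point 2, a hedged sketch of what that build currently looks like — flag names track upstream llama.cpp and have changed before (`GGML_CUDA` superseded the older `LLAMA_CUBLAS` switch), so treat this as illustrative rather than canonical:

```shell
# Illustrative CUDA build of llama.cpp targeting the GB10 (sm_121).
# Verify flag names against current llama.cpp docs before relying on them.
git clone https://github.com/ggml-org/llama.cpp
cmake -S llama.cpp -B llama.cpp/build \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_ARCHITECTURES=121
cmake --build llama.cpp/build --config Release -j "$(nproc)"
# Binaries (llama-server, llama-cli) land under llama.cpp/build/bin/
```

Linking to the upstream build docs, as suggested, would keep this from going stale.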

No lint/CI

Only test-unit and test-e2e-sandbox ran (both passed), but no lint or commit-lint checks.

Rebase onto current main and reconcile the Quick Start with the download-then-execute approach from #696.
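A runnable sketch of the download-then-review flow requested here; the `.sha256` sidecar name is an assumption (the vendor artifact may differ), and the download is simulated with a local stand-in file so each step can be exercised end to end:

```shell
# Hypothetical sketch of the download-then-review-then-verify flow.
# The real installer would come from:
#   curl -fsSL https://nvidia.com/nemoclaw.sh -o nemoclaw-install.sh
# Simulated here with a local stand-in so the steps actually run.
printf 'echo hello from installer\n' > nemoclaw-install.sh

# Review before running (interactive step; use `less nemoclaw-install.sh`).

# Verify integrity against a vendor-published checksum; the .sha256 sidecar
# name is an assumption, so we generate it locally as a stand-in here.
sha256sum nemoclaw-install.sh > nemoclaw-install.sh.sha256
sha256sum -c nemoclaw-install.sh.sha256 || exit 1

# Run without root where possible, per the review note above.
bash nemoclaw-install.sh
```

Only the steps inside the script that genuinely need root (package installs, systemd units) should be elevated, rather than running the whole installer under `sudo`.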

- Remove WIP notice (content is complete and tested)
- Add one-command install via nvidia.com/nemoclaw.sh
- Clarify Spark hardware details (aarch64, Grace + GB10)
- Add known issues from real deployments: pip system packages,
  port 3000 conflict with AI Workbench, network policy for
  NVIDIA cloud API
- Add section on running local LLMs with llama.cpp on GB10
- Note that NIM containers are amd64-only on Spark
- Update architecture diagram with unified memory and local
  inference

Signed-off-by: Maxime Grenu <maxime@cluster2600.com>
Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
Not all NIM containers are amd64-only. Some models (e.g.,
Nemotron-3-Super-120B-A12B) ship native arm64 images that run
on DGX Spark. Clarify the note to reflect this and recommend
checking image architecture before pulling.

Signed-off-by: Maxime Grenu <maxime@cluster2600.com>
Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
- Prioritise venv over --break-system-packages in known issues
- Add explicit JSON example for local provider config
- Add note about egress proxy and inference.local
- Add language identifier to architecture code block (MD040)

Signed-off-by: Maxime Grenu <maxime@cluster2600.com>
Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
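The venv-first approach this commit prefers can be sketched as follows; the path is illustrative, and `--without-pip` only keeps the sketch self-contained on systems where the `python3-venv` package is incomplete:

```shell
# Minimal sketch of the venv-first alternative to pip's --break-system-packages
# on Ubuntu 24.04 (a PEP 668 "externally managed" environment).
# The path /tmp/nemoclaw-venv is illustrative.
python3 -m venv --without-pip /tmp/nemoclaw-venv
/tmp/nemoclaw-venv/bin/python -c 'import sys; print(sys.prefix)'
# With python3-venv fully installed, drop --without-pip and use
#   /tmp/nemoclaw-venv/bin/pip install <your packages>
# instead of pip install --break-system-packages.
```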
- Replace pipe-to-shell with download-then-review install flow
- Add Web Dashboard section with OpenClaw Control UI instructions
- Document 127.0.0.1 vs localhost origin requirement
- Note external dashboard limitation with link to upstream issue

Signed-off-by: Maxime Grenu <maxime@cluster2600.com>
Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
@cluster2600 cluster2600 force-pushed the docs/spark-install-cleanup branch from b8f4386 to c7d7cff Compare March 24, 2026 07:38
@coderabbitai bot left a comment

♻️ Duplicate comments (3)
spark-install.md (3)

`128-130`: _⚠️ Potential issue_ | _🟡 Minor_

**Specify a language for the fenced code block.**

The URL snippet at Line 128 uses an untyped fence. Use `text` (or `plaintext`) to satisfy MD040 and improve rendering consistency.

<details>
<summary>Suggested doc patch</summary>

````diff
-```
+```text
 http://127.0.0.1:18789/#token=<your-gateway-token>
````
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In `@spark-install.md` around lines 128 - 130, change the untyped fenced code
block containing the URL "http://127.0.0.1:18789/#token=<your-gateway-token>"
to specify a language (use `text` or `plaintext`), which satisfies MD040 and
ensures consistent rendering of that URL snippet in spark-install.md.

</details>

---

`172-174`: _⚠️ Potential issue_ | _🟡 Minor_

**Remove the blank line between blockquotes (MD028).**

There is a blank separator between consecutive blockquote notes around Line 173. Keep them contiguous or prefix the separator with `>`.

 

<details>
<summary>Suggested doc patch</summary>

```diff
 > **Note**: The sandbox egress proxy blocks direct access to the host network. Use `inference.local` with `"apiKey": "openshell-managed"` if your model is configured via NIM or `nemoclaw setup-spark`.
-
 > **Note**: Some NIM containers (e.g., Nemotron-3-Super-120B-A12B) ship native arm64 images and run on the Spark. However, many NIM images are amd64-only and will fail with `exec format error`. Check the image architecture before pulling. GGUF models with llama.cpp are a reliable alternative for models without arm64 NIM support.
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@spark-install.md` around lines 172 - 174, The two consecutive blockquotes
starting with "Note: The sandbox egress proxy blocks direct access to the host
network..." and "Note: Some NIM containers (e.g., Nemotron-3-Super-120B-A12B)
ship native arm64 images..." are separated by a blank line which violates MD028;
remove the blank line so the blockquotes are contiguous (or convert the
separator to a quoted line by prefixing it with ">") to merge them into a single
contiguous block of blockquotes.
```

</details>

---

`15-18`: _⚠️ Potential issue_ | _🟠 Major_

**Add checksum/signature verification before running installer as root.**

On Lines 15-18, the script is reviewed with `less` but still executed via `sudo bash` without cryptographic integrity verification. Please add a checksum/signature verification step (with source of expected digest/signature) before execution.

 

<details>
<summary>Suggested doc patch</summary>

```diff
 # Download and review the installer before running
 curl -fsSL https://nvidia.com/nemoclaw.sh -o nemoclaw-install.sh
 less nemoclaw-install.sh  # review the script
+# Verify integrity (example)
+# curl -fsSL https://nvidia.com/nemoclaw.sh.sha256 -o nemoclaw-install.sh.sha256
+# sha256sum -c nemoclaw-install.sh.sha256
 sudo bash nemoclaw-install.sh
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@spark-install.md` around lines 15 - 18, Add a cryptographic verification step
before running the downloaded installer: after running "curl -fsSL ... -o
nemoclaw-install.sh" fetch the vendor-provided checksum or signature (e.g.,
nemoclaw-install.sh.sha256 or nemoclaw-install.sh.sig) from a trusted URL,
verify the archive with a checksum tool (sha256sum or shasum) or verify the
signature with "gpg --verify" against the vendor's public key, and only proceed
to "sudo bash nemoclaw-install.sh" if the check passes; if verification fails,
abort and log the mismatch. Ensure these steps reference the downloaded file
name "nemoclaw-install.sh" and the verification artifact names so reviewers can
find and validate the change.
```

</details>


<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In @spark-install.md:

  • Around line 128-130: Change the untyped fenced code block that contains the
    URL "http://127.0.0.1:18789/#token=<your-gateway-token>" to specify a
    language (use "text" or "plaintext") so that it satisfies MD040 and ensures
    consistent rendering of that URL snippet in spark-install.md.
  • Around line 172-174: The two consecutive blockquotes starting with "Note: The
    sandbox egress proxy blocks direct access to the host network..." and "Note:
    Some NIM containers (e.g., Nemotron-3-Super-120B-A12B) ship native arm64
    images..." are separated by a blank line which violates MD028; remove the blank
    line so the blockquotes are contiguous (or convert the separator to a quoted
    line by prefixing it with ">") to merge them into a single contiguous block of
    blockquotes.
  • Around line 15-18: Add a cryptographic verification step before running the
    downloaded installer: after running "curl -fsSL ... -o nemoclaw-install.sh"
    fetch the vendor-provided checksum or signature (e.g.,
    nemoclaw-install.sh.sha256 or nemoclaw-install.sh.sig) from a trusted URL,
    verify the archive with a checksum tool (sha256sum or shasum) or verify the
    signature with "gpg --verify" against the vendor's public key, and only proceed
    to "sudo bash nemoclaw-install.sh" if the check passes; if verification fails,
    abort and log the mismatch. Ensure these steps reference the downloaded file
    name "nemoclaw-install.sh" and the verification artifact names so reviewers can
    find and validate the change.

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `06be2987-a1a0-4d6c-b27f-03b49c74eed1`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between b8f438613d734a6544f2e52ff1004a4ad15b539e and c7d7cffa919cea97d165559f125410698f4c204b.

</details>

<details>
<summary>📒 Files selected for processing (1)</summary>

* `spark-install.md`

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

@cluster2600 (Author)

Rebased and cleaned up — should be good to go now. @kjw3

@miyoungc (Contributor)

There was another PR that added the Ollama path. Your PR still has items worth cherry-picking into main:

  • The three additional Known Issues (pip system packages, port 3000 AI Workbench conflict, NVIDIA cloud API egress)
  • The Web Dashboard access section
  • The NIM arm64 compatibility warning
  • Hardware details (aarch64, 128 GB unified memory, Docker 29.x)
  • llama.cpp as a second local-inference option (though consider linking rather than inlining full build instructions, per reviewer feedback)
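The NIM architecture check in the list above can be scripted; a hedged sketch that parses manifest-list JSON shaped like `docker manifest inspect` output (the sample manifest and digests below are illustrative, not a real NIM image):

```python
import json

# Illustrative manifest-list JSON, shaped like `docker manifest inspect` /
# OCI image index output; the digests are made up for this sketch.
sample = json.loads("""
{
  "manifests": [
    {"digest": "sha256:aaa...", "platform": {"architecture": "amd64", "os": "linux"}},
    {"digest": "sha256:bbb...", "platform": {"architecture": "arm64", "os": "linux"}}
  ]
}
""")

def has_arch(index: dict, arch: str) -> bool:
    """Return True if the image index lists a manifest for the given CPU arch."""
    return any(m.get("platform", {}).get("architecture") == arch
               for m in index.get("manifests", []))

# An amd64-only index would return False here; pulling and running such an
# image on the Spark is what produces the `exec format error` described above.
print(has_arch(sample, "arm64"))  # → True for this sample
```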

The install-flow change (download-then-review) is being handled separately in PR #696, so that can be skipped. The manual openclaw.json editing should be validated against the current gateway config flow before merging.

As @cv requested, hope you can rebase once more and make the additional changes.

Also, I'm still not seeing the complete set of CI checks trigger after your recent push. It might be easier to open a fresh PR.


Labels

- documentation: Improvements or additions to documentation
- Getting Started: Use this label to identify setup, installation, or onboarding issues.
- Platform: DGX Spark: Support for DGX Spark
- priority: medium: Issue that should be addressed in upcoming releases
