docs: use CUDA base image for GPU smoke test in spark-install #1226

Open
latenighthackathon wants to merge 2 commits into NVIDIA:main from latenighthackathon:docs/spark-install-gpu-image

Conversation

@latenighthackathon
Contributor

@latenighthackathon latenighthackathon commented Apr 1, 2026

Summary

  • Replace ubuntu with nvidia/cuda:12.8.0-base-ubuntu24.04 in the GPU verification command

Related Issue

Closes #1166

Changes

The GPU smoke test in spark-install.md uses the ubuntu Docker image, which does not include nvidia-smi. The command always fails even when GPUs are correctly configured, making it useless as a validation step.

Switched to nvidia/cuda:12.8.0-base-ubuntu24.04 which includes nvidia-smi and correctly validates GPU access through the NVIDIA container runtime.
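
For reference, a minimal sketch of the before and after forms of the smoke test. The flags are the ones quoted in the review walkthrough and the image tag is the one named in the summary; the exact wording in spark-install.md may differ. `IMAGE` and `smoke_test_cmd` are hypothetical names used only for illustration, and the helper prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: flags and image tag are copied from this PR's text.
# IMAGE and smoke_test_cmd are hypothetical names, not part of spark-install.md.
IMAGE="nvidia/cuda:12.8.0-base-ubuntu24.04"

# Print the smoke-test command for a given image (does not run it).
smoke_test_cmd() {
  printf 'docker run --rm --runtime=nvidia --gpus all %s nvidia-smi\n' "$1"
}

smoke_test_cmd ubuntu     # old form: reported in this PR to fail (no nvidia-smi in the image)
smoke_test_cmd "$IMAGE"   # new form: the CUDA base image named in this PR

# To run the check on a GPU host, evaluate the printed command:
#   eval "$(smoke_test_cmd "$IMAGE")"
```

Printing the command instead of executing it keeps the sketch usable on machines without a GPU or the NVIDIA container runtime.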

Testing

  • Verified nvidia/cuda:12.8.0-base-ubuntu24.04 exists on Docker Hub
  • Standard NVIDIA GPU validation image used in NVIDIA documentation
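
One way to repeat the first check from a terminal, as a sketch: `docker manifest inspect` asks the registry for the tag's manifest and exits non-zero if it cannot be retrieved. The helper name `image_exists` and the `DOCKER` override variable are assumptions introduced here so the function can be exercised without a Docker daemon; they are not part of this PR.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the registry can serve a manifest for the tag.
# DOCKER is an assumed override (defaults to the docker CLI).
image_exists() {
  "${DOCKER:-docker}" manifest inspect "$1" >/dev/null 2>&1
}

IMAGE="nvidia/cuda:12.8.0-base-ubuntu24.04"
if image_exists "$IMAGE"; then
  echo "found: $IMAGE"
else
  # A failure can also mean the registry is unreachable or docker is missing.
  echo "not found or registry unreachable: $IMAGE"
fi
```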

Checklist

  • Conventional commit format
  • Scoped to issue, no unrelated changes
  • No secrets or credentials

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>

Summary by CodeRabbit

  • Documentation
    • Clarified NVIDIA container runtime verification instructions to specify an explicit CUDA base image version for clearer, more consistent GPU-enabled container setup. The verification step and GPU check remain intact, improving reproducibility of the documented workflow.

@coderabbitai
Contributor

coderabbitai bot commented Apr 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

The GPU smoke-test Docker command in documentation now uses the nvidia/cuda:12.8.0-base-ubuntu24.04 image instead of ubuntu, so nvidia-smi is executed inside a container that includes the binary.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Fix: spark-install.md | Replaced ubuntu with nvidia/cuda:12.8.0-base-ubuntu24.04 in the GPU smoke-test docker run --rm --runtime=nvidia --gpus all ... nvidia-smi command so the nvidia-smi binary exists inside the container. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

🐰
I hopped in docs with nimble paw,
Swapped base to CUDA—fixed the flaw.
Now nvidia-smi wakes with glee,
The smoke test hums—no mystery! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: replacing the ubuntu image with a CUDA base image for the GPU smoke test in spark-install documentation. |
| Linked Issues check | ✅ Passed | The PR directly addresses the primary objective from issue #1166 by replacing the ubuntu image with nvidia/cuda:12.8.0-base-ubuntu24.04, which includes nvidia-smi and enables proper GPU validation. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the GPU smoke test command in spark-install.md and directly address the documented issue; no out-of-scope modifications detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@wscurran wscurran added the documentation and NemoClaw CLI labels Apr 1, 2026
@wscurran
Contributor

wscurran commented Apr 1, 2026

✨ Thanks for submitting this pull request, which proposes a way to improve the documentation for the GPU smoke test in spark-install. This could help users verify their GPU configuration more accurately.


Possibly related open issues:

@prekshivyas
Contributor

prekshivyas commented Apr 14, 2026

@latenighthackathon thanks! Please update your branch and add a sign-off to your PR description; otherwise this looks good for approval.

@prekshivyas
Contributor

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Apr 14, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@wscurran wscurran added the status: rebase label Apr 14, 2026
@latenighthackathon latenighthackathon force-pushed the docs/spark-install-gpu-image branch from 7d1f4c8 to f86b936 April 14, 2026 22:51
@latenighthackathon
Contributor Author

@prekshivyas @wscurran Thanks for the review! I've rebased onto latest origin/main — single clean commit now (f86b936a). The Signed-off-by trailer is on both the commit and at the bottom of the PR body. Let me know if you'd like it somewhere else in the description.

Cheers!
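
For readers unfamiliar with the workflow described in this comment, a rough sketch of rebasing a single-commit branch and making sure the tip commit carries a Signed-off-by trailer. The helper name `rebase_and_sign_off` is hypothetical, and the remote and branch names in the usage comment are assumptions; this is not necessarily the exact sequence the author ran.

```shell
#!/bin/sh
# Hypothetical helper: replay the current branch onto BASE, then make sure the
# tip commit carries a Signed-off-by trailer (covers the single-commit case).
rebase_and_sign_off() {
  base="$1"
  git rebase "$base" || return 1
  # --signoff appends the trailer from user.name/user.email; --no-edit keeps
  # the existing commit message and skips the editor.
  git commit --amend --signoff --no-edit
}

# Typical use on a PR branch (remote and branch names are assumptions):
#   git fetch origin
#   rebase_and_sign_off origin/main
#   git push --force-with-lease
```

`--force-with-lease` is used in the usage comment because a rebase rewrites history; it refuses to overwrite remote commits that were not fetched locally.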

@wscurran wscurran removed the status: rebase label Apr 15, 2026
@prekshivyas prekshivyas self-assigned this Apr 16, 2026
Contributor

@prekshivyas prekshivyas left a comment

Correct: the ubuntu image doesn't have nvidia-smi. The CUDA base image is the standard for GPU validation. @latenighthackathon can you rebase onto main?

@latenighthackathon latenighthackathon force-pushed the docs/spark-install-gpu-image branch from 3e75050 to 521dff6 April 17, 2026 01:33
@latenighthackathon
Contributor Author

@prekshivyas Rebased onto latest origin/main — single clean commit (521dff6f), merge commit dropped. Sign-off present on both commit and PR body.

Cheers!

@prekshivyas
Contributor

@latenighthackathon could you give us access to update branches on your fork?
It would help speed up the merge requests you have; most of them are ready for approval but blocked on Update Branch.

@latenighthackathon
Contributor Author

@prekshivyas Done — flipped "Allow edits by maintainers" on my open NemoClaw PRs (#1226, #1725, #1286, #1897, #1898) and on OpenShell #871 while I was at it. You should be able to Update Branch directly on any of them now.

Cheers!

The ubuntu image does not include nvidia-smi, so the GPU verification
command always fails even when GPUs are correctly configured. Switch
to nvidia/cuda:12.8.0-base-ubuntu24.04 which includes nvidia-smi.

Closes NVIDIA#1166

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>
@latenighthackathon latenighthackathon force-pushed the docs/spark-install-gpu-image branch from 0dc82c3 to 3e9b2bb April 18, 2026 01:22

Labels

  • documentation: Improvements or additions to documentation
  • NemoClaw CLI: Use this label to identify issues with the NemoClaw command-line interface (CLI).
  • Platform: DGX Spark: Support for DGX Spark

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[NemoClaw][spark 1.120.38] [doc] spark-install .md command fails because ubuntu image lacks nvidia-smi

3 participants