fix(onboard): allow proc writes for Docker GPU patch #3543
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID:
📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
📝 Walkthrough
`createSandbox` resolves the Docker GPU sandbox-create plan before policy creation and conditionally logs its message; separately, the OpenShell readiness exec call now passes
Changes
Docker GPU create plan and exec suppression
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/onboard.ts (1)
5636-5661: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Move this orchestration back out of `src/lib/onboard.ts` to clear the entrypoint budget.
The added `useDockerGpuPatch` branching and log selection here is what trips the blocking `CI / Onboard Entrypoint Budget` job (`src/lib/onboard.ts` grew by 7 lines). Please hide this behind a helper in the GPU onboarding modules so this file stays orchestration-only.
As per coding guidelines, `src/lib/onboard.ts`: This file contains core onboarding logic. Changes here affect the full sandbox creation and configuration flow.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard.ts` around lines 5636 - 5661, Extract the GPU-specific branching and log-selection into a helper inside the GPU onboarding module (e.g., add a function in dockerGpuSandboxCreate or a new gpuOnboard helper) that accepts the GPU config and environment flags and returns { useDockerGpuPatch, logMessage } or at least useDockerGpuPatch plus a descriptive message; replace the local calls to dockerGpuSandboxCreate.shouldUseDockerGpuPatchForCreate and the conditional console.log selection in src/lib/onboard.ts with a single call to that helper and a single console.log of the returned message, leaving prepareInitialSandboxCreatePolicy, initialSandboxPolicy handling (cleanup/appliedPresets) and effectiveSandboxGpuConfig usage unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/lib/onboard.ts`:
- Around line 5636-5661: Extract the GPU-specific branching and log-selection
into a helper inside the GPU onboarding module (e.g., add a function in
dockerGpuSandboxCreate or a new gpuOnboard helper) that accepts the GPU config
and environment flags and returns { useDockerGpuPatch, logMessage } or at least
useDockerGpuPatch plus a descriptive message; replace the local calls to
dockerGpuSandboxCreate.shouldUseDockerGpuPatchForCreate and the conditional
console.log selection in src/lib/onboard.ts with a single call to that helper
and a single console.log of the returned message, leaving
prepareInitialSandboxCreatePolicy, initialSandboxPolicy handling
(cleanup/appliedPresets) and effectiveSandboxGpuConfig usage unchanged.
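The extraction requested in the prompt above could look roughly like the following sketch. It is not the code that landed in the PR: `resolveDockerGpuCreatePlan`, the `GpuConfig` and `CreateEnv` shapes, the placeholder condition, and the log strings are all hypothetical names invented for illustration. Only `useDockerGpuPatch`, `logMessage`, and `dockerGpuSandboxCreate.shouldUseDockerGpuPatchForCreate` come from the review text.

```typescript
// Hypothetical shapes: the real GPU config and flag types in NemoClaw may differ.
interface GpuConfig {
  enabled: boolean;
}

interface CreateEnv {
  platform: string; // e.g. "linux"
  dockerDriver: boolean; // true when the sandbox runs under the Docker driver
}

interface DockerGpuCreatePlan {
  useDockerGpuPatch: boolean;
  logMessage: string | null; // null means there is nothing to log
}

// Stand-in for the decision the review says should leave onboard.ts.
// The real predicate is dockerGpuSandboxCreate.shouldUseDockerGpuPatchForCreate;
// the condition below is only a placeholder for it.
function resolveDockerGpuCreatePlan(gpu: GpuConfig, env: CreateEnv): DockerGpuCreatePlan {
  if (!gpu.enabled) {
    return { useDockerGpuPatch: false, logMessage: null };
  }
  const useDockerGpuPatch = env.platform === "linux" && env.dockerDriver;
  const logMessage = useDockerGpuPatch
    ? "GPU sandbox: recreating container with Docker GPU flags"
    : "GPU sandbox: using native OpenShell --gpu path";
  return { useDockerGpuPatch, logMessage };
}

// The call site shrinks to one call plus one conditional log.
const plan = resolveDockerGpuCreatePlan(
  { enabled: true },
  { platform: "linux", dockerDriver: true },
);
if (plan.logMessage !== null) {
  console.log(plan.logMessage);
}
```

Returning the message together with the flag keeps the branching in the GPU module, so the entrypoint file only consumes a result and does not grow when new GPU cases are added.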
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: dd99aa85-d1e5-4c12-b190-55caa60af8eb
📒 Files selected for processing (3)
src/lib/onboard.ts
src/lib/onboard/initial-policy.ts
test/onboard.test.ts
Selective E2E Results: ✅ All requested jobs passed
Run: 25889896109
Selective E2E Results:
| Job | Result |
|---|---|
| gpu-e2e | ⏭️ skipped |
Selective E2E Results: ✅ All requested jobs passed
Run: 25891068954
## Summary

Refreshes the NemoClaw docs for the 0.0.43 release window, covering GPU onboarding fixes, installer CDI repair behavior, and Linux uninstall cleanup. Updates the docs version metadata and regenerates the user skill references from the source docs.

## Related Issue

None.

## Changes

- #3428 -> `docs/reference/troubleshooting.md`: Documents the installer path that repairs missing NVIDIA CDI device specs before onboarding.
- #3515 and #3543 -> `docs/about/release-notes.md` and `docs/reference/troubleshooting.md`: Documents the Linux Docker-driver GPU proof permission fix for `/proc/<pid>/task/<tid>/comm` writes.
- #3536 -> `docs/reference/commands.md`: Documents that `nemoclaw uninstall` removes Linux gateway state under `~/.local/state/nemoclaw`.
- Refreshes generated `nemoclaw-user-*` skill references from the updated source docs.
- Bumps `docs/project.json` and `docs/versions1.json` to 0.0.43.

## Type of Change

- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification

- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

## Summary by CodeRabbit

* **Bug Fixes**
  * Improved GPU onboarding on Linux Docker-driver with automatic CDI spec repair and fallback mechanisms.
  * Fixed permission issues affecting GPU proof writes during Linux onboarding.
  * Enhanced uninstall to properly clean up gateway state and auth proxy processes on Linux.
* **Documentation**
  * Updated release notes, command references, and troubleshooting guides for v0.0.43.
Summary
Allow the Linux Docker GPU patch path to mirror OpenShell's `/proc` GPU policy enrichment when NemoClaw recreates a sandbox container with GPU flags. This fixes DGX Spark installs where `nvidia-smi` succeeds but the direct GPU proof fails on `/proc/<pid>/task/<tid>/comm` with `Permission denied`.
Changes
- Add `/proc` to `filesystem_policy.read_write` only for the Docker GPU patch path, while preserving the native OpenShell `--gpu` policy behavior.
Type of Change
Verification
- `npx prek run --all-files` passes
- `npm test` passes
- `make docs` builds without warnings (doc changes only)
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
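The policy change listed under Changes can be illustrated with a minimal sketch. It assumes a policy object whose only known field is `filesystem_policy.read_write`, the one named in this description. The `SandboxPolicy` type, the `withDockerGpuProcWrites` helper, and the sample paths are hypothetical and are not taken from the NemoClaw source.

```typescript
// Hypothetical policy shape; only filesystem_policy.read_write is named in the PR text.
interface SandboxPolicy {
  filesystem_policy: {
    read_write: string[];
  };
}

// Returns the policy with /proc writable, but only on the Docker GPU patch path.
// The input object is never mutated.
function withDockerGpuProcWrites(
  policy: SandboxPolicy,
  useDockerGpuPatch: boolean,
): SandboxPolicy {
  if (!useDockerGpuPatch) {
    return policy; // native OpenShell --gpu path: leave the policy untouched
  }
  const readWrite = policy.filesystem_policy.read_write;
  if (readWrite.includes("/proc")) {
    return policy; // already present, so avoid a duplicate entry
  }
  return {
    ...policy,
    filesystem_policy: {
      ...policy.filesystem_policy,
      read_write: [...readWrite, "/proc"],
    },
  };
}

// Sample paths are placeholders chosen for the example.
const basePolicy: SandboxPolicy = {
  filesystem_policy: { read_write: ["/sandbox", "/tmp"] },
};
console.log(withDockerGpuProcWrites(basePolicy, true).filesystem_policy.read_write);
console.log(withDockerGpuProcWrites(basePolicy, false).filesystem_policy.read_write);
```

Gating on a single boolean keeps the native `--gpu` path byte-for-byte unchanged, which matches the stated goal of preserving that behavior.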
Summary by CodeRabbit
New Features
Improvements
Tests