fix(onboard,uninstall): replace misleading recovery messages (#3456) #3520
Resolve the two output threads in #3456 left after the core dead-loop fix landed via #3459 + #3434:

Sub-bug #3: `src/lib/onboard.ts` printed `nemoclaw <name> destroy --yes && nemoclaw onboard --gpu` with a literal `<name>` placeholder, and assumed at least one sandbox was registered. When the GPU-passthrough mismatch hit on the State B re-run path with an empty registry (the dead-loop case), the hint was not actionable. Replace with a registry-aware helper at `src/lib/onboard/gpu-recovery.ts` that renders the right shape:

- empty registry → suggest `nemoclaw uninstall && nemoclaw onboard --gpu`
- one sandbox → suggest destroy --yes --cleanup-gateway for that name
- multiple sandboxes → list each, only the last gets --cleanup-gateway

Sub-bug #4: `src/lib/actions/uninstall/run-plan.ts` printed `Destroyed gateway 'nemoclaw' skipped` when the openshell destroy no-op'd (gateway already gone). The "Destroyed … skipped" wording was self-contradictory. Extend `runOptional` with an `onSkip` option; route the gateway destroy to emit `Gateway 'nemoclaw' already removed or unreachable` on no-op.

Tests:

- `src/lib/onboard/gpu-recovery.test.ts` (6 tests): forbid literal `<name>` placeholder anywhere in the output; cover empty / single / multi-sandbox cases; defensive filter on whitespace names so a `nemoclaw destroy` rendering can never happen.
- `src/lib/actions/uninstall/run-plan.test.ts`: assert the new "already removed or unreachable" wording and the absence of the "Destroyed gateway 'nemoclaw' skipped" string.

The core dead loop itself (sub-bugs #1, #2 and State B GPU mismatch) is already addressed by #3459 + #3434 + #3483; #3456 will close once this lands. See the #3456 status comment for the full mapping.

Refs #3456. Mirrors (and tightens) the approach in the closed PR #3464, which left the literal `<name>` placeholder in its tests despite CodeRabbit feedback that was never addressed.

Signed-off-by: Charan Jagwani <charjags100@gmail.com>
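The three registry shapes listed above can be sketched as a small pure function. This is only an illustration under stated assumptions: the function name matches the one mentioned in the PR description, but the exact signature, the line layout of the output, and the trimming logic are guesses, not the code in `src/lib/onboard/gpu-recovery.ts`.

```typescript
// Hypothetical sketch; the real helper in src/lib/onboard/gpu-recovery.ts may differ.
function gpuPassthroughRecoveryLines(
  names: readonly string[] | null | undefined,
): string[] {
  // Defensive filter: drop blank or whitespace-only names so a bare
  // `nemoclaw destroy` command can never be rendered.
  const valid = (names ?? [])
    .map((n) => n.trim())
    .filter((n) => n.length > 0);
  const onboard = "nemoclaw onboard --gpu";

  // Empty registry: there is no sandbox to destroy, so suggest a full uninstall.
  if (valid.length === 0) {
    return [`nemoclaw uninstall && ${onboard}`];
  }

  // One sandbox: a single chained command that uses the real name.
  if (valid.length === 1) {
    return [`nemoclaw ${valid[0]} destroy --yes --cleanup-gateway && ${onboard}`];
  }

  // Several sandboxes: one destroy line each; only the last cleans up the gateway.
  const destroys = valid.map((name, i) =>
    i === valid.length - 1
      ? `nemoclaw ${name} destroy --yes --cleanup-gateway`
      : `nemoclaw ${name} destroy --yes`,
  );
  return [...destroys, onboard];
}
```

Keeping the renderer pure (names in, lines out) is what makes the negative assertion on the literal `<name>` placeholder cheap to test.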
…nto gpu-recovery (#3456)

The first commit grew `src/lib/onboard.ts` by +13 lines, which trips the `onboard-entrypoint-budget` policy on `main` (entrypoint must be net-neutral or smaller; new logic belongs under `src/lib/onboard/**`).

Extract the inline registry-lookup + emit loop into two new exports in `src/lib/onboard/gpu-recovery.ts`:

- `getRegisteredSandboxNamesForGpuRecovery()`: registry read with a graceful empty-list fallback.
- `reportGpuPassthroughRecovery(emit, loadNames?)`: emits the hint lines via `emit`. `loadNames` defaults to the registry reader; tests inject their own list.

The onboard.ts callsite collapses from 14 lines to 1: `reportGpuPassthroughRecovery(console.error);`

Net change in `src/lib/onboard.ts`: -1 line. Budget passes.

Adds two tests for the new wrapper covering empty-registry and multi-sandbox cases. 24 tests pass (was 22).

Signed-off-by: Charan Jagwani <charjags100@gmail.com>
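A rough sketch of how the two exports described in this commit might fit together. The two function names and the `emit` / `loadNames` parameters come from the commit message; the `readSandboxRegistry` stand-in, the type aliases, and the exact lines emitted are assumptions made for illustration only.

```typescript
type Emit = (line: string) => void;
type LoadNames = () => string[];

// Hypothetical stand-in for the sandbox registry; here it always fails,
// which exercises the graceful fallback below.
function readSandboxRegistry(): string[] {
  throw new Error("registry unavailable");
}

// Registry read with a graceful empty-list fallback.
function getRegisteredSandboxNamesForGpuRecovery(): string[] {
  try {
    return readSandboxRegistry();
  } catch {
    return [];
  }
}

// Emits the recovery hint line by line. Tests can inject `loadNames`
// instead of touching a real registry.
function reportGpuPassthroughRecovery(
  emit: Emit,
  loadNames: LoadNames = getRegisteredSandboxNamesForGpuRecovery,
): void {
  const names = loadNames();
  if (names.length === 0) {
    emit("nemoclaw uninstall && nemoclaw onboard --gpu");
    return;
  }
  names.forEach((name, i) => {
    const isLast = i === names.length - 1;
    emit(`nemoclaw ${name} destroy --yes${isLast ? " --cleanup-gateway" : ""}`);
  });
  emit("nemoclaw onboard --gpu");
}
```

With this shape, the entrypoint only needs the one-line call quoted in the commit, `reportGpuPassthroughRecovery(console.error);`, while a test passes an array-collecting `emit` and a fixed name list.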
`.agents/skills/nemoclaw-maintainer-issue-autopilot/SKILL.md` was accidentally included in the previous commit (afe5427). It's a local-only maintainer skill, not relevant to the #3456 fix. Drop it from the PR so the Stage 9 perfect-match audit shows no surplus. Signed-off-by: Charan Jagwani <charjags100@gmail.com>
READY FOR HUMAN REVIEW

Stage 9 perfect-match acceptance audit, with no surplus.

Acceptance clause → evidence

Surplus scan

No surplus. PR-scoped diff: 5 files, +247/-12, every file traces to clause #3 or #4.

Final gates

RFR draft (cc reviewers): PR closes the two remaining output threads in #3456: registry-aware GPU-passthrough recovery hint + non-contradictory uninstall destroy wording. Five files, +247/-12. Adopts the CodeRabbit refinement from the closed #3464 (forbid …
Selective E2E Results: ✅ All requested jobs passed

Run: 25875995216
…3536)

## Summary

Adds `~/.local/state/nemoclaw/` to the uninstall plan so `nemoclaw uninstall` cleans the Linux Docker-driver gateway's state directory (pid file, SQLite db, audit log, `vm-driver/` state). Closes #3535, the verified residual sub-bug #5c from #3456, which auto-closed when #3520 merged.

## Acceptance criteria mapping

| Clause from #3535 | Evidence |
|---|---|
| `~/.local/state/nemoclaw/` is removed by `nemoclaw uninstall` on Linux | `src/lib/actions/uninstall/run-plan.ts:588` adds `removePath(paths.gatewayLocalStateDir, runtime)` to the `executePlan` removal block |
| `paths.test.ts` confirms the path is in `uninstallStatePaths()` return | `src/lib/domain/uninstall/paths.test.ts`: new test `#3456: exposes the Linux Docker-driver gateway state dir so uninstall can clean it` |
| No regression in existing uninstall tests | `npx vitest run src/lib/domain/uninstall/paths.test.ts src/lib/domain/uninstall/plan.test.ts src/lib/actions/uninstall/run-plan.test.ts`: 22 pass (1 new + 21 existing) |

## Behavior matrix

| Before | After |
|---|---|
| `uninstallStatePaths()` returns 3 dirs: `~/.nemoclaw`, `~/.config/openshell`, `~/.config/nemoclaw` | Returns 4: adds `~/.local/state/nemoclaw` |
| `executePlan` removes 3 of 4 NemoClaw-owned dirs; leaves Linux gateway state behind | Removes all 4 |
| `defaultUninstallPaths()` does not surface `gatewayLocalStateDir` | Surfaces it (matches `NEMOCLAW_OPENSHELL_GATEWAY_STATE_DIR` doc) |

## Test plan

```
npm run typecheck:cli
npx vitest run src/lib/domain/uninstall/paths.test.ts src/lib/domain/uninstall/plan.test.ts src/lib/actions/uninstall/run-plan.test.ts
```

Result: 22/22 pass.

## Notes for reviewers

- Path source-of-truth: `docs/reference/commands.md:1150` documents `NEMOCLAW_OPENSHELL_GATEWAY_STATE_DIR` as `~/.local/state/nemoclaw/openshell-docker-gateway`. The parent dir (`~/.local/state/nemoclaw/`) is what we own; cleaning the parent removes the gateway dir + `vm-driver/` + future siblings.
- The `Pick<>` on `uninstallStatePaths` was extended to include the new field; existing callers in `executePlan` already pass full `UninstallPaths` so they pick it up automatically.
- Out of scope: sub-bug #5d (`.openshell-installed-version`). My follow-up comment on #3456 explains: no source in NemoClaw writes that sentinel on current `main`, so filing a speculative fix isn't appropriate yet.

Closes #3535
Refs #3456

## Summary by CodeRabbit

* **Bug Fixes**
  * Uninstall now also removes the Linux gateway local state directory, ensuring more complete cleanup on uninstall.
* **Tests**
  * Added a regression test to verify the gateway local state directory is exposed and included in uninstall cleanup.

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
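The before/after behavior in the matrix above could look roughly like the following. Only `uninstallStatePaths`, `defaultUninstallPaths`, `UninstallPaths`, and `gatewayLocalStateDir` are names taken from the commit message; the other field names, the `home` parameter, and the string-based path joining are assumptions for the sake of a self-contained sketch.

```typescript
// Hypothetical shape; only `gatewayLocalStateDir` is a field name confirmed by the PR text.
interface UninstallPaths {
  nemoclawHomeDir: string; // ~/.nemoclaw (field name assumed)
  openshellConfigDir: string; // ~/.config/openshell (field name assumed)
  nemoclawConfigDir: string; // ~/.config/nemoclaw (field name assumed)
  gatewayLocalStateDir: string; // ~/.local/state/nemoclaw
}

// Assumed signature: the real function may resolve the home directory itself.
function defaultUninstallPaths(home: string): UninstallPaths {
  return {
    nemoclawHomeDir: `${home}/.nemoclaw`,
    openshellConfigDir: `${home}/.config/openshell`,
    nemoclawConfigDir: `${home}/.config/nemoclaw`,
    // Parent of the Docker-driver gateway state dir, so siblings such as
    // vm-driver/ are removed along with it.
    gatewayLocalStateDir: `${home}/.local/state/nemoclaw`,
  };
}

// The Pick<> lists exactly the fields this function needs; callers that pass
// a full UninstallPaths object satisfy it without changes.
function uninstallStatePaths(
  paths: Pick<
    UninstallPaths,
    | "nemoclawHomeDir"
    | "openshellConfigDir"
    | "nemoclawConfigDir"
    | "gatewayLocalStateDir"
  >,
): string[] {
  return [
    paths.nemoclawHomeDir,
    paths.openshellConfigDir,
    paths.nemoclawConfigDir,
    paths.gatewayLocalStateDir,
  ];
}
```

Removing the parent directory rather than `openshell-docker-gateway` alone matches the reviewer note that the parent is the directory NemoClaw owns.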
## Summary

Refreshes the NemoClaw documentation for the local `main` changes included in the 0.0.42 release. The update adds release notes, updates the affected user-facing setup and troubleshooting pages, bumps docs metadata to 0.0.42, and regenerates the matching user skills.

## Changes

- #3537 -> `docs/reference/commands.md`, `docs/reference/troubleshooting.md`: Documented host-level status fields, cloudflared state-specific recovery hints, and Local Ollama auth proxy status diagnostics.
- #3454 -> `docs/get-started/prerequisites.md`, `docs/get-started/quickstart.md`: Documented macOS Docker-driver onboarding and removed the expectation that standard macOS setup needs a VM driver helper.
- #3514 -> `docs/inference/use-local-inference.md`: Documented compatible-endpoint retry behavior for reasoning-only smoke responses.
- #3448 -> `docs/reference/commands.md`, `docs/manage-sandboxes/messaging-channels.md`: Documented canonical channel names and policy preset hints after `channels add`.
- #3520 -> `docs/about/release-notes.md`: Captured clearer GPU recovery and uninstall wording in the 0.0.42 release notes.
- #3313 -> `docs/get-started/quickstart.md`, `docs/reference/troubleshooting.md`: Documented stronger dashboard port detection and rollback when a forward cannot start.
- #3502 -> `docs/about/release-notes.md`: Captured batched onboarding policy preset application in the 0.0.42 release notes.
- #3505 -> `docs/reference/troubleshooting.md`: Documented the top-level Colima socket path.
- #3421 -> `docs/about/release-notes.md`: Captured idempotent installer shim logging in the 0.0.42 release notes.
- Updated `docs/project.json`, `docs/versions1.json`, and regenerated `.agents/skills/nemoclaw-user-*` outputs.

## Type of Change

- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [x] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification

- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

## Summary by CodeRabbit

## Release Notes - v0.0.42

* **Documentation**
  * Enhanced macOS onboarding guidance for Docker gateway setup
  * Improved dashboard port conflict handling with automatic rollback
  * Better local Ollama inference diagnostics and authentication proxy checks
  * Clarified status command output and recovery procedures
  * Refined messaging channel setup documentation
* **Chores**
  * Version bump to 0.0.42

Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Summary
Closes the two remaining output threads in #3456 after the core dead-loop fix already landed on `main` (via #3459, #3434, #3483). Full sub-bug mapping in the #3456 status comment.

- `nemoclaw <name> destroy --yes` recovery hint replaced with a registry-aware helper.
- `Destroyed gateway 'nemoclaw' skipped` self-contradictory wording replaced with `Gateway 'nemoclaw' already removed or unreachable`.

Acceptance criteria mapping

| Clause | Evidence |
|---|---|
| Sub-bug #3: literal `<name>` placeholder | `src/lib/onboard/gpu-recovery.ts` + `onboard.ts:10387-10405` |
| Sub-bug #4: contradictory "Destroyed … skipped" wording | `src/lib/actions/uninstall/run-plan.ts:210-228, 407-414` |

Behavior matrix
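`gpuPassthroughRecoveryLines(names)`:

| `names` | Output |
|---|---|
| `null` / `[]` | `nemoclaw uninstall && nemoclaw onboard --gpu` |
| one sandbox | `nemoclaw <name> destroy --yes --cleanup-gateway && nemoclaw onboard --gpu` |
| multiple sandboxes | each sandbox listed with `destroy --yes`, only the last gets `--cleanup-gateway` |

Test plan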
22 tests pass (6 new + 16 existing).

Notes for reviewers

- CodeRabbit feedback on the closed PR #3464 asked for the literal `<name>` placeholder to be forbidden in tests via negative assertion. This PR adopts that refinement.
- `runOptional` extension is backwards-compatible: existing callers without `onSkip` get the original wording.

Closes #3456 once merged.
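To illustrate the backwards-compatibility note on `runOptional`, here is a minimal sketch of an optional `onSkip` override. The real signature in `src/lib/actions/uninstall/run-plan.ts` is not shown in this PR page, so the parameter list, the boolean return convention of `action`, and the string-typed `onSkip` are all assumptions; only the two message wordings are taken from the description.

```typescript
// Hypothetical options shape; only the `onSkip` name comes from the PR text.
interface RunOptionalOptions {
  // Message to log when the action turns out to be a no-op.
  onSkip?: string;
}

// Assumed convention: `action` returns true when it did work, false on a no-op.
function runOptional(
  label: string,
  action: () => boolean,
  log: (message: string) => void,
  options: RunOptionalOptions = {},
): void {
  if (action()) {
    log(label);
    return;
  }
  // Callers that do not pass `onSkip` keep the original "<label> skipped" wording.
  log(options.onSkip ?? `${label} skipped`);
}

// Usage: the gateway destroy step opts into the clearer no-op wording.
const messages: string[] = [];
runOptional(
  "Destroyed gateway 'nemoclaw'",
  () => false, // simulate a gateway that is already gone
  (message) => {
    messages.push(message);
  },
  { onSkip: "Gateway 'nemoclaw' already removed or unreachable" },
);
```

Making the override opt-in keeps every other `runOptional` call site unchanged, which is the compatibility property the note describes.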