fix(cli): clear sandbox registry when gateway is destroyed during onboard (fixes: #532)#634
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughAdds Changes
Sequence Diagram(s)(Skipped — changes do not introduce a new multi-component control flow requiring visualization.) Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
Thanks for submitting this proposed fix to clear the sandbox registry when a gateway is destroyed, which may help prevent issues with stale entries and improve the overall user experience. |
|
Rebased onto current Happy to re-do anything if needed on your end. |
…oard Implemented feature with help from Claude Code When onboard detects a stale gateway or starts a fresh one, it runs `openshell gateway destroy` which deletes all sandboxes in OpenShell. However the local NemoClaw registry (~/.nemoclaw/sandboxes.json) was not cleared, leaving stale entries that caused `nemoclaw list` to show sandboxes that no longer exist and `nemoclaw <name> connect` to fail with "sandbox not found". - Add `clearAll()` to registry module to reset all sandbox entries. - Call `registry.clearAll()` after gateway destroy in both preflight cleanup and startGateway (covers all destroy paths). - Add 3 tests for clearAll: multi-sandbox clear, disk persistence, and idempotent call on empty registry. Fixes NVIDIA#532 Signed-off-by: Craig <craig@epic28.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5dc2102 to
5b21901
Compare
|
@wscurran Thanks for the feedback! Just rebased onto current main — everything merges cleanly. When you get a chance, would you mind giving it a formal review? |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
bin/lib/registry.js (1)
240-243: Missing lock acquisition may cause race condition with concurrent registry operations.The
clearAll()function directly callssave()without usingwithLock(), unlike all other mutating functions in this module (registerSandbox,updateSandbox,removeSandbox,setDefault). If another process performs a read-modify-write operation concurrently, the clear could be silently lost.In practice,
clearAll()is only called during gateway destruction in onboarding, where concurrent modifications are unlikely. However, for consistency and defensive coding, wrapping the call inwithLock()is safer.♻️ Proposed fix to add lock
function clearAll() { - save({ sandboxes: {}, defaultSandbox: null }); + withLock(() => { + save({ sandboxes: {}, defaultSandbox: null }); + }); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/registry.js` around lines 240 - 243, The clearAll() function calls save() directly and should acquire the same registry lock as other mutating functions to avoid races; update clearAll to call withLock() and perform save({ sandboxes: {}, defaultSandbox: null }) inside the locked callback (mirroring registerSandbox/updateSandbox/removeSandbox/setDefault patterns) so the registry mutation is atomic under the same lock used by other operations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@bin/lib/registry.js`:
- Around line 240-243: The clearAll() function calls save() directly and should
acquire the same registry lock as other mutating functions to avoid races;
update clearAll to call withLock() and perform save({ sandboxes: {},
defaultSandbox: null }) inside the locked callback (mirroring
registerSandbox/updateSandbox/removeSandbox/setDefault patterns) so the registry
mutation is atomic under the same lock used by other operations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7aa1efdf-2cab-4702-923e-a73d12c14e85
📒 Files selected for processing (3)
bin/lib/onboard.jsbin/lib/registry.jstest/registry.test.js
✅ Files skipped from review due to trivial changes (1)
- test/registry.test.js
…tations Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ript modules (#1240) ## Summary - Extract ~210 lines of pure, side-effect-free functions from the 3,800-line `onboard.js` into **5 typed TypeScript modules** under `src/lib/`: - `gateway-state.ts` — gateway/sandbox state classification from openshell output - `validation.ts` — failure classification, API key validation, model ID checks - `url-utils.ts` — URL normalization, text compaction, env formatting - `build-context.ts` — Docker build context filtering, recovery hints - `dashboard.ts` — dashboard URL resolution and construction - Add **56 co-located unit tests** (`src/lib/*.test.ts`) for the extracted modules - Set up CLI TypeScript compilation: `tsconfig.src.json` compiles `src/` → `dist/` as CJS - `onboard.js` imports from compiled `dist/lib/` output — transparent to callers - Pre-commit hook updated to build TS and include `dist/lib/` in coverage These functions are **not touched by any #924 blocker PR** (#781, #782, #819, #672, #634, #890), so this extraction is safe to land immediately. ## Test plan - [x] 598 CLI tests pass (542 existing + 56 new) - [x] `tsc -p tsconfig.src.json` compiles cleanly - [x] `tsc -p tsconfig.cli.json` type-checks cleanly - [x] `tsc -p jsconfig.json` type-checks cleanly - [x] Coverage ratchet passes with `dist/lib/` included Closes #1237. Relates to #924 (shell consolidation). 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Improved sandbox-creation recovery hints and targeted remediation commands. * Smarter dashboard URL resolution and control-UI URL construction. * **Bug Fixes** * More accurate gateway and sandbox state detection. * Enhanced classification of validation/apply failures and safer model/key validation. * Better provider URL normalization and loopback handling. * **Tests** * Added comprehensive tests covering new utilities. * **Chores** * CLI now builds before CLI tests; CI/commit hooks updated to run the CLI build. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ript modules (#1240) ## Summary - Extract ~210 lines of pure, side-effect-free functions from the 3,800-line `onboard.js` into **5 typed TypeScript modules** under `src/lib/`: - `gateway-state.ts` — gateway/sandbox state classification from openshell output - `validation.ts` — failure classification, API key validation, model ID checks - `url-utils.ts` — URL normalization, text compaction, env formatting - `build-context.ts` — Docker build context filtering, recovery hints - `dashboard.ts` — dashboard URL resolution and construction - Add **56 co-located unit tests** (`src/lib/*.test.ts`) for the extracted modules - Set up CLI TypeScript compilation: `tsconfig.src.json` compiles `src/` → `dist/` as CJS - `onboard.js` imports from compiled `dist/lib/` output — transparent to callers - Pre-commit hook updated to build TS and include `dist/lib/` in coverage These functions are **not touched by any #924 blocker PR** (#781, #782, #819, #672, #634, #890), so this extraction is safe to land immediately. ## Test plan - [x] 598 CLI tests pass (542 existing + 56 new) - [x] `tsc -p tsconfig.src.json` compiles cleanly - [x] `tsc -p tsconfig.cli.json` type-checks cleanly - [x] `tsc -p jsconfig.json` type-checks cleanly - [x] Coverage ratchet passes with `dist/lib/` included Closes #1237. Relates to #924 (shell consolidation). 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Improved sandbox-creation recovery hints and targeted remediation commands. * Smarter dashboard URL resolution and control-UI URL construction. * **Bug Fixes** * More accurate gateway and sandbox state detection. * Enhanced classification of validation/apply failures and safer model/key validation. * Better provider URL normalization and loopback handling. * **Tests** * Added comprehensive tests covering new utilities. * **Chores** * CLI now builds before CLI tests; CI/commit hooks updated to run the CLI build. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Closing in favor of #1245, which reimplements this fix on top of current main. The original approach here has gone stale — #1245 covers the same registry cleanup with CodeRabbit feedback addressed. Thanks @craigamcw for the original work. |
Implemented feature with help from Claude Code
When onboard detects a stale gateway or starts a fresh one, it runs
openshell gateway destroywhich deletes all sandboxes in OpenShell. However the local NemoClaw registry (~/.nemoclaw/sandboxes.json) was not cleared, leaving stale entries that causednemoclaw listto show sandboxes that no longer exist andnemoclaw <name> connectto fail with "sandbox not found".Summary
Clear the local NemoClaw sandbox registry when a stale gateway is destroyed during onboard preflight. Previously, openshell gateway destroy deleted sandboxes in OpenShell but left stale entries in ~/.nemoclaw/sandboxes.json, causing nemoclaw list to show sandboxes that no longer exist.
Related Issue
#532
Changes
clearAll()to registry module to reset all sandbox entries.registry.clearAll()after gateway destroy in both preflight cleanup and startGateway (covers all destroy paths).Type of Change
Testing
make checkpasses.npm testpasses.make docsbuilds without warnings. (for doc-only changes)Checklist
General
Code Changes
make formatapplied (TypeScript and Python).Doc Changes
update-docsagent skill to draft changes while complying with the style guide. For example, prompt your agent with "/update-docscatch up the docs for the new changes I made in this PR."Summary by CodeRabbit
Bug Fixes
Tests