
fix(cli): clear sandbox registry when gateway is destroyed during onboard (fixes: #532)#634

Closed
craigamcw wants to merge 4 commits intoNVIDIA:mainfrom
craigamcw:fix/clear-registry-on-gateway-destroy-532

Conversation

@craigamcw
Contributor

@craigamcw craigamcw commented Mar 22, 2026

Implemented feature with help from Claude Code

When onboard detects a stale gateway or starts a fresh one, it runs `openshell gateway destroy`, which deletes all sandboxes in OpenShell. However, the local NemoClaw registry (`~/.nemoclaw/sandboxes.json`) was not cleared, leaving stale entries that caused `nemoclaw list` to show sandboxes that no longer exist and `nemoclaw <name> connect` to fail with "sandbox not found".

Summary

Clear the local NemoClaw sandbox registry when a stale gateway is destroyed during onboard preflight. Previously, `openshell gateway destroy` deleted sandboxes in OpenShell but left stale entries in `~/.nemoclaw/sandboxes.json`, causing `nemoclaw list` to show sandboxes that no longer exist.

Related Issue

#532

Changes

  • Add `clearAll()` to registry module to reset all sandbox entries.
  • Call `registry.clearAll()` after gateway destroy in both preflight cleanup and startGateway (covers all destroy paths).
  • Add 3 tests for `clearAll()`: multi-sandbox clear, disk persistence, and idempotent call on empty registry.

Type of Change

  • Code change for a new feature, bug fix, or refactor.
  • Code change with doc updates.
  • Doc only. Prose changes without code sample modifications.
  • Doc only. Includes code sample changes.

Testing

  • make check passes.
  • npm test passes.
  • make docs builds without warnings. (for doc-only changes)

Checklist

General

Code Changes

  • make format applied (TypeScript and Python).
  • Tests added or updated for new or changed behavior.
  • No secrets, API keys, or credentials committed.
  • Doc pages updated for any user-facing behavior changes (new commands, changed defaults, new features, bug fixes that contradict existing docs).

Doc Changes

  • Follows the style guide. Try running the update-docs agent skill to draft changes while complying with the style guide. For example, prompt your agent with "/update-docs catch up the docs for the new changes I made in this PR."
  • New pages include SPDX license header and frontmatter, if creating a new page.
  • Cross-references and links verified.

Summary by CodeRabbit

  • Bug Fixes

    • Improved cleanup: the local sandbox registry is now fully cleared when gateways are removed or replaced and during startup, preventing stale sandbox entries from persisting after restarts or removals.
  • Tests

    • Added tests verifying registry reset behavior, that persistent storage is updated to an empty state, and that clearing is safe/idempotent when the registry is already empty.

@coderabbitai
Contributor

coderabbitai bot commented Mar 22, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
📥 Commits

Reviewing files that changed from the base of the PR and between c819df6 and c8ceb6d.

📒 Files selected for processing (1)
  • bin/lib/onboard.js
✅ Files skipped from review due to trivial changes (1)
  • bin/lib/onboard.js

Walkthrough

Adds registry.clearAll() to wipe persisted sandbox registry and calls it from bin/lib/onboard.js when a prior nemoclaw gateway is destroyed; includes Vitest cases ensuring clearAll() clears in-memory state, updates the on-disk registry, and is idempotent.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Registry Module: `bin/lib/registry.js` | Adds and exports `clearAll()`, which acquires the registry lock and saves an empty state (`{ sandboxes: {}, defaultSandbox: null }`); adds JSDoc comments to existing functions. |
| Onboard Script: `bin/lib/onboard.js` | Calls `registry.clearAll()` during stale/unnamed gateway cleanup in `preflight()` and when destroying a stale gateway in `startGatewayWithOptions()`; updates inline comments to document the registry-clearing behavior. |
| Tests: `test/registry.test.js` | Adds three Vitest cases validating that `clearAll()` removes all sandboxes and resets the default, persists an empty `sandboxes.json`, and is safe/idempotent when the registry is already empty. |
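
The ordering described for the onboard script (destroy the gateway, then clear the local registry) can be sketched roughly as below. This is an illustration under stated assumptions, not the code in `bin/lib/onboard.js`: `destroyStaleGateway`, the injected `run` function, and the fake registry are hypothetical names used so the example runs without OpenShell installed. Only the `openshell gateway destroy` command and the `registry.clearAll()` call come from the PR description.

```javascript
// Sketch only: clear the local registry right after the gateway is destroyed.
// `run` and `registry` are injected so nothing here shells out for real.
function destroyStaleGateway({ run, registry }) {
  // Destroying the gateway deletes every sandbox it hosted in OpenShell...
  run("openshell", ["gateway", "destroy"]);
  // ...so the local registry must be emptied to avoid listing dead sandboxes.
  registry.clearAll();
}

// Fakes that record what happened, standing in for the real CLI and registry.
const calls = [];
const fakeRun = (cmd, args) => calls.push([cmd, ...args].join(" "));
const fakeRegistry = {
  state: { sandboxes: { alpha: { name: "alpha" } }, defaultSandbox: "alpha" },
  clearAll() {
    calls.push("registry.clearAll");
    this.state = { sandboxes: {}, defaultSandbox: null };
  },
};

destroyStaleGateway({ run: fakeRun, registry: fakeRegistry });

console.log(calls);
console.log(fakeRegistry.state);
```

Whether the registry should also be cleared when the destroy command fails is a design question this sketch does not settle; here a throwing `run` would skip the clear.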


Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I hopped through sandboxes, one by one,
When gateways faded, my tidy job begun.
I nudged the registry, left defaults null,
Clean traces behind — calm, neat, and full.
Tiny paws, steady work, registry done.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding registry clearing when a gateway is destroyed during onboarding, directly addressing the issue. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@wscurran wscurran added the bug and NemoClaw CLI labels Mar 23, 2026
@wscurran
Contributor

Thanks for submitting this proposed fix to clear the sandbox registry when a gateway is destroyed, which may help prevent issues with stale entries and improve the overall user experience.

@craigamcw
Contributor Author

Rebased onto current main — merges cleanly locally (fast-forward). GitHub's merge check is showing a conflict but git merge-tree confirms no actual conflicts in the changed files (bin/lib/onboard.js, bin/lib/registry.js, test/registry.test.js).

Happy to re-do anything if needed on your end.

craigamcw and others added 2 commits April 1, 2026 07:52
…oard

Implemented feature with help from Claude Code

When onboard detects a stale gateway or starts a fresh one, it runs
`openshell gateway destroy` which deletes all sandboxes in OpenShell.
However the local NemoClaw registry (~/.nemoclaw/sandboxes.json) was
not cleared, leaving stale entries that caused `nemoclaw list` to show
sandboxes that no longer exist and `nemoclaw <name> connect` to fail
with "sandbox not found".

- Add `clearAll()` to registry module to reset all sandbox entries.
- Call `registry.clearAll()` after gateway destroy in both preflight
  cleanup and startGateway (covers all destroy paths).
- Add 3 tests for clearAll: multi-sandbox clear, disk persistence, and
  idempotent call on empty registry.

Fixes NVIDIA#532

Signed-off-by: Craig <craig@epic28.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@craigamcw craigamcw force-pushed the fix/clear-registry-on-gateway-destroy-532 branch from 5dc2102 to 5b21901 Compare April 1, 2026 06:53
@craigamcw
Contributor Author

@wscurran Thanks for the feedback! Just rebased onto current main — everything merges cleanly. When you get a chance, would you mind giving it a formal review?

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
bin/lib/registry.js (1)

240-243: Missing lock acquisition may cause race condition with concurrent registry operations.

The clearAll() function directly calls save() without using withLock(), unlike all other mutating functions in this module (registerSandbox, updateSandbox, removeSandbox, setDefault). If another process performs a read-modify-write operation concurrently, the clear could be silently lost.

In practice, clearAll() is only called during gateway destruction in onboarding, where concurrent modifications are unlikely. However, for consistency and defensive coding, wrapping the call in withLock() is safer.

♻️ Proposed fix to add lock
 function clearAll() {
-  save({ sandboxes: {}, defaultSandbox: null });
+  withLock(() => {
+    save({ sandboxes: {}, defaultSandbox: null });
+  });
 }
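
To make the suggestion concrete, here is one way a `withLock()` wrapper around `clearAll()` might behave. The module's actual `withLock()` implementation is not shown anywhere in this thread, so the directory-based lock, the retry parameters, and the `sleepSync` helper below are all assumptions chosen to keep the sketch self-contained.

```javascript
// Sketch only: a directory-based lock guarding a registry write.
// fs.mkdirSync is atomic, so only one process can create the lock directory.
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "registry-lock-sketch-"));
const REGISTRY_FILE = path.join(dir, "sandboxes.json");
const LOCK_DIR = path.join(dir, "sandboxes.lock");

// Block the current thread for `ms` milliseconds without spinning.
function sleepSync(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Hypothetical lock helper: retry a bounded number of times, then give up.
function withLock(fn, { retries = 50, delayMs = 20 } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      fs.mkdirSync(LOCK_DIR); // throws EEXIST while another holder exists
      break;
    } catch (err) {
      if (err.code !== "EEXIST" || attempt >= retries) throw err;
      sleepSync(delayMs);
    }
  }
  try {
    return fn();
  } finally {
    fs.rmdirSync(LOCK_DIR); // always release, even if fn throws
  }
}

function save(state) {
  fs.writeFileSync(REGISTRY_FILE, JSON.stringify(state, null, 2));
}

// The write happens only while the lock is held, like other mutators.
function clearAll() {
  withLock(() => {
    save({ sandboxes: {}, defaultSandbox: null });
  });
}

save({ sandboxes: { alpha: { name: "alpha" } }, defaultSandbox: "alpha" });
clearAll();
console.log(JSON.parse(fs.readFileSync(REGISTRY_FILE, "utf8")));
console.log(fs.existsSync(LOCK_DIR));
```

Taking the lock means a concurrent read-modify-write by another process either finishes before the clear or starts after it, so the clear cannot be silently overwritten by a stale snapshot.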
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/registry.js` around lines 240 - 243, The clearAll() function calls
save() directly and should acquire the same registry lock as other mutating
functions to avoid races; update clearAll to call withLock() and perform save({
sandboxes: {}, defaultSandbox: null }) inside the locked callback (mirroring
registerSandbox/updateSandbox/removeSandbox/setDefault patterns) so the registry
mutation is atomic under the same lock used by other operations.

ℹ️ Review info
📥 Commits

Reviewing files that changed from the base of the PR and between 5dc2102 and 5b21901.

📒 Files selected for processing (3)
  • bin/lib/onboard.js
  • bin/lib/registry.js
  • test/registry.test.js
✅ Files skipped from review due to trivial changes (1)
  • test/registry.test.js

…tations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cv added a commit that referenced this pull request Apr 1, 2026
…ript modules (#1240)

## Summary

- Extract ~210 lines of pure, side-effect-free functions from the
  3,800-line `onboard.js` into **5 typed TypeScript modules** under
  `src/lib/`:
  - `gateway-state.ts`: gateway/sandbox state classification from
    openshell output
  - `validation.ts`: failure classification, API key validation, model ID
    checks
  - `url-utils.ts`: URL normalization, text compaction, env formatting
  - `build-context.ts`: Docker build context filtering, recovery hints
  - `dashboard.ts`: dashboard URL resolution and construction
- Add **56 co-located unit tests** (`src/lib/*.test.ts`) for the
  extracted modules
- Set up CLI TypeScript compilation: `tsconfig.src.json` compiles `src/`
  → `dist/` as CJS
- `onboard.js` imports from compiled `dist/lib/` output, transparent to
  callers
- Pre-commit hook updated to build TS and include `dist/lib/` in
  coverage

These functions are **not touched by any #924 blocker PR** (#781, #782,
#819, #672, #634, #890), so this extraction is safe to land immediately.

## Test plan

- [x] 598 CLI tests pass (542 existing + 56 new)
- [x] `tsc -p tsconfig.src.json` compiles cleanly
- [x] `tsc -p tsconfig.cli.json` type-checks cleanly
- [x] `tsc -p jsconfig.json` type-checks cleanly
- [x] Coverage ratchet passes with `dist/lib/` included

Closes #1237. Relates to #924 (shell consolidation).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Improved sandbox-creation recovery hints and targeted remediation
    commands.
  * Smarter dashboard URL resolution and control-UI URL construction.

* **Bug Fixes**
  * More accurate gateway and sandbox state detection.
  * Enhanced classification of validation/apply failures and safer
    model/key validation.
  * Better provider URL normalization and loopback handling.

* **Tests**
  * Added comprehensive tests covering new utilities.

* **Chores**
  * CLI now builds before CLI tests; CI/commit hooks updated to run the
    CLI build.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
laitingsheng pushed a commit that referenced this pull request Apr 2, 2026
…ript modules (#1240)

@cv
Contributor

cv commented Apr 2, 2026

Closing in favor of #1245, which reimplements this fix on top of current main. The original approach here has gone stale — #1245 covers the same registry cleanup with CodeRabbit feedback addressed. Thanks @craigamcw for the original work.

@cv cv closed this Apr 2, 2026
