fix(cli): resolve sandbox env for global status service lookup#4756
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe PR integrates environment-variable based default sandbox resolution into inventory status flows. It adds ChangesDefault Sandbox Resolution Integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/inventory/index.test.ts`:
- Around line 546-565: The test for SANDBOX_NAME should also isolate
higher-precedence environment variables NEMOCLAW_SANDBOX_NAME and
NEMOCLAW_SANDBOX to avoid flakiness: in the test wrapping the call to
showStatusCommand (the "resolves service status sandbox from SANDBOX_NAME env"
case), save current values of process.env.NEMOCLAW_SANDBOX_NAME and
process.env.NEMOCLAW_SANDBOX before setting process.env.SANDBOX_NAME, then clear
or delete those NemoClaw variables for the duration of the test, call
showStatusCommand as before, and finally restore the original
NEMOCLAW_SANDBOX_NAME and NEMOCLAW_SANDBOX values in the same finally block
along with SANDBOX_NAME to ensure the environment is fully restored.
In `@src/lib/inventory/index.ts`:
- Around line 403-404: The code computes resolvedDefault via
resolveDefaultSandboxName(deps.listSandboxes) but later uses raw defaultSandbox
for messaging health/logs, causing inconsistent sandbox selection; update all
code paths in showStatusCommand and the related health/logs branches (where
defaultSandbox is used — e.g., the blocks handling messaging health/logs and the
sandboxes length branch) to use resolvedDefault instead of defaultSandbox so the
same resolved sandbox is used for both table/service output and health/logs;
ensure any variable passed into functions that fetch health or logs (messaging
health, messaging logs) receives resolvedDefault and keep
resolveDefaultSandboxName(deps.listSandboxes) as the single source of truth.
- Around line 365-367: The destructured variable defaultSandbox in
getStatusReport is unused; update the destructuring of deps.listSandboxes() to
either remove defaultSandbox or rename it to _defaultSandbox to satisfy lint
rules, ensuring resolvedDefault continues to use
resolveDefaultSandboxName(deps.listSandboxes) and liveInference calculation
(sandboxes/getLiveInference) remains unchanged; adjust the const line that
currently declares { sandboxes, defaultSandbox } accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e8a8385b-b28a-457b-9889-3b71eb04cbd2
📒 Files selected for processing (2)
src/lib/inventory/index.test.tssrc/lib/inventory/index.ts
Manual repro (DGX / WSL — matches #1077 reports)
* On builds after host telegram-bridge removal (PR #1081), status host lines are cloudflared-only; the PID-dir bug class is the same. Out of scope for this PR: early |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/tunnel/services.test.ts`:
- Around line 149-159: The tests fail when an external NEMOCLAW_SANDBOX env var
interferes; update the suite to snapshot and restore the actual NEMOCLAW_SANDBOX
env var (not just NEMOCLAW_SANDBOX_NAME) so resolveDefaultSandboxName() uses the
expected env during tests: create a savedNemoclawSandbox (or extend
savedNemoclawSandboxName) variable at top, clear process.env.NEMOCLAW_SANDBOX in
the test setup/teardown and restore it in afterEach (mirror how savedSandboxName
/ process.env.SANDBOX_NAME are handled), and apply the same change in the other
affected blocks (the other afterEach/beforeEach locations referenced) so tests
consistently control NEMOCLAW_SANDBOX.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6a001269-81c0-45dd-becc-10c5739d3b8c
📒 Files selected for processing (1)
src/lib/tunnel/services.test.ts
Align nemoclaw status with start/stop by using resolveDefaultSandboxName so host service PID dirs match SANDBOX_NAME and NEMOCLAW_SANDBOX_NAME. Fixes NVIDIA#1077 Signed-off-by: Thabhelo <50872400+Thabhelo@users.noreply.github.com>
Exercise real getServiceStatuses/showStatus against /tmp/nemoclaw-services-* dirs to prove status must pass env-resolved sandbox names, not registry only. Signed-off-by: Thabhelo <50872400+Thabhelo@users.noreply.github.com>
Use resolvedDefault for messaging health/logs, harden env isolation in tests, and gitignore local contribution-research notes. Signed-off-by: Thabhelo <50872400+Thabhelo@users.noreply.github.com>
9ef88e6 to
2e45d41
Compare
Fixes strict typecheck failure in CI checks job. Signed-off-by: Thabhelo <50872400+Thabhelo@users.noreply.github.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
|
✨ Thanks for submitting this detailed PR about fixing the sandbox env resolution for global status service lookup, which improves the consistency of the status and health flows. This proposes a bug fix in the CLI that affects the sandbox management and error reporting. Related open issues: |
prekshivyas
left a comment
There was a problem hiding this comment.
Approving. Verified the fix resolves #1077 correctly:
- Both status paths (
getStatusReport,showStatusCommand) now useresolveDefaultSandboxNameconsistently — including the messaging-bridge-health and gateway-log calls — and it's the same resolverrunStartCommand/runStopCommanduse (NEMOCLAW_SANDBOX_NAME ?? NEMOCLAW_SANDBOX ?? SANDBOX_NAME, regex-validated, else the registry default). So globalstatus/status --jsonnow honor the env override exactly like start/stop, and the no-env case is unchanged (just hardened with the safe-name check).
A couple of non-blocking notes:
- The remaining raw-default use is in
getSandboxInventory(line ~202), which backsnemoclaw list— a separate command fromstatus. Leavingliston the registry default looks like an intentional scope boundary; flag only so it's a conscious decision. (This is what CodeRabbit's "consistency across status paths" comment is pointing at.) - The new tests (
index.test.ts:645,services.test.ts:162) don't snapshot/clear the higher-precedence env vars, so they could go flaky on a runner whereNEMOCLAW_SANDBOX/NEMOCLAW_SANDBOX_NAMEis set. CI is green today; worth abeforeEach/afterEachsave-restore to harden them.
Neither blocks merge. Nice, well-scoped fix.
|
Thanks @prekshivyas for the review and approval, glad the fix lines up with #1077. I've noted the two non-blocking items. Will address after this is merged. |
## Summary Follow-up to #4756 / #1077: `nemoclaw list` and inventory JSON now resolve the default sandbox from `SANDBOX_NAME` / `NEMOCLAW_SANDBOX_NAME` / `NEMOCLAW_SANDBOX`, matching `status`, `start`, and `stop`. Also consolidates #1077 test env isolation into shared `beforeEach`/`afterEach` hooks. ## Related Issue Related to #1077 (follow-up to merged #4756) ## Changes - Wire `resolveDefaultSandboxName()` into `getSandboxInventory()` for `defaultSandbox` and `isDefault` rows - Add list/inventory tests for env-resolved default sandbox marker (`*`) - Refactor `#1077` inventory tests into a nested describe with shared env save/restore - Add `beforeEach` env clearing in `services.test.ts` `#1077` describe block ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [ ] `npx prek run --all-files` passes - [x] `npm test` passes (targeted: `src/lib/inventory/index.test.ts`, `src/lib/tunnel/services.test.ts`, including `-t '#1077'`) - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [ ] Docs updated for user-facing behavior changes - [ ] `npm run docs` builds without warnings (doc changes only) - [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) --- Signed-off-by: Thabhelo <50872400+Thabhelo@users.noreply.github.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * Improved test organization and environment variable cleanup for better test reliability. * **Refactor** * Enhanced internal sandbox resolution logic for improved consistency. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: Thabhelo <50872400+Thabhelo@users.noreply.github.com>
## Summary - Adds the `v0.0.60` section to `docs/about/release-notes.mdx` using the dev announcement from discussion #4877. - Fills the source-doc gaps found during release-prep review across inference, policy tiers, command behavior, security boundaries, Hermes dashboard/tooling, runtime context, and troubleshooting. - Refreshes generated agent skills under `.agents/skills/` from the current Fern docs output and upgrades Fern from `5.44.3` to `5.45.0`. ## Source summary - #4037 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents system-only runtime context that stays out of visible chat. - #4875 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents try-first sandbox network/filesystem guidance and clearer failure classification. - #4788 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents shared OpenClaw device-approval policy for startup and connect. - #4768 -> `docs/reference/network-policies.mdx`, `docs/network-policy/integration-policy-examples.mdx`, `docs/get-started/quickstart.mdx`, `docs/get-started/quickstart-hermes.mdx`, `docs/reference/commands.mdx`: Documents `weather`, `public-reference`, and Hermes managed-tool gateway preset behavior. - #3788 and #4864 -> `docs/reference/network-policies.mdx`, `docs/reference/commands.mdx`: Documents non-interactive policy-tier fail-fast behavior and interactive prompt fallback. - #4756 and #4866 -> `docs/reference/commands.mdx`: Documents env-aware default sandbox resolution for `list`, `status`, and `tunnel` commands. - #4320 -> `docs/reference/commands.mdx`: Documents `$$nemoclaw tunnel status` behavior. - #4328 -> `docs/reference/commands.mdx`: Documents line-scoped policy preset descriptions in `policy-list`. - #4580 and #4748 -> `docs/reference/architecture.mdx`: Documents package-managed OpenShell gateway service and Docker-driver gateway-marker behavior. - #4598 -> `docs/manage-sandboxes/lifecycle.mdx`: Documents concurrent gateway/dashboard cleanup isolation by sandbox name and port. - #4777 -> `docs/reference/troubleshooting.mdx`: Documents Docker GPU patch rollback behavior. - #4610 -> `docs/reference/troubleshooting.mdx`, `docs/reference/commands.mdx`: Keeps mutable OpenClaw config permission guidance aligned and removes skipped experimental wording. - #4868 -> `docs/reference/commands.mdx`: Keeps `.dockerignore` handling for custom `onboard --from <Dockerfile>` contexts in generated skills. - #4870 -> `docs/reference/commands.mdx`, `docs/manage-sandboxes/runtime-controls.mdx`: Documents `NEMOCLAW_MINIMAL_BOOTSTRAP` and generated skill coverage. - #4641 -> `docs/inference/inference-options.mdx`, `docs/reference/troubleshooting.mdx`: Documents local NVIDIA NIM platform-digest pulls and served-model id adoption. - #4810 and #4867 -> `docs/inference/inference-options.mdx`: Documents stable NGC managed-vLLM image lineage and DGX Station DeepSeek V4 Flash coverage. - #4852 -> `docs/inference/use-local-inference.mdx`, `docs/reference/troubleshooting.mdx`: Documents Ollama model fit filtering, 16K context floor, cold-load retry, and failed-model exclusion. - #4847 -> `docs/inference/switch-inference-providers.mdx`: Documents API-family sync, Hermes `api_mode`, and Bedrock Runtime exception. - #4800 -> `docs/inference/tool-calling-reliability.mdx`: Documents Nemotron managed-inference native tool-search fallback. - #4333 -> `docs/inference/switch-inference-providers.mdx`: Documents interactive multimodal input prompting. - #4086 -> `docs/reference/troubleshooting.mdx`: Keeps proxy bypass normalization in generated troubleshooting coverage. - #4811 and #4855 -> `docs/get-started/quickstart-hermes.mdx`: Documents prebuilt Hermes dashboard assets and TUI recovery without runtime rebuilds. - #4854 -> `docs/inference/switch-inference-providers.mdx`, `docs/reference/commands.mdx`: Documents Hermes proxy API-key placeholder preservation during inference switches. - #4248 -> `docs/manage-sandboxes/messaging-channels.mdx`, `.agents/skills/`: Keeps messaging enrollment behavior aligned with manifest-hook implementation. - #4771 -> `docs/security/best-practices.mdx`, `docs/security/credential-storage.mdx`: Documents Hermes placeholder-only secret boundary for sandbox-visible runtime files. - #4787 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents expanded memory scanner examples for OpenAI project keys and Slack app-level tokens. - #4848 -> `docs/reference/commands.mdx`: Documents OpenClaw skill install mirroring into the agent home directory. - #4790 -> `docs/about/release-notes.mdx`: Uses the prior release-prep structure and generated `.agents/skills/` refresh as the template for this release. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ skills/ --prefix nemoclaw-user --doc-platform fern-mdx --dry-run` - `npm run docs` - `git diff --check` - skip-term scan across `docs/`, `.agents/skills/`, and `skills/` - `npm run build:cli` - `npm run typecheck:cli` - Commit and pre-push hook suites, including markdownlint, gitleaks, env-var docs gate, docs-to-skills verification, and skills YAML tests <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **New Features** * DeepSeek-V4-Flash now available as default inference model for DGX Station. * Hermes dashboard improved with dedicated port and OAuth-authenticated tool gateway selection. * Added weather and public-reference policy presets for expanded agent capabilities. * Enhanced Ollama model selection with GPU memory filtering and automatic retry for timeouts. * **Bug Fixes** * Improved policy tier validation to prevent invalid configurations. * Better sandbox cleanup scoping by port to prevent conflicts across deployments. * Added GPU patch failure recovery with automatic rollback. * **Documentation** * Expanded troubleshooting guides for inference, security, and sandbox lifecycle. * Added .dockerignore best practices for custom deployments. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Summary
Global
nemoclaw statusandstatus --jsonnow resolve the target sandbox the same way asnemoclaw start/stop, so host service PID lookup honorsSANDBOX_NAME,NEMOCLAW_SANDBOX, andNEMOCLAW_SANDBOX_NAMEwhen set.Fixes #1077
Changes
resolveDefaultSandboxNameinshowStatusCommandandgetStatusReportservices.test.tsuse real PID dirs under/tmp/nemoclaw-services-*Test plan
npm test -- src/lib/inventory/index.test.tsnpm test -- src/lib/tunnel/services.test.ts -t '#1077'Signed-off-by: Thabhelo 50872400+Thabhelo@users.noreply.github.com
Summary by CodeRabbit
Bug Fixes
Tests