Skip to content

fix: skip gateway marker for docker-driver sandboxes#4748

Merged
cv merged 3 commits into
NVIDIA:mainfrom
glenn-agent:fix/4710-docker-driver-healthcheck-marker
Jun 5, 2026
Merged

fix: skip gateway marker for docker-driver sandboxes#4748
cv merged 3 commits into
NVIDIA:mainfrom
glenn-agent:fix/4710-docker-driver-healthcheck-marker

Conversation

@glenn-agent
Copy link
Copy Markdown
Contributor

@glenn-agent glenn-agent commented Jun 4, 2026

Summary

Fixes #4710.

Validation

  • npm test -- --run test/nemoclaw-start.test.ts -t "gateway healthcheck marker"
  • git diff --check
  • git verify-commit HEAD

Note: running the full test/nemoclaw-start.test.ts file locally currently fails in unrelated baseline fixture cases because nemoclaw/node_modules/json5 is absent in this checkout; the targeted marker regression passes.

Signed-off-by: Glenn-Agent glenn_agent@163.com

Summary by CodeRabbit

Bug Fixes

  • Fixed gateway healthcheck behavior to correctly handle different driver modes, ensuring proper health status detection when running under Docker drivers versus standard configurations.

Signed-off-by: Glenn-Agent <glenn_agent@163.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Jun 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 4, 2026

Review Change Stack

Warning

Review limit reached

@cv, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 7 minutes and 12 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1be72f4d-b8b1-4906-902a-4ef7f97d1072

📥 Commits

Reviewing files that changed from the base of the PR and between 1b6b1b3 and 343ca83.

📒 Files selected for processing (2)
  • scripts/nemoclaw-start.sh
  • test/nemoclaw-start.test.ts
📝 Walkthrough

Walkthrough

This PR fixes a bug where the /tmp/nemoclaw-gateway-local marker file is unconditionally created in Docker-driver sandboxes, preventing the Docker HEALTHCHECK from correctly detecting that the gateway runs on the host. The fix adds driver detection to gate marker creation, and extends test coverage to validate Docker-driver and mixed-driver execution paths.

Changes

Docker-driver gateway marker fix

Layer / File(s) Summary
Driver detection and marker gating logic
scripts/nemoclaw-start.sh
Detects whether the container runs under an OpenShell docker driver via OPENSHELL_DRIVERS, sets _NEMOCLAW_DOCKER_DRIVER accordingly, and gates marker creation to only proceed when NEMOCLAW_CMD is empty AND the driver is not docker.
Test refactoring and Docker-driver scenario coverage
test/nemoclaw-start.test.ts
Refactors runScenario helper to accept optional environment variables, enabling simulation of different OPENSHELL_DRIVERS values; updates test documentation to clarify Docker HEALTHCHECK behavior; adds assertions for docker and mixed docker/vm driver modes verifying marker remains absent in those paths.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4600: Both PRs change scripts/nemoclaw-start.sh's /tmp/nemoclaw-gateway-local marker emission logic, directly affecting Docker HEALTHCHECK behavior tested here.

Suggested labels

fix, Docker, Sandbox, platform: container

Suggested reviewers

  • jyaunches
  • cv

Poem

🐰 A marker file that knew no bounds,
Now checks the driver, looks around!
When docker drives, the gate stays clear,
The health check shines, the path is clear. 🚀

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly identifies the main fix: preventing the gateway marker from being created in Docker-driver sandboxes, which directly addresses issue #4710.
Linked Issues check ✅ Passed Changes correctly implement the fix for #4710: detecting Docker-driver mode via OPENSHELL_DRIVERS, gating marker creation accordingly, and extending tests to verify the behavior.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing issue #4710: modifying the marker logic and updating related tests. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
scripts/nemoclaw-start.sh (1)

218-221: ⚡ Quick win

Harden docker-driver detection against whitespace/case in OPENSHELL_DRIVERS

scripts/nemoclaw-start.sh uses a raw, case-sensitive match (case ",${OPENSHELL_DRIVERS:-}," in *,docker,*) ...) with no normalization. This repo’s own/tested values are always "docker" or "vm,docker" (no spaces), so the current behavior aligns with in-repo expectations; however, if OPENSHELL_DRIVERS ever comes in as "vm, docker" or "Docker", the match will fail and marker gating could be wrong—normalize (trim, lowercase, remove whitespace) before the case.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/nemoclaw-start.sh` around lines 218 - 221, Normalize
OPENSHELL_DRIVERS before the case match so whitespace and case won’t break
detection: create a normalized var (derived from OPENSHELL_DRIVERS) that is
lowercased and has all whitespace removed or trimmed around commas, then run the
existing case on that normalized value to set _NEMOCLAW_DOCKER_DRIVER; update
the snippet that currently uses case ",${OPENSHELL_DRIVERS:-}," in to reference
the normalized variable and keep the same pattern matching (e.g., look for
",docker,") so both "vm, docker" and "Docker" will be detected correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 218-221: Normalize OPENSHELL_DRIVERS before the case match so
whitespace and case won’t break detection: create a normalized var (derived from
OPENSHELL_DRIVERS) that is lowercased and has all whitespace removed or trimmed
around commas, then run the existing case on that normalized value to set
_NEMOCLAW_DOCKER_DRIVER; update the snippet that currently uses case
",${OPENSHELL_DRIVERS:-}," in to reference the normalized variable and keep the
same pattern matching (e.g., look for ",docker,") so both "vm, docker" and
"Docker" will be detected correctly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 85063e10-623d-4e68-b646-0925e5e317c9

📥 Commits

Reviewing files that changed from the base of the PR and between 17734b1 and 1b6b1b3.

📒 Files selected for processing (2)
  • scripts/nemoclaw-start.sh
  • test/nemoclaw-start.test.ts

@cv cv merged commit 3c0340a into NVIDIA:main Jun 5, 2026
17 checks passed
@wscurran wscurran added area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression labels Jun 5, 2026
@wscurran
Copy link
Copy Markdown
Contributor

wscurran commented Jun 5, 2026

✨ Thanks for submitting this detailed PR about skipping the gateway marker for Docker-driver sandboxes, which improves the healthcheck behavior in these environments. This proposes a bug fix for the sandbox that affects the gateway marker detection.


Related open issues:

miyoungc added a commit that referenced this pull request Jun 6, 2026
## Summary
- Adds the `v0.0.60` section to `docs/about/release-notes.mdx` using the
dev announcement from discussion #4877.
- Fills the source-doc gaps found during release-prep review across
inference, policy tiers, command behavior, security boundaries, Hermes
dashboard/tooling, runtime context, and troubleshooting.
- Refreshes generated agent skills under `.agents/skills/` from the
current Fern docs output and upgrades Fern from `5.44.3` to `5.45.0`.

## Source summary
- #4037 -> `docs/reference/architecture.mdx`,
`docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents
system-only runtime context that stays out of visible chat.
- #4875 -> `docs/reference/architecture.mdx`,
`docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents
try-first sandbox network/filesystem guidance and clearer failure
classification.
- #4788 -> `docs/security/best-practices.mdx`,
`docs/about/release-notes.mdx`: Documents shared OpenClaw
device-approval policy for startup and connect.
- #4768 -> `docs/reference/network-policies.mdx`,
`docs/network-policy/integration-policy-examples.mdx`,
`docs/get-started/quickstart.mdx`,
`docs/get-started/quickstart-hermes.mdx`, `docs/reference/commands.mdx`:
Documents `weather`, `public-reference`, and Hermes managed-tool gateway
preset behavior.
- #3788 and #4864 -> `docs/reference/network-policies.mdx`,
`docs/reference/commands.mdx`: Documents non-interactive policy-tier
fail-fast behavior and interactive prompt fallback.
- #4756 and #4866 -> `docs/reference/commands.mdx`: Documents env-aware
default sandbox resolution for `list`, `status`, and `tunnel` commands.
- #4320 -> `docs/reference/commands.mdx`: Documents `$$nemoclaw tunnel
status` behavior.
- #4328 -> `docs/reference/commands.mdx`: Documents line-scoped policy
preset descriptions in `policy-list`.
- #4580 and #4748 -> `docs/reference/architecture.mdx`: Documents
package-managed OpenShell gateway service and Docker-driver
gateway-marker behavior.
- #4598 -> `docs/manage-sandboxes/lifecycle.mdx`: Documents concurrent
gateway/dashboard cleanup isolation by sandbox name and port.
- #4777 -> `docs/reference/troubleshooting.mdx`: Documents Docker GPU
patch rollback behavior.
- #4610 -> `docs/reference/troubleshooting.mdx`,
`docs/reference/commands.mdx`: Keeps mutable OpenClaw config permission
guidance aligned and removes skipped experimental wording.
- #4868 -> `docs/reference/commands.mdx`: Keeps `.dockerignore` handling
for custom `onboard --from <Dockerfile>` contexts in generated skills.
- #4870 -> `docs/reference/commands.mdx`,
`docs/manage-sandboxes/runtime-controls.mdx`: Documents
`NEMOCLAW_MINIMAL_BOOTSTRAP` and generated skill coverage.
- #4641 -> `docs/inference/inference-options.mdx`,
`docs/reference/troubleshooting.mdx`: Documents local NVIDIA NIM
platform-digest pulls and served-model id adoption.
- #4810 and #4867 -> `docs/inference/inference-options.mdx`: Documents
stable NGC managed-vLLM image lineage and DGX Station DeepSeek V4 Flash
coverage.
- #4852 -> `docs/inference/use-local-inference.mdx`,
`docs/reference/troubleshooting.mdx`: Documents Ollama model fit
filtering, 16K context floor, cold-load retry, and failed-model
exclusion.
- #4847 -> `docs/inference/switch-inference-providers.mdx`: Documents
API-family sync, Hermes `api_mode`, and Bedrock Runtime exception.
- #4800 -> `docs/inference/tool-calling-reliability.mdx`: Documents
Nemotron managed-inference native tool-search fallback.
- #4333 -> `docs/inference/switch-inference-providers.mdx`: Documents
interactive multimodal input prompting.
- #4086 -> `docs/reference/troubleshooting.mdx`: Keeps proxy bypass
normalization in generated troubleshooting coverage.
- #4811 and #4855 -> `docs/get-started/quickstart-hermes.mdx`: Documents
prebuilt Hermes dashboard assets and TUI recovery without runtime
rebuilds.
- #4854 -> `docs/inference/switch-inference-providers.mdx`,
`docs/reference/commands.mdx`: Documents Hermes proxy API-key
placeholder preservation during inference switches.
- #4248 -> `docs/manage-sandboxes/messaging-channels.mdx`,
`.agents/skills/`: Keeps messaging enrollment behavior aligned with
manifest-hook implementation.
- #4771 -> `docs/security/best-practices.mdx`,
`docs/security/credential-storage.mdx`: Documents Hermes
placeholder-only secret boundary for sandbox-visible runtime files.
- #4787 -> `docs/security/best-practices.mdx`,
`docs/about/release-notes.mdx`: Documents expanded memory scanner
examples for OpenAI project keys and Slack app-level tokens.
- #4848 -> `docs/reference/commands.mdx`: Documents OpenClaw skill
install mirroring into the agent home directory.
- #4790 -> `docs/about/release-notes.mdx`: Uses the prior release-prep
structure and generated `.agents/skills/` refresh as the template for
this release.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ skills/
--prefix nemoclaw-user --doc-platform fern-mdx --dry-run`
- `npm run docs`
- `git diff --check`
- skip-term scan across `docs/`, `.agents/skills/`, and `skills/`
- `npm run build:cli`
- `npm run typecheck:cli`
- Commit and pre-push hook suites, including markdownlint, gitleaks,
env-var docs gate, docs-to-skills verification, and skills YAML tests

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
* DeepSeek-V4-Flash now available as default inference model for DGX
Station.
* Hermes dashboard improved with dedicated port and OAuth-authenticated
tool gateway selection.
* Added weather and public-reference policy presets for expanded agent
capabilities.
* Enhanced Ollama model selection with GPU memory filtering and
automatic retry for timeouts.

* **Bug Fixes**
  * Improved policy tier validation to prevent invalid configurations.
* Better sandbox cleanup scoping by port to prevent conflicts across
deployments.
  * Added GPU patch failure recovery with automatic rollback.

* **Documentation**
* Expanded troubleshooting guides for inference, security, and sandbox
lifecycle.
  * Added .dockerignore best practices for custom deployments.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression v0.0.60 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Ubuntu 24.04][Sandbox] Docker-driver HEALTHCHECK always (unhealthy) — marker file always created

3 participants