
fix(onboard): robustify health checks and wait for dashboard readiness #183

Merged
cv merged 2 commits into NVIDIA:main from dumko2001:fix/h13-sandbox-health-race
Mar 30, 2026

Contributor

@dumko2001 dumko2001 commented Mar 17, 2026

Rationale

The onboarding process sometimes proceeded to the next step before the dashboard service was fully ready, leading to potential connection failures.

Changes

Added a bounded retry loop that polls the dashboard with curl and waits for it to respond before continuing the setup.
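
A minimal sketch of what such a wait loop could look like, not the code merged in this PR. The URL and the 15-attempt budget come from the CodeRabbit summary further down; the function names `probeDashboard` and `waitForDashboard`, the curl flags, and the 2-second delay are assumptions made for illustration.

```javascript
const { spawnSync } = require("node:child_process");

// Block the current process for `seconds` using the system `sleep` command.
function sleep(seconds) {
  spawnSync("sleep", [String(seconds)]);
}

// One liveness probe: curl exits 0 only if the server answers without an HTTP error.
// -s silences progress output, -f turns HTTP errors (>= 400) into a non-zero exit,
// -o /dev/null discards the body, --max-time caps a single request at 2 seconds.
function probeDashboard(url) {
  const result = spawnSync("curl", ["-sf", "-o", "/dev/null", "--max-time", "2", url]);
  return result.status === 0;
}

// Poll until the probe succeeds or the attempt budget is spent.
// `probe` and `pause` are injectable so the loop can be exercised without a network.
function waitForDashboard({
  url = "http://localhost:18789/",
  attempts = 15,
  delaySeconds = 2, // assumption: the PR does not state the delay between curl checks
  probe = probeDashboard,
  pause = sleep,
} = {}) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    if (probe(url)) return true;
    // No point sleeping after the final failed attempt.
    if (attempt < attempts) pause(delaySeconds);
  }
  return false;
}
```

A caller would check the return value and either continue onboarding or report that the dashboard never came up, for example `if (!waitForDashboard()) { /* warn or abort */ }`.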

Verification Results

  • Automated Tests: Passed all 52 core tests via npm test.
  • Manual Audit: Verified that onboarding waits correctly for the dashboard.
  • Security Review: Verified no sensitive data leaks and correct permission enforcement.

Leading Standards

This PR follows the project's 'First Principles' approach, prioritizing deterministic behavior and zero-trust security defaults.

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced sandbox initialization to verify dashboard services are fully operational before proceeding, improving startup reliability.
    • Increased gateway runtime recovery polling attempts for improved fault tolerance during service recovery.
  • Chores

    • Optimized sandbox setup code structure.

wscurran added the Getting Started and enhancement: feature labels Mar 19, 2026
Contributor

cv commented Mar 21, 2026

@dumko2001 — robustifying health checks and waiting for dashboard readiness sounds like it would fix some real pain points. This needs a rebase onto the latest main before we can review though. The codebase has evolved quite a bit. Thanks for the contribution!

dumko2001 force-pushed the fix/h13-sandbox-health-race branch from 304fb65 to 0437aac March 21, 2026 21:42
Contributor

coderabbitai bot commented Mar 21, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 974e5ef9-72f1-42c3-9b91-2fea8c7c0629

📥 Commits

Reviewing files that changed from the base of the PR and between 19978a3 and 475253e.

📒 Files selected for processing (1)
  • bin/lib/onboard.js

📝 Walkthrough

Updated gateway runtime recovery polling to 10 attempts and replaced inline sleep calls with existing helper function. Added new polling loop to wait for NemoClaw dashboard web server readiness via curl checks during sandbox creation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Gateway Recovery & Sandbox Initialization (bin/lib/onboard.js) | Increased recoverGatewayRuntime() polling from 5 to 10 attempts; consolidated sleep implementation by replacing inline spawnSync("sleep", ["2"]) with sleep(2) helper; added new post-Ready polling loop (15 attempts) to verify NemoClaw dashboard web server liveness at http://localhost:18789/ using curl. |
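
To illustrate the first two items in that summary, here is a hedged sketch of a recovery poll built on a shared `sleep` helper. The body of the real `sleep` helper and of `recoverGatewayRuntime()` is not shown in this thread, so `pollGatewayRecovery`, its `isHealthy` callback, and the constants below are hypothetical stand-ins.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical shape of the shared helper; the real one in bin/lib/onboard.js may differ.
// It wraps the inline spawnSync("sleep", ["2"]) call that the summary says was replaced.
function sleep(seconds) {
  spawnSync("sleep", [String(seconds)]);
}

const RECOVERY_ATTEMPTS = 10; // raised from 5 according to the summary
const RECOVERY_DELAY_SECONDS = 2; // matches the sleep(2) call named in the summary

// Returns the 1-based attempt on which `isHealthy` first succeeded, or 0 if it never did.
// `isHealthy` is a stand-in for whatever check recoverGatewayRuntime() performs.
function pollGatewayRecovery(isHealthy, pause = sleep) {
  for (let attempt = 1; attempt <= RECOVERY_ATTEMPTS; attempt++) {
    if (isHealthy()) return attempt;
    pause(RECOVERY_DELAY_SECONDS);
  }
  return 0;
}
```

Under these assumptions, ten attempts with a 2-second pause give the gateway roughly 20 seconds to recover instead of roughly 10, so a gateway that becomes healthy on the seventh check now succeeds where the five-attempt loop would have given up.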

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Ten times now the gateway shall try,
With sleep consolidated—no need to cry,
And when the sandbox springs to life so spry,
The dashboard awaits, we check with a sigh,
Fifteen curl attempts beneath the digital sky! 🚀

dumko2001 force-pushed the fix/h13-sandbox-health-race branch from 0437aac to feeed0a March 21, 2026 21:44
@dumko2001
Contributor Author

@cv hit the rate limit

mafueee pushed a commit to mafueee/NemoClaw that referenced this pull request Mar 28, 2026
…atch a known type (NVIDIA#183)

When --provider <name> is passed and no provider with that name exists on
the server, the CLI now checks if the name is a recognized provider type
(e.g. claude, nvidia, github). If it is, the provider is auto-created
from local credentials using the same discovery flow as inferred providers.

If the name is not a recognized type, the CLI returns a clear error
instead of deferring validation to the server.

This eliminates the FailedPrecondition error that occurred when passing
--provider nvidia on a fresh cluster where no providers had been created
yet.
Contributor

@cv cv left a comment

@dumko2001 thanks!

@cv cv merged commit e650174 into NVIDIA:main Mar 30, 2026
1 check was pending
quanticsoul4772 pushed a commit to quanticsoul4772/NemoClaw that referenced this pull request Mar 30, 2026
laitingsheng pushed a commit that referenced this pull request Apr 2, 2026
#183)

Co-authored-by: Carlos Villela <cvillela@nvidia.com>
lakamsani pushed a commit to lakamsani/NemoClaw that referenced this pull request Apr 4, 2026
gemini2026 pushed a commit to gemini2026/NemoClaw that referenced this pull request Apr 14, 2026

Labels

  • enhancement: feature: Use this label to identify requests for new capabilities in NemoClaw.
  • Getting Started: Use this label to identify setup, installation, or onboarding issues.

4 participants