fix(onboard): robustify health checks and wait for dashboard readiness#183
fix(onboard): robustify health checks and wait for dashboard readiness#183cv merged 2 commits intoNVIDIA:mainfrom
Conversation
|
@dumko2001 — robustifying health checks and waiting for dashboard readiness sounds like it would fix some real pain points. This needs a rebase onto the latest main before we can review though. The codebase has evolved quite a bit. Thanks for the contribution! |
304fb65 to
0437aac
Compare
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughUpdated gateway runtime recovery polling to 10 attempts and replaced inline sleep calls with existing helper function. Added new polling loop to wait for NemoClaw dashboard web server readiness via curl checks during sandbox creation. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
0437aac to
feeed0a
Compare
|
@cv hit the rate limit |
…atch a known type (NVIDIA#183) When --provider <name> is passed and no provider with that name exists on the server, the CLI now checks if the name is a recognized provider type (e.g. claude, nvidia, github). If it is, the provider is auto-created from local credentials using the same discovery flow as inferred providers. If the name is not a recognized type, the CLI returns a clear error instead of deferring validation to the server. This eliminates the FailedPrecondition error that occurred when passing --provider nvidia on a fresh cluster where no providers had been created yet.
NVIDIA#183) Co-authored-by: Carlos Villela <cvillela@nvidia.com>
#183) Co-authored-by: Carlos Villela <cvillela@nvidia.com>
NVIDIA#183) Co-authored-by: Carlos Villela <cvillela@nvidia.com>
NVIDIA#183) Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Rationale
The onboarding process sometimes proceeded to the next step before the dashboard service was fully ready, leading to potential connection failures.
Changes
Added a robust retry loop that uses
curlto wait for the dashboard to become responsive before continuing the setup.Verification Results
npm test.Leading Standards
This PR follows the project's 'First Principles' approach, prioritizing deterministic behavior and zero-trust security defaults.
Summary by CodeRabbit
Bug Fixes
Chores