fix(cli): restore pre-#2398 gateway recovery (fixes E2E hangs) #2471
Reverts the `nemoclaw.ts` changes from #2398 (dashboard delivery chain refactor), which introduced hangs in `nemoclaw status` that cause sandbox-survival, skip-permissions, and sandbox-operations E2E failures.

Restores the original implementations of:
- `isSandboxGatewayRunning` (probes via `curl -sf`, not `http_code`)
- `recoverSandboxProcesses` (direct SSH recovery, no CORS/download)
- `ensureSandboxPortForward` (simple forward stop/start)
- `checkAndRecoverSandboxProcesses` (uses the above, no dashboard chain)

The dashboard-contract, dashboard-health, and dashboard-recover modules from #2398 are left in place since `onboard.ts` depends on them. Only the `nemoclaw.ts` consumer (status/recovery path) is reverted.

Bisect evidence confirmed #2398 as the sole culprit (run 24921496960).
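To make the "probes via `curl -sf`" point concrete, here is a minimal sketch of an exit-code-based probe with a hard time cap, so a stuck gateway cannot hang the caller. It is not the code in `nemoclaw.ts`: the `Runner` type, the `CommandResult` shape, the function names, and the example URL are all hypothetical, and the command runner is injected so the logic can be exercised without a real sandbox.

```typescript
// Hypothetical result shape for a finished command; not taken from nemoclaw.ts.
interface CommandResult {
  exitCode: number | null; // null when the process was killed (for example on timeout)
  timedOut: boolean;
}

// Hypothetical injected runner: executes a command with a wall-clock limit.
type Runner = (cmd: string, args: string[], timeoutMs: number) => CommandResult;

// Build curl arguments for an exit-code probe.
// -s: silent, -f: non-zero exit on HTTP errors, --max-time: cap the whole transfer.
function buildProbeArgs(url: string, maxSeconds: number): string[] {
  return ["-sf", "--max-time", String(maxSeconds), "-o", "/dev/null", url];
}

// The gateway counts as running only if curl exits 0 within the time limit.
function isGatewayRunning(run: Runner, url: string, maxSeconds = 5): boolean {
  // Give the runner one extra second so curl's own --max-time fires first.
  const result = run("curl", buildProbeArgs(url, maxSeconds), (maxSeconds + 1) * 1000);
  return !result.timedOut && result.exitCode === 0;
}

// Usage with a fake runner (the URL is a placeholder, not the real gateway address).
const fakeRunner: Runner = () => ({ exitCode: 0, timedOut: false });
console.log(isGatewayRunning(fakeRunner, "http://127.0.0.1:8080/"));
```

Bounding the probe twice (curl's `--max-time` plus the runner's own timeout) is one way to guarantee the status path returns even if the child process itself stalls.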
📝 Walkthrough
The sandbox gateway health check mechanism is refactored to probe a health endpoint and interpret explicit
Changes
Sequence Diagram(s)
sequenceDiagram
participant Host as Host Process
participant Check as checkAndRecover<br/>SandboxProcesses
participant Health as isSandboxGateway<br/>Running
participant Recover as recoverSandbox<br/>Processes
participant Sandbox as Sandbox<br/>Environment
participant Gateway as Gateway HTTP<br/>Endpoint
participant Forward as ensureSandbox<br/>PortForward
Host->>Check: Initiate recovery check
Check->>Health: Probe gateway health
Health->>Gateway: curl -sf (health endpoint)
Gateway-->>Health: Response (RUNNING/STOPPED)
Health-->>Check: Gateway status
alt Gateway is DOWN
Check->>Recover: Trigger recovery
Recover->>Sandbox: Execute recovery script<br/>(agent-provided or fallback)
Sandbox->>Sandbox: Clean lock/log files
Sandbox->>Sandbox: Launch: openclaw gateway run
Recover-->>Check: Recovery complete
Check->>Health: Re-validate gateway
Health->>Gateway: curl -sf (health endpoint)
Gateway-->>Health: RUNNING
Health-->>Check: Gateway restored
Check->>Forward: Re-establish port forward
Forward->>Host: host → sandbox forward
Forward-->>Check: Forward active
else Gateway is RUNNING
Check-->>Host: No recovery needed
end
Check->>Host: Emit completion log
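
The flow in the diagram above can be sketched as a small function. This is an illustration under stated assumptions, not the restored `checkAndRecoverSandboxProcesses`: the `RecoveryDeps` interface, the `RecoveryOutcome` values, and the log messages are hypothetical, and each step is injected as a callback so the control flow can be read and tested on its own.

```typescript
// Hypothetical dependencies, one per participant in the diagram.
interface RecoveryDeps {
  isGatewayRunning: () => boolean;   // health probe
  recoverProcesses: () => boolean;   // returns false if the recovery script failed
  ensurePortForward: () => void;     // re-establish host -> sandbox forward
  log: (msg: string) => void;        // completion log
}

type RecoveryOutcome = "healthy" | "recovered" | "failed";

function checkAndRecover(deps: RecoveryDeps): RecoveryOutcome {
  // Gateway already up: nothing to do.
  if (deps.isGatewayRunning()) {
    deps.log("gateway running, no recovery needed");
    return "healthy";
  }
  // Gateway down: run recovery, then re-validate before touching the forward.
  const launched = deps.recoverProcesses();
  if (!launched || !deps.isGatewayRunning()) {
    deps.log("gateway recovery failed");
    return "failed";
  }
  deps.ensurePortForward();
  deps.log("gateway recovered");
  return "recovered";
}
```

Each step runs at most once, so the function terminates as long as every injected callback does; that is the property the hanging status path lacked.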
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed
Summary
- Reverts the `nemoclaw.ts` changes from #2398 (refactor(cli): extract dashboard delivery chain into contract/health/recover modules) that caused `nemoclaw status` to hang indefinitely
- Restores the original `isSandboxGatewayRunning`, `recoverSandboxProcesses`, `ensureSandboxPortForward`, and `checkAndRecoverSandboxProcesses` implementations

Root cause
PR #2398 replaced the simple gateway recovery path with `recoverDashboardChain()`, which introduced unbounded calls that hang in CI. The bisect confirmed #2398 as the sole culprit:

- `de97a00d` (Apr 24 16:06)
- `9fbfbaca` (#2398 only)
- `79c8e2a9` (Apr 25 00:10)

sandbox-survival, skip-permissions, sandbox-operations, and cloud-e2e all hang at `nemoclaw <name> status` after #2398.

What this reopens
- `curl -sf /` probe returns failure on 401. This needs a targeted fix, not a refactor.

Test plan
- `npx tsc -p tsconfig.src.json --noEmit` passes
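
To illustrate the 401 behavior noted under "What this reopens", the sketch below contrasts a simplified model of `curl -f` with a status-aware check. It is not a proposed patch: `curlFailExit`, `parseHttpCode`, and `isAliveByStatus` are hypothetical names, and the rule that any non-5xx response means the gateway process is alive is an assumption made for this example only.

```typescript
// Simplified model of curl -f: exit 22 when the server answers with HTTP >= 400.
function curlFailExit(status: number): number {
  return status >= 400 ? 22 : 0;
}

// Parse the output of: curl -s -o /dev/null -w "%{http_code}" <url>
// curl prints 000 when no HTTP response was received at all.
function parseHttpCode(output: string): number {
  const code = parseInt(output.trim(), 10);
  return Number.isNaN(code) ? 0 : code;
}

// Assumption for this sketch: any response below 500 (including 401) proves
// that something is listening, so the gateway is treated as alive.
function isAliveByStatus(status: number): boolean {
  return status > 0 && status < 500;
}

for (const raw of ["200", "401", "503", "000"]) {
  const status = parseHttpCode(raw);
  console.log(raw, "exit-code probe up:", curlFailExit(status) === 0 && status > 0,
    "status probe up:", isAliveByStatus(status));
}
```

Under this model the two probes disagree only on 4xx responses such as 401, which is the narrow case a targeted fix would need to decide.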