Onboarding hangs after sandbox creation on Fedora — sandboxes.json never written #899

@vignesh-kumar-v

Description

What happened:
Running nemoclaw onboard on Fedora hangs indefinitely after the sandbox image is successfully pushed to the gateway. The CLI prints Image is available in the gateway but never proceeds to steps 6/7, never prints a success message, and never writes the sandbox entry to ~/.nemoclaw/sandboxes.json. The sandbox itself is fully healthy (confirmed via kubectl get pods showing 1/1 Running), but the NemoClaw CLI wrapper is unaware of it.

A secondary issue: Docker on Fedora defaults to the overlayfs storage driver instead of overlay2, which causes slow and eventually stalled image pushes during sandbox creation.
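To confirm which storage driver the daemon is actually using before and after any change, something like the following sketch may help. It assumes the `docker` CLI is on the PATH and relies on the `Driver` field that `docker info --format '{{json .}}'` reports; the helper names are illustrative and not part of NemoClaw.

```python
import json
import subprocess


def parse_storage_driver(docker_info_json: str) -> str:
    """Extract the storage driver name from `docker info --format '{{json .}}'` output."""
    info = json.loads(docker_info_json)
    # "Driver" is the field Docker uses for the active storage driver.
    return info.get("Driver", "")


def current_storage_driver() -> str:
    """Query the local Docker daemon (requires the docker CLI and a running daemon)."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_storage_driver(result.stdout)
```

Calling `current_storage_driver()` on the affected machine should return `overlayfs` before the switch and `overlay2` afterwards, if the change took effect.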

What was expected:
Onboarding should complete all 7 steps, print a success message, and register the sandbox in ~/.nemoclaw/sandboxes.json so that nemoclaw list and nemoclaw connect work correctly.

nemoclaw-debug.tar.gz

Reproduction Steps

  1. Install NemoClaw on Fedora via curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
  2. Run nemoclaw onboard, select NVIDIA Endpoints, select Nemotron 3 Super 120B, name the sandbox
  3. Observe that after Image is available in the gateway is printed, the CLI hangs indefinitely (verified for 1.5+ hours)
  4. In a second terminal, run docker exec openshell-cluster-nemoclaw kubectl get pods --all-namespaces — sandbox shows 1/1 Running
  5. Run cat ~/.nemoclaw/sandboxes.json — shows {"sandboxes": {}, "defaultSandbox": null}
  6. Run nemoclaw list — shows No sandboxes registered
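
The mismatch seen in steps 4 to 6 (a healthy pod with no registry entry) can be checked mechanically. The sketch below is only an illustration: it assumes sandbox pods live in the `openshell` namespace, that the pod name equals the sandbox name, and that the keys of `"sandboxes"` in `~/.nemoclaw/sandboxes.json` are sandbox names. None of this is confirmed NemoClaw behavior, and the function names are hypothetical.

```python
import json
from pathlib import Path


def parse_running_pods(kubectl_output: str) -> list[tuple[str, str]]:
    """Return (namespace, name) for pods that are Running with all containers ready.

    Expects the default table from `kubectl get pods --all-namespaces`:
    NAMESPACE  NAME  READY  STATUS  RESTARTS  AGE
    """
    pods = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        namespace, name, ready, status = fields[:4]
        ready_count, _, total = ready.partition("/")
        if status == "Running" and ready_count == total:
            pods.append((namespace, name))
    return pods


def unregistered_sandboxes(kubectl_output: str, registry: dict,
                           namespace: str = "openshell") -> list[str]:
    """List healthy pods in `namespace` that have no entry in the registry."""
    # Assumption: the registry maps sandbox names to entries under "sandboxes".
    registered = set(registry.get("sandboxes", {}))
    return [
        name
        for ns, name in parse_running_pods(kubectl_output)
        if ns == namespace and name not in registered
    ]


def load_registry(path: Path = Path.home() / ".nemoclaw" / "sandboxes.json") -> dict:
    """Read the NemoClaw sandbox registry from disk."""
    return json.loads(Path(path).read_text())
```

Feeding it the output of step 4 and the registry from step 5 should list the sandbox pod as unregistered while the CLI is hung.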

Workaround:

  • Switch Docker storage driver from overlayfs to overlay2 by editing /etc/docker/daemon.json
  • Manually write the sandbox entry to ~/.nemoclaw/sandboxes.json with the correct name, gateway, provider, model, and policies
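
Both workaround steps are JSON edits, so they can be sketched as follows. Treat this as a rough illustration rather than a supported procedure: `"storage-driver"` is a documented `daemon.json` key, and the top-level `"sandboxes"` / `"defaultSandbox"` keys match the file contents shown in the reproduction steps, but the fields inside a sandbox entry and the idea of setting `defaultSandbox` to the sandbox name are assumptions. The placeholder values in `EXAMPLE_ENTRY` are hypothetical and must be replaced with the real ones.

```python
import json
from pathlib import Path

# Hypothetical entry shape; the real field names used by NemoClaw are not confirmed.
EXAMPLE_ENTRY = {
    "name": "<sandbox-name>",
    "gateway": "<gateway-name>",
    "provider": "<provider-id>",
    "model": "<model-id>",
    "policies": [],
}


def set_storage_driver(daemon_json: Path, driver: str = "overlay2") -> dict:
    """Set "storage-driver" in a Docker daemon.json, preserving any other keys."""
    text = daemon_json.read_text().strip() if daemon_json.exists() else ""
    config = json.loads(text) if text else {}
    config["storage-driver"] = driver
    daemon_json.write_text(json.dumps(config, indent=2) + "\n")
    return config


def register_sandbox(registry_path: Path, name: str, entry: dict) -> dict:
    """Add `entry` under "sandboxes" and make it the default if none is set."""
    if registry_path.exists():
        registry = json.loads(registry_path.read_text())
    else:
        registry = {"sandboxes": {}, "defaultSandbox": None}
    registry.setdefault("sandboxes", {})[name] = entry
    # Assumption: "defaultSandbox" holds the name of the default sandbox.
    if registry.get("defaultSandbox") is None:
        registry["defaultSandbox"] = name
    registry_path.parent.mkdir(parents=True, exist_ok=True)
    registry_path.write_text(json.dumps(registry, indent=2) + "\n")
    return registry
```

Editing `/etc/docker/daemon.json` requires root, and Docker has to be restarted (for example with `sudo systemctl restart docker`) before the new driver is used. Images stored under the previous driver are not visible under the new one.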

Environment

  1. OS: Fedora Linux (kernel confirmed running, SELinux active)
  2. NemoClaw version: v0.1.0
  3. OpenShell CLI version: openshell 0.0.10
  4. Docker version: Docker version 29.3.0, build 5927d80
  5. Node.js version: v22.22.1
  6. GPU: NVIDIA GPU, 8151 MB VRAM
  7. RAM: 30.73 GB
  8. Docker storage driver (default): overlayfs — had to manually switch to overlay2

Debug Output

The output of `nemoclaw debug --output nemoclaw-debug.tar.gz` is attached in the description above.


---

### Logs

Key finding from `docker logs openshell-cluster-nemoclaw` while hung — only etcd compaction loop entries, no sandbox startup activity:

time="..." level=info msg="COMPACT compactRev=276 targetCompactRev=367 currentRev=1276"
time="..." level=info msg="COMPACT deleted 18 rows from 91 revisions in 1.298802ms"
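
To check a longer log capture for anything other than compaction entries, a small filter along these lines could be used. It is a sketch that assumes compaction lines always carry `msg="COMPACT ...` as in the excerpt above; the function name is illustrative.

```python
import re

# Matches the etcd compaction entries shown in the excerpt above.
COMPACT_LINE = re.compile(r'msg="COMPACT\b')


def non_compaction_lines(log_text: str) -> list[str]:
    """Return every non-empty log line that is not a COMPACT entry."""
    return [
        line
        for line in log_text.splitlines()
        if line.strip() and not COMPACT_LINE.search(line)
    ]
```

An empty result for the output of `docker logs openshell-cluster-nemoclaw 2>&1` captured during the hang would match the observation that only compaction entries appear.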


Key finding from `docker exec openshell-cluster-nemoclaw kubectl describe pod fedora-claw -n openshell` while CLI was hung:

Status:   Running
Ready:    True
Events:   Container started successfully

Checklist

  • I confirmed this bug is reproducible
  • I searched existing issues and this is not a duplicate

Metadata

Assignees

No one assigned

    Labels

    Getting Started (Use this label to identify setup, installation, or onboarding issues.)
    bug (Something isn't working)
    enhancement: platform (Request for support on other platforms.)
    status: triage (For new items that haven't been reviewed yet.)
