Agent Diagnostic
Loaded debug-openshell-cluster skill from NVIDIA/OpenShell.
Step 1 — CLI reachability:
$ openshell gateway info
Gateway: nemoclaw → http://127.0.0.1:8080
$ openshell status
Status: Connected Version: 0.0.39
Gateway is healthy. The failure is in sandbox creation, not gateway reachability.
Step 2 — Compute platform: VM driver (libkrun + Apple Virtualization.framework
on Apple Silicon). Confirmed via which openshell-driver-vm and
openshell-driver-vm --version → 0.0.39.
Step 6 — VM-backed gateway checks:
- openshell-driver-vm process: spawned per gateway log, launcher_pid=74518.
- Rootfs: extracted on host at
~/.local/state/nemoclaw/openshell-docker-gateway/vm-driver/sandboxes/
1fc9c060-3e28-4b1d-ab24-828c966edb96/rootfs/ — owned by host uid 502.
- Host virtualization: Apple M4, Virtualization.framework + libkrun.
- Supervisor callback: never reached — PID 1 exits at 0.079s before the
supervisor starts.
Sandbox console (rootfs-console.log) shows init aborting:
[0.039s] detected eth0 (gvproxy networking)
[0.047s] no DHCP client, using static config
[0.079s] fixing /sandbox ownership (was uid=502, setting to sandbox=998:998)
chown: changing ownership of '/sandbox/.bashrc': Permission denied
chown: changing ownership of '/sandbox/.profile': Permission denied
Gateway log repeats every ~60s until the consumer times out:
openshell_server::compute: Sandbox failed to become ready
sandbox_id=1fc9c060-3e28-4b1d-ab24-828c966edb96
reason=ProcessExited VM process exited with status 1
(repeated 8× at ~60s intervals)
Tried locally:
- openshell-driver-vm --help — no flag to skip the chown step
(only --vcpus / --mem-mib / --gpu / --bind-address / --state-dir / TLS).
- Located the offending step in source:
crates/openshell-driver-vm/scripts/openshell-vm-sandbox-init.sh:398-410
# Fix /sandbox ownership. The host-side CLI extracts OCI layers as a
# non-root user (e.g. UID 501 on macOS), so /sandbox may be owned by
# the host UID.
if [ -d /sandbox ]; then
_sb_uid=$(id -u sandbox 2>/dev/null || true)
_sb_gid=$(id -g sandbox 2>/dev/null || true)
if [ -n "$_sb_uid" ] && [ -n "$_sb_gid" ]; then
_cur_uid=$(stat -c '%u' /sandbox 2>/dev/null || true)
if [ -n "$_cur_uid" ] && [ "$_cur_uid" != "$_sb_uid" ]; then
ts "fixing /sandbox ownership (was uid=${_cur_uid}, ...)"
chown -R "${_sb_uid}:${_sb_gid}" /sandbox
fi
fi
fi
Script header: `#!/bin/bash` on line 1 and `set -euo pipefail` on line 10.
The chown's non-zero exit status aborts PID 1, so the VM exits with status 1.
- Confirmed /sandbox is shared into the guest via virtio-fs:
crates/openshell-driver-vm/src/runtime.rs:21 comment
"gvproxy for libkrun, virtiofsd for QEMU".
- For comparison, the Linux Docker driver does NOT hit this path because
the in-sandbox PID 1 there is the Rust `openshell-sandbox` supervisor,
bind-mounted into the container, which handles host-uid extractions
correctly. Only the VM driver uses the bash init script with `chown -R`.
Why my agent could not resolve it: the failing code is in the upstream init
script bundled with openshell-driver-vm. There is no env var or flag to
disable it, and no consumer-side patch (image Dockerfile, build args) can
prevent the chown from running — virtio-fs reflects host uids back into the
guest regardless of what the image's chown layers set.
Description
On macOS arm64 using openshell-driver-vm 0.0.39 (libkrun backend), the VM
sandbox-init script runs chown -R sandbox:sandbox /sandbox against a
virtio-fs-shared rootfs. virtio-fs returns EACCES for ownership changes to
a uid (998) that does not exist on the host, even when the in-guest caller
is root. Combined with set -euo pipefail at the top of the script, the
chown failure kills PID 1 at 0.079s and the VM exits with status 1.
The gateway then reports "Sandbox failed to become ready
reason=ProcessExited VM process exited with status 1" every ~60s. The
sandbox never reaches Ready and the calling project (NemoClaw onboard)
times out and tears the sandbox down.
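The abort mechanism can be reproduced outside the VM. This is a minimal sketch, not the init script itself: a child bash runs with the same options as the script header, and `false` stands in for the failing chown.

```shell
#!/bin/bash
# Run a child bash with the same options as the init script header.
# `false` is a stand-in for the failing `chown -R`; the echo after it
# represents everything the init script would have done next.
status=0
bash -c 'set -euo pipefail; false; echo "supervisor would start here"' || status=$?

# The echo inside the child never runs; the child exits with the status
# of the failing command.
echo "child exit status: ${status}"   # prints "child exit status: 1"
```

With `set -e` in effect, any unguarded command that returns non-zero ends the shell, and when that shell is PID 1 the guest goes down with it.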
Expected: sandbox-init tolerates virtio-fs ownership semantics on macOS
(detect virtio-fs and skip the chown, or treat EACCES as a warning, or
use a uid-mapped virtio-fs mount option). The Rust openshell-sandbox
supervisor already handles host-uid extractions correctly on the Linux
Docker driver path — porting that strategy to the VM init script would
fix this.
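A rough sketch of the first two options (skip on virtio-fs, otherwise downgrade a chown failure to a warning) might look like the following. The helper names `fs_type_of` and `fix_ownership_tolerant` are hypothetical and do not exist in the upstream script, and the sketch assumes the guest's mount table reports the shared rootfs with filesystem type `virtiofs`, which has not been verified against the libkrun guest.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print the filesystem type of the mount that contains
# path $1, using a mount table in /proc/mounts format (overridable for tests).
fs_type_of() {
    local path="$1" mounts="${2:-/proc/mounts}"
    awk -v p="$path" '
        {
            mp = $2
            # A mount point applies if it is the path itself, the root,
            # or a parent directory of the path. The longest match wins.
            if (mp == p || mp == "/" || index(p, mp "/") == 1) {
                if (length(mp) >= best) { best = length(mp); type = $3 }
            }
        }
        END { if (type != "") print type }
    ' "$mounts"
}

# Hypothetical replacement for the bare `chown -R`: never returns non-zero,
# so it cannot kill PID 1 under `set -e`.
fix_ownership_tolerant() {
    local target="$1" owner="$2" mounts="${3:-/proc/mounts}"
    local fstype
    fstype=$(fs_type_of "$target" "$mounts" 2>/dev/null || true)

    # Assumption: ownership changes on a virtiofs share are rejected by the
    # host, so attempting them is pointless.
    if [ "$fstype" = "virtiofs" ]; then
        echo "skipping chown of ${target}: on virtiofs" >&2
        return 0
    fi

    # On any other filesystem, try the chown but treat failure as a warning.
    if ! chown -R "$owner" "$target" 2>/dev/null; then
        echo "warning: chown ${owner} ${target} failed; continuing" >&2
    fi
    return 0
}
```

In the init script this would be called as `fix_ownership_tolerant /sandbox "${_sb_uid}:${_sb_gid}"` in place of the existing chown line. Whether the sandbox user can then write to /sandbox still depends on how the share maps uids, so this only removes the crash and is not a full fix.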
Reproduction Steps
- Apple Silicon macOS host (tested on M4, Darwin 24.1.0).
- Install OpenShell 0.0.39 (openshell, openshell-gateway, openshell-driver-vm).
- Start the standalone gateway (NemoClaw onboard does this, or run
openshell-gateway directly).
- Create any sandbox whose image has
chown root:root /sandbox/.bashrc /sandbox/.profile in a build layer (e.g. NemoClaw's Dockerfile.base):
openshell sandbox create --image
or
nemoclaw onboard --non-interactive
- Gateway calls CreateSandbox; openshell-driver-vm extracts rootfs to a
host-side directory as the macOS user, then boots libkrun and shares
the rootfs via virtio-fs.
- Inside the guest, /init runs and at 0.079s prints:
fixing /sandbox ownership (was uid=502, setting to sandbox=998:998)
chown: changing ownership of '/sandbox/.bashrc': Permission denied
then exits status 1.
- Sandbox phase: Provisioning → Error. Subsequent restarts repeat the same
failure. Sandbox never reaches Ready.
Environment
- OS: macOS 15.x (Darwin 24.1.0)
- Hardware: Apple Silicon M4, 24 GiB unified memory
- openshell: 0.0.39 (~/.local/bin/openshell)
- openshell-gateway: 0.0.39 (~/.local/bin/openshell-gateway)
- openshell-driver-vm: 0.0.39 (~/.local/bin/openshell-driver-vm)
- openshell-sandbox: not installed (Linux-only — expected on macOS)
- Docker: Colima (used by consumer project for image build only; VM runs
via libkrun directly on host, Colima is not involved at sandbox runtime)
- Consumer project (for repro): NVIDIA/NemoClaw main, running nemoclaw onboard --non-interactive
Logs
# rootfs-console.log (in-guest PID 1 output)
[0.039s] detected eth0 (gvproxy networking)
[0.047s] no DHCP client, using static config
[0.079s] fixing /sandbox ownership (was uid=502, setting to sandbox=998:998)
chown: changing ownership of '/sandbox/.bashrc': Permission denied
chown: changing ownership of '/sandbox/.profile': Permission denied
# Gateway log excerpts (relevant frames only)
2026-05-13T05:00:08.260877Z INFO openshell_driver_vm::driver: vm driver: create_sandbox received
sandbox_id=1fc9c060-3e28-4b1d-ab24-828c966edb96 sandbox_name=my-assistant
2026-05-13T05:01:10.117964Z INFO openshell_driver_vm::driver: vm driver: rootfs prepared
image_identity=sha256:f6917eb1f12be758d3b74f01fe2192a6dee1a5e3444a1abf2f786aee1987bf36
2026-05-13T05:01:10.118445Z INFO openshell_driver_vm::driver: vm driver: spawning VM launcher
launcher=/Users/sdang/.local/bin/openshell-driver-vm
2026-05-13T05:01:15.914544Z INFO openshell_server::compute: Sandbox phase changed
old_phase=Provisioning new_phase=Error
2026-05-13T05:01:15.914591Z WARN openshell_server::compute: Sandbox failed to become ready
sandbox_id=1fc9c060-3e28-4b1d-ab24-828c966edb96 reason=ProcessExited VM process exited with status 1
# (the WARN above repeats 7 more times at ~60s intervals until the consumer tears the sandbox down)
Agent-First Checklist
debug-openshell-cluster,debug-inference,openshell-cli)