Agent Diagnostic
- Investigated a local Homebrew OpenShell upgrade where sandbox creation got stuck in Provisioning.
- Found that an older Homebrew install had created /opt/homebrew/var/openshell/gateway.toml.
- That file persisted across the upgrade and still pinned:
  [openshell.drivers.docker]
  supervisor_image = "ghcr.io/nvidia/openshell/supervisor:0.0.43"
- After upgrading the CLI/gateway to 0.0.54, the Homebrew service still honored the old prefix config, so the gateway launched Docker sandboxes with supervisor 0.0.43.
- Result: the sandbox containers started but never completed the supervisor relay handshake.
- Cleaning the old state and reinstalling 0.0.55 removed the stale pin; the gateway then pulled ghcr.io/nvidia/openshell/supervisor:0.0.55 and sandbox creation worked.
Description
Actual behavior:
A Homebrew upgrade can leave an old /opt/homebrew/var/openshell/gateway.toml in place. If that file was generated by an older package flow and contains a version-pinned Docker supervisor_image, the upgraded gateway continues using that old supervisor image.
In my case:
- CLI/gateway: 0.0.54
- Docker supervisor: 0.0.43
Sandbox creation then got stuck at:
Starting sandbox... Waiting for supervisor relay
Expected behavior:
The upgrade/install path should not silently keep using an old Docker supervisor image that is incompatible with the upgraded gateway.
Possible fixes:
- Detect stale Homebrew prefix config during install/upgrade.
- Warn if [openshell.drivers.docker].supervisor_image is pinned to a different OpenShell version than the installed gateway.
- Migrate/remove old generated Homebrew config when it only contains package-generated defaults.
- Prefer runtime defaults over old generated prefix config unless the user explicitly opted into it.
Reproduction Steps
- Start from an older Homebrew OpenShell install that generated /opt/homebrew/var/openshell/gateway.toml.
- Ensure that file contains a pinned Docker supervisor image, for example:
  [openshell.drivers.docker]
  supervisor_image = "ghcr.io/nvidia/openshell/supervisor:0.0.43"
- Upgrade OpenShell using the current install script or Homebrew.
- Run:
- Observe that sandbox creation can hang at "Waiting for supervisor relay".
Environment
- OS: macOS Darwin 25.1.0 arm64
- Install method: Homebrew via install.sh
- Docker: Docker Desktop 28.3.x
- OpenShell upgrade observed: old 0.0.43-era config to 0.0.54
- Confirmed working after clean reinstall: 0.0.55
Logs
Relevant stale config:
[openshell.drivers.docker]
supervisor_image = "ghcr.io/nvidia/openshell/supervisor:0.0.43"
Version mismatch:
openshell --version
openshell 0.0.54
docker exec <sandbox-container> /opt/openshell/bin/openshell-sandbox --version
openshell-sandbox 0.0.43
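The version check above can be scripted. This sketch wraps the same two commands, `openshell --version` and `/opt/openshell/bin/openshell-sandbox --version` inside the sandbox container, and compares the last token of each output. It assumes both print a single `name version` line, as in the captured output. The helper names are hypothetical.

```python
import subprocess


def parse_version(output: str) -> str:
    """Return the version token from output such as "openshell 0.0.54".

    Assumes the version is the last whitespace-separated token on the
    first line after stripping surrounding whitespace.
    """
    first_line = output.strip().splitlines()[0]
    return first_line.split()[-1]


def versions_match(cli_output: str, sandbox_output: str) -> bool:
    """True when the CLI/gateway and in-sandbox supervisor report the same version."""
    return parse_version(cli_output) == parse_version(sandbox_output)


def check_versions(container: str) -> tuple[str, str]:
    """Run both --version commands and return (gateway, supervisor) versions.

    Requires the openshell CLI and docker on PATH; raises
    subprocess.CalledProcessError if either command fails.
    """
    cli = subprocess.run(
        ["openshell", "--version"],
        capture_output=True, text=True, check=True,
    ).stdout
    sandbox = subprocess.run(
        ["docker", "exec", container,
         "/opt/openshell/bin/openshell-sandbox", "--version"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_version(cli), parse_version(sandbox)
```

Given the outputs captured above, `versions_match("openshell 0.0.54", "openshell-sandbox 0.0.43")` is False, which is the mismatch behind the stuck relay.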
Sandbox symptoms:
Starting sandbox... Waiting for supervisor relay
Sandbox logs included:
PermissionDenied, message: "this method requires a sandbox principal"
NET:FAIL host.openshell.internal:17670
After cleanup/reinstall, expected behavior returned:
Pulling docker supervisor image image="ghcr.io/nvidia/openshell/supervisor:0.0.55"
Extracting supervisor binary from image to host cache
Server listening address=127.0.0.1:17670
Agent-First Checklist
debug-openshell-cluster,debug-inference,openshell-cli)