fix+docs: make docker compose up -d actually work #16

Merged
kdairatchi merged 2 commits into main from chore/fix-compose-paths on Apr 18, 2026

@kdairatchi kdairatchi commented Apr 18, 2026

Summary

Running `docker compose up -d` against the published image crashed the container in a restart loop. Two real defects + one doc bug. All three fixed in this PR.

1. User/UID collision (runtime crash, exit 4)

Upstream `/opt/gem/run.sh` defaults `USER=gem USER_UID=1000 USER_GID=1000` and does `groupadd --gid 1000 gem`. But `cybersandbox/Dockerfile:143` already baked a `hunter` user at UID/GID 1000. Collision: `groupadd: GID '1000' already exists` → `set -e` kills the script → restart loop.
Fix: set `USER=hunter USER_UID=1000 USER_GID=1000` in the root compose env.
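
To make the collision concrete, here is a minimal sketch (not part of this PR) that parses `/etc/group`-format text and reports which group already owns a GID. The helper name `gid_owner` and the sample group file are hypothetical; the only facts taken from above are that `hunter` sits at GID 1000 and that the upstream default user is `gem`.

```python
from typing import Optional


def gid_owner(group_file_text: str, gid: int) -> Optional[str]:
    """Return the name of the group that owns `gid`, or None if the GID is free.

    Expects text in /etc/group format: name:password:gid:member_list
    """
    for line in group_file_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # skip malformed entries
        if int(fields[2]) == gid:
            return fields[0]
    return None


# Hypothetical group file for an image that already baked `hunter` at GID 1000.
SAMPLE_GROUP_FILE = "root:x:0:\nhunter:x:1000:\n"

owner = gid_owner(SAMPLE_GROUP_FILE, 1000)
if owner is not None and owner != "gem":
    # Same situation as above: creating `gem` at an occupied GID fails.
    print(f"GID 1000 already belongs to {owner!r}; creating 'gem' at that GID would fail")
```

Inside a container, the same function could be fed the contents of `/etc/group` as a preflight check before choosing `USER`, `USER_UID`, and `USER_GID`.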

2. Premature privilege drop (runtime crash, exit 1)

`cybersandbox/Dockerfile:150` ends with `USER 1000`. Upstream's run.sh expects root at entry so it can useradd, `chown -R` /opt/jupyter, and write /etc/sudoers.d/$USER — it self-drops to $USER via `su -` before exec'ing supervisord. Running as UID 1000 means chown fails → `set -e` → crash. Running as root at entry does not weaken the app-layer posture: every supervised process (nginx, jupyter, code-server, mcp-hub, browser, VNC) still runs as hunter.
Fix: set `user: "0:0"` on the sandbox service. The clean fix is to drop `USER 1000` from the Dockerfile and rebuild; that is tracked as a follow-up issue.

3. Two compose files, one README pointed at the wrong one

  • `./docker-compose.yaml` — consumer setup (pulls the published image).
  • `cybersandbox/docker-compose.yml` — maintainer setup (builds locally, mounts a personal Obsidian vault at `/mnt/c/Users/Dr34d/...`, binds `127.0.0.1:8082`).

The README told new users to run `cd cybersandbox && docker compose up -d`, which dropped them onto the maintainer file with host paths that don't exist on their machines.

Fix:

  • Rename `cybersandbox/docker-compose.yml` → `cybersandbox/docker-compose.dev.yml` so the filename self-documents.
  • Point root README at `./docker-compose.yaml` (run from repo root, no `cd`).
  • Document the WSL `docker-credential-desktop.exe` failure mode in the README (this is what surfaced the whole problem).
  • Update `cybersandbox/SETUP.md` commands to use `-f docker-compose.dev.yml` and add a banner pointing back to the root quick-start.

Verification (done on this branch)

```
docker compose up -d                            # healthy in ~15s
docker compose ps                               # Up (healthy) 0.0.0.0:8080->8080
curl -s -o /dev/null -w "%{http_code}" :8080    # 200

python -c "from agent_sandbox import Sandbox; c=Sandbox(base_url='http://localhost:8080'); print(c.shell.exec_command(command='echo hi').data)"
# session_id='...' output='hi' exit_code=0
```
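
For a reviewer who would rather not eyeball the "~15s" wait, the HTTP check above can be turned into a small polling helper. This is a sketch under stated assumptions, not something shipped in this PR: the function name `wait_for_http_ok` and its default timeout and interval are hypothetical, and it uses only the Python standard library.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http_ok(url: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll `url` until it answers HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Connection refused, reset, timed out, or a non-2xx status:
            # treat all of these as "not ready yet" and retry.
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_http_ok("http://localhost:8080")` after `docker compose up -d` would return `True` once the port answers 200, and `False` if the container is still restarting when the timeout expires.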

All 11 supervisord services up (python-server, gem-server, browser, nginx, websocat, code-server, mcp-server-browser, jupyter, mcp-hub, openbox, tigervnc).

Follow-ups (not in this PR)

  • Drop `USER 1000` from `cybersandbox/Dockerfile:150` and rebuild the image. That lets us remove the `user: "0:0"` override from compose. (Separate issue — needs a CI image build + new :latest push.)

Test plan

  • `docker compose up -d` succeeds from repo root on WSL against published image
  • HTTP 200 on :8080 after 15s
  • Python SDK round-trip confirmed
  • Both compose files still parse (`docker compose config --quiet`)
  • Reviewer runs the same sequence on a fresh machine
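
The "both compose files still parse" item can be scripted. The sketch below wraps the `docker compose config --quiet` command named above; the helper names are hypothetical, the file paths are the ones this PR ends up with, and it assumes it is run from the repo root with the Docker CLI on `PATH` (otherwise `subprocess.run` raises `FileNotFoundError`).

```python
import subprocess
from typing import List, Sequence

# Paths relative to the repo root, as named in this PR.
COMPOSE_FILES = ("docker-compose.yaml", "cybersandbox/docker-compose.dev.yml")


def compose_config_command(compose_file: str) -> List[str]:
    """Build the argv that validates one compose file without printing it."""
    return ["docker", "compose", "-f", compose_file, "config", "--quiet"]


def failing_compose_files(compose_files: Sequence[str] = COMPOSE_FILES) -> List[str]:
    """Return the compose files for which `docker compose config` exits non-zero."""
    failures = []
    for compose_file in compose_files:
        result = subprocess.run(
            compose_config_command(compose_file),
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            failures.append(compose_file)
    return failures
```

An empty list from `failing_compose_files()` corresponds to the checked test-plan item; any returned path is a file that no longer parses.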

Two compose files were confusing users: the root docker-compose.yaml
(pulls the published GHCR image, for consumers) and cybersandbox/
docker-compose.yml (builds locally with personal Obsidian/nuclei
mounts, for maintainers). The README pointed at the second one via
`cd cybersandbox`, dropping new users onto a dev compose with host
paths that don't exist on their machines.

- Rename cybersandbox/docker-compose.yml -> docker-compose.dev.yml
  so the filename self-documents its audience.
- Point root README at ./docker-compose.yaml (no `cd cybersandbox`).
- Document the WSL Docker Desktop credsStore failure mode since it
  bites anyone pulling the public image from WSL without Desktop
  running.
- Update SETUP.md commands to `-f docker-compose.dev.yml`.
Verified end-to-end: `Sandbox(base_url="http://localhost:8080")
.shell.exec_command(...)` now returns a real response envelope
against the pulled image.

Two defects in cybersandbox:latest were crashing the container in
a restart loop (exit 1/4) before any service bound to :8080:

1. Dockerfile:150 ends with `USER 1000`, but upstream
   /opt/gem/run.sh expects root entry so it can useradd, chown
   /opt/jupyter, and write /etc/sudoers.d/$USER. It self-drops to
   $USER via `su -` before exec'ing supervisord, so running as
   root at entry doesn't weaken the app-layer posture — hunter
   still owns every supervised process. Override with user: "0:0".

2. Upstream run.sh defaults USER=gem, USER_UID/GID=1000. The image
   already baked hunter at UID/GID 1000 (Dockerfile:143), so
   groupadd --gid 1000 gem blew up with `GID already exists`.
   Set USER=hunter explicitly in compose env.

Both are runtime-only patches; the clean fix is to drop `USER 1000`
from the Dockerfile and rebuild the image (see follow-up issue).
@kdairatchi kdairatchi changed the title from "docs: reconcile compose paths, document WSL credsStore gotcha" to "fix+docs: make docker compose up -d actually work" Apr 18, 2026
@kdairatchi kdairatchi merged commit 8be08c8 into main Apr 18, 2026
3 checks passed
@kdairatchi kdairatchi deleted the chore/fix-compose-paths branch April 18, 2026 12:40