A live AI escape room. Teams write prompts; an AI ghost operates a Linux desktop and tries to solve the puzzle. The spectacle of watching the ghost work is the experience.
Each team gets their own browser tab pointed at a full Linux desktop running inside a Modal container. They edit a SKILL.md prompt file in an in-desktop editor, hit Enter in the RUN AGENT terminal, and watch a local Qwen 2.5 7B (served by Ollama) autonomously navigate the filesystem, read clues, correlate information, and submit answers. The differentiator between teams is prompt quality, not CS skill.
- 20 isolated Linux desktops deployed on Modal (L4 GPU per container).
- Each container gets its own public HTTPS URL via modal.forward(6080) into an XFCE desktop served through noVNC; teams just open a browser tab.
- A local Qwen 2.5 7B runs in each container via Ollama. No API keys, no cloud LLM billing.
- 3 phases of filesystem puzzles gated by an unlock command. Each phase requires the agent to find and correlate information from different parts of the filesystem.
- Live scoreboard: each container tails scores.jsonl and pushes events to a shared modal.Dict that the driver renders as a live-updating Rich table.
- Unlimited retries. Phase unlock state persists across agent runs, so teams can iterate on their prompt without losing progress.
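As a rough illustration of the "tail scores.jsonl" step, here is a minimal sketch of an incremental reader. It is not the actual score_watcher.py; the function name read_new_events and the byte-offset bookkeeping are assumptions made for this example, and a real watcher would push the returned events into the shared modal.Dict rather than return them.

```python
import json
from pathlib import Path


def read_new_events(path: Path, offset: int) -> tuple[list[dict], int]:
    """Return JSON events appended to `path` since byte `offset`, plus the new offset.

    Hypothetical helper: a line that does not yet end in a newline is treated
    as still being written and is left for the next poll.
    """
    events = []
    with path.open("rb") as fh:
        fh.seek(offset)
        for raw in fh:
            if not raw.endswith(b"\n"):
                break  # partial line; do not advance the offset past it
            offset += len(raw)
            line = raw.strip()
            if line:
                events.append(json.loads(line))
    return events, offset
```

Calling this in a polling loop with the offset returned by the previous call yields each event exactly once, which is the property the scoreboard depends on.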
- "The Captain" — Correlate a departure log with a crew roster. Answer is a captain's last name.
- "The Marked Shelf" — Cross-reference a librarian's note against a book catalog and a shelf inventory. Answer is an author's first name.
- "The Shifted Message" — Decode a short Caesar cipher. The decoded word must appear in a provided dictionary. Requires telling the agent to actually compute the shift (e.g. "use python3 to decode") rather than doing the arithmetic in its head — this is the knob that separates winning prompts from losing ones.
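For reference, this is roughly the kind of computation a good prompt steers the agent toward in the third puzzle: try every shift and keep the candidates that appear in the word list. It is a minimal sketch, not the puzzle's own tooling; the function names and the sample ciphertext and word list are made up for illustration.

```python
import string


def caesar_shift(text: str, shift: int) -> str:
    """Shift lowercase ASCII letters by `shift` positions; leave other characters alone."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def decode_with_dictionary(ciphertext: str, words: set[str]) -> list[tuple[int, str]]:
    """Try all 26 backward shifts and return (shift, plaintext) pairs found in `words`."""
    hits = []
    for shift in range(26):
        candidate = caesar_shift(ciphertext, -shift)
        if candidate in words:
            hits.append((shift, candidate))
    return hits


# Illustrative data only, not the puzzle's ciphertext or dictionary.
print(decode_with_dictionary("khoor", {"hello", "world"}))  # [(3, 'hello')]
```

Brute-forcing all shifts and filtering by the dictionary removes the need to guess the shift, which is exactly the arithmetic a 7B model tends to get wrong in its head.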
┌─ Modal L4 container (one per team, spawned by .spawn()) ─────────────┐
│ │
│ Ollama ── qwen2.5:7b (pre-baked into a modal.Volume) │
│ │ │
│ agent_runner.py ── calls /api/chat with a tools array │
│ │ (exec, read, list, done) │
│ │ │
│ Xvfb :99 ── xfce4-session │
│ ├─ mousepad (auto-opened on /root/puzzle/SKILL.md) │
│ └─ xfce4-terminal running run_agent_loop.sh │
│ "Press Enter to run the agent" │
│ │
│ x11vnc :5900 ─▶ websockify+noVNC :6080 ─▶ modal.forward(6080) │
│ │ │
│ ▼ │
│ public HTTPS URL │
│ │
│ /root/puzzle/ │
│ ├─ README.md, SKILL.md, AGENTS.md │
│ ├─ phase1/ (visible from the start) │
│ ├─ phase2/ (copied from /var/lib/puzzle-staging when unlocked) │
│ ├─ phase3/ (copied when phase2 unlocks) │
│ ├─ state/phase{N}.unlocked (persistent markers) │
│ ├─ scores.jsonl │
│ └─ unlock.sh │
│ │
│ score_watcher.py ── tails scores.jsonl ─▶ modal.Dict │
│ │
└──────────────────────────────────────────────────────────────────────┘
Local driver (@app.local_entrypoint):
1. Read teams.txt (20 names)
2. .spawn() 20 run_team_desktop containers
3. Collect 20 URLs from modal.Queue.from_name("ghostroom-urls")
4. Print the team → URL table
5. Poll modal.Dict.from_name("ghostroom-scores") every 15s
and render a live Rich scoreboard until duration expires
The original design called for OpenClaw's gateway + openclaw agent to drive the session. In practice, OpenClaw's Ollama tool-call wiring produces schemas that Qwen 2.5 7B emits as stringified JSON inside markdown code blocks rather than as real function calls. container/agent_runner.py bypasses the gateway and hits Ollama's /api/chat directly with a tools array, where Qwen 2.5 7B emits proper tool_calls 100% of the time. OpenClaw is still installed and runnable (best-effort gateway at :18789) for teams who want to explore it, but the puzzle-solving loop goes through the direct agent.
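To make the "direct /api/chat with a tools array" idea concrete, here is a hedged sketch of how such a request could be assembled and how tool calls could be read back. It is not the contents of agent_runner.py. The four tool names come from the architecture diagram, but their parameter schemas, the helper names, and the use of Ollama's default port 11434 are assumptions for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/chat"  # assumed: Ollama's default local endpoint


def make_tool(name, description, properties, required):
    """Build one function-style tool definition for the `tools` array."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


# Tool names follow the diagram; the argument schemas are illustrative guesses.
TOOLS = [
    make_tool("exec", "Run a shell command", {"command": {"type": "string"}}, ["command"]),
    make_tool("read", "Read a file", {"path": {"type": "string"}}, ["path"]),
    make_tool("list", "List a directory", {"path": {"type": "string"}}, ["path"]),
    make_tool("done", "Finish the run", {"summary": {"type": "string"}}, []),
]


def build_payload(messages, model="qwen2.5:7b"):
    """Assemble a non-streaming chat request that advertises the tools."""
    return {"model": model, "messages": messages, "tools": TOOLS, "stream": False}


def extract_tool_calls(response):
    """Return (name, arguments) pairs from a chat response, or [] if there are none."""
    calls = []
    for call in response.get("message", {}).get("tool_calls") or []:
        fn = call.get("function", {})
        args = fn.get("arguments", {})
        if isinstance(args, str):  # tolerate stringified arguments defensively
            args = json.loads(args)
        calls.append((fn.get("name"), args))
    return calls


def chat(messages):
    """POST the payload to Ollama and return the decoded JSON response."""
    body = json.dumps(build_payload(messages)).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)
```

An agent loop built on this would call chat(), run each extracted tool call, append the result as a tool message, and repeat until the model calls done or the turn budget runs out.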
.
├── modal_app.py # Image, run_team_desktop, local_entrypoint driver
├── teams.txt # 20 team names, one per line (edit freely)
├── requirements-driver.txt # Local deps for the driver (modal, rich)
├── container/ # In-container assets (baked into image)
│ ├── boot.sh # Orchestration entrypoint: ollama, xfce, vnc, gateway, score watcher
│ ├── agent_runner.py # Python agent loop over Ollama /api/chat with tools
│ ├── run_agent_loop.sh # "Press Enter to run" terminal
│ ├── _agent_inner.sh # One-shot launcher (alternative to run_agent_loop)
│ ├── run_agent.sh # Desktop-launcher target
│ ├── openclaw.config.json # Pre-baked OpenClaw config (Ollama provider, coding profile)
│ ├── skill_template.md # Starter SKILL.md teams edit
│ ├── score_watcher.py # Tails scores.jsonl → modal.Dict
│ ├── autostart/ # XFCE autostart entries
│ └── desktop/ # Desktop launcher file
├── puzzle/ # Puzzle content (baked into image)
│ ├── README.md # Agent entry briefing
│ ├── unlock.sh # The gate: validates answers, unlocks next phase
│ ├── phase1/ # Crew roster + departure log
│ ├── phase2/ # Library catalog + shelf inventory
│ └── phase3/ # Caesar cipher + dictionary
├── docs/ # Screenshots for this README
├── modal.md # Modal rules/guidelines reference
├── CLAUDE.md # Project-scoped Claude instructions
├── LICENSE
└── README.md
- Python 3.12+
- A Modal account (modal setup)
- Modal credits (~$100–200 for a 3-hour, 20-team event on L4)
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements-driver.txt
modal setup # first time only

The first modal run builds the image, which takes ~5–15 minutes (it installs Node 22, OpenClaw, XFCE, Ollama, and pre-pulls qwen2.5:7b into a shared modal.Volume). Subsequent runs use cached layers and are much faster.
modal run --detach modal_app.py::run_team_desktop \
--team test-01 \
--duration-s 1800 \
--url-queue-name ghostroom-urls-test

In a second shell, grab the URL off the test queue:
python -c "
import modal
print(modal.Queue.from_name('ghostroom-urls-test').get()['url'])
"

Open that URL in a browser. You should see:
- Mousepad open on /root/puzzle/SKILL.md (edit your prompt here).
- Welcome terminal showing /root/puzzle/README.md.
- RUN AGENT terminal with a banner, live SKILL.md preview, and Press Enter to run the agent....
Click into the RUN AGENT terminal, press Enter, and watch tool calls scroll in real time as the ghost reads files, reasons, and calls unlock <phase> <answer>.
head -n3 teams.txt > teams.small.txt
modal run modal_app.py --duration-min 20 --teams-file teams.small.txt

The driver prints the URL table as containers come up, then switches to a live scoreboard refreshing every 15 seconds.

modal run modal_app.py --duration-min 180 --teams-file teams.txt

modal app stop ghostroom

Each unlock is logged to /root/puzzle/scores.jsonl as a line of JSON:

{"ts": 1775772636, "phase": 3, "event": "unlock", "answer": "lantern"}

score_watcher.py tails this file and pushes updates into a shared modal.Dict keyed by team name. The driver reads the dict every 15 seconds and renders a table sorted by:
- Phases unlocked (descending)
- Sum of first-unlock timestamps (ascending — faster wins)
- Wrong-answer count (ascending — fewer wrong guesses breaks ties)
Only the first unlock of a phase counts. Wrong, out-of-order, and bad-phase attempts all increment the tiebreaker counter.
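The ordering above can be expressed as a single sort key. The sketch below is an illustration, not the driver's actual code: the function name rank_teams is hypothetical, and it assumes that every logged event whose "event" field is not "unlock" represents a wrong, out-of-order, or bad-phase attempt (the README only documents the "unlock" event shape).

```python
def rank_teams(events_by_team):
    """Rank teams by phases unlocked (desc), summed first-unlock times (asc), wrong attempts (asc).

    `events_by_team` maps a team name to a list of event dicts shaped like the
    scores.jsonl lines. Returns (team, phases, time_sum, wrong) tuples, best first.
    """
    rows = []
    for team, events in events_by_team.items():
        first_unlock = {}  # phase -> earliest unlock timestamp
        wrong = 0
        for ev in events:
            if ev.get("event") == "unlock":
                phase, ts = ev["phase"], ev["ts"]
                # Only the first unlock of a phase counts.
                first_unlock[phase] = min(ts, first_unlock.get(phase, ts))
            else:
                wrong += 1  # assumption: any non-unlock event is a failed attempt
        rows.append((team, len(first_unlock), sum(first_unlock.values()), wrong))
    rows.sort(key=lambda r: (-r[1], r[2], r[3]))
    return rows
```

Negating the phase count lets one ascending sort handle all three criteria, so ties fall through to the time sum and then to the wrong-attempt count.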
- Phase 1: delgado (captain of the ship that departed 2024-03-14)
- Phase 2: dalia (first name of the author of the asterisked book on shelf B3)
- Phase 3: lantern (Caesar −3 decode of odqwhuq, verified against the provided word list)
Keep these away from teams.
- GPU: 20 × L4 × 3 h × ~$0.80/hr ≈ $48
- Compute / egress / storage: ~$30–50
- Total: ~$80–100 — comfortably under $200.
Extend to 4 hours for ~$16 more. The first build pulls the model (~5 GB) into a persistent modal.Volume, so subsequent deploys skip the download entirely.
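The estimate is simple enough to recompute for a different team count or duration. This is a back-of-the-envelope sketch using the approximate $0.80/hr L4 rate quoted above; the function name is made up, and actual Modal pricing should be checked before an event.

```python
def gpu_cost(containers: int, hours: float, rate_per_hour: float) -> float:
    """GPU spend for `containers` GPUs running `hours` each at `rate_per_hour` dollars."""
    return containers * hours * rate_per_hour


RATE = 0.80  # approximate L4 rate from the estimate above, in $/hr

base = gpu_cost(20, 3, RATE)        # 20 teams, 3 hours
extra_hour = gpu_cost(20, 1, RATE)  # cost of extending by one hour
low, high = base + 30, base + 50    # add the ~$30-50 compute/egress/storage range

print(f"GPU: ${base:.0f}, extra hour: ${extra_hour:.0f}, total: ${low:.0f}-${high:.0f}")
```

With these inputs the GPU line comes to $48, the extra hour to $16, and the total to $78–98, which matches the rounded ~$80–100 figure.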
- Event duration: --duration-min on the driver (default 180).
- Default prompt: container/skill_template.md. Make it weaker to widen the prompt-quality gap between teams; make it stronger if teams are struggling.
- Model: MODEL_TAG in modal_app.py and container/boot.sh. qwen2.5:7b is the current sweet spot for Ollama native tool calling at 7B scale. Other models tested: qwen2.5-coder:7b and qwen3:8b both emit tool calls as stringified JSON rather than real function calls in Ollama.
- Agent turn budget: --max-turns in agent_runner.py (default 60), and the nudge budget (INITIAL_NUDGES = 6).
- Resolution: Xvfb :99 -screen 0 1280x800x24 in container/boot.sh. Drop to 1024×768 if noVNC is laggy.
- npm install -g openclaw without a version tag can resolve to a squatter placeholder (openclaw@0.0.1). Always pin to openclaw@2026.4.9 or later.
- OpenClaw requires Node ≥22.12 at runtime. Debian's default Node 18 is too old; we install Node 22 LTS from NodeSource.
- OpenClaw's gateway boot hook auto-drops BOOTSTRAP.md, SOUL.md, USER.md, etc. into any workspace on first run, which confuses the agent's role. boot.sh deletes them.
- XFCE desktop icons require trust metadata (gio set … metadata::trusted true) to be launchable with a double-click. Rather than fight this, the RUN AGENT experience is an auto-opened terminal, which is more discoverable anyway.
- Gate bypass: a curious team could ls /var/lib/puzzle-staging/ and peek at phase 2/3 before unlocking phase 1. Scoring only credits phases unlocked via unlock.sh, so cheating doesn't produce score events. Honor system for gameplay; hard-enforced for scoring.
Built over a single afternoon against Modal, Ollama, noVNC, XFCE, and OpenClaw. The puzzles, agent loop, and container plumbing were iterated live against a real Modal container with Claude Code.
MIT — see LICENSE.

