chriswritescode-dev · Apr 4, 2026 · Mar 31, 2026 · Apr 3, 2026 · Apr 4, 2026
diff --git a/.gitignore b/.gitignore
@@ -1,3 +1,4 @@
 dist/
 node_modules/
 *.tsbuildinfo
+.pnpm-store/
diff --git a/README.md b/README.md
@@ -38,6 +38,7 @@ The local embedding model downloads automatically on install. For API-based embe
 - **CLI Tools** - Export, import, list, stats, cleanup, upgrade, status, and cancel commands via `ocm-mem` binary
 - **Dimension Mismatch Detection** - Detects embedding model changes and guides recovery via reindex
 - **Iterative Development Loops** - Autonomous coding/auditing loop with worktree isolation, session rotation, stall detection, and review finding persistence
+- **Docker Sandbox** - Run loops inside isolated Docker containers with a bind-mounted project directory, automatic container lifecycle, and selective tool routing (bash, glob, grep)
 
 ## Agents
 
@@ -269,6 +270,10 @@ You can edit this file to customize settings. The file is created only if it doe
     "minAudits": 1,
     "stallTimeoutMs": 60000
   },
+  "sandbox": {
+    "mode": "off",
+    "image": "ocm-sandbox:latest"
+  },
   "tui": {
     "sidebar": true,
     "showLoops": true,
@@ -344,6 +349,10 @@ When enabled, logs are written to the specified file with timestamps. The log fi
 - `loop.stallTimeoutMs` - Watchdog stall detection timeout in milliseconds (default: `60000`)
 - `loop.minAudits` - Minimum audit iterations required before completion (default: `1`)
 
+#### Sandbox
+- `sandbox.mode` - Sandbox mode: `"off"` or `"docker"` (default: `"off"`)
+- `sandbox.image` - Docker image for sandbox containers (default: `"ocm-sandbox:latest"`)
+
 #### Top-level
 - `defaultKvTtlMs` - Default TTL for KV store entries in milliseconds (default: `604800000` / 7 days)
 
@@ -397,8 +406,8 @@ After the architect presents a plan, the user approves via one of four execution
 
 - **New session** — Creates a new Code session via `memory-plan-execute`
 - **Execute here** — Executes the plan in the current session (code agent takes over immediately)
-- **Loop (worktree)** — Runs the plan in an isolated git worktree with iterative coding/auditing via `memory-loop`
-- **Loop** — Same as loop (worktree) but runs in the current directory (no worktree isolation)
+- **Loop (worktree)** — Runs the plan in an isolated git worktree with iterative coding/auditing via `memory-loop`. When `config.sandbox.mode` is `"docker"`, the loop automatically runs inside the Docker sandbox.
+- **Loop** — Same as loop (worktree) but runs in the current directory (no worktree isolation, no sandbox)
 
 Set `executionModel` in your config to a fast model (e.g., Haiku) and use a smart model (e.g., Opus) for the architect session.
 
@@ -460,6 +469,84 @@ By default, loops run in an isolated git worktree. Set `inPlace: true` to run in
 
 See the [full documentation](https://chriswritescode-dev.github.io/opencode-memory/features/memory/#loop) for details on worktree management, model configuration, and termination conditions.
 
+## Docker Sandbox
+
+Run loop iterations inside an isolated Docker container. Three tools (`bash`, `glob`, `grep`) execute inside the container via `docker exec`, while `read`/`write`/`edit` operate on the host filesystem. Your project directory is bind-mounted at `/workspace` for instant file sharing.
+
+### Prerequisites
+
+- Docker running on your machine
+
+### Setup
+
+**1. Build the sandbox image:**
+
+```bash
+docker build -t ocm-sandbox:latest container/
+```
+
+The image includes Node.js 24, pnpm, Bun, Python 3 + uv, ripgrep, git, and jq.
+
+**2. Enable sandbox mode in your config** (`~/.config/opencode/memory-config.jsonc`):
+
+```jsonc
+{
+  "sandbox": {
+    "mode": "docker",
+    "image": "ocm-sandbox:latest"
+  }
+}
+```
+
+**3. Restart OpenCode.**
+
+### Usage
+
+Start a sandbox loop via the architect plan approval flow (select "Loop (worktree)") or directly with the `memory-loop` tool:
+
+```
+memory-loop with worktree: true
+```
+
+Sandbox is automatically enabled when `config.sandbox.mode` is set to `"docker"` and the loop uses `worktree: true`. The loop:
+1. Creates a git worktree (if `worktree: true`)
+2. Starts a Docker container with the worktree directory bind-mounted at `/workspace`
+3. Redirects `bash`, `glob`, and `grep` tool calls into the container
+4. Cleans up the container on loop completion or cancellation
+
+### How It Works
+
+- **Bind mount** -- the project directory is mounted directly into the container at `/workspace`. No sync daemon, no file copying. Changes are visible instantly on both sides.
+- **Tool redirection** -- `bash`, `glob`, and `grep` route through `docker exec` when a session belongs to a sandbox loop. The `read`/`write`/`edit` tools operate on the host filesystem directly (compatible with host LSP).
+- **Git blocking** -- git commands are explicitly blocked inside the container. All git operations (commit, push, branch management) are handled by the loop system on the host.
+- **Host LSP** -- since files are shared via the bind mount, OpenCode's LSP servers on the host read the same files and provide diagnostics after writes and edits.
+- **Container lifecycle** -- one container per loop, automatically started and stopped. Container name format: `ocm-sandbox-<worktreeName>`.
+
+### Configuration
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `sandbox.mode` | `"off"` | Set to `"docker"` to enable sandbox support |
+| `sandbox.image` | `"ocm-sandbox:latest"` | Docker image to use for sandbox containers |
+
+### Customizing the Image
+
+The `container/Dockerfile` is included in the project. To add project-specific tools (e.g., Go, Rust, additional language servers), edit the Dockerfile and rebuild:
+
+```bash
+docker build -t ocm-sandbox:latest container/
+```
+
+### Caveats
+
+- **Worktree required** -- sandbox only works with `worktree: true`. In-place loops (`worktree: false`) never use sandbox.
+- **Git blocked** -- git commands are explicitly blocked inside the container. All git operations are handled by the loop system on the host.
+- **No `tsc` global** -- the TypeScript compiler is not installed globally in the container. Run it with `pnpm tsc`, adding `typescript` to your project dependencies if it is not already there.
+- **pnpm install caution** -- running `pnpm install` in the container writes `node_modules` to the host via the bind mount, potentially bloating worktree diffs.
+- **No network isolation** -- the container has full network access (no `--network=none` flag).
+- **No resource limits** -- no `--memory`, `--cpus`, or `--pids-limit` flags are applied.
+- **Orphan cleanup** -- orphaned containers from previous runs are automatically cleaned up on plugin startup.
+
 ## Documentation
 
 Full documentation available at [chriswritescode-dev.github.io/opencode-memory/features/memory](https://chriswritescode-dev.github.io/opencode-memory/features/memory/)
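
Review note: the tool-routing and git-blocking rules described in the README section above can be pictured with a small TypeScript sketch. It is not taken from the plugin source. `runsInContainer`, `isGitCommand`, and `buildDockerExecArgs` are hypothetical names, the use of `bash -lc` and `-w /workspace` is an assumption, and the git check is a deliberately naive pattern match.

```typescript
// Hypothetical sketch: names and structure are assumptions, not the plugin's real code.
type ToolName = "bash" | "glob" | "grep" | "read" | "write" | "edit";

// Per the README, only these three tools are redirected into the container.
const SANDBOXED_TOOLS: ReadonlySet<ToolName> = new Set<ToolName>(["bash", "glob", "grep"]);

/** True when a tool call should run inside the container rather than on the host. */
function runsInContainer(tool: ToolName, sessionInSandboxLoop: boolean): boolean {
  return sessionInSandboxLoop && SANDBOXED_TOOLS.has(tool);
}

/** Naive check: does any segment of a shell command line start with `git`? */
function isGitCommand(command: string): boolean {
  return command
    .split(/&&|\|\||;|\|/)
    .some((segment) => /^\s*(sudo\s+)?git(\s|$)/.test(segment));
}

/** Builds the argument list for `docker exec`, refusing git commands. */
function buildDockerExecArgs(containerName: string, command: string): string[] {
  if (isGitCommand(command)) {
    throw new Error("git commands are blocked inside the sandbox container");
  }
  // Assumption: commands run through bash with /workspace as the working directory.
  return ["exec", "-w", "/workspace", containerName, "bash", "-lc", command];
}
```

For example, `buildDockerExecArgs("ocm-sandbox-demo", "pnpm tsc --noEmit")` yields the arguments for `docker exec -w /workspace ocm-sandbox-demo bash -lc "pnpm tsc --noEmit"`, while a command containing `git commit` throws before reaching Docker.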

diff --git a/config.jsonc b/config.jsonc
@@ -44,6 +44,10 @@
     "minAudits": 1,
     "stallTimeoutMs": 60000
   },
+  "sandbox": {
+    "mode": "off",
+    "image": "ocm-sandbox:latest"
+  },
   "tui": {
     "sidebar": true,
     "showLoops": true,

diff --git a/container/.dockerignore b/container/.dockerignore
@@ -0,0 +1,4 @@
+node_modules
+.git
+dist
+*.log
diff --git a/container/Dockerfile b/container/Dockerfile
@@ -0,0 +1,35 @@
+FROM node:24-slim
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    git \
+    curl \
+    jq \
+    python3 \
+    python3-venv \
+    sudo \
+    ca-certificates \
+    unzip \
+    && rm -rf /var/lib/apt/lists/*
+
+RUN npm install -g pnpm
+
+ENV PNPM_HOME="/home/devuser/.local/share/pnpm"
+ENV npm_config_store_dir="/home/devuser/.local/share/pnpm/store"
+
+RUN ARCH="$(uname -m)" && TARGET="${ARCH}-unknown-linux-$([ "$ARCH" = "x86_64" ] && echo musl || echo gnu)" && \
+    curl -LsSf "https://github.com/BurntSushi/ripgrep/releases/download/14.1.1/ripgrep-14.1.1-${TARGET}.tar.gz" | tar xz && \
+    mv "ripgrep-14.1.1-${TARGET}/rg" /usr/local/bin/ && \
+    rm -rf "ripgrep-14.1.1-${TARGET}"
+
+RUN useradd -m -s /bin/bash -u 1001 devuser && \
+    echo "devuser ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
+
+USER devuser
+
+RUN curl -LsSf https://astral.sh/uv/install.sh | sh
+RUN curl -fsSL https://bun.sh/install | bash
+
+ENV PATH="/home/devuser/.local/bin:/home/devuser/.bun/bin:/home/devuser/.cargo/bin:${PATH}"
+
+WORKDIR /workspace
+CMD ["sleep", "infinity"]
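
Review note: because the image's `CMD` is `sleep infinity`, a container started in detached mode stays alive for later `docker exec` calls. The sketch below shows one way the config defaults, the "docker mode plus worktree" condition, and the `ocm-sandbox-<worktreeName>` naming from the README could fit together. All function names are hypothetical, and the exact `docker run` flags the plugin passes are an assumption. Only `-d`, `--name`, and `-v` are used here, in line with the README's statement that no network or resource limits are applied.

```typescript
// Hypothetical sketch: function names and flag choices are assumptions.
type SandboxMode = "off" | "docker";

interface SandboxConfig {
  mode: SandboxMode;
  image: string;
}

// Defaults as documented in the README configuration table.
const DEFAULT_SANDBOX: SandboxConfig = { mode: "off", image: "ocm-sandbox:latest" };

/** Fills in documented defaults for missing or empty fields. */
function resolveSandboxConfig(raw?: Partial<SandboxConfig>): SandboxConfig {
  const mode: SandboxMode = raw?.mode === "docker" ? "docker" : DEFAULT_SANDBOX.mode;
  const image = raw?.image?.trim() || DEFAULT_SANDBOX.image;
  return { mode, image };
}

/** Sandbox applies only when mode is "docker" and the loop uses a worktree. */
function shouldUseSandbox(config: SandboxConfig, loop: { worktree: boolean }): boolean {
  return config.mode === "docker" && loop.worktree;
}

/** Container name format from the README: ocm-sandbox-<worktreeName>. */
function containerNameFor(worktreeName: string): string {
  return `ocm-sandbox-${worktreeName}`;
}

/** Arguments for `docker run`: detached, named, worktree bind-mounted at /workspace. */
function buildDockerRunArgs(worktreeName: string, worktreeDir: string, image: string): string[] {
  return [
    "run", "-d",
    "--name", containerNameFor(worktreeName),
    "-v", `${worktreeDir}:/workspace`,
    image,
  ];
}

/** Arguments for force-removing the container on completion or cancellation. */
function buildDockerRemoveArgs(worktreeName: string): string[] {
  return ["rm", "-f", containerNameFor(worktreeName)];
}
```

With `{ "mode": "docker" }` in the config and `worktree: true`, `shouldUseSandbox` returns `true` and the run arguments mount the worktree directory at `/workspace` using the default image.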
diff --git a/package.json b/package.json
@@ -51,7 +51,6 @@
     "@huggingface/transformers": "^3.8.1",
     "@opencode-ai/plugin": "^1.3.5",
     "@opencode-ai/sdk": "^1.2.26",
-    "jsonc-parser": "^3.3.1",
     "sqlite-vec": "0.1.7-alpha.2"
   },
   "peerDependencies": {

diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml
diff --git a/src/agents/architect.ts b/src/agents/architect.ts
@@ -99,11 +99,50 @@ KV entries are scoped to the current project and expire after 7 days. Use this f
 Present plans with:
 - **Objective**: What we're building and why
 - **Phases**: Ordered implementation steps, each with specific files to create/modify, what changes to make, and acceptance criteria
-- **Verification**: Concrete, runnable commands that prove the plan is complete. Every plan MUST include at least one verification step. Examples:
-  - Test commands: \`pnpm test\`, \`vitest run src/path/to/test.ts\`
-  - Type checking: \`pnpm tsc --noEmit\`, \`pnpm lint\`
-  - Runtime checks: curl commands, specific assertions about output
-  Plans without verification steps are incomplete. If no existing tests cover the changes, the plan MUST include a phase to write tests.
+- **Verification**: Concrete criteria the code agent can validate automatically inside the loop. Every plan MUST include verification. Plans without verification are incomplete.
+
+  **Verification tiers (prefer higher tiers):**
+
+  | Tier | Type | Example | Why |
+  |---|---|---|---|
+  | 1 | Targeted tests | \`vitest run src/services/loop.test.ts\` | Directly exercises the new code paths |
+  | 2 | Type/lint checks | \`pnpm tsc --noEmit\`, \`pnpm lint\` | Catches structural and convention errors |
+  | 3 | File assertions | "src/services/auth.ts exports \`validateToken(token: string): boolean\`" | Auditor can verify by reading code |
+  | 4 | Behavioral assertions | "Calling \`parseConfig({})\` returns the default config instead of throwing" | Should be captured in a test |
+
+  **Do NOT use these as verification — they cannot be validated in an automated loop:**
+  - \`pnpm build\` — tests bundling, not correctness; slow and opaque
+  - \`curl\` / HTTP requests — requires a running server
+  - \`pnpm test\` (full suite without path) — too broad, may fail for unrelated reasons
+  - Manual checks ("verify the UI", "check the output looks right")
+  - External service dependencies (APIs, databases that may not be running)
+
+  **Test requirements for new code:**
+  When a plan adds new functions, modules, or significant logic, verification MUST include either:
+  - Existing tests that already cover the new code paths (cite the specific test file)
+  - A dedicated phase to write targeted tests, specifying: what function/behavior to test, happy path, error cases, and edge cases
+
+  When tests are required, they must actually exercise the code — not just exist. The auditor will verify test quality.
+
+  **Per-phase acceptance criteria:**
+  Each phase MUST have its own acceptance criteria, not just a global verification section. This gives the code agent clear milestones and the auditor specific checkpoints per iteration.
+
+  **Good verification example:**
+  \`\`\`
+  ## Verification
+  1. \`vitest run test/loop.test.ts\` — all tests pass
+  2. \`pnpm tsc --noEmit\` — no type errors
+  3. \`src/services/loop.ts\` exports \`buildAuditPrompt\` accepting \`LoopState\`, returning \`string\`
+  4. New tests in \`test/loop.test.ts\` cover: empty state, state with findings, long prompt truncation
+  \`\`\`
+
+  **Bad verification example:**
+  \`\`\`
+  ## Verification
+  1. Run \`pnpm build\` — builds successfully
+  2. Start the server and test manually
+  3. Everything should work
+  \`\`\`
 - **Decisions**: Architectural choices made during planning with rationale
 - **Conventions**: Existing project conventions that must be followed
 - **Key Context**: Relevant code patterns, file locations, integration points, and dependencies discovered during research
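
Review note: the "Do NOT use these as verification" list in the prompt above is enforced only by the model following instructions. A mechanical check along the same lines could look like the sketch below. It is purely illustrative: `lintVerificationCommands` and its patterns are hypothetical, it expects bare command strings rather than prose steps, and it covers only the three command-shaped entries from the list (`pnpm build`, `curl`, and `pnpm test` with no path).

```typescript
// Hypothetical sketch: not part of the PR; patterns mirror the prompt's deny list.
interface VerificationIssue {
  command: string;
  reason: string;
}

const DISALLOWED: ReadonlyArray<{ pattern: RegExp; reason: string }> = [
  { pattern: /^pnpm\s+build\b/, reason: "tests bundling, not correctness" },
  { pattern: /^curl\b/, reason: "requires a running server" },
  { pattern: /^pnpm\s+test$/, reason: "full suite without a path is too broad" },
];

/** Returns one issue per command that matches a disallowed verification pattern. */
function lintVerificationCommands(commands: string[]): VerificationIssue[] {
  const issues: VerificationIssue[] = [];
  for (const raw of commands) {
    const command = raw.trim();
    const hit = DISALLOWED.find(({ pattern }) => pattern.test(command));
    if (hit) issues.push({ command, reason: hit.reason });
  }
  return issues;
}
```

Targeted commands such as `vitest run test/loop.test.ts` or `pnpm tsc --noEmit` pass through untouched, while `pnpm build`, a bare `pnpm test`, and any `curl` invocation are reported with the reason given in the prompt.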

diff --git a/src/agents/auditor.ts b/src/agents/auditor.ts
@@ -57,9 +57,12 @@ Diffs alone are not enough. After getting the diff:
 
 **Behavior Changes** — If a behavioral change is introduced, raise it (especially if possibly unintentional).
 
-**Plan Compliance** — When reviewing loop iterations, check whether the implementation satisfies the plan's stated acceptance criteria and verification steps.
-- If the task context includes verification commands (test, lint, type check), check whether they were run and passed
-- If acceptance criteria from the plan are not met, report as a **warning** with the specific unmet criterion
+**Plan Compliance** — When reviewing loop iterations, rigorously verify the implementation against the plan's stated acceptance criteria and verification steps.
+- Check **per-phase acceptance criteria**: each plan phase should have its own criteria. Verify every phase that has been implemented so far.
+- If verification commands are listed (targeted tests, type check, lint), confirm they were run AND passed. If you can't confirm, run them yourself.
+- If the plan required tests to be written, verify the tests actually exercise the stated scenarios — not just that they exist. Tests that pass trivially (empty assertions, mocked everything) do not satisfy the requirement.
+- If file-level assertions are listed (e.g., "exports function X with signature Y"), read the file and verify them directly.
+- Report **unmet acceptance criteria as bug severity** — they block loop completion. Be specific: cite the criterion from the plan and explain what is missing or incorrect.
 
 ## Before You Flag Something