Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
dist/
node_modules/
*.tsbuildinfo
.pnpm-store/
91 changes: 89 additions & 2 deletions README.md
@@ -38,6 +38,7 @@ The local embedding model downloads automatically on install. For API-based embe
- **CLI Tools** - Export, import, list, stats, cleanup, upgrade, status, and cancel commands via `ocm-mem` binary
- **Dimension Mismatch Detection** - Detects embedding model changes and guides recovery via reindex
- **Iterative Development Loops** - Autonomous coding/auditing loop with worktree isolation, session rotation, stall detection, and review finding persistence
- **Docker Sandbox** - Run loops inside isolated Docker containers with bind-mounted project directory, automatic container lifecycle, and selective tool routing (bash, glob, grep)

## Agents

@@ -269,6 +270,10 @@ You can edit this file to customize settings. The file is created only if it doe
"minAudits": 1,
"stallTimeoutMs": 60000
},
"sandbox": {
"mode": "off",
"image": "ocm-sandbox:latest"
},
"tui": {
"sidebar": true,
"showLoops": true,
@@ -344,6 +349,10 @@ When enabled, logs are written to the specified file with timestamps. The log fi
- `loop.stallTimeoutMs` - Watchdog stall detection timeout in milliseconds (default: `60000`)
- `loop.minAudits` - Minimum audit iterations required before completion (default: `1`)

#### Sandbox
- `sandbox.mode` - Sandbox mode: `"off"` or `"docker"` (default: `"off"`)
- `sandbox.image` - Docker image for sandbox containers (default: `"ocm-sandbox:latest"`)

#### Top-level
- `defaultKvTtlMs` - Default TTL for KV store entries in milliseconds (default: `604800000` / 7 days)

@@ -397,8 +406,8 @@ After the architect presents a plan, the user approves via one of four execution

- **New session** — Creates a new Code session via `memory-plan-execute`
- **Execute here** — Executes the plan in the current session (code agent takes over immediately)
- **Loop (worktree)** — Runs the plan in an isolated git worktree with iterative coding/auditing via `memory-loop`
- **Loop** — Same as loop (worktree) but runs in the current directory (no worktree isolation)
- **Loop (worktree)** — Runs the plan in an isolated git worktree with iterative coding/auditing via `memory-loop`. When `config.sandbox.mode` is `"docker"`, the loop automatically uses Docker sandbox.
- **Loop** — Same as loop (worktree) but runs in the current directory (no worktree isolation, no sandbox)

Set `executionModel` in your config to a fast model (e.g., Haiku) and use a smart model (e.g., Opus) for the architect session.

@@ -460,6 +469,84 @@ By default, loops run in an isolated git worktree. Set `inPlace: true` to run in

See the [full documentation](https://chriswritescode-dev.github.io/opencode-memory/features/memory/#loop) for details on worktree management, model configuration, and termination conditions.

## Docker Sandbox

Run loop iterations inside an isolated Docker container. Three tools (`bash`, `glob`, `grep`) execute inside the container via `docker exec`, while `read`/`write`/`edit` operate on the host filesystem. The loop's worktree directory is bind-mounted at `/workspace`, so both sides see file changes immediately.

### Prerequisites

- Docker running on your machine

### Setup

**1. Build the sandbox image:**

```bash
docker build -t ocm-sandbox:latest container/
```

The image includes Node.js 24, pnpm, Bun, Python 3 + uv, ripgrep, git, and jq.

**2. Enable sandbox mode in your config** (`~/.config/opencode/memory-config.jsonc`):

```jsonc
{
  "sandbox": {
    "mode": "docker",
    "image": "ocm-sandbox:latest"
  }
}
```

**3. Restart OpenCode.**

### Usage

Start a sandbox loop via the architect plan approval flow (select "Loop (worktree)") or directly with the `memory-loop` tool:

```
memory-loop with worktree: true
```

Sandbox is enabled automatically when `config.sandbox.mode` is `"docker"` and the loop runs with `worktree: true`. A sandbox loop:
1. Creates a git worktree
2. Starts a Docker container with the worktree directory bind-mounted at `/workspace`
3. Redirects `bash`, `glob`, and `grep` tool calls into the container
4. Cleans up the container on loop completion or cancellation

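The four steps above can be sketched as plain `docker` argument lists. This is a minimal TypeScript illustration, not the plugin's implementation: `SandboxRunOptions`, `containerName`, `buildRunArgs`, and `buildCleanupArgs` are hypothetical names, and the flag choice (`-d`, `--name`, `-v`, `-w`, `rm -f`) is an assumption chosen to match the documented behavior (container named `ocm-sandbox-<worktreeName>`, worktree bind-mounted at `/workspace`, image `CMD` of `sleep infinity`).

```typescript
// Hypothetical sketch: names and argument order are illustrative, not the plugin's code.
interface SandboxRunOptions {
  worktreeName: string; // git worktree created for the loop
  worktreeDir: string; // absolute host path of that worktree
  image: string; // e.g. "ocm-sandbox:latest"
}

// Documented naming convention: ocm-sandbox-<worktreeName>
function containerName(worktreeName: string): string {
  return `ocm-sandbox-${worktreeName}`;
}

// argv for `docker run`. The image's CMD (`sleep infinity`) keeps the
// container alive so later tool calls can attach with `docker exec`.
function buildRunArgs(opts: SandboxRunOptions): string[] {
  return [
    "run",
    "-d", // detached: run in the background
    "--name",
    containerName(opts.worktreeName),
    "-v",
    `${opts.worktreeDir}:/workspace`, // bind mount: no sync daemon, no copying
    "-w",
    "/workspace",
    opts.image,
  ];
}

// argv for teardown: `docker rm -f` kills the container if running, then removes it.
function buildCleanupArgs(worktreeName: string): string[] {
  return ["rm", "-f", containerName(worktreeName)];
}
```

A caller could pass these arrays to `execFile("docker", args)` from `node:child_process`. Keeping argv as an array rather than a shell string avoids quoting problems with worktree paths that contain spaces.
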
### How It Works

- **Bind mount** -- the loop's worktree directory is mounted directly into the container at `/workspace`. There is no sync daemon and no file copying, so changes are visible instantly on both sides.
- **Tool redirection** -- `bash`, `glob`, and `grep` route through `docker exec` when a session belongs to a sandbox loop. The `read`/`write`/`edit` tools operate on the host filesystem directly (compatible with host LSP).
- **Git blocking** -- git commands are explicitly blocked inside the container. All git operations (commit, push, branch management) are handled by the loop system on the host.
- **Host LSP** -- since files are shared via the bind mount, OpenCode's LSP servers on the host read the same files and provide diagnostics after writes and edits.
- **Container lifecycle** -- one container per loop, automatically started and stopped. Container name format: `ocm-sandbox-<worktreeName>`.

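The redirection and git-blocking rules above can be illustrated with a small TypeScript sketch. `routeTool`, `isGitCommand`, and `buildExecArgs` are hypothetical, and splitting the command on `&&`, `||`, `;`, and `|` to find a leading `git` is an assumed heuristic; it is not the plugin's actual blocking logic and would miss cases such as `sh -c "git status"`.

```typescript
type ToolName = "bash" | "glob" | "grep" | "read" | "write" | "edit";

// Tools that run inside the container when the session belongs to a sandbox loop.
const CONTAINER_TOOLS: ReadonlySet<ToolName> = new Set<ToolName>(["bash", "glob", "grep"]);

// read/write/edit always stay on the host so host LSP sees the same files.
function routeTool(tool: ToolName, inSandboxLoop: boolean): "container" | "host" {
  return inSandboxLoop && CONTAINER_TOOLS.has(tool) ? "container" : "host";
}

// Assumed heuristic: block the command if any chained segment starts with `git`.
function isGitCommand(command: string): boolean {
  return command
    .split(/&&|\|\||;|\|/)
    .some((segment) => /^\s*git(\s|$)/.test(segment));
}

// argv for running a bash command inside the loop's container via `docker exec`.
function buildExecArgs(container: string, command: string): string[] {
  if (isGitCommand(command)) {
    throw new Error("git is blocked inside the sandbox; the loop handles git on the host");
  }
  return ["exec", "-w", "/workspace", container, "bash", "-c", command];
}
```
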
### Configuration

| Option | Default | Description |
|--------|---------|-------------|
| `sandbox.mode` | `"off"` | Set to `"docker"` to enable sandbox support |
| `sandbox.image` | `"ocm-sandbox:latest"` | Docker image to use for sandbox containers |

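A minimal TypeScript sketch of how these two options, plus the worktree requirement described under Caveats, could combine into a single on/off decision. `SandboxConfig`, `resolveSandboxConfig`, and `sandboxEnabled` are hypothetical names; the plugin's real config loader may be structured differently.

```typescript
// Shape of the `sandbox` block in memory-config.jsonc, per the table above.
interface SandboxConfig {
  mode: "off" | "docker";
  image: string;
}

const DEFAULT_SANDBOX: SandboxConfig = { mode: "off", image: "ocm-sandbox:latest" };

// Fill in documented defaults; `??` also covers keys explicitly set to undefined.
function resolveSandboxConfig(partial: Partial<SandboxConfig> = {}): SandboxConfig {
  return {
    mode: partial.mode ?? DEFAULT_SANDBOX.mode,
    image: partial.image ?? DEFAULT_SANDBOX.image,
  };
}

// Sandbox applies only when mode is "docker" AND the loop uses a worktree.
function sandboxEnabled(config: SandboxConfig, worktree: boolean): boolean {
  return config.mode === "docker" && worktree;
}
```
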
### Customizing the Image

The `container/Dockerfile` is included in the project. To add project-specific tools (e.g., Go, Rust, additional language servers), edit the Dockerfile and rebuild:

```bash
docker build -t ocm-sandbox:latest container/
```

### Caveats

- **Worktree required** -- sandbox only works with `worktree: true`. In-place loops (`worktree: false`) never use sandbox.
- **Git blocked** -- git commands are explicitly blocked inside the container. All git operations are handled by the loop system on the host.
- **No `tsc` global** -- TypeScript compiler is not globally available in the container. Use `pnpm tsc` or add it to your project dependencies.
- **pnpm install caution** -- running `pnpm install` in the container writes `node_modules` into the worktree on the host through the bind mount, which can bloat worktree diffs unless `node_modules` is gitignored.
- **No network isolation** -- the container has full network access (no `--network=none` flag).
- **No resource limits** -- no `--memory`, `--cpus`, or `--pids-limit` flags are applied.
- **Orphan cleanup** -- orphaned containers from previous runs are automatically cleaned up on plugin startup.

## Documentation

Full documentation available at [chriswritescode-dev.github.io/opencode-memory/features/memory](https://chriswritescode-dev.github.io/opencode-memory/features/memory/)
4 changes: 4 additions & 0 deletions config.jsonc
@@ -44,6 +44,10 @@
"minAudits": 1,
"stallTimeoutMs": 60000
},
"sandbox": {
"mode": "off",
"image": "ocm-sandbox:latest"
},
"tui": {
"sidebar": true,
"showLoops": true,
4 changes: 4 additions & 0 deletions container/.dockerignore
@@ -0,0 +1,4 @@
node_modules
.git
dist
*.log
35 changes: 35 additions & 0 deletions container/Dockerfile
@@ -0,0 +1,35 @@
FROM node:24-slim

RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    curl \
    jq \
    python3 \
    python3-venv \
    sudo \
    ca-certificates \
    unzip \
    && rm -rf /var/lib/apt/lists/*

RUN npm install -g pnpm

ENV PNPM_HOME="/home/devuser/.local/share/pnpm"
ENV npm_config_store_dir="/home/devuser/.local/share/pnpm/store"

RUN ARCH="$(uname -m)" && \
    curl -LsSf "https://github.com/BurntSushi/ripgrep/releases/download/14.1.1/ripgrep-14.1.1-${ARCH}-unknown-linux-gnu.tar.gz" | tar xz && \
    mv "ripgrep-14.1.1-${ARCH}-unknown-linux-gnu/rg" /usr/local/bin/ && \
    rm -rf "ripgrep-14.1.1-${ARCH}-unknown-linux-gnu"

RUN useradd -m -s /bin/bash -u 1001 devuser && \
    echo "devuser ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers

USER devuser

RUN curl -LsSf https://astral.sh/uv/install.sh | sh
RUN curl -fsSL https://bun.sh/install | bash

ENV PATH="/home/devuser/.local/bin:/home/devuser/.bun/bin:/home/devuser/.cargo/bin:${PATH}"

WORKDIR /workspace
CMD ["sleep", "infinity"]
1 change: 0 additions & 1 deletion package.json
@@ -51,7 +51,6 @@
"@huggingface/transformers": "^3.8.1",
"@opencode-ai/plugin": "^1.3.5",
"@opencode-ai/sdk": "^1.2.26",
"jsonc-parser": "^3.3.1",
"sqlite-vec": "0.1.7-alpha.2"
},
"peerDependencies": {
24 changes: 0 additions & 24 deletions pnpm-lock.yaml

49 changes: 44 additions & 5 deletions src/agents/architect.ts
@@ -99,11 +99,50 @@ KV entries are scoped to the current project and expire after 7 days. Use this f
Present plans with:
- **Objective**: What we're building and why
- **Phases**: Ordered implementation steps, each with specific files to create/modify, what changes to make, and acceptance criteria
- **Verification**: Concrete, runnable commands that prove the plan is complete. Every plan MUST include at least one verification step. Examples:
- Test commands: \`pnpm test\`, \`vitest run src/path/to/test.ts\`
- Type checking: \`pnpm tsc --noEmit\`, \`pnpm lint\`
- Runtime checks: curl commands, specific assertions about output
Plans without verification steps are incomplete. If no existing tests cover the changes, the plan MUST include a phase to write tests.
- **Verification**: Concrete criteria the code agent can validate automatically inside the loop. Every plan MUST include verification. Plans without verification are incomplete.

**Verification tiers (prefer higher tiers):**

| Tier | Type | Example | Why |
|---|---|---|---|
| 1 | Targeted tests | \`vitest run src/services/loop.test.ts\` | Directly exercises the new code paths |
| 2 | Type/lint checks | \`pnpm tsc --noEmit\`, \`pnpm lint\` | Catches structural and convention errors |
| 3 | File assertions | "src/services/auth.ts exports \`validateToken(token: string): boolean\`" | Auditor can verify by reading code |
| 4 | Behavioral assertions | "Calling \`parseConfig({})\` returns the default config instead of throwing" | Should be captured in a test |

**Do NOT use these as verification — they cannot be validated in an automated loop:**
- \`pnpm build\` — tests bundling, not correctness; slow and opaque
- \`curl\` / HTTP requests — requires a running server
- \`pnpm test\` (full suite without path) — too broad, may fail for unrelated reasons
- Manual checks ("verify the UI", "check the output looks right")
- External service dependencies (APIs, databases that may not be running)

**Test requirements for new code:**
When a plan adds new functions, modules, or significant logic, verification MUST include either:
- Existing tests that already cover the new code paths (cite the specific test file)
- A dedicated phase to write targeted tests, specifying: what function/behavior to test, happy path, error cases, and edge cases

When tests are required, they must actually exercise the code — not just exist. The auditor will verify test quality.

**Per-phase acceptance criteria:**
Each phase MUST have its own acceptance criteria, not just a global verification section. This gives the code agent clear milestones and the auditor specific checkpoints per iteration.

**Good verification example:**
\`\`\`
## Verification
1. \`vitest run test/loop.test.ts\` — all tests pass
2. \`pnpm tsc --noEmit\` — no type errors
3. \`src/services/loop.ts\` exports \`buildAuditPrompt\` accepting \`LoopState\`, returning \`string\`
4. New tests in \`test/loop.test.ts\` cover: empty state, state with findings, long prompt truncation
\`\`\`

**Bad verification example:**
\`\`\`
## Verification
1. Run \`pnpm build\` — builds successfully
2. Start the server and test manually
3. Everything should work
\`\`\`
- **Decisions**: Architectural choices made during planning with rationale
- **Conventions**: Existing project conventions that must be followed
- **Key Context**: Relevant code patterns, file locations, integration points, and dependencies discovered during research
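
The architect prompt above states its verification rules as instructions to the model. As an illustration only, the command-based entries in the banned-verification list could also be checked mechanically. `lintVerificationCommand`, the `VerificationIssue` labels, and the regular expressions are hypothetical and are not part of this change; treating `wget` like `curl` is likewise an assumption.

```typescript
type VerificationIssue = "build-only" | "network" | "unscoped-test-suite";

// Hypothetical check mirroring the prompt's "do not use as verification" list.
function lintVerificationCommand(command: string): VerificationIssue | null {
  const cmd = command.trim();
  // `pnpm build` tests bundling, not correctness
  if (/^pnpm\s+(run\s+)?build\b/.test(cmd)) return "build-only";
  // HTTP requests need a running server
  if (/^(curl|wget)\b/.test(cmd)) return "network";
  // `pnpm test` with no path argument runs the whole suite
  if (/^pnpm\s+(run\s+)?test\s*$/.test(cmd)) return "unscoped-test-suite";
  return null; // targeted tests, type checks, lint, etc. are acceptable
}
```
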
9 changes: 6 additions & 3 deletions src/agents/auditor.ts
@@ -57,9 +57,12 @@ Diffs alone are not enough. After getting the diff:

**Behavior Changes** — If a behavioral change is introduced, raise it (especially if possibly unintentional).

**Plan Compliance** — When reviewing loop iterations, check whether the implementation satisfies the plan's stated acceptance criteria and verification steps.
- If the task context includes verification commands (test, lint, type check), check whether they were run and passed
- If acceptance criteria from the plan are not met, report as a **warning** with the specific unmet criterion
**Plan Compliance** — When reviewing loop iterations, rigorously verify the implementation against the plan's stated acceptance criteria and verification steps.
- Check **per-phase acceptance criteria**: each plan phase should have its own criteria. Verify every phase that has been implemented so far.
- If verification commands are listed (targeted tests, type check, lint), confirm they were run AND passed. If you can't confirm, run them yourself.
- If the plan required tests to be written, verify the tests actually exercise the stated scenarios — not just that they exist. Tests that pass trivially (empty assertions, mocked everything) do not satisfy the requirement.
- If file-level assertions are listed (e.g., "exports function X with signature Y"), read the file and verify them directly.
- Report **unmet acceptance criteria as bug severity** — they block loop completion. Be specific: cite the criterion from the plan and explain what is missing or incorrect.

## Before You Flag Something

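Under the revised Plan Compliance rules above, every unmet acceptance criterion becomes a bug-severity finding that blocks loop completion. A hypothetical TypeScript sketch of that mapping follows; the `AcceptanceCriterion` and `Finding` types and both functions are illustrative assumptions, since in this diff the auditor expresses these rules as prompt text rather than code.

```typescript
type Severity = "bug" | "warning";

interface AcceptanceCriterion {
  phase: number; // plan phase the criterion belongs to
  text: string; // criterion quoted from the plan
  met: boolean;
  detail?: string; // what is missing or incorrect
}

interface Finding {
  severity: Severity;
  message: string;
}

// Each unmet criterion becomes a bug, citing the phase and the plan's wording.
function criteriaFindings(criteria: AcceptanceCriterion[]): Finding[] {
  return criteria
    .filter((c) => !c.met)
    .map((c): Finding => ({
      severity: "bug",
      message: `Phase ${c.phase}: unmet criterion "${c.text}"${c.detail ? `: ${c.detail}` : ""}`,
    }));
}

// Any bug-severity finding blocks loop completion.
function blocksCompletion(findings: Finding[]): boolean {
  return findings.some((f) => f.severity === "bug");
}
```
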