feat: sprout-agent + sprout-dev-mcp — minimal ACP coding agent#493

Merged: tlongwell-block merged 84 commits into main from tyler/sprout-agent on May 11, 2026

Conversation

Collaborator

@tlongwell-block tlongwell-block commented May 6, 2026

What

Two new crates: a minimal, auditable coding agent and its MCP tool server.

sprout-agent speaks ACP over stdio, calls an LLM, executes MCP tools. Multiple concurrent sessions (configurable cap, default 8), each with its own MCP servers, history, and context. Internal context handoff when history fills. MCP-driven lifecycle hooks for task enforcement. Works with Zed, JetBrains, sprout-acp, or anything that speaks ACP.

sprout-dev-mcp is an MCP server providing shell, str_replace, and todo tools plus _Stop and _PostCompact lifecycle hooks. Ephemeral processes with process-group kill on every exit path. Bounded output. rg and tree on PATH (gitignore-aware, line counts). Works with any MCP client.

Together: ~4,200 lines of Rust purpose-built for headless autonomous coding work.

Why

See VISION_AGENT.md for the full rationale. The short version:

  • Auditable: a senior engineer can read both crates in one sitting
  • Correct at the boundary: ACP/MCP compliance, cancellation on every path, process-group kill on timeout
  • Composable: any ACP client gets a coding agent; any MCP server gets a capable caller
  • Hardened: #![forbid(unsafe_code)], zero panics, bounded everything

Architecture

ACP client (Zed, JetBrains, sprout-acp)
    │ stdio ACP (JSON-RPC 2.0)
    v
sprout-agent (up to 8 concurrent sessions)
    │ stdio MCP (JSON-RPC 2.0) — one per session
    v
sprout-dev-mcp (or any MCP server)
    │
    v
shell, str_replace, todo; rg + tree on PATH

Key Features

Parallel Tool Calls

When the LLM returns multiple tool calls in one turn, they execute concurrently (default limit: 8). Semaphore-bounded JoinSet with cancel drain.
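
A minimal sketch of the bounded-concurrency idea, assuming nothing about the real implementation beyond the description above. It uses std threads and a hand-rolled counting semaphore as a stand-in for the async semaphore and JoinSet; `Semaphore`, `Permit`, and `run_bounded` are hypothetical names, and the cancel drain is not modelled.

```rust
use std::sync::{Condvar, Mutex};
use std::thread;

/// Hypothetical counting semaphore (std-only stand-in for an async one).
pub struct Semaphore {
    permits: Mutex<usize>,
    freed: Condvar,
}

/// Returns its permit when dropped, even if the tool call panics.
pub struct Permit<'a>(&'a Semaphore);

impl Semaphore {
    pub fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), freed: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    pub fn acquire(&self) -> Permit<'_> {
        let mut n = self.permits.lock().unwrap_or_else(|e| e.into_inner());
        while *n == 0 {
            n = self.freed.wait(n).unwrap_or_else(|e| e.into_inner());
        }
        *n -= 1;
        Permit(self)
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        let mut n = self.0.permits.lock().unwrap_or_else(|e| e.into_inner());
        *n += 1;
        self.0.freed.notify_one();
    }
}

/// Runs every call with at most `limit` in flight. Results come back in
/// call order; calls that panicked are skipped.
pub fn run_bounded<T, R, F>(calls: Vec<T>, limit: usize, f: F) -> Vec<R>
where
    T: Send,
    R: Send,
    F: Fn(T) -> R + Sync,
{
    let sem = Semaphore::new(limit.max(1));
    thread::scope(|s| {
        // Spawn everything first; the semaphore, not the spawn loop, is the bound.
        let handles: Vec<_> = calls
            .into_iter()
            .map(|call| {
                let (sem, f) = (&sem, &f);
                s.spawn(move || {
                    let _permit = sem.acquire();
                    f(call)
                })
            })
            .collect();
        handles.into_iter().filter_map(|h| h.join().ok()).collect()
    })
}
```

With a limit of 8, a turn that returns 20 tool calls would start all 20 tasks but only let 8 execute at any moment.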

MCP Server Lifecycle

2-state machine (Healthy/Dead). Transport errors kill the process group and mark dead. Lazy restart with exponential backoff + jitter. Application-level errors returned to the LLM — server is healthy.
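
A rough sketch of the two states and the restart delay, under stated assumptions: `CallOutcome`, `after_call`, and `backoff_delay` are hypothetical names, the half-to-full jitter range is an arbitrary choice, and the jitter value is passed in rather than drawn from a random source so the function stays deterministic.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServerState {
    Healthy,
    Dead,
}

/// What a single MCP call looked like from the agent's side.
pub enum CallOutcome {
    Ok,
    /// The tool ran and reported a failure; this goes back to the LLM.
    AppError,
    /// Broken pipe, bad framing, timeout: the process group gets killed.
    TransportError,
}

/// Only transport failures mark the server dead.
pub fn after_call(outcome: &CallOutcome) -> ServerState {
    match outcome {
        CallOutcome::TransportError => ServerState::Dead,
        CallOutcome::Ok | CallOutcome::AppError => ServerState::Healthy,
    }
}

/// Exponential backoff for restart attempt `attempt` (1-based), capped at
/// `cap`, scaled into [50%, 100%] of the raw delay by `jitter` in [0, 1].
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration, jitter: f64) -> Duration {
    let exp = attempt.saturating_sub(1).min(16);
    let raw = base.saturating_mul(1u32 << exp).min(cap);
    // Treat NaN or infinite jitter as zero so mul_f64 cannot panic.
    let j = if jitter.is_finite() { jitter.clamp(0.0, 1.0) } else { 0.0 };
    raw.mul_f64(0.5 + 0.5 * j)
}
```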

Context Handoff

When history exceeds 75% of budget, the agent summarizes and resets. Original task + _PostCompact hook state preserved across handoff.
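
A small sketch of the threshold check and of what survives the reset, assuming the budget and usage are measured in the same unit (tokens or bytes; the description does not say). `needs_handoff`, `Handoff`, and `fresh_history` are hypothetical names, and history items are plain strings here for brevity.

```rust
/// True once `used` is strictly above 75% of `budget`.
/// Integer arithmetic avoids float rounding at the boundary.
pub fn needs_handoff(used: usize, budget: usize) -> bool {
    (used as u128) * 4 > (budget as u128) * 3
}

/// Everything carried across a handoff.
pub struct Handoff {
    pub original_task: String,
    pub summary: String,
    /// Text returned by _PostCompact hooks, one entry per server.
    pub post_compact: Vec<String>,
}

/// Builds the fresh history: original task, summary, then hook state.
pub fn fresh_history(h: &Handoff) -> Vec<String> {
    let mut out = vec![h.original_task.clone(), h.summary.clone()];
    out.extend(h.post_compact.iter().cloned());
    out
}
```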

Tree Shim (gitignore-aware, line counts)

tree is a PATH shim — shows directory structure with line counts, respects .gitignore. Bounded output (2000 lines / 50KB).

Todo Tool (MCP-native)

The todo tool lives in sprout-dev-mcp as a regular MCP tool. Same CRUD interface as before (full-list replacement, max 50 items, validation). Additionally exposes _Stop (returns objection if open items exist) and _PostCompact (returns full list for re-injection after handoff).

MCP Lifecycle Hooks (_Stop, _PostCompact)

The agent has a generic hook system compatible with the Open Plugin Spec. Any MCP server can participate in agent lifecycle events by exposing tools prefixed with _:

  • _Stop — called before honoring end_turn. If the hook returns non-empty text (an objection), the agent injects it as a tool result and continues. The todo server uses this to enforce task completion.
  • _PostCompact — called after context handoff/compaction. The hook response is injected into the fresh context so MCP servers can re-establish state visibility.

Hooks are:

  • Advisory, not authoritative: the agent stays sovereign via a timeout (500ms), a rejection budget (3/session), and a consecutive-rejection stop (if the LLM heard the objection, made no tool calls, and ended again, the agent respects that)
  • Invisible to the LLM: tools prefixed with _ are filtered from the tool list and rejected if the LLM tries to call them directly
  • Fail-open: timeouts kill the server (standard restart path), errors are silent, hooks never block
  • Operator-configured: the MCP_HOOK_SERVERS env var controls which servers get hook access (unset = no hooks, * = all, dev,policy = named list)
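
A sketch of how the advisory rules above could combine into a single decision at end_turn. `StopGate`, `Decision`, and the method names are hypothetical; a hook that errored or timed out is represented as `None`, which models fail-open.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// Honor the LLM's end_turn.
    EndTurn,
    /// Inject the objection as a tool result and keep going.
    Continue(String),
}

pub struct StopGate {
    rejections_left: u32,
    last_end_was_rejected: bool,
}

impl StopGate {
    pub fn new(max_rejections: u32) -> Self {
        Self { rejections_left: max_rejections, last_end_was_rejected: false }
    }

    /// Call whenever the LLM makes a tool call: it acted on the objection.
    pub fn note_tool_call(&mut self) {
        self.last_end_was_rejected = false;
    }

    /// `hook_reply` is None when the hook failed or timed out (fail-open).
    pub fn on_end_turn(&mut self, hook_reply: Option<&str>) -> Decision {
        let objection = hook_reply.map(str::trim).filter(|t| !t.is_empty());
        match objection {
            Some(text) if self.rejections_left > 0 && !self.last_end_was_rejected => {
                self.rejections_left -= 1;
                self.last_end_was_rejected = true;
                Decision::Continue(text.to_string())
            }
            // No objection, budget spent, or the LLM ended twice in a row.
            _ => Decision::EndTurn,
        }
    }
}
```

A budget of 0 makes every `on_end_turn` return `EndTurn`, which matches the idea of disabling the gate.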

Safety Properties

Every input is bounded. Every exit path kills the process group. Every truncation is marked.

  • Protocol boundary: strict UTF-8, bounded line reader (4MB), JSON-RPC 2.0 validation
  • LLM interface: response body cap (16MB), error body cap (4KB), retry with jitter
  • MCP lifecycle: ArcSwap+CAS for concurrent safety, PgidGuard drop-guard
  • Tool execution: per-call timeout, process-group kill, bounded output (10MB)
  • Hook execution: 500ms timeout, server killed on timeout, fail-open, budget-bounded
  • File edits: atomic write, path escape rejection, file size cap
  • Session isolation: each session gets independent MCP servers, history, cancel channel
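
To make the protocol-boundary item concrete, here is a sketch of a bounded, strict-UTF-8 line reader. It is an illustration only: `Frame` and `read_bounded_line` are hypothetical names, and the real reader may report these cases differently.

```rust
use std::io::{self, BufRead};

#[derive(Debug, PartialEq, Eq)]
pub enum Frame {
    Line(String),
    /// The line exceeded `max` bytes; the rest of it was discarded.
    TooLong,
    InvalidUtf8,
    /// EOF arrived in the middle of a line.
    Unterminated,
    Eof,
}

/// Reads one newline-terminated frame without ever buffering more than
/// `max` bytes of it.
pub fn read_bounded_line<R: BufRead>(reader: &mut R, max: usize) -> io::Result<Frame> {
    let mut line: Vec<u8> = Vec::new();
    let mut too_long = false;
    loop {
        let buf = reader.fill_buf()?;
        if buf.is_empty() {
            let clean_eof = line.is_empty() && !too_long;
            return Ok(if clean_eof { Frame::Eof } else { Frame::Unterminated });
        }
        let (chunk, found_newline) = match buf.iter().position(|&b| b == b'\n') {
            Some(i) => (&buf[..i], true),
            None => (buf, false),
        };
        if !too_long {
            if line.len() + chunk.len() > max {
                // Stop storing, but keep consuming so the stream resyncs.
                too_long = true;
                line.clear();
            } else {
                line.extend_from_slice(chunk);
            }
        }
        let used = chunk.len() + usize::from(found_newline);
        reader.consume(used);
        if found_newline {
            if too_long {
                return Ok(Frame::TooLong);
            }
            return Ok(match String::from_utf8(line) {
                Ok(s) => Frame::Line(s),
                Err(_) => Frame::InvalidUtf8,
            });
        }
    }
}
```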

tlongwell-block and others added 30 commits May 5, 2026 23:20
Minimal ACP agent in Rust replacing goose. Speaks ACP over stdio,
calls LLM (Anthropic or OpenAI-compatible) non-streaming, uses MCP
tools via rmcp.

Five files (types, mcp, llm, agent, main). Provider enum (no trait).
Typed HistoryItem prevents tool_call_id pairing bugs. One session,
one prompt in flight. Env-var config only.

Tests not yet added.

Adds tests/fake_llm.rs with a minimal HTTP/1.1 server returning canned
LLM responses. Three tests:

  - text_only_end_turn: verifies initialize, session/new, session/prompt,
    agent_message_chunk, end_turn stopReason.
  - tool_call_then_text: verifies tool_call(pending) →
    request_permission → permission reply → tool_call_update(failed)
    [unknown tool synthetic error] → next round → end_turn.
  - rejects_concurrent_prompts: verifies -32602 on second in-flight
    session/prompt.

Also fixes a writer race: notifications must flush before the response
they precede. Added biased select! + try_recv drain to guarantee
agent_message_chunk arrives before stopReason on the wire.

1. agent.rs - only record assistant history when tool_calls is non-empty
2. mcp.rs - collapse_content takes max_bytes and stops appending once full
3. types.rs - clamp() always returns <= max bytes, even with tiny max
4. llm.rs - bound LLM response body to 16 MiB via chunk loop
5. agent.rs/main.rs - cancellation cleans up pending permission entry and
   emits a terminal tool_call_update(failed,cancelled)
6. main.rs - explicit JSON-RPC classification (request/notification/
   response/malformed); -32600 for requests with missing params
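
Item 3 above (a clamp that never exceeds its limit, even for a tiny max) can be sketched as follows. This is an assumption about the shape of such a helper, not the code in types.rs; the marker text is made up.

```rust
/// Hypothetical truncation marker.
const MARKER: &str = "[truncated]";

/// Returns at most `max` bytes of `s`, cut on a char boundary.
/// The marker is appended only when it fits inside the limit.
pub fn clamp(s: &str, max: usize) -> String {
    if s.len() <= max {
        return s.to_string();
    }
    let marker_fits = MARKER.len() <= max;
    let mut end = if marker_fits { max - MARKER.len() } else { max };
    // Back up to the nearest char boundary; index 0 always is one.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    let mut out = s[..end].to_string();
    if marker_fits {
        out.push_str(MARKER);
    }
    out
}
```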
- mcp.rs: push_bounded helper bounds ALL collapse_content branches
  (text, image, audio, resource-link, resource elision)
- agent.rs: always remove pending permission entry on any non-success
  exit (cancel, wire send failure, dropped oneshot)
- main.rs: own history directly in the prompt task -- no Arc<Mutex>;
  Session.history is taken on prompt start and restored on completion
- main.rs: don't hold app.state across MCP child spawn; quick-check,
  release, spawn, re-check before installing the session
- main.rs: reject unterminated partial frames at EOF -- log and close
- main.rs: session_token() reads 8 random bytes from /dev/urandom
  (falls back to nanos^pid<<32) instead of leaking a stack address

- New src/wire.rs: typed ACP framing, classify(), JSON-RPC helpers,
  bounded line reader, writer task, ACP request param types.
- New src/config.rs: Config + env parsing + protocol/byte constants.
- src/types.rs: pure domain types only.
- src/agent.rs: RunCtx struct (was 9-arg run_prompt). Rejects
  unsupported ContentBlock with -32602; treats provider ToolUse + 0
  calls as error.
- src/main.rs: thin dispatch. Validates cwd is absolute (-32602),
  protocolVersion (-32602), strict JSON-RPC classify (no-method-no-id
  => -32600), never responds to notifications. session_token() uses
  getrandom.
- Rename ACP_SEED_* env vars to SPROUT_AGENT_* in src/ and tests/.
- Remove Session.history Option workaround (mem::take on Vec).
- Drop #[allow(clippy::too_many_arguments)].

- tests/golden_transcripts.rs: 10 tests that read like ACP spec examples.
  Covers initialize handshake, version check, cwd absolute validation,
  text-only response, full tool-call transcript (pending -> failed),
  permission flow, unsupported content block, malformed JSON-RPC,
  oversized line, concurrent prompt rejection, cancel notification.
- main.rs: introduce decode() and reject() helpers to collapse 4 stages
  of duplicated error-emit code; trim from 317 to 303 LOC.
- README.md: add metadata + canonical SPROUT_AGENT_* env vars.
- regressions.rs picked up SPROUT_AGENT_* rename.

cargo clippy -p sprout-agent --tests -- -D warnings: clean.
cargo test -p sprout-agent: 25/25 passing (15 prior + 10 golden).
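
The strict JSON-RPC classification mentioned in the notes above can be sketched on an already-parsed envelope. `Envelope`, `Kind`, and `rejection_code` are hypothetical names; JSON parsing is left out so the example stays std-only. The error code -32600 is the JSON-RPC 2.0 "Invalid Request" code.

```rust
/// Which top-level members a parsed JSON object carries.
pub struct Envelope {
    pub has_method: bool,
    pub has_id: bool,
    pub has_result_or_error: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Request,
    Notification,
    Response,
    Malformed,
}

/// Strict classification: anything that is not exactly one of the three
/// valid shapes is malformed.
pub fn classify(e: &Envelope) -> Kind {
    match (e.has_method, e.has_id, e.has_result_or_error) {
        (true, true, false) => Kind::Request,
        (true, false, false) => Kind::Notification,
        (false, true, true) => Kind::Response,
        _ => Kind::Malformed,
    }
}

/// Error code to send back, if any. Notifications never get a reply.
pub fn rejection_code(kind: Kind) -> Option<i32> {
    match kind {
        Kind::Malformed => Some(-32600),
        Kind::Request | Kind::Notification | Kind::Response => None,
    }
}
```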
3 MCP tools (shell, todo, str_replace) plus an rg PATH shim that
prefers the system ripgrep and falls back to a built-in matcher.

- shell: ephemeral bash -c, workdir param, timeout with process-group
  kill via setsid + killpg, tail-heavy truncation (last ~8KB) with
  full output saved to a per-session artifact ring buffer (last 8).
- todo: persistent file under the session tempdir; replace-all when
  content is given, read otherwise.
- str_replace: atomic find-and-replace via NamedTempFile + persist,
  unique-match enforcement, unified diff output, fuzzy line hint
  (similar crate) on misses.
- rg shim: per-session mkdtemp (0700) hardlinked to the binary;
  prepended to PATH only inside the shell tool's env. argv0 dispatch
  re-execs the system rg with the shim removed from PATH; falls back
  to a small built-in supporting --files, -n, -i, -l, -g, -C.
- Bootstrap: ServerInfo.instructions describes tools, working
  directory, detected stack (Cargo.toml, package.json, go.mod, ...).

938 LOC across 6 files. No clap/anyhow/thiserror/tracing.

Must fix (safety/correctness):
- str_replace: 10MB file size cap before read
- str_replace: count_occurrences_capped stops at 2 matches (memory)
- rg fallback: stream BufReader line-by-line; bounded sink caps total
  output at 50KB / 2000 lines (matches shell tool caps)
- todo: 1MB content cap + atomic write (temp + rename)
- shell: spawn failure now returns CallToolResult::error (is_error=true)
  instead of fake success with exit_code -1
- shell: artifact wording is honest about the 10MB capture cap
- rg: streaming design eliminates the m + opts.context overflow path
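
A sketch of capped counting and the unique-match rule it supports. `count_occurrences_capped` is named in the list above, but this body is a guess at its behavior; `replace_unique` and `ReplaceError` are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ReplaceError {
    NotFound,
    Ambiguous,
}

/// Counts non-overlapping matches but stops scanning after `cap` of them.
pub fn count_occurrences_capped(haystack: &str, needle: &str, cap: usize) -> usize {
    if needle.is_empty() {
        return 0;
    }
    haystack.match_indices(needle).take(cap).count()
}

/// Replaces `old` with `new` only when `old` occurs exactly once.
pub fn replace_unique(content: &str, old: &str, new: &str) -> Result<String, ReplaceError> {
    // A cap of 2 is enough to tell "exactly one" from "more than one".
    match count_occurrences_capped(content, old, 2) {
        0 => Err(ReplaceError::NotFound),
        1 => Ok(content.replacen(old, new, 1)),
        _ => Err(ReplaceError::Ambiguous),
    }
}
```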

Should fix (quality):
- shim: drop guard renamed _dir (private), removed allow(dead_code)
- shell: kill / wait / try_wait errors surface in response notes field
  instead of being swallowed
- str_replace: nearest_line_hint capped at first 200 lines

Tests (16, all passing):
- str_replace: count_occurrences_capped, resolve_within rejects escape,
  basic replace + diff, outside-workspace rejection, file-too-large
- todo: read/write round-trip, oversize rejection
- shell: basic echo, timeout fires, workdir honored
- rg: parse basic / files-only / unknown flag, CappedSink byte limit,
  glob matching, scan_file finds match

Verification:
- cargo clippy --all-targets clean
- cargo test pass (16/16)
- zero expect/unwrap in production code
- production LOC: 1193 (under 1200 budget)

Split kill_process_group into two strategies:
- kill_process_group_graceful (async): SIGTERM → tokio::time::sleep(200ms) → SIGKILL
- kill_process_group_immediate (sync): SIGKILL only, for Drop where async is unavailable

Eliminates 200ms thread::sleep that blocked the current_thread runtime.

The LLM could previously pass any directory as workdir, escaping the
workspace boundary. Now both shell and str_replace canonicalize the
provided workdir and reject it if it doesn't start_with the server's
initial cwd. Symlink escapes are caught by canonicalization.
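
The containment check described above, as a minimal sketch. `resolve_workdir` is a hypothetical name; the real tools may report the rejection differently. Note that `Path::starts_with` compares whole path components, so `/work/app2` does not count as inside `/work/app`.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Resolves `requested` against `root` and rejects anything that lands
/// outside `root` once symlinks and `..` are resolved.
pub fn resolve_workdir(root: &Path, requested: &Path) -> io::Result<PathBuf> {
    let root = root.canonicalize()?;
    let candidate = if requested.is_absolute() {
        requested.to_path_buf()
    } else {
        root.join(requested)
    };
    // canonicalize() fails for paths that do not exist, which also rejects them.
    let resolved = candidate.canonicalize()?;
    if resolved.starts_with(&root) {
        Ok(resolved)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "workdir escapes the workspace",
        ))
    }
}
```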
…not 0)

A pathological server that starts fine but deadlocks on every tool call
would previously get infinite restart attempts because kill_server reset
attempts to 0. Now it starts at 1, so repeated kill→restart cycles
eventually exhaust the budget.

MCP server initialization (spawn + handshake) can take up to 30s.
Previously this blocked the single reader task, preventing session/cancel
from being processed during that window. Now spawned like session/prompt.

- todo.rs: remove anti-removal enforcement, control-char rejection,
  deny_unknown_fields. Keep end-turn gate. 405 → 287 lines.
- tree.rs: new PATH shim (like rg). Shows directory structure with
  line counts. 102 lines, zero comments, codex-approved.
- Remove 8 unhelpful comments across agent.rs, mcp.rs, shell.rs.
- Deduplicate MCP descriptions: server instructions shrink to 3 lines,
  tool descriptions are self-contained with no overlap.
- shim.rs: symlink tree alongside rg.
- main.rs: add mod tree + argv[0] dispatch.

tree / was hanging forever — it walked the entire filesystem before
truncating output. Now collect() checks out.len() >= line_budget at
every recursion entry and before each file/dir, stopping as soon as
we have enough lines to fill the output cap.
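
A sketch of the early-exit walk described above: the budget is checked on entry to every directory and before every entry, so the walk stops as soon as the output is full. This `collect` is a simplified stand-in with no line counts or ignore rules, and its signature is an assumption.

```rust
use std::fs;
use std::path::Path;

/// Appends one line per entry under `dir` until `out` holds `line_budget` lines.
pub fn collect(dir: &Path, depth: usize, out: &mut Vec<String>, line_budget: usize) {
    if out.len() >= line_budget {
        return;
    }
    let Ok(entries) = fs::read_dir(dir) else { return };
    // Sorting needs the entries of this one directory in memory; the walk
    // as a whole is still cut short by the budget.
    let mut entries: Vec<_> = entries.filter_map(Result::ok).collect();
    entries.sort_by_key(|e| e.file_name());
    for entry in entries {
        if out.len() >= line_budget {
            return;
        }
        let name = entry.file_name().to_string_lossy().into_owned();
        // file_type() does not follow symlinks, so symlinked dirs are not entered.
        let is_dir = entry.file_type().map(|t| t.is_dir()).unwrap_or(false);
        let suffix = if is_dir { "/" } else { "" };
        out.push(format!("{}{}{}", "  ".repeat(depth), name, suffix));
        if is_dir {
            collect(&entry.path(), depth + 1, out, line_budget);
        }
    }
}
```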
…add, -- support

Fixes from codex audit:
- Skip files >10MB instead of reading into memory (OOM prevention)
- Use writeln! to stdout lock instead of println! (no panic on broken pipe)
- saturating_add on line count totals (overflow prevention)
- Support -- terminator for paths starting with -

Replaced manual read_dir + SKIP_DIRS with ignore::WalkBuilder.
Handles .gitignore, .ignore, global gitignore, nested ignores,
and hidden files. Single-pass with stack-based directory totals.
Still bounded (line budget, file size cap, broken pipe safe).

VISION_AGENT.md: 10 files (not 9), 39 tests (not 25), ~2,900 LOC (not ~2,500),
~4,400 total (not ~4,000), sprout-dev-mcp 7 files (not 6).
README.md: ten files, seven deps, updated architecture diagram, ~2,900 LOC.

The while-loop guard ensures stack is non-empty, making the unwrap
logically unreachable. Replace with let-else to eliminate the panic
path entirely and satisfy #![forbid(unsafe_code)] + zero-panic claims.
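
The let-else pattern from the note above, shown on a hypothetical stack of directory frames. `unwind` and the `(depth, total)` frame shape are illustrative assumptions, not the code in tree.rs.

```rust
/// Pops every frame at `depth` or deeper, folding each popped line total
/// into its parent. A frame is `(depth, running_total)`.
pub fn unwind(stack: &mut Vec<(usize, u64)>, depth: usize) -> Vec<(usize, u64)> {
    let mut finished = Vec::new();
    while matches!(stack.last(), Some(&(d, _)) if d >= depth) {
        // The guard already proves the stack is non-empty, but let-else
        // keeps the code free of any unwrap or panic path.
        let Some((d, total)) = stack.pop() else { break };
        if let Some(parent) = stack.last_mut() {
            parent.1 = parent.1.saturating_add(total);
        }
        finished.push((d, total));
    }
    finished
}
```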
…pact)

Remove the 295-LOC synthetic todo tool from sprout-agent and replace it
with a generic MCP hook system. Any MCP server can now participate in
agent lifecycle events by exposing tools prefixed with _.

Hook system (~80 LOC in agent):
- call_hooks(): single DRY dispatch function for all lifecycle points
- _Stop: called before honoring end_turn; objections continue the loop
- _PostCompact: called after context handoff; re-injects server state
- Tools prefixed with _ are filtered from LLM and rejected if called directly
- Fail-open: timeouts kill the server, errors are silent, hooks never block
- Agent sovereignty: 500ms timeout, 3-rejection session budget,
  consecutive end_turn without tool calls = respect LLM's decision

Todo reimplemented in sprout-dev-mcp:
- Regular 'todo' MCP tool (CRUD, same schema/validation as before)
- _Stop hook returns objection when open items exist
- _PostCompact hook returns full list for context re-injection
- schemars annotations expose constraints (max 50 items, id<=9999, etc)

Configuration:
- MCP_HOOK_SERVERS: operator allowlist (unset=no hooks, *=all)
- SPROUT_AGENT_HOOK_TIMEOUT_MS: per-hook timeout (default 500)
- SPROUT_AGENT_STOP_MAX_REJECTIONS: session budget (default 3, 0=disable)

Compatible with Open Plugin Spec hook naming conventions.
MCP_HOOK_SERVERS is a standard env var name for cross-agent adoption.
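
A sketch of how the allowlist value and the `_` prefix rule might be interpreted. `HookAccess`, `parse_hook_servers`, `is_hook_tool`, and `visible_tools` are hypothetical names. Treating an empty value like unset, letting `*` win when mixed with names, and matching on bare tool names before any server prefix are all assumptions.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum HookAccess {
    None,
    All,
    Named(Vec<String>),
}

/// Interprets the raw MCP_HOOK_SERVERS value; `None` means unset.
pub fn parse_hook_servers(raw: Option<&str>) -> HookAccess {
    let Some(raw) = raw else { return HookAccess::None };
    let names: Vec<String> = raw
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect();
    if names.iter().any(|n| n == "*") {
        HookAccess::All
    } else if names.is_empty() {
        HookAccess::None
    } else {
        HookAccess::Named(names)
    }
}

impl HookAccess {
    pub fn allows(&self, server: &str) -> bool {
        match self {
            HookAccess::None => false,
            HookAccess::All => true,
            HookAccess::Named(names) => names.iter().any(|n| n == server),
        }
    }
}

/// Hook tools are the ones whose name starts with an underscore.
pub fn is_hook_tool(name: &str) -> bool {
    name.starts_with('_')
}

/// The tool list as the LLM would see it: hook tools filtered out.
pub fn visible_tools<'a>(tools: &[&'a str]) -> Vec<&'a str> {
    tools.iter().copied().filter(|t| !is_hook_tool(t)).collect()
}
```

In the agent the raw value would come from something like `std::env::var("MCP_HOOK_SERVERS").ok()`, passed in as `raw.as_deref()`.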

Tests: 64 passing (15 regression, 12 config unit, 10 golden, 4 transcript,
23 dev-mcp). New coverage: stop-blocks, budget-exhausted, consecutive-end,
timeout-failopen, hidden-tool-rejection, post-compact-injection.

tlongwell-block and others added 5 commits May 8, 2026 22:26
Remove numeric IDs from todo items. Schema is now [{text, done}] —
position is display-only. The LLM no longer invents/tracks IDs.

Add silent-removal detection: if open items disappear from the list
without being marked done, the tool response includes a soft warning.
This closes the _Stop bypass where the LLM could just delete open items.
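
A sketch of silent-removal detection on full-list replacement, assuming items are identified by their trimmed text as the schema above suggests. `Item` and `silently_removed` are hypothetical names; the real tool may word or structure its warning differently.

```rust
#[derive(Debug, Clone)]
pub struct Item {
    pub text: String,
    pub done: bool,
}

/// Returns the text of every item that was open in `old` and is missing
/// from `new` altogether. An item that reappears marked done is fine.
pub fn silently_removed(old: &[Item], new: &[Item]) -> Vec<String> {
    old.iter()
        .filter(|o| !o.done)
        .filter(|o| !new.iter().any(|n| n.text.trim() == o.text.trim()))
        .map(|o| o.text.trim().to_string())
        .collect()
}
```

A non-empty result would be turned into the soft warning in the tool response; the write itself still goes through, since the check is advisory.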

Hardening:
- Reject control characters and Unicode trickery (bidi, zero-width, etc)
- Reject duplicate text (after trim normalization)
- deny_unknown_fields on all param structs
- Trim text on storage for consistent identity
- Length validation on trimmed form
- Atomic write+render under single lock hold

39 tests in sprout-dev-mcp (was 23). Codex GPT 5.5 scored 9/10 —
remaining items are exotic Unicode normalization and the intentional
advisory (not authoritative) design of the removal warning.

tlongwell-block and others added 3 commits May 9, 2026 15:29
Documentation:
- README: fix 'no hooks'/'no compaction' claims to match implementation
- README: fix tool-name separator from <server>.<tool> to server__tool
- README: fix 'text never reaches client' — agent emits agent_message_chunk
- README: fix stale method count (was 'six methods', now accurate)
- README/VISION_AGENT.md/Cargo.toml: remove all LOC claims (they drift)
- Fix misleading handoff log message ('-> 1 item' -> actual count)

Logging:
- Migrate both crates from eprintln!/custom log_*! macros to tracing
  (consistent with the rest of the sprout codebase)
- Delete both log.rs files
- Init tracing-subscriber writing to stderr in both main() functions

When a session is cancelled, the agent now sends notifications/cancelled
to in-flight MCP servers via rmcp's cancellable request API. sprout-dev-mcp
observes the cancellation token and kills the running shell process group.

Agent side:
- do_call uses send_cancellable_request + select! cancel vs response
- fire_and_forget_cancel() helper sends notification without blocking
- execute_parallel closes semaphore on cancel (cooperative drain, 5s bound)
- Early borrow() check prevents missed cancellation on pre-cancelled receivers
- Typed AgentError::Cancelled variant (no string matching)

Server side:
- RequestContext<RoleServer>.ct threaded into shell::run
- Shell wait loop observes ct.cancelled() → SIGKILL process group + bounded reap
- PgidGuard only disarmed after successful reap

Tests:
- End-to-end: sleep 60 killed on cancel, PID verified dead via kill -0
- Protocol-level: fake_mcp receives notifications/cancelled with correct requestId
- fake_mcp refactored to channel-based reader thread for notification capture
- All waits use bounded polling (no fixed sleeps)

@tlongwell-block tlongwell-block merged commit 082414b into main May 11, 2026
14 checks passed
@tlongwell-block tlongwell-block deleted the tyler/sprout-agent branch May 11, 2026 02:34