
feat(agent): subprocess-driven Claude Code, replacing API-key requirement (PR C) #10

Merged
WiktorStarczewski merged 2 commits into main from pr-c/subprocess-claude-cli on Apr 25, 2026

@WiktorStarczewski

Summary

Phase 2 originally shipped via the Anthropic Managed-Agents API (#7 + #8), which is API-key-only. The constraint that surfaced post-merge: both peers and consumers in the actual user base have Claude Code subscriptions (Pro / Max / Team), not API keys. The shipped Phase-2 tools were therefore unusable for the real users.

This PR replaces the SDK path with a subprocess driver around claude --print. The peer's existing OAuth credentials authenticate every call; subscription quota pays the bill. No code changes for consumers (Wiktor's side — the MCP wire shape is unchanged); breaking CLI-flag changes for peers (with a hard-fail + upgrade-notes pointer to make migration loud).

Internals

  • internal/agent/cli.go (new) — subprocess driver. Three argv variants (OneShot, first-turn-of-conv, resumed-turn) all share --print --output-format json --allowed-tools "Read Glob Grep". Parses Claude Code's JSON result for stop_reason + usage; replays the session JSONL via the existing internal/transcript package to extract per-tool-call detail (the JSON payload has no tool_calls[] field — verified by running the actual CLI).
  • internal/agent/conversations.go (slimmed) — convID is now the Claude Code session UUID. StartConversation no longer hits an upstream service. EndConversation marks the local slot ended without deleting anything; the JSONL stays on disk so Phase-1 read_session can still surface it. Idle reaper is local-only.
  • internal/agent/sdk.go / loop.go / customtools.go (deleted) — the Managed-Agents wrapper, event-loop translation, and hand-rolled Read/Glob/Grep handlers all go away. Claude Code's native tools replace the custom dispatch; the JSON output replaces the event stream.
  • go.mod — drops github.com/anthropics/anthropic-sdk-go and its transitive deps (jsonparser, tidwall/*, wk8/go-ordered-map, ...). Static-binary supply-chain footprint shrinks.
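
The three argv variants described above might be assembled along these lines. This is a minimal sketch, not the code in internal/agent/cli.go: the shared flags come from the description, while `buildArgv`, the `turnKind` names, and the use of `--session-id` / `--resume` to pin and continue a session are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// turnKind distinguishes the three invocation shapes named in the PR.
// The identifiers are hypothetical.
type turnKind int

const (
	oneShot turnKind = iota
	firstTurn
	resumedTurn
)

// allowedTools is the hardcoded read-only allowlist from the description.
const allowedTools = "Read Glob Grep"

// buildArgv returns the arguments passed to the claude binary.
// Assumption: the first turn pins the session with --session-id and later
// turns continue it with --resume; the PR does not name these flags.
func buildArgv(kind turnKind, sessionID, prompt string) []string {
	argv := []string{
		"--print",
		"--output-format", "json",
		"--allowed-tools", allowedTools,
	}
	switch kind {
	case firstTurn:
		argv = append(argv, "--session-id", sessionID)
	case resumedTurn:
		argv = append(argv, "--resume", sessionID)
	}
	return append(argv, prompt)
}

func main() {
	id := "123e4567-e89b-42d3-a456-426614174000" // placeholder UUID
	for _, k := range []turnKind{oneShot, firstTurn, resumedTurn} {
		fmt.Println(strings.Join(buildArgv(k, id, "summarize go.mod"), " "))
	}
}
```

Keeping the shared prefix in one place makes an argv-shape test straightforward: every variant can be checked for the same `--allowed-tools` value at the same position.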

Security

Two-leg defense for the read-only allowlist:

  1. First leg (load-bearing): --allowed-tools "Read Glob Grep" is hardcoded on every invocation. Claude Code itself enforces it. Asserted by the argv-shape unit test.
  2. Second leg (defense-in-depth): after every call, the JSONL replay scans for tool_use blocks outside the allowlist. Any disallowed name flips StopReason to error / disallowed_tool.

ANTHROPIC_API_KEY in the operator's env is stripped from the subprocess env by default — Claude Code's auth precedence is ANTHROPIC_API_KEY > apiKeyHelper > OAuth/keychain, and a stale env var would silently redirect billing. --agent-keep-env-key opts back in for the team-API-account / CI cases.
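
The env handling amounts to filtering one variable out of the parent environment before the subprocess starts. A minimal sketch, with `subprocessEnv` as a hypothetical helper name and `keepKey` standing in for --agent-keep-env-key:

```go
package main

import (
	"fmt"
	"strings"
)

const apiKeyVar = "ANTHROPIC_API_KEY"

// subprocessEnv returns the environment for the claude subprocess.
// Unless keepKey is set, ANTHROPIC_API_KEY is dropped so that the
// subscription login is used instead of a stale API key.
func subprocessEnv(parent []string, keepKey bool) []string {
	if keepKey {
		return append([]string(nil), parent...) // copy, so callers cannot alias the input
	}
	out := make([]string, 0, len(parent))
	for _, kv := range parent {
		if strings.HasPrefix(kv, apiKeyVar+"=") {
			continue
		}
		out = append(out, kv)
	}
	return out
}

func main() {
	parent := []string{"PATH=/usr/bin", "ANTHROPIC_API_KEY=sk-stale", "HOME=/home/peer"}
	fmt.Println(subprocessEnv(parent, false)) // prints "[PATH=/usr/bin HOME=/home/peer]"
	fmt.Println(subprocessEnv(parent, true))
}
```

In real use the input would be `os.Environ()` and the result would be assigned to the `Env` field of the `exec.Cmd`.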

CLI changes (breaking, peer side only)

| Flag | v0.2 | v0.3 |
|---|---|---|
| --agent-api-key-env | required for --enable-agent | REMOVED (hard-fail) |
| --agent-base-url | optional | REMOVED (hard-fail) |
| --agent-model | optional | REMOVED (hard-fail) |
| --agent-default-max-tokens | hard cap (SDK-enforced) | kept; soft budget (system-prompt nudge) |
| --agent-claude-bin | | NEW (override claude path) |
| --agent-keep-env-key | | NEW (opt-in API-key path) |
| --enable-agent, --agent-default-max-tool-calls, --agent-default-timeout-seconds, --agent-log-path, --max-conversations, --conversation-idle-timeout | unchanged | unchanged |

Version bumped 0.1.0-dev → 0.3.0-dev (skipping 0.2.0-dev since PR B forgot to bump).
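
A hard-fail on removed flags can be done with a pre-pass over the raw arguments before normal flag parsing. The following is a sketch under assumptions: `checkRemovedFlags` and the error wording are hypothetical, and only the three flag names come from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// removedFlags lists the flags dropped in v0.3.
var removedFlags = map[string]bool{
	"agent-api-key-env": true,
	"agent-base-url":    true,
	"agent-model":       true,
}

// checkRemovedFlags returns an error if args mention a removed flag,
// in either the "--flag value" or the "--flag=value" form.
func checkRemovedFlags(args []string) error {
	for _, a := range args {
		if !strings.HasPrefix(a, "-") {
			continue // positional argument or a flag's value
		}
		name, _, _ := strings.Cut(strings.TrimLeft(a, "-"), "=")
		if removedFlags[name] {
			// Hypothetical message; the real text points at the upgrade notes.
			return fmt.Errorf("flag --%s was removed in v0.3; see the upgrade notes", name)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkRemovedFlags([]string{"--enable-agent", "--agent-model=some-model"}))
	fmt.Println(checkRemovedFlags([]string{"--enable-agent"})) // prints "<nil>"
}
```

Failing before parsing means a peer upgrading from v0.2 sees a targeted message instead of a generic "flag provided but not defined" error.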

Test plan

  • go build ./... && go vet ./... — clean.
  • go test ./... — green across all 9 packages.
  • go test -race -coverpkg=./... -coverprofile=… — aggregate 92.9% on a clean run (race-mode timing causes some flicker; CI's cache: false should be deterministic).
  • Argv-contract tests (cli_test.go): three variants asserted; --allowed-tools value hardcoded and verified per call; --bare and --no-session-persistence actively rejected.
  • JSON parser cascade: every subtype + stop_reason combination mapped to the right StopReason; future max_turns_exceeded defensively → max_tool_calls.
  • Second-leg adversarial defense: fixture JSONL with tool_use.name == "Bash" → stopReason:"error", errorSummary:"disallowed_tool". Read+Grep allowed.
  • MaxToolCalls cap: 5 tool_use blocks vs MaxToolCalls=2 → stopReason:"max_tool_calls".
  • Timeout kills subprocess: 5-second sleep vs 200-ms budget → stopReason:"timeout", no zombie.
  • ErrClaudeMissing: binary deleted between New and OneShot → typed ErrorSummary:"claude_missing" (no substring guessing; bypasses classifyErrorMsg).
  • Env handling: ANTHROPIC_API_KEY stripped by default, kept under --agent-keep-env-key, EnvAPIKeyHandling audit field correctly tags each path.
  • CLI flag tests (main_test.go): refuses to start without claude on PATH; accepts override path; hard-fails on each removed flag with the right message.
  • E2E (hearsay_e2e_test.go): TestE2E_AgentRefusesStartWithoutClaudeBin (replaces the API-key version), TestE2E_AgentToolPresentWhenEnabled and the three conversation tests pass with a fake claude shell script on PATH.
  • Manual loopback — hold for after merge: hearsay --enable-agent against a real Claude Code login; mcp__wiktor__ask_peer_claude("read this repo's go.mod"); verify the JSONL appears in ~/.claude/projects/ and list_sessions surfaces it.
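
The JSON parser cascade exercised by these tests could look roughly like the sketch below. Only the `subtype`, `stop_reason`, and `usage` fields and the max_turns_exceeded → max_tool_calls mapping are taken from this PR; the "success" subtype value, the token field names inside `usage`, and the remaining branches are assumptions, and `cliResult` / `mapStopReason` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cliResult holds the parts of the --output-format json payload that
// this sketch reads. Field names inside usage are assumed.
type cliResult struct {
	Subtype    string `json:"subtype"`
	StopReason string `json:"stop_reason"`
	Usage      struct {
		InputTokens  int `json:"input_tokens"`
		OutputTokens int `json:"output_tokens"`
	} `json:"usage"`
}

func parseResult(data []byte) (cliResult, error) {
	var r cliResult
	err := json.Unmarshal(data, &r)
	return r, err
}

// mapStopReason collapses subtype + stop_reason into one outcome string.
func mapStopReason(r cliResult) string {
	switch {
	case r.Subtype == "max_turns_exceeded" || r.StopReason == "max_turns_exceeded":
		return "max_tool_calls" // defensive mapping named in the test plan
	case r.Subtype != "" && r.Subtype != "success":
		return "error" // assumed: any other non-success subtype is an error
	case r.StopReason == "":
		return "error" // assumed: a missing stop_reason is treated as an error
	default:
		return r.StopReason // pass the upstream value through unchanged
	}
}

func main() {
	payload := []byte(`{"subtype":"success","stop_reason":"end_turn","usage":{"input_tokens":12,"output_tokens":34}}`)
	r, err := parseResult(payload)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(mapStopReason(r), r.Usage.InputTokens, r.Usage.OutputTokens) // prints "end_turn 12 34"
}
```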

Plan

~/.claude/plans/hearsay-phase-2-subprocess.md — six review rounds: 3 → 4 → 1 → 2 → 0 → 0 issues.

…ment (PR C)

Phase 2 originally shipped via the Anthropic Managed-Agents API (PRs
#7 + #8), which is API-key-only.  The constraint that surfaced
post-merge: both peers and consumers in the actual user base have
Claude Code subscriptions (Pro / Max / Team), not API keys.  The
shipped Phase-2 tools were therefore unusable for the real users.

This PR replaces the SDK path with a subprocess driver around
`claude --print`.  The peer's existing OAuth credentials authenticate
every call; subscription quota pays the bill.  No code changes for
consumers (Wiktor's side); breaking CLI-flag changes for peers.

Internals
---------

* internal/agent/cli.go (new) — subprocess driver.  Builds three argv
  variants (OneShot, first-turn-of-conv, resumed-turn) with
  --print --output-format json --allowed-tools "Read Glob Grep" on
  every call.  Parses Claude Code's JSON result for stop_reason +
  usage; replays the session JSONL via the existing internal/transcript
  package to extract per-tool-call detail (the JSON payload has no
  tool_calls[] field).

* internal/agent/conversations.go (slimmed) — convID is now the
  Claude Code session UUID.  StartConversation no longer hits an
  upstream service; it allocates a UUID, records the system_prompt,
  returns.  EndConversation marks the local slot ended without
  deleting anything; the session JSONL stays on disk so Phase-1
  read_session can still surface it.  Idle reaper is local-only.

* internal/agent/sdk.go, loop.go, customtools.go (deleted) — the
  Managed-Agents wrapper, event-loop translation layer, and
  hand-rolled Read/Glob/Grep handlers all go away.  Claude Code's
  native tools replace the custom dispatch; the JSON output replaces
  the event stream.

* go.mod — drops github.com/anthropics/anthropic-sdk-go and its
  transitive deps (jsonparser, tidwall/*, wk8/go-ordered-map, ...).
  Static binary supply-chain footprint shrinks meaningfully.

Security
--------

Two-leg defense for the read-only allowlist:
1. --allowed-tools "Read Glob Grep" is hardcoded on every invocation;
   Claude Code itself enforces it.  Asserted by the argv-shape unit
   test.
2. After every call, the JSONL replay scans for tool_use blocks
   outside the allowlist.  Any disallowed name flips StopReason to
   error / disallowed_tool.  Defense-in-depth against
   future-Claude-Code drift / corrupted builds.

ANTHROPIC_API_KEY in the operator's env is stripped from the
subprocess env by default — Claude Code's auth precedence is
ANTHROPIC_API_KEY > apiKeyHelper > OAuth/keychain, and a stale env
var would silently redirect billing.  --agent-keep-env-key opts
back in.

CLI changes (breaking)
----------------------

Removed (hard-fail with a friendly upgrade-notes pointer):
* --agent-api-key-env
* --agent-base-url
* --agent-model

Kept, reinterpreted:
* --agent-default-max-tokens — soft budget (system-prompt nudge); the
  CLI doesn't expose a token cap.

Added:
* --agent-claude-bin <path>
* --agent-keep-env-key

Unchanged: --enable-agent, --agent-default-max-tool-calls,
--agent-default-timeout-seconds, --agent-log-path, --max-conversations,
--conversation-idle-timeout.

Version bumped 0.1.0-dev → 0.3.0-dev (skipping 0.2 since PR B forgot
to bump).

Test coverage
-------------

Aggregate race-mode coverage 92.9% on a clean run (fluctuates 88-93%
with the race detector's goroutine-timing sensitivity).  Coverage
gate is 90% with cache:false on setup-go.

Plan: ~/.claude/plans/hearsay-phase-2-subprocess.md (six review
rounds: 3 → 4 → 1 → 2 → 0 → 0 issues).

On Debian/Ubuntu Linux, /bin/sh is dash, which does NOT forward
SIGTERM to a child sleep / claude process.  CI's
TestRunClaude_TimeoutKillsSubprocess saw the subprocess run the full
5s sleep.  That is a real bug in the production path on Linux too,
since claude itself spawns helper processes that wouldn't get the
cancel signal otherwise.

Fix: set Setpgid on the subprocess so it owns its own process group;
on cancel, send SIGTERM to the whole group via syscall.Kill(-pgid,
SIGTERM) instead of just the leader.  cli_unix.go and cli_other.go
build-tag the platform-specific calls.  macOS keeps working (Setpgid
is a no-op for already-isolated procs); Windows compiles to a stub
since hearsay doesn't ship Windows builds.
@WiktorStarczewski WiktorStarczewski merged commit 8dbb2ea into main Apr 25, 2026
2 checks passed
@WiktorStarczewski WiktorStarczewski deleted the pr-c/subprocess-claude-cli branch April 25, 2026 15:38