Live mode

Bang Juwon edited this page May 14, 2026 · 12 revisions

Live mode — Real Claude API + MCP stdio

dart-agent ships in two modes: deterministic (scripted policy, no API key needed) and live (real Claude API connected to dart-mcp over JSON-RPC stdio). This page documents live mode end-to-end.


Why both modes exist

| | Deterministic | Live |
|---|---|---|
| LLM | None (scripted policy) | Claude (default claude-sonnet-4) |
| API key required | No | ANTHROPIC_API_KEY |
| Use case | CI, reproducibility, air-gapped runs | Real DFIR work, judgment-heavy cases |
| Network egress | None | Outbound HTTPS to api.anthropic.com |
| Architectural guarantees | Same | Same |

The architectural guarantees (read-only MCP boundary, audit chain, contradiction enforcement) are identical across modes. The only difference is who picks the next call: the YAML playbook policy, or Claude.
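One way to picture this split is a single loop with a pluggable "chooser". The sketch below is illustrative only: the names `Call`, `Chooser`, `scripted_chooser`, and `run_loop` are hypothetical and are not taken from the dart-agent source. A live-mode chooser would have the same signature but ask Claude instead of reading a list.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical shapes: a "call" is (tool_name, arguments); None means "stop".
Call = Tuple[str, dict]
Chooser = Callable[[dict], Optional[Call]]


def scripted_chooser(playbook: List[Call]) -> Chooser:
    """Deterministic mode stand-in: replay a fixed list of calls in order."""
    steps = iter(playbook)

    def choose(state: dict) -> Optional[Call]:
        # The state is ignored here; a live chooser would send it to Claude.
        return next(steps, None)

    return choose


def run_loop(choose: Chooser,
             execute: Callable[[Call], dict],
             max_iterations: int) -> List[dict]:
    """Shared loop: identical for both modes, only `choose` differs."""
    state: dict = {"results": []}
    for _ in range(max_iterations):
        call = choose(state)
        if call is None:  # chooser has nothing further to request
            break
        state["results"].append(execute(call))
    return state["results"]
```

Because `execute` and the loop body do not change between modes, whatever guarantees they enforce apply equally to a scripted chooser and a model-backed one.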


Setup

git clone https://github.com/Juwon1405/agentic-dart.git
cd agentic-dart
bash scripts/install.sh

export ANTHROPIC_API_KEY="sk-ant-..."
export DART_EVIDENCE_ROOT=/mnt/case-evidence
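Before launching, it can help to confirm that both variables are visible to Python. The following is a minimal sketch; the function name `preflight` is hypothetical and not part of dart-agent, and it assumes DART_EVIDENCE_ROOT should point at an existing directory.

```python
import os
from pathlib import Path
from typing import List, Mapping


def preflight(env: Mapping[str, str], live: bool = True) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    # The API key only matters when talking to the real Claude API.
    if live and not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    root = env.get("DART_EVIDENCE_ROOT")
    if not root:
        problems.append("DART_EVIDENCE_ROOT is not set")
    elif not Path(root).is_dir():
        problems.append(f"DART_EVIDENCE_ROOT is not a directory: {root}")
    return problems


if __name__ == "__main__":
    for problem in preflight(os.environ):
        print("preflight:", problem)
```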

To register dart-mcp with Claude Code (so you can run it interactively):

claude mcp add agentic-dart -s user -- python3 -m dart_mcp.server

Then in your Claude Code session:

/mcp call agentic-dart get_amcache --hive_path AmCache.hve
/mcp call agentic-dart parse_prefetch --target chrome.exe

Running the agent loop in live mode

# Evidence root is set via env var (not a CLI flag)
export DART_EVIDENCE_ROOT=/mnt/case-evidence

python3 -m dart_agent \
    --case CASE-2026-001 \
    --out ./out/case-2026-001 \
    --mode live \
    --max-iterations 25

(Add --dry-run to use a scripted mock Claude with no API key — useful for CI.)

The agent:

  1. Spawns dart-mcp as a subprocess with stdio piped.
  2. Performs the JSON-RPC initialize handshake.
  3. Calls tools/list. Claude sees exactly the typed forensic functions that dart-mcp advertises (native + SIFT adapters), nothing more.
  4. Loops:
    • Sends the current state + hypothesis to Claude as a messages.create request with tools=[...] set to that advertised list.
    • Claude returns a tool_use block selecting one tool + arguments.
    • The agent forwards the call to dart-mcp over stdio.
    • The output goes into the audit chain and back to Claude as a tool_result message.
    • dart-corr runs on the new state. Contradictions force hypothesis revision.
  5. Stops at confidence ≥ 0.90, max iterations, or when Claude emits no further tool_use.
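The steps above can be sketched at the message level. This is an illustration under assumptions, not the dart-agent implementation: the helper names are hypothetical, the protocol version string is a placeholder for whatever dart-mcp negotiates, and the sketch only builds the messages rather than spawning a subprocess. It relies on two conventions: MCP's stdio transport carries one JSON-RPC message per line, and an MCP tool descriptor's `inputSchema` maps to `input_schema` in the Anthropic Messages API.

```python
import json
from typing import Optional

PROTOCOL_VERSION = "2024-11-05"  # assumption: replace with the negotiated version


def rpc_request(req_id: int, method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg) + "\n"


def initialize_request(req_id: int) -> str:
    """Step 2: the initialize handshake (client name here is made up)."""
    return rpc_request(req_id, "initialize", {
        "protocolVersion": PROTOCOL_VERSION,
        "capabilities": {},
        "clientInfo": {"name": "dart-agent-sketch", "version": "0.0.0"},
    })


def to_claude_tool(mcp_tool: dict) -> dict:
    """Step 3: map a tools/list entry to the shape messages.create expects."""
    return {
        "name": mcp_tool["name"],
        "description": mcp_tool.get("description", ""),
        "input_schema": mcp_tool["inputSchema"],
    }


def tool_call_request(req_id: int, tool_use: dict) -> str:
    """Step 4: forward a Claude tool_use block as an MCP tools/call request."""
    return rpc_request(req_id, "tools/call", {
        "name": tool_use["name"],
        "arguments": tool_use["input"],
    })


def tool_result_block(tool_use_id: str, text: str) -> dict:
    """Step 4: wrap the tool output for the next user message to Claude."""
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": text}


def should_stop(confidence: float, iteration: int,
                max_iterations: int, had_tool_use: bool) -> bool:
    """Step 5: the three stop conditions listed above."""
    return confidence >= 0.90 or iteration >= max_iterations or not had_tool_use
```

In a real client each line from `rpc_request` would be written to the subprocess's stdin and the matching response read from its stdout, keyed by `id`.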

What Claude can and cannot do in live mode

Can:

  • Choose any of the typed MCP functions on the surface (native + SIFT adapters)
  • Pass any schema-valid arguments
  • Reason about the output and pick the next call

Cannot:

  • Call functions not on the surface — this raises ToolNotFound at the wire boundary, not at the agent
  • Modify evidence — no function on the surface can write
  • Bypass the audit log — the agent runs audit.log() after every result, before the result is consumed
  • Ignore UNRESOLVED contradictions — dart-corr runs after every step and the serializer refuses to emit findings while contradictions are open

This is the architectural guarantee made concrete: a fully jailbroken model is still bounded by the surface.
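Two of the limits above can be illustrated with a few lines of code. This is a sketch under assumptions: `dispatch`, `emit_findings`, `OpenContradictions`, and the `status` field are hypothetical names chosen for illustration, and the real dart-mcp and dart-corr interfaces may differ.

```python
from typing import Callable, Dict, List


class ToolNotFound(Exception):
    """Raised when a requested name is not on the registered surface."""


class OpenContradictions(Exception):
    """Raised when findings are requested while contradictions remain open."""


def dispatch(surface: Dict[str, Callable[..., dict]],
             name: str, arguments: dict) -> dict:
    """Server-side lookup: only names registered on the surface can run."""
    if name not in surface:
        # The model's intent is irrelevant; an unknown name never executes.
        raise ToolNotFound(name)
    return surface[name](**arguments)


def emit_findings(findings: List[str], contradictions: List[dict]) -> List[str]:
    """Refuse to serialize findings while any contradiction is UNRESOLVED."""
    unresolved = [c for c in contradictions if c.get("status") == "UNRESOLVED"]
    if unresolved:
        raise OpenContradictions(f"{len(unresolved)} contradiction(s) still open")
    return list(findings)
```

The point of the design is that both checks live outside the model: whatever text the model produces, it reaches evidence only through `dispatch`-style lookup and reaches the report only through an `emit_findings`-style gate.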


Wire-level tests

tests/test_live_mcp.py runs end-to-end tests against the real MCP stdio server (with a scripted "mock Claude" that picks tools deterministically). No API key required:

python3 tests/test_live_mcp.py

The four assertions:

  1. Initialize handshake completes
  2. tools/list advertises the full typed MCP surface (native + SIFT adapters)
  3. Calling a non-registered function returns ToolNotFound over the wire
  4. The full loop produces a chain-verified audit log
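For readers unfamiliar with what "chain-verified" means in the fourth assertion, here is a generic hash-chain sketch. It assumes SHA-256 over the previous hash plus a canonical JSON encoding of each record; the actual dart audit format, field names, and hash construction are not documented on this page and may differ.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with this record."""
    payload = prev_hash + json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: List[dict], record: dict) -> None:
    """Append a record, linking it to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "record": record, "hash": entry_hash(prev, record)})


def verify(chain: List[dict]) -> bool:
    """Recompute every link; any edited, removed, or reordered entry fails."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Changing any earlier record changes its hash, which breaks the `prev` link of every later entry, so tampering is detectable without trusting the writer.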

Cost / performance notes

A single iteration of the live loop consumes ~5K-15K tokens depending on artifact size (mostly tool output sent back to Claude). The bundled IP-KVM case completes in ~5 iterations. A typical real-case run (15-25 iterations) costs roughly $0.50-$1.50 with claude-sonnet-4 at current pricing. Costs are logged in the audit chain (token count per call).
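The arithmetic behind an estimate like this can be reproduced with a small helper. Everything numeric in the defaults is an assumption: the per-million-token rates are placeholders to be replaced with current Anthropic pricing, and the output-token figure per iteration is a guess, since the page only gives a combined 5K-15K range dominated by tool output sent back as input. The function name is hypothetical.

```python
def estimate_cost_usd(iterations: int,
                      input_tokens_per_iter: int,
                      output_tokens_per_iter: int,
                      input_usd_per_mtok: float = 3.00,    # assumed rate
                      output_usd_per_mtok: float = 15.00   # assumed rate
                      ) -> float:
    """Rough cost: tokens per iteration times iterations times rate."""
    input_cost = iterations * input_tokens_per_iter / 1_000_000 * input_usd_per_mtok
    output_cost = iterations * output_tokens_per_iter / 1_000_000 * output_usd_per_mtok
    return input_cost + output_cost


if __name__ == "__main__":
    # Bracket the "typical real-case run" using the page's iteration range.
    low = estimate_cost_usd(15, 5_000, 500)
    high = estimate_cost_usd(25, 15_000, 500)
    print(f"estimated range: ${low:.2f} to ${high:.2f}")
```

This treats each iteration's tokens as independent. If the conversation history is re-sent on every turn, input tokens grow with iteration count and the real figure will be higher, so the token counts recorded in the audit chain are the better source for actual costs.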

For air-gapped or cost-sensitive environments, deterministic mode handles the same case classes the playbook covers, with no external dependency.


Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| ANTHROPIC_API_KEY not set | Env var missing | export ANTHROPIC_API_KEY=... |
| MCP handshake timeout | dart-mcp subprocess crashed at startup | Run python3 -m dart_mcp.server directly to see the error |
| tools/list returns 0 tools | Wrong PYTHONPATH | export PYTHONPATH="$PWD/dart_mcp/src:..." |
| Loop hangs | Claude waiting on tool_result that never arrived | Check audit.jsonl for the last call; a parser likely raised silently |
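For the "Loop hangs" case, pulling the last record out of audit.jsonl is a quick first step. The sketch below assumes only that the file is JSON Lines (one JSON object per line); it makes no assumption about field names, and `last_audit_record` is a hypothetical helper, not part of dart-agent.

```python
import json
from typing import Optional


def last_audit_record(path: str) -> Optional[dict]:
    """Return the last non-empty JSON line of the file, or None if empty."""
    last = None
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                last = json.loads(line)
    return last


if __name__ == "__main__":
    import sys
    # Usage: python3 this_script.py path/to/audit.jsonl
    print(json.dumps(last_audit_record(sys.argv[1]), indent=2))
```

If the last record is a call with no corresponding result after it, running that same tool directly against dart-mcp is a reasonable next check.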
