
Live mode

Juwon1405 edited this page Jun 15, 2026 · 12 revisions

Live mode — Real Claude API + MCP stdio

dart-agent ships in two modes: deterministic (scripted policy, no API key needed) and live (real Claude API connected to dart-mcp over JSON-RPC stdio). This page documents live mode end-to-end.


Why both modes exist

| | Deterministic | Live |
|---|---|---|
| LLM | None (scripted policy) | Claude (default claude-haiku-4-5-20251001) |
| API key required | No | ANTHROPIC_API_KEY |
| Use case | CI, reproducibility, air-gapped runs | Real DFIR work, judgment-heavy cases |
| Network egress | None | Outbound HTTPS to api.anthropic.com |
| Architectural guarantees | Same | Same |

The architectural guarantees (read-only MCP boundary, audit chain, contradiction enforcement) are identical across modes. The only difference is who picks the next call: the YAML playbook policy, or Claude.


Setup

git clone https://github.com/Juwon1405/agentic-dart.git
cd agentic-dart
bash scripts/install.sh

# Authenticate with an Anthropic API key:
export ANTHROPIC_API_KEY="sk-ant-..."

The user-facing runner is python3 run_eval.py --case <tier>/case-NN. It authenticates via ANTHROPIC_API_KEY and fails fast if that variable is not set.
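A fail-fast check of this kind can be sketched in a few lines. This is an illustration only, not the code in run_eval.py: the function name require_api_key is hypothetical, and the error text is borrowed from the symptom listed under Troubleshooting.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key, or exit immediately if it is missing or blank.

    Hypothetical helper: the real runner may perform this check differently.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        # Exit before any subprocess is spawned or any evidence is touched.
        raise SystemExit("ANTHROPIC_API_KEY not set")
    return key
```

Checking before the MCP subprocess starts means a missing key costs nothing: no partial audit log, no half-finished run.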

To register dart-mcp with Claude Code (so you can run it interactively):

claude mcp add agentic-dart -s user -- python3 -m dart_mcp.server_stdio

Then in your Claude Code session:

/mcp call agentic-dart get_amcache --hive_path AmCache.hve
/mcp call agentic-dart parse_prefetch --target chrome.exe

Running the agent loop in live mode

# Evidence root is set via env var (not a CLI flag)
export DART_EVIDENCE_ROOT=/mnt/case-evidence

python3 -m dart_agent \
    --case CASE-2026-001 \
    --out ./out/case-2026-001 \
    --mode live \
    --max-iterations 25

(Add --dry-run to use a scripted mock Claude with no API key — useful for CI.)

The agent:

  1. Spawns dart-mcp as a subprocess with stdio piped.
  2. Performs the JSON-RPC initialize handshake.
  3. Calls tools/list — Claude sees exactly 73 typed forensic functions (48 native + 25 SIFT adapters), nothing more.
  4. Loops:
    • Sends the current state + hypothesis to Claude as a messages.create request with tools=[...the 73...].
    • Claude returns a tool_use block selecting one tool + arguments.
    • The agent forwards the call to dart-mcp over stdio.
    • The output goes into the audit chain and back to Claude as a tool_result message.
    • dart-corr runs on the new state. Contradictions force hypothesis revision.
  5. Stops at confidence ≥ 0.90, max iterations, or when Claude emits no further tool_use.
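The control flow in the steps above can be sketched as follows. This is a simplified illustration, not dart-agent's implementation: run_loop, pick_tool, call_tool, and score are hypothetical names, the picker and transport are stand-ins for Claude and the dart-mcp subprocess, and the dart-corr contradiction check is left out. The tools/call method name and the one-message-per-line framing follow the MCP stdio transport.

```python
import json
from typing import Callable, Optional, Tuple

ToolChoice = Optional[Tuple[str, dict]]


def jsonrpc_request(req_id: int, method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request as a single line of JSON."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


def run_loop(
    pick_tool: Callable[[dict], ToolChoice],   # stand-in for the Claude call
    call_tool: Callable[[str], dict],          # stand-in for the stdio round trip
    score: Callable[[dict], float],            # hypothetical confidence estimate
    max_iterations: int = 25,
    threshold: float = 0.90,
) -> Tuple[dict, str]:
    """Run until one of the three stop conditions fires; return (state, reason)."""
    state = {"results": [], "confidence": 0.0}
    for i in range(max_iterations):
        choice = pick_tool(state)
        if choice is None:                     # model emitted no further tool_use
            return state, "no_tool_use"
        name, arguments = choice
        wire = jsonrpc_request(i + 1, "tools/call",
                               {"name": name, "arguments": arguments})
        state["results"].append({"tool": name, "result": call_tool(wire)})
        state["confidence"] = score(state)
        if state["confidence"] >= threshold:
            return state, "confidence"
    return state, "max_iterations"


# Scripted demo: two calls taken from the examples on this page.
script = [("get_amcache", {"hive_path": "AmCache.hve"}),
          ("parse_prefetch", {"target": "chrome.exe"})]


def scripted_pick(state: dict) -> ToolChoice:
    n = len(state["results"])
    return script[n] if n < len(script) else None


def echo_call(wire: str) -> dict:
    # Echo the requested tool name instead of talking to a real server.
    return {"echo": json.loads(wire)["params"]["name"]}


def toy_score(state: dict) -> float:
    # Toy rule: each result adds 0.5 confidence, capped at 1.0.
    return min(1.0, 0.5 * len(state["results"]))


final_state, reason = run_loop(scripted_pick, echo_call, toy_score)
```

With the toy scoring rule, the second result pushes confidence past the threshold, so the loop stops on the confidence condition rather than on an exhausted script.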

What Claude can and cannot do in live mode

Can:

  • Choose any of the typed MCP functions on the surface (73 total — 48 native + 25 SIFT adapters)
  • Pass any schema-valid arguments
  • Reason about the output and pick the next call

Cannot:

  • Call functions not on the surface — this raises ToolNotFound at the wire boundary, not at the agent
  • Modify evidence — no function on the surface can write
  • Bypass the audit log — the agent runs audit.log() after every result, before the result is consumed
  • Ignore UNRESOLVED contradictions — dart-corr runs after every step and the serializer refuses to emit findings while contradictions are open

This is the architectural guarantee made concrete: a fully jailbroken model is still bounded by the surface.
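A minimal sketch of why the surface bounds the model, under stated assumptions: the registry layout, the stub handler, and the shape of the error payload are all hypothetical, and only the ToolNotFound name comes from this page. The point is that the lookup happens on the server side of the wire, so no prompt can widen it.

```python
class ToolNotFound(Exception):
    """Raised when a requested tool is not on the registered surface."""


def _get_amcache_stub(hive_path: str) -> dict:
    # Placeholder for a read-only parser; returns no real data.
    return {"tool": "get_amcache", "hive_path": hive_path, "entries": []}


# The surface is a fixed mapping. There is no code path that adds entries at runtime.
SURFACE = {"get_amcache": _get_amcache_stub}


def dispatch(name: str, arguments: dict) -> dict:
    """Look the tool up by exact name; anything else is rejected."""
    try:
        handler = SURFACE[name]
    except KeyError:
        raise ToolNotFound(name) from None
    return handler(**arguments)


def handle_call(name: str, arguments: dict) -> dict:
    """Translate the exception into a reply the client sees over the wire."""
    try:
        return {"ok": True, "result": dispatch(name, arguments)}
    except ToolNotFound as exc:
        return {"ok": False, "error": "ToolNotFound", "tool": str(exc)}
```

A request for an unregistered name such as delete_evidence gets an error reply and never reaches any handler, whatever the model was persuaded to ask for.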


Wire-level tests

tests/test_live_mcp.py runs end-to-end tests against the real MCP stdio server (with a scripted "mock Claude" that picks tools deterministically). No API key required:

python3 tests/test_live_mcp.py

The four assertions:

  1. Initialize handshake completes
  2. tools/list advertises the full typed MCP surface (native + SIFT adapters)
  3. Calling a non-registered function returns ToolNotFound over the wire
  4. The full loop produces a chain-verified audit log
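To make "chain-verified" concrete, here is one common way to build and check a hash-chained log. The scheme is an assumption for illustration (SHA-256 over the previous hash plus a canonical JSON payload, with hypothetical field names); the project's actual audit format may differ.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry's prev_hash


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: list, payload: dict) -> None:
    """Append a new entry linked to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev_hash": prev, "payload": payload,
                  "hash": entry_hash(prev, payload)})


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited, removed, or reordered entry breaks it."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log: list = []
append_entry(log, {"tool": "get_amcache", "args": {"hive_path": "AmCache.hve"}})
append_entry(log, {"tool": "parse_prefetch", "args": {"target": "chrome.exe"}})
```

Because each hash covers the one before it, altering an early entry invalidates every later link, which is what lets a verifier detect tampering without trusting the writer.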

Performance and usage notes

Token consumption per iteration of the live loop depends on artifact size and on how much tool output is sent back to Claude. The bundled IP-KVM case typically completes in about five iterations. Token counts are recorded in the live run outputs so operators can review usage after the run. Check current Anthropic pricing and account limits before running live investigations.
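Reviewing usage after a run amounts to summing per-iteration counts. The sketch below assumes each recorded iteration carries a usage object with input_tokens and output_tokens, which are the field names the Anthropic Messages API reports; the surrounding record layout and the function name total_usage are hypothetical, and the sample numbers are invented for illustration.

```python
from typing import Iterable


def total_usage(records: Iterable[dict]) -> dict:
    """Sum input and output tokens across per-iteration records."""
    totals = {"input_tokens": 0, "output_tokens": 0}
    for record in records:
        usage = record.get("usage", {})
        for key in totals:
            # Missing counts are treated as zero rather than raising.
            totals[key] += int(usage.get(key, 0))
    return totals


# Illustrative records only; not measurements from a real run.
sample = [
    {"iteration": 1, "usage": {"input_tokens": 1200, "output_tokens": 150}},
    {"iteration": 2, "usage": {"input_tokens": 1800, "output_tokens": 90}},
]
totals = total_usage(sample)
```

Input tokens tend to grow with each iteration because earlier tool results stay in the conversation, so the input total is usually the figure to watch.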

For air-gapped or credential-free reproduction, deterministic mode handles the same case classes the playbook covers with no external dependency. --dry-run also exercises the live MCP plumbing without contacting the API.


Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| ANTHROPIC_API_KEY not set | Env var missing | export ANTHROPIC_API_KEY=... |
| MCP handshake timeout | dart-mcp subprocess crashed at startup | Run python3 -m dart_mcp.server_stdio directly to see the error |
| tools/list returns 0 tools | Wrong PYTHONPATH | export PYTHONPATH="$PWD/dart_mcp/src:..." |
| Loop hangs | Claude waiting on tool_result that never arrived | Check audit.jsonl for the last call; likely a parser raised silently |
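For the hung-loop case, a small helper can pull the last entry out of audit.jsonl. This sketch assumes only that the file holds one JSON object per line; the function name is hypothetical, and the fields inside each entry are whatever the agent wrote.

```python
import json
from pathlib import Path
from typing import Optional, Union


def last_audit_entry(path: Union[str, Path]) -> Optional[dict]:
    """Return the last non-empty JSON line of the file, or None if there is none."""
    last = None
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                last = line
    return json.loads(last) if last is not None else None
```

If the last entry records a call but no result follows it, the tool that was invoked is the first place to look for a silently raised parser error.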
