Live mode

Live mode — Real Claude API + MCP stdio

dart-agent ships in two modes: deterministic (scripted policy, no API key needed) and live (real Claude API connected to dart-mcp over JSON-RPC stdio). This page documents live mode end-to-end.

Why both modes exist

	Deterministic	Live
LLM	None — scripted policy	Claude (default `claude-haiku-4-5-20251001`)
Credentials required	No	`ANTHROPIC_API_KEY` or `claude login`
Use case	CI, reproducibility, air-gapped runs	Real DFIR work, judgment-heavy cases
Network egress	None	Outbound HTTPS to `api.anthropic.com`
Architectural guarantees	Same	Same

The architectural guarantees (read-only MCP boundary, audit chain, contradiction enforcement) are identical across modes. The only difference is who picks the next call: the YAML playbook policy, or Claude.

Setup

git clone https://github.com/Juwon1405/agentic-dart.git
cd agentic-dart
bash scripts/install.sh

# Authenticate — pick one:
export ANTHROPIC_API_KEY="sk-ant-..."   # an API key, or
claude login                            # sign in with Claude Code

The user-facing runner is python3 run_eval.py --case <tier>/case-NN (it accepts either credential above and fails fast if neither is present).

To register dart-mcp with Claude Code (so you can run it interactively):

claude mcp add agentic-dart -s user -- python3 -m dart_mcp.server_stdio

Then in your Claude Code session:

/mcp call agentic-dart get_amcache --hive_path AmCache.hve
/mcp call agentic-dart parse_prefetch --target chrome.exe

Running the agent loop in live mode

# Evidence root is set via env var (not a CLI flag)
export DART_EVIDENCE_ROOT=/mnt/case-evidence

python3 -m dart_agent \
    --case CASE-2026-001 \
    --out ./out/case-2026-001 \
    --mode live \
    --max-iterations 25

(Add --dry-run to use a scripted mock Claude with no API key — useful for CI.)

The agent:

Spawns dart-mcp as a subprocess with stdio piped.
Performs the JSON-RPC initialize handshake.
Calls tools/list — Claude sees exactly 72 typed forensic functions (47 native + 25 SIFT adapters), nothing more.
Loops:
- Sends the current state + hypothesis to Claude as a messages.create request with tools=[...the 72...].
- Claude returns a tool_use block selecting one tool + arguments.
- The agent forwards the call to dart-mcp over stdio.
- The output goes into the audit chain and back to Claude as a tool_result message.
- dart-corr runs on the new state. Contradictions force hypothesis revision.
Stops at confidence ≥ 0.90, max iterations, or when Claude emits no further tool_use.

What Claude can and cannot do in live mode

Can:

Choose any of the typed MCP functions on the surface (72 total — 47 native + 25 SIFT adapters)
Pass any schema-valid arguments
Reason about the output and pick the next call

Cannot:

Call functions not on the surface — this raises ToolNotFound at the wire boundary, not at the agent
Modify evidence — no function on the surface can write
Bypass the audit log — the agent runs audit.log() after every result, before the result is consumed
Ignore UNRESOLVED contradictions — dart-corr runs after every step and the serializer refuses to emit findings while contradictions are open

This is the architectural guarantee made concrete: a fully jailbroken model is still bounded by the surface.

Wire-level tests

tests/test_live_mcp.py runs end-to-end tests against the real MCP stdio server (with a scripted "mock Claude" that picks tools deterministically). No API key required:

python3 tests/test_live_mcp.py

The four assertions:

Initialize handshake completes
tools/list advertises the full typed MCP surface (native + SIFT adapters)
Calling a non-registered function returns ToolNotFound over the wire
The full loop produces a chain-verified audit log

Performance and usage notes

A single iteration of the live loop consumes tokens depending on artifact size and the amount of tool output sent back to Claude. The bundled IP-KVM case typically completes in about five iterations. Token counts are recorded in the live run outputs so operators can review usage after the run. Check current Anthropic pricing and account limits before running live investigations.

For air-gapped or credential-free reproduction, deterministic mode handles the same case classes the playbook covers with no external dependency. --dry-run also exercises the live MCP plumbing without contacting the API.

Troubleshooting

Symptom	Likely cause	Fix
`ANTHROPIC_API_KEY not set`	env var missing	`export ANTHROPIC_API_KEY=...`
`MCP handshake timeout`	`dart-mcp` subprocess crashed at startup	Run `python3 -m dart_mcp.server_stdio` directly to see the error
`tools/list returns 0 tools`	Wrong PYTHONPATH	`export PYTHONPATH="$PWD/dart_mcp/src:..."`
`Loop hangs`	Claude waiting on tool_result that never arrived	Check `audit.jsonl` for the last call — likely a parser raised silently

Live mode

Live mode — Real Claude API + MCP stdio

Why both modes exist

Setup

Running the agent loop in live mode

What Claude can and cannot do in live mode

Wire-level tests

Performance and usage notes

Troubleshooting

See also

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Agentic-DART

Concepts

The 5 packages

Reference

Running it

Case studies

Project

Project links

Clone this wiki locally