Live mode

Live mode — Real Claude API + MCP stdio

dart-agent ships in two modes: deterministic (scripted policy, no API key needed) and live (real Claude API connected to dart-mcp over JSON-RPC stdio). This page documents live mode end-to-end.

Why both modes exist

	Deterministic	Live
LLM	None — scripted policy	Claude (default `claude-haiku-4-5`)
API key required	No	`ANTHROPIC_API_KEY`
Use case	CI, reproducibility, air-gapped runs	Real DFIR work, judgment-heavy cases
Network egress	None	Outbound HTTPS to `api.anthropic.com`
Architectural guarantees	Same	Same

The architectural guarantees (read-only MCP boundary, audit chain, contradiction enforcement) are identical across modes. The only difference is who picks the next call: the YAML playbook policy, or Claude.

Setup

git clone https://github.com/Juwon1405/agentic-dart.git
cd agentic-dart
bash scripts/install.sh

export ANTHROPIC_API_KEY="sk-ant-..."
export DART_EVIDENCE_ROOT=/mnt/case-evidence
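Before starting a live run it can help to confirm that both variables above are actually visible to Python. The following is a minimal sketch of such a check, not part of dart-agent; the function name `preflight` is hypothetical, and the only assumption taken from this page is that `ANTHROPIC_API_KEY` must be set and `DART_EVIDENCE_ROOT` should point at a directory.

```python
import os
from pathlib import Path


def preflight(env=None):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper for illustration only: dart-agent may perform its
    own checks with different wording.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    root = env.get("DART_EVIDENCE_ROOT")
    if not root:
        problems.append("DART_EVIDENCE_ROOT is not set")
    elif not Path(root).is_dir():
        problems.append(f"DART_EVIDENCE_ROOT is not a directory: {root}")
    return problems
```

Calling `preflight()` with no arguments inspects the current process environment; passing a dict makes the check easy to exercise in tests.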

To register dart-mcp with Claude Code (so you can run it interactively):

claude mcp add agentic-dart -s user -- python3 -m dart_mcp.server

Then in your Claude Code session:

/mcp call agentic-dart get_amcache --hive_path AmCache.hve
/mcp call agentic-dart parse_prefetch --target chrome.exe

Running the agent loop in live mode

# Evidence root is set via env var (not a CLI flag)
export DART_EVIDENCE_ROOT=/mnt/case-evidence

python3 -m dart_agent \
    --case CASE-2026-001 \
    --out ./out/case-2026-001 \
    --mode live \
    --max-iterations 25

(Add --dry-run to use a scripted mock Claude with no API key — useful for CI.)

The agent:

Spawns dart-mcp as a subprocess with stdio piped.
Performs the JSON-RPC initialize handshake.
Calls tools/list — Claude sees exactly 72 typed forensic functions (47 native + 25 SIFT adapters), nothing more.
Loops:
- Sends the current state + hypothesis to Claude as a messages.create request with tools=[...the same surface returned by tools/list...].
- Claude returns a tool_use block selecting one tool + arguments.
- The agent forwards the call to dart-mcp over stdio.
- The output goes into the audit chain and back to Claude as a tool_result message.
- dart-corr runs on the new state. Contradictions force hypothesis revision.
Stops at confidence ≥ 0.90, max iterations, or when Claude emits no further tool_use.
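The control flow above can be sketched as a small loop with three pluggable pieces. This is an illustration under stated assumptions, not the dart-agent source: `run_loop`, `pick_next`, `call_tool`, and `confidence` are hypothetical names, and the audit write and the dart-corr contradiction check are reduced to a comment marking where they would sit. The point it shows is that the only thing that differs between modes is who supplies `pick_next`: a scripted policy or Claude.

```python
from typing import Any, Callable, Optional

# A tool selection: (tool name, schema-valid arguments).
ToolCall = tuple[str, dict[str, Any]]


def run_loop(
    pick_next: Callable[[list[dict]], Optional[ToolCall]],
    call_tool: Callable[[str, dict[str, Any]], Any],
    confidence: Callable[[list[dict]], float],
    max_iterations: int = 25,
    threshold: float = 0.90,
) -> tuple[list[dict], str]:
    """Run the pick -> call -> record cycle and report why it stopped."""
    history: list[dict] = []
    for _ in range(max_iterations):
        choice = pick_next(history)  # Claude in live mode, a script otherwise
        if choice is None:  # no further tool_use emitted
            return history, "no_tool_use"
        name, args = choice
        result = call_tool(name, args)  # forwarded to the MCP server
        # In the real agent, the audit append and the contradiction check
        # would happen here, before the result is consumed.
        history.append({"tool": name, "args": args, "result": result})
        if confidence(history) >= threshold:
            return history, "confidence"
    return history, "max_iterations"
```

A scripted policy can be as small as `lambda history: next(script, None)` over an iterator of tool calls, which is roughly the role a mock Claude plays in a dry run.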

What Claude can and cannot do in live mode

Can:

Choose any of the typed MCP functions on the surface (native + SIFT adapters, the same set that tools/list advertises)
Pass any schema-valid arguments
Reason about the output and pick the next call

Cannot:

Call functions not on the surface — this raises ToolNotFound at the wire boundary, not at the agent
Modify evidence — no function on the surface can write
Bypass the audit log — the agent runs audit.log() after every result, before the result is consumed
Ignore UNRESOLVED contradictions — dart-corr runs after every step and the serializer refuses to emit findings while contradictions are open

This is the architectural guarantee made concrete: a fully jailbroken model is still bounded by the surface.
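To make the "audit chain" idea concrete, here is a minimal sketch of a hash-chained log in which each record commits to the one before it, so editing any earlier record breaks verification. The record layout (`prev`, `payload`, `hash`), the SHA-256 choice, and the function names are assumptions for illustration; the actual dart-agent audit format is not shown on this page.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash, payload):
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(chain, payload):
    """Append a record whose hash covers the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    record = {"prev": prev_hash, "payload": payload, "hash": _digest(prev_hash, payload)}
    chain.append(record)
    return record


def verify_chain(chain):
    """Recompute every link; return False on the first mismatch."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["payload"]):
            return False
        prev_hash = record["hash"]
    return True
```

Because each hash covers the previous one, tampering with an early tool result invalidates every later record rather than just its own.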

Wire-level tests

tests/test_live_mcp.py runs end-to-end tests against the real MCP stdio server (with a scripted "mock Claude" that picks tools deterministically). No API key required:

python3 tests/test_live_mcp.py

The four assertions:

Initialize handshake completes
tools/list advertises the full typed MCP surface (native + SIFT adapters)
Calling a non-registered function returns ToolNotFound over the wire
The full loop produces a chain-verified audit log
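For readers who want to see what travels over stdio in such a test, the sketch below builds the three kinds of JSON-RPC 2.0 requests involved and parses a response line. It assumes the newline-delimited framing used by the MCP stdio transport. The helper names, the `protocolVersion` value, and the client name are illustrative assumptions, and how dart-mcp actually encodes ToolNotFound in an error object is not documented here, so the error handling is generic.

```python
import json


def jsonrpc_request(request_id, method, params=None):
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message, separators=(",", ":")) + "\n"


def parse_response(line):
    """Return the result of a response line, or raise if it carries an error."""
    message = json.loads(line)
    if "error" in message:
        error = message["error"]
        raise RuntimeError(f"{error.get('code')}: {error.get('message')}")
    return message["result"]


# Assumed values: protocol version string and client name are placeholders.
initialize = jsonrpc_request(1, "initialize", {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": {"name": "wire-test", "version": "0.0.0"},
})
list_tools = jsonrpc_request(2, "tools/list")
call_tool = jsonrpc_request(3, "tools/call", {
    "name": "get_amcache",
    "arguments": {"hive_path": "AmCache.hve"},
})
```

In a real test these lines would be written to the subprocess's stdin and one response line read back per request; a call to an unregistered name would be expected to come back as an error object rather than a result.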

Cost / performance notes

A single iteration of the live loop consumes roughly 5K-15K tokens depending on artifact size; most of that is tool output sent back to Claude. The bundled IP-KVM case completes in about 5 iterations. On a Claude Code OAuth subscription, the default claude-haiku-4-5 incurs no per-call charge. A typical real-case run (15-25 iterations) on claude-sonnet-4-6 (selected with --model for higher fidelity) costs roughly $0.50-$1.50 at current pay-as-you-go pricing. The audit chain logs the token count for each call, so costs can be traced per step.
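The token figures above translate into a simple range calculation. The sketch below only does the arithmetic; `usd_per_million_tokens` is a placeholder parameter, not a quoted price, and a real bill distinguishes input from output tokens, which this ignores.

```python
def token_range(iterations, tokens_per_iteration):
    """Multiply (low, high) iteration counts by (low, high) tokens per iteration."""
    lo_iter, hi_iter = iterations
    lo_tok, hi_tok = tokens_per_iteration
    return lo_iter * lo_tok, hi_iter * hi_tok


def cost_usd(tokens, usd_per_million_tokens):
    """Flat-rate estimate; the rate is an assumption supplied by the caller."""
    return tokens / 1_000_000 * usd_per_million_tokens


# 15-25 iterations at 5K-15K tokens each, as stated in the paragraph above.
low, high = token_range((15, 25), (5_000, 15_000))
```

With those inputs a typical real-case run spans 75,000 to 375,000 tokens; plugging in the current published rate for the chosen model gives a rough dollar range.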

For air-gapped or cost-sensitive environments, deterministic mode handles the same case classes the playbook covers, with no external dependency.

Troubleshooting

Symptom	Likely cause	Fix
`ANTHROPIC_API_KEY not set`	env var missing	`export ANTHROPIC_API_KEY=...`
`MCP handshake timeout`	`dart-mcp` subprocess crashed at startup	Run `python3 -m dart_mcp.server` directly to see the error
`tools/list returns 0 tools`	Wrong PYTHONPATH	`export PYTHONPATH="$PWD/dart_mcp/src:..."`
`Loop hangs`	Claude waiting on tool_result that never arrived	Check `audit.jsonl` for the last call — likely a parser raised silently
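For the last row, a quick way to inspect the final call is to read the last non-empty line of `audit.jsonl`. This is a minimal sketch assuming the file is JSON Lines (one JSON object per line), as its extension suggests; the function name is hypothetical, and the field names inside each record depend on the actual audit format, which is not specified on this page.

```python
import json
from pathlib import Path


def last_audit_record(path):
    """Return the last non-empty JSON line of the file as a dict, or None."""
    last = None
    with Path(path).open("r", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank trailing lines
                last = line
    return json.loads(last) if last is not None else None
```

If the last record is a tool call with no matching result after it, the parser behind that tool is the first place to look.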
