Live mode
dart-agent ships in two modes: deterministic (scripted policy, no API key needed) and live (real Claude API connected to dart-mcp over JSON-RPC stdio). This page documents live mode end-to-end.
| | Deterministic | Live |
|---|---|---|
| LLM | None (scripted policy) | Claude (default `claude-haiku-4-5-20251001`) |
| Credentials required | No | `ANTHROPIC_API_KEY` or `claude login` |
| Use case | CI, reproducibility, air-gapped runs | Real DFIR work, judgment-heavy cases |
| Network egress | None | Outbound HTTPS to `api.anthropic.com` |
| Architectural guarantees | Same | Same |
The architectural guarantees (read-only MCP boundary, audit chain, contradiction enforcement) are identical across modes. The only difference is who picks the next call: the YAML playbook policy, or Claude.
git clone https://github.com/Juwon1405/agentic-dart.git
cd agentic-dart
bash scripts/install.sh
# Authenticate — pick one:
export ANTHROPIC_API_KEY="sk-ant-..." # an API key, or
claude login # sign in with Claude Code
The user-facing runner is `python3 run_eval.py --case <tier>/case-NN` (it accepts either credential above and fails fast if neither is present).
To register dart-mcp with Claude Code (so you can run it interactively):
claude mcp add agentic-dart -s user -- python3 -m dart_mcp.server_stdio
Then in your Claude Code session:
/mcp call agentic-dart get_amcache --hive_path AmCache.hve
/mcp call agentic-dart parse_prefetch --target chrome.exe
# Evidence root is set via env var (not a CLI flag)
export DART_EVIDENCE_ROOT=/mnt/case-evidence
python3 -m dart_agent \
--case CASE-2026-001 \
--out ./out/case-2026-001 \
--mode live \
--max-iterations 25
(Add `--dry-run` to use a scripted mock Claude with no API key, which is useful for CI.)
The agent:
- Spawns `dart-mcp` as a subprocess with stdio piped.
- Performs the JSON-RPC `initialize` handshake.
- Calls `tools/list`: Claude sees exactly 72 typed forensic functions (47 native + 25 SIFT adapters), nothing more.
- Loops:
  - Sends the current state + hypothesis to Claude as a `messages.create` request with `tools=[...the 72...]`.
  - Claude returns a `tool_use` block selecting one tool + arguments.
  - The agent forwards the call to `dart-mcp` over stdio.
  - The output goes into the audit chain and back to Claude as a `tool_result` message.
  - `dart-corr` runs on the new state. Contradictions force hypothesis revision.
- Stops at confidence ≥ 0.90, max iterations, or when Claude emits no further `tool_use`.
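The steps above can be sketched in a few dozen lines. This is a minimal, self-contained illustration under stated assumptions, not the dart-agent implementation: `FakeMcp` is a hypothetical in-process stand-in for the `dart-mcp` subprocess, `scripted_model` stands in for the `messages.create` call, and the `confidence` field on the model reply is an assumed way of carrying the stop signal. The method names `initialize`, `tools/list`, and `tools/call` are the standard MCP ones, but the payloads are simplified (a real `initialize` request also carries protocol version, capabilities, and client info).

```python
import json
from itertools import count

CONFIDENCE_STOP = 0.90  # stop threshold quoted in the list above


class FakeMcp:
    """Hypothetical in-process stand-in for the dart-mcp stdio server."""

    TOOLS = {"get_amcache": lambda args: {"entries": 3}}

    def handle(self, line: str) -> str:
        req = json.loads(line)
        method, params = req["method"], req.get("params", {})
        if method == "initialize":
            result = {"serverInfo": {"name": "fake-dart-mcp"}}
        elif method == "tools/list":
            result = {"tools": [{"name": n} for n in self.TOOLS]}
        elif method == "tools/call":
            result = self.TOOLS[params["name"]](params.get("arguments", {}))
        else:
            error = {"code": -32601, "message": "Method not found"}
            return json.dumps({"jsonrpc": "2.0", "id": req["id"], "error": error})
        return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


def rpc(server, ids, method, params=None):
    """Send one JSON-RPC 2.0 request as a single JSON line and return its result."""
    request = {"jsonrpc": "2.0", "id": next(ids), "method": method, "params": params or {}}
    response = json.loads(server.handle(json.dumps(request)))
    return response["result"]


def scripted_model(history, tools):
    """Stand-in for Claude: one tool call, then a confident final answer."""
    if not history:
        call = {"name": tools[0]["name"], "input": {"hive_path": "AmCache.hve"}}
        return {"tool_use": call, "confidence": 0.4}
    return {"tool_use": None, "confidence": 0.95}


def run_loop(server, model, max_iterations=25):
    ids = count(1)
    rpc(server, ids, "initialize")
    tools = rpc(server, ids, "tools/list")["tools"]
    history, audit = [], []
    for _ in range(max_iterations):
        reply = model(history, tools)
        call = reply["tool_use"]
        # Stop when the model emits no tool call or is confident enough.
        if call is None or reply["confidence"] >= CONFIDENCE_STOP:
            break
        output = rpc(server, ids, "tools/call",
                     {"name": call["name"], "arguments": call["input"]})
        audit.append({"tool": call["name"], "output": output})  # log first
        history.append({"tool_result": output})                 # then feed back
    return history, audit


history, audit = run_loop(FakeMcp(), scripted_model)
print(len(audit), audit[0]["tool"])
```

Swapping `scripted_model` for a function that calls the real API would leave `run_loop` unchanged, which is the point of keeping the tool choice behind a single callable.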
Can:
- Choose any of the typed MCP functions on the surface (72 total — 47 native + 25 SIFT adapters)
- Pass any schema-valid arguments
- Reason about the output and pick the next call
Cannot:
- Call functions not on the surface: this raises `ToolNotFound` at the wire boundary, not at the agent
- Modify evidence: no function on the surface can write
- Bypass the audit log: the agent runs `audit.log()` after every result, before the result is consumed
- Ignore `UNRESOLVED` contradictions: `dart-corr` runs after every step and the serializer refuses to emit findings while contradictions are open
This is the architectural guarantee made concrete: a fully jailbroken model is still bounded by the surface.
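To make the boundary idea concrete, here is a rough sketch of how an allowlisted surface and a serializer guard could be wired. Everything in it is hypothetical: the `SURFACE` dictionary, the `dispatch` and `serialize_findings` helpers, and the shape of a contradiction record are illustrative names, not the dart-mcp or dart-corr API. Only the tool names and the `ToolNotFound` / `UNRESOLVED` labels come from this page.

```python
class ToolNotFound(Exception):
    """Raised when a requested name is not on the registered surface."""


class UnresolvedContradiction(Exception):
    """Raised when findings are requested while contradictions are open."""


# Hypothetical read-only surface: each entry only reads its arguments.
SURFACE = {
    "get_amcache": lambda args: {"hive": args["hive_path"], "entries": []},
    "parse_prefetch": lambda args: {"target": args["target"], "runs": []},
}


def dispatch(name, arguments, audit_log):
    """Resolve a call against the surface and log the result before returning it."""
    if name not in SURFACE:
        # Rejected here, at the boundary, regardless of what the model asked for.
        raise ToolNotFound(name)
    result = SURFACE[name](arguments)
    audit_log.append({"tool": name, "arguments": arguments, "result": result})
    return result


def serialize_findings(findings, contradictions):
    """Refuse to emit findings while any contradiction is still UNRESOLVED."""
    open_items = [c for c in contradictions if c["status"] == "UNRESOLVED"]
    if open_items:
        raise UnresolvedContradiction(f"{len(open_items)} contradiction(s) still open")
    return {"findings": findings}


log = []
dispatch("parse_prefetch", {"target": "chrome.exe"}, log)

try:
    dispatch("delete_file", {"path": "evidence.dd"}, log)  # not on the surface
except ToolNotFound as exc:
    rejected = str(exc)

print(rejected, len(log))
```

Because the check lives in `dispatch` rather than in the prompt, a model that asks for an unregistered function gets an error instead of an effect, and nothing is added to the log for the rejected call.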
tests/test_live_mcp.py runs end-to-end tests against the real MCP stdio server (with a scripted "mock Claude" that picks tools deterministically). No API key required:
python3 tests/test_live_mcp.py
The four assertions:
- Initialize handshake completes
- `tools/list` advertises the full typed MCP surface (native + SIFT adapters)
- Calling a non-registered function returns `ToolNotFound` over the wire
- The full loop produces a chain-verified audit log
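For readers unfamiliar with what "chain-verified" means, the following is a generic sketch of a SHA-256 hash chain and its verification. The record layout (`prev`, `payload`, `hash`), the all-zero genesis value, and the helper names are assumptions made for illustration; the actual dart-audit format may differ.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed previous-hash value for the first entry


def entry_hash(prev_hash, payload):
    """SHA-256 over the previous hash plus a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain, payload):
    """Append a payload, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev_hash, "payload": payload,
                  "hash": entry_hash(prev_hash, payload)})


def verify(chain):
    """Return True only if every entry links to its predecessor and its hash recomputes."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append(chain, {"tool": "get_amcache", "ok": True})
append(chain, {"tool": "parse_prefetch", "ok": True})
intact = verify(chain)

chain[0]["payload"]["ok"] = False  # tamper with an earlier entry
tampered_ok = verify(chain)

print(intact, tampered_ok)
```

Editing any earlier entry changes the hash that the next entry was built on, so verification fails from that point forward.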
A single iteration of the live loop consumes tokens depending on artifact size and the amount of tool output sent back to Claude. The bundled IP-KVM case typically completes in about five iterations. Token counts are recorded in the live run outputs so operators can review usage after the run. Check current Anthropic pricing and account limits before running live investigations.
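As a rough aid for reviewing usage after a run, the sketch below sums per-iteration token counts. The `Usage` record mirrors the `input_tokens` / `output_tokens` fields that the Anthropic Messages API reports per response; how dart-agent stores them in its run outputs is not documented here, so the list of records and the numbers in it are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int


def total_usage(per_iteration):
    """Sum input and output tokens across all iterations of a run."""
    return Usage(
        input_tokens=sum(u.input_tokens for u in per_iteration),
        output_tokens=sum(u.output_tokens for u in per_iteration),
    )


# Made-up numbers for a five-iteration run, for illustration only.
run = [
    Usage(1200, 150),
    Usage(2100, 180),
    Usage(3050, 160),
    Usage(3900, 210),
    Usage(4600, 400),
]
total = total_usage(run)
print(total.input_tokens, total.output_tokens)
```

The rising input counts in the sample reflect a general property of stateless chat APIs: earlier tool results are resent as context on each request, so large tool outputs are paid for more than once.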
For air-gapped or credential-free reproduction, deterministic mode handles the
same case classes the playbook covers with no external dependency. --dry-run
also exercises the live MCP plumbing without contacting the API.
| Symptom | Likely cause | Fix |
|---|---|---|
| `ANTHROPIC_API_KEY not set` | env var missing | `export ANTHROPIC_API_KEY=...` |
| `MCP handshake timeout` | `dart-mcp` subprocess crashed at startup | Run `python3 -m dart_mcp.server_stdio` directly to see the error |
| `tools/list returns 0 tools` | Wrong `PYTHONPATH` | `export PYTHONPATH="$PWD/dart_mcp/src:..."` |
| `Loop hangs` | Claude waiting on `tool_result` that never arrived | Check `audit.jsonl` for the last call; likely a parser raised silently |
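When chasing a hung loop, it can help to pull the last record out of `audit.jsonl` programmatically. The helper below is a small sketch that assumes only that the file holds one JSON object per line; the `seq` and `tool` fields in the sample data are hypothetical and not the real audit schema.

```python
import json
import tempfile
from pathlib import Path


def last_audit_entry(path):
    """Return the last non-empty JSON line of a JSONL file, or None if it has none."""
    text = Path(path).read_text(encoding="utf-8")
    lines = [line for line in text.splitlines() if line.strip()]
    return json.loads(lines[-1]) if lines else None


# Demonstration against a throwaway file with hypothetical fields.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "audit.jsonl"
    sample.write_text(
        '{"seq": 1, "tool": "get_amcache"}\n'
        '{"seq": 2, "tool": "parse_prefetch"}\n',
        encoding="utf-8",
    )
    last = last_audit_entry(sample)

print(last)
```

If the last logged call has no matching result, the tool it names is the first place to look for a silently raised parser error.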
- dart-agent — the wrapper loop
- dart-mcp — the typed surface that gets exposed
- Architecture deep dive
- `docs/live-mode.md`: the equivalent doc in repo form
Agentic-DART — autonomous DFIR agent · architecture-first, not prompt-first · MIT license · github.com/Juwon1405/agentic-dart