Architecture deep dive
The README explains what Agentic-DART does. This page explains why the architecture is shaped the way it is, and what was deliberately not built.
A senior analyst's reasoning is not "what to say" — it's what they refuse to do. Encode the refusal as architecture, not as prompt.
That sentence shows up in the README as a tagline. Here is what it means concretely.
A traditional LLM-driven assistant is a function:

    (prompt + context) → text
The safety surface is the prompt. Every guardrail you want — "don't modify evidence", "don't fabricate findings", "always cite an artifact" — has to be re-asserted in language. Language is leaky. A jailbreak, an unusual context, or a long enough conversation can erode every prompt-based guardrail.
Agentic-DART inverts this:

    (prompt + context) → typed_tool_call(args) → typed_result
                                   ↑                   ↑
                           schema-validated  no destructive op exists
The agent literally cannot call execute_shell or write_file on
the evidence tree, because those functions do not exist on the MCP
surface. The "guardrail" is not a sentence in the prompt. It is the
absence of a function.
This is why the project's bypass test (tests/test_mcp_bypass.py)
is the most important test in the repo:

    def test_unregistered_destructive_function_raises_ToolNotFound():
        """Calling anything not in the registry must fail hard."""
        try:
            call_tool("execute_shell", {"cmd": "rm -rf /"})
        except KeyError as e:
            assert "ToolNotFound" in str(e)
        else:
            raise AssertionError("forbidden function is somehow exposed")

If that test ever fails, meaning something on the MCP surface lets a destructive verb through, the architecture has been compromised. The agent's reasoning quality is downstream of this; it does not matter how smart the loop is if the surface leaks.
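The "absence of a function" idea can be sketched as a plain name-to-callable registry. This is a minimal illustration under stated assumptions, not the project's implementation: only `call_tool` and the `ToolNotFound` message come from the test above, while `_REGISTRY`, `register`, and the `list_prefetch` tool are hypothetical names invented for the example.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: the only names the agent can ever reach.
_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that adds a read-only function to the typed surface."""
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        _REGISTRY[name] = fn
        return fn
    return wrap


@register("list_prefetch")  # illustrative tool name, not taken from the project
def list_prefetch(limit: int = 10) -> list:
    """Stand-in for a read-only parser; returns canned rows."""
    rows = ["CMD.EXE-1.pf", "POWERSHELL.EXE-2.pf"]
    return rows[:limit]


def call_tool(name: str, args: dict) -> Any:
    """Dispatch by name; an unknown name fails before any code runs."""
    if name not in _REGISTRY:
        raise KeyError(f"ToolNotFound: {name}")
    return _REGISTRY[name](**args)
```

In this sketch nothing has to block `execute_shell`: it was never registered, so the lookup fails and there is no code behind the name to run.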
    ┌──────────────────────────────────────────────────────────────────────────────┐
    │                              IR analyst (human)                              │
    └───────────────────────────────────────┬──────────────────────────────────────┘
                                            │
    ┌───────────────────────────────────────▼──────────────────────────────────────┐
    │  dart-agent    (senior-analyst loop wrapper)                                 │
    │  dart-playbook (YAML sequencing rules)                                       │
    └───────────────────────────────────────┬──────────────────────────────────────┘
                                            │ (typed MCP calls only)
    ┌───────────────────────────────────────▼──────────────────────────────────────┐
    │                    ▼ READ-ONLY BOUNDARY (architectural) ▼                    │
    │                                                                              │
    │  dart-mcp      the typed forensic function surface (native + SIFT adapters)  │
    │                · schema-validated input                                      │
    │                · cursor-paginated output                                     │
    │                · no destructive verb exists                                  │
    │                                                                              │
    │  dart-corr     DuckDB cross-artifact correlation                             │
    │                · flags contradictions as UNRESOLVED                          │
    │                                                                              │
    │  dart-audit    SHA-256 chained JSONL                                         │
    │                · side-tapped from every MCP call                             │
    │                · replayable, tamper-evident                                  │
    │                                                                              │
    │  Evidence (read-only mount)                                                  │
    └──────────────────────────────────────────────────────────────────────────────┘
Each package owns exactly one responsibility:
| Package | Owns | Forbidden |
|---|---|---|
| `dart_audit` | Append-only, chained log of every tool call | Reading evidence directly |
| `dart_mcp` | The set of tools the agent can call | Side-effects outside the evidence tree |
| `dart_agent` | The reasoning loop | Calling tools that aren't on the MCP surface |
| `dart_corr` | Cross-artifact correlation, contradiction detection | Making claims (only surfaces them) |
| `dart_playbook` | Sequencing rules, analyst heuristics | Imperative code |
dart-corr is the boring part of the project, and it is the part that
makes the rest work.
LLMs are excellent at narrative reasoning and bad at set algebra at scale. Joining a 5-million-row MFT timeline against a 200-thousand-row process list under deadline pressure is set algebra. We push that work to DuckDB and let the agent do what it's good at: interpreting the join result.
Specifically:
- DuckDB runs in-process (no daemon, no port to harden)
- Reads Parquet, CSV, and JSONL natively — most evidence parsers produce one of those
- Joins of millions of rows finish in seconds on a SIFT VM
- Window functions for time-proximity joins are first-class
The agent never writes SQL. dart-corr exposes a small typed surface
(correlate_events, correlate_timeline) and the agent calls those.
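A time-proximity join behind a typed function might look roughly like the sketch below. The project uses DuckDB; Python's built-in `sqlite3` is swapped in here only so the example runs without extra dependencies. The function name `correlate_events_sketch`, the two-table layout, and the five-second window are all assumptions for illustration and say nothing about the real signature of `correlate_events`.

```python
import sqlite3


def correlate_events_sketch(file_events, proc_events, window_s=5):
    """Pair each file event with processes started up to window_s seconds before it.

    file_events: iterable of (path, epoch_seconds)
    proc_events: iterable of (image, epoch_seconds)
    """
    con = sqlite3.connect(":memory:")  # stand-in for an in-process DuckDB connection
    con.execute("CREATE TABLE file_events (path TEXT, ts INTEGER)")
    con.execute("CREATE TABLE proc_events (image TEXT, ts INTEGER)")
    con.executemany("INSERT INTO file_events VALUES (?, ?)", file_events)
    con.executemany("INSERT INTO proc_events VALUES (?, ?)", proc_events)
    # The SQL lives inside the typed function; the caller only passes data.
    rows = con.execute(
        """
        SELECT f.path, p.image, f.ts - p.ts AS lag_s
        FROM file_events AS f
        JOIN proc_events AS p
          ON p.ts BETWEEN f.ts - ? AND f.ts
        ORDER BY f.ts, p.ts
        """,
        (window_s,),
    ).fetchall()
    con.close()
    return rows


files = [("C:/Users/a/evil.dll", 1000), ("C:/Users/a/notes.txt", 2000)]
procs = [("rundll32.exe", 997), ("notepad.exe", 1500)]
matches = correlate_events_sketch(files, procs)
# matches == [("C:/Users/a/evil.dll", "rundll32.exe", 3)]
```

The caller supplies rows and a window and gets rows back; it never composes a query string, which is the property the typed surface is meant to preserve.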
Forensic findings have a chain-of-custody requirement that ordinary software doesn't. If the agent claims "USB Kingston DataTraveler was inserted at 14:22:18 UTC", a reviewer must be able to verify, after the fact, that:
1. The agent actually saw that artifact
2. The artifact has not been edited between the agent's read and the reviewer's verification
3. No log entry has been silently inserted, deleted, or reordered
A simple append-only log gives you (1). A SHA-256 chain — where each entry's hash includes the previous entry's hash — gives you (2) and (3) for free. Tampering with any entry breaks the chain at that point and every subsequent point.
Implementation: dart_audit/src/dart_audit/__init__.py. ~150 lines.
The simplicity is the feature; this is not the place to be clever.
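The chaining idea can be sketched in a few lines. This is an illustration under assumptions, not the contents of `dart_audit`: the entry layout (`record`, `prev`, `hash`), the all-zero genesis value, and the function names are hypothetical, and an in-memory list stands in for the JSONL file (each entry would be one line).

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with this record."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, record: dict) -> dict:
    """Append a record whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> int:
    """Return the index of the first broken entry, or -1 if the chain is intact."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return i
        prev_hash = entry["hash"]
    return -1
```

Editing a record changes its digest, and deleting or reordering entries breaks the `prev` link, so in each case `verify_chain` reports the first position at which the log stops being trustworthy.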
These are conscious omissions, not oversights.
There is no execute_shell, no eval, no subprocess.run exposed
through the MCP surface. The temptation in agent design is to add a
general fallback so the agent can "just figure it out" when the typed
surface is insufficient. We refuse this. If a typed function is
missing, the right move is to add a new typed function, not to
expose a general escape.
Every parser opens files in 'r' or 'rb' mode. The OS-level mount
is read-only. There is no code path that can write to the evidence
tree even if asked. Evidence is fixture, not workspace.
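A path guard in this spirit could look like the sketch below. It is a guess at the shape of such a helper, not the project's `_safe_resolve`: the names `safe_resolve_sketch` and `read_artifact` and their parameters are hypothetical.

```python
from pathlib import Path


def safe_resolve_sketch(evidence_root: str, relative: str) -> Path:
    """Resolve a path and refuse anything that escapes the evidence root."""
    root = Path(evidence_root).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / relative).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"path escapes evidence root: {relative}")
    return candidate


def read_artifact(evidence_root: str, relative: str) -> bytes:
    """Open strictly in binary read mode; no write mode appears anywhere."""
    with open(safe_resolve_sketch(evidence_root, relative), "rb") as fh:
        return fh.read()
```

The read-only mount remains the real enforcement. A guard like this only keeps a parser from wandering outside the tree it was pointed at.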
The agent does not quarantine, terminate, or block anything. It
reports. Phase 3 of the roadmap (agentic SOC) will introduce
supervised response, but it will be a separate package
(dart_responder) with its own boundary, not a flag on dart-agent.
A run is a run. State lives in progress.jsonl and audit.jsonl for
the duration of one case. There is no global "knowledge base" that
accumulates across runs. The reasoning has to be reproducible from a
single audit log alone.
The system prompt does say things like "always cite the audit_id of
the supporting MCP call". But none of those instructions are
load-bearing. If the model ignores them, the serializer in dart_agent
refuses to emit a finding without an audit_id. Every prompt-level
"rule" has an architectural enforcer downstream.
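A downstream enforcer of this kind can be very small. The sketch below assumes a finding is a claim plus an `audit_id`; the `Finding` class and `serialize_finding` function are hypothetical names and do not describe the actual serializer in `dart_agent`.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Finding:
    claim: str
    audit_id: str  # id of the audit-log entry for the supporting MCP call


def serialize_finding(finding: Finding) -> str:
    """Emit a finding as JSON, or refuse if it carries no audit_id."""
    if not finding.audit_id or not finding.audit_id.strip():
        raise ValueError("finding rejected: missing audit_id")
    return json.dumps(asdict(finding), sort_keys=True)
```

Whatever the model writes, an uncited finding cannot leave this function, so the prompt instruction becomes a convenience rather than a control.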
If you want to add something:
- A new typed forensic function: yes, this is the easy path. Read `CONTRIBUTING.md` and add a Pydantic schema, a `_safe_resolve` call, and a bypass test.
- A new playbook YAML: yes, no Python change required.
- A new correlation pattern in `dart-corr`: yes, but the new pattern must surface contradictions as `UNRESOLVED`, not "decide".
- A way for the agent to write back to evidence: no, never.
- A general-purpose tool ("query_anything"): no. If you find yourself wanting one, the typed surface is too narrow somewhere specific; add a typed function for that specific case.
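For the correlation-pattern case, "surface, not decide" might be sketched as follows. Only the `UNRESOLVED` label comes from the text above. The function name, the `(key, source, value)` observation shape, and the `CONSISTENT` label are assumptions made for the example.

```python
from collections import defaultdict


def surface_contradictions(observations):
    """Group (key, source, value) observations; never pick a winner."""
    by_key = defaultdict(dict)
    for key, source, value in observations:
        by_key[key][source] = value
    results = []
    for key in sorted(by_key):
        sources = by_key[key]
        # Disagreement between sources is reported as-is, with every value kept.
        status = "CONSISTENT" if len(set(sources.values())) == 1 else "UNRESOLVED"
        results.append({"key": key, "status": status, "sources": dict(sources)})
    return results
```

When two artifacts disagree about the same fact, both values stay in the output under `UNRESOLVED`, leaving the judgement to the agent's reasoning and ultimately to the analyst.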
- `docs/architecture.md`: the same content in repo form, with the architecture diagram inline
- `docs/case-pth-timestomp.md`: a worked example showing the loop and the contradiction handling
- Threat model: what the architecture defends against, and what it does not
Agentic-DART — autonomous DFIR agent · architecture-first, not prompt-first · MIT license · github.com/Juwon1405/agentic-dart