
Architecture deep dive

Juwon1405 edited this page May 2, 2026 · 8 revisions


The README explains what Agentic-DART does. This page explains why the architecture is shaped the way it is, and what was deliberately not built.


The core claim

A senior analyst's reasoning is not "what to say" — it's what they refuse to do. Encode the refusal as architecture, not as prompt.

That sentence shows up in the README as a tagline. Here is what it means concretely.

A traditional LLM-driven assistant is a function:

(prompt + context) → text

The safety surface is the prompt. Every guardrail you want — "don't modify evidence", "don't fabricate findings", "always cite an artifact" — has to be re-asserted in language. Language is leaky. A jailbreak, an unusual context, or a long enough conversation can erode every prompt-based guardrail.

Agentic-DART inverts this:

(prompt + context) → typed_tool_call(args) → typed_result
                          ↑                       ↑
                    schema-validated       no destructive op exists

The agent literally cannot call execute_shell or write_file on the evidence tree, because those functions do not exist on the MCP surface. The "guardrail" is not a sentence in the prompt. It is the absence of a function.

This is why the project's bypass test (tests/test_mcp_bypass.py) is the most important test in the repo:

def test_unregistered_destructive_function_raises_ToolNotFound():
    with pytest.raises(ToolNotFound):
        call_tool("execute_shell", {"cmd": "rm -rf /"})

If that test ever fails — meaning something on the MCP surface lets a destructive verb through — the architecture has been compromised. The agent's reasoning quality is downstream of this; it does not matter how smart the loop is if the surface leaks.
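
To make the "absence of a function" idea concrete, here is a minimal sketch of a tool registry. It is not the project's actual dart_mcp code: the `TOOLS` dict, the `list_dir` tool, and the `ListDirArgs` schema are hypothetical, and plain dataclasses stand in for whatever schema library the project uses. Only the names `call_tool` and `ToolNotFound` come from the test above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


class ToolNotFound(Exception):
    """Raised when the agent names a tool that is not registered."""


@dataclass(frozen=True)
class ListDirArgs:
    # Hypothetical argument schema for a hypothetical read-only tool.
    path: str

    def __post_init__(self) -> None:
        if not isinstance(self.path, str) or not self.path:
            raise TypeError("path must be a non-empty string")


def list_dir(args: ListDirArgs) -> Dict[str, Any]:
    # Placeholder body: a real tool would read from the evidence mount.
    return {"path": args.path, "entries": []}


# The registry is the whole surface. Nothing destructive is registered,
# so there is nothing destructive to call.
TOOLS: Dict[str, Tuple[type, Callable[[Any], Dict[str, Any]]]] = {
    "list_dir": (ListDirArgs, list_dir),
}


def call_tool(name: str, raw_args: Dict[str, Any]) -> Dict[str, Any]:
    if name not in TOOLS:
        raise ToolNotFound(name)
    schema, func = TOOLS[name]
    args = schema(**raw_args)  # unknown or badly typed fields fail here
    return func(args)
```

In this sketch a request for `execute_shell` fails at the dictionary lookup, before any argument is inspected, which is the property the bypass test checks.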


Five small packages, one boundary

┌─────────────────────────────────────────────────────────────────┐
│                       IR analyst (human)                        │
└────────────────────────────┬────────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────────┐
│           dart-agent     (senior-analyst loop wrapper)          │
│           dart-playbook  (YAML sequencing rules)                │
└────────────────────────────┬────────────────────────────────────┘
                             │  (typed MCP calls only)
┌────────────────────────────▼────────────────────────────────────┐
│           ▼  READ-ONLY BOUNDARY (architectural)  ▼              │
│                                                                 │
│           dart-mcp      60 typed forensic functions             │
│                         (35 native + 25 SIFT)                   │
│                         · schema-validated input                │
│                         · cursor-paginated output               │
│                         · no destructive verb exists            │
│                                                                 │
│           dart-corr     DuckDB cross-artifact correlation       │
│                         · flags contradictions as UNRESOLVED    │
│                                                                 │
│           dart-audit    SHA-256 chained JSONL                   │
│                         · side-tapped from every MCP call       │
│                         · replayable, tamper-evident            │
│                                                                 │
│           Evidence (read-only mount)                            │
└─────────────────────────────────────────────────────────────────┘

Each package owns exactly one responsibility:

| Package | Owns | Forbidden |
| --- | --- | --- |
| dart_audit | Append-only, chained log of every tool call | Reading evidence directly |
| dart_mcp | The set of tools the agent can call | Side-effects outside the evidence tree |
| dart_agent | The reasoning loop | Calling tools that aren't on the MCP surface |
| dart_corr | Cross-artifact correlation, contradiction detection | Making claims (only surfaces them) |
| dart_playbook | Sequencing rules, analyst heuristics | Imperative code |

Why DuckDB

dart-corr is the boring part of the project, and it is the part that makes the rest work.

LLMs are excellent at narrative reasoning and bad at set algebra at scale. Joining a 5-million-row MFT timeline against a 200-thousand-row process list under deadline pressure is set algebra. We push that work to DuckDB and let the agent do what it's good at: interpreting the join result.

Specifically:

  • DuckDB runs in-process (no daemon, no port to harden)
  • Reads Parquet, CSV, and JSONL natively — most evidence parsers produce one of those
  • Joins of millions of rows finish in seconds on a SIFT VM
  • Window functions for time-proximity joins are first-class

The agent never writes SQL. dart-corr exposes a small typed surface (correlate_events, correlate_timeline) and the agent calls those.
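
As an illustration of that division of labor, the sketch below wraps one fixed, parameterized time-proximity join behind a typed function. The table names, columns, `Correlation` record, and `window_seconds` parameter are assumptions, not the real `correlate_events` signature. The example also uses the standard-library `sqlite3` module purely as a stand-in so that it runs without extra dependencies; the project itself uses DuckDB.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Correlation:
    path: str
    process: str
    delta_seconds: int


# Fixed query: the caller supplies only a typed window, never SQL text.
_PROXIMITY_SQL = """
SELECT f.path, p.name, ABS(f.ts - p.ts) AS delta
FROM file_events AS f
JOIN proc_events AS p ON ABS(f.ts - p.ts) <= ?
ORDER BY delta, f.path, p.name
"""


def correlate_events(con: sqlite3.Connection, window_seconds: int) -> List[Correlation]:
    if not isinstance(window_seconds, int) or window_seconds < 0:
        raise ValueError("window_seconds must be a non-negative integer")
    rows = con.execute(_PROXIMITY_SQL, (window_seconds,)).fetchall()
    return [Correlation(path, name, delta) for path, name, delta in rows]


# Tiny in-memory fixture; timestamps are epoch seconds.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE file_events (path TEXT, ts INTEGER)")
con.execute("CREATE TABLE proc_events (name TEXT, ts INTEGER)")
con.executemany(
    "INSERT INTO file_events VALUES (?, ?)",
    [("C:/Users/a/evil.exe", 1000), ("C:/Windows/notepad.exe", 5000)],
)
con.executemany(
    "INSERT INTO proc_events VALUES (?, ?)",
    [("evil.exe", 1003), ("svchost.exe", 9000)],
)

hits = correlate_events(con, window_seconds=5)
```

With this fixture, only the file event at 1000 and the process event at 1003 fall within the five-second window. The agent's job starts after the join: deciding what that pairing means.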


Why SHA-256 chained audit

Forensic findings have a chain-of-custody requirement that ordinary software doesn't. If the agent claims "USB Kingston DataTraveler was inserted at 14:22:18 UTC", a reviewer must be able to verify, after the fact, that:

  1. The agent actually saw that artifact
  2. The artifact has not been edited between the agent's read and the reviewer's verification
  3. No log entry has been silently inserted, deleted, or reordered

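A simple append-only log gives you (1). A SHA-256 chain, in which each entry's hash covers the previous entry's hash, gives you (3): tampering with any entry breaks the chain at that point and at every subsequent point. It also gives you (2), provided each entry records a digest of the artifact data the agent read, so that a reviewer can re-hash the artifact and compare.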

Implementation: dart_audit/src/dart_audit/__init__.py. ~150 lines. The simplicity is the feature; this is not the place to be clever.
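
The chaining idea fits in a few lines. The sketch below is an illustration under stated assumptions, not the contents of dart_audit: the field names (`prev`, `payload`, `hash`), the all-zero genesis value, and the helper names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List, Optional

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: Dict) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: List[Dict], payload: Dict) -> Dict:
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {
        "prev": prev_hash,
        "payload": payload,
        "hash": _digest(prev_hash, payload),
    }
    log.append(entry)
    return entry


def verify_chain(log: List[Dict]) -> Optional[int]:
    """Return the index of the first broken entry, or None if intact."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        expected = _digest(prev_hash, entry["payload"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return i
        prev_hash = entry["hash"]
    return None
```

Editing a payload, dropping an entry from the middle, or reordering entries makes `verify_chain` report the first index where the links no longer match. A chain like this does not by itself detect truncation of the newest entries; catching that requires the latest hash to be recorded somewhere outside the log.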


What was deliberately not built

These are conscious omissions, not oversights.

No "general purpose escape hatch"

There is no execute_shell, no eval, no subprocess.run exposed through the MCP surface. The temptation in agent design is to add a general fallback so the agent can "just figure it out" when the typed surface is insufficient. We refuse this. If a typed function is missing, the right move is to add a new typed function, not to expose a general escape.

No write path to evidence

Every parser opens files in 'r' or 'rb' mode. The OS-level mount is read-only. There is no code path that can write to the evidence tree, even if asked. Evidence is a fixture, not a workspace.
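
A path-containment check is one way the application layer can back up the read-only mount. The sketch below is hypothetical: `safe_resolve`, `open_evidence`, and `PathEscapesEvidence` are illustrative names, and the project's own `_safe_resolve` (mentioned under contributions) may behave differently.

```python
from pathlib import Path
from typing import BinaryIO


class PathEscapesEvidence(Exception):
    """Raised when a requested path resolves outside the evidence root."""


def safe_resolve(root: Path, relative: str) -> Path:
    # resolve() collapses ".." and follows symlinks before the check.
    root = root.resolve()
    candidate = (root / relative).resolve()
    if candidate != root and root not in candidate.parents:
        raise PathEscapesEvidence(relative)
    return candidate


def open_evidence(root: Path, relative: str) -> BinaryIO:
    # Mode is hard-coded to binary read; callers cannot pass a mode.
    return open(safe_resolve(root, relative), "rb")
```

Because the mode is not a parameter, a caller of `open_evidence` has no way to request a write handle, and a path such as `../outside` is rejected before any file is opened.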

No automatic remediation

The agent does not quarantine, terminate, or block anything. It reports. Phase 3 of the roadmap (agentic SOC) will introduce supervised response, but it will be a separate package (dart_responder) with its own boundary, not a flag on dart-agent.

No memory across cases

A run is a run. State lives in progress.jsonl and audit.jsonl for the duration of one case. There is no global "knowledge base" that accumulates across runs. The reasoning has to be reproducible from a single audit log alone.

No prompt-based guardrails

The system prompt does say things like "always cite the audit_id of the supporting MCP call". But none of those instructions are load-bearing. If the model ignores them, the serializer in dart_agent refuses to emit a finding without an audit_id. Every prompt-level "rule" has an architectural enforcer downstream.
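
The enforcer pattern can be sketched as a serializer that fails closed. Everything here is illustrative: the `Finding` fields, the `MissingAuditId` exception, and `serialize_finding` are hypothetical names, not the actual dart_agent serializer. The check against a set of known audit IDs is an added assumption; the text above only states that a finding without an audit_id is refused.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Set


class MissingAuditId(Exception):
    """Raised when a finding does not cite a recorded MCP call."""


@dataclass(frozen=True)
class Finding:
    claim: str
    audit_id: Optional[str] = None


def serialize_finding(finding: Finding, known_audit_ids: Set[str]) -> str:
    # Fail closed: an uncited finding never reaches the report.
    if not finding.audit_id:
        raise MissingAuditId("finding has no audit_id")
    # Assumed extra check: the citation must point at a logged call.
    if finding.audit_id not in known_audit_ids:
        raise MissingAuditId(f"audit_id {finding.audit_id!r} is not in the audit log")
    return json.dumps(asdict(finding), sort_keys=True)
```

Whatever the model writes in its reasoning, the only route to an emitted finding runs through this function, so the citation rule holds even when the prompt instruction is ignored.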


What this means for contributions

If you want to add something:

  • A new typed forensic function: yes, this is the easy path. Read CONTRIBUTING.md and add a Pydantic schema, a _safe_resolve call, and a bypass test.
  • A new playbook YAML: yes, no Python change required.
  • A new correlation pattern in dart-corr: yes, but the new pattern must surface contradictions as UNRESOLVED, not "decide".
  • A way for the agent to write back to evidence: no, never.
  • A general-purpose tool ("query_anything"): no. If you find yourself wanting one, the typed surface is too narrow somewhere specific — add a typed function for that specific case.
