Roadmap

This page is the honest version of "where is Agentic-DART going". It is structured as four phases. Phase 1 is the SANS FIND EVIL! 2026 submission and is essentially complete. Phases 2–4 are not promises; they are the directions the architecture was built to support.

The codename DART was chosen to remain accurate as the scope expands — Detection And Response covers everything below.

At a glance

Phase	Focus	Status	Window
Phase 1 ⭐	Agentic DFIR — investigate one case end-to-end	~95% complete	closes 2026-06-15
Phase 2	Agentic detection engineering — Sigma synthesis, coverage-gap reasoning	Spec phase	~Q3 2026
Phase 3	Agentic SOC — supervised triage + response orchestration	Design only	~Q1 2027
Phase 4	Broader agentic security — vuln management, compliance, adversary emulation	Direction only	2027+

Phase 1 is what is shipping for SANS FIND EVIL! 2026. Every architectural guarantee made in Phase 1 (read-only MCP boundary, audit chain, contradiction enforcement, path safety) propagates unchanged into later phases. Forking the playbook cannot loosen them.

Phase 1 — Agentic DFIR (current, ~95% complete) ⭐

This is the SANS FIND EVIL! 2026 hackathon submission and the foundation everything else builds on. It is architecturally complete and is being validated against three public DFIR datasets; the final accuracy measurements are tracked under "Remaining for Phase 1".

What "agentic DFIR" means in Phase 1

The agent investigates a single forensic case end-to-end. It loads the senior-analyst playbook, walks the ten phases (volatility → initial access triage → timeline → anomaly surfacing → hypothesis formation → kill chain assembly → contradiction handling → attribution → recovery-denial check → finding emission), and produces a courtroom-grade report where every claim cites the audit ID of the MCP call that produced it.
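The "every claim cites an audit ID" rule can be pictured as a serializer that refuses uncited findings. The following is a minimal sketch, not the project's serializer: the names Finding, FindingRejected, and serialize_findings are hypothetical, and the set of known audit IDs stands in for whatever the real audit log exposes.

```python
import json
from dataclasses import dataclass


class FindingRejected(ValueError):
    """Raised when a finding cannot be tied back to a logged MCP call."""


@dataclass(frozen=True)
class Finding:
    claim: str
    audit_id: str  # ID of the MCP call that produced the supporting evidence


def serialize_findings(findings, known_audit_ids):
    """Serialize findings to JSON, refusing any without a valid citation."""
    for finding in findings:
        if not finding.audit_id:
            raise FindingRejected(f"no audit_id for claim: {finding.claim!r}")
        if finding.audit_id not in known_audit_ids:
            raise FindingRejected(f"unknown audit_id: {finding.audit_id!r}")
    payload = [{"claim": f.claim, "audit_id": f.audit_id} for f in findings]
    return json.dumps(payload, indent=2)
```

Rejecting at serialization time means an uncited claim never reaches the report, regardless of how the agent phrased it.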

Phase 1 is offline-first. The agent runs on mounted evidence, not live hosts. Live response (agentic SOC) is explicitly Phase 3.

Phase 1 is architecturally complete because

The MCP boundary is real, not promised. The 72-tool typed forensic function surface on the wire (47 native + 25 SIFT adapters) is the whole available action space. Anything outside this surface (execute_shell, write_file, mount, eval) raises ToolNotFound regardless of what the prompt says. Asserted by a bypass suite that runs on every commit.
The audit chain is tamper-evident. Every MCP call is logged with SHA-256 chaining. A 1000-entry chain written by 50 threads × 20 calls each verifies cleanly, with appends serialized by a threading.Lock() (v0.4.1 fix).
Path safety is fuzz-tested. _safe_resolve rejects ../ traversal, null bytes, absolute-path escapes, and paths longer than 1024 characters. Resolution follows POSIX realpath() semantics.
Contradictions cannot be smoothed over. dart-corr flags UNRESOLVED when two artifacts disagree (e.g. an MFT $STANDARD_INFORMATION timestamp 11 seconds earlier than the $FILE_NAME timestamp, which points to a timestomp that predates the alert window). The serializer rejects findings that ignore unresolved contradictions.
Findings cite their evidence. Every finding carries an audit_id referencing the exact MCP call. Serializer rejects findings without one. v3 additionally requires an ADS template.
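To make the audit-chain guarantee concrete, here is a minimal sketch of a SHA-256 hash chain with lock-protected appends, exercised with the same 50 threads × 20 calls shape mentioned above. It is an illustration under assumptions, not the dart-audit implementation: the class name AuditChain, the entry fields, and the tool name parse_evtx are all hypothetical.

```python
import hashlib
import json
import threading

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    """Hash the canonical JSON form of an entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditChain:
    def __init__(self):
        self._lock = threading.Lock()
        self._entries = []

    def append(self, tool, args):
        # The lock makes "read previous hash, then append" one atomic step.
        with self._lock:
            prev = self._entries[-1]["hash"] if self._entries else GENESIS
            body = {"seq": len(self._entries), "tool": tool, "args": args, "prev": prev}
            entry = {**body, "hash": _digest(body)}
            self._entries.append(entry)
            return entry["hash"]

    def verify(self):
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev = GENESIS
        for i, entry in enumerate(self._entries):
            body = {k: entry[k] for k in ("seq", "tool", "args", "prev")}
            if entry["seq"] != i or entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True

    def __len__(self):
        return len(self._entries)


def hammer(chain, n_threads=50, n_calls=20):
    """Append from many threads at once, mirroring the concurrency check."""
    def worker(tid):
        for i in range(n_calls):
            chain.append("parse_evtx", {"thread": tid, "call": i})

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Without the lock, two threads could read the same previous hash and fork the chain; with it, hammer(chain) leaves 1000 entries and verify() returns True until any entry is altered.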

Phase 1 deliverables (Done)

Core architecture

Typed forensic function surface (native + SIFT adapters) covering 10 of 12 in-scope MITRE ATT&CK enterprise tactics. TA0009 Collection and full TA0011 Command and Control coverage are Phase 2, though detect_dns_tunneling already adds DNS-tunneling C2 indicators (MCP function catalog)
Read-only MCP boundary, asserted by 7-test bypass suite (Architecture deep dive)
SHA-256 chained audit log, replayable, tamper-evident, lock-protected (dart-audit)
Cross-artifact correlation engine with UNRESOLVED contradiction surfacing (dart-corr)
Path sandbox (_safe_resolve) with fuzz-validated traversal/null-byte/escape protection (Threat model)
Live mode — real Claude API + JSON-RPC stdio MCP server (Live mode)
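The path sandbox can be sketched as follows. This is a hedged approximation of the checks listed for _safe_resolve (traversal, null bytes, absolute escapes, length), not the project's code: the function name safe_resolve_sketch, the exception UnsafePath, and the argument layout are assumptions, and the containment check assumes a POSIX filesystem.

```python
import os

MAX_PATH_LEN = 1024  # limit taken from the description of _safe_resolve


class UnsafePath(ValueError):
    """Raised when a requested path is malformed or leaves the evidence root."""


def safe_resolve_sketch(root, user_path):
    """Resolve user_path under root, rejecting anything that escapes it."""
    if "\x00" in user_path:
        raise UnsafePath("null byte in path")
    if len(user_path) > MAX_PATH_LEN:
        raise UnsafePath("path longer than 1024 characters")

    root_real = os.path.realpath(root)
    # os.path.join discards root_real when user_path is absolute, so an
    # absolute input resolves to itself and fails the containment check.
    candidate = os.path.realpath(os.path.join(root_real, user_path))

    # After symlinks and ".." are resolved, the result must still sit under root.
    if os.path.commonpath([root_real, candidate]) != root_real:
        raise UnsafePath("path escapes the evidence root")
    return candidate
```

Checking containment after realpath() resolution, rather than scanning the string for "..", also covers symlinks that point outside the evidence tree.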

Cross-platform coverage

Windows: EVTX, MFT, AmCache, Prefetch, ShimCache, Shellbags, USB history, Registry, Scheduled Tasks, Kerberos events, Windows logons
Linux: auditd, systemd-journal, bash history, /etc/passwd, web access logs, Unix auth logs (added v0.4 — 2026-04-30)
macOS: unified log, launchd plists, bash history (added v0.4 — 2026-04-30)
Memory + Network: process tree, open sockets, credential signals

Methodology — three playbook versions, each layer adds discipline

senior-analyst-v1.yaml (128 lines): quick-demo baseline
senior-analyst-v2.yaml (845 lines, 2026-04-30): methodology baseline. Synthesizes Mandiant M-Trends 2026, Targeted Attack Lifecycle, SANS PICERL, Lockheed Kill Chain, Bianco Pyramid of Pain + HMM, Diamond Model, MITRE ATT&CK v16, F3EAD, NIST SP 800-61/86/150, DFIR Report case studies, CISA #StopRansomware advisories, and field practice from Metcalf, Edwards, Wardle, Pomeranz, Zimmerman, Case, Roth, JPCERT/CC. 10 phases, 10 case classes, 25 references.
senior-analyst-v3.yaml ⭐ (2026-05-01): industrialization release and the default playbook. Adds four mature-SOC framework blocks as YAML data scaffolds on top of v2's runtime path: Palantir ADS Framework (9-section detection contract), MaGMa UCF (FI-ISAC NL three-tier traceability with CMMI 5-level maturity), TaHiTI threat hunt cycle (H1/H2/H3 with designed trigger), and Bianco HMM (the v3 YAML self-declares HMM3 Innovative). The reference list is extended with awesome-soc, awesome-incident-response, awesome-threat-detection, ThreatHunter-Playbook, Atomic Red Team, Sigma schema, and Crafting the InfoSec Playbook, plus external Yamato Security references (Hayabusa, EnableWindowsLogSettings) cited as third-party prior art only. Runtime activation of the four scaffolds in dart_agent / dart_corr is a post-SANS work item (issue #44). See dart-playbook.

Validation

The full pytest suite passes on a fresh clone with the documented dependencies installed (CI: Python 3.10 + 3.12)
Dedicated bypass tests assert ToolNotFound for forbidden operations
Demo run completes in <1 second on a SIFT v22.04 baseline
Two reproducible case-study walkthroughs:
- Case: IP-KVM remote-hands compromise
- Case: Pass-the-Hash with timestomp
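The bypass tests can be pictured with a small sketch of a typed tool registry: a call is dispatched only if its name was registered, and everything else raises ToolNotFound. The registry class, the example tool list_prefetch, and the test function name are hypothetical; only ToolNotFound and the forbidden names (execute_shell, write_file, mount, eval) come from this page.

```python
class ToolNotFound(LookupError):
    """Raised for any tool name outside the registered surface."""


class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, name):
        """Decorator that adds a function to the callable surface."""
        def decorator(fn):
            self._tools[name] = fn
            return fn
        return decorator

    def call(self, name, **kwargs):
        # Lookup happens before any argument is touched, so a forbidden
        # name fails the same way no matter what the caller passes.
        try:
            tool = self._tools[name]
        except KeyError:
            raise ToolNotFound(name) from None
        return tool(**kwargs)


registry = ToolRegistry()


@registry.register("list_prefetch")
def list_prefetch(case_dir: str) -> list:
    """Hypothetical read-only tool; a real one would parse Prefetch files."""
    return []


FORBIDDEN = ["execute_shell", "write_file", "mount", "eval"]


def test_forbidden_tools_raise_tool_not_found():
    for name in FORBIDDEN:
        try:
            registry.call(name, cmd="id")
        except ToolNotFound:
            continue
        raise AssertionError(f"{name} was dispatched")
```

The point of the design is that the deny-list exists only in the test: the registry itself has no notion of forbidden tools, just an allow-list of registered ones.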

Documentation

26-page wiki — concept pages, package READMEs, case studies, operator guide, threat model, FAQ, glossary
Memex Bet conceptual framing (The Memex Bet) — places Agentic-DART in the lineage from Bush 1945 → Karpathy 2026 → Agentic-DART 2026
4-minute SANS demo video (mock-screencast pre-cut; live screencast in flight per #14)
GitHub Social Preview, Devpost project page

Remaining for Phase 1 (closing on 2026-06-15)

Item	Status	Reference
Live screencast on SANS SIFT v22.04 (replaces mock-screencast preview)	🟡 In progress	#14
Devpost submission click	🟡 Scheduled — 2026-06-13 (T-2)	#15
Accuracy measurement on Ali Hadi Memory Forensic Challenge #1	🟡 In progress	#16
Accuracy measurement on NIST CFReDS Hacking Case (re-measure post T1070.006 tightening)	⏰ TODO	#1, #17
Accuracy measurement on Digital Corpora M57 Patents	⏰ TODO	#18
Final accuracy report committed to `docs/accuracy-report.md` with reference audit-tail hash	⏰ TODO	—

After 2026-06-15, Phase 1 is closed. Only bug fixes land on main; architectural changes go to a Phase 2 branch.

What Phase 1 explicitly does NOT do (deferred by design)

❌ Live response. No kill_process, no quarantine, no block. The agent reads evidence, never modifies it. Response is Phase 3.
❌ Sigma rule synthesis. v3 cites Sigma schema and hayabusa-rules as prior art, but Agentic-DART does not yet generate Sigma rules from observed evidence. That is Phase 2 (dart-synth, #10).
❌ Cloud DFIR. No CloudTrail, GuardDuty, or cloud-native log analysis. Phase 2 (analyze_aws_cloudtrail, #11).
❌ Memory forensics with Volatility. Memory is read for process tree + sockets only. Volatility-style plugin coverage is deferred to Phase 2.
❌ Auto-execute YAML playbooks. The v2/v3 YAML is read by the agent but execution still goes through hardcoded Python phase scaffolds. Auto-execution is Phase 2 (#34).

These omissions are intentional — Phase 1 ships a tight, defensible architecture rather than a sprawling feature surface.

Phase 2 — Agentic detection engineering (~Q3 2026)

Extending the agent from "investigate one case" to "improve the detection corpus from many cases".

Goals

Read a corpus of historical incidents (audit logs from past runs) and surface coverage gaps
Synthesize new Sigma rules from observed attacker behavior
Quantify rule overlap, dead rules, and false-positive patterns
Maintain a versioned detection-as-code repo separate from the agent codebase

What changes architecturally

New package: dart_synth — Sigma rule synthesizer. Reads audit logs, emits .yml. Pure function: (audit_log → rule_yaml).
No change to MCP boundary. The synthesizer reads JSONL, not evidence. The boundary stays where it is.
New playbook: coverage-gap-analyst-v1.yaml for the new reasoning class.
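Since dart_synth is still in the spec phase, the following is only a sketch of what a pure (audit_log → rule_yaml) function could look like. The audit entry schema (an "observed" object with an "image" field) and the function name synthesize_rule are assumptions made for illustration; the emitted keys (title, status, logsource, detection, condition, level) follow the public Sigma rule format.

```python
import json


def synthesize_rule(audit_jsonl: str, title: str) -> str:
    """Pure function: JSONL audit text in, Sigma-style YAML text out."""
    images = set()
    for line in audit_jsonl.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        # Hypothetical schema: each entry may record an observed process image.
        image = entry.get("observed", {}).get("image")
        if image:
            images.add(image)
    if not images:
        raise ValueError("no process images observed in audit log")

    # json.dumps yields double-quoted strings, which are valid YAML scalars.
    lines = [
        f"title: {json.dumps(title)}",
        "status: experimental",
        "logsource:",
        "  category: process_creation",
        "  product: windows",
        "detection:",
        "  selection:",
        "    Image|endswith:",
    ]
    lines += [f"      - {json.dumps(image)}" for image in sorted(images)]
    lines += ["  condition: selection", "level: medium"]
    return "\n".join(lines) + "\n"
```

Because the function takes text and returns text, it never touches the evidence tree, and the same audit log always yields the same rule, which keeps human review and diffing straightforward.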

What does not change

The agent still cannot write to the evidence tree.
The synthesizer's output is reviewed by a human before it lands in the production rule base. Agentic-DART does not auto-deploy rules.

Measure of success

Generate Sigma rules from 10+ historical cases
≥80% of generated rules pass review without modification
Detection coverage gaps are surfaced before an analyst notices them in production

Phase 3 — Agentic SOC (~Q1 2027)

Triage, enrichment, and supervised response orchestration.

Goals

Ingest live SIEM alerts, route them to the right playbook
Enrich with TI, asset context, recent case history
Produce response drafts (containment, lateral-movement scoping, user notifications)
Hand off to a human-in-the-loop for any action with side effects

What changes architecturally

New package: dart_responder — proposes responses, does not execute them. Output is a structured action plan, not a script.
New boundary: response side effects. A separate module owns the verbs that have effects (quarantine, isolate, kill-process, etc.). It is feature-flagged off by default. Enabling it requires per-environment configuration and a human approval step on every action.
New audit chain category: proposed_action vs taken_action. Proposed actions are logged like findings. Taken actions require cryptographic approval from a human key.
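The proposed_action / taken_action split could look roughly like the sketch below. Phase 3 is design only, so every name here (ActionLog, ApprovalRequired, approve) is hypothetical. HMAC-SHA256 from the standard library stands in for "cryptographic approval from a human key"; a real design would more likely use an asymmetric signature so the log holds only a public key.

```python
import hashlib
import hmac
import json


class ApprovalRequired(PermissionError):
    """Raised when an action is taken without a valid human approval."""


def approve(action: dict, human_key: bytes) -> str:
    """Approval token bound to the exact action content (HMAC as a stand-in)."""
    message = json.dumps(action, sort_keys=True).encode()
    return hmac.new(human_key, message, hashlib.sha256).hexdigest()


class ActionLog:
    def __init__(self, human_key: bytes):
        self._key = human_key  # simplification: a real log would not hold the key
        self.entries = []

    def propose(self, action: dict) -> None:
        """Proposals are logged like findings and have no side effects."""
        self.entries.append({"category": "proposed_action", "action": action})

    def take(self, action: dict, approval) -> None:
        """Record a taken action only if the approval matches this action."""
        expected = approve(action, self._key)
        if not isinstance(approval, str) or not hmac.compare_digest(expected, approval):
            raise ApprovalRequired(f"no valid approval for {action!r}")
        self.entries.append(
            {"category": "taken_action", "action": action, "approval": approval}
        )
```

Binding the approval to the serialized action means a token issued for one host or verb cannot be replayed for another, which is the property a test such as test_responder_no_auto_execute would be checking.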

What does not change

The DFIR boundary (Phase 1) stays as-is. SOC functions live in a different boundary.
Human approval is always required for actions with side effects. The architecture provides no path that executes them without a human in the loop.

Measure of success

Mean time to triage drops by 50% on a representative SOC corpus
Zero auto-executed actions without human approval (asserted by test_responder_no_auto_execute)
Response plans match analyst plans on ≥70% of incidents

Phase 4 — Broader agentic security (~2027 and beyond)

Vague on purpose. The architecture is designed to support directions we haven't picked yet:

Continuous detection-engineering loop — Phase 2 + Phase 3 in a single closed feedback loop
Threat-hunting agent — proactive hypothesis generation against cold storage
Code-review assistant for security infrastructure — reads PRs to detection-as-code repos and flags rule regressions
Cross-environment correlation — multi-tenant dart-corr for organizations operating multiple SOCs

What's constant across Phase 4 directions:

The architectural rules from Phase 1 still hold. Nothing has a general-purpose escape hatch. Every new verb is typed.
Every action with side effects requires a human-in-the-loop.
Every reasoning step is replayable from an audit chain.

What's not on the roadmap

These have been considered and rejected for the foreseeable future:

Auto-remediation without human approval

Even with high confidence, the architecture refuses to take actions with side effects without a human approving each one. We will not ship this; if you need it, fork and own the consequences.

A general-purpose tool ("ask_anything")

The whole project is built on the premise that typed surfaces beat prompted ones. A general-purpose tool would defeat that.

Closed-source distribution

The architectural guarantees are only meaningful if the surface is auditable. The project is MIT and will stay open source.

Vendor-specific integrations as core packages

Splunk, Sentinel, XSOAR, etc. — these belong in adapters, not in the core. We will document the adapter interface and keep core surface vendor-neutral.

How to influence the roadmap

The hackathon submission is fixed. Post-6/15, the roadmap is open to community input:

Issues tagged roadmap — discuss direction
Issues tagged phase-2, phase-3, phase-4 — propose specific features for that phase
Pull requests with prototypes — strong signal. A working prototype of a Phase 2 feature is worth more than 100 issues.

If you're a hackathon judge reading this: the roadmap is here to demonstrate that the architecture was designed to expand. Phase 1 is what's submitted. Phases 2-4 are evidence the design is honest.

Anti-roadmap (what we will refuse)

To be explicit:

Request	Response
"Can you add `execute_shell` for power users?"	No.
"Can the agent auto-terminate processes?"	Not without per-action human approval.
"Can we ship a binary blob that 'just works'?"	No. Architecture must be auditable.
"Can we make the prompt more permissive 'just for this case'?"	No. Guardrails are architectural.
