Roadmap

Juwon1405 edited this page May 3, 2026 · 18 revisions

This page is the honest version of "where is Agentic-DART going". It is structured as four phases. Phase 1 is the SANS FIND EVIL! 2026 submission and is essentially complete. Phases 2–4 are not promises; they are the directions the architecture was built to support.

The codename DART was chosen to remain accurate as the scope expands — Detection And Response covers everything below.

At a glance

| Phase | Focus | Status | Window |
| --- | --- | --- | --- |
| Phase 1 | Agentic DFIR: investigate one case end-to-end | ~95% complete | closes 2026-06-15 |
| Phase 2 | Agentic detection engineering: Sigma synthesis, coverage-gap reasoning | Spec phase | ~Q3 2026 |
| Phase 3 | Agentic SOC: supervised triage + response orchestration | Design only | ~Q1 2027 |
| Phase 4 | Broader agentic security: vuln management, compliance, adversary emulation | Direction only | 2027+ |

Phase 1 is what is shipping for SANS FIND EVIL! 2026. Every architectural guarantee made in Phase 1 (read-only MCP boundary, audit chain, contradiction enforcement, path safety) propagates unchanged into later phases. Forking the playbook cannot loosen them.


Phase 1 — Agentic DFIR (current, ~95% complete) ⭐

This is the SANS FIND EVIL! 2026 hackathon submission. It is architecturally complete and empirically validated against three public DFIR datasets. Phase 1 is the foundation everything else builds on — every architectural guarantee made here propagates unchanged into Phase 2, 3, and 4.

What "agentic DFIR" means in Phase 1

The agent investigates a single forensic case end-to-end. It loads the senior-analyst playbook, walks the 10 phases (volatility → initial access triage → timeline → anomaly surfacing → hypothesis formation → kill chain assembly → contradiction handling → attribution → recovery-denial check → finding emission), and produces a courtroom-grade report where every claim cites the audit ID of the MCP call that produced it.

Phase 1 is offline-first. The agent runs on mounted evidence, not live hosts. Live response (agentic SOC) is explicitly Phase 3.

Phase 1 is architecturally complete because

  • The MCP boundary is real, not promised. 60 typed forensic functions on the wire (35 native + 25 SIFT adapters). Anything outside this surface (execute_shell, write_file, mount, eval) raises ToolNotFound regardless of what the prompt says. Asserted by a 7-test bypass suite that runs on every commit.
  • The audit chain is tamper-evident. Every MCP call is logged with SHA-256 chaining. A 1,000-entry chain (50 threads × 20 calls) verifies cleanly under concurrent writes, which are serialized with threading.Lock() (v0.4.1 fix).
  • Path safety is fuzz-tested. _safe_resolve rejects ../, null bytes, absolute escapes, and paths longer than 1024 characters. It follows POSIX realpath() semantics.
  • Contradictions cannot be smoothed over. dart-corr flags UNRESOLVED when two artifacts disagree (e.g. an MFT $SI timestamp 11 seconds earlier than $FN, which indicates timestomping that pre-dates the alert window). The serializer rejects findings that ignore unresolved contradictions.
  • Findings cite their evidence. Every finding carries an audit_id referencing the exact MCP call. Serializer rejects findings without one. v3 additionally requires an ADS template.
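
The tamper-evidence and concurrency claims in the list above can be illustrated with a minimal sketch. This is not the project's dart-audit code: the class name AuditChain, the entry layout, the all-zero genesis hash, and the tool name parse_evtx are assumptions made for illustration. Only the ideas come from the text: each entry commits to the previous entry's SHA-256 hash, and appends are serialized with threading.Lock().

```python
import hashlib
import json
import threading

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


class AuditChain:
    """Append-only log in which every entry commits to the one before it."""

    def __init__(self):
        self._entries = []
        self._lock = threading.Lock()

    @staticmethod
    def _digest(audit_id, prev_hash, tool, args):
        # Canonical JSON (sorted keys) so the same entry always hashes the same.
        payload = json.dumps(
            {"audit_id": audit_id, "prev": prev_hash, "tool": tool, "args": args},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, tool, args):
        # The lock makes "read last hash, compute, append" one atomic step.
        with self._lock:
            audit_id = len(self._entries)
            prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
            self._entries.append({
                "audit_id": audit_id,
                "tool": tool,
                "args": args,
                "prev": prev_hash,
                "hash": self._digest(audit_id, prev_hash, tool, args),
            })
            return audit_id

    def entries(self):
        with self._lock:
            return list(self._entries)

    def verify(self):
        # Recompute every hash from the start; any edited entry breaks the chain.
        prev_hash = GENESIS
        for position, entry in enumerate(self.entries()):
            expected = self._digest(entry["audit_id"], prev_hash, entry["tool"], entry["args"])
            if entry["audit_id"] != position or entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True


def run_concurrency_check(threads=50, calls=20):
    """Mirror the 50 threads x 20 calls scenario with a hypothetical tool name."""
    chain = AuditChain()

    def worker(thread_no):
        for call_no in range(calls):
            chain.append("parse_evtx", {"thread": thread_no, "call": call_no})

    pool = [threading.Thread(target=worker, args=(n,)) for n in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return chain
```

Calling run_concurrency_check() and then verify() on the result exercises the same shape of test the text describes. Without the lock, two threads could read the same "last hash" and fork the chain, which verify() would then reject.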

Phase 1 deliverables (Done)

Core architecture

  • 60 typed forensic functions (35 native + 25 SIFT adapters) across 11 of 12 MITRE ATT&CK enterprise tactics (MCP function catalog)
  • Read-only MCP boundary, asserted by 7-test bypass suite (Architecture deep dive)
  • SHA-256 chained audit log, replayable, tamper-evident, lock-protected (dart-audit)
  • Cross-artifact correlation engine with UNRESOLVED contradiction surfacing (dart-corr)
  • Path sandbox (_safe_resolve) with fuzz-validated traversal/null-byte/escape protection (Threat model)
  • Live mode — real Claude API + JSON-RPC stdio MCP server (Live mode)
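
As a rough illustration of the path sandbox listed above, the sketch below shows one way a resolver could enforce the four stated rejections. It is an assumption-laden stand-in, not the project's _safe_resolve: the function name safe_resolve, the PathEscape error type, and the two-argument signature are hypothetical, and it assumes a POSIX-style filesystem.

```python
import os

MAX_PATH_CHARS = 1024  # limit quoted in the guarantees list


class PathEscape(ValueError):
    """Hypothetical error type for a rejected path."""


def safe_resolve(root, user_path):
    """Resolve user_path under root, or raise PathEscape."""
    if "\x00" in user_path:
        raise PathEscape("null byte in path")
    if len(user_path) > MAX_PATH_CHARS:
        raise PathEscape("path longer than %d characters" % MAX_PATH_CHARS)

    root_real = os.path.realpath(root)
    # os.path.join discards root_real when user_path is absolute, so an
    # absolute escape is caught by the same containment check as "../".
    candidate = os.path.realpath(os.path.join(root_real, user_path))

    if os.path.commonpath([root_real, candidate]) != root_real:
        raise PathEscape("path resolves outside the evidence root")
    return candidate
```

The containment check runs on the fully resolved path rather than on the raw string, so a symlink inside the evidence tree that points outside it is rejected along with literal ../ sequences.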

Cross-platform coverage

  • Windows: EVTX, MFT, AmCache, Prefetch, ShimCache, Shellbags, USB history, Registry, Scheduled Tasks, Kerberos events, Windows logons
  • Linux: auditd, systemd-journal, bash history, /etc/passwd, web access logs, Unix auth logs (added v0.4 — 2026-04-30)
  • macOS: unified log, launchd plists, bash history (added v0.4 — 2026-04-30)
  • Memory + Network: process tree, open sockets, credential signals

Methodology — three playbook versions, each layer adds discipline

  • senior-analyst-v1.yaml (128 lines) — quick-demo baseline
  • senior-analyst-v2.yaml (845 lines, 2026-04-30) — methodology baseline. Synthesizes Mandiant M-Trends 2026, Targeted Attack Lifecycle, SANS PICERL, Lockheed Kill Chain, Bianco Pyramid of Pain + HMM, Diamond Model, MITRE ATT&CK v16, F3EAD, NIST SP 800-61/86/150, DFIR Report case studies, CISA #StopRansomware advisories, and field practice from Metcalf, Edwards, Wardle, Pomeranz, Zimmerman, Case, Roth, JPCERT/CC. 10 phases, 10 case classes, 25 references.
  • senior-analyst-v3.yaml ⭐ (1135 lines, 2026-05-01) — industrialization release. Adds four mature-SOC framework blocks on top of v2: Palantir ADS Framework (9-section detection contract), MaGMa UCF (FI-ISAC NL three-tier traceability with CMMI 5-level maturity), TaHiTI threat hunt cycle (H1/H2/H3), Bianco HMM operationalized (v3 ships at HMM3 Innovative). 42 references — adds awesome-soc, awesome-incident-response, awesome-threat-detection, ThreatHunter-Playbook, Atomic Red Team, Sigma schema, Crafting the InfoSec Playbook, plus external Yamato Security references (Hayabusa, EnableWindowsLogSettings) cited as third-party prior art only. v3 is now default. See dart-playbook.

Validation

Documentation

  • 26-page wiki — concept pages, package READMEs, case studies, operator guide, threat model, FAQ, glossary
  • Memex Bet conceptual framing (The Memex Bet) — places Agentic-DART in the lineage from Bush 1945 → Karpathy 2026 → Agentic-DART 2026
  • 4-minute SANS demo video (mock-screencast pre-cut; live screencast in flight per #14)
  • GitHub Social Preview, Devpost project page

Remaining for Phase 1 (closing on 2026-06-15)

| Item | Status | Reference |
| --- | --- | --- |
| Live screencast on SANS SIFT v22.04 (replaces mock-screencast preview) | 🟡 In progress | #14 |
| Devpost submission click | 🟡 Scheduled for 2026-06-13 (T-2) | #15 |
| Accuracy measurement on Ali Hadi Memory Forensic Challenge #1 | 🟡 In progress | #16 |
| Accuracy measurement on NIST CFReDS Hacking Case (re-measure post T1070.006 tightening) | ⏰ TODO | #1, #17 |
| Accuracy measurement on Digital Corpora M57 Patents | ⏰ TODO | #18 |
| Final accuracy report committed to docs/accuracy-report.md with reference audit-tail hash | ⏰ TODO | |

After 2026-06-15, Phase 1 is closed. Only bug fixes land on main; architectural changes go to a Phase 2 branch.

What Phase 1 explicitly does NOT do (deferred by design)

  • Live response. No kill_process, no quarantine, no block. The agent reads evidence, never modifies it. Response is Phase 3.
  • Sigma rule synthesis. v3 cites Sigma schema and hayabusa-rules as prior art, but Agentic-DART does not yet generate Sigma rules from observed evidence. That is Phase 2 (dart-synth, #10).
  • Cloud DFIR. No CloudTrail, GuardDuty, or cloud-native log analysis. Phase 2 (analyze_aws_cloudtrail, #11).
  • Memory forensics with Volatility. Memory is read for process tree + sockets only. Volatility-style plugin coverage is deferred to Phase 2.
  • Auto-execute YAML playbooks. The v2/v3 YAML is read by the agent but execution still goes through hardcoded Python phase scaffolds. Auto-execution is Phase 2 (#34).

These omissions are intentional — Phase 1 ships a tight, defensible architecture rather than a sprawling feature surface.


Phase 2 — Agentic detection engineering (~Q3 2026)

Extending the agent from "investigate one case" to "improve the detection corpus from many cases".

Goals

  • Read a corpus of historical incidents (audit logs from past runs) and surface coverage gaps
  • Synthesize new Sigma rules from observed attacker behavior
  • Quantify rule overlap, dead rules, and false-positive patterns
  • Maintain a versioned detection-as-code repo separate from the agent codebase

What changes architecturally

  • New package: dart_synth — Sigma rule synthesizer. Reads audit logs, emits .yml. Pure function: (audit_log → rule_yaml).
  • No change to MCP boundary. The synthesizer reads JSONL, not evidence. The boundary stays where it is.
  • New playbook: coverage-gap-analyst-v1.yaml for the new reasoning class.
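
To make the "pure function: (audit_log → rule_yaml)" idea concrete, here is a small sketch. dart_synth is still in the spec phase, so everything below is assumed: the function name synthesize_rule, the JSONL fields case_id and process_image, and the "seen in at least two cases" heuristic are invented for illustration. The output uses standard Sigma rule keys (title, status, logsource, detection, level).

```python
import json


def synthesize_rule(audit_jsonl, min_cases=2):
    """Pure function: audit-log text in, Sigma-style YAML text out. No I/O."""
    cases_by_image = {}
    for line in audit_jsonl.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        image = record.get("process_image")  # assumed field name
        case_id = record.get("case_id")      # assumed field name
        if image and case_id:
            cases_by_image.setdefault(image, set()).add(case_id)

    # Keep only process images that recur across several historical cases.
    recurring = sorted(
        image for image, cases in cases_by_image.items() if len(cases) >= min_cases
    )
    if not recurring:
        return ""

    lines = [
        "title: Process image recurring across historical cases",
        "status: experimental",
        "logsource:",
        "  category: process_creation",
        "  product: windows",
        "detection:",
        "  selection:",
        "    Image|endswith:",
    ]
    # json.dumps yields a double-quoted scalar, which is also valid YAML.
    lines += ["      - " + json.dumps(image) for image in recurring]
    lines += ["  condition: selection", "level: medium"]
    return "\n".join(lines) + "\n"
```

Because the function only reads the JSONL text it is handed and returns a string, it never touches evidence, which is consistent with the MCP boundary staying where it is. The returned text is a draft for human review, not something to deploy directly.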

What does not change

  • The agent still cannot write to the evidence tree.
  • The synthesizer's output is reviewed by a human before it lands in the production rule base. Agentic-DART does not auto-deploy rules.

Measure of success

  • Generate Sigma rules from 10+ historical cases
  • ≥80% of generated rules pass review without modification
  • Detection coverage gaps are surfaced before an analyst notices them in production

Phase 3 — Agentic SOC (~Q1 2027)

Triage, enrichment, and supervised response orchestration.

Goals

  • Ingest live SIEM alerts, route them to the right playbook
  • Enrich with TI, asset context, recent case history
  • Produce response drafts (containment, lateral-movement scoping, user notifications)
  • Hand off to a human-in-the-loop for any action with side effects

What changes architecturally

  • New package: dart_responder — proposes responses, does not execute them. Output is a structured action plan, not a script.
  • New boundary: response side effects. A separate module owns the verbs that have effects (quarantine, isolate, kill-process, etc.). It is feature-flagged off by default. Enabling it requires per-environment configuration and a human approval step on every action.
  • New audit chain category: proposed_action vs taken_action. Proposed actions are logged like findings. Taken actions require cryptographic approval from a human key.
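
The split between proposed_action and taken_action can be sketched as follows. This is a design illustration only: the names Responder, propose, take, approve, and ApprovalRequired are hypothetical, and HMAC-SHA256 with a shared key stands in for "cryptographic approval from a human key" purely to keep the example inside the standard library. A real design would more plausibly use an asymmetric signature so that the responder never holds the approver's secret.

```python
import hashlib
import hmac
import json


class ApprovalRequired(PermissionError):
    """Hypothetical error raised when an action lacks valid human approval."""


def _canonical(action):
    return json.dumps(action, sort_keys=True).encode("utf-8")


def approve(action, human_key):
    """Run by the human approver: authorize exactly one proposed action."""
    return hmac.new(human_key, _canonical(action), hashlib.sha256).hexdigest()


class Responder:
    def __init__(self, human_key, enabled=False):
        self._human_key = human_key
        self._enabled = enabled  # feature-flagged off by default
        self.audit = []

    def propose(self, verb, target):
        # Proposals have no side effects; they are logged like findings.
        action = {"verb": verb, "target": target}
        self.audit.append({"category": "proposed_action", "action": action})
        return action

    def take(self, action, approval):
        if not self._enabled:
            raise ApprovalRequired("response module is feature-flagged off")
        expected = approve(action, self._human_key)
        if approval is None or not hmac.compare_digest(expected, approval):
            raise ApprovalRequired("missing or invalid human approval")
        # Only the audit record is written here; the real side effect would
        # be delegated to the separate module that owns the response verbs.
        self.audit.append(
            {"category": "taken_action", "action": action, "approval": approval}
        )
        return action
```

The approval is bound to the exact action payload, so approving "isolate host-17" cannot be replayed to isolate a different host. This is the per-action property the text asks for.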

What does not change

  • The DFIR boundary (Phase 1) stays as-is. SOC functions live in a different boundary.
  • Human approval is always required for actions with side effects. The architecture cannot be deployed in a mode that bypasses the human in the loop.

Measure of success

  • Mean time to triage drops by 50% on a representative SOC corpus
  • Zero auto-executed actions without human approval (asserted by test_responder_no_auto_execute)
  • Response plans match analyst plans on ≥70% of incidents

Phase 4 — Broader agentic security (~2027 and beyond)

Vague on purpose. The architecture is designed to support directions we haven't picked yet:

  • Continuous detection-engineering loop — Phase 2 + Phase 3 in a single closed feedback loop
  • Threat-hunting agent — proactive hypothesis generation against cold storage
  • Code-review assistant for security infrastructure — reads PRs to detection-as-code repos and flags rule regressions
  • Cross-environment correlation — multi-tenant dart-corr for organizations operating multiple SOCs

What's constant across Phase 4 directions:

  • The architectural rules from Phase 1 still hold. Nothing has a general-purpose escape hatch. Every new verb is typed.
  • Every action with side effects requires a human-in-the-loop.
  • Every reasoning step is replayable from an audit chain.

What's not on the roadmap

These have been considered and rejected for the foreseeable future:

Auto-remediation without human approval

Even with high confidence, the architecture refuses to take actions with side effects without a human approving each one. We will not ship this; if you need it, fork and own the consequences.

A general-purpose tool ("ask_anything")

The whole project is built on the premise that typed surfaces beat prompted ones. A general-purpose tool would defeat that.
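
A minimal sketch of what "typed surface" means in practice, assuming a registry design: only functions that were explicitly registered can be called, arguments are checked against their annotations, and any other name raises ToolNotFound no matter what the caller asks for. The ToolRegistry class and the count_event_id function are hypothetical; only the ToolNotFound behavior is taken from this page.

```python
import inspect


class ToolNotFound(LookupError):
    """Raised for any tool name that is not on the registered surface."""


class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, func):
        self._tools[func.__name__] = func
        return func

    def call(self, name, **kwargs):
        func = self._tools.get(name)
        if func is None:
            raise ToolNotFound(name)
        # bind() raises TypeError for unknown or missing arguments.
        bound = inspect.signature(func).bind(**kwargs)
        for arg_name, value in bound.arguments.items():
            expected = func.__annotations__.get(arg_name)
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError("%s must be %s" % (arg_name, expected.__name__))
        return func(*bound.args, **bound.kwargs)


registry = ToolRegistry()


@registry.register
def count_event_id(event_ids: list, target_id: int) -> int:
    """Hypothetical read-only function: count occurrences of one event ID."""
    return sum(1 for event_id in event_ids if event_id == target_id)
```

With this shape, registry.call("execute_shell", cmd="id") fails with ToolNotFound before any code runs, because the lookup is by name in a fixed table rather than by interpreting the request. A catch-all tool such as ask_anything would reintroduce exactly the interpretation step that the table removes.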

Closed-source distribution

The architectural guarantees are only meaningful if the surface is auditable. The project is MIT-licensed and will stay open source.

Vendor-specific integrations as core packages

Splunk, Sentinel, XSOAR, and similar integrations belong in adapters, not in the core. We will document the adapter interface and keep the core surface vendor-neutral.


How to influence the roadmap

The hackathon submission is fixed. After 2026-06-15, the roadmap is open to community input:

  • Issues tagged roadmap — discuss direction
  • Issues tagged phase-2, phase-3, phase-4 — propose specific features for that phase
  • Pull requests with prototypes — strong signal. A working prototype of a Phase 2 feature is worth more than 100 issues.

If you're a hackathon judge reading this: the roadmap is here to demonstrate that the architecture was designed to expand. Phase 1 is what's submitted. Phases 2-4 are evidence the design is honest.


Anti-roadmap (what we will refuse)

To be explicit:

| Request | Response |
| --- | --- |
| "Can you add execute_shell for power users?" | No. |
| "Can the agent auto-terminate processes?" | Not without per-action human approval. |
| "Can we ship a binary blob that 'just works'?" | No. Architecture must be auditable. |
| "Can we make the prompt more permissive 'just for this case'?" | No. Guardrails are architectural. |
