Roadmap
This page is the honest version of "where is Agentic-DART going". It is structured as four phases. Phase 1 is the SANS FIND EVIL! 2026 submission and is essentially complete. Phases 2–4 are not promises; they are the directions the architecture was built to support.
The codename DART was chosen to remain accurate as the scope expands — Detection And Response covers everything below.
| Phase | Focus | Status | Window |
|---|---|---|---|
| Phase 1 ⭐ | Agentic DFIR — investigate one case end-to-end | ~95% complete | closes 2026-06-15 |
| Phase 2 | Agentic detection engineering — Sigma synthesis, coverage-gap reasoning | Spec phase | ~Q3 2026 |
| Phase 3 | Agentic SOC — supervised triage + response orchestration | Design only | ~Q1 2027 |
| Phase 4 | Broader agentic security — vuln management, compliance, adversary emulation | Direction only | 2027+ |
Phase 1 is what is shipping for SANS FIND EVIL! 2026. Every architectural guarantee made in Phase 1 (read-only MCP boundary, audit chain, contradiction enforcement, path safety) propagates unchanged into later phases. Forking the playbook cannot loosen them.
This is the SANS FIND EVIL! 2026 hackathon submission. It is architecturally complete and empirically validated against three public DFIR datasets. Phase 1 is the foundation everything else builds on — every architectural guarantee made here propagates unchanged into Phase 2, 3, and 4.
The agent investigates a single forensic case end-to-end. It loads the senior-analyst playbook, walks the ten phases (volatility → initial access triage → timeline → anomaly surfacing → hypothesis formation → kill chain assembly → contradiction handling → attribution → recovery-denial check → finding emission), and produces a courtroom-grade report where every claim cites the audit ID of the MCP call that produced it.
Phase 1 is offline-first. The agent runs on mounted evidence, not live hosts. Live response (agentic SOC) is explicitly Phase 3.
- The MCP boundary is real, not promised. The 72-tool typed forensic function surface on the wire (47 native + 25 SIFT adapters) is the whole available action space. Anything outside this surface (`execute_shell`, `write_file`, `mount`, `eval`) raises `ToolNotFound` regardless of what the prompt says. This is asserted by a bypass suite that runs on every commit.
- The audit chain is tamper-evident. Every MCP call is logged with SHA-256 chaining. A 1000-entry chain (50 threads × 20 calls) is verified concurrent-safe via `threading.Lock()` (v0.4.1 fix).
- Path safety is fuzz-tested. `_safe_resolve` rejects `../`, null bytes, absolute escapes, and paths longer than 1024 characters. Reuses POSIX `realpath()` semantics.
- Contradictions cannot be smoothed over. `dart-corr` flags `UNRESOLVED` when two artifacts disagree (e.g. MFT $SI < $FN by 11 seconds → timestomp pre-existed alert window). The serializer rejects findings that ignore unresolved contradictions.
- Findings cite their evidence. Every finding carries an `audit_id` referencing the exact MCP call. The serializer rejects findings without one. v3 additionally requires an ADS template.
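The tamper-evidence and concurrency claims above can be illustrated with a minimal sketch. This is not the `dart-audit` implementation: the class name, entry fields, genesis value, and tool name are assumptions made for illustration. It shows only the general technique of hashing each entry together with its predecessor's hash while holding a `threading.Lock()`.

```python
import hashlib
import json
import threading

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


class AuditChain:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self):
        self._entries = []
        self._lock = threading.Lock()

    @staticmethod
    def _digest(body):
        # Canonical JSON (sorted keys) so the same body always hashes the same.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, tool, args):
        # The lock keeps "read previous hash, then append" atomic across threads.
        with self._lock:
            prev = self._entries[-1]["hash"] if self._entries else GENESIS
            body = {"seq": len(self._entries), "tool": tool, "args": args, "prev": prev}
            digest = self._digest(body)
            self._entries.append({**body, "hash": digest})
            return digest

    def verify(self):
        """Recompute every hash and check each link to its predecessor."""
        prev = GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or entry["hash"] != self._digest(body):
                return False
            prev = entry["hash"]
        return True


def run_concurrency_check(threads=50, calls=20):
    """Mirror the 50 threads x 20 calls scenario against the sketch above."""
    chain = AuditChain()

    def worker(worker_id):
        for call_no in range(calls):
            chain.append("hypothetical_tool", {"worker": worker_id, "call": call_no})

    pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return chain
```

Calling `run_concurrency_check().verify()` exercises the same shape of test described above. Editing any stored entry afterwards makes `verify()` fail, because the stored hash no longer matches the recomputed one.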
- The typed forensic function surface (native + SIFT adapters) across broad MITRE ATT&CK enterprise tactic coverage (10 of 12 in-scope tactics; TA0009 Collection and full TA0011 C2 are Phase 2, though `detect_dns_tunneling` already adds DNS-tunneling C2 indicators) (MCP function catalog)
- Read-only MCP boundary, asserted by 7-test bypass suite (Architecture deep dive)
- SHA-256 chained audit log, replayable, tamper-evident, lock-protected (dart-audit)
- Cross-artifact correlation engine with `UNRESOLVED` contradiction surfacing (dart-corr)
- Path sandbox (`_safe_resolve`) with fuzz-validated traversal/null-byte/escape protection (Threat model)
- Live mode: real Claude API + JSON-RPC stdio MCP server (Live mode)
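As a rough illustration of the path sandbox listed above, the sketch below resolves a requested path under an evidence root and rejects anything that escapes it. The function name `safe_resolve`, the exception type, and the exact order of checks are assumptions; the real `_safe_resolve` may differ. Only the 1024-character limit and the rejected input classes come from this page.

```python
import os

MAX_PATH_LEN = 1024  # limit quoted on this page


class UnsafePathError(ValueError):
    """Hypothetical error raised when a path would escape the evidence root."""


def safe_resolve(evidence_root, requested):
    """Resolve `requested` under `evidence_root`, or raise UnsafePathError."""
    if "\x00" in requested:
        raise UnsafePathError("null byte in path")
    if len(requested) > MAX_PATH_LEN:
        raise UnsafePathError("path too long")
    if os.path.isabs(requested):
        raise UnsafePathError("absolute path not allowed")
    root = os.path.realpath(evidence_root)
    # realpath collapses ../ segments and follows symlinks before the check.
    candidate = os.path.realpath(os.path.join(root, requested))
    if os.path.commonpath([root, candidate]) != root:
        raise UnsafePathError("path escapes evidence root")
    return candidate
```

Resolving first and comparing afterwards is the design point: string checks for `../` alone miss symlinks and nested traversal such as `a/../../b`.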
- Windows: EVTX, MFT, AmCache, Prefetch, ShimCache, Shellbags, USB history, Registry, Scheduled Tasks, Kerberos events, Windows logons
- Linux: auditd, systemd-journal, bash history, /etc/passwd, web access logs, Unix auth logs (added v0.4 — 2026-04-30)
- macOS: unified log, launchd plists, bash history (added v0.4 — 2026-04-30)
- Memory + Network: process tree, open sockets, credential signals
- `senior-analyst-v1.yaml` (128 lines): quick-demo baseline
- `senior-analyst-v2.yaml` (845 lines, 2026-04-30): methodology baseline. Synthesizes Mandiant M-Trends 2026, Targeted Attack Lifecycle, SANS PICERL, Lockheed Kill Chain, Bianco Pyramid of Pain + HMM, Diamond Model, MITRE ATT&CK v16, F3EAD, NIST SP 800-61/86/150, DFIR Report case studies, CISA #StopRansomware advisories, and field practice from Metcalf, Edwards, Wardle, Pomeranz, Zimmerman, Case, Roth, JPCERT/CC. 10 phases, 10 case classes, 25 references.
- `senior-analyst-v3.yaml` ⭐ (2026-05-01): industrialization release. Adds four mature-SOC framework blocks as YAML data scaffolds on top of v2's runtime path: Palantir ADS Framework (9-section detection contract), MaGMa UCF (FI-ISAC NL three-tier traceability with CMMI 5-level maturity), TaHiTI threat hunt cycle (H1/H2/H3 with designed trigger), Bianco HMM (v3 yaml self-declares HMM3 Innovative). Extensive reference list: adds awesome-soc, awesome-incident-response, awesome-threat-detection, ThreatHunter-Playbook, Atomic Red Team, Sigma schema, Crafting the InfoSec Playbook, plus external Yamato Security references (Hayabusa, EnableWindowsLogSettings) cited as third-party prior art only. v3 is the default playbook. Runtime activation of the four scaffolds in `dart_agent`/`dart_corr` is a post-SANS work item (issue #44). See dart-playbook.
- The full pytest suite passes on a fresh clone with the documented dependencies installed (CI: Python 3.10 + 3.12)
- Dedicated bypass tests assert `ToolNotFound` for forbidden operations
- Demo run completes in <1 second on a SIFT v22.04 baseline
- Two reproducible case-study walkthroughs:
- 26-page wiki — concept pages, package READMEs, case studies, operator guide, threat model, FAQ, glossary
- Memex Bet conceptual framing (The Memex Bet) — places Agentic-DART in the lineage from Bush 1945 → Karpathy 2026 → Agentic-DART 2026
- 4-minute SANS demo video (mock-screencast pre-cut; live screencast in flight per #14)
- GitHub Social Preview, Devpost project page
| Item | Status | Reference |
|---|---|---|
| Live screencast on SANS SIFT v22.04 (replaces mock-screencast preview) | 🟡 In progress | #14 |
| Devpost submission click | 🟡 Scheduled — 2026-06-13 (T-2) | #15 |
| Accuracy measurement on Ali Hadi Memory Forensic Challenge #1 | 🟡 In progress | #16 |
| Accuracy measurement on NIST CFReDS Hacking Case (re-measure post T1070.006 tightening) | ⏰ TODO | #1, #17 |
| Accuracy measurement on Digital Corpora M57 Patents | ⏰ TODO | #18 |
| Final accuracy report committed to `docs/accuracy-report.md` with reference audit-tail hash | ⏰ TODO | — |
After 6/15, Phase 1 is closed. Bug fixes only on main. Architectural changes go to a Phase 2 branch.
- ❌ Live response. No `kill_process`, no `quarantine`, no `block`. The agent reads evidence, never modifies it. Response is Phase 3.
- ❌ Sigma rule synthesis. v3 cites Sigma schema and hayabusa-rules as prior art, but Agentic-DART does not yet generate Sigma rules from observed evidence. That is Phase 2 (`dart-synth`, #10).
- ❌ Cloud DFIR. No CloudTrail, GuardDuty, or cloud-native log analysis. Phase 2 (`analyze_aws_cloudtrail`, #11).
- ❌ Memory forensics with Volatility. Memory is read for process tree + sockets only. Volatility-style plugin coverage is deferred to Phase 2.
- ❌ Auto-execute YAML playbooks. The v2/v3 YAML is read by the agent but execution still goes through hardcoded Python phase scaffolds. Auto-execution is Phase 2 (#34).
These omissions are intentional — Phase 1 ships a tight, defensible architecture rather than a sprawling feature surface.
Extending the agent from "investigate one case" to "improve the detection corpus from many cases".
- Read a corpus of historical incidents (audit logs from past runs) and surface coverage gaps
- Synthesize new Sigma rules from observed attacker behavior
- Quantify rule overlap, dead rules, and false-positive patterns
- Maintain a versioned detection-as-code repo separate from the agent codebase
- New package: `dart_synth`, a Sigma rule synthesizer. Reads audit logs, emits `.yml`. Pure function: `(audit_log → rule_yaml)`.
- No change to MCP boundary. The synthesizer reads JSONL, not evidence. The boundary stays where it is.
- New playbook: `coverage-gap-analyst-v1.yaml` for the new reasoning class.
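Since `dart_synth` is still in the spec phase, the following is only a sketch of what a pure `(audit_log → rule_yaml)` function could look like. The JSONL field names (`observed`, `process_image`), the function name, and the choice of a process-creation rule are all hypothetical. The output follows the general shape of a Sigma rule but has not been validated against the Sigma schema.

```python
import json


def synthesize_rule(audit_jsonl, title):
    """Pure function: audit-log text in, Sigma-style rule YAML out (or None)."""
    images = []
    for line in audit_jsonl.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        # Hypothetical field: the process image a past finding pointed at.
        image = entry.get("observed", {}).get("process_image")
        if image and image not in images:
            images.append(image)
    if not images:
        return None  # nothing observed, so no rule is proposed

    # JSON string literals are valid YAML double-quoted scalars, so json.dumps
    # is used for quoting instead of a YAML dependency.
    out = [
        f"title: {json.dumps(title)}",
        "status: experimental",
        "logsource:",
        "  category: process_creation",
        "  product: windows",
        "detection:",
        "  selection:",
        "    Image|endswith:",
    ]
    out += [f"      - {json.dumps(img)}" for img in sorted(images)]
    out += ["  condition: selection", ""]
    return "\n".join(out)
```

The function touches no files and no evidence: it takes the log as text and returns text, so the same input always yields the same rule and the result can go straight into human review.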
- The agent still cannot write to the evidence tree.
- The synthesizer's output is reviewed by a human before it lands in the production rule base. Agentic-DART does not auto-deploy rules.
- Generate Sigma rules from 10+ historical cases
- ≥80% of generated rules pass review without modification
- Detection coverage gaps are surfaced before an analyst notices them in production
Triage, enrichment, and supervised response orchestration.
- Ingest live SIEM alerts, route them to the right playbook
- Enrich with TI, asset context, recent case history
- Produce response drafts (containment, lateral-movement scoping, user notifications)
- Hand off to a human-in-the-loop for any action with side effects
- New package: `dart_responder`, which proposes responses but does not execute them. Output is a structured action plan, not a script.
- New boundary: response side effects. A separate module owns the verbs that have effects (quarantine, isolate, kill-process, etc.). It is feature-flagged off by default. Enabling it requires per-environment configuration and a human approval step on every action.
- New audit chain category: `proposed_action` vs `taken_action`. Proposed actions are logged like findings. Taken actions require cryptographic approval from a human key.
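The `proposed_action` / `taken_action` split can be sketched as follows. Phase 3 is design-only, so every name here is hypothetical. The sketch also swaps in an HMAC over a shared secret as a stand-in for "cryptographic approval from a human key"; a real design would more likely use an asymmetric signature so the agent never holds the approving key.

```python
import hashlib
import hmac
import json


class ApprovalError(PermissionError):
    """Hypothetical error: an action was promoted without valid approval."""


def canonical(action):
    # Stable byte encoding so approval binds to this exact action.
    return json.dumps(action, sort_keys=True).encode()


def approve(action, human_key):
    """Run by the human approver: returns an approval tag for one action."""
    return hmac.new(human_key, canonical(action), hashlib.sha256).hexdigest()


def record_proposed(log, action):
    """Proposals have no side effects, so they are logged without approval."""
    log.append({"category": "proposed_action", "action": action})


def record_taken(log, action, approval, human_key):
    """Append a taken_action entry only if the approval matches the action."""
    expected = hmac.new(human_key, canonical(action), hashlib.sha256).hexdigest()
    if approval is None or not hmac.compare_digest(expected, approval):
        raise ApprovalError("taken_action requires a valid human approval")
    log.append({"category": "taken_action", "action": action, "approval": approval})
```

Because the tag is computed over the canonical action, an approval for isolating one host cannot be replayed for a different host or a different verb.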
- The DFIR boundary (Phase 1) stays as-is. SOC functions live in a different boundary.
- Human approval is always required for actions with side effects. The architecture refuses to be auto-deployed without a human-in-the-loop.
- Mean time to triage drops by 50% on a representative SOC corpus
- Zero auto-executed actions without human approval (asserted by test_responder_no_auto_execute)
- Response plans match analyst plans on ≥70% of incidents
Vague on purpose. The architecture is designed to support directions we haven't picked yet:
- Continuous detection-engineering loop — Phase 2 + Phase 3 in a single closed feedback loop
- Threat-hunting agent — proactive hypothesis generation against cold storage
- Code-review assistant for security infrastructure — reads PRs to detection-as-code repos and flags rule regressions
- Cross-environment correlation: multi-tenant `dart-corr` for organizations operating multiple SOCs
What's constant across Phase 4 directions:
- The architectural rules from Phase 1 still hold. Nothing has a general-purpose escape hatch. Every new verb is typed.
- Every action with side effects requires a human-in-the-loop.
- Every reasoning step is replayable from an audit chain.
These have been considered and rejected for the foreseeable future:
Even with high confidence, the architecture refuses to take actions with side effects without a human approving each one. We will not ship this; if you need it, fork and own the consequences.
The whole project is built on the premise that typed surfaces beat prompted ones. A general-purpose tool would defeat that.
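To make "typed surfaces beat prompted ones" concrete, here is a minimal sketch of a closed tool registry. Only the `ToolNotFound` name and the forbidden tool names come from this page; the registry class, the decorator, and the `count_lines` tool are hypothetical and are not the `dart-mcp` implementation.

```python
class ToolNotFound(LookupError):
    """Raised when a requested tool is not in the typed surface."""


class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, name):
        """Decorator that adds a function to the closed set of callable tools."""
        def decorator(func):
            self._tools[name] = func
            return func
        return decorator

    def call(self, name, **kwargs):
        # Dispatch is a dictionary lookup; there is no fallback to a shell or eval.
        try:
            tool = self._tools[name]
        except KeyError:
            raise ToolNotFound(name) from None
        return tool(**kwargs)


registry = ToolRegistry()


@registry.register("count_lines")  # hypothetical read-only tool
def count_lines(text):
    return len(text.splitlines())
```

With this shape, a prompt can ask for `execute_shell` as often as it likes: the name is not in the dictionary, so `registry.call("execute_shell")` raises `ToolNotFound` before any code runs.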
The architectural guarantees are only meaningful if the surface is auditable. The project is MIT and will stay open source.
Splunk, Sentinel, XSOAR, etc. — these belong in adapters, not in the core. We will document the adapter interface and keep core surface vendor-neutral.
The hackathon submission is fixed. Post-6/15, the roadmap is open to community input:
- Issues tagged `roadmap`: discuss direction
- Issues tagged `phase-2`, `phase-3`, `phase-4`: propose specific features for that phase
- Pull requests with prototypes: strong signal. A working prototype of a Phase 2 feature is worth more than 100 issues.
If you're a hackathon judge reading this: the roadmap is here to demonstrate that the architecture was designed to expand. Phase 1 is what's submitted. Phases 2-4 are evidence the design is honest.
To be explicit:
| Request | Response |
|---|---|
| "Can you add `execute_shell` for power users?" | No. |
| "Can the agent auto-terminate processes?" | Not without per-action human approval. |
| "Can we ship a binary blob that 'just works'?" | No. Architecture must be auditable. |
| "Can we make the prompt more permissive 'just for this case'?" | No. Guardrails are architectural. |
- Architecture deep dive — why the design decisions in Phase 1 are load-bearing for Phases 2-4
- Threat model — what the boundary protects, what it doesn't
- `CHANGELOG.md`: what has actually shipped
Agentic-DART — autonomous DFIR agent · architecture-first, not prompt-first · MIT license · github.com/Juwon1405/agentic-dart