Roadmap
This page is the honest version of "where is Agentic-DART going". It is structured as four phases. Phase 1 is the SANS FIND EVIL! 2026 submission and is essentially complete. Phases 2-4 are not promises; they are the directions the architecture was built to support.
The codename DART was chosen to remain accurate as the scope expands: Detection And Response covers everything below.
This is the SANS FIND EVIL! 2026 hackathon submission. It is architecturally complete and empirically validated against three public datasets.
- 31 typed forensic functions across 11/12 MITRE ATT&CK enterprise tactics
- Read-only MCP boundary, asserted by 6-test bypass suite
- SHA-256 chained audit log, replayable, tamper-evident
- Cross-artifact correlation engine (`dart-corr`) with `UNRESOLVED` contradiction surfacing
- Two case-study walkthroughs with reproducible runs
- Senior-analyst playbook (`dart-playbook/senior-analyst-v1.yaml`)
- Live mode (real Claude API + JSON-RPC stdio MCP server)
- Final accuracy measurement run against the bundled IP-KVM case + Ali Hadi #1 + NIST CFReDS, committed to `docs/accuracy-report.md` with a reference audit-tail hash
- Demo screencast (June 2026) replacing the sample-run stills in the README
- Devpost submission, due 2026-06-15
After 6/15, Phase 1 is closed. Bug fixes only on main.
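The "SHA-256 chained, replayable, tamper-evident" property of the audit log can be illustrated with a minimal sketch. This is not the `dart-audit` implementation: the entry layout, the `GENESIS` placeholder, and the tool name in the sample payload are assumptions made for illustration. Each entry's hash covers the previous entry's hash, so editing any earlier entry invalidates every hash after it, and the hash of the last entry is presumably what a reference audit-tail hash would pin.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> dict:
    """Append a payload, linking it to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)}
    chain.append(entry)
    return entry


def verify(chain: list) -> int:
    """Return -1 if the chain is intact, else the index of the first bad entry."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return i
        prev = entry["hash"]
    return -1


log = []
append(log, {"step": 1, "tool": "list_processes"})  # hypothetical tool name
append(log, {"step": 2, "finding": "suspicious parent process"})
print(verify(log))                    # -1: chain is intact
log[0]["payload"]["tool"] = "edited"
print(verify(log))                    # 0: tampering detected at the first entry
```
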
Extending the agent from "investigate one case" to "improve the detection corpus from many cases".
- Read a corpus of historical incidents (audit logs from past runs) and surface coverage gaps
- Synthesize new Sigma rules from observed attacker behavior
- Quantify rule overlap, dead rules, and false-positive patterns
- Maintain a versioned detection-as-code repo separate from the agent codebase
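One way to make "rule overlap" and "dead rules" measurable is sketched below. It assumes the corpus replay yields, for each rule, the set of event IDs it matched. That input shape, the rule names, and the choice of Jaccard similarity as the overlap metric are all assumptions for illustration, not something the project specifies.

```python
from itertools import combinations


def dead_rules(hits: dict) -> list:
    """Rules that matched nothing anywhere in the replayed corpus."""
    return sorted(rule for rule, events in hits.items() if not events)


def overlap(hits: dict) -> dict:
    """Jaccard similarity for each pair of rules that fired at least once."""
    live = {rule: events for rule, events in hits.items() if events}
    scores = {}
    for a, b in combinations(sorted(live), 2):
        union = live[a] | live[b]
        scores[(a, b)] = len(live[a] & live[b]) / len(union)
    return scores


# Hypothetical replay result: rule name -> IDs of the events it matched.
hits = {
    "susp_powershell": {"e1", "e2", "e3"},
    "encoded_command": {"e2", "e3"},
    "legacy_psexec": set(),
}
print(dead_rules(hits))  # ['legacy_psexec']
print(overlap(hits))
```

A pair scoring close to 1.0 is a candidate for merging; a rule in the dead list is a candidate for review rather than automatic deletion, since the corpus may simply lack the behavior it targets.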
- New package: `dart_synth`, a Sigma rule synthesizer. Reads audit logs, emits `.yml`. Pure function: `(audit_log → rule_yaml)`.
- No change to MCP boundary. The synthesizer reads JSONL, not evidence. The boundary stays where it is.
- New playbook: `coverage-gap-analyst-v1.yaml` for the new reasoning class.
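The `(audit_log → rule_yaml)` shape might look roughly like the sketch below. `dart_synth` does not exist yet, so everything here is hypothetical: the function name `synthesize`, the audit record fields (`kind`, `image`, `cmdline_fragment`), and the fixed `process_creation` log source. The emitted keys follow the public Sigma rule format, but a rule fit for review would also need at least an `id` and a description.

```python
import json


def _q(value: str) -> str:
    """Single-quote a YAML scalar; only the quote character needs escaping."""
    return "'" + value.replace("'", "''") + "'"


def synthesize(audit_jsonl: str) -> str:
    """Pure function: audit-log JSONL text in, draft Sigma rule YAML text out."""
    images, fragments = set(), set()
    for line in audit_jsonl.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("kind") != "finding":  # assumed record type field
            continue
        if "image" in record:
            images.add(record["image"])
        if "cmdline_fragment" in record:
            fragments.add(record["cmdline_fragment"])
    if not images and not fragments:
        raise ValueError("no findings to build a selection from")
    out = [
        "title: Draft rule synthesized from audit log",
        "status: experimental",
        "logsource:",
        "    category: process_creation",
        "    product: windows",
        "detection:",
        "    selection:",
    ]
    if images:
        out.append("        Image|endswith:")
        out += [f"            - {_q(i)}" for i in sorted(images)]
    if fragments:
        out.append("        CommandLine|contains:")
        out += [f"            - {_q(f)}" for f in sorted(fragments)]
    out += ["    condition: selection", "level: medium"]
    return "\n".join(out) + "\n"


sample = "\n".join([
    json.dumps({"kind": "finding", "image": "\\powershell.exe", "cmdline_fragment": "-enc"}),
    json.dumps({"kind": "tool_call", "image": "\\ignored.exe"}),
])
print(synthesize(sample))
```

Keeping the function free of clocks, random IDs, and file access means the same audit log always produces the same draft, which makes the output easy to diff during human review.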
- The agent still cannot write to the evidence tree.
- The synthesizer's output is reviewed by a human before it lands in the production rule base. Agentic-DART does not auto-deploy rules.
- Generate Sigma rules from 10+ historical cases
- ≥80% of generated rules pass review without modification
- Detection coverage gaps are surfaced before an analyst notices them in production
Triage, enrichment, and supervised response orchestration.
- Ingest live SIEM alerts, route them to the right playbook
- Enrich with TI, asset context, recent case history
- Produce response drafts (containment, lateral-movement scoping, user notifications)
- Hand off to a human-in-the-loop for any action with side effects
- New package: `dart_responder`, which proposes responses and does not execute them. Output is a structured action plan, not a script.
- New boundary: response side effects. A separate module owns the verbs that have effects (quarantine, isolate, kill-process, etc.). It is feature-flagged off by default. Enabling it requires per-environment configuration and a human approval step on every action.
- New audit chain category: `proposed_action` vs `taken_action`. Proposed actions are logged like findings. Taken actions require cryptographic approval from a human key.
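The split between `proposed_action` and `taken_action` can be sketched as follows. This is an illustration only: the record layout and function names are hypothetical, and HMAC-SHA256 with a shared secret stands in for whatever signature scheme a real "human key" would use. An asymmetric signature, where the agent holds only the public key, would be the more natural fit for that wording.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProposedAction:
    action_id: str
    verb: str    # e.g. "isolate"
    target: str  # e.g. a host name


def _canonical(action: ProposedAction) -> bytes:
    """Stable byte encoding of the action, used as the signed message."""
    return json.dumps(asdict(action), sort_keys=True).encode("utf-8")


def record_proposed(action: ProposedAction) -> dict:
    """Proposals need no approval; they are logged like findings."""
    return {"category": "proposed_action", "action": asdict(action)}


def approve(action: ProposedAction, approver_key: bytes) -> str:
    """Run by the human approver. HMAC is a stand-in for a real signature."""
    return hmac.new(approver_key, _canonical(action), hashlib.sha256).hexdigest()


def record_taken(action: ProposedAction, approval: str, approver_key: bytes) -> dict:
    """Refuse to log (and so to act on) anything without a matching approval."""
    expected = approve(action, approver_key)
    if not hmac.compare_digest(expected, approval):
        raise PermissionError("no valid human approval for this action")
    return {"category": "taken_action", "action": asdict(action), "approval": approval}


key = b"demo-only-secret"  # hypothetical key material
action = ProposedAction("a-1", "isolate", "host-7")
print(record_proposed(action)["category"])                           # proposed_action
print(record_taken(action, approve(action, key), key)["category"])   # taken_action
```

Because the approval is bound to the exact action contents, an approval issued for one target cannot be replayed against another.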
- The DFIR boundary (Phase 1) stays as-is. SOC functions live in a different boundary.
- Human approval is always required for actions with side effects. The architecture refuses to be auto-deployed without a human-in-the-loop.
- Mean time to triage drops by 50% on a representative SOC corpus
- Zero auto-executed actions without human approval (asserted by `test_responder_no_auto_execute`)
- Response plans match analyst plans on ≥70% of incidents
Vague on purpose. The architecture is designed to support directions we haven't picked yet:
- Continuous detection-engineering loop: Phase 2 + Phase 3 in a single closed feedback loop
- Threat-hunting agent: proactive hypothesis generation against cold storage
- Code-review assistant for security infrastructure: reads PRs to detection-as-code repos and flags rule regressions
- Cross-environment correlation: multi-tenant `dart-corr` for organizations operating multiple SOCs
What's constant across Phase 4 directions:
- The architectural rules from Phase 1 still hold. Nothing has a general-purpose escape hatch. Every new verb is typed.
- Every action with side effects requires a human-in-the-loop.
- Every reasoning step is replayable from an audit chain.
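The rule that every verb is typed and nothing has a general-purpose escape hatch can be illustrated with a small registry. This is a sketch under assumptions, not the `dart-mcp` surface: the `Verb` and `Registry` classes and the `count_events` verb are invented for the example. The point is that a call either names a registered verb with exactly the declared, correctly typed parameters, or it is rejected.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Verb:
    name: str
    params: Dict[str, type]        # parameter name -> required Python type
    handler: Callable[..., object]


class Registry:
    def __init__(self) -> None:
        self._verbs: Dict[str, Verb] = {}

    def register(self, verb: Verb) -> None:
        self._verbs[verb.name] = verb

    def call(self, name: str, args: dict) -> object:
        verb = self._verbs.get(name)
        if verb is None:
            # No fallback: an unregistered verb is simply not callable.
            raise LookupError(f"unknown verb: {name}")
        if set(args) != set(verb.params):
            raise TypeError(f"{name} expects exactly {sorted(verb.params)}")
        for key, expected in verb.params.items():
            if not isinstance(args[key], expected):
                raise TypeError(f"{name}.{key} must be {expected.__name__}")
        return verb.handler(**args)


registry = Registry()
# Hypothetical verb for illustration only.
registry.register(Verb("count_events", {"lines": list}, lambda lines: len(lines)))

print(registry.call("count_events", {"lines": ["a", "b"]}))  # 2
try:
    registry.call("execute_shell", {"cmd": "id"})
except LookupError as err:
    print(err)  # unknown verb: execute_shell
```
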
These have been considered and rejected for the foreseeable future:
Even with high confidence, the architecture refuses to take actions with side effects without a human approving each one. We will not ship this; if you need it, fork and own the consequences.
The whole project is built on the premise that typed surfaces beat prompted ones. A general-purpose tool would defeat that.
The architectural guarantees are only meaningful if the surface is auditable. The project is MIT and will stay open source.
Splunk, Sentinel, XSOAR, etc. — these belong in adapters, not in the core. We will document the adapter interface and keep core surface vendor-neutral.
The hackathon submission is fixed. Post-6/15, the roadmap is open to community input:
- Issues tagged `roadmap`: discuss direction
- Issues tagged `phase-2`, `phase-3`, `phase-4`: propose specific features for that phase
- Pull requests with prototypes: strong signal. A working prototype of a Phase 2 feature is worth more than 100 issues.
If you're a hackathon judge reading this: the roadmap is here to demonstrate that the architecture was designed to expand. Phase 1 is what's submitted. Phases 2-4 are evidence the design is honest.
To be explicit:
| Request | Response |
|---|---|
| "Can you add `execute_shell` for power users?" | No. |
| "Can the agent auto-terminate processes?" | Not without per-action human approval. |
| "Can we ship a binary blob that 'just works'?" | No. Architecture must be auditable. |
| "Can we make the prompt more permissive 'just for this case'?" | No. Guardrails are architectural. |
- Architecture deep dive: why the design decisions in Phase 1 are load-bearing for Phases 2-4
- Threat model: what the boundary protects, what it doesn't
- `CHANGELOG.md`: what has actually shipped
Agentic-DART — autonomous DFIR agent · architecture-first, not prompt-first · MIT license · github.com/Juwon1405/agentic-dart