Roadmap

This page is the honest version of "where is Agentic-DART going". It is structured as four phases. Phase 1 is the SANS FIND EVIL! 2026 submission and is essentially complete. Phases 2-4 are not promises; they are the directions the architecture was built to support.

The codename DART was chosen to remain accurate as the scope expands — Detection And Response covers everything below.

Phase 1 — Agentic DFIR (current, ~95% complete)

This is the SANS FIND EVIL! 2026 hackathon submission. It is architecturally complete and empirically validated against three public datasets.

Done

31 typed forensic functions across 11/12 MITRE ATT&CK enterprise tactics
Read-only MCP boundary, asserted by 6-test bypass suite
SHA-256 chained audit log, replayable, tamper-evident
Cross-artifact correlation engine (dart-corr) with UNRESOLVED contradiction surfacing
Two case-study walkthroughs with reproducible runs
Senior-analyst playbook (dart-playbook/senior-analyst-v1.yaml)
Live mode (real Claude API + JSON-RPC stdio MCP server)
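The SHA-256 chained audit log listed above can be pictured with a minimal sketch. This is not the project's implementation: the entry layout, the all-zero genesis value, and the tool names are assumptions made for illustration. The idea is that each entry's hash covers the previous entry's hash, so editing any earlier entry invalidates everything after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Append a payload, linking it to the hash of the current tail."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(chain: list) -> int:
    """Return the index of the first broken entry, or -1 if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return i
        prev = entry["hash"]
    return -1


log = []
append(log, {"tool": "list_processes", "args": {}})  # hypothetical tool name
append(log, {"tool": "read_registry_key", "args": {"path": "HKLM\\Software"}})
intact = verify(log)  # -1: nothing has been altered

# Tampering with an earlier payload is detected at that entry.
log[0]["payload"]["tool"] = "something_else"
broken_at = verify(log)  # 0: the first entry no longer matches its hash
```

In a scheme like this, the hash of the last entry summarizes the whole run, which is one plausible reading of a "reference audit-tail hash".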

Remaining for Phase 1

Final accuracy measurement run against the bundled IP-KVM case + Ali Hadi #1 + NIST CFReDS — committed to docs/accuracy-report.md with a reference audit-tail hash
Demo screencast (June 2026) replacing the sample-run stills in the README
Devpost submission, due 2026-06-15

After 2026-06-15, Phase 1 is closed; only bug fixes land on main.

Phase 2 — Agentic detection engineering (~Q3 2026)

Extending the agent from "investigate one case" to "improve the detection corpus from many cases".

Goals

Read a corpus of historical incidents (audit logs from past runs) and surface coverage gaps
Synthesize new Sigma rules from observed attacker behavior
Quantify rule overlap, dead rules, and false-positive patterns
Maintain a versioned detection-as-code repo separate from the agent codebase
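As a rough illustration of the "rule overlap" and "dead rules" goals, the sketch below assumes the corpus has already been reduced to a mapping from rule name to the set of event IDs that rule matched. That mapping, the rule names, and the 0.5 threshold are all hypothetical. Overlap is measured here with Jaccard similarity, which is one reasonable choice rather than the project's stated metric.

```python
from itertools import combinations


def dead_rules(hits: dict) -> list:
    """Rules that matched no event in the corpus."""
    return sorted(name for name, events in hits.items() if not events)


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_pairs(hits: dict, threshold: float = 0.5) -> list:
    """Pairs of rules whose matched-event sets overlap at or above the threshold."""
    pairs = []
    ordered = sorted(hits.items(), key=lambda kv: kv[0])
    for (n1, e1), (n2, e2) in combinations(ordered, 2):
        score = jaccard(e1, e2)
        if score >= threshold:
            pairs.append((n1, n2, round(score, 2)))
    return pairs


# Hypothetical corpus: rule name -> IDs of the events it matched.
hits = {
    "rule_a": {"e1", "e2", "e3"},
    "rule_b": {"e2", "e3"},
    "rule_c": set(),
}
dead = dead_rules(hits)            # ["rule_c"]
overlaps = overlapping_pairs(hits)  # [("rule_a", "rule_b", 0.67)]
```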

What changes architecturally

New package: dart_synth — Sigma rule synthesizer. Reads audit logs, emits .yml. Pure function: (audit_log → rule_yaml).
No change to MCP boundary. The synthesizer reads JSONL, not evidence. The boundary stays where it is.
New playbook: coverage-gap-analyst-v1.yaml for the new reasoning class.
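The "pure function" shape of the synthesizer can be sketched as follows. The audit-record schema (a `finding` object with an `image` field), the function name `synthesize_rule`, and the fixed `logsource` and `level` values are assumptions for illustration; dart_synth's real interface is not specified on this page. The output follows the general layout of a Sigma rule but is not validated against the Sigma specification, and the title is written unquoted, so it must not contain characters that are special in YAML.

```python
import json


def synthesize_rule(audit_lines: list, title: str) -> str:
    """Pure function: audit-log JSONL lines in, Sigma-style YAML text out.

    Assumes each record may carry a hypothetical 'finding' object with an
    'image' field naming a suspicious process image.
    """
    images = set()
    for line in audit_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        finding = record.get("finding") or {}
        image = finding.get("image")
        if image:
            images.add(image)

    if not images:
        return ""  # nothing observed, nothing to synthesize

    out = [
        f"title: {title}",
        "status: experimental",
        "logsource:",
        "  category: process_creation",
        "  product: windows",
        "detection:",
        "  selection:",
        "    Image|endswith:",
    ]
    # Sorted output keeps the result identical for identical input sets.
    out += [f"      - '\\{name}'" for name in sorted(images)]
    out += ["  condition: selection", "level: medium"]
    return "\n".join(out) + "\n"


lines = [
    '{"seq": 1, "finding": {"image": "psexec.exe"}}',
    '{"seq": 2, "tool": "list_processes"}',
    '{"seq": 3, "finding": {"image": "mimikatz.exe"}}',
]
rule = synthesize_rule(lines, "Suspicious tools seen in past cases")
```

Because the function only reads text it is handed and returns text, it never touches the evidence tree, which is consistent with the boundary staying where it is.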

What does not change

The agent still cannot write to the evidence tree.
The synthesizer's output is reviewed by a human before it lands in the production rule base. Agentic-DART does not auto-deploy rules.

Measure of success

Generate Sigma rules from 10+ historical cases
≥80% of generated rules pass review without modification
Detection coverage gaps are surfaced before an analyst notices them in production

Phase 3 — Agentic SOC (~Q1 2027)

Triage, enrichment, and supervised response orchestration.

Goals

Ingest live SIEM alerts, route them to the right playbook
Enrich with TI, asset context, recent case history
Produce response drafts (containment, lateral-movement scoping, user notifications)
Hand off to a human-in-the-loop for any action with side effects

What changes architecturally

New package: dart_responder — proposes responses, does not execute them. Output is a structured action plan, not a script.
New boundary: response side effects. A separate module owns the verbs that have effects (quarantine, isolate, kill-process, etc.). It is feature-flagged off by default. Enabling it requires per-environment configuration and a human approval step on every action.
New audit chain categories: proposed_action and taken_action. Proposed actions are logged like findings. Taken actions require cryptographic approval from a human key.
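The split between proposed_action and taken_action can be sketched like this. Everything here is illustrative: the function names, the record layout, and above all the use of HMAC-SHA256 as the approval mechanism. HMAC is a symmetric stand-in chosen because it is in the Python standard library. A real "human key" would more plausibly be an asymmetric signing key, so that the component verifying an approval cannot forge one.

```python
import hashlib
import hmac
import json


class ApprovalError(Exception):
    """Raised when a taken_action is attempted without a valid approval."""


def canonical(action: dict) -> bytes:
    """Stable byte form of an action, so approvals bind to exact content."""
    return json.dumps(action, sort_keys=True, separators=(",", ":")).encode("utf-8")


def approve(action: dict, human_key: bytes) -> str:
    """Run by the human approver: approve exactly one proposed action."""
    return hmac.new(human_key, canonical(action), hashlib.sha256).hexdigest()


def record_taken(audit: list, action: dict, approval: str, human_key: bytes) -> None:
    """Log a taken_action only if the approval matches this exact action."""
    expected = hmac.new(human_key, canonical(action), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, approval):
        raise ApprovalError("no valid human approval for this action")
    audit.append({"category": "taken_action", "action": action, "approval": approval})


audit = []
key = b"demo-only-key"  # placeholder; never hard-code a real key
proposal = {"verb": "isolate", "target": "host-17"}
audit.append({"category": "proposed_action", "action": proposal})

record_taken(audit, proposal, approve(proposal, key), key)

# An approval for one action does not carry over to a different action.
altered = {"verb": "isolate", "target": "host-99"}
try:
    record_taken(audit, altered, approve(proposal, key), key)
    rejected = False
except ApprovalError:
    rejected = True
```

Binding the approval to the canonical bytes of a single action is what makes "a human approval step on every action" enforceable in code rather than by convention.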

What does not change

The DFIR boundary (Phase 1) stays as-is. SOC functions live in a different boundary.
Human approval is always required for actions with side effects. The architecture does not support a deployment that runs such actions without a human in the loop.

Measure of success

Mean time to triage drops by 50% on a representative SOC corpus
Zero auto-executed actions without human approval (asserted by test_responder_no_auto_execute)
Response plans match analyst plans on ≥70% of incidents

Phase 4 — Broader agentic security (~2027 and beyond)

Vague on purpose. The architecture is designed to support directions we haven't picked yet:

Continuous detection-engineering loop — Phase 2 + Phase 3 in a single closed feedback loop
Threat-hunting agent — proactive hypothesis generation against cold storage
Code-review assistant for security infrastructure — reads PRs to detection-as-code repos and flags rule regressions
Cross-environment correlation — multi-tenant dart-corr for organizations operating multiple SOCs

What's constant across Phase 4 directions:

The architectural rules from Phase 1 still hold. Nothing has a general-purpose escape hatch. Every new verb is typed.
Every action with side effects requires a human-in-the-loop.
Every reasoning step is replayable from an audit chain.

What's not on the roadmap

These have been considered and rejected for the foreseeable future:

Auto-remediation without human approval

Even with high confidence, the architecture refuses to take actions with side effects without a human approving each one. We will not ship this; if you need it, fork and own the consequences.

A general-purpose tool ("ask_anything")

The whole project is built on the premise that typed surfaces beat prompted ones. A general-purpose tool would defeat that.
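A minimal sketch of what a typed surface means in practice, assuming a registry that maps each tool name to a request type and a handler. The tool names, request classes, and handlers are hypothetical and are not taken from the project's 31 functions. There is no fallback branch: an unknown tool name or an unexpected argument is an error, not something to interpret.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListProcesses:
    """Hypothetical typed request: only an image identifier, nothing free-form."""
    image_id: str


@dataclass(frozen=True)
class ReadFileHash:
    """Hypothetical typed request: one image, one path."""
    image_id: str
    path: str


def handle_list_processes(req: ListProcesses) -> str:
    return f"processes for {req.image_id}"


def handle_read_file_hash(req: ReadFileHash) -> str:
    return f"sha256 of {req.path} in {req.image_id}"


# The registry is the whole surface: a name maps to a request type and handler.
REGISTRY = {
    "list_processes": (ListProcesses, handle_list_processes),
    "read_file_hash": (ReadFileHash, handle_read_file_hash),
}


def dispatch(name: str, args: dict) -> str:
    """Reject unknown tools and unexpected arguments instead of guessing."""
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    request_type, handler = REGISTRY[name]
    request = request_type(**args)  # TypeError on missing or extra fields
    return handler(request)
```

Note that dataclasses check field names, not value types, at construction time; a fuller version would validate values as well. The point of the sketch is only that a call like `dispatch("ask_anything", {...})` has nowhere to go.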

Closed-source distribution

The architectural guarantees are only meaningful if the surface is auditable. The project is MIT and will stay open source.

Vendor-specific integrations as core packages

Splunk, Sentinel, XSOAR, etc. — these belong in adapters, not in the core. We will document the adapter interface and keep core surface vendor-neutral.

How to influence the roadmap

The hackathon submission is fixed. After 2026-06-15, the roadmap is open to community input:

Issues tagged roadmap — discuss direction
Issues tagged phase-2, phase-3, phase-4 — propose specific features for that phase
Pull requests with prototypes — strong signal. A working prototype of a Phase 2 feature is worth more than 100 issues.

If you're a hackathon judge reading this: the roadmap is here to demonstrate that the architecture was designed to expand. Phase 1 is what's submitted. Phases 2-4 are evidence the design is honest.

Anti-roadmap (what we will refuse)

To be explicit:

Request	Response
"Can you add `execute_shell` for power users?"	No.
"Can the agent auto-terminate processes?"	Not without per-action human approval.
"Can we ship a binary blob that 'just works'?"	No. Architecture must be auditable.
"Can we make the prompt more permissive 'just for this case'?"	No. Guardrails are architectural.
