Roadmap

Juwon1405 edited this page Apr 30, 2026 · 18 revisions

This page is the honest answer to "where is Agentic-DART going?". It is structured as four phases. Phase 1 is the SANS FIND EVIL! 2026 submission and is essentially complete. Phases 2-4 are not promises; they are the directions the architecture was built to support.

The codename DART was chosen to remain accurate as the scope expands — Detection And Response covers everything below.


Phase 1 — Agentic DFIR (current, ~95% complete)

This is the SANS FIND EVIL! 2026 hackathon submission. It is architecturally complete and empirically validated against three public datasets.

Done

  • 31 typed forensic functions across 11/12 MITRE ATT&CK enterprise tactics
  • Read-only MCP boundary, asserted by 6-test bypass suite
  • SHA-256 chained audit log, replayable, tamper-evident
  • Cross-artifact correlation engine (dart-corr) with UNRESOLVED contradiction surfacing
  • Two case-study walkthroughs with reproducible runs
  • Senior-analyst playbook (dart-playbook/senior-analyst-v1.yaml)
  • Live mode (real Claude API + JSON-RPC stdio MCP server)
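
The tamper-evident property of a SHA-256 chained log can be illustrated with a minimal sketch. This is not the project's actual log format: the entry layout, the `GENESIS` placeholder, and the tool name `list_processes` are assumptions made for illustration. The idea is only that each entry's hash covers the previous entry's hash, so replaying the chain detects any later edit.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list[dict], payload: dict) -> None:
    """Append a payload, linking it to the current tail of the chain."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(log: list[dict]) -> bool:
    """Replay the chain from the start; an edited or reordered entry fails."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append(log, {"tool": "list_processes", "case": "demo"})  # hypothetical tool name
append(log, {"finding": "suspicious parent process"})
assert verify(log)

log[0]["payload"]["case"] = "tampered"  # simulate an after-the-fact edit
assert not verify(log)
```

In a scheme like this, the hash of the last entry (the audit tail) summarizes the whole run, which is why a single reference hash is enough to pin a reproducible result.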

Remaining for Phase 1

  • Final accuracy measurement run against the bundled IP-KVM case + Ali Hadi #1 + NIST CFReDS — committed to docs/accuracy-report.md with a reference audit-tail hash
  • Demo screencast (June 2026) replacing the sample-run stills in the README
  • Devpost submission, due 2026-06-15

After 2026-06-15, Phase 1 is closed. Only bug fixes will land on main.


Phase 2 — Agentic detection engineering (~Q3 2026)

Extending the agent from "investigate one case" to "improve the detection corpus from many cases".

Goals

  • Read a corpus of historical incidents (audit logs from past runs) and surface coverage gaps
  • Synthesize new Sigma rules from observed attacker behavior
  • Quantify rule overlap, dead rules, and false-positive patterns
  • Maintain a versioned detection-as-code repo separate from the agent codebase
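
To make the first three goals concrete, here is a rough sketch of what "coverage gaps", "dead rules", and "rule overlap" could mean as computations. The corpus layout, the rule names, and the choice of Jaccard similarity for overlap are all assumptions for illustration; none of them comes from the Agentic-DART codebase.

```python
from itertools import combinations

# Hypothetical corpus: ATT&CK technique IDs observed in each historical case.
observed = {
    "case-001": {"T1059", "T1021"},
    "case-002": {"T1059", "T1547"},
}

# Hypothetical rule base: the technique each rule targets and the cases it fired on.
rules = {
    "rule-ps-encoded": {"technique": "T1059", "fired_on": {"case-001", "case-002"}},
    "rule-ps-download": {"technique": "T1059", "fired_on": {"case-001"}},
    "rule-old-at-job": {"technique": "T1053", "fired_on": set()},
}


def coverage_gaps(observed: dict, rules: dict) -> list[str]:
    """Techniques seen in past cases that no rule targets."""
    seen = set().union(*observed.values())
    covered = {rule["technique"] for rule in rules.values()}
    return sorted(seen - covered)


def dead_rules(rules: dict) -> list[str]:
    """Rules that never fired anywhere in the corpus."""
    return sorted(name for name, rule in rules.items() if not rule["fired_on"])


def overlap(rules: dict) -> dict:
    """Jaccard similarity of the fired-on sets for each pair of live rules."""
    live = {name: rule["fired_on"] for name, rule in rules.items() if rule["fired_on"]}
    return {
        (a, b): len(live[a] & live[b]) / len(live[a] | live[b])
        for a, b in combinations(sorted(live), 2)
    }


print(coverage_gaps(observed, rules))  # ['T1021', 'T1547']
print(dead_rules(rules))               # ['rule-old-at-job']
print(overlap(rules))                  # {('rule-ps-download', 'rule-ps-encoded'): 0.5}
```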

What changes architecturally

  • New package: dart_synth — Sigma rule synthesizer. Reads audit logs, emits .yml. Pure function: (audit_log → rule_yaml).
  • No change to MCP boundary. The synthesizer reads JSONL, not evidence. The boundary stays where it is.
  • New playbook: coverage-gap-analyst-v1.yaml for the new reasoning class.
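
The "pure function" shape of the synthesizer can be sketched as follows. dart_synth does not exist yet, so everything here is an assumption: the function name `synthesize_rule`, the JSONL field names `kind` and `cmdline_fragment`, and the decision to emit a single process-creation rule. The point is only that the input is audit-log text and the output is Sigma YAML text, with no access to evidence and no side effects.

```python
import json


def synthesize_rule(audit_jsonl: str, title: str) -> str:
    """Pure function: audit-log JSONL text in, Sigma rule YAML text out."""
    fragments: list[str] = []
    for line in audit_jsonl.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # 'kind' and 'cmdline_fragment' are assumed field names, not the real schema.
        if record.get("kind") == "finding" and "cmdline_fragment" in record:
            fragment = record["cmdline_fragment"]
            if fragment not in fragments:
                fragments.append(fragment)
    if not fragments:
        raise ValueError("no usable findings in audit log")

    # json.dumps produces double-quoted strings, which are also valid YAML scalars.
    lines = [
        f"title: {json.dumps(title)}",
        "status: experimental",
        "logsource:",
        "  category: process_creation",
        "  product: windows",
        "detection:",
        "  selection:",
        "    CommandLine|contains:",
    ]
    lines += [f"      - {json.dumps(fragment)}" for fragment in fragments]
    lines += ["  condition: selection", "level: medium"]
    return "\n".join(lines) + "\n"


audit = "\n".join([
    json.dumps({"kind": "tool_call", "tool": "list_processes"}),  # hypothetical
    json.dumps({"kind": "finding", "cmdline_fragment": "-EncodedCommand"}),
])
print(synthesize_rule(audit, "Encoded PowerShell seen in past cases"))
```

Because the function has no state, the same audit log always yields the same rule text, which keeps generated rules diffable in a detection-as-code repo.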

What does not change

  • The agent still cannot write to the evidence tree.
  • The synthesizer's output is reviewed by a human before it lands in the production rule base. Agentic-DART does not auto-deploy rules.

Measure of success

  • Generate Sigma rules from 10+ historical cases
  • ≥80% of generated rules pass review without modification
  • Detection coverage gaps are surfaced before an analyst notices them in production

Phase 3 — Agentic SOC (~Q1 2027)

Triage, enrichment, and supervised response orchestration.

Goals

  • Ingest live SIEM alerts, route them to the right playbook
  • Enrich with TI, asset context, recent case history
  • Produce response drafts (containment, lateral-movement scoping, user notifications)
  • Hand off to a human in the loop for any action with side effects

What changes architecturally

  • New package: dart_responder — proposes responses, does not execute them. Output is a structured action plan, not a script.
  • New boundary: response side effects. A separate module owns the verbs that have effects (quarantine, isolate, kill-process, etc.). It is feature-flagged off by default. Enabling it requires per-environment configuration and a human approval step on every action.
  • New audit chain categories: proposed_action and taken_action. Proposed actions are logged like findings. Taken actions require cryptographic approval from a human key.
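
A minimal sketch of how these three changes could fit together is shown below. dart_responder is not written yet, so the class and function names are hypothetical. One substitution should be read loudly: the approval here is an HMAC over the action using a shared key, chosen only because it is in the Python standard library. An approval "from a human key" would more plausibly be an asymmetric signature, so that the responder holds only a public key and cannot approve its own proposals.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProposedAction:
    verb: str    # e.g. "isolate"
    target: str  # e.g. a host identifier
    reason: str


def canonical(action: ProposedAction) -> bytes:
    """Stable byte encoding of an action, so approvals bind to exact content."""
    return json.dumps(asdict(action), sort_keys=True).encode("utf-8")


def approve(action: ProposedAction, human_key: bytes) -> str:
    """Run by the human approver. HMAC is a stand-in for a real signature."""
    return hmac.new(human_key, canonical(action), hashlib.sha256).hexdigest()


class Responder:
    def __init__(self, human_key: bytes, effects_enabled: bool = False):
        self._key = human_key
        self._enabled = effects_enabled  # feature flag, off by default
        self.audit: list[dict] = []

    def propose(self, action: ProposedAction) -> None:
        """Proposals are always allowed and are logged like findings."""
        self.audit.append({"category": "proposed_action", **asdict(action)})

    def take(self, action: ProposedAction, approval: str) -> None:
        """Record a taken action only if the flag is on and the approval verifies."""
        if not self._enabled:
            raise PermissionError("side effects are feature-flagged off")
        if not hmac.compare_digest(approve(action, self._key), approval):
            raise PermissionError("missing or invalid human approval")
        self.audit.append({"category": "taken_action", **asdict(action)})


key = b"demo-only-key"
action = ProposedAction("isolate", "host-17", "lateral movement suspected")
responder = Responder(key, effects_enabled=True)
responder.propose(action)
responder.take(action, approve(action, key))
print([entry["category"] for entry in responder.audit])
```

An approval computed for one action does not verify for another, so a reviewer's sign-off cannot be replayed against a different target or verb.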

What does not change

  • The DFIR boundary (Phase 1) stays as-is. SOC functions live in a different boundary.
  • Human approval is always required for actions with side effects. The architecture refuses to execute such actions without a human in the loop.

Measure of success

  • Mean time to triage drops by 50% on a representative SOC corpus
  • Zero auto-executed actions without human approval (asserted by test_responder_no_auto_execute)
  • Response plans match analyst plans on ≥70% of incidents

Phase 4 — Broader agentic security (~2027 and beyond)

Vague on purpose. The architecture is designed to support directions we haven't picked yet:

  • Continuous detection-engineering loop — Phase 2 + Phase 3 in a single closed feedback loop
  • Threat-hunting agent — proactive hypothesis generation against cold storage
  • Code-review assistant for security infrastructure — reads PRs to detection-as-code repos and flags rule regressions
  • Cross-environment correlation — multi-tenant dart-corr for organizations operating multiple SOCs

What's constant across Phase 4 directions:

  • The architectural rules from Phase 1 still hold. Nothing has a general-purpose escape hatch. Every new verb is typed.
  • Every action with side effects requires a human in the loop.
  • Every reasoning step is replayable from an audit chain.

What's not on the roadmap

These have been considered and rejected for the foreseeable future:

Auto-remediation without human approval

Even with high confidence, the architecture refuses to take actions with side effects without a human approving each one. We will not ship this; if you need it, fork and own the consequences.

A general-purpose tool ("ask_anything")

The whole project is built on the premise that typed surfaces beat prompted ones. A general-purpose tool would defeat that.
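
As an illustration of what a "typed surface" means in practice, consider a dispatcher that only accepts registered verbs with declared argument types. This is a sketch under assumptions, not the project's MCP server: the `tool` decorator, the `REGISTRY` layout, and the `hash_file` tool are hypothetical.

```python
from typing import Callable

# Maps a tool name to its implementation and its declared parameter types.
REGISTRY: dict[str, tuple[Callable[..., object], dict[str, type]]] = {}


def tool(name: str, **param_types: type):
    """Register a function as a typed tool with a fixed set of parameters."""
    def register(fn):
        REGISTRY[name] = (fn, param_types)
        return fn
    return register


@tool("hash_file", path=str)  # hypothetical tool name
def hash_file(path: str) -> str:
    return f"sha256-of:{path}"  # placeholder body; touches no real files


def dispatch(name: str, args: dict) -> object:
    """Reject unknown verbs and mistyped or unexpected arguments."""
    if name not in REGISTRY:
        raise LookupError(f"unknown tool: {name}")
    fn, param_types = REGISTRY[name]
    if set(args) != set(param_types):
        raise TypeError(f"{name} expects exactly {sorted(param_types)}")
    for key, expected in param_types.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{name}.{key} must be {expected.__name__}")
    return fn(**args)


print(dispatch("hash_file", {"path": "/evidence/disk.img"}))
try:
    dispatch("ask_anything", {"prompt": "do whatever seems useful"})
except LookupError as err:
    print(err)
```

With a surface like this, a request for "ask_anything" fails at dispatch rather than relying on the prompt to decline it.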

Closed-source distribution

The architectural guarantees are only meaningful if the surface is auditable. The project is MIT-licensed and will stay open source.

Vendor-specific integrations as core packages

Splunk, Sentinel, XSOAR, etc. — these belong in adapters, not in the core. We will document the adapter interface and keep core surface vendor-neutral.
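
The adapter interface is not documented yet, so the following is only a guess at its shape. `AlertAdapter`, `fetch_alerts`, and the alert fields are hypothetical names. The sketch shows the separation being described: core code depends on a small vendor-neutral protocol, and anything vendor-specific lives behind it.

```python
from typing import Iterable, Protocol


class AlertAdapter(Protocol):
    """Hypothetical adapter surface; the real interface is not yet published."""

    def fetch_alerts(self, since_epoch: int) -> Iterable[dict]: ...


class InMemoryAdapter:
    """Stand-in for a vendor adapter; a real one would call the vendor's API."""

    def __init__(self, alerts: Iterable[dict]):
        self._alerts = list(alerts)

    def fetch_alerts(self, since_epoch: int) -> Iterable[dict]:
        return [alert for alert in self._alerts if alert["ts"] >= since_epoch]


def normalize(adapter: AlertAdapter, since_epoch: int) -> list[dict]:
    """Core code sees only the vendor-neutral alert shape."""
    return [
        {"ts": alert["ts"], "title": alert["title"], "source": type(adapter).__name__}
        for alert in adapter.fetch_alerts(since_epoch)
    ]


adapter = InMemoryAdapter([
    {"ts": 10, "title": "old alert"},
    {"ts": 20, "title": "new alert"},
])
print(normalize(adapter, since_epoch=15))
```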


How to influence the roadmap

The hackathon submission is fixed. After 2026-06-15, the roadmap is open to community input:

  • Issues tagged roadmap — discuss direction
  • Issues tagged phase-2, phase-3, phase-4 — propose specific features for that phase
  • Pull requests with prototypes — strong signal. A working prototype of a Phase 2 feature is worth more than 100 issues.

If you're a hackathon judge reading this: the roadmap is here to demonstrate that the architecture was designed to expand. Phase 1 is what's submitted. Phases 2-4 are evidence the design is honest.


Anti-roadmap (what we will refuse)

To be explicit:

| Request | Response |
|---|---|
| "Can you add execute_shell for power users?" | No. |
| "Can the agent auto-terminate processes?" | Not without per-action human approval. |
| "Can we ship a binary blob that 'just works'?" | No. Architecture must be auditable. |
| "Can we make the prompt more permissive 'just for this case'?" | No. Guardrails are architectural. |

