Juwon1405 edited this page May 2, 2026 · 15 revisions

FAQ

Project basics

What is Agentic-DART, in one sentence?

An autonomous DFIR agent on the SANS SIFT Workstation that thinks like a senior analyst — architecture-first, not prompt-first.

What does DART stand for?

Detection And Response Team. See About the name for the full four-phase plan.

Why "Agentic-DART" and not "DART"?

The "Agentic" prefix signals that this is an autonomous loop, not a wrapper around an LLM. The work unit is the agent's iteration, not the prompt.

Is this a fork of something?

No. Original work, MIT licensed. The Model Context Protocol (MCP) comes from Anthropic, and Claude is the LLM used in live mode, but the architecture and code are independent.

Why did you pick this hackathon?

SANS FIND EVIL! 2026 explicitly asks for autonomous DFIR systems on the SIFT Workstation. The judging criteria align cleanly with the architectural claims this project is making.


Technical

Is the MCP surface really fixed in size?

Yes. tests/test_mcp_surface.py asserts the exact set of exposed native function names. If a 36th function appears or one of the 35 disappears, the test fails on the next CI run.
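A minimal sketch of what an exact-set check like this can look like, in pytest style. `EXPECTED_SURFACE`, `surface_diff`, and the three function names are hypothetical placeholders; the real test pins all 35 names and would read the registered set from the MCP server rather than from a literal list.

```python
# Hypothetical allowlist: the real test pins all 35 native function names.
EXPECTED_SURFACE = frozenset({"parse_mft", "parse_evtx", "query_timeline"})


def surface_diff(registered, expected=EXPECTED_SURFACE):
    """Return (unexpected, missing) so a failure names the offending functions."""
    registered = frozenset(registered)
    return sorted(registered - expected), sorted(expected - registered)


def test_mcp_surface_is_exact():
    # Placeholder: a real suite would list the tools the server actually registers.
    registered = ["parse_mft", "parse_evtx", "query_timeline"]
    unexpected, missing = surface_diff(registered)
    assert not unexpected, f"new functions exposed: {unexpected}"
    assert not missing, f"functions removed: {missing}"
```

Checking both directions matters: an equality test on counts alone would pass if one function were swapped for another.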

Does Agentic-DART work without the Claude API?

Yes. The deterministic demo path (bash examples/demo-run.sh) runs end-to-end with no API key. Live mode (real Claude API + MCP stdio) is available but optional. See Live mode.

How big is the audit log?

~3-5 KB per MCP call. A typical 25-iteration run produces an audit log of around 120-200 KB. The chain is verified on every run; tampered logs are detected.
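One common way to make a log tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below assumes that design; the entry layout (`record`, `prev`, `hash`), SHA-256, and the function names are illustrative, not the project's actual audit format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always serializes, and hashes, identically.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, record: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(log: list) -> bool:
    # Recompute every link; editing any earlier record breaks its hash and all later links.
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

A chain on its own does not detect truncation of the last entries; recording the final hash somewhere outside the log closes that gap.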

Why DuckDB and not SQLite?

DuckDB handles columnar joins on millions of rows orders of magnitude faster than SQLite, which matters for MFT-scale timeline correlation. SQLite is fine for the audit log; DuckDB is right for dart-corr.

Will it work on macOS / Linux outside SIFT?

Yes. macOS dev mode is documented in Running on macOS. The SIFT Workstation is the production target because that's the hackathon's target environment, but the code does not depend on SIFT-specific paths.

Why Python and not Rust / Go?

Three reasons:

  1. The MCP ecosystem is Python-first
  2. DFIR tooling (Volatility, Plaso, etc.) is Python
  3. The bottleneck is LLM API latency, not Python execution time

If a specific function needed to be rewritten in a faster language (e.g. an MFT parser doing 10M rows), it would still be exposed via the same MCP schema. The MCP surface is what the agent sees; the implementation is opaque.
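To illustrate the point that the schema is the contract and the implementation is swappable, here is a sketch with two interchangeable backends behind one tool definition. The `name` / `description` / `inputSchema` fields follow the shape MCP tool listings use; `parse_mft`, the `Tool` wrapper, and both backends are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical tool definition; the parse_mft function itself is illustrative.
PARSE_MFT_SCHEMA = {
    "name": "parse_mft",
    "description": "Return MFT records from a read-only evidence path.",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}},
        "required": ["path"],
    },
}


@dataclass(frozen=True)
class Tool:
    schema: dict
    impl: Callable[..., Any]

    def call(self, **kwargs: Any) -> Any:
        # Enforce the schema's required arguments before touching the implementation.
        missing = [k for k in self.schema["inputSchema"]["required"] if k not in kwargs]
        if missing:
            raise ValueError(f"missing required arguments: {missing}")
        return self.impl(**kwargs)


def parse_mft_python(path: str, limit: int = 3) -> list:
    # Placeholder for a pure-Python parser.
    return [{"record": i, "backend": "python"} for i in range(limit)]


def parse_mft_native(path: str, limit: int = 3) -> list:
    # Placeholder for a compiled backend (e.g. a Rust extension module).
    return [{"record": i, "backend": "native"} for i in range(limit)]


# Swapping the backend changes nothing the agent can see: same name, same schema.
python_tool = Tool(PARSE_MFT_SCHEMA, parse_mft_python)
native_tool = Tool(PARSE_MFT_SCHEMA, parse_mft_native)
```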


Safety & guarantees

Can the agent damage evidence?

No. By construction. The MCP surface has no write functions, and the evidence directory is mounted read-only at the OS level. See Architecture-first vs prompt-first.
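The read-only mount is an OS-level property, so it can be checked before any tool runs. A minimal preflight sketch, assuming a Unix host (`os.statvfs` and `os.ST_RDONLY` are Unix-only); `is_read_only_mount` and `require_read_only` are hypothetical helpers, not part of the documented project.

```python
import os


def is_read_only_mount(path: str) -> bool:
    # ST_RDONLY is set in f_flag when the filesystem holding `path` is mounted read-only.
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def require_read_only(path: str) -> None:
    # Hypothetical startup guard: refuse to serve evidence from a writable mount.
    if not is_read_only_mount(path):
        raise RuntimeError(f"refusing to start: {path} is not on a read-only mount")
```

For example, `require_read_only("/mnt/evidence")` at server startup would fail fast if the evidence had been mounted read-write by mistake.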

Can the agent make stuff up?

It can, in the sense that any LLM can. The architectural guarantee is not that the agent never hallucinates. The guarantee is that:

  1. Every claim must cite an audit_id from a real MCP call
  2. The audit log is replayable and tamper-evident
  3. dart-corr flags contradictions as UNRESOLVED rather than hiding them

So a hallucinated finding either (a) lacks an audit_id and is blocked at write time, or (b) has an audit_id, in which case a human reviewer can replay the cited call and check whether its output actually supports the claim.
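Case (a) amounts to a gate on the write path. A minimal sketch, assuming findings carry a tuple of cited audit_ids; `Finding`, `write_finding`, and `UncitedFindingError` are hypothetical names, and the second check (every cited id must exist in the log) is an assumed extra, not something stated above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Finding:
    claim: str
    audit_ids: Tuple[str, ...]


class UncitedFindingError(ValueError):
    """Raised when a finding cannot be traced to a logged MCP call."""


def write_finding(finding: Finding, logged_ids: Iterable[str], report: List[Finding]) -> None:
    logged = set(logged_ids)
    # Rule from the FAQ: no audit_id, no finding.
    if not finding.audit_ids:
        raise UncitedFindingError(f"no audit_id cited for: {finding.claim!r}")
    # Assumed extra check: every cited id must exist in the audit log.
    unknown = [a for a in finding.audit_ids if a not in logged]
    if unknown:
        raise UncitedFindingError(f"cited audit_ids not in the log: {unknown}")
    report.append(finding)
```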

What if the LLM ignores the system prompt?

Doesn't matter. The system prompt is not a security boundary. The MCP surface is. See Architecture-first vs prompt-first.
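The boundary lives in the dispatcher, not in the prompt: whatever the model asks for, only names on the surface reach a handler. A minimal allowlist-dispatch sketch; `make_dispatcher`, `ToolNotExposed`, and the `list_artifacts` tool are hypothetical.

```python
from typing import Any, Callable, Dict


class ToolNotExposed(Exception):
    """Raised when the model asks for a function that is not on the surface."""


def make_dispatcher(tools: Dict[str, Callable[..., Any]]) -> Callable[[str, dict], Any]:
    # Copy the mapping so later changes to `tools` cannot widen the surface.
    allowed = dict(tools)

    def dispatch(name: str, args: dict) -> Any:
        fn = allowed.get(name)
        if fn is None:
            # Whatever the prompt said, an unknown name never reaches a handler.
            raise ToolNotExposed(f"{name!r} is not part of the MCP surface")
        return fn(**args)

    return dispatch
```

A request such as `dispatch("run_shell", {"cmd": "..."})` fails the same way whether or not the system prompt forbade it.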

What's NOT in scope for safety?

  • Confidentiality of the evidence (the agent reads everything you mount)
  • Network egress prevention (run in an air-gapped environment if you care)
  • Resource exhaustion (use container limits)

These are deployment concerns. Agentic-DART deliberately leaves them to the environment it runs in rather than claiming to solve them.


Comparison with adjacent tools

How is this different from Velociraptor?

Velociraptor is excellent for collection. Agentic-DART is for reasoning over collected evidence. They compose: a Velociraptor flow collects, then dart-agent --case reasons over the output.

How is this different from KAPE?

KAPE is also a collection and triage tool, so the answer is the same: KAPE collects, Agentic-DART reasons over the output.

How is this different from a fine-tuned LLM?

This project doesn't fine-tune anything. The LLM is generic; the value comes from the architecture (MCP surface + correlation engine + audit chain + playbook). A fine-tuned LLM could replace the generic one, but it would still need this scaffolding to be safe and auditable.

How is this different from "just give the LLM bash"?

The "just give the LLM bash" approach is exactly what dart-mcp is designed to not be. See Architecture-first vs prompt-first.


Hackathon-specific

Are you submitting solo?

Yes. This is a personal/independent submission. The README's Author section makes that explicit.

Was AI used in the development?

Yes, openly. The "Development approach" section of the README discloses Claude as a coding collaborator. Architectural decisions, threat coverage taxonomy, MITRE mapping, and final review are human-driven; implementation, sample-evidence generation, test scaffolding, and documentation drafting were AI-accelerated. Every commit is reviewed before it lands.

What's the headline metric?

22 / 22 tests passing on a fresh clone. 60 typed MCP functions (35 native + 25 SIFT Workstation adapters). 11 / 12 MITRE ATT&CK enterprise tactics covered.

What are you most proud of?

The bypass tests. They make the architectural claim mechanical, not rhetorical.

What would you change with more time?

Three things:

  1. PCAP analysis for full TA0011 (Command and Control) coverage
  2. Sigma rule synthesis (Phase 2 work)
  3. A real-world dataset run against an Ali Hadi or NIST CFReDS image, with published metrics
