Writing case studies
How to add a new bundled case study to Agentic-DART. Case studies are the most useful contribution after typed forensic functions — they extend coverage to new attack patterns and validate the architecture against new ground truth.
| Property | Why |
|---|---|
| Mechanically verifiable ground truth | Every claim the agent makes must be checkable against artifacts. No "the analyst's intuition says X". |
| Cross-artifact correlation required | Single-artifact cases can be solved by string match. Real DFIR is correlation. |
| At least one designed contradiction | This exercises dart-corr. If everything aligns, the case is too easy. |
| Self-contained | Bundled in the repo, no external downloads, sample evidence under 50MB. |
| Reproducible exactly | Same inputs → same MITRE chain (timestamps will differ; chain structure won't). |
A new case lives at examples/case-studies/case-NN-<short-name>/:
case-NN-<short-name>/
├── README.md            # operator-facing case description
├── case-overview.md     # background, scope (no findings here)
├── ground-truth.json    # the N findings the agent must surface
├── playbook.yaml        # optional, only if the standard playbook isn't a fit
└── evidence/
    ├── <artifact1>.csv
    ├── <artifact2>.evtx.csv
    ├── <artifact3>.db
    └── ...
Numbering: case-01, case-02, ... in the order accepted into the repo.
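Before opening a PR, it can help to sanity-check the layout locally. The following is a minimal sketch, not part of Agentic-DART: the function name `check_case_layout` is hypothetical, and the 50MB limit is read here as 50 MiB, which is an assumption.

```python
from pathlib import Path

# Files the layout above marks as non-optional.
REQUIRED = ("README.md", "case-overview.md", "ground-truth.json")
# Assumption: "under 50MB" is interpreted as strictly below 50 MiB.
MAX_EVIDENCE_BYTES = 50 * 1024 * 1024


def check_case_layout(case_dir):
    """Return a list of human-readable problems; an empty list means the layout looks fine."""
    case_dir = Path(case_dir)
    problems = []
    for name in REQUIRED:
        if not (case_dir / name).is_file():
            problems.append(f"missing {name}")
    evidence = case_dir / "evidence"
    if not evidence.is_dir():
        problems.append("missing evidence/ directory")
        return problems
    # Sum the size of every regular file below evidence/.
    total = sum(p.stat().st_size for p in evidence.rglob("*") if p.is_file())
    if total == 0:
        problems.append("evidence/ contains no data")
    elif total >= MAX_EVIDENCE_BYTES:
        problems.append(f"evidence is {total} bytes, limit is {MAX_EVIDENCE_BYTES}")
    return problems
```

Calling `check_case_layout("examples/case-studies/case-NN-<short-name>")` on your own case directory should return an empty list once the required files are in place.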
Before writing the case, open an issue tagged case-study describing:
- Attack pattern / case class
- Why existing cases don't cover it
- Approximate ground-truth count (5-30 findings is the sweet spot)
- Source of inspiration (real CTF, public IR report, your own work)
This avoids redundancy. Some attack patterns are already covered.
Two paths:
Synthetic — generate artifacts programmatically. Good for: precise control, reproducibility, no licensing concerns. Used by examples/sample-evidence/ and case-01-ipkvm-insider.
Real (with permission) — use a published forensic dataset. Good for: realism, immediate credibility. Reference dataset name + license + DOI / URL in case-overview.md.
Either way: artifacts are read-only. The agent will assert this. If anything in your evidence directory has the wrong permissions, the run will fail at mount time.
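To catch permission problems before a run, a pre-flight check along these lines may be useful. This is a sketch under assumptions: it treats "read-only" as "no POSIX write bit set on any evidence file", which may or may not match the check the agent performs at mount time, and `writable_artifacts` is a hypothetical helper name.

```python
import stat
from pathlib import Path

# Any of the owner/group/other write bits counts as "writable" in this sketch.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def writable_artifacts(evidence_dir):
    """Return evidence files that still have a write permission bit set."""
    offenders = []
    for path in sorted(Path(evidence_dir).rglob("*")):
        if path.is_file() and path.stat().st_mode & WRITE_BITS:
            offenders.append(path)
    return offenders
```

If the returned list is non-empty, clearing the write bits on those files (for example with `chmod a-w`) before running the agent is the likely fix.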
{
  "case_id": "ipkvm-insider-01",
  "summary": "Contractor's IP-KVM credentials used after their VPN session ended.",
  "findings": [
    {
      "id": "F-001",
      "claim": "USB Kingston DataTraveler inserted at 22:31:18 UTC on FILE-SRV-01",
      "supporting_artifacts": ["evidence/usb_history.csv"],
      "mitre_techniques": ["T1052.001"]
    },
    {
      "id": "F-002",
      "claim": "...",
      "supporting_artifacts": [...],
      "mitre_techniques": [...]
    }
  ]
}

Each finding must be supported by an artifact in your evidence/ directory. If a finding requires the agent to "infer", the case is not mechanically verifiable; split it into atomic claims.
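The "every finding is backed by an artifact" rule can be checked mechanically. The sketch below is illustrative only: it takes its field names from the example above, assumes artifact paths are relative to the case directory, and does not replace the authoritative schema. `check_ground_truth` is a hypothetical helper name.

```python
import json
from pathlib import Path

# Keys every finding carries in the example ground-truth file.
REQUIRED_KEYS = ("id", "claim", "supporting_artifacts", "mitre_techniques")


def check_ground_truth(case_dir):
    """Return a list of problems found in <case_dir>/ground-truth.json."""
    case_dir = Path(case_dir)
    data = json.loads((case_dir / "ground-truth.json").read_text(encoding="utf-8"))
    problems, seen = [], set()
    for finding in data.get("findings", []):
        fid = finding.get("id", "<no id>")
        if fid in seen:
            problems.append(f"{fid}: duplicate id")
        seen.add(fid)
        for key in REQUIRED_KEYS:
            if not finding.get(key):
                problems.append(f"{fid}: missing or empty {key}")
        for rel in finding.get("supporting_artifacts", []):
            # Assumption: paths are relative to the case directory.
            if not (case_dir / rel).is_file():
                problems.append(f"{fid}: artifact not found: {rel}")
    return problems
```

An empty result means each finding has an ID, a claim, at least one MITRE technique, and at least one supporting artifact that exists on disk. It says nothing about whether the claim is actually true of that artifact; that still needs a human check.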
export DART_EVIDENCE_ROOT="$PWD/examples/case-studies/case-NN-<name>/evidence"
python3 -m dart_agent --case <case-id> --max-iterations 25

Compare the agent's output to ground-truth.json:
- Recall: how many of the N findings did the agent surface?
- False positive rate: did the agent claim things not in ground truth?
- Hallucination count: did the agent claim facts not in any artifact?
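The three measures above can be computed once each agent claim has been matched against ground truth. The sketch below assumes that matching has already been done by hand and is passed in as `(matched_id, backed_by_artifact)` pairs; that input shape, the function name `score_run`, and the choice of "all agent claims" as the false-positive denominator are assumptions, not the project's official scoring.

```python
def score_run(ground_truth_ids, claims):
    """Score one agent run.

    claims is a list of (matched_id, backed_by_artifact) pairs. matched_id is
    the ground-truth finding the claim corresponds to, or None if it matches
    nothing; backed_by_artifact says whether any artifact supports the claim.
    """
    truth = set(ground_truth_ids)
    # Distinct ground-truth findings the agent surfaced at least once.
    surfaced = {mid for mid, _ in claims if mid in truth}
    # Claims that do not correspond to any ground-truth finding.
    false_positives = [c for c in claims if c[0] not in truth]
    # Claims with no supporting artifact at all.
    hallucinations = [c for c in claims if not c[1]]
    return {
        "recall": len(surfaced) / len(truth) if truth else 0.0,
        "false_positive_rate": len(false_positives) / len(claims) if claims else 0.0,
        "hallucination_count": len(hallucinations),
    }
```

Note that a claim can be a false positive without being a hallucination: it may be true of the artifacts yet missing from ground-truth.json, which usually means the ground truth needs another finding.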
If the agent doesn't reach the right MITRE chain in 25 iterations, the playbook needs a hint. Either:
- Add a `next_call_decisions` entry for your case class to `senior-analyst-v1.yaml`, or
- Write a case-specific `playbook.yaml` that lives in your case directory
Prefer the first if your case class generalizes (insider threat, ransomware, web breach). Use the second only if the heuristics are truly case-unique.
Use the Case-PtH-Timestomp and Case-IP-KVM pages as templates for your case-study wiki page. Include:
- The scenario (what happened, who's involved)
- The artifacts (what evidence is bundled)
- The agent's reasoning trace (call-by-call)
- Where the contradiction was (every good case has one)
- Measured accuracy
- Reproduction commands
In the PR description, include:
- Issue number you opened in step 1
- Number of findings, recall achieved, false-positive rate, audit-chain tail hash
- Any new playbook rules added, with rationale
- Confirmation that all 17 existing tests still pass
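For readers unfamiliar with the term, a "tail hash" is the last hash in a chained log: each entry's hash covers the previous hash, so the tail changes if any earlier entry changes. The sketch below is a generic illustration of that idea with SHA-256. The record format, the all-zero starting value, and the function name `chain_tail_hash` are assumptions; it does not reproduce the actual dart-audit log format, so report the tail hash that the tool itself prints.

```python
import hashlib
import json

# Assumed starting value for the chain; the real log may use something else.
GENESIS = "0" * 64


def chain_tail_hash(records):
    """Fold records into a SHA-256 chain and return the final (tail) hash."""
    prev = GENESIS
    for record in records:
        # Canonical JSON so the same record always hashes the same way.
        payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
        prev = hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()
    return prev
```

Because every hash depends on all earlier entries, two runs that made the same calls in the same order over identical records produce the same tail, and any edit to an earlier entry produces a different one.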
Things we will not accept:
| Anti-pattern | Why |
|---|---|
| Findings that require subjective judgment | Ground truth must be mechanical |
| Cases solvable by a single function call | No architectural value |
| Cases with no contradictions | Doesn't exercise dart-corr |
| Evidence pulled from a customer / production env | Even with anonymization, risk is too high. Synthetic only, or public-corpus only. |
| Cases that require new destructive verbs to solve | The architecture says no. Ever. |
| Cases that rely on prompt instructions | Use playbook YAML or new typed functions. Not prompts. |
PRs adding case studies go through:
- Schema check: `ground-truth.json` validates against the schema in `examples/case-studies/SCHEMA.md`.
- Reproducibility check: CI re-runs the case, asserts that the documented findings count is reached, and asserts that the audit chain verifies.
- Architecture check: no new prompt-based guardrails are introduced; any new functions go through the standard surface-extension review (see CONTRIBUTING.md).
- Accompanying wiki page: the case-study wiki page must be in the same PR.
Typical review turnaround: 3-7 days.
- Architecture deep dive
- Accuracy — how case-study accuracy gets measured
- CONTRIBUTING.md
Agentic-DART — autonomous DFIR agent · architecture-first, not prompt-first · MIT license · github.com/Juwon1405/agentic-dart