Writing case studies

This page explains how to add a new bundled case study to Agentic-DART. After typed forensic functions, case studies are the most useful contribution: they extend coverage to new attack patterns and validate the architecture against new ground truth.

What makes a good case study

Property	Why
Mechanically verifiable ground truth	Every claim the agent makes must be checkable against artifacts. No "the analyst's intuition says X".
Cross-artifact correlation required	Single-artifact cases can be solved by string match. Real DFIR is correlation.
At least one designed contradiction	This exercises dart-corr. If everything aligns, the case is too easy.
Self-contained	Bundled in the repo with no external downloads; total evidence under 50MB.
Reproducible exactly	Same inputs → same MITRE chain (timestamps will differ; chain structure won't).
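To check the "Reproducible exactly" property across two runs, compare only the order of MITRE techniques and ignore when each step happened. The sketch below assumes each run can be exported as an ordered list of (timestamp, technique ID) pairs. That export format, the helper names and the sample values are illustrative assumptions, not Agentic-DART's actual output.

```python
from typing import Iterable, Sequence

# Assumed (hypothetical) export format: one (run_timestamp, technique_id) pair
# per step of the MITRE chain, in the order the agent produced them.
Step = tuple[str, str]


def chain_structure(steps: Iterable[Step]) -> tuple[str, ...]:
    """Drop the timestamps and keep the ordered technique IDs."""
    return tuple(technique for _timestamp, technique in steps)


def same_chain(run_a: Sequence[Step], run_b: Sequence[Step]) -> bool:
    """True when both runs produced the same technique sequence."""
    return chain_structure(run_a) == chain_structure(run_b)


# Illustrative runs: the timestamps differ, the chain structure does not.
run_1 = [("2024-05-01T10:00:00Z", "T1078"), ("2024-05-01T10:00:07Z", "T1052.001")]
run_2 = [("2024-05-02T09:13:44Z", "T1078"), ("2024-05-02T09:13:50Z", "T1052.001")]
print(same_chain(run_1, run_2))  # True
```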

Layout

A new case lives at examples/case-studies/case-NN-<short-name>/:

```
case-NN-<short-name>/
├── README.md              # operator-facing case description
├── case-overview.md       # background, scope (no findings here)
├── truth.json             # the N findings the agent must surface
├── playbook.yaml          # optional; only if the standard playbook isn't a fit
└── evidence/
    ├── <artifact1>.csv
    ├── <artifact2>.evtx.csv
    ├── <artifact3>.db
    └── ...
```

Numbering: case-01, case-02, ... in the order accepted into the repo.
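A quick pre-flight check of the layout above can catch missing files before CI does. This is a sketch, not a project tool: the helper name is hypothetical, it reads "50MB" as 50 MiB, and it assumes <short-name> uses only lowercase letters, digits and hyphens.

```python
import re
from pathlib import Path

# Assumptions (not documented rules): <short-name> is lowercase letters, digits
# and hyphens, and "50MB" means 50 MiB.
CASE_DIR_RE = re.compile(r"case-\d{2}-[a-z0-9][a-z0-9-]*")
REQUIRED_FILES = ("README.md", "case-overview.md", "truth.json")
MAX_EVIDENCE_BYTES = 50 * 1024 * 1024


def check_case_layout(case_dir: Path) -> list[str]:
    """Return layout problems for one case directory ([] if none were found)."""
    problems: list[str] = []
    if not CASE_DIR_RE.fullmatch(case_dir.name):
        problems.append(f"directory name {case_dir.name!r} is not case-NN-<short-name>")
    for name in REQUIRED_FILES:
        if not (case_dir / name).is_file():
            problems.append(f"missing {name}")

    evidence = case_dir / "evidence"
    if not evidence.is_dir():
        problems.append("missing evidence/ directory")
        return problems
    # playbook.yaml is optional, so it is not checked here.
    total = sum(p.stat().st_size for p in evidence.rglob("*") if p.is_file())
    if total > MAX_EVIDENCE_BYTES:
        problems.append(f"evidence/ holds {total} bytes, over the 50MB budget")
    return problems
```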

Step-by-step

1. Open an issue first

Before writing the case, open an issue tagged case-study describing:

Attack pattern / case class
Why existing cases don't cover it
Approximate ground-truth count (5-30 findings is the sweet spot)
Source of inspiration (real CTF, public IR report, your own work)

This avoids duplicate work: some attack patterns are already covered by existing cases.

2. Generate or extract artifacts

Two paths:

Synthetic: generate artifacts programmatically. Good for precise control, reproducibility, and freedom from licensing concerns. This is the approach used by the bundled self-evaluation cases (case-01 through case-08).

Real (with permission): use a published forensic dataset. Good for realism and immediate credibility. Cite the dataset name, license, and DOI or URL in case-overview.md.

Either way, artifacts must be read-only. The agent asserts this: if any file in your evidence directory has the wrong permissions, the run fails at mount time.
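Before a run, you can list evidence files that still carry write permission and clear those bits. The sketch assumes "read-only" means no POSIX write bits on the files; the agent's own mount-time check may test something stricter, and both helper names are hypothetical.

```python
import stat
from pathlib import Path

# Owner, group and other write bits.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def writable_evidence(evidence_dir: Path) -> list[Path]:
    """List evidence files that still carry any write permission bit."""
    return sorted(
        p for p in evidence_dir.rglob("*")
        if p.is_file() and p.stat().st_mode & WRITE_BITS
    )


def make_read_only(evidence_dir: Path) -> None:
    """Clear the write bits on every evidence file (directories are left alone)."""
    for p in evidence_dir.rglob("*"):
        if p.is_file():
            p.chmod(p.stat().st_mode & ~WRITE_BITS)
```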

3. Write `truth.json`

```json
{
  "case_id": "ipkvm-insider-01",
  "summary": "Contractor's IP-KVM credentials used after their VPN session ended.",
  "findings": [
    {
      "id": "F-001",
      "claim": "USB Kingston DataTraveler inserted at 22:31:18 UTC on FILE-SRV-01",
      "supporting_artifacts": ["evidence/usb_history.csv"],
      "mitre_techniques": ["T1052.001"]
    },
    {
      "id": "F-002",
      "claim": "...",
      "supporting_artifacts": ["..."],
      "mitre_techniques": ["..."]
    }
  ]
}
```

Each finding must be supported by an artifact in your evidence/ directory. If a finding requires the agent to "infer" something, it is not mechanically verifiable; split it into atomic claims that can each be checked against an artifact.
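The rule above can be checked mechanically. This sketch walks truth.json and reports duplicate IDs, empty claims, missing or out-of-tree supporting artifacts, and malformed technique IDs. It is an illustration only: examples/case-studies/SCHEMA.md is the authority on the format, and the function name and the T1234 / T1234.567 ID pattern are assumptions of this sketch.

```python
import json
import re
from pathlib import Path

# Assumption: technique IDs look like T1052 or T1052.001 (sub-technique).
TECHNIQUE_RE = re.compile(r"T\d{4}(?:\.\d{3})?")


def validate_truth(case_dir: Path) -> list[str]:
    """Return human-readable problems with case_dir/truth.json ([] if none)."""
    truth = json.loads((case_dir / "truth.json").read_text(encoding="utf-8"))
    evidence_dir = (case_dir / "evidence").resolve()
    findings = truth.get("findings") or []
    problems: list[str] = [] if findings else ["no findings"]
    seen: set[str] = set()

    for index, finding in enumerate(findings):
        fid = finding.get("id") or f"<finding #{index}>"
        if fid in seen:
            problems.append(f"{fid}: duplicate id")
        seen.add(fid)
        if not finding.get("claim"):
            problems.append(f"{fid}: empty claim")

        artifacts = finding.get("supporting_artifacts") or []
        if not artifacts:
            problems.append(f"{fid}: no supporting artifacts")
        for rel in artifacts:
            # Paths are relative to the case directory and must stay inside evidence/.
            path = (case_dir / rel).resolve()
            if evidence_dir not in path.parents or not path.is_file():
                problems.append(f"{fid}: {rel} is not a file under evidence/")

        for technique in finding.get("mitre_techniques") or []:
            if not TECHNIQUE_RE.fullmatch(technique):
                problems.append(f"{fid}: {technique!r} is not a MITRE technique ID")
    return problems
```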

4. Run the agent

```bash
export DART_EVIDENCE_ROOT="$PWD/examples/case-studies/case-NN-<name>/evidence"
python3 -m dart_agent --case <case-id> --out ./out/<case-id> --max-iterations 25
```

Compare the agent's output to truth.json:

Recall: how many of the N findings did the agent surface?
False positive rate: did the agent claim things not in ground truth?
Hallucination count: did the agent claim facts not in any artifact?
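Once each agent claim has been matched to a truth.json finding (or marked unmatched), the three numbers above are simple ratios. The sketch assumes that matching, and the check for whether any artifact backs a claim, were done beforehand; the AgentClaim and Score types are hypothetical. It reads "false positive rate" as the share of agent claims that match no ground-truth finding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentClaim:
    text: str
    matched_truth_id: Optional[str]  # truth.json finding id, or None if unmatched
    backed_by_artifact: bool         # does any evidence file support the claim at all?


@dataclass
class Score:
    recall: float
    false_positive_rate: float
    hallucinations: int


def score(truth_ids: set[str], claims: list[AgentClaim]) -> Score:
    # Distinct truth findings the agent surfaced (duplicates count once).
    surfaced = {c.matched_truth_id for c in claims if c.matched_truth_id in truth_ids}
    # Claims that do not correspond to any ground-truth finding.
    unmatched = [c for c in claims if c.matched_truth_id not in truth_ids]
    # Claims with no support in any artifact.
    hallucinated = [c for c in claims if not c.backed_by_artifact]
    return Score(
        recall=len(surfaced) / len(truth_ids) if truth_ids else 0.0,
        false_positive_rate=len(unmatched) / len(claims) if claims else 0.0,
        hallucinations=len(hallucinated),
    )


truth_ids = {"F-001", "F-002", "F-003", "F-004"}
claims = [
    AgentClaim("claim matching F-001", "F-001", True),
    AgentClaim("claim matching F-002", "F-002", True),
    AgentClaim("real event, not in ground truth", None, True),
    AgentClaim("fact found in no artifact", None, False),
]
print(score(truth_ids, claims))
# Score(recall=0.5, false_positive_rate=0.5, hallucinations=1)
```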

5. Tune the playbook (if needed)

If the agent doesn't reach the correct MITRE chain within 25 iterations, the playbook needs a hint. Either:

Add a next_call_decisions entry for your case class to senior-analyst-v3.yaml (default), or
Write a case-specific playbook.yaml that lives in your case directory

Prefer the first if your case class generalizes (insider threat, ransomware, web breach). Use the second only if the heuristics are truly case-unique.

6. Write the case-study wiki page

Use Case-PtH-Timestomp and Case-IP-KVM as templates. Include:

The scenario (what happened, who's involved)
The artifacts (what evidence is bundled)
The agent's reasoning trace (call-by-call)
Where the contradiction was (every good case has one)
Measured accuracy
Reproduction commands

7. Open the PR

In the PR description, include:

Issue number you opened in step 1
Number of findings, recall achieved, false-positive rate, audit-chain tail hash
Any new playbook rules added, with rationale
Confirmation that all 20 existing tests still pass

Anti-patterns

Things we will not accept:

Anti-pattern	Why
Findings that require subjective judgment	Ground truth must be mechanical
Cases solvable by single function call	No architectural value
Cases with no contradictions	Doesn't exercise dart-corr
Evidence pulled from a customer or production environment	Even with anonymization, the risk is too high. Use synthetic or public-corpus evidence only.
Cases that require new destructive verbs to solve	The architecture says no. Ever.
Cases that rely on prompt instructions	Use playbook YAML or new typed functions. Not prompts.

Review process

PRs adding case studies go through:

Schema check — truth.json validates against the schema in examples/case-studies/SCHEMA.md.
Reproducibility check — CI re-runs the case, asserts the documented findings count is reached, asserts the audit chain verifies.
Architecture check — no new prompt-based guardrails introduced; any new functions go through the standard surface-extension review (see CONTRIBUTING.md).
Wiki page accompanying — case-study wiki page must be in the same PR.

Typical review turnaround: 3-7 days.
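The reproducibility check asserts that the audit chain verifies, and the PR description asks for its tail hash. To illustrate what that verification involves, here is a generic SHA-256 hash chain: each record stores the previous record's hash, so an edit to any earlier record makes verification fail. The record layout (payload, prev_hash, hash), the all-zero genesis value and the helper names are assumptions of this sketch, not Agentic-DART's actual audit format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first record


def record_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list[dict], payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev_hash": prev, "hash": record_hash(prev, payload)})


def verify(chain: list[dict]) -> str:
    """Return the tail hash if every link checks out; raise ValueError otherwise."""
    prev = GENESIS
    for i, record in enumerate(chain):
        if record["prev_hash"] != prev:
            raise ValueError(f"record {i}: prev_hash does not match the previous record")
        if record["hash"] != record_hash(prev, record["payload"]):
            raise ValueError(f"record {i}: hash mismatch (record altered?)")
        prev = record["hash"]
    return prev


chain: list[dict] = []
for step in range(3):
    append(chain, {"step": step})
print(verify(chain) == chain[-1]["hash"])  # True
```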
