
Writing case studies

Juwon1405 edited this page Jun 15, 2026 · 7 revisions

How to add a new bundled case study to Agentic-DART. Case studies are the most useful contribution after typed forensic functions — they extend coverage to new attack patterns and validate the architecture against new ground truth.


What makes a good case study

| Property | Why |
|---|---|
| Mechanically verifiable ground truth | Every claim the agent makes must be checkable against artifacts. No "the analyst's intuition says X". |
| Cross-artifact correlation required | Single-artifact cases can be solved by string matching. Real DFIR is correlation. |
| At least one designed contradiction | This exercises dart-corr. If everything aligns, the case is too easy. |
| Self-contained | Bundled in the repo with no external downloads; total bundled evidence under 50MB. |
| Exactly reproducible | Same inputs → same MITRE chain (timestamps will differ; the chain structure won't). |
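The size limit in the Self-contained row is easy to check before opening a PR. A minimal sketch, assuming "50MB" means 50 × 1024 × 1024 bytes and that every file under `evidence/` counts toward it; these helpers are hypothetical and not part of Agentic-DART:

```python
from pathlib import Path

# Assumed reading of the "under 50MB" rule: 50 * 1024 * 1024 bytes.
MAX_EVIDENCE_BYTES = 50 * 1024 * 1024


def evidence_size_bytes(evidence_dir: str) -> int:
    """Total size of all regular files under evidence_dir (recursive)."""
    total = 0
    for path in Path(evidence_dir).rglob("*"):
        if path.is_file():
            total += path.stat().st_size
    return total


def check_evidence_budget(evidence_dir: str, limit: int = MAX_EVIDENCE_BYTES) -> bool:
    """True if the bundled evidence stays strictly under the size budget."""
    return evidence_size_bytes(evidence_dir) < limit
```

Run it against `examples/case-studies/case-NN-<short-name>/evidence` once the artifacts are in place.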

Layout

A new case lives at examples/case-studies/case-NN-<short-name>/:

case-NN-<short-name>/
├── README.md              # operator-facing case description
├── case-overview.md       # background, scope (no findings here)
├── truth.json             # the N findings the agent must surface
├── playbook.yaml          # optional: only if the standard playbook isn't a fit
└── evidence/
    ├── <artifact1>.csv
    ├── <artifact2>.evtx.csv
    ├── <artifact3>.db
    └── ...

Numbering: case-01, case-02, and so on, assigned in the order cases are accepted into the repo.
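A hypothetical pre-flight check for the layout above. It assumes `<short-name>` uses only lowercase letters, digits and hyphens, and treats `playbook.yaml` as optional; the function name and rules are illustrative, not an existing Agentic-DART tool:

```python
import re
from pathlib import Path

# Directory name pattern from the layout: case-NN-<short-name>.
# Assumption: <short-name> is lowercase letters, digits and hyphens.
CASE_DIR_RE = re.compile(r"^case-\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*$")

# playbook.yaml is optional, so it is deliberately not listed here.
REQUIRED_FILES = ("README.md", "case-overview.md", "truth.json")


def layout_problems(case_dir: str) -> list[str]:
    """Return human-readable layout problems (an empty list means OK)."""
    root = Path(case_dir)
    problems = []
    if not CASE_DIR_RE.match(root.name):
        problems.append(f"directory name {root.name!r} does not match case-NN-<short-name>")
    for name in REQUIRED_FILES:
        if not (root / name).is_file():
            problems.append(f"missing {name}")
    evidence = root / "evidence"
    if not evidence.is_dir():
        problems.append("missing evidence/ directory")
    elif not any(p.is_file() for p in evidence.rglob("*")):
        problems.append("evidence/ contains no files")
    return problems
```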


Step-by-step

1. Open an issue first

Before writing the case, open an issue tagged case-study describing:

  • Attack pattern / case class
  • Why existing cases don't cover it
  • Approximate ground-truth count (5-30 findings is the sweet spot)
  • Source of inspiration (real CTF, public IR report, your own work)

This avoids duplicate work: some attack patterns are already covered by existing cases.

2. Generate or extract artifacts

Two paths:

Synthetic: generate artifacts programmatically. Good for: precise control, reproducibility, no licensing concerns. Used by the bundled self-evaluation cases (case-01..08).
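For the synthetic path, determinism comes from seeding a local random generator, so regenerating the evidence yields byte-identical files. A minimal sketch, assuming a hypothetical `usb_history.csv` column layout; the column names, hosts, devices, date and serial values are all illustrative and not taken from any bundled case:

```python
import csv
import io
import random
from datetime import datetime, timedelta, timezone

# Hypothetical column layout for evidence/usb_history.csv.
FIELDS = ["timestamp_utc", "host", "device", "serial"]
# Illustrative benign background devices and hosts.
NOISE_DEVICES = ["Logitech USB Receiver", "Dell USB Keyboard", "YubiKey 5 NFC"]
HOSTS = ["WS-014", "WS-022", "FILE-SRV-01"]


def generate_usb_history(seed: int = 1405, noise_rows: int = 20) -> str:
    """Build a deterministic synthetic USB history CSV with one planted event."""
    rng = random.Random(seed)  # local RNG: same seed -> identical output
    base = datetime(2026, 3, 2, 8, 0, 0, tzinfo=timezone.utc)  # illustrative date
    rows = []
    for _ in range(noise_rows):
        # Noise events fall between 08:00 and 20:00 UTC, before the planted event.
        ts = base + timedelta(seconds=rng.randrange(0, 12 * 3600))
        rows.append({
            "timestamp_utc": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "host": rng.choice(HOSTS),
            "device": rng.choice(NOISE_DEVICES),
            "serial": f"{rng.getrandbits(32):08X}",
        })
    # Planted ground-truth event, mirroring finding F-001 in the truth.json example.
    rows.append({
        "timestamp_utc": "2026-03-02T22:31:18Z",
        "host": "FILE-SRV-01",
        "device": "Kingston DataTraveler",
        "serial": "PLANTED-0001",
    })
    rows.sort(key=lambda r: r["timestamp_utc"])  # ISO-8601 strings sort chronologically
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Writing the returned string to `evidence/usb_history.csv` gives the same file on every run, which is what keeps a synthetic case reproducible.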

Real (with permission): use a published forensic dataset. Good for: realism, immediate credibility. Record the dataset name, license, and DOI or URL in case-overview.md.

Either way, artifacts must be read-only. The agent asserts this when it mounts the evidence directory: if any file in it has the wrong permissions, the run fails at mount time.
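A sketch of a local pre-flight check for the read-only rule, assuming "wrong permissions" means any owner, group or other write bit is set. It only approximates the agent's own mount-time assertion, whose exact logic is not described on this page; both helper names are hypothetical:

```python
import os
import stat
from pathlib import Path

# Assumption: any write bit (owner, group or other) counts as "wrong permissions".
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def writable_files(evidence_dir: str) -> list[str]:
    """Return evidence files (relative paths) that still carry a write bit."""
    bad = []
    for path in sorted(Path(evidence_dir).rglob("*")):
        if path.is_file() and path.stat().st_mode & WRITE_BITS:
            bad.append(str(path.relative_to(evidence_dir)))
    return bad


def make_read_only(evidence_dir: str) -> None:
    """Clear all write bits on every evidence file; directories are left alone."""
    for path in Path(evidence_dir).rglob("*"):
        if path.is_file():
            mode = stat.S_IMODE(path.stat().st_mode)
            os.chmod(path, mode & ~WRITE_BITS)
```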

3. Write truth.json

{
  "case_id": "ipkvm-insider-01",
  "summary": "Contractor's IP-KVM credentials used after their VPN session ended.",
  "findings": [
    {
      "id": "F-001",
      "claim": "USB Kingston DataTraveler inserted at 22:31:18 UTC on FILE-SRV-01",
      "supporting_artifacts": ["evidence/usb_history.csv"],
      "mitre_techniques": ["T1052.001"]
    },
    {
      "id": "F-002",
      "claim": "...",
      "supporting_artifacts": ["..."],
      "mitre_techniques": ["..."]
    }
  ]
}

Each finding must be supported by an artifact in your evidence/ directory. If a finding requires the agent to "infer" something, it is not mechanically verifiable; split it into atomic claims that each map to an artifact.
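`examples/case-studies/SCHEMA.md` is the authoritative schema. The sketch below is a hypothetical quick check covering only the rules stated on this page: required top-level keys, unique finding IDs, non-empty claims, supporting artifacts that actually exist under `evidence/`, and MITRE ATT&CK technique IDs in the `T1234` or `T1234.001` form:

```python
import json
import re
from pathlib import Path

# ATT&CK technique IDs such as T1052, or sub-techniques such as T1052.001.
TECHNIQUE_RE = re.compile(r"^T\d{4}(?:\.\d{3})?$")


def truth_problems(case_dir: str) -> list[str]:
    """Lightweight sanity checks on truth.json; SCHEMA.md remains authoritative."""
    root = Path(case_dir)
    data = json.loads((root / "truth.json").read_text(encoding="utf-8"))
    problems = []
    for key in ("case_id", "summary", "findings"):
        if key not in data:
            problems.append(f"missing top-level key {key!r}")
    seen = set()
    for i, finding in enumerate(data.get("findings", [])):
        fid = finding.get("id", f"<finding {i}>")
        if fid in seen:
            problems.append(f"{fid}: duplicate id")
        seen.add(fid)
        if not finding.get("claim"):
            problems.append(f"{fid}: empty claim")
        artifacts = finding.get("supporting_artifacts") or []
        if not artifacts:
            problems.append(f"{fid}: no supporting_artifacts")
        # Every supporting artifact must be a real file inside evidence/.
        for rel in artifacts:
            if not rel.startswith("evidence/") or not (root / rel).is_file():
                problems.append(f"{fid}: artifact {rel!r} not found under evidence/")
        for tech in finding.get("mitre_techniques") or []:
            if not TECHNIQUE_RE.match(tech):
                problems.append(f"{fid}: malformed technique id {tech!r}")
    return problems
```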

4. Run the agent

export DART_EVIDENCE_ROOT="$PWD/examples/case-studies/case-NN-<name>/evidence"
python3 -m dart_agent --case <case-id> --out ./out/<case-id> --max-iterations 25

Compare the agent's output to truth.json:

  • Recall: how many of the N findings did the agent surface?
  • False-positive rate: did the agent claim things not in the ground truth?
  • Hallucination count: did the agent claim facts not in any artifact?
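This page does not pin down exact formulas for these metrics. One reasonable reading is sketched below, assuming each agent claim has already been matched (by hand or otherwise) to a truth.json finding ID or to nothing, and flagged as grounded if some bundled artifact supports it. `ScoredClaim` and `score` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoredClaim:
    """One agent claim after review (hypothetical structure)."""
    text: str
    truth_id: Optional[str]  # matching truth.json finding id, or None
    grounded: bool           # True if some bundled artifact supports the claim


def score(truth_ids: set[str], claims: list[ScoredClaim]) -> dict:
    # Distinct truth findings the agent surfaced (duplicates count once).
    matched = {c.truth_id for c in claims if c.truth_id in truth_ids}
    # Claims that map to no ground-truth finding.
    unmatched = [c for c in claims if c.truth_id not in truth_ids]
    return {
        "recall": len(matched) / len(truth_ids) if truth_ids else 0.0,
        "false_positive_rate": len(unmatched) / len(claims) if claims else 0.0,
        "hallucinations": sum(1 for c in claims if not c.grounded),
    }
```

Under this reading, a claim that is grounded in an artifact but absent from truth.json is a false positive, not a hallucination.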

5. Tune the playbook (if needed)

If the agent doesn't reach the right MITRE chain within 25 iterations, the playbook needs a hint. Either:

  • Add a next_call_decisions entry for your case class to senior-analyst-v3.yaml (default), or
  • Write a case-specific playbook.yaml that lives in your case directory

Prefer the first if your case class generalizes (insider threat, ransomware, web breach). Use the second only if the heuristics are truly case-unique.

6. Write the case-study wiki page

Use Case-PtH-Timestomp and Case-IP-KVM as templates. Include:

  • The scenario (what happened, who's involved)
  • The artifacts (what evidence is bundled)
  • The agent's reasoning trace (call-by-call)
  • Where the contradiction was (every good case has one)
  • Measured accuracy
  • Reproduction commands

7. Open the PR

In the PR description, include:

  • Issue number you opened in step 1
  • Number of findings, recall achieved, false-positive rate, audit-chain tail hash
  • Any new playbook rules added, with rationale
  • Confirmation that all 20 existing tests still pass

Anti-patterns

Things we will not accept:

| Anti-pattern | Why |
|---|---|
| Findings that require subjective judgment | Ground truth must be mechanical. |
| Cases solvable by a single function call | No architectural value. |
| Cases with no contradictions | They don't exercise dart-corr. |
| Evidence pulled from a customer or production environment | Even with anonymization, the risk is too high. Synthetic or public-corpus evidence only. |
| Cases that require new destructive verbs to solve | The architecture says no. Ever. |
| Cases that rely on prompt instructions | Use playbook YAML or new typed functions, not prompts. |

Review process

PRs adding case studies go through:

  1. Schema check: truth.json validates against the schema in examples/case-studies/SCHEMA.md.
  2. Reproducibility check: CI re-runs the case, asserts that the documented findings count is reached, and asserts that the audit chain verifies.
  3. Architecture check: no new prompt-based guardrails are introduced; any new functions go through the standard surface-extension review (see CONTRIBUTING.md).
  4. Accompanying wiki page: the case-study wiki page must be in the same PR.

Typical review turnaround: 3-7 days.
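The reproducibility check asserts that the audit chain verifies, and the PR description asks for its tail hash. The chain's actual record format is defined by Agentic-DART, not on this page; the sketch below only illustrates the general hash-chain idea, assuming each record stores the previous record's SHA-256 hash. `GENESIS`, the field names and the example payloads are hypothetical:

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash value for the first record


def record_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so an identical payload always hashes identically.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list[dict], payload: dict) -> None:
    """Append a record linked to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev_hash": prev, "payload": payload, "hash": record_hash(prev, payload)})


def verify(chain: list[dict]) -> str:
    """Recompute every link; return the tail hash or raise ValueError on tampering."""
    prev = GENESIS
    for i, rec in enumerate(chain):
        if rec["prev_hash"] != prev or rec["hash"] != record_hash(prev, rec["payload"]):
            raise ValueError(f"audit chain broken at record {i}")
        prev = rec["hash"]
    return prev
```

Because every hash covers its predecessor, editing any earlier record invalidates every later link, so a single tail hash is enough to pin the whole run.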

