dart playbook

Bang Juwon edited this page May 14, 2026 · 9 revisions

dart-playbook · Sequencing rules in YAML

The agent's playbook. A YAML file that encodes "what should a senior analyst look at next, given the current state of the case?" — without writing imperative Python.


Why YAML, not Python

The whole point of "architecture-first, not prompt-first" is that operator-tunable rules don't live in the model's prompt. They live in YAML the operator can read and edit.

A Python playbook would couple the rules to the agent's release cycle. A YAML playbook is data: an analyst can fork the playbook, tune for their specific case class (web-app breach vs insider threat vs ransomware), and commit it to their own runbook repo.


Bundled playbooks

Playbook                Lines    Phases  Case classes  When to use
senior-analyst-v1.yaml  133      4       3             Quick demos / simple scenarios
senior-analyst-v2.yaml  845      10      10            Methodology baseline (Mandiant + Bianco + Diamond)
senior-analyst-v3.yaml  Default  10      10 + UC IDs   Default. Industrialized — adds ADS + MaGMa + TaHiTI + HMM

v3 is the default for any new case. v2 is retained as the methodology baseline (no v3 industrialization scaffolds) so pre-industrialization runs remain reproducible. v1 is kept for backward compatibility and tutorials.


senior-analyst-v3 — industrialization release ⭐ (default)

v3 is the industrialization release. v2 encoded a senior analyst's reasoning. v3 encodes a mature SOC's operating model around that reasoning as YAML data so it's inspectable, forkable, and citable.

Honest framing. The four framework blocks below ship in v3 as structured YAML data. They define the contract a mature-SOC implementation should satisfy. The runtime activation of these contracts in dart_agent and dart_corr is intentionally a post-SANS work item (tracked in issue #44) — activating any of them at runtime would shift the baseline measured by scripts/measure_accuracy.py mid-window. v2's runtime path (10-phase sequence + next_call_decisions + contradiction_triggers + stop_conditions) is what the agent actually executes today, and it remains intact in v3.

Four new framework blocks layered on top of v2

1. Palantir ADS Framework

github.com/palantir/alerting-detection-strategy-framework

Encoded as ads_template in the v3 YAML. Defines a 9-section documentation contract for every detection: goal, categorization (MITRE ATT&CK), strategy abstract, technical context, blind spots & assumptions, false positives, validation (Atomic Red Team test ID), priority, response (SOAR runbook ref).

Lint modes documented: permissive, warn, strict. The lint pass that enforces the contract on each finding is post-SANS.
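
As a rough illustration of what such a lint pass could look like, here is a minimal sketch. It is not the project's implementation (that work is post-SANS). The function `lint_ads`, the exception `AdsLintError`, the dict-shaped detection record, and the exact behaviour of each mode are assumptions made for this example; only the nine section names and the three mode names come from the v3 YAML.

```python
import warnings

# Section names as listed under ads_template.required_sections in the v3 YAML.
REQUIRED_SECTIONS = [
    "Goal", "Categorization", "Strategy_Abstract", "Technical_Context",
    "Blind_Spots_Assumptions", "False_Positives", "Validation",
    "Priority", "Response",
]


class AdsLintError(ValueError):
    """Hypothetical error raised when strict mode finds a gap."""


def lint_ads(detection: dict, mode: str = "warn") -> list[str]:
    """Return the ADS sections that are missing or empty in a detection record.

    Assumed mode semantics: permissive only reports, warn also emits a
    Python warning, strict raises AdsLintError.
    """
    if mode not in ("permissive", "warn", "strict"):
        raise ValueError(f"unknown lint mode: {mode!r}")
    missing = [name for name in REQUIRED_SECTIONS if not detection.get(name)]
    if missing and mode == "warn":
        warnings.warn(f"ADS sections missing: {', '.join(missing)}")
    if missing and mode == "strict":
        raise AdsLintError(f"ADS sections missing: {', '.join(missing)}")
    return missing
```

Returning the list in every non-raising mode lets a caller attach the gaps to a finding without having to catch warnings.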

2. MaGMa Use Case Framework

FI-ISAC NL · Rob van Os (SOC-CMM author) · full paper

Encoded as magma_ucf in the v3 YAML. Three-tier traceability:

  • L1 business drivers (4 entries) — protect data integrity, detect ransomware before recovery denial, etc.
  • L2 attack patterns (8 entries, MITRE-mapped) — AP-001 ransomware-recovery-denial through AP-008 IP-KVM-physical-access
  • L3 detection coverage — MCP function mapping per L2 pattern

CMMI 5-level maturity scale documented:

  1. Initial (ad-hoc) → 2. Managed (documented) → 3. Defined (ADS-templated) → 4. Quantitatively Managed (FP/TP measured) → 5. Optimizing (TI feedback loop active)

The v3 YAML self-declares CMMI Level 3 (Defined) as the current state. Per-run runtime CMMI scoring is post-SANS work.
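
The three-tier traceability can be checked mechanically: every L2 attack pattern should point up to an L1 business driver and down to at least one L3 detection. The sketch below assumes a record layout (`id`, `driver`, `pattern`, `mcp_functions`) and a driver ID (`BD-1`) that are invented for illustration; only the `AP-001` and `AP-008` identifiers and the MCP function name appear on this page, and the pairing of function to pattern is illustrative.

```python
def untraced_patterns(l1_drivers, l2_patterns, l3_coverage):
    """Return IDs of L2 patterns lacking an L1 driver or any L3 coverage."""
    driver_ids = {d["id"] for d in l1_drivers}
    covered = {c["pattern"] for c in l3_coverage if c.get("mcp_functions")}
    return [
        p["id"] for p in l2_patterns
        if p.get("driver") not in driver_ids or p["id"] not in covered
    ]


# Illustrative entries only; field names and the BD-1 ID are hypothetical.
l1 = [{"id": "BD-1", "name": "detect ransomware before recovery denial"}]
l2 = [
    {"id": "AP-001", "name": "ransomware-recovery-denial", "driver": "BD-1"},
    {"id": "AP-008", "name": "IP-KVM-physical-access", "driver": "BD-1"},
]
l3 = [{"pattern": "AP-001", "mcp_functions": ["detect_credential_access"]}]

print(untraced_patterns(l1, l2, l3))  # ['AP-008']
```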

3. TaHiTI Threat Hunt Cycle

Rob van Os et al.

Encoded as hunt_cycle in the v3 YAML, with the designed trigger condition confidence < 0.6 AND iterations >= 8 and three phases:

  • H1 Initiate — document hypothesis, attach TI context (M-Trends, DFIR Report, CISA, Sigma)
  • H2 Hunt — execute targeted MCP calls, pivot through Pyramid of Pain
  • H3 Finalize — emit findings + new ADS, OR document negative result, OR hand off

Runtime entry into hunt mode from the agent loop on plateau detection is post-SANS work — the data scaffold is what defines what a TaHiTI-aware run would look like.
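
For orientation, the designed trigger condition reduces to a two-part predicate. The sketch below is an assumption about how a future runtime might express it; `RunState` and `should_enter_hunt` are hypothetical names, and only the two thresholds and the phase names are taken from the v3 YAML.

```python
from dataclasses import dataclass

CONFIDENCE_FLOOR = 0.6   # from the documented trigger
MIN_ITERATIONS = 8       # from the documented trigger
HUNT_PHASES = ("H1_initiate", "H2_hunt", "H3_finalize")


@dataclass
class RunState:
    """Hypothetical snapshot of an agent run."""
    confidence: float
    iterations: int


def should_enter_hunt(state: RunState) -> bool:
    """True when confidence is still low after enough iterations (a plateau)."""
    return (state.confidence < CONFIDENCE_FLOOR
            and state.iterations >= MIN_ITERATIONS)
```

Both conditions must hold: a low-confidence run at iteration 3 is still in normal triage, and a confident run at iteration 20 has no reason to hunt.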

4. Bianco Hunting Maturity Model

David Bianco · sans.org

Encoded as hunting_maturity_model in the v3 YAML. Five levels documented with what each implies:

  • HMM0 Initial — no hunt
  • HMM1 Minimal — TI-driven (IOC-based)
  • HMM2 Procedural — published procedures (e.g. ThreatHunter-Playbook)
  • HMM3 Innovative — analyst-formed hypotheses ⭐ v3 yaml self-declares
  • HMM4 Leading — automated hypothesis generation (Phase 2 target)

Per-run runtime self-classification by the agent is post-SANS work. The v3 yaml's agentic_dart_self_classification: HMM3_innovative declares the framework's intended target level.

Reference corpus

42 published references organized into 6 categories. v3 adds 17 net items to v2's 25: +15 industrialization frameworks, +2 inspiration tools, and +2 vendor research entries, minus 2 from consolidating v2's primary_methodology from 8 entries to 6:

  • industrialization_frameworks_v3 (15, NEW in v3) — including Palantir ADS, MaGMa, TaHiTI, SOC-CMM, MITRE 11 Strategies, awesome-soc (cyb3rxp), awesome-incident-response (meirwah), awesome-threat-detection (0x4D31), ThreatHunter-Playbook (OTRF), Florian Roth Detection Engineering Cheat Sheet, Crafting the InfoSec Playbook (Bollinger et al.), Atomic Red Team, Sigma schema
  • related_tools_for_inspiration (2, NEW in v3) — Hayabusa, EnableWindowsLogSettings (both Yamato Security, Tokyo) cited as third-party prior art*
  • primary_methodology (6, consolidated from v2's 8) — Mandiant Targeted Attack Lifecycle, Lockheed Kill Chain, MITRE ATT&CK v16, Bianco Pyramid of Pain, Diamond Model, F3EAD
  • case_studies_2025 (4, carried from v2) — DFIR Report walkthroughs, M-Trends, CISA #StopRansomware advisories
  • vendor_research (10, +2 vs v2) — adds Roberto Rodriguez (OTRF) and Zach Mathis (Yamato Security, Tokyo)*
  • standards (5, carried from v2) — NIST SP 800-61/86/150, ISO 27035, ENISA IH

* Yamato Security is an independent Tokyo-based DFIR group; Agentic-DART has no affiliation or partnership with them. Their tools are cited as external community references and field-calibration prior art only — no code or rules are imported.
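
The category counts above reconcile with the stated totals. The short check below uses only numbers from this page; the variable names are just for the example.

```python
# Per-category counts as listed for v3.
v3_counts = {
    "industrialization_frameworks_v3": 15,
    "related_tools_for_inspiration": 2,
    "primary_methodology": 6,
    "case_studies_2025": 4,
    "vendor_research": 10,
    "standards": 5,
}
v3_total = sum(v3_counts.values())   # 42

# Net change vs v2: three additions, minus the primary_methodology
# consolidation from 8 entries to 6.
net_added = 15 + 2 + 2 - (8 - 6)     # 17
v2_total = v3_total - net_added      # 25
```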


Methodology lineage (inherited from v2, still authoritative in v3)

This section documents the methodological foundation that v2 first encoded and that v3 inherits unchanged. v3's industrialization scaffolds (above) sit on top of this lineage; the runtime path that the agent actually executes today is still the one encoded here. Operators forking v3 should read this section to understand why each phase, decision rule, and contradiction trigger is shaped the way it is.

v2 (released 2026-04-30, 845 lines) synthesizes the frameworks, frontline case studies, and practitioner guidance listed below into a single executable playbook. It is, in effect, an audit-chained encoding of how a senior analyst with 10+ years of frontline IR experience would approach a case.

Primary frameworks

  • Mandiant M-Trends 2026 — 500K hours of 2025 IR engagements; informs the posture block (14-day dwell time, 22-second hand-off, 32%/11%/10% initial-access priors)
  • Mandiant Targeted Attack Lifecycle — 8-phase model from Initial Recon to Complete Mission
  • SANS PICERL — Preparation / Identification / Containment / Eradication / Recovery / Lessons learned
  • Lockheed Martin Cyber Kill Chain — Hutchins, Cloppert & Amin 2011, Intelligence-Driven Computer Network Defense
  • David Bianco — Pyramid of Pain (TTPs over IOCs) + Hunting Maturity Model
  • Diamond Model of Intrusion Analysis — Caltagirone, Pendergast, Betz 2013 (adversary / capability / infrastructure / victim)
  • MITRE ATT&CK Enterprise v16 — 14 tactics, 200+ techniques, fully mapped
  • F3EAD — Find, Fix, Finish, Exploit, Analyze, Disseminate (originally U.S. military targeting; standard in modern DFIR)
  • NIST SP 800-61 / 800-86 / 800-150

Case studies grounded in frontline reports (2024–2026)

  • The DFIR Report — BlackSuit, Akira, Fog, Lynx, BlueSky, RansomHub, MEOWBACKCONN
  • CISA #StopRansomware — Akira AA24-109A (updated Nov 2025)
  • Verizon DBIR 2025/2026 — vulnerability exploitation +180%, third-party compromise 30% of breaches

Field practitioners cited per technique

  • Sean Metcalf — Active Directory attack detection, Kerberoasting/AS-REP roasting
  • Sarah Edwards — macOS forensic analysis, KnowledgeC, unified log
  • Patrick Wardle — The Art of Mac Malware persistence catalog
  • Hal Pomeranz — Linux IR workflows, auditd methodology
  • Eric Zimmerman — Windows artifact field semantics
  • Andrew Case — memory forensics, Volatility
  • Florian Roth — detection corpus, Sigma rules
  • JPCERT/CC — Detecting Lateral Movement through Tracking Event Logs

The 10 phases

P0  Volatility & scope               memory, sockets, credential signals
P1  Initial access vector triage     exploit (32%) / vishing (11%) / IAB (10%)
P2  Timeline reconstruction          MFT + AmCache + Prefetch + auditd + journal
P3  Anomaly surfacing                list anomalies WITHOUT explaining them
P4  Hypothesis formation             falsifiable, MITRE-named, data-source-named
P5  Kill-chain assembly              >=3 tactics, monotonic timestamps, audit_id
P6  Contradiction handling           UNRESOLVED -> revise (architecturally enforced)
P7  Attribution / Diamond Model      adversary / capability / infrastructure / victim
P8  Recovery-denial check            identity / virtualization / backup
P9  Finding emission                 audit_id citation enforced by serializer

Each phase has:

  • rationale — why this order. Cited to source.
  • pyramid_layer — where it sits in Bianco's Pyramid (foundation / middle / top / orientation / deliverable)
  • mcp_calls — which dart-mcp functions to invoke
  • anti_patterns — what naive analysts do wrong
  • senior_analyst_heuristic — what experienced analysts actually do
  • exit_criteria — when the phase is closed
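
An operator forking the playbook may want to confirm that each phase entry carries all six fields before running it. The following is a minimal sketch under the assumption that a phase is a plain dict, as it would be after loading the YAML; `phase_problems` is a hypothetical helper, not part of dart-playbook.

```python
# The six per-phase fields and the pyramid layers named in the list above.
PHASE_FIELDS = (
    "rationale", "pyramid_layer", "mcp_calls",
    "anti_patterns", "senior_analyst_heuristic", "exit_criteria",
)
PYRAMID_LAYERS = {"foundation", "middle", "top", "orientation", "deliverable"}


def phase_problems(phase: dict) -> list[str]:
    """Return human-readable problems found in one phase entry."""
    problems = [f"missing field: {name}" for name in PHASE_FIELDS
                if name not in phase]
    layer = phase.get("pyramid_layer")
    if layer is not None and layer not in PYRAMID_LAYERS:
        problems.append(f"unknown pyramid_layer: {layer}")
    return problems
```

Reporting every problem at once, rather than stopping at the first, suits a hand-edited file where several fields may have been dropped in one go.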

Anatomy of senior-analyst-v3.yaml

senior-analyst-v3.yaml is the canonical and default playbook. Below is its top-level shape — the v2 carry-over keys (target_case_classes, posture, sequence, next_call_decisions, contradiction_triggers, stop_conditions) are unchanged from v2, and four new top-level keys are added for the v3 industrialization frameworks. v2 remains in the repo for reproducibility of pre-industrialization runs.

version: 3
name: senior-analyst-v3
created: 2026-05-01
supersedes: senior-analyst-v2

methodology_lineage:               # 13 cumulative citations (v2 + v3)
  - mandiant_targeted_attack_lifecycle
  - lockheed_kill_chain
  - mitre_attack_v16
  - bianco_pyramid_of_pain
  - diamond_model
  - f3ead
  # v3 additions:
  - palantir_ads_framework
  - magma_ucf
  - tahiti_threat_hunting
  - bianco_hunting_maturity_model
  # ... (full list in file)

# === v3 industrialization additions (4 framework blocks) ===

ads_template:                      # Palantir 9-section detection contract
  required_sections: [Goal, Categorization, Strategy_Abstract,
                      Technical_Context, Blind_Spots_Assumptions,
                      False_Positives, Validation, Priority, Response]
  lint_modes: [permissive, warn, strict]
  current_default: warn

magma_ucf:                         # FI-ISAC NL three-tier UCF
  l1_business_drivers: [...]
  l2_attack_patterns: [...]
  l3_detection_coverage: [...]
  uc_id_format: "UC-DART-NNNN"
  cmmi_levels: 5

hunt_cycle:                        # TaHiTI H1 / H2 / H3
  trigger: "any phase exits with confidence < 0.6 after iterations >= 8"
  phases: [H1_initiate, H2_hunt, H3_finalize]

hunting_maturity_model:            # Bianco HMM 0-4, operationalized
  levels: [HMM0_initial, HMM1_minimal, HMM2_procedural,
           HMM3_innovative, HMM4_leading]
  agentic_dart_self_classification: HMM3_innovative

# === v2 carry-over (unchanged) ===

target_case_classes: [...]         # 10 case classes (insider, remote-hands,
                                   #   LotL, ransomware, identity, vishing,
                                   #   exploit, third-party, cloud-hybrid,
                                   #   division-of-labour)

posture:                           # M-Trends priors
  dwell_time_assumption_days: 14
  initial_access_priors: [...]
  attacker_speed_assumption: {...}

sequence:                          # 10 phases, P0-P9 (unchanged from v2)
  - phase: P0_scope_and_volatility
    pyramid_layer: orientation
    rationale: |
      Memory and network state evaporate on reboot. Process tree,
      open sockets, and loaded drivers must be captured before
      anything else, even before reading disk artifacts.
      Senior-analyst principle (Eric Zimmerman): "Order of volatility
      is not a suggestion; it's a one-way door."
    mcp_calls: [get_process_tree, detect_credential_access]
    anti_patterns:
      - "Pulling the disk image before snapshotting memory"
      - "Rebooting 'to be safe' - destroys all volatile evidence"
      - "Running antivirus scan as first action - may quarantine evidence"
    exit_criteria:
      process_tree_captured: true
      credential_access_signals_logged: true
  # ... (P1-P9, see file for full)

next_call_decisions:               # 24 state -> tool routing rules
  - when_state: "no MFT timeline yet"
    call: extract_mft_timeline
    confidence_gain: 0.20
    rationale: "MFT is foundational - Eric Zimmerman: 'MFT is god'"
  # ...

contradiction_triggers:            # 7 architectural contradictions
  - id: timestomp_predates_alert
    rule: "If $SI < $FN AND mismatch_ts < alert_ts, persistence pre-existed"
    severity: critical
    mitre: T1070.006
  # ...

stop_conditions:                   # 6 termination conditions
  - condition: confidence >= 0.92
    action: emit_findings
  - condition: hypothesis_revision_count >= 5
    action: declare_complex_case_request_human
    note: |
      A case that has revised the hypothesis 5+ times is beyond what
      automated reasoning should commit to. Hand off to a human
      analyst with the audit chain attached.

references: {...}                  # 6 categorized reference groups
operator_notes: |                  # Senior-analyst principles
  ...
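
To make the `timestomp_predates_alert` rule concrete: `$SI` and `$FN` are the NTFS `$STANDARD_INFORMATION` and `$FILE_NAME` timestamp attributes, and an `$SI` time earlier than `$FN` is a common timestomping indicator. The sketch below evaluates the rule exactly as written in the YAML. The function signature is hypothetical, how `mismatch_ts` is derived is left to the caller, and the dates are invented sample values.

```python
from datetime import datetime, timezone


def timestomp_predates_alert(si_created, fn_created, mismatch_ts, alert_ts):
    """Rule from the playbook: $SI < $FN AND mismatch_ts < alert_ts."""
    return si_created < fn_created and mismatch_ts < alert_ts


utc = timezone.utc
fired = timestomp_predates_alert(
    si_created=datetime(2019, 1, 1, tzinfo=utc),          # backdated $SI
    fn_created=datetime(2026, 4, 2, 9, 15, tzinfo=utc),   # $FN creation time
    mismatch_ts=datetime(2026, 4, 2, 9, 15, tzinfo=utc),  # assumed by caller
    alert_ts=datetime(2026, 4, 10, 14, 23, tzinfo=utc),
)
print(fired)  # True
```

When the rule fires, the playbook's reading is that persistence pre-existed the alert, so a hypothesis that starts the intrusion at alert time has to be revised.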

How the playbook gets executed

The agent reads the playbook at startup. Each iteration, it:

  1. Determines the current phase based on what's been done
  2. Reads next_call_decisions to pick the next MCP call
  3. Invokes the call through dart-mcp (which is bounded by the architecture-first surface)
  4. Logs result to the audit chain (dart-audit)
  5. Runs dart-corr to surface contradictions
  6. If a contradiction matches a contradiction_trigger, the hypothesis is mandatorily revised
  7. Checks stop_conditions to decide whether to emit findings

In deterministic mode the agent follows this YAML literally. In live mode Claude can deviate, but every call still goes through the typed dart-mcp surface, and the contradiction triggers + stop conditions still apply.
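
The seven steps can be pictured as a small loop. The sketch below is a deliberately simplified stand-in, not the code in dart_agent: `run_case`, its callbacks, the additive confidence model, and the "first rule not yet called" selection (in place of real phase tracking and `when_state` matching) are all assumptions. The two stop conditions and their action names are the ones shown in the YAML excerpt.

```python
def run_case(playbook, call_tool, find_contradictions, max_iterations=50):
    """Drive one case through the playbook; returns (action, state)."""
    state = {"called": [], "audit": [], "confidence": 0.0, "revisions": 0}
    trigger_ids = {t["id"] for t in playbook["contradiction_triggers"]}

    for _ in range(max_iterations):
        # Steps 1-2 (simplified): take the first routing rule whose
        # call has not been made yet.
        rule = next((r for r in playbook["next_call_decisions"]
                     if r["call"] not in state["called"]), None)
        if rule is None:
            return "no_calls_left", state

        result = call_tool(rule["call"])                    # step 3
        state["audit"].append({"call": rule["call"],        # step 4
                               "result": result})
        state["called"].append(rule["call"])
        # Assumption: confidence accumulates additively, capped at 1.0.
        state["confidence"] = min(
            1.0, state["confidence"] + rule["confidence_gain"])

        # Steps 5-6: a contradiction matching a trigger forces a revision.
        if trigger_ids & set(find_contradictions(state["audit"])):
            state["revisions"] += 1

        # Step 7: stop conditions from the playbook excerpt.
        if state["confidence"] >= 0.92:
            return "emit_findings", state
        if state["revisions"] >= 5:
            return "declare_complex_case_request_human", state

    return "iteration_budget_exhausted", state
```

Passing `call_tool` and `find_contradictions` in as callables keeps the loop independent of how dart-mcp and dart-corr are actually wired, which this page does not specify.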


Writing your own playbook

  1. Copy senior-analyst-v3.yaml to <your-name>-v1.yaml
  2. Update target_case_classes for your scope
  3. Tune next_call_decisions for your environment's priorities
  4. Add environment-specific contradiction_triggers
  5. Optionally adjust ads_template, magma_ucf, and hunt_cycle for your SOC's maturity profile
  6. Run with --playbook <your-name>-v1.yaml

The agent will follow your sequencing while the architectural guarantees (read-only, audit-chained, contradiction-aware) are unchanged. A playbook cannot loosen architectural guarantees. It can only choose what to call from the surface, never expand the surface.
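
Before running a forked playbook, it can help to confirm that it only names functions that exist on the surface. The sketch below assumes an allow-list check. `MCP_SURFACE` here holds just the three function names that appear on this page and is not the real dart-mcp surface; `calls_outside_surface` is a hypothetical helper.

```python
# Illustrative allow-list; the real surface is defined by dart-mcp.
MCP_SURFACE = frozenset({
    "get_process_tree", "detect_credential_access", "extract_mft_timeline",
})


def calls_outside_surface(playbook: dict, surface=MCP_SURFACE) -> set[str]:
    """Return every call the playbook names that the surface does not offer."""
    requested = set()
    for phase in playbook.get("sequence", []):
        requested.update(phase.get("mcp_calls", []))
    for rule in playbook.get("next_call_decisions", []):
        requested.add(rule["call"])
    return requested - set(surface)
```

An empty result means the fork only selects from the surface. A non-empty one names calls the agent could never make, whatever the YAML says.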


Files

dart_playbook/
├── README.md
├── senior-analyst-v1.yaml      # 133 lines, 4 phases (legacy; v0.5.2 fixed memory-fn refs)
├── senior-analyst-v2.yaml      # 845 lines, 10 phases (methodology baseline; retained for reproducibility)
└── senior-analyst-v3.yaml      # 10 phases + 4 framework blocks (default playbook)

Six principles every senior analyst remembers

(From senior-analyst-v3.yaml::operator_notes, inherited unchanged from v2)

  1. Phase order is strict. Memory disappears. Volatility before disk, always.
  2. Hypotheses are falsifiable. "Something bad happened" is not a hypothesis. "T1003.001 LSASS dump via comsvcs.dll executed at 14:23:09 UTC" is.
  3. Contradictions are gold. When two artifacts disagree, that's the most valuable signal in the case. Smoothing it over is malpractice.
  4. Recovery-denial check is mandatory for any modern ransomware case (M-Trends 2026 #1 trend). Endpoint encryption is the diversion, not the impact.
  5. Attribution is multi-vector. Diamond Model with 4 corners or no attribution claim. Single-IOC attribution is what gets analysts fired.
  6. Findings cite audit_ids. Always. The serializer refuses anything else — that's not a guideline, that's architecture.
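
Principle 6 is the easiest to picture in code. The sketch below shows one way a serializer could refuse an uncited finding; it is an illustration under assumed names (`serialize_finding`, `MissingAuditId`, an `audit_ids` list field, and the sample ID), not the project's serializer.

```python
import json


class MissingAuditId(ValueError):
    """Hypothetical error for a finding that cites no audit chain entry."""


def serialize_finding(finding: dict) -> str:
    """Serialize a finding to JSON, refusing any finding without audit_ids."""
    audit_ids = finding.get("audit_ids")
    if not audit_ids:
        raise MissingAuditId("finding must cite at least one audit_id")
    return json.dumps(finding, sort_keys=True)


cited = {
    "claim": "T1003.001 LSASS dump via comsvcs.dll",
    "audit_ids": ["a-0042"],  # made-up ID for the example
}
print(serialize_finding(cited))
```

Putting the check inside the serializer, rather than in a review step, is what turns the rule from a guideline into a property of the output format.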
