wiki: bring release log to v1.2.0, document model-aware auth + live Sigma pack, add self-learning loop design
wiki: correct tool count to live 73 (48 native + 25 SIFT); drop v1.0.2 version pin
The current-surface counts were stale: 72 (47 native) -> 73 (48 native) after the
Sigma matcher tool landed. Fixed in Glossary, Live-mode, Phase-1 (the live-surface
line), and Roadmap. The Glossary's 'As of v1.0.2' version pin is dropped so the
count needn't carry a release number. The Phase-1 changelog row for v0.7.1 keeps
its then-current '72' — that's an accurate historical record, not the live count.
docs(wiki): reconcile tactic coverage to 10/12 (was 11)
FAQ/Phase-1/Roadmap claimed TA0011 (C2) was covered by detect_dns_tunneling
(only TA0009 deferred = 11/12), contradicting accuracy-report + README +
DEVPOST + Pages (10/12). Per the conservative scoped-rule standard, both
TA0009 (Collection) and full TA0011 (C2) are Phase-2; detect_dns_tunneling
adds partial DNS-tunneling C2 indicators. Dated v0.6.1 history rows left
as-is.
docs: align wiki with current live-mode scope
Document live mode through ANTHROPIC_API_KEY and --dry-run, remove public zero-cost/OAuth setup claims, and update Claude MCP registration to dart_mcp.server_stdio.
Refresh accuracy evidence counts to 62 reference files and 67 realistic files, clarify that the measured identical result applies to case-01 F-001/F-013, and remove stale 50-file language.
Update operator, SIFT, macOS, roadmap, and Phase 1 pages to the 72-tool surface and current full-suite validation model without stale 35-tool or 75-test guidance.
Fix the Home architecture link and describe external entries as case-study slots instead of fully measured benchmark rows.
QA: git diff --check passed for the wiki.
wiki QA pass: file count 49->50, test count 31->75 (current snapshots only)
post-v0.7.1 QA audit caught two latent drifts:
evidence file count:
- Accuracy.md L64: the sample-evidence-realistic figure of '49 files' was
correct as of the v0.7.0 evidence-fidelity enrichment, but v0.7.1 added the
linux/cron/sample.crontab fixture, raising the count to 50.
measure_accuracy --variant realistic now reports
evidence_files_measured: 50 against ground truth F-001 + F-013, which
matches the actual repo state.
test count:
- Operator-guide.md L55 step-by-step quick-start
- Phase-1.md L50 Empirical-validation 'fresh clone' summary
- Roadmap.md L60 Phase-1 validation summary
- Running-on-macOS.md L57 step header + L134 Apple Silicon notes
all said '31 tests' (the v0.5.2 snapshot baseline). v0.7.1 ships
'75 of 75 tests passing'. Updated only the present-tense fresh-clone
claims; the historical v0.5.2 release row in Phase-1.md L109
('-> 31 tests passing') is preserved verbatim as a dated milestone.
wiki: reflect v0.6.1 TA0011 entry — detect_dns_tunneling ships
Three pages had TA0011 (Command-and-Control) listed as 'deferred to
Phase 2' or 'partial coverage'. v0.6.1's detect_dns_tunneling adds:
- Iodine and dnscat2 tool signature detection
- Shannon-entropy on subdomain labels (threshold 3.8)
- Long-label heuristic (>50 chars, near DNS spec max 63)
- Rare query-type flagging (TXT / NULL / CNAME with subdomain)
- Per-parent-domain volume in sliding window
- BIND9 / dnsmasq / generic FQDN-extraction fallback parsers
This opens active TA0011 coverage at the analysis layer. Full PCAP-based
C2 detection is still Phase 2, but the typed MCP surface now meaningfully
covers the tactic via DNS log analysis.
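The entropy and long-label heuristics listed above can be sketched as follows. This is a minimal illustration, not the detect_dns_tunneling implementation: the 3.8 threshold and the 50-character limit come from this message, while the function names, the strict ">" comparison, and the "last two labels are the parent domain" rule are assumptions made for the example.

```python
from __future__ import annotations

import math
from collections import Counter

ENTROPY_THRESHOLD = 3.8  # bits per character, as stated in the commit message
LONG_LABEL_CHARS = 50    # as stated in the commit message; the DNS label limit is 63


def shannon_entropy(label: str) -> float:
    """Return the Shannon entropy of a label in bits per character."""
    if not label:
        return 0.0
    total = len(label)
    counts = Counter(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def suspicious_labels(fqdn: str, parent_labels: int = 2) -> list[tuple[str, str]]:
    """Return (label, reason) pairs for subdomain labels that trip a heuristic.

    Assumption: the last `parent_labels` labels form the parent domain, so
    only the labels to their left are inspected. A real parser would need a
    public-suffix list to handle names such as example.co.uk correctly.
    """
    labels = fqdn.rstrip(".").lower().split(".")
    subdomain = labels[:-parent_labels] if len(labels) > parent_labels else []
    hits: list[tuple[str, str]] = []
    for label in subdomain:
        if len(label) > LONG_LABEL_CHARS:
            hits.append((label, "long-label"))
        if shannon_entropy(label) > ENTROPY_THRESHOLD:
            hits.append((label, "high-entropy"))
    return hits
```

A label built from 16 distinct characters has an entropy of exactly 4.0 bits per character and is flagged, while an ordinary label such as "www" is not. Short labels can never reach 3.8 bits, which is one reason the entropy check is paired with the other indicators rather than used alone.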
Pages updated: FAQ.md L99, Phase-1.md L36, Roadmap.md L41.
TA0009 Collection remains the single tactic explicitly deferred — that
is collector-side (live memory capture) rather than analysis-side, which
is by design for an architecture that consumes pre-collected evidence.
wiki: naturalize hardcoded counts (Source of Truth lives in README Hero)
Following the same Single-Source-of-Truth cleanup applied to the main
repo: wiki pages no longer hardcode '67 typed functions / 42 native +
25 SIFT adapters / 10 of 12 MITRE / 55 tests / 1182 lines'. Phrasing
shifts to 'the typed MCP surface', 'native + SIFT adapters', 'broad
MITRE enterprise tactic coverage'.
Phase-1.md historical version table preserves period-specific numbers
(v0.3 = 31 functions, v0.4 = 35 native, v0.5 = 60 functions) because
those are historical facts about what shipped on those dates, not
claims about current state.
The canonical exact name set continues to live in
tests/test_mcp_surface.py — the only place that needs editing when a
function is added or removed.
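A surface test of this kind can be sketched as below. The contents of tests/test_mcp_surface.py are not shown in this log, so everything here is hypothetical: EXPECTED_TOOL_NAMES, registered_tool_names, and surface_drift are illustrative stand-ins, and the two tool names are only examples borrowed from elsewhere in this history.

```python
from __future__ import annotations

import unittest

# Hypothetical canonical name set. In this pattern it is the single place
# that is edited when a function is added or removed.
EXPECTED_TOOL_NAMES = frozenset({"detect_dns_tunneling", "correlate_timeline"})


def registered_tool_names() -> set[str]:
    """Placeholder for however the real server enumerates its tools."""
    return {"detect_dns_tunneling", "correlate_timeline"}


def surface_drift(expected: frozenset[str] | set[str], actual: set[str]) -> dict[str, list[str]]:
    """Report names that are missing from, or unexpected in, the live surface."""
    return {
        "missing": sorted(set(expected) - actual),
        "unexpected": sorted(actual - set(expected)),
    }


class SurfaceTest(unittest.TestCase):
    def test_surface_matches_expected_names(self) -> None:
        drift = surface_drift(EXPECTED_TOOL_NAMES, registered_tool_names())
        # Comparing names rather than a bare count makes the failure message
        # say which function drifted, not just that the total changed.
        self.assertEqual(drift, {"missing": [], "unexpected": []})
```

Because the assertion is on the name set, documentation can refer to "the typed MCP surface" without repeating a number that would need to change with every release.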
wiki: sweep stale 35-native / 60-total counts to current 42 / 67
16 wiki pages had pre-v0.6.0 numeric references that survived earlier
QA rounds. Surface count was bumped 60 -> 67 in v0.6.0 (six new
supply-chain IOC functions in dart_mcp._v05_supply_chain), and native
count went 35 -> 42, but a number of wiki pages still showed the old
numbers.
Pages corrected:
About-the-name, Architecture-deep-dive,
Architecture-first-vs-prompt-first, Case-PtH-Timestomp, FAQ,
Glossary, Home, Live-mode, MCP-function-catalog, Phase-1,
Roadmap, SIFT-adapter-layer, The-Memex-Bet, _Sidebar, dart-mcp
Phase-1.md version history table preserves the historical numbers
(v0.4 = 35 native, v0.5 = 60 functions) as those are historical
facts, not current state.
MITRE coverage also corrected from 11/12 -> 10/12 (TA0009 Collection
and TA0011 C2 are Phase 2).
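A sweep like this one can be partly automated. The sketch below is an assumption about how such a check might look, not a script that exists in the repo: the stale phrases are taken from this message, while the regex shapes, function names, and the flat *.md wiki layout are illustrative.

```python
from __future__ import annotations

import re
from pathlib import Path

# Stale phrases named in the commit message; the exact regexes are assumptions.
STALE_PATTERNS = [
    re.compile(r"\b35 native\b"),
    re.compile(r"\b60 (?:typed |forensic )*functions\b"),
    re.compile(r"\b11/12\b"),
]


def find_stale_counts(text: str) -> list[tuple[int, str]]:
    """Return (line number, matched phrase) for each stale count in the text."""
    hits: list[tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in STALE_PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append((lineno, match.group(0)))
    return hits


def scan_wiki(root: Path) -> dict[str, list[tuple[int, str]]]:
    """Scan every Markdown page directly under `root` and collect the hits."""
    report: dict[str, list[tuple[int, str]]] = {}
    for page in sorted(root.glob("*.md")):
        hits = find_stale_counts(page.read_text(encoding="utf-8"))
        if hits:
            report[page.name] = hits
    return report
```

Hits still need human review: the Phase-1.md version history table keeps its period-specific numbers on purpose, so a match there is a historical record rather than a stale claim.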
wiki(qa-r5): playbook v3 surface — honest framing + line count + v2/v3 default fixes
Pairs with main repo commit 77f2334. Twelve files touched on the wiki side:
- dart-playbook.md ........... v3 'industrialization' section rewritten
with 'data scaffold; runtime activation post-SANS' framing.
Anatomy section flipped from senior-analyst-v2.yaml to
senior-analyst-v3.yaml with v3-additions vs v2-carry-over grouping.
Bundled-playbooks table line count 1135 → 1182. Forking
instructions now point at v3 as source. Operator-notes citation
moved to v3. 'Six principles every senior analyst remembers'
sourced from v3 (inherited from v2). 'See also' adds v3 link.
- Phase-1.md ................. v3 line count 1135 → 1182. 'Playbook
v3.1' release-history row clarified to 'Playbook v3 patch (no
separate v3.1 file)'.
- Roadmap.md ................. v3 line item rewritten with
'YAML data scaffolds' framing + issue #44 link + line count update.
- SIFT-adapter-layer.md ...... 'playbook v3.1' → 'playbook v3'.
- The-Memex-Bet.md ........... 'Playbook v2' → 'Playbook v3 (default)'.
- Case-IP-KVM.md ............. v1 historical context preserved with
a 'now default in v3' annotation appended.
- Case-PtH-Timestomp.md ...... same v1 historical / v3 current-default
annotation pattern.
- Writing-case-studies.md .... v1 reference → v3 default in the
next_call_decisions tuning instruction.
== Why this matters ==
A SANS judge reading dart-playbook.md and then opening
dart_agent/__init__.py would have found the 'HMM operationalized in
the agent' / 'every run self-classifies' / 'triggered when any phase
exits' claims absent from the runtime path. Round 5 fixes that —
documentation and code now agree, with the runtime activation work
explicitly deferred and tracked at issue #44.
No code changes on the wiki side; pure documentation. Main repo's
77f2334 covers the v3 yaml header and the source tree.
wiki(qa-r3): fix '6-test bypass suite' → '7-test' in 3 locations
mcp_bypass added a 7th test in v0.5.2:
test_correlate_timeline_rejects_sql_injection_attempts
Phase-1.md (1 mention) and Roadmap.md (2 mentions) all updated.
Pairs with main repo commit 4cea439 (QA round 3).
wiki(qa-r2): sync 22→31 tests, add v0.5.1/v0.5.2 timeline, v1 playbook line count
Follow-up sync after main repo's v0.5.2 landed (defensive runtime
guards + 3 regression tests). The recent on-main 'wiki — 13 pages
updated' sweep correctly moved every surface to 60 tools, but the
test count rose from 22 to 31 in v0.5.2 and a few wiki pages
hadn't caught up.
Counts (5 files):
- FAQ.md '22 / 22 tests passing' → 31 / 31
- Operator-guide.md 'All 22 tests should print OK' → 31
- Phase-1.md '22 of 22 tests passing' → 31 of 31
- Roadmap.md '22 of 22 tests passing' → 31 of 31
- Running-on-macOS.md 'Run all 22 tests' / 'All 22 tests pass on M1/M2/M3' → 31
Timeline (Phase-1.md):
- Added v0.5.1 row (2026-05-03 — Evergreen visuals + full-surface QA)
- Added v0.5.2 row (2026-05-03 — Defensive runtime guards + 31 tests)
- Reordered v0.4.1 / Playbook v3 / v3.1 chronologically so the table
reads top-to-bottom in actual ship order rather than the previous
near-random sequence
Playbook line counts (dart-playbook.md, 2 places):
- senior-analyst-v1.yaml 128 → 133 lines
(v0.5.2 patched the volatile_first phase to reference real registry
tools; the Memory Capture phase grew by 5 lines with the explanatory
rationale comment)
- Annotated the legacy comment so future readers know why v1 still has
a 'memory' phase even though native memory functions aren't on the
v0.5 registry
Phase-1's two intentionally-historical rows preserved verbatim:
- 'v0.4 → 35 native, 20 tests' — release-time state
- 'v0.5 → 60, 22 tests' — release-time state
These are timeline facts, not status claims, so they do NOT bump.
wiki QA pass: synchronize 13 pages to v0.5 reality (60 tools, 22 tests)
Companion to main repo commit 52f975d (v0.5.1 QA pass).
Updated to reflect the v0.5 SIFT adapter layer (35 native + 25 SIFT
= 60 typed read-only MCP tools) and the v0.5 test suite expansion
(20 → 22 cases):
About-the-name.md
'The 35 typed dart-mcp functions cover...' →
'The typed dart-mcp surface (35 native + 25 SIFT Workstation
adapters = 60 functions) covers...'
Test count 20/20 → 22/22 across all references.
Architecture-deep-dive.md
ASCII architecture box: 'dart-mcp 35 typed forensic functions'
→ 'dart-mcp 60 typed forensic functions (35 native + 25 SIFT)'
Architecture-first-vs-prompt-first.md
'The MCP surface is exactly 35 functions, by name' →
'The MCP surface is exactly 60 typed functions, by name (35
native + 25 SIFT Workstation adapters)'
Case-PtH-Timestomp.md (2 references) updated parallel to docs/.
FAQ.md
Question heading: 'Is the MCP surface really exactly 35
functions?' → 'Is the MCP surface really fixed in size?'
Answer body: counts updated to 60 / 22-22.
Glossary.md
dart-mcp definition: 35 → 60.
'For Agentic-DART v0.4: exactly 35' →
'For Agentic-DART v0.5: 60 (35 native + 25 SIFT Workstation
adapters)'
Home.md (TOC)
'the 35 forensic functions, schema, bypass tests' →
'the 60 forensic functions (35 native + 25 SIFT adapters),
schema, bypass tests'
'why the MCP surface is exactly 35 functions, not 28, not 35'
rephrased to avoid count-anchoring.
Live-mode.md (2 references) parallel to docs/.
MCP-function-catalog.md
Page title: '· 35 typed forensic functions'
→ '· 60 typed forensic functions (35 native + 25 SIFT
Workstation adapters)'
Operator-guide.md
'All 20 tests should print OK' → 'All 22 tests should print OK'
Phase-1.md
Body: '35 typed forensic functions' / '20 of 20 tests passing'
counts updated.
Timeline table: ADDED row for 2026-05-02 v0.5 (SIFT Workstation
tool adapter layer → 60 functions, 22 tests passing). v0.4
historic row preserved verbatim.
Roadmap.md
Three references to 35 / 20-20 updated to v0.5 numbers.
Running-on-macOS.md
'Step 3 — Run all 20 tests' → '... 22 tests'
'All 20 tests pass on M1/M2/M3' → 'All 22 tests pass on M1/M2/M3'
The-Memex-Bet.md
'MCP surface (35 typed functions)' →
'MCP surface (60 typed functions: 35 native + 25 SIFT adapters)'
'The 35 functions are not a guideline...' →
'The 60 functions (35 native + 25 SIFT Workstation adapters)
are not a guideline...'
_Sidebar.md
Two TOC labels: '(35 functions)' → '(60 functions: 35 native +
25 SIFT)'
dart-mcp.md
'exposes exactly 35 typed forensic functions' →
'exposes 60 typed forensic functions (35 native + 25 SIFT
Workstation adapters)'
Section heading 'The 35 functions' → 'The 60 functions (35
native + 25 SIFT adapters)'
SIFT-adapter-layer.md
Preserved verbatim — line 18 'its own 35 forensic functions'
is historic context describing the pre-v0.5 state.
wiki: Phase 1 boost — dedicated page + Roadmap expansion
== The problem ==
Phase 1 was visually understated relative to Phases 2/3/4:
Roadmap.md before: P1=35 lines, P2=40, P3=43, P4=24
P1 was the SMALLEST despite being the current focus.
This created the impression that Phase 1 was a thin foundation
followed by ambitious future plans, when in fact Phase 1 IS the
SANS submission and contains essentially all the load-bearing
architecture.
== Fixes ==
1. Roadmap.md Phase 1 section — expanded from 35 to 79 lines:
* NEW intro paragraph explaining what 'agentic DFIR' means
* NEW 'architecturally complete because' bullet block
enumerating the 5 architectural guarantees that propagate
unchanged into Phases 2/3/4
* REORGANIZED 'Done' into 5 subsections: Core architecture,
Cross-platform coverage, Methodology (3 playbook versions),
Validation, Documentation
* NEW 'Remaining for Phase 1' table with status + issue links
* NEW 'What Phase 1 explicitly does NOT do' section (5 items
with deferred-to-Phase explanation, each with issue link)
2. Roadmap.md intro — added at-a-glance phase summary table
showing Phase 1 status (~95% complete, closes 2026-06-15) at
the top of the page
3. NEW dedicated page: Phase-1.md (~140 lines)
* Operator's-eye summary written for someone who lands on
this page directly without reading the full Roadmap
* Sections: in-one-sentence / what ships / what remains /
what we explicitly DO NOT do / versions shipped / where
to go next
* Versions table chronicles every release Apr 28 → May 01
* Cross-links to Memex Bet, Architecture deep dive, Threat
model, Running guides, dart-playbook
4. _Sidebar.md — P1 link updated:
* Was: anchor link to Roadmap#phase-1
* Now: dedicated [Phase-1] page (more prominent)
* Sidebar Roadmap entry now shows '~95% complete' subtitle
5. Home.md — P1 link updated to dedicated page + bullets enriched
with status / closing date / Phase 2/3/4 timing
== Result ==
Roadmap.md after: P1=79 lines, P2=40, P3=43, P4=24
Plus dedicated Phase-1 page accessible from Sidebar + Home
Wiki broken links: 0 maintained
Wiki page count: 26 → 27
wiki: feature Playbook v3 (industrialization release) on dart-playbook page
== dart-playbook.md ==
- Bundled playbook table updated: v3 (1135 lines) is now default,
v2 (845 lines) demoted to 'methodology baseline', v1 kept for demos
- Added new section 'senior-analyst-v3 — industrialization release'
before the existing v2 section, covering:
* Palantir ADS Framework (9-section detection contract)
* MaGMa Use Case Framework (L1/L2/L3 + CMMI 5-level maturity)
* TaHiTI threat hunt cycle (H1/H2/H3)
* Bianco Hunting Maturity Model (HMM 0-4) operationalized
- Reference corpus expanded to 39 with hyperlinks to awesome-soc,
awesome-incident-response, awesome-threat-detection, ThreatHunter-
Playbook, Atomic Red Team, Sigma schema
== Roadmap.md ==
- Added 'Playbook v3 (2026-05-01)' entry to the Done section
immediately after the v2 entry, summarizing the 14 new references
and four framework additions
Wiki broken links: 0 maintained.
wiki(roadmap): add Playbook v2 to Done section + update v1->v2 reference
Senior-analyst playbook v2 (845 lines, 10 phases, Mandiant + Bianco +
DFIR Report + 8 frameworks) shipped 2026-04-30 but Roadmap Done section
still pointed at v1. Updated:
- Done entry adds 'Playbook v2 (2026-04-30)' between v0.4 and the
rest of the Phase 1 deliverables
- Top of Roadmap now reads 'senior-analyst-v2.yaml default;
v1.yaml legacy' instead of just v1
wiki(roadmap): record v0.4 Linux+macOS expansion in Done
wiki: comprehensive sync 31 → 35 across all pages
v0.4 raised the function count from 31 to 35. The wiki still showed the
old number on multiple pages:
About-the-name.md 'existing 31 functions stay' → 35
Architecture-deep-dive.md 'the 31 typed' → 35
Architecture-first-vs-prompt-first.md '31 functions, by name' → 35
FAQ.md 'is the surface really exactly 31?' → 35
Home.md 'the 31 forensic functions' → 35
Operator-guide.md '31' → '35'
Roadmap.md '31 typed forensic functions' → 35
Threat-model.md (no 31 references — already clean)
dart-mcp.md 'exactly 31 typed' → '35'
MCP-function-catalog.md (header was already 35)
Roadmap also gets a 'v0.4 (2026-04-30)' entry in the Done list to
record the Linux+macOS expansion.
feat: full wiki — Architecture / Operator / Threat model / Roadmap
Five pages, sidebar, written as long-form complement to the README:
Home landing + project status
_Sidebar navigation visible on every page
Architecture-deep-dive why the architecture is shaped this way
Operator-guide run dart-agent on a real SIFT case
Threat-model honest scope of the read-only MCP boundary
Roadmap phase 1-4, anti-roadmap (what we refuse)
Same voice as the README. No marketing language, no overclaim.
The threat model in particular is deliberately honest about what
the architecture does NOT defend against.