wiki: bring release log to v1.2.0, document model-aware auth + live Sigma pack, add self-learning loop design
wiki: correct tool count to live 73 (48 native + 25 SIFT); drop v1.0.2 version pin
The current-surface counts were stale: 72 (47 native) -> 73 (48 native) after the
Sigma matcher tool landed. Fixed in Glossary, Live-mode, Phase-1 (the live-surface
line), and Roadmap. The Glossary's 'As of v1.0.2' version pin is dropped so the
count needn't carry a release number. The Phase-1 changelog row for v0.7.1 keeps
its then-current '72' — that's an accurate historical record, not the live count.
revert(wiki): document ANTHROPIC_API_KEY-only auth, drop 'claude login'
Reusing the local Claude Code login triggers a refresh-token rotation that
logs the Claude Code client out; the wiki now documents the API-key path only.
Keeps the install --full and demo/output structure.
docs(wiki): simplify SIFT/Live-mode setup and surface 'claude login'
Running-on-SIFT: install.sh --full one-shot, separate Authenticate step (API
key or claude login), demo+run, and out/<tier>/<case>/<timestamp>/ outputs.
Live-mode: credentials are API key or claude login; run_eval.py is the entry point.
docs: align wiki with current live-mode scope
Document live mode through ANTHROPIC_API_KEY and --dry-run, remove public zero-cost/OAuth setup claims, and update Claude MCP registration to dart_mcp.server_stdio.
Refresh accuracy evidence counts to 62 reference files and 67 realistic files, clarify that the measured identical result applies to case-01 F-001/F-013, and remove stale 50-file language.
Update operator, SIFT, macOS, roadmap, and Phase 1 pages to the 72-tool surface and current full-suite validation model without stale 35-tool or 75-test guidance.
Fix the Home architecture link and describe external entries as case-study slots instead of fully measured benchmark rows.
QA: git diff --check passed for the wiki.
docs(wiki): fix remaining stale 'the 60' (Live-mode) and sonnet-4 model name (Accuracy)
docs(wiki): fix stale 35/60 surface counts and default model name
Live registry is 47 native + 25 SIFT = 72; code default is claude-haiku-4-5
(sonnet-4-6 is the --model higher-fidelity override).
- dart-agent.md: default claude-sonnet-4 -> claude-haiku-4-5; 35-function -> 47.
- Live-mode.md: default sonnet-4 -> haiku-4-5; 60 typed -> 72; cost-example
model name sonnet-4 -> sonnet-4-6 with the haiku default made explicit.
- SIFT-adapter-layer.md: 35 forensic functions -> 47.
(Phase-1.md v0.4/v0.5 timeline rows keep 35/60 as point-in-time history.)
wiki: naturalize hardcoded counts (Source of Truth lives in README Hero)
Following the same Single-Source-of-Truth cleanup applied to the main
repo: wiki pages no longer hardcode '67 typed functions / 42 native +
25 SIFT adapters / 10 of 12 MITRE / 55 tests / 1182 lines'. Phrasing
shifts to 'the typed MCP surface', 'native + SIFT adapters', 'broad
MITRE enterprise tactic coverage'.
Phase-1.md historical version table preserves period-specific numbers
(v0.3 = 31 functions, v0.4 = 35 native, v0.5 = 60 functions) because
those are historical facts about what shipped on those dates, not
claims about current state.
The canonical exact name set continues to live in
tests/test_mcp_surface.py — the only place that needs editing when a
function is added or removed.
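
A minimal sketch of what a pinned-name-set test of this kind might look like. It is not the contents of tests/test_mcp_surface.py: `EXPECTED_TOOLS`, `registered_tool_names()` and `surface_drift()` are hypothetical names, and the three tool names are only the ones that happen to be mentioned elsewhere in this log.

```python
# Hypothetical sketch; the real tests/test_mcp_surface.py may be organized differently.
EXPECTED_TOOLS = frozenset({
    "get_process_tree",
    "analyze_windows_logons",
    "parse_prefetch",
})


def registered_tool_names():
    """Stand-in for whatever enumerates the live MCP registry."""
    return {"get_process_tree", "analyze_windows_logons", "parse_prefetch"}


def surface_drift(expected, actual):
    """Return (missing, unexpected) tool names, each as a sorted list."""
    actual = set(actual)
    return sorted(expected - actual), sorted(actual - expected)


def test_surface_is_exact():
    missing, unexpected = surface_drift(EXPECTED_TOOLS, registered_tool_names())
    assert not missing, f"tools removed without updating the test: {missing}"
    assert not unexpected, f"tools added without updating the test: {unexpected}"
```

With the names pinned in one place, prose pages can say 'the typed MCP surface' and never need a number.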
wiki: sweep stale 35-native / 60-total counts to current 42 / 67
16 wiki pages had pre-v0.6.0 numeric references that survived earlier
QA rounds. Surface count was bumped 60 -> 67 in v0.6.0 (six new
supply-chain IOC functions in dart_mcp._v05_supply_chain), and native
count went 35 -> 42, but a number of wiki pages still showed the old
numbers.
Pages corrected:
About-the-name, Architecture-deep-dive,
Architecture-first-vs-prompt-first, Case-PtH-Timestomp, FAQ,
Glossary, Home, Live-mode, MCP-function-catalog, Phase-1,
Roadmap, SIFT-adapter-layer, The-Memex-Bet, _Sidebar, dart-mcp
Phase-1.md version history table preserves the historical numbers
(v0.4 = 35 native, v0.5 = 60 functions) as those are historical
facts, not current state.
MITRE coverage also corrected from 11/12 -> 10/12 (TA0009 Collection
and TA0011 C2 are Phase 2).
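
A sweep like this could be scripted roughly as follows. This is a sketch under assumptions: the pattern, the `HISTORICAL` exemption set, and the function names are all made up for illustration, and the real sweep may simply have been a grep.

```python
import re
from pathlib import Path

# Assumption: 35 and 60 are the superseded numbers the sweep is looking for.
STALE = re.compile(r"\b(35|60)[ -](typed|native|forensic|functions?|total)\b")
# Assumption: pages allowed to keep point-in-time numbers.
HISTORICAL = {"Phase-1.md"}


def find_stale_counts(pages):
    """pages: mapping of file name -> text. Returns (name, line_no, line) hits."""
    hits = []
    for name, text in sorted(pages.items()):
        if name in HISTORICAL:
            continue
        for line_no, line in enumerate(text.splitlines(), start=1):
            if STALE.search(line):
                hits.append((name, line_no, line.strip()))
    return hits


def load_pages(wiki_dir):
    """Read every *.md file under wiki_dir into the mapping used above."""
    return {p.name: p.read_text(encoding="utf-8") for p in Path(wiki_dir).glob("*.md")}
```

Usage would be `find_stale_counts(load_pages("path/to/wiki"))`; an empty list means no page outside the exemption set still carries an old count.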
wiki(qa-r10): kill function-signature + file-existence hallucinations across 6 pages
Pairs with main repo commit 8a1917b. Round 10 was a 'judge follows
every advertised command line by line' pass. It surfaced 6 distinct
hallucinations a SANS judge would have hit if they tried to
reproduce anything from the wiki.
== Defects fixed ==
### Accuracy.md — broken script reference
Advertised 'bash scripts/run-accuracy-suite.sh'. That script
doesn't exist and never has. The actual reproducer is
'python3 scripts/measure_accuracy.py' with the standard
PYTHONPATH export. A judge running the README's accuracy claim
through this page would have hit:
bash: scripts/run-accuracy-suite.sh: No such file or directory
Replaced with the real measure_accuracy.py invocation, which
was verified end-to-end (recall=1.0, FPR=0.0,
hallucination_count=0, evidence_integrity_preserved=true).
### Case-PtH-Timestomp.md — 3 function-signature errors
All three are the same class of mistake: the wiki cited
positional/keyword args that don't exist on the actual MCP tools.
The advertised CLI invocation (first line below) was wrong in the same way:
'dart-agent --hunt' → 'python3 -m dart_agent --case ... --out ... --mode deterministic'
'get_process_tree(host=...)' → 'get_process_tree(process_csv=...)'
'analyze_windows_logons(host=...)' → 'analyze_windows_logons(security_events_json=...)'
'parse_prefetch(target=...)' → 'parse_prefetch(prefetch_path=...)'
These same mistakes live in docs/case-pth-timestomp.md (fixed
in the paired repo commit). Verified by pulling live
inputSchema.required from list_tools() for each tool.
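
The check against inputSchema can be sketched as below. The schemas are hand-written stand-ins shaped like an MCP tool's JSON Schema (`properties` plus `required`); only the parameter names come from the fixes above, and `check_call()` is a hypothetical helper. In practice the schemas would be pulled from list_tools() rather than typed in.

```python
# Stand-in schemas: only the parameter names come from the fixes above.
SCHEMAS = {
    "get_process_tree": {
        "properties": {"process_csv": {}},
        "required": ["process_csv"],
    },
    "analyze_windows_logons": {
        "properties": {"security_events_json": {}},
        "required": ["security_events_json"],
    },
    "parse_prefetch": {
        "properties": {"prefetch_path": {}},
        "required": ["prefetch_path"],
    },
}


def check_call(tool, kwargs, schemas=SCHEMAS):
    """Return a list of problems with a documented call; empty means it matches."""
    schema = schemas.get(tool)
    if schema is None:
        return [f"unknown tool: {tool}"]
    known = set(schema.get("properties", {}))
    required = set(schema.get("required", []))
    problems = [f"unknown argument: {k}" for k in sorted(set(kwargs) - known)]
    problems += [f"missing required argument: {k}" for k in sorted(required - set(kwargs))]
    return problems
```

For example, `check_call("get_process_tree", {"host"})` reports both the unknown `host` argument and the missing `process_csv`.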
### dart-agent.md — run_loop() and 4 fictional files
The page advertised:
- 'run_loop() in dart_agent/src/dart_agent/__init__.py'
- A file inventory citing loop.py, decision.py, hypothesis.py,
serializer.py — none of which exist.
The actual structure is __init__.py + __main__.py + live.py.
The senior-analyst loop is the DeterministicAnalyst class's
.run() method (4 internal phases: _phase_timeline →
_phase_hypothesis → _phase_validate_usb → _phase_finalize).
Rewrote both the 'What it owns' bullet and the Files block to
match reality. Added an explanatory note that the agent is
small enough to keep its control flow in __init__.py.
### dart-audit.md — 3 hallucinations in one example
The advertised AuditLogger.log() example used:
- outputs={...} — actual kwarg is 'output' (singular)
- cpu_ms=42 — no such kwarg
- bytes_read=1024 — no such kwarg
Real signature is:
log(tool_name, inputs, output, iteration, token_count_in,
token_count_out, finding_ids=None)
Same page advertised audit_id type as 'UUID4' — actual is
8-character hex (secrets.token_hex(4)). Same page advertised
'output/<run_id>/<audit_id>.json' as the per-call output
storage location — that directory layout doesn't exist; outputs
are referenced by SHA-256 digest only in deterministic mode.
Fixed all three. Verified the corrected example works as a
copy-paste — wrote a test audit log, verified the chain, ran
CLI (verify + trace) all green.
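
For readers unfamiliar with the idea, a SHA-256 chained log with 8-character hex ids can be sketched as below. This is an illustration only, not the AuditLogger implementation: `append_entry()`, `verify_chain()` and the record field names are assumptions, and the real log() also takes iteration and token counts.

```python
import hashlib
import json
import secrets


def append_entry(chain, tool_name, inputs, output):
    """Append one record whose hash covers the previous record's hash."""
    prev_hash = chain[-1]["entry_hash"] if chain else "0" * 64
    record = {
        "audit_id": secrets.token_hex(4),  # 4 random bytes -> 8 hex characters
        "tool_name": tool_name,
        "inputs": inputs,
        # Store only a digest of the output, not the output itself.
        "output_sha256": hashlib.sha256(
            json.dumps(output, sort_keys=True).encode("utf-8")
        ).hexdigest(),
        "prev_hash": prev_hash,
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["entry_hash"] = hashlib.sha256(payload).hexdigest()
    chain.append(record)
    return record


def verify_chain(chain):
    """Recompute every hash; an edited or reordered record breaks the chain."""
    prev_hash = "0" * 64
    for record in chain:
        body = {k: v for k, v in record.items() if k != "entry_hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if record["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(payload).hexdigest() != record["entry_hash"]:
            return False
        prev_hash = record["entry_hash"]
    return True
```

Because each record's hash includes the previous record's hash, changing any earlier entry invalidates every entry after it.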
### dart-corr.md — serializer.py hallucination
Page claimed UNRESOLVED contradictions are blocked by 'the
serializer (dart_agent/serializer.py)'. There is no
serializer.py file. The blocking happens inside
DeterministicAnalyst's finding emission path in __init__.py.
Rewrote the sentence to point at the real location.
### Live-mode.md — 2 hallucinations in the headline example
- '--evidence /mnt/case-evidence' — no such CLI flag. Real
pattern is 'export DART_EVIDENCE_ROOT=/path' before invoking
the agent.
- 'Claude sees exactly 35 typed forensic functions' — should
be 60 (35 native + 25 SIFT adapters). Stale from the v0.4
surface, missed in earlier rounds because Live-mode.md
wasn't part of the surface-count grep targets.
Fixed both. Added an explicit '(Add --dry-run to use a scripted
mock Claude with no API key)' line for CI / offline reproduction.
== Verification approach ==
For each defect:
1. Read the wiki claim
2. Pulled the actual code/schema (inputSchema, argparse output,
filesystem ls, AuditLogger signature via inspect)
3. Compared advertised ↔ actual
4. Fixed the wiki, then re-verified the fixed example by either
running it (Accuracy.md, dart-audit.md) or by checking
it would no longer raise on a copy-paste
== Pattern internalised ==
Round 9 caught output-key hallucinations in code examples. Round 10
caught argument-name hallucinations and file-path hallucinations
in tutorial prose — a different surface that print-output dry-runs
don't cover. Going forward, any wiki/docs page that references a
function by name + signature should be diff-checked against the
live inputSchema.required list whenever the underlying code changes.
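
The extraction half of such a diff-check might look like the sketch below. It assumes calls appear in the wiki as simple single-line `name(kwarg=...)` snippets; the regexes and `documented_calls()` are hypothetical and would miss multi-line or nested calls.

```python
import re

# Matches simple single-line calls such as parse_prefetch(prefetch_path=...).
CALL = re.compile(r"\b([a-z_][a-z0-9_]*)\(([^()\n]*)\)")
KWARG = re.compile(r"\b([a-z_][a-z0-9_]*)\s*=")


def documented_calls(text):
    """Yield (function_name, sorted kwarg names) for each call-like snippet."""
    for match in CALL.finditer(text):
        name, arg_text = match.group(1), match.group(2)
        kwargs = sorted(set(KWARG.findall(arg_text)))
        if kwargs:  # skip bare mentions like run() with no keyword arguments
            yield name, kwargs
```

Each yielded pair can then be compared against the tool's live schema, so a renamed argument shows up as a failed check rather than a broken tutorial.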
wiki QA pass: synchronize 13 pages to v0.5 reality (60 tools, 22 tests)
Companion to main repo commit 52f975d (v0.5.1 QA pass).
Updated to reflect the v0.5 SIFT adapter layer (35 native + 25 SIFT
= 60 typed read-only MCP tools) and the v0.5 test suite expansion
(20 → 22 cases):
About-the-name.md
'The 35 typed dart-mcp functions cover...' →
'The typed dart-mcp surface (35 native + 25 SIFT Workstation
adapters = 60 functions) covers...'
Test count 20/20 → 22/22 across all references.
Architecture-deep-dive.md
ASCII architecture box: 'dart-mcp 35 typed forensic functions'
→ 'dart-mcp 60 typed forensic functions (35 native + 25 SIFT)'
Architecture-first-vs-prompt-first.md
'The MCP surface is exactly 35 functions, by name' →
'The MCP surface is exactly 60 typed functions, by name (35
native + 25 SIFT Workstation adapters)'
Case-PtH-Timestomp.md (2 references) updated parallel to docs/.
FAQ.md
Question heading: 'Is the MCP surface really exactly 35
functions?' → 'Is the MCP surface really fixed in size?'
Answer body: counts updated to 60 / 22-22.
Glossary.md
dart-mcp definition: 35 → 60.
'For Agentic-DART v0.4: exactly 35' →
'For Agentic-DART v0.5: 60 (35 native + 25 SIFT Workstation
adapters)'
Home.md (TOC)
'the 35 forensic functions, schema, bypass tests' →
'the 60 forensic functions (35 native + 25 SIFT adapters),
schema, bypass tests'
'why the MCP surface is exactly 35 functions, not 28, not 35'
rephrased to avoid count-anchoring.
Live-mode.md (2 references) parallel to docs/.
MCP-function-catalog.md
Page title: '· 35 typed forensic functions'
→ '· 60 typed forensic functions (35 native + 25 SIFT
Workstation adapters)'
Operator-guide.md
'All 20 tests should print OK' → 'All 22 tests should print OK'
Phase-1.md
Body: '35 typed forensic functions' / '20 of 20 tests passing'
counts updated.
Timeline table: ADDED row for 2026-05-02 v0.5 (SIFT Workstation
tool adapter layer → 60 functions, 22 tests passing). v0.4
historic row preserved verbatim.
Roadmap.md
Three references to 35 / 20-20 updated to v0.5 numbers.
Running-on-macOS.md
'Step 3 — Run all 20 tests' → '... 22 tests'
'All 20 tests pass on M1/M2/M3' → 'All 22 tests pass on M1/M2/M3'
The-Memex-Bet.md
'MCP surface (35 typed functions)' →
'MCP surface (60 typed functions: 35 native + 25 SIFT adapters)'
'The 35 functions are not a guideline...' →
'The 60 functions (35 native + 25 SIFT Workstation adapters)
are not a guideline...'
_Sidebar.md
Two TOC labels: '(35 functions)' → '(60 functions: 35 native +
25 SIFT)'
dart-mcp.md
'exposes exactly 35 typed forensic functions' →
'exposes 60 typed forensic functions (35 native + 25 SIFT
Workstation adapters)'
Section heading 'The 35 functions' → 'The 60 functions (35
native + 25 SIFT adapters)'
SIFT-adapter-layer.md
Preserved verbatim — line 18 'its own 35 forensic functions'
is historic context describing the pre-v0.5 state.
wiki: add 13 missing pages, fix all 32 broken links
The wiki sidebar and Home page referenced 13 pages that didn't exist,
producing the GitHub 'create new page' UI when clicked. Adds:
Concepts:
Glossary — DFIR / agent / MCP terms
The 5 packages:
dart-agent — senior-analyst wrapper loop
dart-corr — cross-artifact correlation engine
dart-audit — SHA-256 chained audit log
dart-playbook — YAML sequencing rules
(dart-mcp already existed)
Reference:
Comparison — vs Velociraptor / Plaso / EZ tools / SOAR / vanilla LLMs
Running it:
Running-on-SIFT — SANS SIFT VM 5-minute setup
Running-on-macOS — macOS-specific mount conventions
Live-mode — real Claude API + MCP stdio integration
Case studies:
Case-PtH-Timestomp — Pass-the-Hash + timestomp pre-existence
Case-IP-KVM — IP-KVM remote-hands insider scenario
Writing-case-studies — guide for contributing new case studies
Project:
Accuracy — reproducible accuracy methodology + numbers
The Roadmap-Phase-2/3/4 links in Home.md were repointed to the
existing Roadmap page's anchors (those were never separate pages).
The Contributing link in dart-mcp.md now points to CONTRIBUTING.md
in the main repo.
_Sidebar.md restructured into 6 named sections so the 25-page wiki
is navigable. Final broken-link count: 0.
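
A broken-link count like this can be reproduced with a small checker along these lines. It is a sketch, not the tool actually used: it assumes links are written as `[text](Page-Name)` or `[[Link Text|Page-Name]]`, that a page name maps to `Page-Name.md`, and that external URLs and same-page anchors are out of scope.

```python
import re

MD_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
WIKI_LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]|]+)\]\]")


def broken_links(pages):
    """pages: mapping of page name (no .md) -> text. Returns sorted (page, target) pairs."""
    broken = set()
    for page, text in pages.items():
        targets = MD_LINK.findall(text) + WIKI_LINK.findall(text)
        for target in targets:
            if "://" in target or target.startswith(("#", "mailto:")):
                continue  # external link or same-page anchor
            name = target.split("#", 1)[0].strip()
            if name and name not in pages:
                broken.add((page, target))
    return sorted(broken)
```

A link such as `Roadmap#phase-2` resolves as long as a `Roadmap` page exists, which matches how the Roadmap-Phase-2/3/4 links were repointed; the sketch does not verify that the anchor itself exists.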