wiki(qa-r12): kill 11/12 MITRE + UUID4 audit_id + 5KB audit + rm bypass hallucinations
== Round 12 of QA — FAQ / Glossary / Comparison deep verification ==
FAQ.md, Glossary.md, Comparison.md were the 3 'reference' wiki pages
that earlier rounds touched only at surface level. Round 12 went
through every quantitative/categorical claim on each page and
checked it against actual code/runtime behavior.
== Defects fixed ==
### FAQ.md — audit log size claim 5-8x over
Advertised: '~3-5 KB per MCP call. 25-iteration run ~120-200 KB'
Measured: ~568 bytes per call (1704 bytes / 3 entries on the
bundled find-evil-ref-01 demo). 25-iter projection ~14 KB
(25 x 568 = 14,200 bytes).
The advertised numbers were either pre-v0.5 estimates from when
audit entries carried full output bodies, or just a guess. Either
way, current reality is 5-8x smaller, which actually strengthens
the architectural claim ('audit log is verifiable in one pass on
any laptop'). Fixed to '500-700 bytes per MCP call' and '12-18 KB'
for the 25-iter projection.
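
The per-call figure and the projection can be reproduced with a few lines of Python. This is a minimal sketch that assumes the audit log is a text file with one entry per non-empty line and that a 25-iteration run makes 25 MCP calls; both function names are illustrative and do not come from the dart codebase.

```python
def bytes_per_entry(log_text: str) -> float:
    """Average UTF-8 size per entry (assumed: one audit entry per non-empty line)."""
    entries = [line for line in log_text.splitlines() if line.strip()]
    if not entries:
        raise ValueError("audit log has no entries")
    return len(log_text.encode("utf-8")) / len(entries)


def project_run_size(per_call_bytes: float, calls: int) -> float:
    """Projected audit log size in bytes for a run that makes `calls` MCP calls."""
    return per_call_bytes * calls


# Figure quoted above: 1704 bytes across 3 entries.
print(1704 / 3)                    # 568.0

# Bounds of the corrected FAQ range, assuming 25 calls per run.
print(project_run_size(500, 25))   # 12500
print(project_run_size(700, 25))   # 17500
```

The two projected values bracket the '12-18 KB' figure now stated in the FAQ.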
### FAQ.md — '11/12 MITRE ATT&CK enterprise tactics' over-claim
Measured by walking dart-mcp function names against MITRE tactic
buckets: 10/12 covered. The two gaps are TA0009 (Collection) and
TA0011 (Command and Control). C2 was already disclosed in the FAQ
'What would you change with more time?' answer ('PCAP analysis for
full TA0011 coverage'); Collection wasn't disclosed.
Fixed the headline metric to '10/12' with explicit TA list and a
link to Phase-1 for the gap analysis. The honest count makes the
Phase-2 roadmap motivation crisper.
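
The bucket walk can be sketched as a keyword match over function names. The tactic IDs and names below are the 12 enterprise tactics counted in this commit; the keyword map and the example function names are hypothetical placeholders, not the real dart-mcp surface or the script used for the measurement.

```python
# The 12 enterprise tactics counted here (IDs and names per MITRE ATT&CK).
TACTICS = {
    "TA0001": "Initial Access",
    "TA0002": "Execution",
    "TA0003": "Persistence",
    "TA0004": "Privilege Escalation",
    "TA0005": "Defense Evasion",
    "TA0006": "Credential Access",
    "TA0007": "Discovery",
    "TA0008": "Lateral Movement",
    "TA0009": "Collection",
    "TA0010": "Exfiltration",
    "TA0011": "Command and Control",
    "TA0040": "Impact",
}

# Hypothetical keyword -> tactic map; a real walk would need a reviewed mapping.
KEYWORD_TO_TACTIC = {
    "persistence": "TA0003",
    "lateral": "TA0008",
}


def tactic_coverage(function_names, keyword_to_tactic):
    """Return (covered tactic IDs, sorted list of uncovered tactic IDs)."""
    covered = set()
    for name in function_names:
        lowered = name.lower()
        for keyword, tactic in keyword_to_tactic.items():
            if keyword in lowered:
                covered.add(tactic)
    gaps = sorted(set(TACTICS) - covered)
    return covered, gaps


# Hypothetical function names, for illustration only.
names = ["find_persistence_keys", "list_lateral_logons"]
covered, gaps = tactic_coverage(names, KEYWORD_TO_TACTIC)
print(f"{len(covered)}/{len(TACTICS)} covered; gaps: {gaps}")
```

Reporting the gap list alongside the ratio is what lets the FAQ name TA0009 and TA0011 explicitly rather than quoting a bare count.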
### Glossary.md — 'Audit ID — UUID4' (round-10 same defect, different page)
Round 10 fixed wiki/dart-audit.md (UUID4 → 8-char hex) but Glossary
carried the same wrong definition independently. Same code-vs-doc
mismatch: secrets.token_hex(4) produces an 8-character hex string,
never a UUID4. Fixed. Also corrected the next sentence, which claimed
'the serializer refuses to emit findings'. There is no serializer.py
file (round-10 defect class). The actual gate is the finding emitter
inside DeterministicAnalyst (in dart_agent/__init__.py), and the
Glossary now says so.
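
The difference between the two ID shapes is easy to confirm with the standard library. The snippet below is a standalone illustration, not code from dart_agent; the variable names are made up for the example.

```python
import re
import secrets
import uuid

audit_id = secrets.token_hex(4)   # 4 random bytes rendered as 8 hex characters
uuid4_id = str(uuid.uuid4())      # 36 characters: 32 hex digits plus 4 hyphens

HEX8 = re.compile(r"[0-9a-f]{8}")

print(len(audit_id), bool(HEX8.fullmatch(audit_id)))   # 8 True
print(len(uuid4_id), bool(HEX8.fullmatch(uuid4_id)))   # 36 False
```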
### Glossary.md — 'Bypass test — execute_shell, eval, rm, etc.'
rm is NOT in the bypass test's forbidden list. The actual list
asserted by tests/test_mcp_bypass.py is:
    execute_shell, write_file, mount, umount, eval, exec_python,
    network_egress, delete_file, system, spawn_process, kill_process
rm was a plausible-looking guess that doesn't appear in the code.
Replaced with the actual full list, which is more concrete and
more impressive than the 'execute_shell, eval, rm, etc.' summary.
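
A check of this shape can be sketched as follows. The forbidden set is the list quoted above; the helper name, the test name, and the advertised tool names are hypothetical, and the real assertions in tests/test_mcp_bypass.py may be structured differently.

```python
# Forbidden tool names, copied from the list quoted above.
FORBIDDEN_TOOLS = frozenset({
    "execute_shell", "write_file", "mount", "umount", "eval", "exec_python",
    "network_egress", "delete_file", "system", "spawn_process", "kill_process",
})


def exposed_forbidden(tool_names):
    """Return the forbidden names that appear in an advertised tool list."""
    return sorted(FORBIDDEN_TOOLS & set(tool_names))


def test_no_forbidden_tools_exposed():
    # Hypothetical tool list; a real test would query the MCP server's registry.
    advertised = ["list_processes", "parse_evtx"]
    assert exposed_forbidden(advertised) == []
    # 'rm' is not part of the asserted set, which is why the Glossary was wrong.
    assert "rm" not in FORBIDDEN_TOOLS
```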
### Comparison.md — verified clean
Walked every external URL (Velociraptor docs, Plaso, Eric
Zimmerman's site, SigmaHQ): all return HTTP 200. Walked every
cross-reference to phase-2/phase-3 packages (dart-synth #23,
dart-responder #26): both have tracking issues. The TL;DR matrix
entries were spot-checked against actual capabilities and stand.
No fixes needed.
== Verification methodology for this round ==
1. Read each claim
2. If quantitative: measure with a script (audit log size,
   MITRE tactic count, response shape)
3. If categorical: read the cited code/test and confirm the
   claim is what the code actually does
4. If external: curl with 10s timeout and assert 200
5. Fix any mismatch; verify the fix doesn't introduce a new one
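
Step 4 can also be scripted rather than run by hand with curl. The following is a minimal sketch using only the standard library; `http_status` and `broken_links` are illustrative names, and the `fetch` parameter exists so the check can be exercised without network access.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the final HTTP status for a GET of `url`, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code   # the server answered, but with a 4xx/5xx status
    except OSError:
        return None       # DNS failure, refused connection, or timeout


def broken_links(urls, fetch=http_status):
    """Return (url, status) pairs for every URL that did not come back as 200."""
    failures = []
    for url in urls:
        status = fetch(url)
        if status != 200:
            failures.append((url, status))
    return failures
```

Calling `broken_links(list_of_wiki_urls)` and asserting the result is empty mirrors the 'curl with 10s timeout and assert 200' step. Note that `urlopen` follows redirects, so the status reported is that of the final response.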
== Verified ==
- 31/31 pytest green (zero regressions; wiki-only changes, no
  code touched)
- Bypass test list in Glossary now matches tests/test_mcp_bypass.py
  lines 29-30 plus the 'negative' set at line 127
- Audit log size in FAQ now matches measured demo run output
- MITRE tactic count in FAQ now matches the actual function-name
coverage measurement