v0.5.4 — NIST CFReDS Hacking Case integration
NIST CFReDS Hacking Case integration — external benchmark validation
This release adds external benchmark validation against the NIST CFReDS "Hacking Case" (Greg Schardt / Mr. Evil) — a community-trusted forensic dataset with published ground-truth answers.
Highlights
- 🆕 New primitive: `parse_registry_hive` (general native registry hive parser)
- 🆕 New case study: `case-08` (CFReDS Hacking Case full traversal)
- 📊 3-tier accuracy evaluation now documented in `docs/accuracy-report.md`:
| Tier | Dataset | Recall (v0.5.4) |
|---|---|---|
| 1 | Synthetic reference (CI baseline) | 1.000 / FPR=0.000 |
| 2 | Noise-injected realistic (~1:30 IOC:benign) | 1.000 / FPR=0.000 |
| 3 | NIST CFReDS Hacking Case | 0.50 strict / 0.80 lenient |
- 🚀 5× jump in strict CFReDS recall over v0.5.3 (0.10 → 0.50; lenient 0.40 → 0.80, a 2× gain) after `parse_registry_hive` shipped, unlocking 4 findings at once (closes #52)
- ✅ 43/43 tests pass on the Python 3.10/3.11/3.12/3.13 matrix
- 📦 61 MCP tools (36 native + 25 SIFT adapters), all read-only
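
The strict and lenient recall figures above, and the size of the jump from v0.5.3, can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the `Tally` class, the reading of "strict" as exact matches and "lenient" as exact plus partial matches, and the 10-item ground-truth denominator are all assumptions chosen to be consistent with the published ratios. The authoritative definitions are in docs/accuracy-report.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tally:
    """Hypothetical per-release scoring tally (not the project's data model)."""
    exact: int    # findings that match a ground-truth answer exactly
    partial: int  # findings credited only under lenient scoring (assumed meaning)
    total: int    # ground-truth answers being scored (assumed to be 10)


def recall(tally: Tally, lenient: bool = False) -> float:
    """Recall = credited findings / ground-truth answers."""
    hits = tally.exact + (tally.partial if lenient else 0)
    return hits / tally.total


# Illustrative counts picked to reproduce the published ratios; not measured data.
v0_5_3 = Tally(exact=1, partial=3, total=10)
v0_5_4 = Tally(exact=5, partial=3, total=10)

for name, tally in (("v0.5.3", v0_5_3), ("v0.5.4", v0_5_4)):
    strict = recall(tally)
    loose = recall(tally, lenient=True)
    print(f"{name}: strict={strict:.2f} lenient={loose:.2f}")

# Gains as ratios of credited findings (the denominator is the same in both releases).
strict_gain = v0_5_4.exact / v0_5_3.exact
lenient_gain = (v0_5_4.exact + v0_5_4.partial) / (v0_5_3.exact + v0_5_3.partial)
print(f"strict gain: {strict_gain:.0f}x, lenient gain: {lenient_gain:.0f}x")
```

Under these assumed counts, four additional exact findings move strict recall from 0.10 to 0.50 and lenient recall from 0.40 to 0.80, so the 5× figure describes the strict score while the lenient score doubles.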
Why this matters
A synthetic recall of 1.000 by itself looks too good to be true. v0.5.4 lets us state honestly that external benchmark recall is 0.50 strict / 0.80 lenient, and trace the remaining gap to specific paradigm differences, turning "registry parsing is on the wishlist" into "registry parsing unlocks 4 measured findings, ship next."
What's next (Phase 2)
- #53 IE6 `index.dat` parser
- #54 Recycle Bin `INFO2` parser
- #55 Bundled YARA rule library
- #47 Additional external datasets (Ali Hadi, DFRWS, BOTS)
Reference
- Submission target: SANS FIND EVIL! 2026 (findevil.devpost.com)
- Deadline: 2026-06-15 23:45 EDT (2026-06-16 12:45 JST)
- Accuracy methodology: docs/accuracy-report.md