Release v0.5.4 — NIST CFReDS Hacking Case integration · Juwon1405/agentic-dart

NIST CFReDS Hacking Case integration — external benchmark validation

This release adds external benchmark validation against the NIST CFReDS "Hacking Case" (Greg Schardt / Mr. Evil) — a community-trusted forensic dataset with published ground-truth answers.

Highlights

🆕 New primitive: parse_registry_hive (general native registry hive parser)
🆕 New case study: case-08 (CFReDS Hacking Case full traversal)
📊 3-tier accuracy evaluation now documented in docs/accuracy-report.md:

Tier	Dataset	Result (v0.5.4)
1	Synthetic reference (CI baseline)	recall 1.000, FPR 0.000
2	Noise-injected realistic (~1:30 IOC:benign)	recall 1.000, FPR 0.000
3	NIST CFReDS Hacking Case	recall 0.50 strict / 0.80 lenient

🚀 5× strict CFReDS recall jump from v0.5.3 (0.10 → 0.50 strict, 0.40 → 0.80 lenient) after parse_registry_hive shipped, which unlocked 4 findings at once (closes #52)
✅ 43/43 tests pass on Python 3.10/3.11/3.12/3.13 matrix
📦 61 MCP tools (36 native + 25 SIFT adapters), all read-only
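
For readers unfamiliar with what a native registry hive parser has to handle first, the sketch below reads the fixed header at the start of a hive file. It is not the project's parse_registry_hive implementation. The function names and the returned dictionary are hypothetical, and the field layout follows the publicly documented "regf" base block format.

```python
import struct
from datetime import datetime, timedelta, timezone

# First 44 bytes of the regf base block, little-endian:
# signature, two sequence numbers, last-written FILETIME, major/minor version,
# file type, file format, root cell offset, hive bins data size.
REGF_HEADER = struct.Struct("<4sIIQIIIIII")
HIVE_BINS_START = 4096  # hive bins data begins after the 4096-byte base block


def filetime_to_datetime(filetime):
    """Convert a Windows FILETIME (100 ns ticks since 1601-01-01 UTC)."""
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=filetime // 10)


def read_regf_header(data):
    """Parse the base block header of a hive (hypothetical helper)."""
    if len(data) < REGF_HEADER.size:
        raise ValueError("buffer too short for a regf base block")
    (sig, seq1, seq2, last_written, major, minor,
     file_type, file_format, root_offset, bins_size) = REGF_HEADER.unpack_from(data)
    if sig != b"regf":
        raise ValueError("not a registry hive: bad signature")
    return {
        "version": (major, minor),
        "last_written": filetime_to_datetime(last_written),
        # Matching sequence numbers mean the last write completed cleanly.
        "clean": seq1 == seq2,
        # The root cell offset is relative to the start of the hive bins data.
        "root_cell_file_offset": HIVE_BINS_START + root_offset,
        "hive_bins_size": bins_size,
    }
```

A real parser would continue from the root cell into the key and value cells. This sketch stops at the header, which is enough to reject non-hive files and to flag a hive that was not written cleanly.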

Why this matters

Synthetic recall=1.000 by itself looks too good to be true. v0.5.4 lets us state honestly that external benchmark recall is 0.50 strict / 0.80 lenient, and trace the remaining gap to specific paradigm differences. That turns "registry parsing is on the wishlist" into "registry parsing unlocks 4 measured findings, ship next."
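
As a rough illustration of how a strict/lenient recall pair and an FPR can come from one scored answer key, here is a minimal sketch. The scoring rules are assumptions: "strict" is taken to mean an exact match with the published answer and "lenient" to also accept a partial match. The actual definitions are in docs/accuracy-report.md. The Finding shape, the function names, and the 10-item sample are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One ground-truth item from a benchmark answer key (hypothetical shape)."""
    question_id: str
    exact: bool    # assumed: agent's answer matched the published answer exactly
    partial: bool  # assumed: right artifact surfaced, but not the exact value


def recall(findings, lenient=False):
    """Fraction of ground-truth findings counted as recovered."""
    if not findings:
        raise ValueError("no ground-truth findings")
    hits = sum(1 for f in findings if f.exact or (lenient and f.partial))
    return hits / len(findings)


def false_positive_rate(false_positives, true_negatives):
    """FPR = FP / (FP + TN), computed over benign records only."""
    total = false_positives + true_negatives
    if total == 0:
        raise ValueError("no benign records")
    return false_positives / total


# Illustrative sample only: 10 items, 5 exact matches, 3 partial matches.
sample = [Finding(f"q{i}", exact=i < 5, partial=5 <= i < 8) for i in range(10)]
print(recall(sample))                # 0.5
print(recall(sample, lenient=True))  # 0.8
print(false_positive_rate(0, 30))    # 0.0
```

Reporting both numbers keeps the headline figure conservative while still showing how many findings are close to recoverable.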

What's next (Phase 2)

#53 IE6 index.dat parser
#54 Recycle Bin INFO2 parser
#55 Bundled YARA rule library
#47 Additional external datasets (Ali Hadi, DFRWS, BOTS)

Reference

Submission target: SANS FIND EVIL! 2026 (findevil.devpost.com)
Deadline: 2026-06-15 23:45 EDT (2026-06-16 12:45 JST)
Accuracy methodology: docs/accuracy-report.md
