Releases · Juwon1405/agentic-dart

15 Jun 21:07

v1.2.0

07ed64f

v1.2.0 — SANS Find Evil! 2026 Latest


Agentic-DART v1.2.0 — SANS Find Evil! 2026 submission build.

Autonomous DFIR agent on the SANS SIFT Workstation. The language model analyzes evidence in read-only mode and seals every inference into a SHA-256 audit chain. It exposes 73 typed, read-only MCP tools (48 native pure-Python + 25 SIFT-tool adapters). Destructive operations are absent from the tool registry, and CI enforces that absence, so even a fully successful prompt injection has no destructive function to call. Architecture-first, not prompt-first.
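To make the "SHA-256 audit chain" idea concrete, here is a minimal sketch of a hash-linked log. It is not the project's implementation: the entry layout, the `GENESIS` value, and the function names `seal` and `verify` are assumptions chosen for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _digest(prev_hash, record):
    # Canonical JSON so the same record always hashes to the same value.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def seal(chain, record):
    """Append `record` to `chain`, linking it to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev_hash, "record": record, "hash": _digest(prev_hash, record)}
    chain.append(entry)
    return entry["hash"]


def verify(chain):
    """Return True only if every entry still matches its recomputed hash."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the previous one, editing any earlier record invalidates every later entry, which is the property that makes such a log tamper-evident.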

This release

Sigma detection pack v2 — 11 rules, including DCSync, Golden Ticket, ransomware shadow-copy deletion, web-shell creation, local account creation, Kerberoasting, AS-REP roasting, HID insertion, remote exec, and event-log clearing.
Model-aware authentication — Haiku resolves to an OAuth subscription token; Sonnet/Opus to a metered API key. New dart-auth command.
Persistent install aliases — dart-pull, dart-auth.
Unified per-case ledger — append-only, per-case timestamps.
case-02 ground-truth fix — Hadi Challenge #1 is Windows XAMPP, not Linux; recall 0% -> 60%.
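The model-aware authentication described above could look roughly like the sketch below. Only `ANTHROPIC_API_KEY` appears in these notes; the variable name `DART_OAUTH_TOKEN`, the function name, and the rule "any non-Haiku model uses the API key" are assumptions, not the behavior of the real dart-auth command.

```python
import os


def resolve_credential(model, env=None):
    """Pick a credential type for `model` and return (kind, value)."""
    env = os.environ if env is None else env
    if "haiku" in model.lower():
        # Hypothetical variable name for the OAuth subscription token.
        kind, var = "oauth_token", "DART_OAUTH_TOKEN"
    else:
        # Assumed: Sonnet, Opus, and anything else use the metered API key.
        kind, var = "api_key", "ANTHROPIC_API_KEY"
    value = env.get(var)
    if not value:
        raise RuntimeError(f"{var} is not set; it is required for model {model!r}")
    return kind, value
```

Passing `env` explicitly keeps the function testable without touching the real process environment.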

142 tests passing. Full history in CHANGELOG.md.


15 Jun 05:36

Juwon1405

v1.1.0

18d3519

v1.1.0 — Stable release (SANS FIND EVIL! 2026)

Agentic-DART is an autonomous DFIR agent on the SANS SIFT Workstation. It runs a senior-analyst reasoning loop over a custom MCP server of 73 typed, read-only forensic tools (48 native pure-Python + 25 SIFT adapters) and produces a courtroom-traceable report. Evidence integrity is enforced by the shape of the system — destructive operations (execute_shell, write_file, mount) simply do not exist on the wire — not by asking the model to behave.

This is the first genuinely stable release, verified end-to-end from a clean clone.

Why 1.1.0 supersedes everything before it

Earlier tags that claimed "stable" did not actually run clean in a fresh environment. 1.1.0 is the result of a full correctness pass — install, benchmark, scoring, external disk-image handling, and the test suite all fixed and re-verified. The prior 1.0.2 "stable" tag has been removed to avoid confusion.

Tests are green from anywhere. 156 tests pass. The Phase-2 placeholder suite is now explicitly skipped (not failed) wherever it's collected, so pytest is clean whether you run it from the repo root or any subdirectory.
No version is pinned in docs or tests. The release number lives in pyproject.toml only; READMEs, the wiki, the site, and the version test were all genericized, so a future bump touches one file.
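A version test that reads the number from pyproject.toml instead of hardcoding it might look like this sketch. It assumes the version sits under the standard `[project]` table; the function name is hypothetical, and the regex fallback exists only because `tomllib` is unavailable on Python 3.10.

```python
import re
from pathlib import Path

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:  # Python 3.10
    tomllib = None


def read_project_version(pyproject_path):
    """Return the version string declared in the [project] table."""
    text = Path(pyproject_path).read_text(encoding="utf-8")
    if tomllib is not None:
        return tomllib.loads(text)["project"]["version"]
    # Fallback: first `version = "..."` line after the [project] header.
    match = re.search(r'(?ms)^\[project\]\s*$.*?^version\s*=\s*"([^"]+)"', text)
    if match is None:
        raise ValueError("no [project] version found")
    return match.group(1)
```

A test can then compare this value against whatever the package reports at runtime, so a release bump touches only pyproject.toml.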

Highlights

73 typed read-only MCP tools — 48 native forensic functions + 25 SIFT adapters (Volatility 3, MFTECmd, EvtxECmd, PECmd, RECmd, AmcacheParser, YARA, Plaso), plus a versioned Sigma detection-rule matcher.
11 case studies / 99 ground-truth findings across two tiers: 8 internal self-evaluation cases (ready evidence) + 3 external full-disk public images (NIST CFReDS Hacking Case, Ali Hadi DFIR Challenge #1, Digital Corpora M57).
External is a first-class tier. Full-disk images are adapted via ewfmount + mmls + tsk_recover (partition-offset aware) into an evidence tree, then analyzed. Run the tiers as separate processes — scripts.eval.demo / scripts.eval.self / scripts.eval.external — each independently debuggable; an append-only docs/benchmarks/HISTORY.md records every self/external run.
Linux-only host, hardened installer — refuses to run under sudo, stages the full toolchain, verifies it, and offers to fetch the external images at the end.
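The "partition-offset aware" step in the external tier depends on reading partition start sectors out of `mmls` output. The sketch below parses that text into offsets. The sample layout follows the usual `mmls` table format; the function name and returned keys are assumptions, not the project's adapter code.

```python
import re

SECTOR_RE = re.compile(r"Units are in (\d+)-byte sectors")
# Columns: index, slot, start, end, length, description.
ROW_RE = re.compile(r"^\s*(\d+):\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.*)$")


def partition_offsets(mmls_output):
    """Return allocated partitions with their sector and byte offsets."""
    unit = SECTOR_RE.search(mmls_output)
    sector_size = int(unit.group(1)) if unit else 512  # assumed default
    partitions = []
    for line in mmls_output.splitlines():
        row = ROW_RE.match(line)
        if row is None:
            continue
        slot, start, length = row.group(2), int(row.group(3)), int(row.group(5))
        if slot in ("Meta", "-------"):  # skip table metadata and unallocated space
            continue
        partitions.append({
            "start_sector": start,
            "byte_offset": start * sector_size,
            "length_sectors": length,
            "description": row.group(6).strip(),
        })
    return partitions
```

The Sleuth Kit tools take the offset in sectors through `-o`, so `start_sector` is the value to hand to `tsk_recover`, while `byte_offset` is what a loop mount would need.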

Requirements / dependencies

Host OS — Linux only. Verified on the SANS SIFT Workstation (Ubuntu 22.04); RHEL / Rocky / AlmaLinux 8+ and Fedora work via dnf/yum. macOS and Windows are not supported as the host — the Plaso / libyal toolchain does not build cleanly there. Default shell is bash.

Requirement	Version	Verified
Python	3.10+ (CI: 3.10 – 3.13)	3.10, 3.12
OS	Ubuntu 22.04 (SANS SIFT) primary; RHEL/Rocky/Alma 8+, Fedora	SIFT

Python libraries (lower bounds; installed by scripts/install.sh):

Library	Minimum	Role
`anthropic`	≥ 0.40	Claude API client (live mode)
`mcp`	≥ 1.0	MCP client/server transport
`duckdb`	≥ 1.5.3, < 2.0	in-memory correlation store
`python-registry`	≥ 1.3	Windows registry hive parsing
`PyYAML`	≥ 6.0	playbook / Sigma rule loading
`requests`	≥ 2.25	dataset download (benchmarks)

External forensic tools (staged by the installer; SIFT ships most): sleuthkit (mmls, tsk_recover), ewfmount (ewf-tools / libewf), Volatility 3, Plaso (log2timeline.py, psort.py), EZ Tools, YARA, Velociraptor.

Install

git clone https://github.com/Juwon1405/agentic-dart.git
cd agentic-dart
bash scripts/install.sh          # Linux only; refuses sudo. Offers to fetch external images (~13 GB).
export ANTHROPIC_API_KEY='sk-ant-...'
python3 -m scripts.eval.demo                                           # deterministic, no key
python3 -m scripts.eval.self     --models claude-haiku-4-5-20251001   # 8 bundled cases
python3 -m scripts.eval.external --models claude-haiku-4-5-20251001   # public disk images

License: MIT. SANS FIND EVIL! 2026 submission.


11 Jun 00:01

Juwon1405

v1.0.1

7b1c240

v1.0.1 — Platform overhaul: run_eval CLI, tiered case layout, OS-aware installer

Highlights

run_eval.py — the new primary user-facing command. Live mode only: fails fast with an actionable message when ANTHROPIC_API_KEY is unset; discovers cases dynamically from both tiers; writes out/<tier>/<case-id>/<timestamp>/{findings,report,summary}.json.
Tiered, self-contained case studies — examples/case-studies/self-evaluation/case-01..08 and external-evaluation/case-01..03 (NIST CFReDS, Ali Hadi, Digital Corpora M57-Patents/Jo). Index-only folder names, truth.json per case, canonical bundled evidence at self-evaluation/case-01/evidence_root/. The public --variant selector is gone.
OS-aware installer — scripts/install.sh --os auto|ubuntu|centos|macos, venv-first, clones+installs the collector adapter, optional SIFT (--install-sift, via cast) and Eric Zimmerman Tools (--install-eztools, .NET 9 builds, URLs validated before download). Plus root requirements.txt and an API-free scripts/healthcheck.py.
Downloader hardening — browser-like headers on every request (incl. resumed range requests), pure-Python streaming split-image reassembly, --dry-run / --check-urls.
Hardening (earlier in this line) — MCP call_tool() schema validation before dispatch, Plaso outputs isolated to DART_DERIVED_ROOT, benchmark summary no longer fabricates rows, hallucination scoring requires resolvable audit IDs.
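The fail-fast check and the out/<tier>/<case-id>/<timestamp>/ layout described for run_eval.py can be sketched as follows. The function names, the error wording, and the timestamp format are assumptions; only the environment variable and the directory shape come from the notes above.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def require_api_key(env=None):
    """Exit immediately with an actionable message when the key is missing."""
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY")
    if not key:
        raise SystemExit("ANTHROPIC_API_KEY is unset; export it before a live run.")
    return key


def run_dir(out_root, tier, case_id, now=None):
    """Build out/<tier>/<case-id>/<timestamp> for one evaluation run."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")  # assumed timestamp format
    return Path(out_root) / tier / case_id / stamp


def artifact_paths(directory):
    """Map each expected artifact name to its JSON file in `directory`."""
    return {name: Path(directory) / f"{name}.json"
            for name in ("findings", "report", "summary")}
```

Checking the key before any case discovery means a misconfigured environment fails in milliseconds rather than midway through a run.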

Measured QA at this tag

Full pytest suite green (tests/ + dart_corr/tests/); benchmark-integrity and CI workflows green on this commit.
scripts/measure_accuracy.py: recall 1.0, FPR 0.0, hallucinations 0, evidence integrity preserved (67 files).
validate_ground_truth.py: FAIL 0 (6 documented external-tier warnings).
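As a rough illustration of how numbers like these can be derived, the sketch below scores reported findings against a ground-truth set. It is not scripts/measure_accuracy.py: the record fields (`id`, `audit_id`) and the rule that a finding without a resolvable audit ID counts as a hallucination are assumptions based on the hardening note above.

```python
def score(reported, expected_ids, audit_ids):
    """Compare reported findings with ground truth and the audit log."""
    expected = set(expected_ids)
    reported_ids = {finding["id"] for finding in reported}
    matched = reported_ids & expected
    recall = len(matched) / len(expected) if expected else 0.0
    # Findings that cannot be traced to an audit entry are treated as hallucinated.
    hallucinated = [f["id"] for f in reported if f.get("audit_id") not in audit_ids]
    return {
        "recall": recall,
        "false_positives": sorted(reported_ids - expected),
        "hallucination_count": len(hallucinated),
    }
```

This sketch reports false positives as a list rather than a rate, because a rate needs a defined pool of negatives that these notes do not describe.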

Known limitations

The adapter's --source image (Velociraptor dead-disk) path is covered by mocked end-to-end tests and has not been exercised against a live Velociraptor binary in CI.
External-tier evaluations require a one-time multi-GB dataset download; no external-dataset accuracy numbers are claimed at this tag.

Full details: CHANGELOG.md


16 May 10:36

Juwon1405

v0.7.1

f64ae00

v0.7.1 — Linux DFIR triplet + ground-truth function reconciliation

Highlights

Closed 6 of the 10 missing-function gaps identified by a post-release MCP surface audit against the 11-case ground-truth library.

Added — Linux DFIR triplet (2 new MCP functions)

parse_linux_text_log — parses Apache/nginx combined access logs, syslog (RFC3164), /var/log/messages, /var/log/secure, and auditd dispatcher text mode. Returns parsed records plus suspicious-content tags across 10 patterns covering T1003.008 shadow read, T1190 path traversal + SQLi, T1505.003 webshell patterns, T1105 remote download to shell, T1071.001 netcat, T1046 scanner invocation, T1222.002 dangerous chmod, T1059.004 reverse-shell oneliners, T1213.002 database credential use, plus a scanner-user-agent meta-rule (T1595.002).
parse_linux_shell_history — parses bash/zsh history with HISTTIMEFORMAT awareness (epoch comment lines). Detects 11 attacker patterns including T1098.004 SSH key persistence, T1070.003 history clear, T1053.003 cron mutation, T1027 base64 obfuscation.
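The HISTTIMEFORMAT handling mentioned above relies on bash writing a `#<epoch>` comment line before each command. A minimal sketch of that pairing follows. It covers the bash format only, the function name is hypothetical, and the single T1070.003 pattern is an illustrative stand-in for the real rule set.

```python
import re
from datetime import datetime, timezone

EPOCH_LINE = re.compile(r"^#(\d{9,10})$")
# Illustrative pattern only; the shipped detector's rules are not shown in these notes.
HISTORY_CLEAR = re.compile(r"\bhistory\s+-c\b|\bunset\s+HISTFILE\b")


def parse_bash_history(text):
    """Pair each command with the epoch comment line that precedes it, if any."""
    entries = []
    pending = None
    for line in text.splitlines():
        stamp = EPOCH_LINE.match(line)
        if stamp:
            pending = datetime.fromtimestamp(int(stamp.group(1)), tz=timezone.utc)
            continue
        if not line.strip():
            continue
        tags = ["T1070.003"] if HISTORY_CLEAR.search(line) else []
        entries.append({"timestamp": pending, "command": line, "tags": tags})
        pending = None  # a timestamp applies to the next command only
    return entries
```

Commands recorded before HISTTIMEFORMAT was set have no comment line, so their timestamp stays `None` rather than inheriting a neighbor's.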

(parse_linux_cron_jobs already existed in v0.6.1 — exposed via evidence_root + flagged_only schema. Not duplicated.)

Changed — case-09 ground-truth function names reconciled

Pre-v0.7.1 case-09 (Ali Hadi Challenge 1) referenced three functions that did not exist in the MCP surface. Now mapped to actual capabilities:

Finding	Pre-v0.7.1 (missing)	v0.7.1 (implemented)
F-HADI1-002	`detect_web_shell_indicators`	`detect_webshell`
F-HADI1-007	`enumerate_filesystem_anomalies`	`parse_linux_text_log`
F-HADI1-009	`detect_log_tampering_indicators`	`detect_defense_evasion`

Ground-truth coverage post-reconciliation

Of 36 expected functions referenced across all 11 cases:

32 implemented (89%)
4 remain as tracked Phase 2 gaps: parse_recycle_bin_metadata (#54), parse_ie_history (#53), parse_outlook_dbx (#55), parse_usn_journal (post-release issue)

Added — test coverage

tests/test_parse_linux_dfir.py: 7 new tests covering the auditd dispatcher format, the HTTP combined access-log format (Nikto UA + path traversal + shadow read), HISTTIMEFORMAT epoch parsing, the per-hit required-keys contract, the missing-file error contract, and path traversal rejection. Total suite: 75 green (up from 68).

Added — sample evidence

examples/sample-evidence-realistic/linux/cron/sample.crontab — fixture exercising v0.6.1 parse_linux_cron_jobs with 4 suspicious patterns (remote-pipe-shell, exec from world-writable, reverse-shell oneliner, base64 obfuscation) plus benign baseline jobs.

Post-release counts

Surface	Value
Native MCP functions	72 (was 67)
Total ground-truth findings	99
Ground-truth coverage (implemented / expected)	32 / 36 (89%)
Bundled case studies	11
Unit tests	75 green (was 68)

Verification

recall:                       1.000   (F-001 + F-013)
false_positive_rate:          0.000
hallucination_count:          0
evidence_integrity_preserved: true
self_correction_observed:     true

Compare: v0.7.0...v0.7.1


16 May 07:52

Juwon1405

v0.7.0

7c13365

v0.7.0 — case-11 supply-chain/ESC8 + evidence schema fidelity

Highlights

Two major additions targeted at SANS FIND EVIL! 2026 submission.

case-11 supply-chain entry → AD certificate-services abuse

examples/case-studies/case-11-supplychain-ad-zeroday/ ships 12 ground-truth findings reproduced deterministically by seven MCP functions on bundled evidence. The chain:

Trojanized signed vendor binary (SolarWinds SUNBURST class entry, T1195.002)
Low-and-slow C2 beaconing with calibrated sub-SIEM-threshold cadence
PetitPotam (CVE-2021-36942) coercion of DC01$ (T1187)
ntlmrelayx --adcs relay to CA01 Web Enrollment endpoint (T1557.001)
Certificate issued for DC01$ under DomainController template (ESC8, T1649)
Rubeus asktgt /certificate + s4u /impersonateuser:domadmin (T1550.003)
4624 type-9 NewCredentials on DC (S4U2self DA impersonation)
PsExec / wmiexec overpass-the-hash lateral to DC, file server, endpoint (T1021.002, T1021.006, T1550.002)
ntdsutil ifm create full (T1003.003) + mimikatz dcsync /user:krbtgt (T1003.006)
AdminSDHolder ACL modification (T1098.005 — self-healing privileged persistence via SDProp)
Golden Ticket forged with KRBTGT hash (T1558.001) used next morning
Three sequential wevtutil cl + EventID 1102 self-emission (T1070.001)

Chain composed entirely from public references (CISA AA20-352A, SpecterOps "Certified Pre-Owned", MS-EFSRPC CVE, MITRE T1098.005/T1003.006/T1558.001). All hosts/IPs/domain (ent.example.local)/SIDs are RFC1918/RFC5737/RFC2606 synthetic with zero cross-reference to any real environment.

Every sample evidence file enriched to native forensic-tool dump fidelity

Prior versions of sample-evidence-realistic/ files were too sparse to look like genuine forensic-tool captures. This release replaces every file with the on-disk schema produced by the corresponding real tool — without breaking any detection.

Surface	Now matches output of
Windows event logs	EvtxECmd (full EVTX field set, ms timestamps, consistent SIDs)
Network flows	Zeek conn.log (uid, ja3, ja3s, tls_version, http_method, user_agent)
$MFT	MFTECmd 25-column (both 0x10 SI and 0x30 FN timestamps, USN, LSN, SecurityId)
Shellbags	SBECmd (BagPath, NodeSlot, AbsolutePath, LastInteracted, HasExplored)
Run keys / services / shimcache	RECmd / AppCompatCacheParser
Prefetch	PECmd JSON (Volumes, FilesLoaded, run times)
Chrome History	Hindsight (transition, danger_type, opened, referrer, etag)
Linux journal	systemd-journald (__REALTIME_TIMESTAMP, _BOOT_ID, _MACHINE_ID, _AUDIT_LOGINUID)
Linux auditd	SYSCALL+EXECVE+CWD+PATH+PROCTITLE+USER_LOGIN+CRED_ACQ+USER_CMD+USER_AUTH
macOS unified log	`log show` (thread, type, subsystem, category, sender)
macOS FSEvents	FSEventsParser (id, mask, flags, inode, node_id, sha256_at_event)
Memory image info	winpmem metadata (kernel_base, KDBG offset, physical layout, yara hits)

Fixed

setupapi.dev.log was missing from realistic variant — agent F-013 IP-KVM detection silently failed and dropped recall to 0.5 on --variant realistic. Restored with full setupapi log fidelity around the IP-KVM (VID 0557 PID 2419 ATEN) signal.

Post-release counts

Surface	Value
Native MCP functions	67
Total ground-truth findings	99
↳ Layer 1 (8 cases: 01–07 + 11)	69
↳ Layer 2 (3 cases: 08 CFReDS, 09 Hadi, 10 M57)	30
Bundled case studies	11
Evidence files in realistic variant	49
MITRE ATT&CK tactic coverage	11 of 12
Unit tests	68 green

Verification

recall:                      1.000   (F-001 + F-013)
false_positive_rate:         0.000
hallucination_count:         0
evidence_integrity_preserved: true
self_correction_observed:    true
audit_chain_length:          3 entries, SHA-256-linked

Full Changelog

See CHANGELOG.md for the complete diff.

Compare: v0.6.1...v0.7.0


14 May 09:07

Juwon1405

v0.6.1

c94c1c4

v0.6.1 — macOS quarantine + Linux cron + DNS tunneling

Three new native MCP functions, plus the Single-Source-of-Truth cleanup that closes the v0.6.0 drift loop.

Added

Function	Purpose	MITRE
`parse_macos_quarantine`	macOS `LSQuarantineEvent` reader — download provenance, non-browser downloader flagging, pastesite/raw-IP/darknet origin detection	T1204, T1566.002, T1105
`parse_linux_cron_jobs`	Enumerate `/etc/crontab`, `cron.d/`, `cron.{hourly,daily,weekly,monthly}/`, `/var/spool/cron/` — flag curl-pipe-shell, base64 decode, `@reboot` triggers, `/tmp/*.sh`, netcat listeners	T1053.003, T1059.004, T1546
`detect_dns_tunneling`	DNS query log analysis (BIND9/dnsmasq/generic) — Shannon entropy + long-label + rare-qtype + volume + Iodine/dnscat2 signatures. Opens TA0011 (Command-and-Control) coverage	T1071.004, T1568.002, T1572
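Two of the signals listed for detect_dns_tunneling, Shannon entropy and long labels, are simple to sketch. The thresholds and the function `looks_tunneled` below are illustrative assumptions, not the values or logic the shipped detector uses.

```python
import math
from collections import Counter


def shannon_entropy(label):
    """Entropy in bits per character of a DNS label."""
    if not label:
        return 0.0
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in Counter(label).values())


def looks_tunneled(qname, entropy_threshold=3.5, max_label_len=40):
    """Flag a query name whose longest label is unusually long or random-looking."""
    labels = qname.rstrip(".").split(".")
    longest = max(labels, key=len)
    return len(longest) > max_label_len or shannon_entropy(longest) >= entropy_threshold
```

Encoded payloads tend to produce long labels with a near-uniform character distribution, while ordinary hostnames are short and repetitive. Either signal alone is noisy, which is presumably why the detector combines them with the rare-qtype, volume, and signature checks.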

17 new unit tests in test_v06_macos_linux.py. Full test suite passes on a clean clone.

Fixed

CI workflow (ci.yml), examples/sift-adapter-demo.sh, and scripts/install.sh no longer hardcode native/total counts. Drift-safe invariant checks (count > 0, native + sift == total, no forbidden tool names) replaced exact-count assertions.
This was the root cause of ten consecutive failed CI runs between v0.6.0 (2026-05-13) and the SoT cleanup commit on 2026-05-14.
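The three drift-safe invariants named above (count > 0, native + sift == total, no forbidden tool names) can be expressed without any hardcoded counts, as in this sketch. The function name and its inputs are assumptions; the forbidden names are the destructive operations listed in the v1.1.0 notes.

```python
FORBIDDEN = {"execute_shell", "write_file", "mount"}


def check_surface(native, sift, registered):
    """Return a list of invariant violations; an empty list means the surface is sound."""
    errors = []
    if len(registered) == 0:
        errors.append("tool registry is empty")
    if len(native) + len(sift) != len(registered):
        errors.append("native + sift does not equal total registered tools")
    leaked = sorted(FORBIDDEN & set(registered))
    if leaked:
        errors.append("forbidden tools registered: " + ", ".join(leaked))
    return errors
```

Because nothing here depends on an exact number, adding a tool changes no assertion, which is what exact-count checks could not offer.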

Changed

Companion repo agentic-dart-collector-adapter flipped from Apache-2.0 to MIT for ecosystem consistency.
Hardcoded counts removed from ~25 locations across README body, docs, wiki, and profile surfaces. Numbers now live only in: README L92+L259 Hero, DEVPOST_SUBMISSION.md, DEMO_STORYBOARD.md, and tests/test_mcp_surface.py canonical name set.

Surface

Runtime list_tools() returns the typed read-only MCP surface (45 native pure-Python forensic functions + 25 SIFT Workstation adapters). The canonical name set is asserted in tests/test_mcp_surface.py::test_registered_tools_are_exact_set.

Full changelog: CHANGELOG.md


12 May 00:24

Juwon1405

v0.5.4

67e3f6e

v0.5.4 — NIST CFReDS Hacking Case integration

NIST CFReDS Hacking Case integration — external benchmark validation

This release adds external benchmark validation against the NIST CFReDS "Hacking Case" (Greg Schardt / Mr. Evil) — a community-trusted forensic dataset with published ground-truth answers.

Highlights

🆕 New primitive: parse_registry_hive (general native registry hive parser)
🆕 New case study: case-08 (CFReDS Hacking Case full traversal)
📊 3-tier accuracy evaluation now documented in docs/accuracy-report.md:

Tier	Dataset	recall (v0.5.4)
1	Synthetic reference (CI baseline)	1.000 / FPR=0.000
2	Noise-injected realistic (~1:30 IOC:benign)	1.000 / FPR=0.000
3	NIST CFReDS Hacking Case	0.50 strict / 0.80 lenient

🚀 5× strict-recall jump on CFReDS over v0.5.3 (strict 0.10 → 0.50; lenient 0.40 → 0.80) after parse_registry_hive shipped, unlocking 4 findings at once (closes #52)
✅ 43/43 tests pass on Python 3.10/3.11/3.12/3.13 matrix
📦 61 MCP tools (36 native + 25 SIFT adapters), all read-only

Why this matters

Synthetic recall=1.000 by itself looks too good to be true. v0.5.4 lets us state honestly that external benchmark recall is 0.50/0.80, and trace the remaining gap to specific paradigm differences — turning "registry parsing is on the wishlist" into "registry parsing unlocks 4 measured findings, ship next."

What's next (Phase 2)

#53 IE6 index.dat parser
#54 Recycle Bin INFO2 parser
#55 Bundled YARA rule library
#47 Additional external datasets (Ali Hadi, DFRWS, BOTS)

Reference

Submission target: SANS FIND EVIL! 2026 (findevil.devpost.com)
Deadline: 2026-06-15 23:45 EDT (JST 2026-06-16 12:45 PM)
Accuracy methodology: docs/accuracy-report.md

