v0.51.0 — first real corporate-share benchmark + Snaffler head-to-head

byevincent released this 10 Jun 21:50

The first published head-to-head against upstream Snaffler on a
real Windows NTFS share, not LLM-curated paths.

The number

Tool                Caught  Missed  FPs  F1 at Red+
Upstream Snaffler       16      59    4       0.337
ShareSift v0.51         54      21   62       0.565

2525 files. 75 synthetic-but-format-shaped credentials across 16
categories. Operator triage policy (Red+).

ShareSift catches 3.4× as many credentials as Snaffler (54 vs. 16),
at the cost of roughly 15× as many false positives (62 vs. 4).
That is the genuine tradeoff: the path classifier is aggressive on
binary-extension noise (.msi/.iso/.psd). Run Black-only for P=0.833
if you don't want those false positives; run Red+ if you don't want
59 real credentials silently missed.
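
The F1 values and ratios above follow directly from the Caught / Missed / FPs counts. A minimal sketch of the arithmetic, assuming the standard definitions (precision = caught / (caught + FPs), recall = caught / (caught + missed)); the helper name `prf1` is illustrative and not part of ShareSift:

```python
def prf1(caught: int, missed: int, fps: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw triage counts."""
    precision = caught / (caught + fps)
    recall = caught / (caught + missed)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts copied from the results table (Red+ triage policy).
results = {
    "Upstream Snaffler": (16, 59, 4),
    "ShareSift v0.51": (54, 21, 62),
}

for tool, counts in results.items():
    p, r, f1 = prf1(*counts)
    print(f"{tool}: P={p:.3f} R={r:.3f} F1={f1:.3f}")

# Ratios quoted in the text: credentials caught and false positives.
print(f"catch ratio: {54 / 16:.1f}x, FP ratio: {62 / 4:.1f}x")
```

Under these definitions Snaffler comes out at P=0.800 and R=0.213, ShareSift at P=0.466 and R=0.720, which is the precision-for-recall trade described above.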

Why this corpus exists

The v0.50 scorecard had one honesty caveat: the Windows precision
number (P=0.984 on snaffler-blind) came from LLM-labeled paths,
not real share content. v0.51 replaces it with:

  • 2525 actual files on an NTFS partition built from a reproducible
    JSON manifest via Stauffer's DiskForge
  • 75 positives across 16 categories — one per ShareSift rule
    generation v0.46→v0.50, plus the classic high-value categories
  • 2420 corporate-share noise + 20 precision-stress filenames
  • UNC backslash form (\\corp-fs01\…) — what the rule engine sees
    on real SMB shares
  • One docker run from the committed seed → byte-identical corpus
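
To illustrate why the UNC backslash form matters, here is a small sketch using Python's standard `pathlib`. The path is hypothetical (it is not taken from the corpus), and the sketch shows only how separator conventions change parsing in general, not how ShareSift's rule engine handles paths:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical UNC path, invented for illustration only.
raw = r"\\fileserver\finance\backup\passwords.kdbx"

# Windows rules: the \\server\share prefix is the drive,
# and backslashes separate path components.
win = PureWindowsPath(raw)
print(win.drive)   # the \\server\share prefix
print(win.name)    # passwords.kdbx
print(win.suffix)  # .kdbx

# POSIX rules: backslashes are ordinary characters, so the whole
# string is treated as a single file name.
posix = PurePosixPath(raw)
print(len(posix.parts))  # 1
```

A rule that matches on file name or parent directory would therefore see different inputs depending on which convention the paths were recorded in.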

Honest caveat

The 16 positive categories were authored to exercise ShareSift's
rule coverage. Snaffler's defaults don't ship with rules for
German cred filenames, CMD set "VAR=val", browser-creds
meta-coverage, etc. A neutrally curated corpus would likely show
Snaffler at 40–50% recall (an estimate, not a measurement). The
categories ShareSift covers are real corporate-share shapes
(operator-reported in Snaffler's own issue tracker), not invented
for benchmark-chasing, but the operational gap is amplified by
category selection. Full disclosure is in
docs/diskforge_winshare_v1_results.md.

What didn't change

The 4-generation held-out discipline cycle is still the
methodology contribution. v3 still at 100%, v4 still at 70%
baseline. The benchmark adds the operational head-to-head story
on top.

Reproducing

git clone --branch v0.51.0 https://github.com/byevincent/ShareSift.git
cd ShareSift
uv sync --group pysnaffler-integration
bash tools/diskforge_winshare/build_corpus.sh
.venv/bin/python tools/run_full_sweep.py

Same seed = byte-identical corpus = same numbers.
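
One way to check the byte-identical claim independently is to hash every file in two builds and compare the digests. A minimal sketch, assuming the corpus is available as a directory tree (extracted or mounted); `tree_digest` is an illustrative helper, not a script shipped in the repository:

```python
import hashlib
from pathlib import Path


def tree_digest(root: str) -> str:
    """SHA-256 over the relative path and contents of every file under root."""
    h = hashlib.sha256()
    base = Path(root)
    # Sort so the digest does not depend on directory iteration order.
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        h.update(path.relative_to(base).as_posix().encode("utf-8"))
        h.update(b"\0")
        with path.open("rb") as f:
            # Read in 1 MiB chunks to keep memory use flat on large files.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        h.update(b"\0")
    return h.hexdigest()
```

Calling `tree_digest` on two independently built copies (for example `tree_digest("build_a")` and `tree_digest("build_b")`, both placeholder paths) should return the same hex string if the builds are byte-identical. If the build produces a single disk image instead, hashing that one file is enough.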

Artifacts

  • sharesift — 77MB single-file binary (Stage 1 + rule engine)
  • Full source — git clone --branch v0.51.0

🤖 Generated with Claude Code