Releases: byevincent/ShareSift
v0.55.2 — Cascade walker/decode/rules
Four engagement-blocking fixes from HTB Cascade smoke test + top priorities from the rule corpus audit. Net: TightVNC .reg password catch lands Red end-to-end.
Fixed
Walker ACCESS_DENIED no longer crashes the share scan
Cascade Data share crashed on the first denied subdir (Contractors/) even though IT/ (containing the VNC password) was readable. Both share/smb.py and share/smb_impacket.py walkers now catch STATUS_ACCESS_DENIED, record skipped subtree, continue.
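The catch-record-continue behaviour can be sketched without any SMB library. This is a minimal illustration, not the code in share/smb.py: `walk_skipping_denied`, `fake_list_dir`, and the use of `PermissionError` as a stand-in for STATUS_ACCESS_DENIED are all assumptions made for the example.

```python
from typing import Callable, Iterator

def walk_skipping_denied(
    root: str,
    list_dir: Callable[[str], list],
    skipped: list,
) -> Iterator[str]:
    """Yield file paths under root; record unreadable subtrees instead of raising."""
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            entries = sorted(list_dir(current))
        except PermissionError:
            # Stand-in for STATUS_ACCESS_DENIED: remember the subtree, keep walking.
            skipped.append(current)
            continue
        for name, is_dir in entries:
            path = f"{current}\\{name}"
            if is_dir:
                stack.append(path)
            else:
                yield path

# Toy share shaped like the Cascade case: one denied subdir, one readable subdir.
TREE = {
    "Data": [("Contractors", True), ("IT", True)],
    "Data\\IT": [("VNC Install.reg", False)],
}

def fake_list_dir(path: str) -> list:
    if path == "Data\\Contractors":
        raise PermissionError("STATUS_ACCESS_DENIED")
    return TREE[path]

skipped: list = []
found = list(walk_skipping_denied("Data", fake_list_dir, skipped))
```

The denied directory ends up in `skipped` while the readable sibling is still walked, which is the property the fix restores.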
UTF-16LE files (.reg exports) decode correctly
extract.extract_text was UTF-8-decoding everything, garbling UTF-16 into W\x00i\x00n\x00 strings where content regexes couldn't match. Now BOM-aware (UTF-16 LE/BE + UTF-8 BOM detection).
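A minimal sketch of BOM-aware decoding, assuming the fix amounts to checking the leading bytes before choosing a codec. `decode_with_bom` is a hypothetical name, not the signature of `extract.extract_text`.

```python
import codecs

# BOMs named in the release note, mapped to the codec that decodes the payload.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)

def decode_with_bom(data: bytes) -> str:
    """Decode bytes using the BOM when present, otherwise fall back to UTF-8."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding, errors="replace")
    return data.decode("utf-8", errors="replace")

# A .reg export is typically UTF-16LE with a BOM.
reg_export = codecs.BOM_UTF16_LE + '"Password"=hex:00'.encode("utf-16-le")
text = decode_with_bom(reg_export)
```

With the BOM honoured, `text` contains the plain `"Password"=hex:` string that a content regex can match, rather than NUL-interleaved characters.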
Added (3 rules)
- ShareSiftKeepVncPasswordHex (Red): TightVNC/UltraVNC `"Password"=hex:...` in .reg. Live-validated on HTB Cascade.
- ShareSiftKeepRegistryAutoLogonPassword (Red): generalizes to DefaultPassword, AutoAdminLogon, EncMasterPassword (WinSCP), PortablePassword.
- ShareSiftKeepGitleaksHighConfidencePrefixes (Red): Slack `xox[bpe]-`, GitHub `gh[psuor]_`/`github_pat_`, Stripe live, Vault `hvs.`, Shopify, Twilio, SendGrid, npm. Closes the modern-SaaS gap that upstream Snaffler predates.
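To illustrate what a prefix-based content rule looks like, here is a small sketch using only the prefixes spelled out above. It is not the shipped rule: the combined pattern, the trailing character class, and `has_token_prefix` are assumptions, and the vendors whose prefixes are not listed here are left out.

```python
import re

# Only the prefixes quoted in the release note; the token body class is a guess.
TOKEN_PREFIXES = re.compile(
    r"\b(?:xox[bpe]-|gh[psuor]_|github_pat_|hvs\.)[A-Za-z0-9_\-]+"
)

def has_token_prefix(text: str) -> bool:
    """Return True when the text contains one of the known token prefixes."""
    return TOKEN_PREFIXES.search(text) is not None

samples = [
    "SLACK_TOKEN=xoxb-EXAMPLE",     # placeholder value, not a real token
    "export GH=ghp_EXAMPLEexample",
    "VAULT_TOKEN=hvs.EXAMPLE",
    "nothing interesting here",
]
hits = [has_token_prefix(s) for s in samples]
```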
Live-validated on HTB Cascade
sharesift hunt //10.129.13.58 -u r.thompson -p 'rY4n5eva':
- All 4 shares walked (was 2 before)
- Data\IT\Temp\s.smith\VNC Install.reg → Red with ShareSiftKeepVncPasswordHex
- SYSVOL went from crashed to 14 files, 5 tier-flagged
Tests
+19. Full suite: 1458 passed, 29 skipped, 0 failed.
v0.55.1 — Kerberos ccache fixes from HTB Sauna
Three Kerberos ccache findings from HTB Sauna (EGOTISTICAL-BANK.LOCAL). All three surfaced live; the clock-skew fix is the biggest operational win because HTB labs commonly run ~7h ahead of attacker-box time.
Fixed
Auth(kerberos=True) no longer requires -u
The user principal lives in the ccache; pre-fix ShareSift forced redundant -u <principal> on the CLI.
impacket kerberosLogin was called without kdcHost
Without an explicit KDC host, impacket falls back to DNS lookup for <realm>:88 which fails on attacker boxes without proper resolv.conf. New Auth.kdc_host field; both share.discovery._do_login and share.smb_impacket._do_login now pass kdcHost=auth.kdc_host or target_host, falling back to the SMB target for the AD case where DC == target.
Auto clock-skew shim
New share.auth.install_kerberos_clock_offset() reads the ccache's authtime, compares to local clock, and (if offset > 60s) monkey-patches impacket.krb5.kerberosv5.datetime to add the offset to all datetime.datetime.now(tz) calls. Surgical — only impacket's krb5 module is affected; the rest of Python sees real time. Called automatically from both impacket login dispatch sites.
Live-validated
KRB5CCNAME=/tmp/fsmith.ccache sharesift hunt //10.129.13.53 --use-kcache:
- Clock skew (~7h) → corrected by auto-shim
- No `-u` required (read from ccache)
- kdcHost defaulted to target host
- Hunt advances past AP-REQ to `KDC_ERR_S_PRINCIPAL_UNKNOWN`; that's the engagement-prep SPN-on-IP issue (operator adds DC FQDN to `/etc/hosts` and uses FQDN as target).
Tests
+14 (test_kerberos_fixes_v0p55p1.py). Full suite: 1439 passed, 29 skipped, 0 failed.
v0.55.0 — DFS namespace root walking (Multimaster live-validated)
Closes the Multimaster DFS scenario end-to-end. After v0.54.1 let DFS shares pass the probe gate, the walker still failed with STATUS_INVALID_PARAMETER on the namespace root — smbprotocol's regular Open + query_directory doesn't work because the namespace root isn't a real directory, just a referral table.
Fixes
_list_directory DFS root fallback
When tree is DFS-capable and CREATE returns INVALID_PARAMETER, fall back to smbclient.scandir which handles the namespace-root listing via its internal _resolve_dfs.
walk() PATH_NOT_COVERED graceful skip
DFS-link descent typically fails because the resolved fileserver needs operator-managed DNS (standard engagement prep — /etc/hosts entry). Walker now catches PATH_NOT_COVERED, records the skipped link in self._skipped_dfs_links, and continues. Share scan completes cleanly.
Smbclient package shadow workaround
impacket ships a smbclient.py script in venv bin/ that shadows the smbprotocol package under uv run. New _import_real_smbclient helper strips bin dirs from sys.path during the import.
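One way to do this kind of import is to filter `sys.path` temporarily. The sketch below is an assumption about the approach, not the source of `_import_real_smbclient`: `without_bin_dirs` and `import_without_bin_shadow` are hypothetical names, and it matches directories named `bin` or `Scripts` by basename only.

```python
import importlib
import os
import sys
from contextlib import contextmanager

@contextmanager
def without_bin_dirs():
    """Temporarily hide bin/Scripts directories from sys.path, then restore it."""
    original = list(sys.path)
    sys.path[:] = [
        p for p in original
        if os.path.basename(os.path.normpath(p)) not in ("bin", "Scripts")
    ]
    try:
        yield
    finally:
        sys.path[:] = original

def import_without_bin_shadow(name: str):
    """Import a module while script directories cannot shadow it."""
    with without_bin_dirs():
        return importlib.import_module(name)

# Demonstrated on a stdlib module so the example runs anywhere.
path_before = list(sys.path)
module = import_without_bin_shadow("json")
```

A real helper would also need to consider a stale entry already cached in `sys.modules`; that step is omitted here.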
Live-validated against HTB Multimaster
sharesift hunt //10.129.13.28 -u tushikikatomo -p finance1:
- `dfs` share probe → R ✅ (v0.54.1)
- Namespace root listing → `Development` link ✅ (v0.55 fallback)
- Link descent → PATH_NOT_COVERED → skipped gracefully ✅ (v0.55 walk fix)
- Share scan completes, pipeline continues to NETLOGON + SYSVOL ✅
The v0.53 resolver correctly resolved Development → \\FSMO\Development; walking that requires FSMO in /etc/hosts (engagement prep).
Tests
+7 (test_smb_dfs_walk_v0p55.py). Full suite: 1425 passed, 29 skipped, 0 failed.
Status
DFS scenario is now end-to-end correct from probe → list root → discover links → walk-or-skip. Combined with v0.53's referral resolution and v0.54's three engagement fixes, ShareSift handles:
- Anonymous SMB shares (Active.htb pattern)
- Legacy SMB targets (Server 2008 R2)
- DFS namespace roots + links
Queued for v0.56: GOAD-validated head-to-head benchmark.
v0.54.0 — three engagement fixes from HTB smoke tests
Three real bugs from yesterday's HTB Active + Multimaster smoke tests, all fixed and live-validated where possible.
v0.54.1 — DFS-namespace-root probe (LIVE-VALIDATED)
Surfaced on Multimaster's `dfs` share. Regular SMB2 CREATE on a DFS namespace root returns STATUS_INVALID_PARAMETER (DFS-aware Open required). v0.53's R/W probe was filtering DFS shares out before the walker could touch them.
`SmbShare._probe_access_mask` now treats INVALID_PARAMETER as probe-inconclusive with caller-supplied fallback: read=True (DFS roots ARE walkable), write=False (namespace roots aren't writable). Validated live on `\\10.129.13.28\dfs` — share enters target list with `access: R`.
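The "inconclusive with caller-supplied fallback" pattern can be sketched as below. The exception classes, `ProbeResult`, and `probe` are hypothetical stand-ins chosen for the example; the real `_probe_access_mask` works against smbprotocol and may be shaped differently.

```python
from dataclasses import dataclass
from typing import Callable

class AccessDenied(Exception):
    """Stand-in for STATUS_ACCESS_DENIED."""

class InvalidParameter(Exception):
    """Stand-in for STATUS_INVALID_PARAMETER."""

@dataclass(frozen=True)
class ProbeResult:
    allowed: bool
    conclusive: bool

def probe(open_with_mask: Callable[[], None], fallback: bool) -> ProbeResult:
    """Try an open; treat INVALID_PARAMETER as 'cannot tell' and use the fallback."""
    try:
        open_with_mask()
    except AccessDenied:
        return ProbeResult(allowed=False, conclusive=True)
    except InvalidParameter:
        return ProbeResult(allowed=fallback, conclusive=False)
    return ProbeResult(allowed=True, conclusive=True)

def dfs_root_open() -> None:
    # A DFS namespace root rejects a regular CREATE.
    raise InvalidParameter("STATUS_INVALID_PARAMETER")

read = probe(dfs_root_open, fallback=True)    # DFS roots are walkable
write = probe(dfs_root_open, fallback=False)  # namespace roots are not writable
access = ("R" if read.allowed else "") + ("W" if write.allowed else "")
```

Keeping `conclusive` separate from `allowed` lets a caller distinguish a real denial from a probe that could not decide.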
v0.54.2 — SMB3 encryption auto-fallback
Surfaced on Active.htb (Server 2008 R2, only does SMB 2.0/2.1). Default `--encrypt=True` failed with "SMB encryption is required but the connection does not support it."
`SmbShare._ensure_connected` inspects the negotiated dialect after `Connection.connect()`. Below SMB 3.0 (0x0300) and not `--require-encrypt`: session built with `require_encryption=False`. Legacy Windows targets just work. New `--require-encrypt` flag for opsec engagements where unencrypted is unacceptable.
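The decision itself is small enough to sketch as a pure function. `should_require_encryption` is a hypothetical name, and raising `ConnectionError` in the strict case is an assumption about how `--require-encrypt` might fail; the dialect constants are the standard SMB2 revision codes.

```python
SMB_3_0_0 = 0x0300  # dialect revision for SMB 3.0

def should_require_encryption(negotiated_dialect: int, require_encrypt: bool) -> bool:
    """Decide whether the session should be built with encryption required."""
    if negotiated_dialect >= SMB_3_0_0:
        return True
    if require_encrypt:
        # Opsec case: refuse rather than silently downgrade.
        raise ConnectionError(
            "target negotiated a pre-SMB3 dialect; refusing unencrypted session"
        )
    return False

legacy = should_require_encryption(0x0210, require_encrypt=False)  # SMB 2.1
modern = should_require_encryption(0x0311, require_encrypt=False)  # SMB 3.1.1
```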
v0.54.3 — Anonymous SMB via impacket fallback
Surfaced on Active.htb's `Replication` share. smbprotocol+pyspnego rejects empty credentials (`SpnegoError (16): Operation not supported or available`). impacket's null-session login works.
`SmbShare` now lazily constructs an `ImpacketSmbWalker` backend when `auth.anonymous=True`, delegating walk/read_bytes/probe_share_access. Mirrors the smbprotocol contract: sorted deterministic walk, UNC output, byte-cap on reads. Live re-validation pending (Active.htb despawned between fix and re-test).
Tests
+32 tests: 5 DFS-probe + 5 encrypt-fallback + 17 anonymous (split TestAnonymousDispatch + TestImpacketWalker) + 5 v0.35 updates. Full suite: 1418 passed, 29 skipped, 0 failed.
Queued for v0.55
- DFS-aware Opens for walking INTO namespace roots (smbprotocol `tree.is_dfs_share` flag handling). v0.54.1 lets DFS shares enter the target list; v0.55 lets the walker descend into them. Multimaster's `Development` link still trips this — `_list_directory` on the namespace root needs DFS flags set.
- GOAD-validated head-to-head benchmark.
v0.53.1 — HTB Active smoke-test patch + MD4 LDAP fix
End-to-end validation against a real AD lab. First real-AD smoke test (HTB Active, 10.129.13.21, Server 2008 R2) — ShareSift caught the GPP cpassword in Groups.xml as Red tier with the gpp_xml parser, confidence 0.99. That's the exact credential the box is designed to leak.
Three real bugs surfaced; this patch ships the highest-priority fix.
Fixed
ldap3 NTLM bind on OpenSSL 3.x
hashlib.new('md4') raised ValueError: unsupported hash type MD4 on modern Python+OpenSSL (Kali default), blocking the entire v0.52 authenticated LDAP path. share/ad.py now installs a Cryptodome.Hash.MD4-backed shim at module import. Idempotent; no-op when hashlib already supports MD4 (older OpenSSL or legacy provider enabled).
Before:
$ sharesift discover --ad-domain active.htb --dc 10.129.13.21 -u SVC_TGS -p 'X'
ldap discovery failed: ValueError: unsupported hash type MD4
After:
$ sharesift discover --ad-domain active.htb --dc 10.129.13.21 -u SVC_TGS -p 'X'
ldap: 1 enabled computer object(s)
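A shim of this kind might look roughly like the following. It is a sketch, not the code in share/ad.py: `md4_available` and `install_md4_shim` are hypothetical names, and it assumes the pycryptodomex package (the `Cryptodome` namespace) is installed whenever the fallback path is reached.

```python
import hashlib

def md4_available() -> bool:
    """Check whether hashlib.new('md4') works in this interpreter."""
    try:
        hashlib.new("md4")
        return True
    except ValueError:
        return False

def install_md4_shim() -> bool:
    """Patch hashlib.new to serve MD4 from Cryptodome; no-op if MD4 already works.

    Returns True when a shim was installed, False otherwise.
    """
    if md4_available():
        return False
    from Cryptodome.Hash import MD4  # assumption: pycryptodomex is installed

    original_new = hashlib.new

    def patched_new(name, data=b"", **kwargs):
        if name.lower() == "md4":
            return MD4.new(data)
        return original_new(name, data, **kwargs)

    hashlib.new = patched_new
    return True
```

Calling `install_md4_shim()` twice is harmless: after the first call `md4_available()` succeeds, so the second call returns without patching again.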
Anonymous LDAP empty-result UX
When AD policy blocks anonymous searches (operationsError, typical on modern AD), we now print a hint pointing at -u/-p, -H, or -k instead of silently reporting 0 results.
Documented
docs/v0p53_htb_smoke_test.md — full HTB Active run writeup with the headline GPP cpassword catch, three bugs surfaced, queued v0.54 fixes.
Queued for v0.54
- smbprotocol anonymous fallback to impacket for SMB walks (pyspnego rejects empty creds; `discover` works because it uses impacket, but `hunt --no-pass` fails at the per-share probe).
- Auto-detect SMB3 capability and fall back to unencrypted (Server 2008 R2 only does SMB 2.0/2.1; current default `--encrypt=True` fails). New `--require-encrypt` flag for the opsec case.
- Live-DC validation of v0.53 DFS resolver (Active.htb has no DFS, so DFS is still unvalidated against real AD).
Tests
Full suite: 1391 passed, 29 skipped, 0 failed.
v0.53.0 — DFS referral resolution + GOAD benchmark harness
DFS just works. v0.52's hunt command now handles \\corp.local\dept\hr-shaped UNCs transparently:
# No flag needed — auto-resolved
sharesift hunt //corp.local/dept/hr -u alice -p PW \
--output-dir /tmp/dfs-hunt
Behind the scenes: SmbShare catches STATUS_PATH_NOT_COVERED on tree-connect, queries FSCTL_DFS_GET_REFERRALS over IPC$, parses the referral chain, and retargets to the resolved fileserver. Implementation mirrors smbclient._pool.dfs_request (private API in jborean93/smbprotocol; we reimplement using public primitives so we don't bind to internals).
What shipped
DFS referral resolution
- `share/dfs.py`: `DfsResolution` dataclass + `dfs_request_via_ipc` (IOCTL wire-format) + `first_target_unc` + `resolve_dfs_path` (orchestration) + `is_path_not_covered`
- `share/smb.py`: `SmbShare.auto_resolve_dfs=True` (default), catches `PathNotCovered`, chases referrals via IPC$, retries against the resolved fileserver. Original target preserved as `_original_target`.
- `hunt --detect-dfs` is now informational-only; auto-resolution runs regardless.
GOAD benchmark harness
For when you stand up GOAD (or any AD lab):
python tools/goad_benchmark.py \
--ad-domain sevenkingdoms.local --dc 192.168.56.10 \
-u khal.drogo -p horse \
--snaffler-tsv ./snaffler_run.tsv \
--output-dir ./goad_bench_$(date +%Y-%m-%d)
Produces scorecard.md with per-category recall comparison across 19 buckets (GPP cpassword, KeePass, AWS, browser stores, SCCM NAA, etc.) clustering Snaffler's rule labels and ShareSift's rule IDs around shared credential shapes. See docs/goad_benchmark_methodology.md for the lab setup recipe.
Tests
+36 tests (18 DFS resolution + 18 GOAD harness). Full suite: 1391 passed, 29 skipped, 0 failed.
Honest caveats
- DFS resolution mocked-only: no live-DC validation yet. The first run against a real domain DFS namespace will surface any wire-format edge cases (V4-specific `server_type` bits, multi-target priority ordering when proximity differs).
- GOAD benchmark harness pure-function-tested: the actual `subprocess.run` invocation and TSV-file roundtrip await the lab being up.
- v0.52 LDAP smoke test still pending: until ShareSift is pointed at a real AD (HTB, GOAD, work), the LDAP + DFS paths are mock-validated only.
What v0.53 doesn't handle
- Interlink referrals (referral chains across namespaces)
- Referral caching (every connection re-queries)
- Sticky target hints (always picks first entry, no failover)
- Multi-DC LDAP failover
All queued for v0.54+.
See docs/v0p53_results.md for the full sprint writeup.
v0.52.0 — Snaffler-replacement enumeration sprint
One command Snaffler replacement. ShareSift becomes a self-contained Linux-native attacker workflow:
sharesift hunt --ad-domain corp.local --dc dc01.corp.local \
-u alice -p PW --output-dir ./engagement
Takes a domain + creds and returns ranked credential findings across every joined host's readable shares. No Snaffler binary, no nxc --shares glue, no shell pipe.
What shipped
| Capability | Module / CLI |
|---|---|
| LDAP-based AD computer object enumeration | share/ad.py |
| AD-wide share discovery | sharesift discover --ad-domain corp.local -u U -p P |
| End-to-end Snaffler-replacement sweep | sharesift hunt --ad-domain corp.local -u U -p P --output-dir ./out |
| Pass-the-Hash via LDAP NTLM | share/ad.py (lm:nt password encoding) |
| Kerberos via LDAP SASL GSSAPI | share/ad.py (KRB5CCNAME ccache) |
| DFS detection utilities (opt-in) | hunt --detect-dfs |
Operator workflows
AD-wide credential hunt:
sharesift hunt --ad-domain corp.local --dc dc01.corp.local \
-u alice -p PW --output-dir ./engagement
Pass-the-Hash from dumped NT hash:
sharesift hunt --ad-domain corp.local \
-u svc_backup -H 'aad3b...:1c63...' \
--output-dir ./engagement
Kerberos via existing ccache:
kinit alice@CORP.LOCAL
sharesift hunt --ad-domain corp.local --use-kcache \
--output-dir ./engagement
Findings from the foundation audit
Most of the originally-scoped v0.52-v0.55 sprint (R/W ACL probe fixing Snaffler #184, Snaffler skip-list, Kerberos ccache, NetrShareEnum) was already shipped in v0.39 + v0.40. Real gaps were three: LDAP discovery, DFS, hunt command. Sprint compressed from ~5 weeks to one session.
Honest scope caveats
- LDAP path tested against ldap3 mocks, not a live DC. First-run on GOAD will validate.
- DFS referral resolution not yet shipped: detection utilities only, opt-in via `--detect-dfs` (heuristic false-positives on every FQDN host). Full referral chasing queues for v0.53.
- No live-AD head-to-head benchmark yet. `sharesift hunt` vs `Snaffler.exe -s -d corp.local` on a GOAD-class lab queues for v0.55.
Tests
46 new (24 LDAP discovery + 11 DFS detection + 11 hunt orchestration). Full suite: 1299 passed, 51 skipped, 0 failed.
See docs/v0p52_results.md and docs/v0p52_snaffler_replacement_plan.md for the full sprint writeup.
v0.51.0 — first real corporate-share benchmark + Snaffler head-to-head
The first published head-to-head against upstream Snaffler on a
real Windows NTFS share, not LLM-curated paths.
The number
| Tool | Caught | Missed | FPs | F1 at Red+ |
|---|---|---|---|---|
| Upstream Snaffler | 16 | 59 | 4 | 0.337 |
| ShareSift v0.51 | 54 | 21 | 62 | 0.565 |
2525 files. 75 synthetic-but-format-shaped credentials across 16
categories. Operator triage policy (Red+).
ShareSift catches 3.4× as many credentials as Snaffler (54 vs 16), at the
cost of roughly 15× as many false positives (62 vs 4), which is the genuine tradeoff:
the path classifier is aggressive on binary-extension noise (.msi
/.iso/.psd). Run Black-only for P=0.833 if you don't want them;
run Red+ if you don't want 59 real credentials silently missed.
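The F1 column can be re-derived from the Caught / Missed / FPs columns of the table. A short check, assuming the standard precision, recall, and F1 definitions (`prf` is a hypothetical helper name):

```python
def prf(caught: int, missed: int, false_positives: int):
    """Return (precision, recall, F1) from raw counts."""
    precision = caught / (caught + false_positives)
    recall = caught / (caught + missed)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts taken from the table above.
snaffler = prf(caught=16, missed=59, false_positives=4)
sharesift = prf(caught=54, missed=21, false_positives=62)

for name, (p, r, f1) in (("Snaffler", snaffler), ("ShareSift", sharesift)):
    print(f"{name}: P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

The rounded F1 values match the table (0.337 and 0.565), and the recall figures make the tradeoff concrete: about 0.21 for Snaffler against 0.72 for ShareSift.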
Why this corpus exists
The v0.50 scorecard had one honesty caveat: the Windows precision
number (P=0.984 on snaffler-blind) came from LLM-labeled paths,
not real share content. v0.51 replaces it with:
- 2525 actual files on an NTFS partition built from a reproducible JSON manifest via Stauffer's DiskForge
- 75 positives across 16 categories: one per ShareSift rule generation v0.46→v0.50, plus the classic high-value categories
- 2420 corporate-share noise + 20 precision-stress filenames
- UNC backslash form (`\\corp-fs01\…`): what the rule engine sees on real SMB shares
- One docker run from the committed seed → byte-identical corpus
Honest caveat
The 16 positive categories were authored to exercise ShareSift's
rule coverage. Snaffler's defaults don't ship with rules for
German cred filenames, CMD set "VAR=val", browser-creds
meta-coverage, etc. A neutral-curated corpus would show Snaffler
at maybe 40–50% recall. The categories ShareSift covers are real
corporate-share shapes (operator-reported in Snaffler's own issue
tracker), not invented for benchmark-chasing — but the
operational gap is amplified by category selection. Full
disclosure in docs/diskforge_winshare_v1_results.md.
What didn't change
The 4-generation held-out discipline cycle is still the
methodology contribution. v3 still at 100%, v4 still at 70%
baseline. The benchmark adds the operational head-to-head story
on top.
Reproducing
git clone --branch v0.51.0 https://github.com/byevincent/ShareSift.git
cd ShareSift
uv sync --group pysnaffler-integration
bash tools/diskforge_winshare/build_corpus.sh
.venv/bin/python tools/run_full_sweep.py
Same seed = byte-identical corpus = same numbers.
Artifacts
- `sharesift`: 77MB single-file binary (Stage 1 + rule engine)
- Full source: `git clone --branch v0.51.0`
🤖 Generated with Claude Code
v0.48.0 — close v0.47 held-out underfit, cleanly
ShareSift v0.48.0 — same-day follow-up to v0.47. Closes the held-out underfit by running the discipline experiment properly: lock NEW held-out FIRST, then write rules from OLD held-out failures only, then validate.
TL;DR
| Gate | v0.47 | v0.48 |
|---|---|---|
| Corpus (training) | 18/19 (95%) | 18/19 (95%) |
| Held-out v1 | 4/11 (36%) | 10/11 (91%) |
| Held-out v2 (new locked) | n/a | 7/10 (70%) |
| MSF3 / MSF2 / DiskForge recall | 1.000 / 1.000 / 0.923 | 1.000 / 1.000 / 0.923 (held) |
| v0.48 rule FP contribution | n/a | 0 across all three |
The generalization signal: ShareSiftKeepBrowserSavedCreds was authored as "generalize Firefox to other Chromium-base browsers." It directly closed 2 held-out v2 probes (Chrome + Edge Login Data) that were locked BEFORE the rule was written — pattern-level generality catching parallel patterns. That's the discipline working as intended.
Full writeup in docs/v0p48_results.md.
Seven new rules (close OLD held-out, sourced #78/#135/#67/#46)
| Rule | Tier | Match | Closes |
|---|---|---|---|
| ShareSiftKeepCiscoEnableSecret | Red | Content | #78 (Cisco IOS enable secret/password/type-7) |
| ShareSiftKeepCiscoSnmpCommunity | Red | Content | #78 (SNMP RW community) |
| ShareSiftKeepCiscoSnmpCommunityRo | Yellow | Content | #78 (SNMP RO community) |
| ShareSiftKeepFileZillaSavedSites | Black | FilePath | #135 (sitemanager.xml saved FTP/SFTP) |
| ShareSiftKeepFileZillaRecentServers | Yellow | FilePath | #135 (recentservers.xml) |
| ShareSiftKeepDotNetAppSettingsConnString | Red | Content | #67 (.NET appsettings.json conn string) |
| ShareSiftKeepBrowserSavedCreds | Black | FilePath | #46 (Chrome/Edge/Brave/Opera Login Data) |
Both extra_rules.json (engine) and extra_rules.py (pysnaffler compat).
New held-out v2 (locked test set)
benchmarks/snaffler_issues/heldout_v2.jsonl — 10 probes from previously-unread Snaffler PR sources:
- #198 (CMD `set PASSWORD=`)
- #155 (Azure CLI `az login --password`)
- #124 (XML `<password>` with nested tag)
- #98 (loose "credential" filename keyword)
- #46 (Chrome + Edge `Login Data`, the Firefox cousins)
Pre-rule baseline: 5/10 (the v0.47 KeepDoubleDashPassphrase already generalized to Azure CLI patterns — free signal). Post-rule: 7/10 (browser-creds meta-rule catches Chrome + Edge).
eval_snaffler_issues.py grows --set {corpus,heldout,heldout_v2,all}.
What's NOT in v0.48 (deliberate discipline)
3 held-out v2 fails come from sources I MINED for held-out v2:
- `heldout-v2-198-cmd-set-pgpassword-quoted`: `set "PGPASSWORD=val"`
- `heldout-v2-98-credential-in-filename`: `credentials_2024.xlsx`
- `heldout-v2-98-credentials-export`: `CustomerCredentialsExport.csv`
Adding rules for these in v0.48 would be tuning toward held-out v2 (discipline violation). They become v0.49 candidates — a future held-out v3 will validate them against patterns I haven't yet read.
This is how a discipline-honest research cycle should grow: each version locks the next test set BEFORE writing the rules that close the previous one.
Existing benchmark impact
| Benchmark | v0.47 R | v0.48 R | v0.48 rule FP |
|---|---|---|---|
| MSF3 | 1.000 | 1.000 | 0 |
| MSF2 | 1.000 | 1.000 | 0 |
| DiskForge | 0.923 | 0.923 | 0 |
Zero v0.48 rules fired on any of the three (neither TP nor FP). The Cisco IOS / FileZilla / ADO / browser-creds patterns don't appear in those substrates — MSF3 is AD Windows-shaped, MSF2 Linux Metasploitable, DiskForge a forensic disk image. Rules are surgical to corporate-share patterns.
Binary
77.2 MB single-file binary attached (sharesift). Verified:
wget https://github.com/byevincent/ShareSift/releases/latest/download/sharesift
chmod +x sharesift
./sharesift --version # sharesift 0.48.0
v0.49 candidate list
- Close held-out v2 remaining gaps (CMD `set "VAR=val"` quoted variant, loose "credential" filename keyword)
- Lock held-out v3 from yet-unread sources (#112 SCCM, #140 Kerberos, #139 MDE Linux)
- After v0.49: three generations of held-out signal = calibrated confidence in "corporate-share benchmark progress"
v0.46.0 — drop-on-Kali binary + DB exporters
ShareSift v0.46.0 — combined ship covering engagement-DB exporters and the PyInstaller single-file binary breakthrough.
Headline
| Workflow | Before | After |
|---|---|---|
| Get findings into the report tool | grep + hand-format | sharesift export --format ghostwriter |
| Get findings into SysReptor | not supported | sharesift export --format sysreptor |
| Drop ShareSift on a fresh Kali box | pipx install + 100MB deps | wget .../sharesift && chmod +x |
| Binary size | 1.5 GB (v0.38 attempt) | 77 MB (20× smaller) |
| Tests passing | 1309 | 1309 |
Single-file binary (77 MB)
wget https://github.com/byevincent/ShareSift/releases/latest/download/sharesift
chmod +x sharesift
./sharesift --version
# sharesift 0.46.0
Covers score-paths, scan-files (rule + extractor), to-snaffler-tsv, sort, query, export. Operators wanting SMB-direct, network discovery, verify, content-classifier, or report rendering use pipx install 'sharesift[smb,network-enum,content-inference,verify,report]' instead.
The size shrink came from a minimal build venv (no torch transitive pulls) + aggressive PyInstaller excludes. Two gotchas worth recording: strip and upx corrupt scipy's OpenBLAS shared lib (binary crashes at import); --clean breaks PyInstaller's PYZ archive. Both documented in docs/v0p46_results.md.
Engagement DB exporters
Three new formats off the v0.41 SQLite datastore:
sharesift export --db engagement.db --format markdown --output findings.md
sharesift export --db engagement.db --format ghostwriter --output findings.csv
sharesift export --db engagement.db --format sysreptor --output sysreptor.json
- Markdown: pastes into Dradis, GhostWriter, SysReptor, Notion, Slack, plain delivery docs
- GhostWriter CSV: direct CSV import; columns match the findings-page schema, tier maps to severity
- SysReptor JSON: `projects/v1` envelope with lowercased severities
All three sort tier > host > share > rel_path.
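The sort order can be expressed as a composite key. The sketch below is an illustration only: the dict-shaped findings, `sort_findings`, and the Black > Red > Yellow > Green ranking are assumptions based on the tier names used elsewhere in these notes, not the exporter's actual schema.

```python
# Assumed severity ranking, most severe first.
TIER_RANK = {"Black": 0, "Red": 1, "Yellow": 2, "Green": 3}

def sort_findings(findings: list) -> list:
    """Order findings by tier, then host, then share, then relative path."""
    return sorted(
        findings,
        key=lambda f: (TIER_RANK[f["tier"]], f["host"], f["share"], f["rel_path"]),
    )

findings = [
    {"tier": "Yellow", "host": "fs01", "share": "Data", "rel_path": "notes.txt"},
    {"tier": "Red", "host": "fs02", "share": "IT", "rel_path": "b.reg"},
    {"tier": "Red", "host": "fs01", "share": "IT", "rel_path": "a.reg"},
    {"tier": "Black", "host": "fs09", "share": "Users", "rel_path": "Login Data"},
]
ordered = [(f["tier"], f["host"]) for f in sort_findings(findings)]
```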
Path-prefix dedup deferred
Diagnostic showed MSF3 top-12-30 dominated by 19 copies of an Internet Explorer cache backup. Fixing requires either a path-prefix penalty or rule-action awareness (treat Yellow-from-Relay as Green); both are research-y patterns. v0.28's falsified extension-frequency hypothesis is the cautionary precedent. Top-10 already at 0.80 — not worth disturbing for a marginal gain. Re-open if a future benchmark shows the duplicate-backup pattern materially hurting top-K precision.
What's in the binary
Bundled at runtime:
- Stage 1 path classifiers (Windows + Linux LightGBM models, ~39 MB combined)
- Rule sets: `snaffler_default.json` (88 base) + `extra_rules.json` (v0.12 blind-spot + Gitleaks modern SaaS + v0.42 Linux gap closure)
Excluded (use pipx extras instead):
- Content classifier (torch, ~1.5 GB)
- SMB-direct (smbprotocol, ~30 MB)
- Network discovery (impacket, ~100 MB)
- Verifiers (requests/paramiko/ldap3/jwt/boto3, ~50 MB)
- Report rendering (jinja2)
Changelog
See CHANGELOG.md and docs/v0p46_results.md for the full write-up.
Honest assessment vs Snaffler
v0.45's assessment said ShareSift was technically on-par for most engagement workflows but lagged Snaffler on two fronts: "drop binary on a box" and "feed straight into the report." v0.46 closes both. Open gaps for v0.47+: status heartbeat on long scans, HTML report's Markdown twin, path-prefix dedup with rule-action awareness.