Seven new rules in extra_rules.json (+ Python mirror for
pysnaffler compat) targeting the Snaffler-issues benchmark gaps:
- ShareSiftKeepFirefoxSavedCreds (Black, FilePath, #46)
- ShareSiftKeepGppPolicyXml (Black, FilePath, #31)
- ShareSiftKeepGermanCredFilenames (Red, FileName, #53)
- ShareSiftKeepWireguardPrivateKey (Black, Content, #119)
- ShareSiftKeepOpenvpnAuthUserPassRef (Red, Content, #119)
- ShareSiftKeepCiscoAnyconnectXml (Yellow, FileName, #119)
- ShareSiftKeepDoubleDashPassphrase (Red, Content, #158)
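Illustration only: a minimal sketch of how the two VPN content rules might match. The regexes and the ``classify`` helper are assumptions, not the patterns shipped in extra_rules.json. They rely only on the WireGuard ``PrivateKey = <base64>`` config line and the OpenVPN ``auth-user-pass <file>`` directive.

```python
import re

# Hypothetical patterns, NOT the shipped rule regexes.
# A WireGuard key is 32 bytes, i.e. 44 base64 characters ending in "=".
WG_PRIVATE_KEY = re.compile(
    r"^\s*PrivateKey\s*=\s*[A-Za-z0-9+/]{43}=\s*$", re.MULTILINE
)
# OpenVPN: auth-user-pass followed by a file argument on the same line.
# A bare directive (no argument) is deliberately not matched.
OVPN_AUTH_FILE = re.compile(r"^\s*auth-user-pass[ \t]+\S+", re.MULTILINE)


def classify(text):
    """Return the tier name of the first matching sketch pattern, or None."""
    if WG_PRIVATE_KEY.search(text):
        return "Black"
    if OVPN_AUTH_FILE.search(text):
        return "Red"
    return None
```

The tier names follow the list above (Black for the WireGuard key, Red for the OpenVPN credential-file reference).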
Scorer fix: ``eval_snaffler_issues.py`` was running only one engine
per probe, while the production cascade runs both stages and takes
the max tier. The scorer now mirrors that. Found during corpus
iteration, when new FilePath rules appeared to "not fire" on path probes.
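The max-tier behaviour can be sketched as below. This is an assumed shape, not the code in ``eval_snaffler_issues.py``: ``Tier``, ``score_probe`` and the two stand-in engines are hypothetical names.

```python
from enum import IntEnum


class Tier(IntEnum):
    # Triage levels ordered so that a higher value is more severe.
    NONE = 0
    GREEN = 1
    YELLOW = 2
    RED = 3
    BLACK = 4


def score_probe(probe, engines):
    """Run every engine on the probe and keep the most severe tier."""
    return max((engine(probe) for engine in engines), default=Tier.NONE)


# Hypothetical stand-ins for the two stages of the cascade.
def path_engine(probe):
    return Tier.BLACK if probe.endswith("logins.json") else Tier.NONE


def content_engine(probe):
    return Tier.NONE
```

Scoring with only ``content_engine`` reproduces the old symptom: a path probe that a FilePath rule would catch comes back as ``Tier.NONE``.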
Honest scoreboard against the discipline gates:
Corpus:    8/19 (42%) → 18/19 (95%)
Held-out:  1/11 (9%)  → 4/11 (36%)   BELOW 50% gate
MSF3 R:    1.000      → 1.000        held
MSF2 R:    0.971      → 1.000        +1 catch (/root/reset_logs.sh)
DiskForge: 0.923      → 0.923        held
v0.47 rule FP contribution: 0 across all three benchmarks
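The percentages and the held-out gate check can be reproduced with a few lines. The helper names are hypothetical. The 50% threshold is the gate named above.

```python
GATE = 0.50  # held-out gate from the scoreboard


def percent(hits, total):
    """Hit rate as a whole-number percentage."""
    return round(100 * hits / total)


def meets_gate(hits, total, gate=GATE):
    """True when the hit rate reaches the gate."""
    return hits / total >= gate


corpus_after = percent(18, 19)
held_out_after = percent(4, 11)
held_out_ok = meets_gate(4, 11)
```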
Held-out below gate is an underfitting result, not overfitting:
audit shows zero FPs from any v0.47 rule on existing benchmarks.
Shipping per the option-2 discipline call: surface the gap publicly
rather than tune toward held-out (post-hoc rule-shaping would
defeat the experiment).
Full reasoning + v0.48 candidate list in docs/v0p47_results.md.
Version: 0.46.0 → 0.47.0. Tests: 1309 passed.