feat(ENG-70): enable SFE preprocessing to reduce Bash false positives #15
Adds fasttext.wasm peer dep and turns on Defender's optional Structural Field Extraction (SFE) preprocessor. SFE classifies each input field via FastText char-n-grams and drops metadata-shaped fields before the Tier 2 ML classifier sees them.
This fixes a false-positive class observed in production: routine Bash output (ls -lh, grep on JSON, file listings) was scoring high enough on the Tier 2 classifier to block legitimate tool calls. With SFE enabled, these fields are dropped pre-classification while real injection content still reaches Tier 2 unchanged.
Verified empirically:
  ls -lh metadata   : 0.691 (medium) → dropped, no warning
  JSON grep version : 0.999 (high)   → dropped, no warning
  DAN injection     : 1.000 (high)   → not dropped, still flagged
Also bumps @stackone/defender from ^0.5.8 to ^0.6.3 (this bump was prepared in an earlier branch but never landed on main).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
This PR updates the Claude PostToolUse scanning hook to enable StackOne Defender's new SFE preprocessing in order to reduce false positives on Bash/tool output before content reaches the Tier 2 classifier.
Changes:
- Bumps `@stackone/defender` from `^0.5.8` to `^0.6.3`.
- Adds `fasttext.wasm` as a dependency to satisfy the new optional SFE peer dependency.
- Enables `useSfe: true` in the PostToolUse scanner and documents the intended false-positive reduction.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| scripts/scan-tool-result.mjs | Enables SFE in the Defender hook and adds explanatory comments. |
| package.json | Updates Defender and adds fasttext.wasm dependency. |
| package-lock.json | Locks the new Defender version and the added FastText package. |
Two reviewer issues from PR #15:
1. Stop pre-stringifying object responses (MCP/WebFetch). Defender's defendToolResult accepts arbitrary objects; flattening them to a single JSON string limited SFE to a binary keep-or-drop decision over the whole payload. Passing the raw object through gives SFE per-field granularity: it can drop metadata fields like messageId, headers.*, and timestamps while preserving content fields like body and snippet for Tier 2 classification.
2. Bootstrap now checks every top-level dep from package.json rather than only @stackone/defender. Adding a new peer dep (e.g. fasttext.wasm for SFE) on an existing plugin install previously wouldn't trigger reinstall, so SFE could fail at runtime with defender unable to load fasttext.wasm. The new check looks for any missing dep and runs npm install if found.
Verified empirically:
  Bash string metadata   : SFE drops 'output' field, no FP
  MCP Gmail w/ injection : SFE drops headers/IDs, body reaches Tier 2, tier2Score 1.000, BLOCKED correctly
  MCP benign metadata    : all metadata fields dropped, no FP
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
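For readers following the bootstrap change, here is a minimal sketch of the "check every top-level dep" idea. It is not the code from this PR: `findMissingDeps` and the injected `isInstalled` callback are hypothetical names, and the package data is a made-up sample using the versions mentioned in this conversation.

```javascript
// Hypothetical helper: returns the names of top-level dependencies that are
// declared in package.json but not reported as installed.
// `isInstalled` is injected so the check can be exercised without touching disk.
function findMissingDeps(pkg, isInstalled) {
  const declared = Object.keys(pkg.dependencies ?? {});
  return declared.filter((name) => !isInstalled(name));
}

// Sample data for illustration only.
const pkg = {
  dependencies: {
    "@stackone/defender": "^0.6.3",
    "fasttext.wasm": "^1.0.0",
  },
};

// Simulate an existing install that predates the new peer dep.
const installed = new Set(["@stackone/defender"]);
const missing = findMissingDeps(pkg, (name) => installed.has(name));

console.log(missing); // lists only fasttext.wasm
```

In a real bootstrap script, `isInstalled` would presumably test for the package directory under node_modules, and a non-empty result would trigger `npm install`; both details are assumptions here.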
0 issues found across 1 file (changes from recent commits).
Requires human review: Changes core security scanning logic (SFE, payload handling) and production dependencies. High risk of impacting threat detection, requiring human review to ensure no false negatives or breakage.
Claude Code's PostToolUse hook delivers Bash stdout as a JSON-encoded
envelope string ('{"stdout":"...","stderr":"...","exit_code":0}'), not
as raw stdout. The envelope syntax itself ({"stdout":"..., "stderr":"...)
inflates Tier 2 scores significantly — we observed 0.138 → 0.996 on
identical content purely from the wrapper punctuation matching
JSON-shaped patterns the classifier saw during training.
Fix: try JSON.parse on string inputs; if the result is an object, scan
its structured form so SFE walks individual fields (stdout, stderr) and
Tier 2 only sees real content, not framing.
Verified empirically against the 6 documented Claude Code FPs:
FP-3 (git status) : 0.997 high (BLOCK) → skip pass (SFE drops)
FP-5 (status + log) : 0.997 high (BLOCK) → skip pass (SFE drops)
FP-6 (git log -2) : 0.998 high (BLOCK) → 0.658 medium pass
FP-1, FP-2 : already passing via SFE; behaviour unchanged
FP-4 (lefthook) : surfaces a deeper Tier-2-prose-shape FP that
this fix can't address — needs corroborated-
signal block policy in a follow-up
Real injection sanity check: DAN-style prompt still scores 1.000 high
and blocks correctly. TPR preserved.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
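The envelope fix described in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the contents of scripts/scan-tool-result.mjs: `unwrapEnvelope` is a hypothetical name, and the sample strings are invented.

```javascript
// Hypothetical helper: if a string input is a JSON-encoded object, return the
// parsed object so a field-level preprocessor can walk stdout/stderr
// separately; otherwise return the input unchanged.
function unwrapEnvelope(raw) {
  // Already-structured inputs (e.g. MCP/WebFetch responses) pass through.
  if (typeof raw !== "string") return raw;
  try {
    const parsed = JSON.parse(raw);
    // Only objects (including arrays) count as envelopes. A bare JSON number,
    // string, boolean, or null is kept as the original string.
    if (parsed !== null && typeof parsed === "object") return parsed;
  } catch {
    // Most Bash output is not JSON, so a parse failure is the common case.
  }
  return raw;
}

const envelope = '{"stdout":"On branch main","stderr":"","exit_code":0}';
const structured = unwrapEnvelope(envelope);
console.log(Object.keys(structured)); // stdout, stderr, exit_code

// Plain terminal output is returned as the same string.
console.log(unwrapEnvelope("total 8\n-rw-r--r-- 1 user staff 120 README.md"));
```

The result would then be handed to the scanner in place of the raw string, so the classifier sees field values and not the wrapper punctuation.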
0 issues found across 1 file (changes from recent commits).
Requires human review: Changes modify core security scanning logic (parsing, SFE preprocessor, dependency upgrades) which could impact false positive/negative rates. Requires human review to verify correctness and safety of
Aikido reviewer flagged the empty catch around JSON.parse(raw). The
catch is intentional control flow — most Bash outputs aren't JSON
envelopes and routinely fail to parse, so logging every miss would
generate stderr noise per call.
Compromise: log only when the string looked JSON-shaped (starts with
{ or [) but failed to parse. That's a genuine anomaly worth surfacing
(corrupted Claude Code hook payload, non-JSON tool with JSON-like
prefix, etc.) without spamming on routine plain-string inputs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
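A minimal sketch of the compromise described in this commit, assuming hypothetical names (`looksJsonShaped`, `parseOrKeep`) and an injectable `warn` callback that defaults to stderr. Trimming leading whitespace before checking the first character is also an assumption; the commit message only says "starts with { or [".

```javascript
// Hypothetical check: does the text begin like a JSON object or array?
function looksJsonShaped(text) {
  const trimmed = text.trimStart();
  return trimmed.startsWith("{") || trimmed.startsWith("[");
}

// Parse string inputs when possible. Stay silent on routine plain-text
// failures and warn only when the input looked like JSON but did not parse.
function parseOrKeep(raw, warn = console.error) {
  if (typeof raw !== "string") return raw;
  try {
    const parsed = JSON.parse(raw);
    return parsed !== null && typeof parsed === "object" ? parsed : raw;
  } catch (err) {
    if (looksJsonShaped(raw)) {
      warn(`JSON-shaped input failed to parse: ${err.message}`);
    }
    return raw;
  }
}

parseOrKeep("total 8\ndrwxr-xr-x 2 user staff 64 src"); // silent: plain text
parseOrKeep('{"stdout": "truncated');                   // warns: JSON-shaped but invalid
```

Either way the original string is still returned and scanned, so a parse failure changes only what is logged, not what is classified.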
0 issues found across 1 file (changes from recent commits).
Requires human review: Modifies core security scanning logic (tool result parsing, SFE feature), introduces new production dependency (fasttext.wasm), and changes how outputs are prepared for classification. Any bug could c
Summary
Turns on StackOne Defender's optional Structural Field Extraction (SFE) preprocessor in the PostToolUse hook. SFE drops metadata-shaped fields (file listings, JSON snippets, `ls -lh` output) before they reach the Tier 2 ML classifier, eliminating a false-positive class observed in real Claude Code sessions.
Changes
Why
Without SFE, Defender flagged routine Bash output as high-risk:
The classifier was trained on prose-shaped data and treats dense terminal output as suspicious. SFE's FastText preprocessor recognises these fields by structural cues (depth, field name, value preview) and drops them before classification.
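To make the structural cues named above (depth, field name, value preview) concrete, here is a small illustrative walker. It is not Defender's implementation, and the FastText classification step is omitted entirely. `describeFields` is a hypothetical name, and the sample message is invented, reusing field names mentioned elsewhere in this PR.

```javascript
// Hypothetical walker: collect one record per leaf value, carrying the kind of
// cues a field-level preprocessor could use to decide keep vs. drop.
function describeFields(value, path = [], out = []) {
  if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      describeFields(child, [...path, key], out);
    }
    return out;
  }
  out.push({
    path: path.join("."),
    depth: path.length,
    name: path.length ? path[path.length - 1] : "",
    preview: String(value).slice(0, 40), // 40-char preview is an arbitrary choice
  });
  return out;
}

// Sample data for illustration only.
const message = {
  messageId: "abc123",
  headers: { date: "2024-01-01" },
  body: "Hello there",
};

// A structured object yields one record per field.
console.log(describeFields(message).map((f) => f.path)); // messageId, headers.date, body

// The same payload pre-stringified collapses to a single anonymous field.
console.log(describeFields(JSON.stringify(message)).length); // 1
```

The second call shows why a flattened JSON string leaves only an all-or-nothing decision, while the structured form allows metadata and content fields to be treated separately.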
Verified empirically
After the change:
So FPs disappear without weakening true-positive detection.
Test plan
🤖 Generated with Claude Code
Summary by cubic
Enabled StackOne Defender's Structural Field Extraction (SFE) in scripts/scan-tool-result.mjs, parsed Bash's JSON envelopes before scanning, and added targeted logging for JSON-shaped parse failures to cut false positives without weakening real injection detection. Addresses ENG-70.
Bug Fixes
- { output }.
- stdout/stderr fields, not wrapper punctuation.
- fasttext.wasm loads after upgrades.
Dependencies
- fasttext.wasm@^1.0.0.
- @stackone/defender to ^0.6.3.
Written for commit 63f1b64. Summary will update on new commits.