feat(ENG-70): enable SFE preprocessing to reduce Bash false positives #15

Merged: hiskudin merged 4 commits into main from feat/enable-sfe-fasttext on May 5, 2026

@hiskudin hiskudin commented May 2, 2026

Summary

Turns on StackOne Defender's optional Structural Field Extraction (SFE) preprocessor in the PostToolUse hook. SFE drops metadata-shaped fields (file listings, JSON snippets, `ls -lh` output) before they reach the Tier 2 ML classifier, eliminating a false-positive class observed in real Claude Code sessions.

Changes

  • Add `fasttext.wasm@^1.0.0` to dependencies (required peer dep for SFE)
  • Bump `@stackone/defender` from `^0.5.8` to `^0.6.3` (prepared earlier but not landed)
  • Enable `useSfe: true` in `scripts/scan-tool-result.mjs` PromptDefense constructor

Why

Without SFE, Defender flagged routine Bash output as high-risk:

| Input | tier2Score | Outcome |
| --- | --- | --- |
| `ls -lh` file metadata | 0.919 | blocked (FP) |
| `grep version` on package.json | 0.999 | blocked (FP) |
| Directory listing with `injection` in filenames | 0.401 | warning (FP) |

The classifier was trained on prose-shaped data and treats dense terminal output as suspicious. SFE's FastText preprocessor recognises these fields by structural cues (depth, field name, value preview) and drops them before classification.
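
The field-dropping idea can be illustrated with a small sketch. This is not Defender's implementation: `looksLikeMetadata` is a hypothetical, rule-based stand-in for the FastText classifier, and `extractContentFields` and the field-name pattern are assumptions made purely for illustration.

```javascript
// Hypothetical stand-in for SFE's FastText model: a crude heuristic on
// field name and value type. The real classifier is learned, not rule-based.
const METADATA_NAME = /(^|_)(id|timestamp|size|mode|mtime)$|Id$/;

function looksLikeMetadata(fieldName, value) {
  if (typeof value !== "string") return true; // numbers, booleans, null
  return METADATA_NAME.test(fieldName);
}

// Walk an object and keep only the string fields that should still be
// classified, keyed by their dotted path.
function extractContentFields(input, isMetadata = looksLikeMetadata, prefix = "") {
  const kept = {};
  for (const [key, value] of Object.entries(input)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object") {
      Object.assign(kept, extractContentFields(value, isMetadata, path));
    } else if (!isMetadata(key, value)) {
      kept[path] = value;
    }
  }
  return kept;
}
```

If every field is dropped, nothing is left for Tier 2 to score, which corresponds to the "tier2 skipped" behaviour reported in this PR.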

Verified empirically

After the change:

| Input | Behaviour |
| --- | --- |
| `ls -lh` file metadata | dropped by SFE → tier2 skipped → no warning |
| JSON `grep version` snippet | dropped by SFE → tier2 skipped → no warning |
| DAN-style injection prompt | not dropped → tier2Score 1.000 → correctly flagged high-risk |

So FPs disappear without weakening true-positive detection.

Test plan

  • Install plugin in fresh Claude Code session, run `ls -lh` on a file with `injection` in the path → should not be blocked
  • Fetch a known DAN/STAN prompt via curl → should still be blocked

🤖 Generated with Claude Code


Summary by cubic

Enables StackOne Defender's Structural Field Extraction (SFE) in scripts/scan-tool-result.mjs, parses Bash's JSON envelopes before scanning, and adds targeted logging for JSON-shaped parse failures, cutting false positives without weakening real injection detection. Addresses ENG-70.

  • Bug Fixes

    • Pass MCP/WebFetch responses as raw objects so SFE can drop metadata per field; wrap plain strings as { output }.
    • Parse Bash’s JSON-encoded envelopes before scanning so SFE sees stdout/stderr fields, not wrapper punctuation.
    • Log parse failures only when the string looks like JSON (starts with { or [), surfacing anomalies without noise.
    • Check all top-level deps and reinstall if any are missing to ensure fasttext.wasm loads after upgrades.
  • Dependencies

    • Added fasttext.wasm@^1.0.0.
    • Bumped @stackone/defender to ^0.6.3.

Written for commit 63f1b64. Summary will update on new commits.

Adds fasttext.wasm peer dep and turns on Defender's optional Structural
Field Extraction (SFE) preprocessor. SFE classifies each input field
via FastText char-n-grams and drops metadata-shaped fields before the
Tier 2 ML classifier sees them.

This fixes a false-positive class observed in production: routine Bash
output (ls -lh, grep on JSON, file listings) was scoring high enough on
the Tier 2 classifier to block legitimate tool calls. With SFE enabled,
these fields are dropped pre-classification while real injection content
still reaches Tier 2 unchanged.

Verified empirically:
  ls -lh metadata    : 0.691 (medium) → dropped, no warning
  JSON grep version  : 0.999 (high)   → dropped, no warning
  DAN injection      : 1.000 (high)   → not dropped, still flagged

Also bumps @stackone/defender from ^0.5.8 to ^0.6.3 (this bump was
prepared in an earlier branch but never landed on main).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 2, 2026 15:36

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 3 files

Requires human review: This PR modifies core security logic by enabling a new preprocessor (SFE) in the PromptDefense pipeline, which requires manual verification to ensure it doesn't create false negatives.


Copilot AI left a comment

Pull request overview

This PR updates the Claude PostToolUse scanning hook to enable StackOne Defender's new SFE preprocessing in order to reduce false positives on Bash/tool output before content reaches the Tier 2 classifier.

Changes:

  • Bumps @stackone/defender from ^0.5.8 to ^0.6.3.
  • Adds fasttext.wasm as a dependency to satisfy the new optional SFE peer dependency.
  • Enables useSfe: true in the PostToolUse scanner and documents the intended false-positive reduction.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| scripts/scan-tool-result.mjs | Enables SFE in the Defender hook and adds explanatory comments. |
| package.json | Updates Defender and adds fasttext.wasm dependency. |
| package-lock.json | Locks the new Defender version and the added FastText package. |

Comment thread scripts/scan-tool-result.mjs
Comment thread scripts/scan-tool-result.mjs
@hiskudin hiskudin changed the title from "feat: enable SFE preprocessing to reduce Bash false positives" to "feat(ENG-70): enable SFE preprocessing to reduce Bash false positives" on May 5, 2026
Two reviewer issues from PR #15:

1. Stop pre-stringifying object responses (MCP/WebFetch). Defender's
   defendToolResult accepts arbitrary objects; flattening them to a
   single JSON string limited SFE to a binary keep-or-drop decision over
   the whole payload. Passing the raw object through gives SFE per-field
   granularity — it can drop metadata fields like messageId, headers.*,
   and timestamps while preserving content fields like body and snippet
   for Tier 2 classification.

2. Bootstrap now checks every top-level dep from package.json rather
   than only @stackone/defender. Adding a new peer dep (e.g.
   fasttext.wasm for SFE) on an existing plugin install previously
   wouldn't trigger reinstall, so SFE could fail at runtime with
   defender unable to load fasttext.wasm. The new check looks for any
   missing dep and runs npm install if found.
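
The dependency check in item 2 might look roughly like the following. This is a minimal sketch, not the script's actual code: `findMissingDeps` and `needsInstall` are hypothetical names, and the existence check is injected by the caller so the logic stays independent of the filesystem.

```javascript
// Return the names of top-level dependencies that are not installed.
// `pkg` is a parsed package.json; `exists` is any path-existence check
// supplied by the caller (for example fs.existsSync).
function findMissingDeps(pkg, exists, nodeModulesDir = "node_modules") {
  const declared = Object.keys(pkg.dependencies ?? {});
  return declared.filter(
    (name) => !exists(`${nodeModulesDir}/${name}/package.json`)
  );
}

// Hypothetical bootstrap decision: reinstall if anything is missing.
function needsInstall(pkg, exists) {
  return findMissingDeps(pkg, exists).length > 0;
}
```

Checking every declared dependency, rather than one sentinel package, is what lets a newly added peer dependency such as fasttext.wasm trigger a reinstall on an existing plugin install.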

Verified empirically:
  Bash string metadata    : SFE drops 'output' field, no FP
  MCP Gmail w/ injection  : SFE drops headers/IDs, body reaches Tier 2,
                            tier2Score 1.000, BLOCKED correctly
  MCP benign metadata     : all metadata fields dropped, no FP

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 1 file (changes from recent commits).

Requires human review: Changes core security scanning logic (SFE, payload handling) and production dependencies. High risk of impacting threat detection, requiring human review to ensure no false negatives or breakage.

Claude Code's PostToolUse hook delivers Bash stdout as a JSON-encoded
envelope string ('{"stdout":"...","stderr":"...","exit_code":0}'), not
as raw stdout. The envelope syntax itself ({"stdout":"..., "stderr":"...)
inflates Tier 2 scores significantly — we observed 0.138 → 0.996 on
identical content purely from the wrapper punctuation matching
JSON-shaped patterns the classifier saw during training.

Fix: try JSON.parse on string inputs; if the result is an object, scan
its structured form so SFE walks individual fields (stdout, stderr) and
Tier 2 only sees real content, not framing.
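
The normalisation described here, combined with the earlier object pass-through change, can be sketched as below. `toScannable` is a hypothetical name; the `{ output }` wrapper for plain strings follows the PR summary, and the rest is illustrative rather than the script's actual code.

```javascript
// Normalise a PostToolUse payload into an object that a field-aware
// scanner can walk field by field.
function toScannable(raw) {
  // MCP/WebFetch responses are already objects: pass them through untouched.
  if (raw !== null && typeof raw === "object") return raw;
  if (typeof raw !== "string") return { output: String(raw) };
  try {
    const parsed = JSON.parse(raw);
    // Bash envelope such as {"stdout":"...","stderr":"...","exit_code":0}
    if (parsed !== null && typeof parsed === "object") return parsed;
  } catch {
    // Not JSON: fall through and treat the input as plain text.
  }
  return { output: raw };
}
```

Strings that parse to a JSON primitive (for example `"42"`) are still wrapped as `{ output }`, since only an object gives the scanner separate fields to inspect.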

Verified empirically against the 6 documented Claude Code FPs:
  FP-3 (git status)    : 0.997 high (BLOCK) → skip pass (SFE drops)
  FP-5 (status + log)  : 0.997 high (BLOCK) → skip pass (SFE drops)
  FP-6 (git log -2)    : 0.998 high (BLOCK) → 0.658 medium pass
  FP-1, FP-2           : already passing via SFE; behaviour unchanged
  FP-4 (lefthook)      : surfaces a deeper Tier-2-prose-shape FP that
                         this fix can't address — needs corroborated-
                         signal block policy in a follow-up

Real injection sanity check: DAN-style prompt still scores 1.000 high
and blocks correctly. TPR preserved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread scripts/scan-tool-result.mjs Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 1 file (changes from recent commits).

Requires human review: Changes modify core security scanning logic (parsing, SFE preprocessor, dependency upgrades) which could impact false positive/negative rates. Requires human review to verify correctness and safety of

Aikido reviewer flagged the empty catch around JSON.parse(raw). The
catch is intentional control flow — most Bash outputs aren't JSON
envelopes and routinely fail to parse, so logging every miss would
generate stderr noise per call.

Compromise: log only when the string looked JSON-shaped (starts with
{ or [) but failed to parse. That's a genuine anomaly worth surfacing
(corrupted Claude Code hook payload, non-JSON tool with JSON-like
prefix, etc.) without spamming on routine plain-string inputs.
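
A minimal sketch of that compromise follows. `parseJsonEnvelope` and the log message are hypothetical, and trimming leading whitespace before checking the first character is an assumption not stated in the commit message.

```javascript
// Try to parse a string as JSON. Returns the parsed object/array, or null
// when the input is not one. `log` defaults to console.error (stderr).
function parseJsonEnvelope(raw, log = console.error) {
  try {
    const parsed = JSON.parse(raw);
    return parsed !== null && typeof parsed === "object" ? parsed : null;
  } catch (err) {
    // Routine plain-text output fails silently; only JSON-shaped strings
    // that fail to parse are treated as anomalies worth logging.
    const first = raw.trimStart()[0];
    if (first === "{" || first === "[") {
      log(`JSON-shaped tool output failed to parse: ${err.message}`);
    }
    return null;
  }
}
```

Passing the logger as a parameter keeps the function easy to exercise without writing to stderr.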

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 1 file (changes from recent commits).

Requires human review: Modifies core security scanning logic (tool result parsing, SFE feature), introduces new production dependency (fasttext.wasm), and changes how outputs are prepared for classification. Any bug could c

@hiskudin hiskudin merged commit b65ca10 into main May 5, 2026
4 checks passed
@hiskudin hiskudin deleted the feat/enable-sfe-fasttext branch May 5, 2026 13:55