feat(ENG-12699): TypeScript parity and synced ONNX bundle #8
Merged
…ndle

- Port dangerous-key filtering, fractional cumulative risk, and traversal config.
- Add packed-chunk Tier 2 flow, density adjustment, and ONNX batch chunking.
- Add optional SFE (fasttext) with bundled model and extras.
- Sync minilm-full-aug artifacts (quantized ONNX, tokenizer, config) with @stackone/defender.
- Bump version and release metadata; update changelog and README.

Made-with: Cursor
fasttext-wheel 0.9.2 has no cp313 wheels; resolving it in the dev group forced a broken sdist build on GitHub Actions. Remove it from dev deps (SFE tests use mocks). Gate the [sfe] extra with a Python version marker and document 3.13 behavior in the README. Made-with: Cursor
Pull request overview
Release 0.6.1 updates the Python stackone-defender package to match the current TypeScript @stackone/defender behavior, including refreshed bundled MiniLM ONNX assets and new preprocessing/scoring behavior.
Changes:
- Adds optional SFE preprocessing (`use_sfe`) with bundled FastText model support (fail-open when unavailable).
- Updates Tier 2 flow to packed-chunk batching, density-adjusted scoring, and ONNX batch chunking to bound memory.
- Hardens traversal/sanitization: dangerous key filtering (`__proto__`, `constructor`, `prototype`) and fractional cumulative-risk thresholds.
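As a rough illustration of the dangerous-key filtering listed above, the sketch below drops the three named keys while walking nested dicts and lists. Only the key names come from this PR; the function name `strip_dangerous_keys`, the depth cap value, and the way removed keys are recorded are assumptions made for the example, not the package's actual code.

```python
from typing import Any

# Key names are taken from the PR summary; everything else here is hypothetical.
DANGEROUS_KEYS = frozenset({"__proto__", "constructor", "prototype"})
MAX_TRAVERSAL_DEPTH = 32  # assumed cap; the real value is not stated in this PR page


def strip_dangerous_keys(value: Any, removed: list[str], depth: int = 0) -> Any:
    """Return a copy of `value` without dangerous keys, recording each removed key."""
    if depth >= MAX_TRAVERSAL_DEPTH:
        return value  # stop descending; deeper content is returned as-is
    if isinstance(value, dict):
        cleaned = {}
        for key, child in value.items():
            if key in DANGEROUS_KEYS:
                removed.append(str(key))  # report the key instead of copying it
                continue
            cleaned[key] = strip_dangerous_keys(child, removed, depth + 1)
        return cleaned
    if isinstance(value, list):
        return [strip_dangerous_keys(item, removed, depth + 1) for item in value]
    return value


removed_keys: list[str] = []
payload = {
    "name": "ok",
    "__proto__": {"admin": True},
    "items": [{"constructor": "x", "id": 1}],
}
cleaned_payload = strip_dangerous_keys(payload, removed_keys)
```

Here `cleaned_payload` keeps `name` and `items[0].id`, while `removed_keys` lists the two keys that were dropped.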
Reviewed changes
Copilot reviewed 20 out of 23 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| uv.lock | Adds fasttext-wheel (and transitive deps) and bumps project version to 0.6.1 with new extras metadata. |
| pyproject.toml | Bumps version to 0.6.1; adds sfe extra and dev dependency for fasttext-wheel. |
| README.md | Documents SFE extra/usage; updates Tier 2 description; documents new DefenseResult fields. |
| CHANGELOG.md | Adds 0.6.1 release notes and breaking-change callouts. |
| .release-please-manifest.json | Updates manifest version to 0.6.1. |
| src/stackone_defender/config.py | Introduces DANGEROUS_KEYS + MAX_TRAVERSAL_DEPTH; deep-copies defaults; adds fractional cumulative-risk thresholds. |
| src/stackone_defender/types.py | Extends config/metadata/result types for fractional thresholds, dangerous-key reporting, and new result fields. |
| src/stackone_defender/core/tool_result_sanitizer.py | Filters dangerous keys during traversal; adjusts cumulative risk accounting to support fractional thresholds. |
| src/stackone_defender/core/prompt_defense.py | Adds use_sfe; switches Tier 2 to chunk prep + batched chunk scoring; reports fields_dropped/truncated_at_depth. |
| src/stackone_defender/sfe/preprocess.py | New SFE preprocessing implementation with predictor caching and depth-bounded traversal. |
| src/stackone_defender/sfe/__init__.py | Exports SFE public API. |
| src/stackone_defender/classifiers/onnx_classifier.py | Adds bounded batch chunking and token counting/max-length helpers. |
| src/stackone_defender/classifiers/tier2_classifier.py | Adds chunk preparation + packed-sentence chunking path; batch chunk passthrough API. |
| src/stackone_defender/__init__.py | Exposes SFE symbols at package top-level. |
| src/stackone_defender/models/minilm-full-aug/config.json | Syncs bundled model metadata with TS assets. |
| src/stackone_defender/models/minilm-full-aug/tokenizer_config.json | Syncs tokenizer config with TS assets. |
| tests/test_tier2_classifier.py | Adds tests for prepare_chunks skipping and chunk-batch passthrough. |
| tests/test_onnx_classifier.py | Adds test coverage for ONNX batch chunking behavior. |
| tests/test_sfe.py | New tests for SFE preprocessing and PromptDefense integration (fields_dropped). |
| tests/test_integration.py | Adds dangerous-key removal test; updates Tier 2 scoping tests to new chunk-based flow. |
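The "bounded batch chunking" described for `onnx_classifier.py` can be pictured as slicing the input into fixed-size batches so that no single inference call sees more than a set number of texts. The sketch below assumes a hypothetical `score_batch` callable in place of the real ONNX session, and the batch size of 8 is an arbitrary example value.

```python
from typing import Callable, Sequence


def score_in_batches(
    texts: Sequence[str],
    score_batch: Callable[[Sequence[str]], list[float]],
    max_batch_size: int = 8,
) -> list[float]:
    """Score `texts` in slices of at most `max_batch_size` to bound peak memory."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    scores: list[float] = []
    for start in range(0, len(texts), max_batch_size):
        batch = texts[start:start + max_batch_size]
        scores.extend(score_batch(batch))  # one inference call per slice
    return scores


# Stand-in scorer that records how large each batch was.
seen_batch_sizes: list[int] = []


def fake_score_batch(batch: Sequence[str]) -> list[float]:
    seen_batch_sizes.append(len(batch))
    return [float(len(text)) for text in batch]


texts = [f"chunk {i}" for i in range(20)]
scores = score_in_batches(texts, fake_score_batch, max_batch_size=8)
```

With 20 inputs and a cap of 8, the stand-in scorer is called three times, and the scores come back in input order.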
7 issues found across 23 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="README.md">
<violation number="1" location="README.md:168">
P2: Use `field(default_factory=list)` instead of `[]` for the dataclass list default in the README snippet.</violation>
</file>
<file name="src/stackone_defender/core/tool_result_sanitizer.py">
<violation number="1" location="src/stackone_defender/core/tool_result_sanitizer.py:277">
P2: Accessing `medium_fraction`/`patterns_fraction` without defaults can raise `KeyError` for valid custom threshold dicts that omit the new keys.</violation>
</file>
<file name="src/stackone_defender/config.py">
<violation number="1" location="src/stackone_defender/config.py:90">
P2: `tool_overrides` is shallow-copied, so nested lists are shared with global defaults and can be mutated across configs.</violation>
</file>
<file name="src/stackone_defender/classifiers/onnx_classifier.py">
<violation number="1" location="src/stackone_defender/classifiers/onnx_classifier.py:134">
P2: `count_tokens` returns padded sequence length, not the actual token count, because tokenizer padding is enabled globally.</violation>
</file>
<file name="src/stackone_defender/sfe/preprocess.py">
<violation number="1" location="src/stackone_defender/sfe/preprocess.py:66">
P2: TOCTOU race: the lock is released between the cache-miss check and the model load, so concurrent threads can each load the model redundantly. Hold the lock across the full check-and-populate block to prevent duplicate expensive loads.</violation>
<violation number="2" location="src/stackone_defender/sfe/preprocess.py:200">
P2: Depth tracking is inconsistent between `_extract_fields` (arrays don't increment `depth`) and `_filter_by_paths` (arrays do increment `depth`). For deeply nested array structures, fields extracted for drop-classification may not be reachable by the filter, so they silently survive. Either both functions should count array levels the same way, or `_filter_by_paths` should mirror `_extract_fields` by using a separate `stack_depth` parameter.</violation>
</file>
<file name="src/stackone_defender/classifiers/tier2_classifier.py">
<violation number="1" location="src/stackone_defender/classifiers/tier2_classifier.py:133">
P1: `count_tokens` always returns 256 because the tokenizer has `enable_padding(length=256)` set, so `len(encoding.ids)` includes padding tokens. Since `get_max_length()` also returns 256, the condition `total_tokens <= model_max_len` is always true and the entire chunk-splitting branch below is dead code. The same applies to `prepare_chunks` and `_pack_sentences`.
`count_tokens` should strip padding tokens before returning, e.g. by counting non-pad ids or using `len(encoding.tokens)` without padding, or by temporarily disabling padding for the count.</violation>
</file>
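The padding issue flagged for `count_tokens` can be reproduced without the real tokenizer. The sketch below uses a hypothetical `Encoding` stand-in with `ids` and `attention_mask` fields and an assumed pad id of 0; it shows why `len(ids)` reports the padded length while summing the attention mask reports the real token count, which is the approach the follow-up commit describes.

```python
from dataclasses import dataclass

PAD_ID = 0      # assumed pad token id for this illustration
PAD_LENGTH = 256  # padding length named in the review comment


@dataclass
class Encoding:
    """Minimal stand-in for a tokenizer encoding (not the real library class)."""
    ids: list[int]
    attention_mask: list[int]


def encode_padded(token_ids: list[int]) -> Encoding:
    """Pad short sequences up to PAD_LENGTH, marking padding with mask value 0."""
    pad = max(0, PAD_LENGTH - len(token_ids))
    return Encoding(
        ids=token_ids + [PAD_ID] * pad,
        attention_mask=[1] * len(token_ids) + [0] * pad,
    )


def count_tokens_padded(encoding: Encoding) -> int:
    """Buggy variant: counts padding, so short inputs always report PAD_LENGTH."""
    return len(encoding.ids)


def count_tokens(encoding: Encoding) -> int:
    """Fixed variant: only positions with attention_mask == 1 are real tokens."""
    return sum(encoding.attention_mask)


short = encode_padded([101, 7592, 2088, 102])
long = encode_padded(list(range(1, 301)))
```

With the buggy variant, a four-token input reports 256, so a check such as `total_tokens <= model_max_len` can never fail for short or medium inputs; the mask-based count restores the distinction.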
fasttext-wheel lacks reliable cp313 wheels and fails sdist builds on CI. fasttext-ng provides the same fasttext import namespace, supports Python 3.11+, and declares numpy>=2.3. Add it to dev so SFE-related tests run with the real module when available; refresh lockfile and docs. Made-with: Cursor
- Tier 2 string extraction: when tier2_fields is None, scope to Tier 1 risky_field_names when present; else all strings. Align integration test.
- ONNX count_tokens: sum attention_mask so padded length does not disable chunk splitting; add regression test.
- Cumulative escalation: merge defaults into sanitizer thresholds; use .get with defaults in _should_escalate for partial custom dicts.
- create_config: deep-copy tool_overrides list values.
- SFE: hold predictor lock across import/load; align list depth in filter/compact.
- README DefenseResult snippet: field(default_factory=list).
- Tier2Config docstring: clarify None vs empty list semantics.

Made-with: Cursor
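For the SFE predictor fix, holding the lock across the whole check-and-load block is what prevents concurrent threads from each loading the model. The sketch below is a generic version of that pattern; `_load_model`, `get_predictor`, and the `load_calls` counter are hypothetical names, and a plain `object()` stands in for the FastText model.

```python
import threading

_predictor = None
_predictor_lock = threading.Lock()
load_calls = 0  # counts how many times the expensive load actually ran


def _load_model() -> object:
    """Stand-in for an expensive model load (hypothetical)."""
    global load_calls
    load_calls += 1
    return object()


def get_predictor() -> object:
    """Return the cached predictor, loading it at most once across threads."""
    global _predictor
    # The lock covers both the cache check and the load, so there is no window
    # in which two threads can both observe a cache miss.
    with _predictor_lock:
        if _predictor is None:
            _predictor = _load_model()
        return _predictor


threads = [threading.Thread(target=get_predictor) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

The trade-off is that every caller takes the lock, including cache hits; for a load that happens once per process this is usually acceptable.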
glebedel approved these changes on Apr 22, 2026
Summary
This release aligns `stackone-defender` Python with the current `@stackone/defender` TypeScript behavior and refreshes the bundled MiniLM ONNX assets so byte-for-byte hashes match the TS repo.

What changed
- … (`__proto__`, `constructor`, `prototype`).
- … `use_sfe`, bundled `sfe/model.ftz`, and `fasttext-wheel` optional extra.
- … `DefenseResult.fields_dropped`, `truncated_at_depth`, and related `create_config` merges.
- … `minilm-full-aug/` (`model_quantized.onnx`, `config.json`, `tokenizer.json`, `tokenizer_config.json`) copied from `defender` so Python matches TS.

Testing
`uv run pytest`: 188 passed.

Made with Cursor
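The `create_config` merges and the shallow-copy issue raised in review come down to whether nested lists in the defaults are shared between configs. The sketch below shows a deep-copying merge under assumed names: `DEFAULT_CONFIG`, its contents, and `create_config_sketch` are all hypothetical and only illustrate the copying behavior, not the package's real defaults.

```python
import copy
from typing import Any, Optional

# Hypothetical defaults; the real structure and values are not shown in this PR page.
DEFAULT_CONFIG: dict[str, Any] = {
    "tool_overrides": {"search": ["title", "body"]},
    "max_depth": 10,
}


def create_config_sketch(overrides: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build a config whose nested values are independent of the global defaults."""
    config = copy.deepcopy(DEFAULT_CONFIG)  # nested lists are copied, not shared
    for key, value in (overrides or {}).items():
        config[key] = copy.deepcopy(value)  # also detach from the caller's objects
    return config


first = create_config_sketch()
first["tool_overrides"]["search"].append("injected")  # mutate one config only
second = create_config_sketch()
```

Because each call deep-copies the defaults, the mutation on `first` does not leak into `second` or into `DEFAULT_CONFIG`; with a shallow copy, the inner list would be shared by all three.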
Summary by cubic
Aligns `stackone-defender` Python with `@stackone/defender` 0.6.1 and syncs the MiniLM ONNX bundle. Meets ENG-12699 parity with packed-chunk Tier 2, optional SFE preprocessing, and traversal hardening for safer, more accurate detection.

New Features
- … `use_sfe` flag with bundled `sfe/model.ftz`; install via `stackone-defender[sfe]`; uses `fasttext-ng`; fails open if unavailable.
- … `__proto__`, `constructor`, `prototype`), adds fractional cumulative-risk thresholds and stack-depth cap.
- … `DefenseResult.fields_dropped`, `DefenseResult.truncated_at_depth`, and `SanitizationMetadata.dangerous_keys_removed`; improved `create_config` merges; MiniLM artifacts synced to match TS.

Bug Fixes
- … `tier2_fields` is `None`, use Tier 1 `risky_field_names`; otherwise all strings.

Written for commit bf173ac. Summary will update on new commits.
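The Tier 2 scoping rule in the bug-fix note can be written as a small selection function. The sketch below is one possible reading: the names `tier2_fields` and `risky_field_names` come from the PR, while `select_tier2_fields` and the treatment of an explicit empty list (select nothing) are assumptions, since the exact empty-list semantics are not spelled out here.

```python
from typing import Optional, Sequence


def select_tier2_fields(
    all_string_fields: Sequence[str],
    tier2_fields: Optional[Sequence[str]],
    risky_field_names: Sequence[str],
) -> list[str]:
    """Pick which string fields go to Tier 2 (hypothetical helper)."""
    if tier2_fields is not None:
        # Explicit configuration wins; an empty list selects nothing (assumed).
        return [name for name in all_string_fields if name in tier2_fields]
    if risky_field_names:
        # No explicit scope: fall back to the fields Tier 1 flagged as risky.
        return [name for name in all_string_fields if name in risky_field_names]
    # Nothing configured and nothing flagged: scan every string field.
    return list(all_string_fields)


fields = ["title", "body", "notes"]
scoped_to_risky = select_tier2_fields(fields, None, ["body"])
scoped_to_all = select_tier2_fields(fields, None, [])
scoped_explicit = select_tier2_fields(fields, ["title"], ["body"])
```

Keeping `None` distinct from an empty list lets a caller disable Tier 2 field scanning explicitly without losing the Tier 1 fallback for the unconfigured case.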