Skip to content

fix(ENG-12518): fix field filter and batch sentence classification#35

Merged
hiskudin merged 5 commits intomainfrom
feat(ENG-12518)/globalthis-session-cache
Mar 31, 2026
Merged

fix(ENG-12518): fix field filter and batch sentence classification#35
hiskudin merged 5 commits intomainfrom
feat(ENG-12518)/globalthis-session-cache

Conversation

@hiskudin
Copy link
Copy Markdown
Collaborator

@hiskudin hiskudin commented Mar 31, 2026

Context

Two issues causing high Tier 2 latency:

  1. extractStrings() field filter was broken — the traverse function had a fallthrough case that collected every leaf string under non-matching keys, making tier2Fields and the riskyFieldNames fallback from Tier 1 completely ineffective. For Gmail emails, this meant classifying ~58 sentences (MIME headers, DKIM signatures, base64 blobs) instead of ~4 (just the snippet field identified by Tier 1). At ~35ms per ONNX inference on server CPU, this caused ~2s latency per request instead of ~140ms.

  2. classifyBySentence() called classify() per sentence in a loop — submitting N separate session.run() calls to the libuv thread pool. Under concurrency, these compete for threads and stack up. Switched to classifyBatch() which pads all sentences into a single batched tensor and runs one session.run() call.

Changes

  • src/core/prompt-defense.ts: Fix extractStrings() to skip strings under non-matching keys when field filtering is active. Handle bare string input as edge case.
  • src/classifiers/tier2-classifier.ts: Refactor classifyBySentence() to collect all sentences first, then call classifyBatch() once instead of classify() N times.

Checklist

  • All 174 tests pass
  • Lint clean
  • Verified locally with real Gmail payloads: scores identical, latency reduced
  • Concurrent test (20 parallel calls): no stacking, all complete in ~3.8s total

Post-merge

  • release-please bumps version
  • Bump @stackone/defender in unified-cloud-api and connect-handler

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 31, 2026 09:49
@hiskudin hiskudin requested a review from a team as a code owner March 31, 2026 09:49
@hiskudin hiskudin changed the title feat(onnx): use globalThis for session cache to share across CJS/ESM feat(ENG-12518): use globalThis for session cache to share across CJS/ESM Mar 31, 2026
cubic-dev-ai[bot]
cubic-dev-ai Bot previously approved these changes Mar 31, 2026
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 1 file

Auto-approved: Standard fix for sharing state across CJS/ESM modules using globalThis. Minimal risk, addresses significant performance overhead without changing core logic.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the ONNX classifier’s session/promise caching so it is shared across CommonJS and ESM evaluations within the same Node.js process, preventing duplicate model loads when the package is consumed in dual-module-format setups.

Changes:

  • Moved ONNX session cache and in-flight load promise tracking from module-level Maps to globalThis-backed singletons.
  • Added a dedicated SessionCacheEntry type and expanded inline documentation describing the CJS/ESM sharing rationale.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/classifiers/onnx-classifier.ts Outdated
Comment thread src/classifiers/onnx-classifier.ts Outdated
Comment thread src/classifiers/onnx-classifier.ts Outdated
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/config.ts">

<violation number="1" location="src/config.ts:176">
P3: The comment claims the Tier 2 field list adds "data" and "value", but the array only includes "snippet". Either add the missing fields or update the comment to avoid misleading configuration guidance.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread src/config.ts Outdated
@hiskudin hiskudin force-pushed the feat(ENG-12518)/globalthis-session-cache branch from f274fe8 to a36f52a Compare March 31, 2026 10:50
hiskudin and others added 2 commits March 31, 2026 11:56
The traverse function in extractStrings() had a fallthrough case that
collected every leaf string reached during recursion, even under
non-matching field keys. This made the tier2Fields filter (and the
riskyFieldNames fallback from Tier 1) completely ineffective — Tier 2
always scanned every string in the payload.

For Gmail emails this meant classifying ~58 sentences (MIME headers,
DKIM signatures, base64 blobs) instead of ~4 (just the snippet field
identified by Tier 1). At ~35ms per ONNX inference on server CPU,
this caused ~2s latency per request instead of ~140ms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
classifyBySentence() was calling classify() per sentence in a loop,
submitting N separate session.run() calls to the libuv thread pool.
Under concurrency, these compete for threads and stack up.

Switch to classifyBatch() which pads all sentences into a single
batched tensor and runs one session.run() call. This reduces ONNX
calls from N-per-request to 1-per-request, eliminating stacking
under concurrent load.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hiskudin hiskudin force-pushed the feat(ENG-12518)/globalthis-session-cache branch from a36f52a to 53622f7 Compare March 31, 2026 10:57
@hiskudin hiskudin requested a review from Copilot March 31, 2026 11:01
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/core/prompt-defense.ts
Comment thread src/core/prompt-defense.ts
Comment thread src/core/prompt-defense.ts
Comment thread src/classifiers/tier2-classifier.ts
Comment thread src/classifiers/tier2-classifier.ts Outdated
@hiskudin hiskudin changed the title feat(ENG-12518): use globalThis for session cache to share across CJS/ESM fix(tier2): fix field filter and batch sentence classification Mar 31, 2026
@hiskudin hiskudin changed the title fix(tier2): fix field filter and batch sentence classification fix(ENG-12518): fix field filter and batch sentence classification Mar 31, 2026
…entence

- extractStrings: verify strings under non-matching keys are skipped,
  bare string input is collected, and riskyFieldNames fallback works
- classifyBySentence: verify maxSentence selection, sentenceScores
  alignment, skip reasons, and score parity with individual classify

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 2 files (changes from recent commits).

Requires human review: Modifies core filtering and classification logic in a security-sensitive component (Prompt Defense). Such changes to input traversal and execution batching require human verification.

…cores

- extractStrings: handle bare array input consistently with bare string
  when tier2Fields is set (both collect directly via collectAll)
- classifyBySentence: guard against NaN/Infinity scores from batch
  classification with Number.isFinite check

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/core/prompt-defense.ts">

<violation number="1" location="src/core/prompt-defense.ts:71">
P2: Arrays at the root now bypass field filtering and collect every string. If tool results are arrays of objects, tier2Fields is ignored and non-matching fields get classified again. Keep arrays in the filtered traversal path.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread src/core/prompt-defense.ts Outdated
Arrays of objects at the root need field filtering — collectAll would
bypass tier2Fields and classify every string. Only bare strings need
the early return; arrays are already handled by traverse().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 2 files (changes from recent commits).

Requires human review: This PR modifies core security/filtering logic in the prompt defense system and introduces batch processing for ONNX classification, which requires human validation to ensure no bypasses.

@hiskudin hiskudin merged commit 878b542 into main Mar 31, 2026
3 checks passed
@hiskudin hiskudin deleted the feat(ENG-12518)/globalthis-session-cache branch March 31, 2026 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants