Conversation
`classifyBatch()` previously allocated a single tensor for all sentences, causing O(N × seqLen²) native memory in ONNX attention layers. For large payloads (e.g. 100-item list responses), this could reach several GB and crash Lambda environments. It now processes at most 32 sentences per chunk, capping native memory at ~50MB per inference call regardless of input size.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
This PR updates the ONNX-based prompt-injection classifier to avoid Lambda OOMs by bounding native (non-V8) memory usage during batch inference, especially when classifying hundreds of sentences extracted from large tool/list responses.
Changes:
- Split `OnnxClassifier.classifyBatch()` into fixed-size chunks (max 32 texts per `session.run()`).
- Added `classifyBatchChunk()` to run a single bounded ONNX inference call and concatenate results.
- Added a test that classifies 40 texts to exercise cross-chunk behavior.
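For illustration, the chunked driver loop might look roughly like the sketch below. In the real code these are methods on `OnnxClassifier`; the free-function form and the numeric per-text result type are assumptions made only to keep the sketch self-contained.

```ts
// Sketch only: the per-text result type (a numeric score) is an assumption.
const MAX_BATCH_CHUNK = 32;

// Placeholder for the bounded single-chunk inference described in the PR.
declare function classifyBatchChunk(texts: string[]): Promise<number[]>;

async function classifyBatch(texts: string[]): Promise<number[]> {
  const results: number[] = [];
  // One bounded session.run() per chunk keeps native attention buffers capped,
  // no matter how many sentences the caller passes in.
  for (let i = 0; i < texts.length; i += MAX_BATCH_CHUNK) {
    const chunk = texts.slice(i, i + MAX_BATCH_CHUNK);
    results.push(...(await classifyBatchChunk(chunk))); // preserves input order
  }
  return results;
}
```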
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/classifiers/onnx-classifier.ts` | Implements chunked batch inference to cap ONNX attention-matrix memory usage. |
| `specs/onnx-classifier.spec.ts` | Adds a larger batch test to validate correctness across multiple chunks. |
The reviewed hunk in `specs/onnx-classifier.spec.ts`:

```ts
});

it('should handle batches larger than chunk size', async () => {
  // arrange — 40 texts forces multiple chunks (MAX_BATCH_CHUNK = 32)
```
The test comment hard-codes MAX_BATCH_CHUNK = 32, but that constant is private to OnnxClassifier and could change later; the comment would then become inaccurate even if the test still passes. Consider making the comment value-agnostic (e.g., “40 texts exceeds the default chunk size”) or deriving the threshold from the implementation if you want to guarantee multi-chunk behavior.
Suggested change:

```diff
-  // arrange — 40 texts forces multiple chunks (MAX_BATCH_CHUNK = 32)
+  // arrange — use enough texts to exceed the default chunk size
```
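If the goal is to guarantee multi-chunk behavior rather than just avoid a stale comment, one option is to expose the chunk size and derive the test input from it. The fragments below are only an illustrative sketch: `MAX_BATCH_CHUNK` is currently private, and the classifier stub, result type, and spec setup are assumptions.

```ts
// src/classifiers/onnx-classifier.ts (illustrative fragment): promote the
// private literal to a readable static constant.
export class OnnxClassifier {
  static readonly MAX_BATCH_CHUNK = 32;

  async classifyBatch(texts: string[]): Promise<number[]> {
    // ...chunked implementation from this PR...
    return texts.map(() => 0);
  }
}

// specs/onnx-classifier.spec.ts (illustrative fragment): derive the batch size
// instead of hard-coding it in a comment.
const classifier = new OnnxClassifier();

it('should handle batches larger than chunk size', async () => {
  // arrange — use enough texts to exceed the default chunk size
  const texts = Array.from(
    { length: OnnxClassifier.MAX_BATCH_CHUNK + 8 },
    (_, i) => `harmless sentence number ${i}`,
  );

  // act
  const results = await classifier.classifyBatch(texts);

  // assert: one result per input
  expect(results).toHaveLength(texts.length);
});
```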
Summary

`classifyBatch()` now processes sentences in chunks of 32 instead of all at once.

Problem

When a list response (e.g. `ats_list_notes` with 100 items) is passed to `defendToolResult()`, `extractStrings()` collects all text and `classifyBySentence()` splits it into hundreds of sentences. `classifyBatch()` then allocated a single ONNX tensor for all sentences; the attention matrices scale as O(batch × seqLen²) in native memory (outside the V8 heap), reaching several GB for large batches and crashing Lambda.
Fix

Split `classifyBatch()` into chunks of at most 32 sentences. Each chunk runs a separate `session.run()` call with bounded memory (~50MB). Results are concatenated.
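A single bounded chunk inference might look roughly like the sketch below, assuming the classifier runs on onnxruntime-node. The feed and output names (`input_ids`, `attention_mask`, `logits`), the two-class softmax, and the `tokenize()` helper are assumptions about the model, not details taken from the PR.

```ts
import { InferenceSession, Tensor } from 'onnxruntime-node';

// Sketch of a bounded per-chunk inference; in the real code the session is
// presumably held on the OnnxClassifier instance.
async function classifyBatchChunk(
  session: InferenceSession,
  texts: string[],
): Promise<number[]> {
  // tokenize() is assumed to pad/truncate the chunk to a shared seqLen.
  const { inputIds, attentionMask, seqLen } = tokenize(texts);

  const feeds = {
    input_ids: new Tensor('int64', inputIds, [texts.length, seqLen]),
    attention_mask: new Tensor('int64', attentionMask, [texts.length, seqLen]),
  };

  // Native attention buffers grow with texts.length × seqLen², so keeping
  // texts.length ≤ 32 bounds the memory used by this call.
  const outputs = await session.run(feeds);
  const logits = outputs.logits.data as Float32Array;

  // Assume two logits per text (benign vs. injection); return the injection score.
  const scores: number[] = [];
  for (let i = 0; i < texts.length; i++) {
    const benign = logits[i * 2];
    const injection = logits[i * 2 + 1];
    const max = Math.max(benign, injection);
    const expB = Math.exp(benign - max);
    const expI = Math.exp(injection - max);
    scores.push(expI / (expB + expI));
  }
  return scores;
}

// Hypothetical tokenizer signature used only to keep the sketch self-contained.
declare function tokenize(texts: string[]): {
  inputIds: BigInt64Array;
  attentionMask: BigInt64Array;
  seqLen: number;
};
```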
Changes

- `src/classifiers/onnx-classifier.ts`: `classifyBatch()` now loops in chunks via `classifyBatchChunk()`; added `MAX_BATCH_CHUNK = 32` constant
- `specs/onnx-classifier.spec.ts`: added test with 40 texts to verify cross-chunk correctness

Test plan
- `classifyBatch` test passes (3 items, single chunk)

🤖 Generated with Claude Code
Summary by cubic
Chunked batch classification to bound ONNX native memory and prevent Lambda OOM on large payloads. Addresses ENG-12604 by processing texts in chunks of 32 per inference without changing outputs.
Written for commit 86e4a60. Summary will update on new commits.