Skip to content

fix(ENG-12604): chunk batch classification to bound ONNX memory usage#44

Merged
hiskudin merged 1 commit intomainfrom
fix/ENG-12604-chunked-batch-classification
Apr 8, 2026
Merged

fix(ENG-12604): chunk batch classification to bound ONNX memory usage#44
hiskudin merged 1 commit intomainfrom
fix/ENG-12604-chunked-batch-classification

Conversation

@hiskudin
Copy link
Copy Markdown
Collaborator

@hiskudin hiskudin commented Apr 8, 2026

Summary

  • classifyBatch() now processes sentences in chunks of 32 instead of all at once
  • Prevents OOM in Lambda environments when scanning large list responses (e.g. 100+ notes)

Problem

When a list response (e.g. ats_list_notes with 100 items) is passed to defendToolResult(), extractStrings() collects all text and classifyBySentence() splits it into hundreds of sentences. classifyBatch() then allocated a single ONNX tensor for all sentences — the attention matrices scale as O(batch × seqLen²) in native memory (outside V8 heap), reaching several GB for large batches and crashing Lambda.

Fix

Split classifyBatch() into chunks of 32 sentences max. Each chunk runs a separate session.run() call with bounded memory (~50MB). Results are concatenated.

Changes

  • src/classifiers/onnx-classifier.tsclassifyBatch() now loops in chunks via classifyBatchChunk(); added MAX_BATCH_CHUNK = 32 constant
  • specs/onnx-classifier.spec.ts — added test with 40 texts to verify cross-chunk correctness

Test plan

  • Existing classifyBatch test passes (3 items, single chunk)
  • New test with 40 items passes (forces 2 chunks)
  • All tests pass
  • Memory stays bounded for large payloads

🤖 Generated with Claude Code


Summary by cubic

Chunked batch classification to bound ONNX native memory and prevent Lambda OOM on large payloads. Addresses ENG-12604 by processing texts in chunks of 32 per inference without changing outputs.

  • Bug Fixes
    • Process batches in chunks (max 32) via a new internal classifyBatchChunk(); concatenates results and caps memory per call (~50MB).
    • Added a spec with 40 texts to verify cross-chunk ordering and scores.

Written for commit 86e4a60. Summary will update on new commits.

classifyBatch() previously allocated a single tensor for all sentences,
causing O(N × seqLen²) native memory in ONNX attention layers. For large
payloads (e.g. 100-item list responses), this could reach several GB and
crash Lambda environments.

Now processes in chunks of 32 sentences max, capping native memory at
~50MB per inference call regardless of input size.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 8, 2026 18:20
@hiskudin hiskudin requested a review from a team as a code owner April 8, 2026 18:20
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 2 files

Auto-approved: Safely introduces batch chunking to the ONNX classifier to prevent OOM issues. Includes unit tests for the new logic.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the ONNX-based prompt-injection classifier to avoid Lambda OOMs by bounding native (non-V8) memory usage during batch inference, especially when classifying hundreds of sentences extracted from large tool/list responses.

Changes:

  • Split OnnxClassifier.classifyBatch() into fixed-size chunks (max 32 texts per session.run()).
  • Added classifyBatchChunk() to run a single bounded ONNX inference call and concatenate results.
  • Added a test that classifies 40 texts to exercise cross-chunk behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/classifiers/onnx-classifier.ts Implements chunked batch inference to cap ONNX attention-matrix memory usage.
specs/onnx-classifier.spec.ts Adds a larger batch test to validate correctness across multiple chunks.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

});

it('should handle batches larger than chunk size', async () => {
// arrange — 40 texts forces multiple chunks (MAX_BATCH_CHUNK = 32)
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test comment hard-codes MAX_BATCH_CHUNK = 32, but that constant is private to OnnxClassifier and could change later; the comment would then become inaccurate even if the test still passes. Consider making the comment value-agnostic (e.g., “40 texts exceeds the default chunk size”) or deriving the threshold from the implementation if you want to guarantee multi-chunk behavior.

Suggested change
// arrange — 40 texts forces multiple chunks (MAX_BATCH_CHUNK = 32)
// arrange — use enough texts to exceed the default chunk size

Copilot uses AI. Check for mistakes.
@hiskudin hiskudin merged commit 46e6548 into main Apr 8, 2026
9 checks passed
@hiskudin hiskudin deleted the fix/ENG-12604-chunked-batch-classification branch April 8, 2026 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants