fix: retry with token-level truncation on ONNX OOM in embedding worker #457

Merged
BYK merged 1 commit into main from fix-onnx-oom-retry
May 22, 2026

BYK commented May 22, 2026

Summary

Fixes ONNX Runtime OOM errors (e.g. 284432024) that occur when embedding dense-token content like code, base64, CJK text, or JSON with short keys, where the chars-per-token ratio is much lower than the ~4 assumed for English prose.

Problem

The existing character-level pre-truncation (LOCAL_MAX_CHARS = 16384, assuming ~4 chars/token) can produce 6000-8000+ tokens for dense content — well above the ~4096 safe threshold for ONNX inference. The FeatureExtractionPipeline in transformers.js hardcodes { truncation: true } without forwarding max_length, so its built-in truncation caps at the model max (8192 tokens), which still OOMs.
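The arithmetic behind those numbers can be sketched as follows. The 4 chars/token ratio and the 16384-character limit come from the description above; the dense ratio of 2 chars/token is an illustrative assumption (the real ratio depends on the tokenizer and the content), and `estimateTokens` is a hypothetical helper, not code from this PR.

```typescript
// Rough token estimate for an input that was truncated by character count.
const LOCAL_MAX_CHARS = 16384; // character limit quoted in the description

// Hypothetical helper: approximate token count for a given chars-per-token ratio.
function estimateTokens(chars: number, charsPerToken: number): number {
  return Math.ceil(chars / charsPerToken);
}

// English prose at ~4 chars/token lands right at the ~4096 safe threshold.
const proseTokens = estimateTokens(LOCAL_MAX_CHARS, 4);

// Dense content at an assumed ~2 chars/token reaches the model max of 8192.
const denseTokens = estimateTokens(LOCAL_MAX_CHARS, 2);

console.log({ proseTokens, denseTokens });
```

Under these assumptions the same character budget yields twice as many tokens for dense input, which is why a character-level cap alone cannot bound the sequence length.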

Fix

Instead of aggressively pre-truncating every request, the worker now:

  1. Tries inference at full length first — normal English text passes through untouched
  2. On OOM, catches the error using the existing isOomError() detector
  3. Retries with token-level truncation using the pipeline's actual tokenizer (encode → slice → decode), progressively halving the limit: full → 4096 → 2048 → 1024 tokens

This preserves maximum semantic content for normal texts while adaptively handling dense-token edge cases.
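The retry flow described above could look roughly like the sketch below. It is not the code from this PR: `embedWithOomRetry` is a hypothetical name, inference and truncation are passed in as plain functions so the block stays self-contained, and the `isOomError` shown here is only a placeholder for the worker's existing detector, whose real matching rules are not shown in this description.

```typescript
type Embed = (texts: string[]) => Promise<number[][]>;
type Truncate = (texts: string[], maxTokens: number) => string[];

// Placeholder for the worker's existing isOomError() detector (assumption:
// the real one may match different messages or error codes).
function isOomError(err: unknown): boolean {
  return err instanceof Error && /out of memory/i.test(err.message);
}

// Token limits tried after the initial full-length attempt fails.
const RETRY_TOKEN_LIMITS = [4096, 2048, 1024];

async function embedWithOomRetry(
  texts: string[],
  runInference: Embed,
  truncateTexts: Truncate,
): Promise<number[][]> {
  try {
    // Attempt 1: full length, so normal text passes through untouched.
    return await runInference(texts);
  } catch (err) {
    if (!isOomError(err)) throw err; // only OOM triggers the retry path
    let lastError: unknown = err;
    for (const limit of RETRY_TOKEN_LIMITS) {
      console.warn(`ONNX OOM, retrying with inputs truncated to ${limit} tokens`);
      try {
        return await runInference(truncateTexts(texts, limit));
      } catch (retryErr) {
        if (!isOomError(retryErr)) throw retryErr;
        lastError = retryErr;
      }
    }
    // All three truncated retries also ran out of memory.
    throw lastError;
  }
}
```

Non-OOM errors are rethrown immediately in this sketch, so the retry path cannot mask unrelated failures such as a malformed input.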

Changes

  • Extracts runInference() helper from processEmbed() for retry-ability
  • Adds truncateTexts() that uses the real tokenizer for exact content-token counting (add_special_tokens: false to exclude [CLS]/[SEP])
  • Stashes a reference to pipe.tokenizer during pipeline init
  • Adds OOM retry loop (1 initial attempt + 3 truncated retries) with halved token limits
  • Logs console.warn on each retry for observability
  • Updates LOCAL_MAX_CHARS comment to reference the new worker-level defense
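As an illustration of the encode → slice → decode step, here is a minimal sketch of a `truncateTexts()`-style helper. The `TokenizerLike` interface is an assumption loosely modeled on the tokenizer's `encode`/`decode` methods, and the toy whitespace tokenizer exists only so the example runs without loading a model; the signature of the helper in this PR may differ.

```typescript
// Assumed minimal tokenizer shape; the real pipe.tokenizer has a richer API.
interface TokenizerLike {
  encode(text: string, options?: { add_special_tokens?: boolean }): number[];
  decode(ids: number[], options?: { skip_special_tokens?: boolean }): string;
}

// Truncate each text to at most maxTokens content tokens.
function truncateTexts(
  tokenizer: TokenizerLike,
  texts: string[],
  maxTokens: number,
): string[] {
  return texts.map((text) => {
    // Exclude [CLS]/[SEP] so the count reflects content tokens only.
    const ids = tokenizer.encode(text, { add_special_tokens: false });
    if (ids.length <= maxTokens) return text; // short inputs stay untouched
    return tokenizer.decode(ids.slice(0, maxTokens), { skip_special_tokens: true });
  });
}

// Toy whitespace tokenizer (hypothetical, for demonstration only).
const vocab: string[] = [];
const toyTokenizer: TokenizerLike = {
  encode: (text) =>
    text.split(/\s+/).filter(Boolean).map((word) => {
      let id = vocab.indexOf(word);
      if (id < 0) id = vocab.push(word) - 1; // assign the next free id
      return id;
    }),
  decode: (ids) => ids.map((id) => vocab[id]).join(" "),
};

console.log(truncateTexts(toyTokenizer, ["a b c d e", "x y"], 3));
```

Returning the original string when it already fits avoids a lossy decode round trip for inputs that do not need truncation.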

BYK self-assigned this May 22, 2026
BYK force-pushed the fix-onnx-oom-retry branch from c34c862 to 11cb536 on May 22, 2026 19:04
BYK force-pushed the fix-onnx-oom-retry branch from 11cb536 to c412285 on May 22, 2026 19:06
BYK merged commit 9cd94ec into main May 22, 2026
7 checks passed
BYK deleted the fix-onnx-oom-retry branch May 22, 2026 19:09
craft-deployer (bot) mentioned this pull request May 22, 2026
6 tasks