fix: prevent OOM in embedding backfill with long texts by BYK · Pull Request #289 · BYK/loreai

BYK · 2026-05-13T15:42:35Z

Summary

- Add truncation: true to the transformers.js pipeline call, capping individual texts at the model's max token length
- Reduce BACKFILL_CHUNK_SIZE from 32 to 8, limiting peak tensor size when long texts are batched
- Detect OOM errors (opaque ONNX numeric codes like 287180544) and return human-friendly messages with batch diagnostics
- Upgrade backfill error logging from log.info to log.error so failures are captured by Sentry

Root Cause

Nomic v1.5 pads ALL texts in a batch to the longest sequence length. A batch of 24 texts where one has 1383 tokens creates a [24, 1383] input tensor — the resulting intermediate activations cause a ~287 MB allocation failure in onnxruntime. The error was reported as an opaque numeric code "287180544" with no human-readable context.
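To make the padding cost concrete, here is a minimal sketch of the arithmetic. `paddedArea` is a hypothetical helper, not code from this PR, and the 50-token length for the short texts is an assumed figure chosen only for illustration.

```typescript
// Hypothetical helper (not from the PR): number of token slots in a padded batch.
// Every text is padded to the longest sequence, so area = batch size × longest length.
function paddedArea(tokenCounts: number[]): number {
  if (tokenCounts.length === 0) return 0;
  return tokenCounts.length * Math.max(...tokenCounts);
}

// 23 short texts (assumed ~50 tokens each) plus the one 1383-token text.
const exampleBatch: number[] = [...new Array<number>(23).fill(50), 1383];

const paddedSlots = paddedArea(exampleBatch);                 // 24 × 1383 = 33192
const realTokens = exampleBatch.reduce((a, b) => a + b, 0);   // 23 × 50 + 1383 = 2533

console.log({ paddedSlots, realTokens, wastedSlots: paddedSlots - realTokens });
```

Under these assumptions a single long text makes the batch roughly 13 times larger than the tokens it actually carries, which is why both the per-text cap and the batch size matter.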

Fix Details

| Change | File | Effect |
|---|---|---|
| `truncation: true` | `embedding-worker.ts:170` | Caps each text at model max (8192 tokens), preventing any single oversized input |
| `BACKFILL_CHUNK_SIZE = 8` | `embedding.ts:1097` | Limits batch×sequence tensor size; worst case `[8, 8192]` vs previous `[32, 8192]` |
| `isOomError()` helper | `embedding-worker.ts:166` | Detects numeric-only error codes (≥6 digits) and OOM patterns, wraps in descriptive message |
| `log.error` in catch blocks | `embedding.ts:1140,1197` | Sends to Sentry via `captureException` (was `log.info` which skips Sentry) |
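The OOM detection described above might look roughly like the following. This is a sketch under stated assumptions, not the PR's actual code: only the name `isOomError`, the "numeric-only, ≥6 digits" rule, and the message format are taken from this page. The textual patterns and the `describeEmbeddingError` wrapper are hypothetical.

```typescript
// Sketch of an OOM detector; the real helper in embedding-worker.ts may differ.
function isOomError(err: unknown): boolean {
  const msg = (err instanceof Error ? err.message : String(err)).trim();
  // An allocation failure can surface as a bare number such as "287180544".
  if (/^\d{6,}$/.test(msg)) return true;
  // Assumed textual patterns; adjust to whatever the runtime actually emits.
  return /out of memory|bad_alloc|failed to allocate/i.test(msg);
}

// Hypothetical wrapper: attach batch diagnostics to an OOM error,
// and pass every other error through unchanged.
function describeEmbeddingError(err: unknown, texts: string[]): Error {
  if (!isOomError(err)) {
    return err instanceof Error ? err : new Error(String(err));
  }
  const longest = texts.reduce((max, t) => Math.max(max, t.length), 0);
  return new Error(
    `ONNX runtime out of memory (batch=${texts.length}, longest≈${longest} chars)`
  );
}
```

Reporting the batch size and the longest text length alongside the error makes it possible to tell from a single log line whether one oversized input or the batch size itself is the likely cause.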

Nomic v1.5 pads all texts in a batch to the longest sequence length. A batch of 24 texts with one 1383-token text creates a [24, 1383] tensor, causing a ~287 MB allocation failure in onnxruntime.

Three fixes:
- Add truncation: true to pipeline call, capping individual texts at the model's max length (8192 tokens)
- Reduce BACKFILL_CHUNK_SIZE from 32 to 8, limiting peak tensor size
- Detect OOM errors (opaque numeric codes like '287180544') and return a human-friendly message; upgrade backfill catch blocks to log.error so failures are captured by Sentry

## Summary

Follow-up to PR #289: a fixed batch size of 8 still OOM'd on long distillation observations (4476+ chars, ~1119 tokens). Replaces fixed `BACKFILL_CHUNK_SIZE` with adaptive `nextBatch()` that caps total tensor area.

## Root Cause

ONNX runtime pads all texts in a batch to the longest sequence. With `BACKFILL_CHUNK_SIZE=8`, a batch containing one 4476-char distillation observation (~1119 tokens) creates a `[8, 1119]` tensor, which is still too large. The error from PR #289's OOM detection confirmed it: `ONNX runtime out of memory (batch=8, longest≈4476 chars)`.

## Fix

Replace fixed chunk size with **token-budget batching** via `nextBatch()`:

- Estimates token count per text (~4 chars/token for WordPiece)
- Caps total batch "area" (`batch_size × max_tokens_in_batch`) at `MAX_BATCH_TOKEN_AREA = 4096`
- Still respects `MAX_BACKFILL_CHUNK = 8` for priority queue interleaving

| Input | Batch size | Area | Fits? |
|---|---|---|---|
| 8 × 300-token knowledge entries | 8 | 2400 | yes |
| 1 × 1119-token distillation + 7 short | 1-3 | ≤4096 | yes |
| 1 × 2000-token long distillation | 1 | 2000 | yes (solo) |

Both knowledge and distillation backfill loops updated to use `nextBatch()`.
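Token-budget batching as described in the follow-up could be sketched like this. The constant names and the ~4 chars/token estimate come from the description above. The function signature, the `estimateTokens` helper, and the rule that a batch always takes at least one text are assumptions, so the real `nextBatch()` in #290 may be shaped differently.

```typescript
const MAX_BATCH_TOKEN_AREA = 4096; // cap on batch_size × max_tokens_in_batch
const MAX_BACKFILL_CHUNK = 8;      // hard cap on batch size

// Rough estimate (~4 chars/token); hypothetical helper name.
function estimateTokens(text: string): number {
  return Math.max(1, Math.ceil(text.length / 4));
}

// Assumed signature: take texts from `start` until adding the next one
// would push the padded area over budget or the batch over the size cap.
function nextBatch(texts: string[], start: number): string[] {
  const batch: string[] = [];
  let maxTokens = 0;
  for (const text of texts.slice(start)) {
    if (batch.length >= MAX_BACKFILL_CHUNK) break;
    const candidateMax = Math.max(maxTokens, estimateTokens(text));
    const area = (batch.length + 1) * candidateMax;
    // Always accept the first text so an oversized input still runs solo.
    if (batch.length > 0 && area > MAX_BATCH_TOKEN_AREA) break;
    batch.push(text);
    maxTokens = candidateMax;
  }
  return batch;
}

// Example: one 4476-char observation followed by seven 400-char texts.
const queue = ["x".repeat(4476), ...new Array<string>(7).fill("y".repeat(400))];
console.log(nextBatch(queue, 0).length); // 3, since 4 × 1119 would exceed 4096
```

A caller would advance `start` by the returned batch length on each iteration. Note that this greedy version stops at the first text that does not fit rather than searching ahead for smaller ones, which keeps the queue order intact.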

BYK merged commit fee878b into main May 13, 2026
7 checks passed

BYK deleted the fix/embedding-oom branch May 13, 2026 15:45

BYK mentioned this pull request May 13, 2026

fix: use token-budget batching to prevent OOM on long texts #290

Merged

This was referenced May 13, 2026

publish: BYK/loreai@0.18.0 #294

Closed

publish: BYK/loreai@0.18.0 #296

Closed
