fix: pre-truncate long texts to prevent ONNX OOM and report embedding errors to Sentry #343

Merged
BYK merged 1 commit into main from fix/embedding-oom-sentry
May 15, 2026

BYK commented May 15, 2026

Summary

  • Pre-truncate texts to ~4096 tokens (LOCAL_MAX_CHARS=16384 chars) in LocalProvider.embed() before sending to the ONNX worker, preventing OOM on single inputs near the model's 8192-token max (error code 284432024)
  • Upgrade 7 embedding error paths from log.info (or no logging) to log.error so failures reach Sentry via captureException
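
A minimal sketch of the pre-truncation step, assuming roughly 4 characters per token (the ratio that maps 16384 characters to about 4096 tokens). `LOCAL_MAX_CHARS` and its value come from this PR; `preTruncate` is a hypothetical helper name, and the real `LocalProvider.embed()` may be structured differently.

```typescript
// Value from this PR: 16384 chars, about 4096 tokens at ~4 chars/token (assumed ratio).
const LOCAL_MAX_CHARS = 16384;

// Hypothetical helper: cap every text before it is handed to the ONNX worker,
// so no single input can tokenize anywhere near the model's 8192-token ceiling.
function preTruncate(texts: string[], maxChars: number = LOCAL_MAX_CHARS): string[] {
  return texts.map((t) => (t.length > maxChars ? t.slice(0, maxChars) : t));
}
```

Truncating by characters is cheap because it needs no tokenizer on the calling side; the trade-off is that the token count after truncation is only approximate.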

Problem

ONNX OOM: A single text tokenized to 8192 tokens (the model's max sequence length) caused ONNX runtime allocation failure. nextBatch() always includes at least 1 item regardless of MAX_BATCH_TOKEN_AREA, so the budget guard was bypassed. The worker's truncation: true caps at the model max, but that's already too large for ONNX to allocate.
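
The bypass can be illustrated with a small sketch. It assumes the token-area budget means (items in batch) × (longest item in batch), which this PR does not spell out; `nextBatchSize` and the budget value are hypothetical stand-ins for the real `nextBatch()` and `MAX_BATCH_TOKEN_AREA`.

```typescript
// Hypothetical budget; the real MAX_BATCH_TOKEN_AREA value is not shown in this PR.
const MAX_BATCH_TOKEN_AREA = 16384;

// Given token counts for queued texts, return how many items the next batch takes.
// Assumed area definition: (items in batch) * (longest item in batch).
function nextBatchSize(tokenCounts: number[], budget: number = MAX_BATCH_TOKEN_AREA): number {
  let longest = 0;
  let count = 0;
  for (const tokens of tokenCounts) {
    const newLongest = Math.max(longest, tokens);
    const area = (count + 1) * newLongest;
    // The first item is always admitted so the queue keeps making progress.
    // As a side effect, a single oversized item is never checked against the budget.
    if (count > 0 && area > budget) break;
    longest = newLongest;
    count += 1;
  }
  return count;
}
```

With a budget of 4096, `nextBatchSize([8192], 4096)` still returns 1, which is why the limit has to be enforced on the text itself before batching.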

Silent Sentry: Embedding errors were invisible in Sentry because:

  • pipeline.ts:591 — top-level backfill catch used log.info
  • embedding.ts:335 — worker init failure used log.info
  • embedding.ts:853,873,898 — fire-and-forget catches used log.info
  • embedding.ts:351-359,361-373 — worker crash/exit handlers had no logging at all

Only log.error() calls sink?.captureException(err) via the Sentry bridge.
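
A rough sketch of why the log level matters, assuming a logger shaped like the one described above. `log.info`, `log.error`, and `sink?.captureException(err)` are named in this PR; `ErrorSink`, `setSink`, and `infoLines` are hypothetical names used only for illustration.

```typescript
// Hypothetical sink interface mirroring the captureException call named above.
interface ErrorSink {
  captureException(err: unknown): void;
}

let sink: ErrorSink | undefined;
const infoLines: string[] = []; // stands in for local log output

function setSink(s: ErrorSink | undefined): void {
  sink = s;
}

const log = {
  // info-level messages are recorded locally and never forwarded to the sink
  info(msg: string, _err?: unknown): void {
    infoLines.push(msg);
  },
  // only error-level messages cross the bridge
  error(msg: string, err?: unknown): void {
    infoLines.push(msg);
    sink?.captureException(err);
  },
};
```

Under this shape, a catch block that calls `log.info` swallows the failure as far as Sentry is concerned, and a handler with no logging call at all leaves no trace anywhere.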

Changes

| File | Change |
|---|---|
| packages/core/src/embedding.ts | Add LOCAL_MAX_CHARS constant and pre-truncation in LocalProvider.embed() |
| packages/core/src/embedding.ts | Upgrade init-error, crash, exit handlers to log.error |
| packages/core/src/embedding.ts | Upgrade fire-and-forget catches to log.error |
| packages/gateway/src/pipeline.ts | Upgrade backfill catch to log.error |

… errors to Sentry

Pre-truncate texts to ~4096 tokens (LOCAL_MAX_CHARS=16384) before sending
to the ONNX worker. The Nomic v1.5 model supports 8192 tokens max, but
ONNX runtime OOMs on inputs near that ceiling (error codes 284432024,
287180544, 144786472). nextBatch() always includes at least 1 item, so the
MAX_BATCH_TOKEN_AREA guard was bypassed for single long texts.

Upgrade embedding error reporting from log.info to log.error so failures
reach Sentry via captureException:
- Worker init-error handler (embedding.ts)
- Worker crash/exit handlers (embedding.ts, previously had no logging)
- Fire-and-forget embedding catches for knowledge/distillation/temporal
- Top-level startup backfill catch (pipeline.ts)
BYK merged commit e853edd into main May 15, 2026
7 checks passed
BYK deleted the fix/embedding-oom-sentry branch May 15, 2026 15:03
BYK added a commit that referenced this pull request May 15, 2026
…air truncation (#344)

## Summary

Follow-up to #343. Addresses Sentry noise and a surrogate pair edge case
found during self-review.

- Add `isAvailable()` guard to fire-and-forget embedding functions to
short-circuit when provider is broken
- Break backfill loops on `LocalProviderUnavailableError` to avoid
O(batches) Sentry events per startup
- Extract `safeLocalTruncate()` helper that avoids splitting UTF-16
surrogate pairs at the truncation boundary

## Problem

PR #343 upgraded `log.info` to `log.error` in fire-and-forget embedding
catches (`embedKnowledgeEntry`, `embedDistillation`,
`embedTemporalMessage`). But when the local provider is broken, **every
single call** to these functions would throw and fire `log.error` →
`captureException()` — potentially 50-200+ Sentry events per session.
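
A minimal sketch of the guard, under two assumptions that are not confirmed by this PR: the provider flips to unavailable after its first failure, and the functions are shown synchronously for brevity (the real fire-and-forget functions are presumably async). Everything except the names `isAvailable` and `embedKnowledgeEntry` is hypothetical.

```typescript
// Hypothetical provider state; the real provider tracks availability internally.
let available = true;
let workerHealthy = false; // simulate a broken ONNX worker
const sentryEvents: unknown[] = []; // stands in for log.error -> captureException

function isAvailable(): boolean {
  return available;
}

// Stand-in for the embedding call: fails and marks the provider broken.
function embed(text: string): number[] {
  if (!workerHealthy) {
    available = false;
    throw new Error("ONNX worker failed");
  }
  return [text.length];
}

function embedKnowledgeEntry(text: string): void {
  if (!isAvailable()) return; // guard: skip instead of throwing and reporting again
  try {
    embed(text);
  } catch (err) {
    sentryEvents.push(err);
  }
}
```

Calling `embedKnowledgeEntry` three times against the broken worker records one event instead of three, because the second and third calls return at the guard.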

Similarly, the backfill loops would retry every batch even after the
first one fails with `LocalProviderUnavailableError`, producing
O(items/batchSize) Sentry events on startup.
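
The loop change can be sketched as follows. Only the error name `LocalProviderUnavailableError` comes from this PR; the class body, the `backfill` signature, and the injected `embedBatch` and `report` callbacks are assumptions, and the loop is synchronous for brevity.

```typescript
// Name from the PR; this minimal class definition is an assumption.
class LocalProviderUnavailableError extends Error {
  constructor(message: string = "local provider unavailable") {
    super(message);
    this.name = "LocalProviderUnavailableError";
    // Keeps instanceof working if the code is compiled down to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical backfill loop; returns the number of items embedded.
function backfill<T>(
  items: T[],
  batchSize: number,
  embedBatch: (batch: T[]) => void,
  report: (err: unknown) => void,
): number {
  let done = 0;
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    try {
      embedBatch(batch);
      done += batch.length;
    } catch (err) {
      report(err); // stands in for log.error(...)
      // A broken provider will fail every remaining batch, so stop after one report.
      if (err instanceof LocalProviderUnavailableError) break;
    }
  }
  return done;
}
```

Other errors still let the loop continue, so one bad batch does not block the rest of the backfill.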

The `String.slice()` truncation could also split a UTF-16 surrogate pair
(emoji, CJK supplementary chars), producing an invalid lone surrogate
passed to the tokenizer.
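
One way such a helper could look, as a sketch only: the name `safeLocalTruncate` comes from this PR, but the body below is an assumption about its behavior, not the merged implementation.

```typescript
// Hypothetical implementation; only the function name is taken from the PR.
function safeLocalTruncate(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  let end = Math.max(0, maxChars);
  // If the last kept UTF-16 code unit is a high surrogate (0xD800-0xDBFF),
  // its low surrogate would be cut off, so drop the high surrogate too.
  if (end > 0) {
    const last = text.charCodeAt(end - 1);
    if (last >= 0xd800 && last <= 0xdbff) end -= 1;
  }
  return text.slice(0, end);
}
```

For a string such as `"ab\uD83D\uDE00cd"`, a raw `slice(0, 3)` ends in a lone high surrogate, while this sketch returns `"ab"` for a limit of 3 and keeps the whole emoji for a limit of 4.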

## Changes

| Location | Fix |
|---|---|
| `embedKnowledgeEntry()` | Add `if (!isAvailable()) return;` early exit |
| `embedDistillation()` | Add `if (!isAvailable()) return;` early exit |
| `embedTemporalMessage()` | Add `if (!isAvailable()) return;` early exit |
| `backfillEmbeddings()` catch | `break` on `LocalProviderUnavailableError` |
| `backfillDistillationEmbeddings()` catch | `break` on `LocalProviderUnavailableError` |
| `LocalProvider.embed()` | Use `safeLocalTruncate()` helper instead of raw `slice()` |
This was referenced May 15, 2026