
fix(embedding): set kv_unified=True when embedding=True to enable batch processing#2217

Merged
abetlen merged 2 commits into abetlen:main from SanjanaB123:fix/kv-unified-batch-embeddings
May 13, 2026

Conversation

@SanjanaB123
Contributor

Fixes #2216

Problem

When using embedding=True, n_seq_max is set to 256 but kv_unified
remains False (the default). As a result, n_ctx_seq is computed as
n_ctx / n_seq_max, e.g. 256 tokens per sequence, which makes batch
embeddings essentially unusable for inputs longer than 256 tokens.
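The arithmetic behind the problem can be sketched as follows. This is a minimal illustration, not llama-cpp-python code: the helper function is hypothetical, and n_ctx = 65536 is an assumed example value chosen so that n_ctx / n_seq_max comes out to 256.

```python
def n_ctx_per_seq(n_ctx: int, n_seq_max: int, kv_unified: bool) -> int:
    # With a unified KV cache, every sequence can use the full context
    # window; otherwise the window is split evenly across sequences.
    return n_ctx if kv_unified else n_ctx // n_seq_max

# Illustrative numbers: a 65536-token context divided across 256
# sequences leaves only 256 tokens per sequence.
assert n_ctx_per_seq(65536, 256, kv_unified=False) == 256

# With kv_unified=True, each sequence sees the whole window again.
assert n_ctx_per_seq(65536, 256, kv_unified=True) == 65536
```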

Fix

Set self.context_params.kv_unified = True inside the if embedding:
block, which allows each sequence to use the full context window
instead of n_ctx / n_seq_max.
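The shape of the change can be sketched with a stand-in for the context params struct. This is a simplified model for illustration only: ContextParams and configure are hypothetical stand-ins, not the actual llama-cpp-python constructor, and the default n_ctx value is assumed.

```python
from dataclasses import dataclass

@dataclass
class ContextParams:  # hypothetical stand-in for llama_context_params
    n_ctx: int = 65536       # assumed example value
    n_seq_max: int = 1
    embeddings: bool = False
    kv_unified: bool = False

def configure(embedding: bool) -> ContextParams:
    p = ContextParams()
    if embedding:
        p.embeddings = True
        p.n_seq_max = 256     # allow batched embedding requests
        p.kv_unified = True   # the fix: unified KV cache, so each
                              # sequence keeps the full context window
    return p

params = configure(embedding=True)
assert params.kv_unified and params.n_seq_max == 256
```

The key point is that kv_unified is set in the same branch that raises n_seq_max, so the two settings are never out of sync.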

Testing

Verified that both single and batch embeddings work correctly after
the fix:

  • Single embedding: ✅ works, returns 384-dim vector
  • Batch embedding with long inputs: ✅ works, returns correct embeddings

@abetlen abetlen merged commit 95ccb19 into abetlen:main May 13, 2026
15 checks passed


Development

Successfully merging this pull request may close these issues.

Need to set kv_unified=True to enable batch processing
