fix(embedding): set kv_unified=True when embedding=True to enable batch processing #2217
Merged
abetlen merged 2 commits on May 13, 2026
Conversation
Fixes #2216
Problem
When using `embedding=True`, `n_seq_max` is set to 256 but `kv_unified` remains `False` (the default). This causes `n_ctx_seq` to be computed as `n_ctx / n_seq_max = 256` tokens per sequence, making batch embeddings essentially unusable for inputs longer than 256 tokens.
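For concreteness, here is the arithmetic behind the failure mode. The `n_ctx` value below is an illustrative assumption chosen so the division matches the 256-token figure above; it is not taken from the PR:

```python
# Illustrative arithmetic only; n_ctx is an assumed example value.
n_ctx = 65536      # total context configured for the model (assumed)
n_seq_max = 256    # set automatically when embedding=True

# With kv_unified=False, the KV cache is split evenly across sequences:
n_ctx_seq = n_ctx // n_seq_max
print(n_ctx_seq)   # 256 tokens per sequence for this n_ctx
```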
Fix
Set `self.context_params.kv_unified = True` inside the `if embedding:` block, which allows each sequence to use the full context window instead of `n_ctx / n_seq_max`.
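A minimal sketch of where the change sits. The real fix is a single line in `llama_cpp/llama.py`; the wrapper class and `ContextParams` struct below are hypothetical stand-ins so the snippet runs on its own:

```python
from dataclasses import dataclass

# Hypothetical stand-in for llama.cpp's context params struct;
# field names mirror the PR, the surrounding class is illustrative.
@dataclass
class ContextParams:
    n_ctx: int = 65536
    n_seq_max: int = 1
    kv_unified: bool = False
    embeddings: bool = False

class Llama:
    def __init__(self, embedding: bool = False):
        self.context_params = ContextParams()
        if embedding:
            self.context_params.embeddings = True
            self.context_params.n_seq_max = 256  # enables batch embeddings
            # The fix: a unified KV cache lets every sequence use the
            # full n_ctx window instead of n_ctx / n_seq_max tokens.
            self.context_params.kv_unified = True
```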
Testing

Verified that both single and batch embeddings work correctly after the fix:
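The PR's verification snippet is not included above; a minimal reproduction along the same lines might look like this (the model path and input texts are placeholders):

```python
from llama_cpp import Llama

# Placeholder path; any GGUF embedding-capable model works here.
llm = Llama(model_path="model.gguf", embedding=True)

# Single embedding
vec = llm.embed("hello world")

# Batch embeddings: with kv_unified=True each input can now use the
# full context window rather than n_ctx / n_seq_max tokens.
vecs = llm.embed(["first document", "second document", "third document"])

print(len(vec), len(vecs))  # embedding dimension, number of inputs
```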