Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
Trying to run embedding with a batch of inputs à la #2199. This should work; instead it fails as soon as an input in the batch is longer than 256 tokens.
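For reference, a minimal sketch of the call pattern that triggers this (model path, `n_ctx`, and the texts are placeholders; any GGUF embedding model and any input longer than the per-sequence slice reproduces it):

```python
from llama_cpp import Llama

# Placeholder model path; any GGUF embedding model shows the same behaviour.
llm = Llama(
    model_path="./embedding-model.gguf",
    embedding=True,
    n_ctx=4096,  # illustrative; with kv_unified=False each of the 256
                 # sequences only gets n_ctx / n_seq_max tokens of context
)

# Batch embedding as in #2199: fails once any single input exceeds the
# per-sequence context slice described below.
texts = ["a short document", "a much longer document " * 200]
embeddings = llm.create_embedding(texts)
```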
The problem is that we currently use `Llama.context_params.kv_unified = False` (the default). According to https://github.com/ggml-org/llama.cpp/blob/master/src/llama-context.cpp#L183-L197, this causes `n_ctx_seq` to be computed as `n_ctx / n_seq_max`, which leaves `n_ctx_seq` (the context length per individual input sequence) at 256 😥 (it is also reported as 256 in the verbose output during context initialisation).
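To make the arithmetic concrete (values are illustrative; the division mirrors the linked llama-context.cpp logic):

```python
# Illustrative values only; the point is the division, not the exact numbers.
n_ctx = 65536        # total context / KV cache size requested
n_seq_max = 256      # hard-coded in llama-cpp-python since #2206
kv_unified = False   # current default

if not kv_unified:
    # Each parallel sequence gets its own fixed slice of the context.
    n_ctx_seq = n_ctx // n_seq_max   # 65536 // 256 = 256 tokens per input
```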
Current Behavior
`n_seq_max` is fixed at 256 (as of #2206, included in 0.3.23). This makes `n_ctx_seq` very small, so embedding with batches is essentially unusable: inputs are rarely shorter than 256 tokens.
Failure Logs
```
decode: n_ctx is not divisible by n_seq_max - rounding down to XXX
```