server: process prompt fairly across slots

**Note: This issue was copied from [https://github.com/ggml-org/llama.cpp/issues/6607](https://github.com/ggml-org/llama.cpp/issues/6607)**

**Original Author:** @phymbert
**Original Issue Number:** #6607
**Created:** 2024-04-11T10:23:54Z

---

### Context

At the moment, the server batches prompt tokens in FIFO order, so a large prompt blocks all other slots while it is being processed.

Proposal: implement fair batch usage for prompt processing across all pending slots.
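To make the idea concrete, here is a minimal sketch of one possible fair-share policy, written in Python for illustration only. It is not the server's code: the function name `fair_prompt_shares`, the `pending` mapping and the `n_batch` budget parameter are all assumptions made for this example. Each round gives every waiting slot an equal quota of the remaining budget, and whatever short prompts leave unused is redistributed to the slots that still have tokens left.

```python
def fair_prompt_shares(pending, n_batch):
    """Split a batch budget of n_batch tokens across slots (hypothetical helper).

    pending: dict mapping slot id -> prompt tokens still to process.
    Returns a dict mapping slot id -> tokens to schedule in this batch.
    """
    shares = {slot: 0 for slot in pending}
    remaining = {slot: n for slot, n in pending.items() if n > 0}
    budget = n_batch

    while budget > 0 and remaining:
        # Equal quota per waiting slot, never less than one token.
        quota = max(1, budget // len(remaining))
        for slot in sorted(remaining):
            take = min(quota, remaining[slot], budget)
            shares[slot] += take
            remaining[slot] -= take
            budget -= take
            if budget == 0:
                break
        # Drop slots whose prompt is fully scheduled; leftover budget
        # is redistributed among the others in the next round.
        remaining = {slot: n for slot, n in remaining.items() if n > 0}

    return shares


if __name__ == "__main__":
    # One large prompt (slot 0) and two short ones share a 512-token batch.
    print(fair_prompt_shares({0: 4096, 1: 32, 2: 100}, 512))
```

With a FIFO policy, slot 0 in this example would consume the entire 512-token batch. Under the sketch above, the two short prompts are fully scheduled in the same batch and slot 0 receives the rest of the budget.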

References:
- https://github.com/ggerganov/llama.cpp/issues/4216#issuecomment-2043558080
- https://github.com/ggerganov/llama.cpp/issues/5851#issuecomment-1975120585

