server: process prompt fairly across slots #273

@jakexcosme

Description

Note: This issue was copied from ggml-org#6607

Original Author: @phymbert
Original Issue Number: ggml-org#6607
Created: 2024-04-11T10:23:54Z


Context

At the moment we batch prompt tokens in FIFO order, so a large prompt that is being processed blocks all other slots.

Proposal: implement fair batch usage for prompt processing across all pending slots.
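
To make the difference concrete, here is a minimal sketch of the two policies. It is not the server's actual code: `fifo_allocate`, `fair_allocate`, `pending` (prompt tokens still waiting per slot) and `n_batch` (token budget of one batch) are hypothetical names chosen for illustration, and the equal-share strategy is only one possible reading of "fair".

```python
def fifo_allocate(pending, n_batch):
    """Current behaviour (as described above): fill the batch slot by slot, in order."""
    alloc = []
    remaining = n_batch
    for need in pending:
        take = min(need, remaining)
        alloc.append(take)
        remaining -= take
    return alloc


def fair_allocate(pending, n_batch):
    """Hypothetical fair policy: split the batch evenly across pending slots.

    Budget left unused by slots with short prompts is redistributed
    to the slots that still have prompt tokens waiting.
    """
    alloc = [0] * len(pending)
    remaining = n_batch
    active = [i for i, need in enumerate(pending) if need > 0]
    while remaining > 0 and active:
        # Equal share of what is left; at least 1 token so the loop makes progress.
        share = max(1, remaining // len(active))
        still_active = []
        for i in active:
            if remaining == 0:
                break
            take = min(share, pending[i] - alloc[i], remaining)
            alloc[i] += take
            remaining -= take
            if alloc[i] < pending[i]:
                still_active.append(i)
        active = still_active
    return alloc


# One large prompt and two small ones competing for a 512-token batch.
pending = [4096, 32, 100]
print(fifo_allocate(pending, 512))  # [512, 0, 0]
print(fair_allocate(pending, 512))  # [380, 32, 100]
```

With FIFO the large prompt takes the whole batch and the two small prompts wait. With the fair split, both small prompts are fully processed in the same batch and the large prompt still receives the leftover budget.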

References:
