Can the server be run with multiple states? #257

Answered by abetlen
alexeyche asked this question in Q&A
@alexeyche that's correct: the object can only process a single request at a time. llama.cpp doesn't yet support batching requests, so there's no real way to make this possible until that happens. The alternative "solution" I'm working on is to allow users to load multiple models at the same time, but this will take twice the RAM and will likely be quite slow.
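As a stopgap until batching lands, concurrent callers can be serialized over one instance or spread across a small pool of independently loaded instances. Here is a minimal sketch assuming llama-cpp-python's `Llama` class; `NUM_INSTANCES`, `MODEL_PATH`, and `complete` are illustrative names, and whether two instances actually run in parallel also depends on the GIL being released during inference:

```python
import queue

from llama_cpp import Llama

# Each Llama instance keeps its own context/state, so a pool of
# instances lets requests proceed without clobbering each other.
# NUM_INSTANCES = 1 reduces this to plain serialization; every
# extra instance loads a full copy of the model into RAM.
NUM_INSTANCES = 2                        # illustrative value
MODEL_PATH = "./models/ggml-model.bin"   # placeholder path

pool: "queue.Queue[Llama]" = queue.Queue()
for _ in range(NUM_INSTANCES):
    pool.put(Llama(model_path=MODEL_PATH))

def complete(prompt: str, **kwargs) -> dict:
    """Check out an instance, run the completion, return the instance."""
    llm = pool.get()  # blocks until an instance is free
    try:
        return llm(prompt, **kwargs)
    finally:
        pool.put(llm)

# Example call; threads contend for the pool rather than the model state.
print(complete("Q: Name a color. A:", max_tokens=8))
```

The pool buys throughput at the cost of duplicated model memory, which matches the RAM trade-off described above.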

Answer selected by alexeyche