
[feat] Ollama concurrent requests #82

Open
pkarw opened this issue Jan 13, 2025 · 1 comment

Comments

pkarw commented Jan 13, 2025

Ollama doesn't support concurrent requests by default. We need to work on this, as it's a pretty big bottleneck for now.

More info: https://www.reddit.com/r/LocalLLaMA/comments/1dt5n6l/ollama_now_runs_inference_concurrently_by_default/?rdt=57766

Maybe we'll need to migrate to vLLM: https://github.com/vllm-project/vllm
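
For reference, a minimal sketch of what concurrent usage could look like once parallelism is enabled on the Ollama side. This assumes Ollama ≥ 0.2.0 (where parallel inference is on by default) or an older build started with `OLLAMA_NUM_PARALLEL` set; the model name, prompts, and endpoint are placeholders/defaults, not anything specific to this project:

```python
# Sketch: firing concurrent requests at a local Ollama server.
# Assumes the server was started with parallelism enabled, e.g.:
#   OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=2 ollama serve
# (default-on in Ollama >= 0.2.0)
import concurrent.futures
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def generate(prompt: str) -> str:
    # "llama3" is a placeholder model name
    payload = json.dumps({"model": "llama3", "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

prompts = [f"Summarize topic #{i}" for i in range(4)]

# With parallel inference enabled, these requests are served concurrently
# instead of queueing one after another behind a single slot.
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(generate, prompts):
        print(result[:80])
```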

pkarw commented Jan 13, 2025

... also, this makes #26 a higher priority.
