feature: Add vLLM engine for generation tasks #3981

Matthieu-Tinycoaching · 2023-06-23T11:31:14Z

Feature request

As has been done with Triton Inference Server, it would be great to integrate vLLM (https://github.com/vllm-project/vllm) as a highly optimized engine for LLM generation based on continuous batching.

Motivation

No response

Other

No response

aarnphm · 2023-06-23T12:18:31Z

I think this is better suited for OpenLLM than for BentoML itself.

aarnphm · 2023-06-23T14:06:40Z

Let's track the decision and progress there. I don't think it is suited for BentoML yet.

Matthieu-Tinycoaching mentioned this issue Jun 23, 2023

feat: Add vLLM engine for generation tasks bentoml/OpenLLM#55

Closed

aarnphm closed this as completed Jun 23, 2023
