Description
Your current environment
Hi there,
First of all, thank you for all the hard work on vLLM; it's an excellent project!
I am currently exploring the use of torch.compile within vLLM to optimize inference performance. I have seen that many decoder-only models (such as the GPT series and LLaMA) work well with torch.compile. However, I am particularly interested in using the Qwen2-VL model and could not find any documentation or discussion regarding torch.compile support for it.
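For context, here is roughly how I am running Qwen2-VL with vLLM offline inference today, without any torch.compile involvement. This is a minimal sketch; the model name, image path, and prompt template are just what I lifted from the Qwen2-VL examples, so they may need adjusting:

```python
from PIL import Image
from vllm import LLM, SamplingParams

# Baseline offline inference with Qwen2-VL (no torch.compile yet).
llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct", max_model_len=4096)
sampling_params = SamplingParams(temperature=0.0, max_tokens=128)

# Qwen2-VL expects the image placeholder tokens in the prompt.
prompt = (
    "<|im_start|>user\n"
    "<|vision_start|><|image_pad|><|vision_end|>"
    "Describe this image.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
image = Image.open("example.jpg")  # any local image

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    sampling_params,
)
print(outputs[0].outputs[0].text)
```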
Could you please clarify the following:

1. Is Qwen2-VL currently supported with torch.compile in the latest version of vLLM?
2. If not, are there any plans to add torch.compile support for Qwen2-VL in the near future?
3. Are there any known workarounds or tips for using torch.compile with multi-modal models like Qwen2-VL? (I sketch what I would try below.)
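For reference, this is the kind of thing I would try, assuming newer vLLM releases expose the compilation_config argument on LLM. I am not sure whether it is honored for multi-modal models like Qwen2-VL, which is exactly what I am asking about:

```python
from vllm import LLM

# Speculative: ask vLLM to apply its torch.compile integration.
# In recent releases compilation_config accepts an optimization level
# (3 roughly corresponds to piecewise torch.compile); whether this
# covers both the Qwen2-VL vision encoder and the language backbone
# is what I am unsure about.
llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct",
    compilation_config=3,
)
```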
Any guidance or insights would be greatly appreciated!
Thank you for your time and assistance.
How would you like to use vllm
I want to run inference of a [specific model](put link here). I don't know how to integrate it with vllm.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.