Why can't InternVL3-8B start vLLM after being converted to the Hugging Face format? It shows the error: `ValueError: 'limit_mm_per_prompt' is only supported for multimodal models.` #38000
Comments
Hm, I don't think the converted weights would be compatible with vLLM yet. Usually vLLM implements the multimodal models themselves using the official checkpoints. Support for transformers-style multimodal models is planned already and we'll try to add it soon.
Hello, thank you for your response. Could you please advise whether there's any way to convert a Hugging Face (HF) format model back to its original format? My training task only produced the model in HF format. Looking forward to your reply!
Ah, you are using a custom tuned model. I don't think we have an easy way to do the reverse conversion, unless you write your own conversion script. Though I will cc @hmellor as well; not sure if there's a way to run converted models in vLLM without using the transformers backend.
I can't find any mention of needing to convert this model for it to be compatible with vLLM. You should be able to use it directly. I think the problem is that this is a custom model, and you're not using `trust_remote_code`. This is necessary to get the custom processor from the Hub repo. I've found a document showing that InternVL is an example of a model with a custom processor: https://docs.vllm.ai/en/latest/contributing/model/multimodal.html#custom-hf-processor.
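For reference, a minimal sketch of what that looks like with vLLM's offline `LLM` API (the model path is illustrative; the server-side equivalent is the `--trust-remote-code` flag):

```python
# Minimal sketch: without trust_remote_code, vLLM falls back to the processor
# class bundled with transformers, so a custom processor published in the Hub
# repo (as with the original InternVL checkpoints) is never loaded.
from vllm import LLM

llm = LLM(
    model="OpenGVLab/InternVL3-8B",     # illustrative; or a local checkpoint path
    trust_remote_code=True,             # pulls the custom processor from the Hub repo
    limit_mm_per_prompt={"image": 4},   # only accepted once vLLM sees a multimodal model
)
```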
Btw, not sure if that helps, but there's already a converted checkpoint on the Hub for the native Transformers InternVL3-8B: https://huggingface.co/OpenGVLab/InternVL3-8B-hf
In vLLM CI we use https://hf.co/OpenGVLab/InternVL2-1B, so I think the unconverted checkpoint is OK.
I use the following command to start vLLM, and it starts successfully.
But when I send a request, I get an error; it seems the limit-mm-per-prompt param does not take effect. How can I fix it?
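The original request wasn't shown; a sketch of the kind of multimodal request that hits this limit, via the OpenAI-compatible endpoint vLLM serves (host, port, served model name, and image URL are all assumptions):

```python
# Hedged sketch of a multimodal chat request against a local vLLM server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="internvl3-8b-hf",  # hypothetical served model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            # Requests with more images than limit_mm_per_prompt allows are rejected.
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```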
This is a vLLM problem, not a Transformers problem. Let's discuss in vLLM. |
@FloSophorae Don't know if this helps, but I've managed to convert an InternVL3-1B-hf model back to the original OpenGVLab/InternVL3-1B format via the following script:
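The script itself wasn't captured in the thread; below is a minimal sketch of the approach, assuming a single-file safetensors checkpoint. The prefix map and the q/k/v re-fusion are assumptions inverted by eye from what convert_internvl_weights_to_hf.py does and must be verified against that script; biases, layer-scale parameters, and sharded checkpoints would need the same treatment.

```python
# Rough sketch of a reverse conversion: invert the key renames that the
# forward script convert_internvl_weights_to_hf.py applies. All mapping
# rules below are illustrative assumptions, not a verified inverse.
import torch
from safetensors.torch import load_file, save_file

# Hypothetical inverse prefix mapping (HF name -> original name).
REVERSE_PREFIXES = {
    "model.vision_tower.": "vision_model.",
    "model.multi_modal_projector.": "mlp1.",
    "model.language_model.": "language_model.model.",
    "lm_head.": "language_model.lm_head.",
}

def to_original_key(hf_key: str) -> str:
    for hf_prefix, orig_prefix in REVERSE_PREFIXES.items():
        if hf_key.startswith(hf_prefix):
            return orig_prefix + hf_key[len(hf_prefix):]
    return hf_key

hf_state = load_file("InternVL3-1B-hf/model.safetensors")  # single-file assumption
orig_state = {to_original_key(k): v for k, v in hf_state.items()}

# The forward script splits the fused vision qkv projection into q/k/v,
# so the reverse direction concatenates them back (biases analogous).
for key in [k for k in orig_state
            if k.startswith("vision_model.") and k.endswith("q_proj.weight")]:
    base = key[: -len("q_proj.weight")]
    orig_state[base + "qkv.weight"] = torch.cat(
        [orig_state.pop(base + f"{p}_proj.weight") for p in "qkv"], dim=0
    )

save_file(orig_state, "InternVL3-1B/model.safetensors")
```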
System Info
vllm 0.8.5.post1
transformers 4.52.0.dev0
Who can help?
@amyeroberts
@qubvel
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
I downloaded the model from OpenGVLab/InternVL3-8B, which natively supports running OpenAI-style chat completions with vLLM. However, after converting it to the Hugging Face format using the script transformers/src/transformers/models/internvl/convert_internvl_weights_to_hf.py, launching vLLM resulted in the error: `ValueError: 'limit_mm_per_prompt' is only supported for multimodal models.`
The command I used to launch vLLM is as follows:
The system runs correctly when I set MODEL_PATH to the original OpenGVLab/InternVL3-8B path, but it throws an error when I change the path to the converted InternVL3-8B-hf checkpoint:
`ValueError: 'limit_mm_per_prompt' is only supported for multimodal models.`
Could someone explain why this is happening and suggest solutions?
Thank you very much!
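For what it's worth, a quick diagnostic sketch (the file path is illustrative): vLLM raises this error when it does not recognise the checkpoint's architecture as multimodal, so the `architectures` field of the converted config.json is the first thing to compare against vLLM's model registry.

```python
# Check which model class the converted checkpoint declares; vLLM matches
# this string against its own registry to decide if the model is multimodal.
import json

with open("InternVL3-8B-hf/config.json") as f:  # illustrative path
    config = json.load(f)

# The converted checkpoint reports the transformers-native class, e.g.
# ["InternVLForConditionalGeneration"], while the original repo reports
# ["InternVLChatModel"], which vLLM 0.8.5 does register as multimodal.
print(config["architectures"])
```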