[Bug]: Gemma3 Offline Batch Inference: Attempted to assign XXX multimodal tokens to YYY placeholders #14897
Comments
Does this only happen for the 27b model?
@DarkLight1337 The same issue occurs in 12b as well.
Yes, I suppose it happens in all sizes.
Does this happen on V0 (`VLLM_USE_V1=0`)?
@DarkLight1337 this error only happens on V1
@BiEchi Would it be possible to share a reproducible example with inputs?
@WoosukKwon Observing the same with NVLM in V1.
@WoosukKwon Also observing this with NVLM in V0 when using chunked prefill.
Chunked prefill in general is not supported for multi-modal models in V0. Did you also set `enable_chunked_prefill=True`?
@DarkLight1337 I was trying with V1.
Chunked prefill is supported for multi-modal models in V1. Can you show the prompt which you are using?
@DarkLight1337 It seems to be non-deterministic. I think it depends on the batch of requests: running the individually failing query separately does not result in any issues, so I am guessing it has something to do with concurrency/batch size. Not sure what pointers I can provide to help reproduce it? My test case has a large number of requests that I go through with some degree of concurrency, so it's a bit tricky to get the exact failing batch again.
I am not changing the batch size, and upon launch I see the following log: […] Does it make sense to increase this? The default for NVLM is […]
Can you try out #14980 and see if it can solve the problem?
Sure, I can try today. Is that expected to help with NVLM too? Just making sure it's not just for Gemma3.
No, that PR only fixes Gemma3.
Hmm -- I am running into a lot of these errors with NVLM as well on vLLM […]
@DarkLight1337 Gemma3 works fine so far on your bugfix branch.
Your current environment
The output of `python collect_env.py`
🐛 Describe the bug
@WoosukKwon This error is likely a processor-related error.
The error happens for both `llm.chat()` and `llm.generate()`. It says `Attempted to assign XXX multimodal tokens to YYY placeholders`. The error only occurs when there are image inputs, but is not tied to any particular image (i.e. it persists when the images are replaced with other images). It is raised only when `len(messages) >= 32`: if I instead send the messages individually `len(messages)` times, or in smaller mini-batches, no error is raised.

Minimum reproduction example:
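A minimal sketch of the setup described above (the checkpoint name, image URL, and sampling parameters are illustrative assumptions, not the reporter's original script):

```python
from vllm import LLM, SamplingParams

# Hypothetical image URL; per the report, the error is not tied to
# any particular image.
IMAGE_URL = "https://example.com/test.jpg"

llm = LLM(
    model="google/gemma-3-27b-it",  # 12b reportedly reproduces it as well
    limit_mm_per_prompt={"image": 1},
)

# The error appears only once len(messages) >= 32; smaller mini-batches
# (or sending the same conversations one at a time) run cleanly.
messages = [
    [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": IMAGE_URL}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ]
    for _ in range(32)
]

outputs = llm.chat(messages, SamplingParams(max_tokens=64))
for out in outputs:
    print(out.outputs[0].text)
```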
Error:
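The message takes the form given in the issue title, where XXX and YYY stand for the mismatched token and placeholder counts of the failing request:

```text
Attempted to assign XXX multimodal tokens to YYY placeholders
```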