Potential bug in Qwen 2/2.5 VL Image Preprocessor #38003
@ritwickchaudhry correct! There was another PR for this somewhere (#37350) that probably went stale. I had forgotten about it due to low priority. Would you like to open a PR for this? LMK if you can't contribute, and I can finalize and merge the existing PR later next week :)
Thanks @zucchini-nlp! Sure, let me create a PR soon!
Hi @ritwickchaudhry and @zucchini-nlp, I've also encountered this issue and have implemented the fix based on the discussion here.
@anshulsc I'll be releasing the PR very soon, as I've finished most of it. Thanks for the offer though!
@ritwickchaudhry great!
Done, actually! @zucchini-nlp, can you please review the PR: #38076
transformers/src/transformers/models/qwen2_vl/image_processing_qwen2_vl.py, line 278 (commit 5c47d08)

The `temporal_patch_size` is used to group consecutive video frames. However, if the number of frames is not divisible by it, the last frame is repeated. The current number of repetitions is `temporal_patch_size - 1`. While this works for `temporal_patch_size = 2`, it wouldn't work for larger patch sizes. In my opinion, the code should be modified to:
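A minimal sketch of the fix being discussed, written as a hypothetical standalone helper (in transformers the logic lives inline in the image processor): instead of always appending `temporal_patch_size - 1` copies of the last frame, repeat it only as many times as needed to reach the next multiple of `temporal_patch_size`.

```python
import numpy as np


def pad_frames(patches: np.ndarray, temporal_patch_size: int) -> np.ndarray:
    """Pad a (num_frames, ...) stack so num_frames is divisible by temporal_patch_size.

    Hypothetical helper illustrating the proposed change: the last frame is
    repeated `temporal_patch_size - remainder` times rather than a fixed
    `temporal_patch_size - 1` times, which only happens to be correct when
    temporal_patch_size == 2.
    """
    remainder = patches.shape[0] % temporal_patch_size
    if remainder != 0:
        repeats = np.repeat(
            patches[-1][np.newaxis], temporal_patch_size - remainder, axis=0
        )
        patches = np.concatenate([patches, repeats], axis=0)
    return patches
```

With `temporal_patch_size = 2` and an odd frame count this behaves identically to the current code; the difference only shows up for larger patch sizes (e.g. 5 frames with a patch size of 4 need 3 padding frames, not 3 by coincidence of `4 - 1` but because `4 - (5 % 4) == 3`; 6 frames would need 2, where the current code would still add 3).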