
Training Qwen2.5-VL with dynamic image sizes: a more balanced sampler for per-GPU memory usage #37914


Open
OpenJarvisAI opened this issue May 1, 2025 · 2 comments
Labels
Feature request Request for a new feature

Comments

@OpenJarvisAI

Feature request

Hi, I'm currently training Qwen2.5-VL with PEFT, using a lazily loaded dataloader.

Since Qwen2.5-VL uses dynamic input sizes, training samples end up with very different sequence lengths (the number of image tokens varies per sample).

This makes training tricky: GPU memory usage becomes extremely imbalanced.
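For context, a rough per-sample length estimate can be computed from the image header alone (PIL's `Image.open` does not decode pixel data just to read the size). This is only a sketch: the ~28x28-pixels-per-visual-token constant is an assumption based on Qwen2.5-VL's 14-pixel patches with 2x2 token merging, and the processor's exact resizing rules will make the real count differ somewhat.

```python
# Minimal sketch: approximate the number of visual tokens per sample from
# the image header only. PIXELS_PER_TOKEN is an assumption (14-pixel patches
# merged 2x2 => ~28x28 pixels per token); the processor's resize step means
# the true count will differ slightly.
from PIL import Image

PIXELS_PER_TOKEN = 28 * 28

def approx_visual_tokens(image_path: str) -> int:
    # Image.open only reads the header here; no full decode happens.
    with Image.open(image_path) as im:
        width, height = im.size
    return max(1, (width * height) // PIXELS_PER_TOKEN)
```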

Motivation

Is there a way to support this (e.g. a sampler that balances the load per GPU)?

Your contribution

Is there a way to support this?

OpenJarvisAI added the Feature request label on May 1, 2025
@zucchini-nlp
Member

@OpenJarvisAI I didn't quite get the issue: is training failing because inputs have different first-dimension shapes (so the first dim doesn't reflect the true batch size)?

We had issues with multi-GPU training of Qwen-VL where batches wouldn't get split correctly per GPU (#33666). Are you also training with multiple GPUs in a distributed setup?

@OpenJarvisAI
Author

OpenJarvisAI commented May 4, 2025

I'm using multiple GPUs, but on a single machine with 8 GPUs.

My issue is that per-GPU memory usage is not well balanced. This is a common problem when training MLLMs; however, for LLaVA-like models the number of image tokens per sample is fixed, so sequence lengths can be pre-computed without iterating over the whole dataset.

With Qwen-VL the input is dynamic and the image input size is not known up front, so when training with a lazy dataset loader (which is now common usage) it is hard to keep GPU memory balanced, and one GPU can easily OOM if it gets a heavier load.

I'm wondering whether there is a way to solve this without pre-computing every image's dimensions (which could be cumbersome when there are many images).
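If a cheap estimate is acceptable, reading only image headers (as in the sketch above) avoids full decodes; with those estimates, a length-grouped ordering can keep the samples that land in the same step (and hence on different GPUs) at comparable lengths. Below is a minimal sketch, assuming a precomputed `lengths` list (e.g. text tokens plus `approx_visual_tokens` from the header); all names here are illustrative, not a transformers API.

```python
# Minimal sketch of a length-grouped sampler. `lengths` is an assumed
# precomputed list of approximate per-sample lengths. Not a transformers API.
import random
from torch.utils.data import Sampler

class ApproxLengthGroupedSampler(Sampler):
    def __init__(self, lengths, batch_size, world_size, window_factor=50, seed=0):
        self.lengths = lengths
        self.per_step = batch_size * world_size      # samples consumed in one optimizer step
        self.window = self.per_step * window_factor  # shuffled window to sort within
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Call once per epoch (as DistributedSampler does) to reshuffle.
        self.epoch = epoch

    def __len__(self):
        return len(self.lengths)

    def __iter__(self):
        rng = random.Random(self.seed + self.epoch)
        order = list(range(len(self.lengths)))
        rng.shuffle(order)
        out = []
        for start in range(0, len(order), self.window):
            chunk = order[start:start + self.window]
            # Sorting within a window keeps randomness across the epoch while
            # making neighbouring batches (the per-GPU shards of one step)
            # similar in estimated length.
            chunk.sort(key=lambda i: self.lengths[i])
            out.extend(chunk)
        return iter(out)
```

Note this only defines an ordering; under DDP the indices still have to be sharded per rank (the Trainer/Accelerate normally handles that). The Trainer's existing `group_by_length` / `length_column_name` options in `TrainingArguments` implement a similar idea for precomputed lengths, so they may already help if an approximate length column can be added to the dataset.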
