
Training Qwen2.5-VL with dynamic image sizes: a more balanced sampler for per-GPU memory usage #37914


Open
OpenJarvisAI opened this issue May 1, 2025 · 2 comments
Labels
Feature request Request for a new feature

Comments

@OpenJarvisAI

Feature request

Hi, I'm currently training Qwen2.5-VL with PEFT, using a lazily loaded dataloader.

Since Qwen2.5-VL uses dynamic input sizes, training samples end up with very different sequence lengths (the number of image tokens varies per sample).

This makes training tricky: GPU memory usage becomes extremely imbalanced.
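For context, a rough per-sample length estimate can be computed from the image header alone (PIL's `Image.open` does not decode pixel data just to read the size). This is only a sketch: the ~28x28-pixels-per-visual-token constant is an assumption based on Qwen2.5-VL's 14-pixel patches with 2x2 token merging, and the processor's exact resizing rules will make the real count differ somewhat.

```python
# Minimal sketch: approximate the number of visual tokens per sample from
# the image header only. PIXELS_PER_TOKEN is an assumption (14-pixel patches
# merged 2x2 => ~28x28 pixels per token); the processor's resize step means
# the true count will differ slightly.
from PIL import Image

PIXELS_PER_TOKEN = 28 * 28

def approx_visual_tokens(image_path: str) -> int:
    # Image.open only reads the header here; no full decode happens.
    with Image.open(image_path) as im:
        width, height = im.size
    return max(1, (width * height) // PIXELS_PER_TOKEN)
```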

Motivation

Is there a way to support this (e.g. a sampler that balances the load per GPU)?

Your contribution

Is there a way to support this?

OpenJarvisAI added the Feature request label on May 1, 2025
@zucchini-nlp
Member

@OpenJarvisAI I didn't quite get the issue: is training failing because inputs have different first-dimension shapes (so the first dim doesn't reflect the true batch size)?

We had issues with multi-GPU training of Qwen-VL where batches wouldn't get split correctly per GPU (#33666). Are you also training with multiple GPUs in a distributed setup?

@OpenJarvisAI
Author

OpenJarvisAI commented May 4, 2025

I'm using multiple GPUs, but on a single machine with 8 GPUs.

My issue is that per-GPU memory usage is not well balanced. This is a common problem when training MLLMs; however, for LLaVA-like models the number of image tokens per sample is fixed, so sequence lengths can be pre-computed without iterating over the whole dataset.

With Qwen-VL the input is dynamic and the image input size is not known up front, so when training with a lazy dataset loader (which is now common usage) it is hard to keep GPU memory balanced, and one GPU can easily OOM if it gets a heavier load.

I'm wondering whether there is a way to solve this without pre-computing every image's dimensions (which could be cumbersome when there are many images).
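If a cheap estimate is acceptable, reading only image headers (as in the sketch above) avoids full decodes; with those estimates, a length-grouped ordering can keep the samples that land in the same step (and hence on different GPUs) at comparable lengths. Below is a minimal sketch, assuming a precomputed `lengths` list (e.g. text tokens plus `approx_visual_tokens` from the header); all names here are illustrative, not a transformers API.

```python
# Minimal sketch of a length-grouped sampler. `lengths` is an assumed
# precomputed list of approximate per-sample lengths. Not a transformers API.
import random
from torch.utils.data import Sampler

class ApproxLengthGroupedSampler(Sampler):
    def __init__(self, lengths, batch_size, world_size, window_factor=50, seed=0):
        self.lengths = lengths
        self.per_step = batch_size * world_size      # samples consumed in one optimizer step
        self.window = self.per_step * window_factor  # shuffled window to sort within
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Call once per epoch (as DistributedSampler does) to reshuffle.
        self.epoch = epoch

    def __len__(self):
        return len(self.lengths)

    def __iter__(self):
        rng = random.Random(self.seed + self.epoch)
        order = list(range(len(self.lengths)))
        rng.shuffle(order)
        out = []
        for start in range(0, len(order), self.window):
            chunk = order[start:start + self.window]
            # Sorting within a window keeps randomness across the epoch while
            # making neighbouring batches (the per-GPU shards of one step)
            # similar in estimated length.
            chunk.sort(key=lambda i: self.lengths[i])
            out.extend(chunk)
        return iter(out)
```

Note this only defines an ordering; under DDP the indices still have to be sharded per rank (the Trainer/Accelerate normally handles that). The Trainer's existing `group_by_length` / `length_column_name` options in `TrainingArguments` implement a similar idea for precomputed lengths, so they may already help if an approximate length column can be added to the dataset.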
