@OpenJarvisAI I didn't quite get the issue. Is the training failing because the first dimension of each input has a different shape (since it doesn't reflect the true batch size)?
We had issues with multi-GPU training of qwen-vl where batches wouldn't get split correctly per GPU (#33666). Are you also training distributed across multiple GPUs?
I'm using multiple GPUs, but just one machine with 8 GPUs.
My issue is that GPU memory usage is not well balanced across GPUs. This is a common issue when training MLLMs, but for LLaVA-like models, since the number of image tokens per sample is fixed, sequence lengths can be pre-calculated before iterating over the whole dataset.
With QwenVL, the input is dynamic and the image input size is not known in advance. When training with a lazy dataset loader (which is now common practice), this makes GPU memory hard to balance, and because memory usage differs per GPU, it can easily OOM when one GPU gets a heavier load.
I'm wondering whether there is a way to solve this without pre-calculating each image's dimensions (this could be cumbersome when there are a lot of images).
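One possible direction (just a sketch, not something that exists in transformers today): PIL's `Image.open` only reads the file header, so an approximate visual token count can be computed per sample without decoding pixels or walking the whole dataset up front. The `smart_resize` logic and the `min_pixels`/`max_pixels` values below are approximations of the Qwen2.5-VL image processor defaults and should be checked against the checkpoint's `preprocessor_config.json`.

```python
import math
from PIL import Image

# Assumed Qwen2.5-VL preprocessing constants -- verify against the
# checkpoint's preprocessor_config.json before trusting the estimates.
PATCH_SIZE = 14
MERGE_SIZE = 2
FACTOR = PATCH_SIZE * MERGE_SIZE        # 28
MIN_PIXELS = 4 * FACTOR * FACTOR        # assumed default lower bound
MAX_PIXELS = 16384 * FACTOR * FACTOR    # assumed default upper bound


def smart_resize(height, width, factor=FACTOR,
                 min_pixels=MIN_PIXELS, max_pixels=MAX_PIXELS):
    """Approximate the processor's resize-to-multiple-of-28 rounding."""
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


def estimate_image_tokens(path):
    """Cheap per-image token estimate: reads only the image header."""
    with Image.open(path) as im:        # lazy: no pixel decode here
        width, height = im.size
    h_bar, w_bar = smart_resize(height, width)
    # After the 2x2 patch merge, every 28x28 block becomes one token.
    return (h_bar // FACTOR) * (w_bar // FACTOR)
```

Estimates like this could then feed a length-aware sampler so each GPU sees a similar token budget, without touching pixel data during the scan.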
Feature request
Hi, I'm currently training Qwen2.5 VL with PEFT, with the dataloader using lazy loading.
But since Qwen2.5 VL uses dynamic input sizes, training samples have very diverse sequence lengths (the number of image tokens differs per sample).
This makes training very tricky: GPU memory usage becomes extremely imbalanced.
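For illustration only, a minimal sketch of the kind of mitigation I have in mind, assuming an approximate per-sample token count (text tokens plus estimated image tokens) is stored in a hypothetical `length` column of the dataset. The existing `group_by_length` option then batches similarly sized samples together, which keeps the per-GPU load closer; it doesn't remove the dynamic-length problem, it only reduces the imbalance.

```python
from transformers import TrainingArguments

# Sketch only: "length" is a hypothetical dataset column holding the
# approximate total token count per sample (text + estimated image tokens).
training_args = TrainingArguments(
    output_dir="qwen25vl-lora",          # hypothetical output path
    per_device_train_batch_size=1,
    group_by_length=True,                # use the length-grouped sampler
    length_column_name="length",         # read lengths from this column
)
```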
Motivation
Is there a way to support it?
Your contribution
Is there a way to support it?