
uneven GPU memory caused by multi-gpu training #5

Closed

lqzhao opened this issue Dec 2, 2021 · 9 comments

Comments

@lqzhao

lqzhao commented Dec 2, 2021

Hi, Alex,
Thanks for your nice work. I'm running into uneven GPU memory usage when training the model with multiple GPUs: GPU #0 uses much more memory than the others. I think the main reason is that DataParallel only computes the losses on GPU #0. Could you give some advice on balancing the GPU memory? Thanks in advance.
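
For reference, a minimal sketch of the pattern I mean (toy model and tensors, not the actual kbnet training code): the forward pass is split across the replicas, but the outputs are gathered back to cuda:0 and the loss graph is built only there.

```python
import torch
import torch.nn as nn

# Minimal sketch, not the kbnet code: a toy model wrapped in nn.DataParallel.
# The forward pass is scattered across all visible GPUs, but the outputs are
# gathered back to cuda:0 and the loss graph is built there, so GPU #0 ends
# up holding extra memory.
model = nn.DataParallel(nn.Linear(512, 512)).to('cuda:0')
images = torch.randn(24, 512, device='cuda:0')   # batch of 24, scattered by DataParallel
target = torch.randn(24, 512, device='cuda:0')

output = model(images)                            # replicas run on every visible GPU
loss = nn.functional.mse_loss(output, target)     # computed only on cuda:0
loss.backward()                                   # gradients are reduced onto cuda:0
```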

@alexklwong
Owner

Strange, I never had that problem before. What sort of GPUs are you using? What about batch size? Is the imbalance causing an issue?

It could be related to this:
https://discuss.pytorch.org/t/dataparallel-imbalanced-memory-usage/22551/9

But moving the loss computation into the nn.Module doesn't quite make logical sense (since the loss requires multiple images and their relative pose to compute) and also adds overhead.
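
To be concrete, here is a hedged sketch of the pattern from that forum thread (toy net and toy loss, placeholder names, not kbnet's API): returning the loss from forward() makes each replica build its own loss graph, but the wrapper then has to carry the extra image and the relative pose through forward(), which is the awkwardness mentioned above.

```python
import torch
import torch.nn as nn

# Hedged sketch of the forum-thread pattern, not kbnet's code. The toy net
# and L1 "photometric" loss are placeholders; the point is that forward()
# returns the loss, so each DataParallel replica builds its own loss graph
# instead of everything landing on cuda:0.
class ModelWithLoss(nn.Module):
    def __init__(self, net):
        super().__init__()
        self.net = net

    def forward(self, image0, image1, pose):
        pred = self.net(image0)
        # the loss needs the second image and the relative pose, which is why
        # this wrapper has to carry them through forward()
        loss = torch.abs(pred - image1).mean() + 0.0 * pose.sum()
        return loss.unsqueeze(0)              # shape (1,) so gather() works

net = nn.Conv2d(3, 3, 3, padding=1)
wrapped = nn.DataParallel(ModelWithLoss(net)).to('cuda:0')

image0 = torch.randn(24, 3, 64, 64, device='cuda:0')
image1 = torch.randn(24, 3, 64, 64, device='cuda:0')
pose = torch.randn(24, 6, device='cuda:0')

loss = wrapped(image0, image1, pose).mean()   # average the per-replica losses
loss.backward()
```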

@lqzhao
Author

lqzhao commented Dec 2, 2021

I use four 2080Ti GPUs with a batch size of 24. It's a non-negligible issue when using a large batch size. Please see the screenshot below.
[screenshot: GPU memory usage across the four GPUs]

@alexklwong
Owner

I think you can also do something like this in your bash:

export CUDA_VISIBLE_DEVICES=0,1

if you want to use a smaller batch size; it should then allocate the batches across just those two GPUs.

@lqzhao
Author

lqzhao commented Dec 2, 2021

Thanks. You mean like this? `export CUDA_VISIBLE_DEVICES=0,1; bash bash/kitti/train_kbnet_kitti.sh`
I tried this, but it didn't work...

@lqzhao
Author

lqzhao commented Dec 2, 2021

I separated the loss computation into its own nn.Module and wrapped it in DataParallel. When I train with the same settings, the GPU memory usage looks like this:
[screenshot: GPU memory usage across the four GPUs after the change]
The imbalance seems slightly alleviated, but not by much.
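
A hedged sketch of this approach (placeholder names, not the actual code): only the loss computation gets its own DataParallel wrapper, while the model outputs are still gathered to cuda:0 first, which may be why the imbalance is only partly reduced.

```python
import torch
import torch.nn as nn

# Hedged sketch of "wrap only the loss computation in its own DataParallel";
# all names here are placeholders. The model output is still gathered to
# cuda:0 before the loss module scatters it again.
class LossModule(nn.Module):
    def forward(self, pred, target):
        # return shape (1,) so DataParallel can gather the per-replica losses
        return torch.abs(pred - target).mean().unsqueeze(0)

loss_fn = nn.DataParallel(LossModule()).to('cuda:0')

pred = torch.randn(24, 3, 64, 64, device='cuda:0', requires_grad=True)
target = torch.randn(24, 3, 64, 64, device='cuda:0')
loss = loss_fn(pred, target).mean()
loss.backward()
```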

@alexklwong
Owner

I did some digging. I think this is the nature of PyTorch: DataParallel replicates whatever it wraps across the GPUs, but the first GPU is still the "master", so it also has to hold the optimizer state, the parameters, and any operation that is not parallelized.
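
A quick way to confirm this from inside the training script (illustrative only, standard torch.cuda calls, not something already in kbnet):

```python
import torch

# Print per-GPU memory after a training step; with nn.DataParallel the
# numbers on cuda:0 are expected to be noticeably higher because the gathered
# outputs, the loss graph, and the optimizer state all live there.
for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i) / 1e9
    reserved = torch.cuda.memory_reserved(i) / 1e9
    print(f'cuda:{i}: {allocated:.2f} GB allocated, {reserved:.2f} GB reserved')
```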

@alexklwong
Owner

> Thanks. You mean like this? `export CUDA_VISIBLE_DEVICES=0,1; bash bash/kitti/train_kbnet_kitti.sh` I tried this, but it didn't work...

As for the above, I think you'll need to replace the export statement in the bash file
https://github.com/alexklwong/calibrated-backprojection-network/blob/master/bash/kitti/train_kbnet_kitti.sh#L3
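
For example, something along these lines (a sketch only; the rest of the script is not reproduced here):

```bash
# bash/kitti/train_kbnet_kitti.sh -- edit the export near the top of the
# script rather than exporting in your interactive shell, since the script's
# own export overrides whatever you set outside it.
export CUDA_VISIBLE_DEVICES=0,1
# ... rest of the training script unchanged
```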

@lqzhao
Author

lqzhao commented Dec 9, 2021

> I did some digging. I think this is the nature of PyTorch: DataParallel replicates whatever it wraps across the GPUs, but the first GPU is still the "master", so it also has to hold the optimizer state, the parameters, and any operation that is not parallelized.

Thanks for your reply. I also found that DataParallel causes very low training efficiency due to the cross-GPU communication. So I use 2 GPUs to balance efficiency and batch size. I can easily reproduce the results of your paper. Thanks again.

@alexklwong
Owner

Great, thanks. Closing this issue.
