
uneven GPU memory caused by multi-gpu training #5

Closed

lqzhao opened this issue Dec 2, 2021 · 9 comments

Comments

@lqzhao

lqzhao commented Dec 2, 2021

Hi, Alex,
Thanks for your nice work. I'm running into uneven GPU memory usage when training the model with multiple GPUs: GPU #0 uses much more memory than the others. I think the main reason is that DataParallel only computes the losses on GPU #0. Could you give some advice on balancing the GPU memory? Thanks in advance.
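
For reference, a minimal sketch of the pattern I mean (toy model and tensors, not the actual kbnet training code): the forward pass is split across the replicas, but the outputs are gathered back to cuda:0 and the loss graph is built only there.

```python
import torch
import torch.nn as nn

# Minimal sketch, not the kbnet code: a toy model wrapped in nn.DataParallel.
# The forward pass is scattered across all visible GPUs, but the outputs are
# gathered back to cuda:0 and the loss graph is built there, so GPU #0 ends
# up holding extra memory.
model = nn.DataParallel(nn.Linear(512, 512)).to('cuda:0')
images = torch.randn(24, 512, device='cuda:0')   # batch of 24, scattered by DataParallel
target = torch.randn(24, 512, device='cuda:0')

output = model(images)                            # replicas run on every visible GPU
loss = nn.functional.mse_loss(output, target)     # computed only on cuda:0
loss.backward()                                   # gradients are reduced onto cuda:0
```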

@alexklwong
Owner

Strange, I never had that problem before. What sort of GPUs are you using? What about batch size? Is the imbalance causing an issue?

It could be related to this:
https://discuss.pytorch.org/t/dataparallel-imbalanced-memory-usage/22551/9

But moving the loss computation into the nn.Module doesn't quite make logical sense (since the loss requires multiple images and their relative pose to compute) and also adds overhead.
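
To be concrete, here is a hedged sketch of the pattern from that forum thread (toy net and toy loss, placeholder names, not kbnet's API): returning the loss from forward() makes each replica build its own loss graph, but the wrapper then has to carry the extra image and the relative pose through forward(), which is the awkwardness mentioned above.

```python
import torch
import torch.nn as nn

# Hedged sketch of the forum-thread pattern, not kbnet's code. The toy net
# and L1 "photometric" loss are placeholders; the point is that forward()
# returns the loss, so each DataParallel replica builds its own loss graph
# instead of everything landing on cuda:0.
class ModelWithLoss(nn.Module):
    def __init__(self, net):
        super().__init__()
        self.net = net

    def forward(self, image0, image1, pose):
        pred = self.net(image0)
        # the loss needs the second image and the relative pose, which is why
        # this wrapper has to carry them through forward()
        loss = torch.abs(pred - image1).mean() + 0.0 * pose.sum()
        return loss.unsqueeze(0)              # shape (1,) so gather() works

net = nn.Conv2d(3, 3, 3, padding=1)
wrapped = nn.DataParallel(ModelWithLoss(net)).to('cuda:0')

image0 = torch.randn(24, 3, 64, 64, device='cuda:0')
image1 = torch.randn(24, 3, 64, 64, device='cuda:0')
pose = torch.randn(24, 6, device='cuda:0')

loss = wrapped(image0, image1, pose).mean()   # average the per-replica losses
loss.backward()
```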

@lqzhao
Author

lqzhao commented Dec 2, 2021

I use four 2080Ti GPUs with a batch size of 24. It's a non-negligible issue when using a large batch size. Please see the screenshot below.
[screenshot: GPU memory usage across the four GPUs]

@alexklwong
Owner

I think you can also do something like this in your bash:

export CUDA_VISIBLE_DEVICES=0,1

if you want to use a smaller batch size; it should then allocate the batches across just those two GPUs.

@lqzhao
Author

lqzhao commented Dec 2, 2021

Thanks. You mean like this? `export CUDA_VISIBLE_DEVICES=0,1; bash bash/kitti/train_kbnet_kitti.sh`
I tried this, but it didn't work...

@lqzhao
Author

lqzhao commented Dec 2, 2021

I separated the loss computation into its own nn.Module and wrapped it in DataParallel. When I train with the same settings, the GPU memory usage looks like this:
[screenshot: GPU memory usage across the four GPUs after the change]
The imbalance seems slightly alleviated, but not by much.
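
A hedged sketch of this approach (placeholder names, not the actual code): only the loss computation gets its own DataParallel wrapper, while the model outputs are still gathered to cuda:0 first, which may be why the imbalance is only partly reduced.

```python
import torch
import torch.nn as nn

# Hedged sketch of "wrap only the loss computation in its own DataParallel";
# all names here are placeholders. The model output is still gathered to
# cuda:0 before the loss module scatters it again.
class LossModule(nn.Module):
    def forward(self, pred, target):
        # return shape (1,) so DataParallel can gather the per-replica losses
        return torch.abs(pred - target).mean().unsqueeze(0)

loss_fn = nn.DataParallel(LossModule()).to('cuda:0')

pred = torch.randn(24, 3, 64, 64, device='cuda:0', requires_grad=True)
target = torch.randn(24, 3, 64, 64, device='cuda:0')
loss = loss_fn(pred, target).mean()
loss.backward()
```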

@alexklwong
Owner

I did some digging. I think this is the nature of PyTorch: DataParallel replicates whatever it wraps across the GPUs, but the first GPU is still the "master", so it also has to hold the optimizer state, the parameters, and any operation that is not parallelized.
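
A quick way to confirm this from inside the training script (illustrative only, standard torch.cuda calls, not something already in kbnet):

```python
import torch

# Print per-GPU memory after a training step; with nn.DataParallel the
# numbers on cuda:0 are expected to be noticeably higher because the gathered
# outputs, the loss graph, and the optimizer state all live there.
for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i) / 1e9
    reserved = torch.cuda.memory_reserved(i) / 1e9
    print(f'cuda:{i}: {allocated:.2f} GB allocated, {reserved:.2f} GB reserved')
```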

@alexklwong
Owner

> Thanks. You mean like this? `export CUDA_VISIBLE_DEVICES=0,1; bash bash/kitti/train_kbnet_kitti.sh` I tried this, but it didn't work...

As for the above, I think you'll need to replace the export statement in the bash file
https://github.com/alexklwong/calibrated-backprojection-network/blob/master/bash/kitti/train_kbnet_kitti.sh#L3
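
For example, something along these lines (a sketch only; the rest of the script is not reproduced here):

```bash
# bash/kitti/train_kbnet_kitti.sh -- edit the export near the top of the
# script rather than exporting in your interactive shell, since the script's
# own export overrides whatever you set outside it.
export CUDA_VISIBLE_DEVICES=0,1
# ... rest of the training script unchanged
```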

@lqzhao
Author

lqzhao commented Dec 9, 2021

> I did some digging. I think this is the nature of PyTorch: DataParallel replicates whatever it wraps across the GPUs, but the first GPU is still the "master", so it also has to hold the optimizer state, the parameters, and any operation that is not parallelized.

Thanks for your reply. I also found that DataParallel causes very low training efficiency due to the cross-GPU communication. So I use 2 GPUs to balance efficiency and batch size. I can easily reproduce the results of your paper. Thanks again.

@alexklwong
Owner

Great, thanks. Closing this issue.
