
multi GPU training? #22

Closed
PangzeCheung opened this issue Apr 15, 2020 · 4 comments

Comments

@PangzeCheung

PangzeCheung commented Apr 15, 2020

I set gpu_ids to 2,3, but the program only runs on GPU 2. Could you please tell me whether the code supports multi-GPU training? Thank you!

@RenYurui
Owner

You can use torch.nn.DataParallel to train the model on multiple GPUs. See here.

Specifically, if you want to train the pose-guided person image generation task, you can modify the "__init__" function in pose_model.py. Add:

self.net_G = torch.nn.DataParallel(self.net_G, device_ids=self.gpu_ids)
self.net_D = torch.nn.DataParallel(self.net_D, device_ids=self.gpu_ids)
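
For context, here is a minimal sketch (not the repo's actual pose_model.py) of where those two DataParallel wraps would sit inside __init__; the placeholder networks and constructor arguments are illustrative only:

import torch
import torch.nn as nn

class PoseModel:
    def __init__(self, gpu_ids):
        self.gpu_ids = gpu_ids  # e.g. [2, 3]
        # Placeholder networks; the real pose_model.py builds its own generator/discriminator.
        self.net_G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1)).cuda(self.gpu_ids[0])
        self.net_D = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1)).cuda(self.gpu_ids[0])
        if len(self.gpu_ids) > 1:
            # The suggested modification: wrap both nets so that calling them
            # scatters each batch across all listed GPUs.
            self.net_G = torch.nn.DataParallel(self.net_G, device_ids=self.gpu_ids)
            self.net_D = torch.nn.DataParallel(self.net_D, device_ids=self.gpu_ids)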

@RenYurui
Owner

Currently, only the face animation model supports multi-GPU training.
We will update the code soon.
Thanks for asking.

@PangzeCheung
Author

@RenYurui Thank you very much!

@BhavanJ

BhavanJ commented Sep 16, 2020

Hi @RenYurui,

Nice work!
It seems that even after I wrap the model with DataParallel in pose_flownet using the line below, it still uses a single GPU.
self.net_G = torch.nn.DataParallel(self.net_G, device_ids=self.gpu_ids)

It seems that all data is loaded only on the first GPU in your code, as shown below:

            self.input_P1 = input_P1.cuda(self.gpu_ids[0], async=True)
            self.input_BP1 = input_BP1.cuda(self.gpu_ids[0], async=True)
            self.input_P2 = input_P2.cuda(self.gpu_ids[0], async=True)
            self.input_BP2 = input_BP2.cuda(self.gpu_ids[0], async=True)  

I tried replacing the above with just .cuda(), but I am still not able to spread the batch data across multiple GPUs, and the first GPU runs out of memory when I use a larger batch size. Is it the case that your custom-built CUDA operations don't support multiple GPUs?

Thanks,
Bhavan
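
For reference, a minimal sketch (independent of this repo) showing that nn.DataParallel scatters a batch placed on the first listed GPU across all device_ids during forward(). Keeping inputs on gpu_ids[0] is not by itself what pins everything to one GPU; splitting only happens when the full batch passes through the wrapped module, and custom CUDA ops that assume a fixed device could break this. (In newer PyTorch, the async= argument to .cuda() is named non_blocking=.)

import torch
import torch.nn as nn

class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)

    def forward(self, x):
        # Each replica reports the device of the chunk it received.
        print("forward on", x.device, "chunk size", x.shape[0])
        return self.conv(x)

if torch.cuda.device_count() >= 2:
    gpu_ids = [0, 1]  # illustrative; substitute your own gpu_ids
    net = nn.DataParallel(ToyNet().cuda(gpu_ids[0]), device_ids=gpu_ids)
    # The input only needs to live on gpu_ids[0]; DataParallel splits the
    # batch dimension across gpu_ids when the wrapped module is called.
    x = torch.randn(8, 3, 64, 64).cuda(gpu_ids[0])
    y = net(x)      # prints one line per GPU, e.g. cuda:0 and cuda:1
    print(y.shape)  # output is gathered back on gpu_ids[0]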
