
A multi-GPU bug #136

Open
DingJuPeng1 opened this issue Apr 21, 2022 · 4 comments

@DingJuPeng1

If I use batch_bins in ESPnet, it triggers a multi-GPU bug.
For example, with two GPUs and a final batch size of 61 under DataParallel, the batch is split into 30 and 31.
When I then try to use torch-audiomentations, it triggers the error shown below.
[screenshot of the error traceback]

Does the batch size on each card have to be the same, or is there another way to avoid this bug?

Looking forward to a reply.
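For reference, a minimal sketch of the setup being described (the module name and tensor shapes are hypothetical, it assumes two GPUs, and Gain stands in for the actual transforms since it needs no asset files):

```python
import torch
import torch.nn as nn
from torch_audiomentations import Compose, Gain

class AugmentModule(nn.Module):
    """Hypothetical wrapper that applies torch-audiomentations inside forward()."""

    def __init__(self):
        super().__init__()
        self.transforms = Compose(
            transforms=[Gain(min_gain_in_db=-6.0, max_gain_in_db=6.0, p=1.0)]
        )

    def forward(self, samples):
        # samples: (batch, channels, time)
        return self.transforms(samples, sample_rate=16000)

model = nn.DataParallel(AugmentModule().cuda(), device_ids=[0, 1])
# A batch of 61 mono waveforms: DataParallel scatters it unevenly (31 + 30) across the two GPUs.
batch = torch.rand(61, 1, 16000, device="cuda") * 2 - 1
out = model(batch)
```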

@iver56
Collaborator

iver56 commented Apr 21, 2022

Could you provide a snippet of code that reproduces the problem? If possible, make one that doesn't need a multi-GPU setup to reproduce it, as I don't have such a setup available at the moment.

@DingJuPeng1
Author

I apologize, but it is not easy to extract a small piece of reproducible code, because this code is built on ESPnet, which wraps everything.
When I run the same code on a single GPU it works fine, but with two GPUs under DataParallel it triggers this problem. It seems to mix up the two sub-batch sizes, which leads to a shape mismatch.
The problem occurs when I run this code:
[screenshot of the calling code]
"self.transforms" includes "ApplyImpulseResponse" and "AddBackgroundNoise".
[screenshot]
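For context, constructing the transforms mentioned above would look roughly like this (the paths are placeholders, and parameter names follow the torch-audiomentations README, so they may differ between versions):

```python
from torch_audiomentations import Compose, ApplyImpulseResponse, AddBackgroundNoise

# "rirs/" and "noises/" are placeholder directories of impulse responses and noise files.
transforms = Compose(
    transforms=[
        ApplyImpulseResponse(ir_paths="rirs/", p=0.5),
        AddBackgroundNoise(background_paths="noises/", p=0.5),
    ]
)
```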

@iver56
Collaborator

iver56 commented Apr 21, 2022

OK, but without a code example and a multi-GPU setup, I won't be able to reproduce the bug at the moment.

Does this bug apply to all transforms, or just ApplyImpulseResponse and/or AddBackgroundNoise?

Is there a way you can work around it? For example, always give it a batch size that is divisible by your number of GPUs; a fixed batch size should do the trick.
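A sketch of that workaround, assuming a standard PyTorch DataLoader feeds the model (the dataset here is a stand-in): pick a batch size that is a multiple of the GPU count and drop the final partial batch so every scatter is even.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

num_gpus = max(torch.cuda.device_count(), 1)
batch_size = 8 * num_gpus  # always a multiple of the GPU count

dataset = TensorDataset(torch.rand(1000, 1, 16000))  # stand-in for the real dataset
loader = DataLoader(
    dataset,
    batch_size=batch_size,
    shuffle=True,
    drop_last=True,  # discard the final partial batch so it can never split unevenly
)
```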

Or would you like to make a PR that fixes the bug?

Or maybe I should add a known limitation to the README saying that multi-GPU with "uneven" batch sizes isn't officially supported?

@DingJuPeng1
Author

When I use batch sizes that are divisible by the number of GPUs, it works normally. The problem only seems to occur when the per-GPU batch sizes are uneven. Since you can't reproduce the bug, I think I will have a try at fixing it myself first.
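An alternative sketch of the same idea, applied where the batch is built (purely illustrative, with a hypothetical helper): trim each batch to a multiple of the GPU count so DataParallel always scatters equal chunks.

```python
import torch

def trim_to_multiple(batch: torch.Tensor, num_gpus: int) -> torch.Tensor:
    """Drop trailing examples so the batch size is divisible by num_gpus."""
    usable = (batch.shape[0] // num_gpus) * num_gpus
    return batch[:usable]

batch = torch.rand(61, 1, 16000)         # uneven batch from the data pipeline
even_batch = trim_to_multiple(batch, 2)  # 61 -> 60, so each of 2 GPUs gets 30
```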
