multi-gpu training triggers CUDA out of memory error #2456

@griff4692

Description

Hi -

I am running into issues when going from single to multi-GPU training. Specifically, if I change the line

pl.Trainer(gpus=1, precision=16, distributed_backend='ddp')

to

pl.Trainer(gpus=4, precision=16, distributed_backend='ddp')

I get the dreaded CUDA out of memory error. Is there any reason why the parallelism causes each GPU to receive more data?
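
For anyone trying to reproduce this, here is a minimal self-contained sketch of the setup described above; the ToyModel module and random dataset are made up purely for illustration and are not from the original report. With distributed_backend='ddp', each spawned process builds its own DataLoader, so the batch_size passed there is per GPU and the per-device memory footprint should in principle match the single-GPU run.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyModel(pl.LightningModule):
    """Made-up module, only meant to exercise the Trainer call above."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self(x), y)
        return {'loss': loss}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

    def train_dataloader(self):
        # Under DDP each process gets its own DataLoader, so this batch_size
        # is per GPU; going from gpus=1 to gpus=4 does not change the
        # per-device batch on its own.
        data = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))
        return DataLoader(data, batch_size=64, num_workers=4)


if __name__ == '__main__':
    trainer = pl.Trainer(gpus=4, precision=16, distributed_backend='ddp')
    trainer.fit(ToyModel())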

Labels: bug (Something isn't working), help wanted (Open to be worked on), priority: 0 (High priority task)
