Hi - I am running into issues when going from single- to multi-GPU training. Specifically, if I switch the line `pl.Trainer(gpus=1, precision=16, distributed_backend='ddp')` to `pl.Trainer(gpus=4, precision=16, distributed_backend='ddp')`, I get the dreaded CUDA out of memory error. Is there any reason why parallelism would cause each GPU to receive more data?
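
For context, this is the full extent of the change between the two runs (a minimal sketch; `model` and `dm` are placeholders for my LightningModule and DataModule, everything else is identical):

```python
import pytorch_lightning as pl

# Single-GPU run: trains fine with mixed precision
trainer = pl.Trainer(gpus=1, precision=16, distributed_backend='ddp')

# Multi-GPU run: identical settings except gpus=4, hits CUDA OOM
trainer = pl.Trainer(gpus=4, precision=16, distributed_backend='ddp')

# model and dm stand in for my LightningModule / LightningDataModule
# trainer.fit(model, dm)
```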