
How many times does pl.LightningDataModule.setup() run in DDP? #11642

@zixiliuUSC

Description


Discussed in #9251

Originally posted by jatentaki September 1, 2021
I have two questions regarding the behavior of DataLoaders during multi-GPU training with ddp/ddp_spawn. Let me first define the terms: I use "GPU worker" to mean the process driving each of the N GPUs for the model's forward/backward passes, and "data worker" to mean a process created by torch.utils.data.DataLoader to load and preprocess batches.

  1. How many data workers are there per GPU worker? I see that with ddp the dataset is recreated for each GPU worker, but the total number of data workers seems to stay constant: does each GPU worker get its share of N_total_data_workers / N_gpu_workers? Is this documented anywhere?
  2. I have a pipeline where the data workers themselves use some GPU functionality (rendering synthetic data via OpenGL), and I need to specify which GPU they should use. How can I figure out which GPU worker a data worker belongs to, so that I can load-balance the rendering across GPUs? A rough sketch of what I have in mind follows this list.
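To make question 2 concrete, here is a rough sketch of what I would like to do, assuming I can read the owning process's local rank inside a worker_init_fn. The RenderDataset class and its render_device attribute are placeholders for my actual rendering pipeline, and I am assuming the LOCAL_RANK environment variable that the DDP launcher sets:

    import os
    import torch
    from torch.utils.data import DataLoader, Dataset

    # Placeholder dataset whose data workers render synthetic samples on a GPU.
    class RenderDataset(Dataset):
        def __init__(self):
            self.render_device = None  # filled in per data worker below

        def __len__(self):
            return 1000

        def __getitem__(self, idx):
            # self.render_device would select the OpenGL/CUDA device here.
            return idx

    def worker_init_fn(worker_id):
        # Every DataLoader (and the data workers it spawns) lives inside one
        # DDP process, so that process's local rank identifies the GPU it drives.
        local_rank = int(os.environ.get("LOCAL_RANK", 0))
        worker_info = torch.utils.data.get_worker_info()
        # Simplest policy: render on the same GPU as the owning DDP process.
        worker_info.dataset.render_device = torch.device(f"cuda:{local_rank}")

    loader = DataLoader(RenderDataset(), batch_size=8, num_workers=4,
                        worker_init_fn=worker_init_fn)

Is relying on LOCAL_RANK like this reasonable, or is there a supported way to find the owning GPU from inside a data worker?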

Suppose I run DDP with 4 GPUs, and I instantiate my dataset object and then prepare the dataset in the setup() method as shown below.

    def setup(self, stage=None):
        # Assign train/val datasets for use in dataloaders
        instancialize = self.instancialize()
        instancialize.get_dataset()
        if stage == 'fit' or stage is None:
            self.trainset = instancialize.dataset['train']
            self.valset = instancialize.dataset['test']
        # Assign test dataset for use in dataloader(s)
        if stage == 'test' or stage is None:
            self.testset = instancialize.dataset['test']
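For concreteness, I launch training roughly like this (MyDataModule stands in for the module containing the setup() above, MyLightningModule for my model, and the Trainer flags reflect the API version I am on):

    import pytorch_lightning as pl

    # Placeholder names for my actual DataModule and LightningModule.
    datamodule = MyDataModule()
    model = MyLightningModule()

    # 4 GPUs, one DDP process per GPU.
    trainer = pl.Trainer(gpus=4, strategy="ddp")
    trainer.fit(model, datamodule=datamodule)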

I notice that the dataset is set up 4 times with DDP, judging by what instancialize.get_dataset() prints. From my understanding of DDP, we fetch a big batch of data from the dataset at each step and distribute it equally across the GPUs. After all GPUs compute their losses, the mean loss across GPUs is taken and the model replica on each GPU is updated with it. Having multiple dataloaders therefore seems reasonable, but multiple copies of the dataset does not, and it may harm training. So, am I doing something wrong in my code, or do I have a misunderstanding of DDP? I'd appreciate your answer, thanks!
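To check the "distribute equally" part of my understanding: each process ends up with its own copy of the dataset object, but as far as I can tell the DistributedSampler that Lightning injects into the dataloader makes each rank iterate over a disjoint shard of indices, so no sample is duplicated across GPUs. A small standalone sketch that simulates the 4 ranks on a toy dataset (constructing the sampler manually instead of letting Lightning inject it):

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from torch.utils.data.distributed import DistributedSampler

    # Toy dataset of 16 samples; each simulated rank sees a disjoint quarter.
    dataset = TensorDataset(torch.arange(16).float())

    for rank in range(4):  # simulate the 4 DDP processes
        sampler = DistributedSampler(dataset, num_replicas=4, rank=rank, shuffle=False)
        loader = DataLoader(dataset, batch_size=2, sampler=sampler)
        indices = [int(x) for batch in loader for x in batch[0]]
        print(f"rank {rank}: {indices}")
    # rank 0: [0, 4, 8, 12]
    # rank 1: [1, 5, 9, 13]
    # ...

Relatedly, if I understand the docs correctly, expensive one-off work belongs in prepare_data(), which Lightning calls on a single process (per node), while setup() runs in every process and should stay light. Would splitting my code like the sketch below be the intended pattern (reusing my instancialize helper from above; the class name is just a placeholder)?

    import pytorch_lightning as pl

    class MyDataModule(pl.LightningDataModule):
        def prepare_data(self):
            # Called on a single process, so downloads / one-off preprocessing
            # are not repeated per GPU. State should not be assigned here.
            ...

        def setup(self, stage=None):
            # Called once in every DDP process (4 times with 4 GPUs), so keep
            # it light: build the Dataset objects and assign the splits.
            instancialize = self.instancialize()
            instancialize.get_dataset()
            if stage == 'fit' or stage is None:
                self.trainset = instancialize.dataset['train']
                self.valset = instancialize.dataset['test']
            if stage == 'test' or stage is None:
                self.testset = instancialize.dataset['test']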

cc @carmocca @awaelchli @Borda @ananthsub @ninginthecloud @jjenniferdai @rohitgr7 @justusschock @kaushikb11 @akihironitta
