
How many times does pl.LightningDataModule.setup() run in DDP? #11642

@zixiliuUSC

Description


Discussed in #9251

Originally posted by jatentaki September 1, 2021
I have two questions regarding the behavior of DataLoaders during multi-GPU training with ddp/ddp_spawn. Let me first define the terms: I use "GPU worker" to mean the process driving each of the N GPUs for the model's forward/backward passes, and "data worker" to mean a process created by torch.utils.data.DataLoader to load and preprocess batches.

  1. How many data workers are there per GPU worker? I see that with ddp the dataset is recreated for each GPU worker, but the total number of data workers seems to stay constant: does each GPU worker get its share of N_total_data_workers / N_gpu_workers? Is this documented anywhere?
  2. I have a pipeline where the data workers themselves use some GPU functionality (rendering synthetic data via OpenGL), and I need to specify which GPU they should use. How can I figure out which GPU worker a data worker belongs to, so that I can load-balance the rendering across GPUs? A rough sketch of what I have in mind follows this list.
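To make question 2 concrete, here is a rough sketch of what I would like to do, assuming I can read the owning process's local rank inside a worker_init_fn. The RenderDataset class and its render_device attribute are placeholders for my actual rendering pipeline, and I am assuming the LOCAL_RANK environment variable that the DDP launcher sets:

    import os
    import torch
    from torch.utils.data import DataLoader, Dataset

    # Placeholder dataset whose data workers render synthetic samples on a GPU.
    class RenderDataset(Dataset):
        def __init__(self):
            self.render_device = None  # filled in per data worker below

        def __len__(self):
            return 1000

        def __getitem__(self, idx):
            # self.render_device would select the OpenGL/CUDA device here.
            return idx

    def worker_init_fn(worker_id):
        # Every DataLoader (and the data workers it spawns) lives inside one
        # DDP process, so that process's local rank identifies the GPU it drives.
        local_rank = int(os.environ.get("LOCAL_RANK", 0))
        worker_info = torch.utils.data.get_worker_info()
        # Simplest policy: render on the same GPU as the owning DDP process.
        worker_info.dataset.render_device = torch.device(f"cuda:{local_rank}")

    loader = DataLoader(RenderDataset(), batch_size=8, num_workers=4,
                        worker_init_fn=worker_init_fn)

Is relying on LOCAL_RANK like this reasonable, or is there a supported way to find the owning GPU from inside a data worker?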

Suppose I run DDP with 4 GPUs, and I instantiate my dataset object and then prepare the dataset in the setup() method as shown below.

    def setup(self, stage=None):
        # Assign train/val datasets for use in dataloaders
        instancialize = self.instancialize()
        instancialize.get_dataset()
        if stage == 'fit' or stage is None:
            self.trainset = instancialize.dataset['train']
            self.valset = instancialize.dataset['test']
        # Assign test dataset for use in dataloader(s)
        if stage == 'test' or stage is None:
            self.testset = instancialize.dataset['test']
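For concreteness, I launch training roughly like this (MyDataModule stands in for the module containing the setup() above, MyLightningModule for my model, and the Trainer flags reflect the API version I am on):

    import pytorch_lightning as pl

    # Placeholder names for my actual DataModule and LightningModule.
    datamodule = MyDataModule()
    model = MyLightningModule()

    # 4 GPUs, one DDP process per GPU.
    trainer = pl.Trainer(gpus=4, strategy="ddp")
    trainer.fit(model, datamodule=datamodule)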

I notice that the dataset is set up 4 times with DDP, judging by what instancialize.get_dataset() prints. From my understanding of DDP, we fetch a big batch of data from the dataset at each step and distribute it equally across the GPUs. After all GPUs compute their losses, the mean loss across GPUs is taken and the model replica on each GPU is updated with it. Having multiple dataloaders therefore seems reasonable, but multiple copies of the dataset does not, and it may harm training. So, am I doing something wrong in my code, or do I have a misunderstanding of DDP? I'd appreciate your answer, thanks!
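To check the "distribute equally" part of my understanding: each process ends up with its own copy of the dataset object, but as far as I can tell the DistributedSampler that Lightning injects into the dataloader makes each rank iterate over a disjoint shard of indices, so no sample is duplicated across GPUs. A small standalone sketch that simulates the 4 ranks on a toy dataset (constructing the sampler manually instead of letting Lightning inject it):

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from torch.utils.data.distributed import DistributedSampler

    # Toy dataset of 16 samples; each simulated rank sees a disjoint quarter.
    dataset = TensorDataset(torch.arange(16).float())

    for rank in range(4):  # simulate the 4 DDP processes
        sampler = DistributedSampler(dataset, num_replicas=4, rank=rank, shuffle=False)
        loader = DataLoader(dataset, batch_size=2, sampler=sampler)
        indices = [int(x) for batch in loader for x in batch[0]]
        print(f"rank {rank}: {indices}")
    # rank 0: [0, 4, 8, 12]
    # rank 1: [1, 5, 9, 13]
    # ...

Relatedly, if I understand the docs correctly, expensive one-off work belongs in prepare_data(), which Lightning calls on a single process (per node), while setup() runs in every process and should stay light. Would splitting my code like the sketch below be the intended pattern (reusing my instancialize helper from above; the class name is just a placeholder)?

    import pytorch_lightning as pl

    class MyDataModule(pl.LightningDataModule):
        def prepare_data(self):
            # Called on a single process, so downloads / one-off preprocessing
            # are not repeated per GPU. State should not be assigned here.
            ...

        def setup(self, stage=None):
            # Called once in every DDP process (4 times with 4 GPUs), so keep
            # it light: build the Dataset objects and assign the splits.
            instancialize = self.instancialize()
            instancialize.get_dataset()
            if stage == 'fit' or stage is None:
                self.trainset = instancialize.dataset['train']
                self.valset = instancialize.dataset['test']
            if stage == 'test' or stage is None:
                self.testset = instancialize.dataset['test']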

cc @carmocca @awaelchli @Borda @ananthsub @ninginthecloud @jjenniferdai @rohitgr7 @justusschock @kaushikb11 @akihironitta
