AttributeError on using multi gpu even on using ddp #2515
Comments
Hi! Thanks for your contribution! Great first issue!
You should use `setup`, not `prepare_data`.
@awaelchli shouldn't the dataset be initialized in `prepare_data` only? If the dataset's `__init__` does some processing that takes time, it will be repeated separately on every device, which I guess isn't required: AFAIK the sampler handles distributing batches across devices, so the dataset should be initialized only once, in `prepare_data`.
The poster is getting the AttributeError because they assign attributes to `self` in `prepare_data`, but since `prepare_data` is only called once per node (or optionally once across all nodes), some of the subprocesses end up with a model that lacks these attributes, which leads to the error. Therefore I suggest assigning these attributes in `setup` instead. Whether or not this is right for the OP's use case is of course a different question.
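A minimal sketch of that split, with a hypothetical `SlowInitDataset` standing in for the OP's real dataset (the usual `forward`/`training_step`/`configure_optimizers` are omitted for brevity):

```python
import pytorch_lightning as pl
import torch
from torch.utils.data import DataLoader, Dataset


class SlowInitDataset(Dataset):
    """Hypothetical stand-in for a dataset whose __init__ is expensive."""

    def __init__(self, root):
        self.data = torch.randn(100, 8)  # placeholder for slow preprocessing

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


class MyModel(pl.LightningModule):
    def prepare_data(self):
        # Runs once (rank 0 per node, or optionally once across all nodes).
        # Good for download/unpack work that touches disk. Do NOT assign to
        # self here: the other DDP processes never execute this hook, so
        # anything attached to self would be missing there.
        pass

    def setup(self, stage=None):
        # Runs in every process, so this attribute exists everywhere.
        self.train_dataset = SlowInitDataset("/data/raw")

    def train_dataloader(self):
        return DataLoader(self.train_dataset, batch_size=32)
```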
If these cases do not apply to you, you can always preprocess the data outside the `LightningModule` and pass your dataloaders as arguments to `Trainer.fit()`.
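A minimal sketch of that alternative, reusing the hypothetical `MyModel` and `SlowInitDataset` from the sketch above; note that the exact `Trainer` arguments for enabling DDP vary across Lightning versions:

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader

# Build the dataloader outside the LightningModule entirely.
train_loader = DataLoader(SlowInitDataset("/data/raw"), batch_size=32)

model = MyModel()
# distributed_backend="ddp" matched releases around the time of this issue;
# newer versions spell this differently (e.g. strategy="ddp").
trainer = pl.Trainer(gpus=2, distributed_backend="ddp")
trainer.fit(model, train_loader)
```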
Thanks @awaelchli! This seems to have solved the problem. I'm now getting a new error, `RuntimeError: Tensors must be CUDA and dense`. Not sure if they are related. Let me know if you have any suggestions!
If an attribute is attached to the model in `prepare_data`, won't the model, with those attributes, get copied to the other processes anyway?
I think `prepare_data` is simply only executed on rank 0. The models don't get copied; in DDP, the whole script gets launched once per process.
For TPU, I don't know how it is. Let's ask @williamFalcon |
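To make "the script gets launched multiple times" concrete, here is a small standalone sketch (not Lightning API): under DDP each process re-runs the whole file from the top, and the `LOCAL_RANK` environment variable set by the DDP launcher is how a process can tell which copy it is; guarding one-time work behind rank 0 is roughly what `prepare_data` automates.

```python
import os

# Each DDP process executes this entire file independently.
rank = int(os.environ.get("LOCAL_RANK", "0"))  # set by the DDP launcher
print(f"running as local rank {rank}")

if rank == 0:
    # One-time work (download, preprocessing to disk) goes here; the other
    # ranks skip it and would not see anything assigned to local variables
    # or attributes inside this block.
    pass
```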
🐛 Bug
I am getting an AttributeError when using multiple GPUs; the code works fine on a single GPU. I am also using DDP as suggested. Here's the traceback:
Code sample
Environment