AttributeError on using multi gpu even on using ddp #2515
Comments
Hi! Thanks for your contribution! Great first issue!
You should use `setup`, not `prepare_data`.
@awaelchli shouldn't the dataset be initialized in `prepare_data` only? If the dataset's `__init__` does some processing that takes time, it will be repeated separately on every device, which I guess isn't required: AFAIK the sampler handles distributing batches across devices, so the dataset should be initialized only once, in `prepare_data`.
The poster is getting the AttributeError because they assign attributes to `self` in `prepare_data`, but since `prepare_data` is only called once per node (or optionally once across all nodes), some of the subprocesses end up with a model that lacks these attributes, which leads to the error. Therefore I suggest assigning these attributes in `setup` instead. Whether or not this is right for the OP's use case is of course a different question.
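A minimal sketch of that split, with a hypothetical `SlowInitDataset` standing in for the OP's real dataset (the usual `forward`/`training_step`/`configure_optimizers` are omitted for brevity):

```python
import pytorch_lightning as pl
import torch
from torch.utils.data import DataLoader, Dataset


class SlowInitDataset(Dataset):
    """Hypothetical stand-in for a dataset whose __init__ is expensive."""

    def __init__(self, root):
        self.data = torch.randn(100, 8)  # placeholder for slow preprocessing

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


class MyModel(pl.LightningModule):
    def prepare_data(self):
        # Runs once (rank 0 per node, or optionally once across all nodes).
        # Good for download/unpack work that touches disk. Do NOT assign to
        # self here: the other DDP processes never execute this hook, so
        # anything attached to self would be missing there.
        pass

    def setup(self, stage=None):
        # Runs in every process, so this attribute exists everywhere.
        self.train_dataset = SlowInitDataset("/data/raw")

    def train_dataloader(self):
        return DataLoader(self.train_dataset, batch_size=32)
```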
If these cases do not apply to you, you can always preprocess the data outside the `LightningModule` and pass your dataloaders as arguments to `Trainer.fit()`.
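A minimal sketch of that alternative, reusing the hypothetical `MyModel` and `SlowInitDataset` from the sketch above; note that the exact `Trainer` arguments for enabling DDP vary across Lightning versions:

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader

# Build the dataloader outside the LightningModule entirely.
train_loader = DataLoader(SlowInitDataset("/data/raw"), batch_size=32)

model = MyModel()
# distributed_backend="ddp" matched releases around the time of this issue;
# newer versions spell this differently (e.g. strategy="ddp").
trainer = pl.Trainer(gpus=2, distributed_backend="ddp")
trainer.fit(model, train_loader)
```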
Thanks @awaelchli! This seems to have solved the problem. I'm now getting a new error, `RuntimeError: Tensors must be CUDA and dense`. Not sure if they are related. Let me know if you have any suggestions!
If an attribute is attached to the model in `prepare_data`, won't the model, with those attributes, get copied to the other processes anyway?
I think `prepare_data` is simply only executed on rank 0. The models don't get copied; in DDP, the whole script gets launched once per process.
For TPU, I don't know how it is. Let's ask @williamFalcon |
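To make "the script gets launched multiple times" concrete, here is a small standalone sketch (not Lightning API): under DDP each process re-runs the whole file from the top, and the `LOCAL_RANK` environment variable set by the DDP launcher is how a process can tell which copy it is; guarding one-time work behind rank 0 is roughly what `prepare_data` automates.

```python
import os

# Each DDP process executes this entire file independently.
rank = int(os.environ.get("LOCAL_RANK", "0"))  # set by the DDP launcher
print(f"running as local rank {rank}")

if rank == 0:
    # One-time work (download, preprocessing to disk) goes here; the other
    # ranks skip it and would not see anything assigned to local variables
    # or attributes inside this block.
    pass
```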
🐛 Bug
I am getting an AttributeError when using multiple GPUs; the code works fine on a single GPU. I am also using DDP as suggested. Here's the traceback:
Code sample
Environment