Error while saving the model in multi gpu scenario using torch.save #4120
This is on 0.9. Mind upgrading to 1.0?
In 1.0 (#4114), this error is reproduced in both the CPU and multi-GPU versions.
As an alternative, see the manual-saving approach at https://pytorch-lightning.readthedocs.io/en/latest/weights_loading.html#manual-saving. I'll continue to investigate why torch.save doesn't work as intended; I'm getting more verbose output:
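One common cause of torch.save failures under DDP is pickling the whole wrapped module instead of its weights. A minimal sketch of the state_dict-based saving style the linked docs suggest (the nn.Linear model here is a hypothetical stand-in for the classification model in the issue):

```python
import torch
from torch import nn

# Hypothetical stand-in for the real classification model.
model = nn.Linear(4, 2)

# Saving only the state_dict (rather than the full module object) avoids
# pickling the DDP wrapper around the model, a frequent failure point
# when torch.save is called in multi-GPU runs.
torch.save(model.state_dict(), "model.pth")

# Restore into a freshly constructed model of the same architecture.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load("model.pth"))
```

This is a sketch, not the reporter's script; in an actual DDP run the model may be wrapped, in which case `model.module.state_dict()` is the unwrapped equivalent.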
This is also on the same track as what @rohitgr7 is pursuing in #4114.
@SeanNaren Thank you very much for the alternative suggestion. In the DDP multi-GPU scenario we are hitting the same error.
🐛 Bug
In PyTorch Lightning 0.9, saving the model with torch.save in a multi-GPU (DDP) scenario results in an error.
To Reproduce
Run the script on a multi-GPU machine to reproduce the error.
Running the same script on CPU works fine (the model is saved).
Expected behavior
When running with multiple GPUs, the model file should be saved as model.pth.
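In DDP every process runs the training script, so an unguarded torch.save can be executed once per GPU, racing on the same file. A hedged sketch of the usual rank-0 guard (the `rank` argument here is a hypothetical parameter standing in for `trainer.global_rank` or `torch.distributed.get_rank()`):

```python
import os
import torch
from torch import nn

def save_on_rank_zero(model, path, rank):
    """Write the model's state_dict from the rank-0 process only.

    Sketch under the assumption that `rank` is supplied by the caller;
    in a real DDP run it would come from trainer.global_rank or
    torch.distributed.get_rank().
    """
    if rank == 0:
        torch.save(model.state_dict(), path)

model = nn.Linear(4, 2)  # hypothetical stand-in for the real model
save_on_rank_zero(model, "model.pth", rank=0)    # rank 0 writes model.pth
save_on_rank_zero(model, "ignored.pth", rank=1)  # non-zero ranks do nothing
```

This does not explain the reported error itself, but it is the pattern that avoids multiple processes writing model.pth concurrently.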
Environment
Additional context
Stack Trace:
The issue is specific to the DDP environment, as the script works fine on CPU.
With the same classification example on plain PyTorch 1.6 in a multi-GPU scenario, the model is saved successfully.