duplicate models when using pytorch lightning with ddp accelerator #292

Closed
tankeco opened this issue Jan 21, 2021 · 3 comments

@tankeco
Contributor

tankeco commented Jan 21, 2021

When using PyTorch Lightning with the DDP accelerator on 4 GPUs, I find that every checkpoint is recorded 4 times in the web UI, each with a different ID. One is on default_output_uri (s3://...) and the other three are on file:///... .

Note that PyTorch Lightning automatically ensures that the model is saved only on the main process: https://pytorch-lightning.readthedocs.io/en/stable/weights_loading.html#manual-saving-with-accelerators
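
The "save only on the main process" behavior amounts to a rank-zero guard. Below is a minimal stdlib-only sketch of that idea, assuming the launcher exports `RANK` and/or `LOCAL_RANK` (as `torchrun` does). `current_rank`, `rank_zero_only`, and `save_checkpoint` here are hypothetical helpers for illustration, not ClearML's API and not Lightning's own `rank_zero_only` utility.

```python
import functools
import os
from typing import Any, Callable, Optional


def current_rank() -> int:
    """Best-effort global rank lookup (assumption: the launcher exports
    RANK and/or LOCAL_RANK, as torchrun does; neither set means rank 0)."""
    for var in ("RANK", "LOCAL_RANK"):
        value = os.environ.get(var)
        if value is not None:
            return int(value)
    return 0


def rank_zero_only(fn: Callable[..., Any]) -> Callable[..., Optional[Any]]:
    """Hypothetical decorator: call fn on rank 0 only, return None elsewhere."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Optional[Any]:
        if current_rank() == 0:
            return fn(*args, **kwargs)
        return None
    return wrapper


saved: list = []


@rank_zero_only
def save_checkpoint(path: str) -> str:
    # Hypothetical stand-in for a real save (e.g. trainer.save_checkpoint):
    # it only records the path, so each call that runs is one "registration".
    saved.append(path)
    return path


# Simulate 4 DDP processes by switching RANK inside a single interpreter.
for rank in range(4):
    os.environ["RANK"] = str(rank)
    save_checkpoint("epoch=0.ckpt")
os.environ.pop("RANK", None)

print(saved)  # ['epoch=0.ckpt']
```

Without the decorator, the loop would record the path four times, one entry per simulated process, which is the same pattern as the duplicate entries in this report.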

@bmartinn
Member

Hi @tankeco,
Could you test the fix? It's already part of the latest RC:

pip install clearml==0.17.5rc2 

@tankeco
Contributor Author

tankeco commented Jan 25, 2021

That fixed it :)

@bmartinn
Member

Great news @tankeco :) Closing this issue.
