
Training Never Starts in Single GPU machine (Solution) #26

Closed
TheSeriousProgrammer opened this issue Aug 16, 2022 · 1 comment

@TheSeriousProgrammer

Hi, I don't use PyTorch Lightning much and haven't done distributed training before (noob here), but I found that training never starts on a single-GPU, single-node configuration.

The fix I found was to set the num_nodes parameter in the train configuration to 1. If the value is greater than 1, PyTorch Lightning presumably waits for the other nodes to join, so training never starts. See the sketch below.
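To illustrate what I mean, here is a minimal sketch (not the repo's actual training script; the trainer arguments shown are assumptions about a typical Lightning setup). The point is only that num_nodes has to match the number of machines you actually launch on, which is 1 on a single machine.

```python
# Minimal sketch: with num_nodes > 1, Lightning waits for the other nodes
# to connect before training, so on a single machine nothing ever happens.
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,      # one GPU on this machine
    num_nodes=1,    # must be 1 on a single machine; >1 makes Lightning wait for peers
    max_epochs=30,  # placeholder value
)
# trainer.fit(model, datamodule)  # appears to hang at startup if num_nodes > available nodes
```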

It took me a while to get this right, so I'm putting it out there for fellow noobs :)

Thanks for sharing such incredible work with the community!

@gwkrsrch
Collaborator

Hi @TheSeriousProgrammer, thanks for sharing your experience!
d3f759b should help prevent this kind of misunderstanding. Thanks a lot :)
