Cannot train with multi GPUs #13
Comments
Pull from master and try again with FP16 enabled and disabled.
Hi, rafaelvalle. I tried and it seems to be stuck here for a long time.
Try with …
Hi, rafaelvalle. I also got the same error. After a few minutes, the log below appeared and training seemed to stop.
I checked my network status with …
@n5-suzuki: for multi-GPU you should be running multiproc.
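For reference, here is a minimal sketch of what a multiproc-style launcher does, assuming this repo follows NVIDIA's Tacotron 2 layout: it spawns one copy of the training script per visible GPU and passes each one its rank. The script name and flags below are placeholders, not this repo's confirmed CLI.

```python
# Sketch of a multiproc-style launcher (assumed layout; train.py,
# --rank, and --n_gpus are placeholders for this repo's entry point).
import subprocess
import sys

import torch

num_gpus = torch.cuda.device_count()
workers = []
for rank in range(num_gpus):
    # Each worker is expected to pick its GPU from the rank it receives.
    cmd = [sys.executable, "train.py",
           "--rank", str(rank),
           "--n_gpus", str(num_gpus)]
    workers.append(subprocess.Popen(cmd))
for w in workers:
    w.wait()  # block until every worker exits
```

Running the training script directly, without a launcher like this, typically starts only a single process and is a common reason multi-GPU runs never get going.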
I got the same error. It's the same problem with tacotron-pytorch. So sad!
I think we can learn from this project and see how it is done in order to synthesize music, rather than running this project directly. So I am manually closing this for lack of activity. If anybody has a solution, feel free to reopen and share it below.
I cloned the repository to my local server, then started training on my own dataset.
I can run with one GPU, and the logs are as follows:
But when I run with multiple GPUs, life becomes difficult for me.
The first problem is an "apply_gradient_allreduce is not defined" error. OK, that's easy to fix: I just import it from distributed.
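A sketch of that fix, assuming the helper is defined in distributed.py as in NVIDIA's Tacotron 2 layout; the wrapping call is shown as a comment because where it belongs depends on this repo's train.py:

```python
# Assumed layout: apply_gradient_allreduce lives in distributed.py
from distributed import apply_gradient_allreduce

# ... later, after the model is built and moved to the GPU:
# model = apply_gradient_allreduce(model)
```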
The next problem is that training seems to stop at "Done initializing distributed"; no further logs are printed.
Can you fix this? Thank you!
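For anyone hitting the same hang: a minimal sanity check (a sketch, not this repo's code) that separates NCCL/network problems from the training code. NCCL_DEBUG and NCCL_SOCKET_IFNAME are standard NCCL environment variables; the port, address, and interface name are assumptions for a single machine. Launch one copy per GPU, e.g. `RANK=0 WORLD_SIZE=2 python check.py` and `RANK=1 WORLD_SIZE=2 python check.py` in two shells.

```python
# Minimal distributed sanity check: if the all_reduce below never
# returns, the hang is in NCCL/network setup, not the training code.
import os

import torch
import torch.distributed as dist

os.environ.setdefault("NCCL_DEBUG", "INFO")   # make NCCL print its view of the topology
# os.environ["NCCL_SOCKET_IFNAME"] = "eth0"   # pin the interface if autodetection picks a bad one

rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])
torch.cuda.set_device(rank)  # one GPU per process, indexed by rank

dist.init_process_group(
    backend="nccl",
    init_method="tcp://127.0.0.1:54321",  # any free port on the master node
    rank=rank,
    world_size=world_size,
)

t = torch.ones(1, device="cuda")
dist.all_reduce(t)  # should return quickly with t == world_size
print(f"rank {rank}: all_reduce ok, value={t.item()}")
```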