Distributed Training failing. #649
Unanswered
luthes asked this question in General Q&A
I seem to be having an issue getting the GPUs working; these are V100s on AWS.
Commands:
Error:
Config:
With a single GPU on the same machine (the one with 8), I don't seem to have any issues. Setting the environment variable to 0, even with 8 GPUs, it starts training, or at least gets past this point. On my machine with a single GPU, I have no issues running

python training_tacotron2.py

with a generic Tacotron2 JSON config that I translated. I'm thinking I'm missing something, or maybe something isn't documented in the Wiki.
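For reference, here is roughly what the working single-GPU run looks like (a minimal sketch: CUDA_VISIBLE_DEVICES is my guess at the environment variable mentioned above, and the PyTorch one-liner is just a generic sanity check, not something from this repo):

```bash
# Sanity check: does PyTorch see all 8 GPUs, and is the NCCL backend available?
python -c "import torch; print(torch.cuda.device_count(), torch.distributed.is_nccl_available())"

# Restrict the process to GPU 0 only; with this set, training starts
# even on the 8-GPU machine.
CUDA_VISIBLE_DEVICES=0 python training_tacotron2.py
```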