During training, the loss value goes up and down and does not converge. Is that normal? Also, what should the final loss value look like? #49
Comments
I trained for 170k steps, but the loss is still around -5.
The -18 is not correct; I have fixed it. The reason is that I used the wrong audio values.
@azraelkuan Thank you for clearing things up!
The fact that you were able to train after taking the absolute value of the determinant suggests that your learning rate was too high. Given that we initialize the determinants to be positive, the determinant crossing from positive to negative suggests that during optimization we stepped over infinite error, at determinant 0, which is bad.
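To see why crossing determinant 0 is catastrophic, here is a minimal numpy sketch (my own illustration, not code from the repo): the flow loss includes a `-log|det W|` term for each invertible 1x1 convolution, and that term diverges as `det(W)` approaches 0, so an update that carries the determinant from positive to negative must pass arbitrarily close to infinite loss.

```python
import numpy as np

def neg_log_abs_det(det):
    """-log|det W| term of the flow loss (hypothetical single-matrix case)."""
    return -np.log(abs(det))

# The term grows without bound as det(W) -> 0 from either side; taking
# abs() only hides the sign flip, it does not remove the singularity.
for det in [1.0, 1e-1, 1e-3, 1e-6]:
    print(f"det={det:g}  -log|det|={neg_log_abs_det(det):.2f}")
```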
Not very good: the result has echo. It seems that the pitch and amplitude have not converged, so it sounds like an echoed voice, but I think more training will make it better.
@rafaelvalle I still get NaN loss after changing back to logdet(). I checked the log_det_W_total term in the loss function and found it approached a very negative number, which means the determinant approached zero during training. So how did you prevent it from crossing zero? A smaller learning rate doesn't help for me. I have tried a learning rate of 1e-5, but the progress is too slow; after 8k steps the loss is still around -3.5.
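For reference, I believe the repo initializes each 1x1 convolution weight from the Q factor of a QR decomposition, flipping one column if the determinant is negative, so training starts with det(W) = +1 and log|det W| = 0 and the determinant only drifts afterwards. A numpy sketch of that idea (the channel count here is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
c = 8  # hypothetical channel count

# Orthogonal init via QR: Q is orthogonal, so |det W| == 1 exactly.
W, _ = np.linalg.qr(rng.normal(size=(c, c)))
if np.linalg.det(W) < 0:
    W[:, 0] = -W[:, 0]  # flip one column so the determinant is +1

print(np.linalg.det(W))               # close to 1.0
print(np.log(abs(np.linalg.det(W))))  # close to 0.0
```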
We trained the model for 540k iterations with batch size 24. Rushing and increasing the learning rate is probably not in your interest.
@rafaelvalle Thanks for the implementation. As you mentioned, on the LJSpeech dataset with a batch size of 24, it needs 540k iterations to reach good performance. I assume you used 10 V100s, as in the paper; how many days did it take to train for 540k iterations? For people who only have a single 1080 Ti, which can only fit a batch size of 1, does that mean I need roughly 540k * 24 = 12,960k iterations (a few months of training time) to make it perform the same?
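One option on a single GPU is gradient accumulation rather than literally running 24x the iterations: if the loss averages over the batch, summing batch-size-1 gradients over 24 steps before each optimizer update reproduces the batch-size-24 gradient. A minimal numpy sketch of that equivalence (toy quadratic loss, not the WaveGlow model):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=3)            # hypothetical model parameters
data = rng.normal(size=(24, 3))   # one "large batch" of 24 samples

def grad(w, batch):
    """Gradient of the mean squared loss 0.5 * mean(||x - w||^2) w.r.t. w."""
    return (w - batch).mean(axis=0)

# Large-batch gradient (batch size 24) ...
g_big = grad(w, data)
# ... equals the average of 24 batch-size-1 gradients, so accumulating
# grads across 24 forward/backward passes emulates batch size 24.
g_acc = np.mean([grad(w, data[i:i + 1]) for i in range(24)], axis=0)

print(np.allclose(g_big, g_acc))  # True
```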
@rafaelvalle Yeah, probably you are right. But it seems everyone else is fine with a 1e-4 learning rate; it's weird that it only happens to me. And in my experience, a too-large learning rate would not cause this kind of crazy loss curve. I might have missed something, but I have no clue. Anyway, I uploaded my implementation on GitHub here. If anyone can help me find the cause, I would be very appreciative.
@yoyololicon Sounds interesting. I have read your PR before. But do you know why it leads to instability?
Actually, I don't know. In this link it says that using sum() to combine multiple losses should not be a problem.
I implemented a waveglow model in my own project. The code is almost the same as this repo's, with some modifications:
The n_channels is 256, so the model is 4~5 times smaller than the original. I ran the model on two 1080 Tis using nn.DataParallel with a batch size of 8. After about 5k steps the loss was around -6 to -7, and I could hear some speech-like sentences in the model outputs. Then the loss value started to go up and down, even above zero, and could not go any further. I added 24 flows to the model; it doesn't help. With a batch size of 32, the problem still exists. Maybe with more steps it will get better, but after 70k steps I still cannot see any improvement. Did anyone have similar problems?
I also want to ask about the final loss value as a reference. In my case, -11 is the smallest value I can get before the aforementioned problem happens. In #5, @azraelkuan got a loss of -18 at 56k steps; is that the normal loss value?
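On why the loss is negative at all: as I understand it, the loss is a negative log-likelihood under a continuous density, which can legitimately go below zero. A simplified numpy sketch of the WaveGlow-style loss (assuming a zero-mean Gaussian prior on z and dropping constants like log(2*pi); the log_s and log_det totals here are made-up numbers, and the real loss also involves spectrogram conditioning):

```python
import numpy as np

def waveglow_style_loss(z, log_s_total, log_det_W_total, n_elements, sigma=1.0):
    """Simplified per-element NLL: Gaussian prior term minus the
    accumulated log|s| and log|det W| terms from the flow layers."""
    return ((z ** 2).sum() / (2 * sigma ** 2)
            - log_s_total - log_det_W_total) / n_elements

rng = np.random.default_rng(2)
z = rng.normal(size=8000)
# Large positive log_s_total / log_det_W_total (hypothetical values) are
# what drive the loss below zero as the flow layers train.
print(waveglow_style_loss(z, log_s_total=30000.0, log_det_W_total=16000.0,
                          n_elements=z.size))
```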