During training, the loss value goes up and down and does not converge. Is that normal? Also, what should the final loss value look like? #49
Comments
I trained for 170k steps, but the loss is still around -5.
The -18 is not correct; I have fixed it. The reason is that I used the wrong audio values.
@azraelkuan Thank you for clearing things up!
The fact that you were able to train after taking the absolute value of the determinant suggests that your learning rate was too high. Given that we initialize the determinants to be positive, the determinant crossing from positive to negative suggests that during optimization we stepped over infinite error, at determinant 0, which is bad.
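To see why crossing determinant 0 is catastrophic, here is a minimal numpy sketch (my own illustration, not code from the repo): the flow loss includes a `-log|det W|` term for each invertible 1x1 convolution, and that term diverges as `det(W)` approaches 0, so an update that carries the determinant from positive to negative must pass arbitrarily close to infinite loss.

```python
import numpy as np

def neg_log_abs_det(det):
    """-log|det W| term of the flow loss (hypothetical single-matrix case)."""
    return -np.log(abs(det))

# The term grows without bound as det(W) -> 0 from either side; taking
# abs() only hides the sign flip, it does not remove the singularity.
for det in [1.0, 1e-1, 1e-3, 1e-6]:
    print(f"det={det:g}  -log|det|={neg_log_abs_det(det):.2f}")
```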
Not very good: the result has echo. It seems that the pitch and amplitude have not converged, so it sounds like an echoed voice, but I think more training will make it better.
@rafaelvalle I still get NaN loss after changing back to logdet(). I checked the log_det_W_total term in the loss function and found it approached a very negative number, which means the determinant approached zero during training. So how did you prevent it from crossing zero? A smaller learning rate doesn't help for me. I have tried a learning rate of 1e-5, but the progress is too slow; after 8k steps the loss is still around -3.5.
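For reference, I believe the repo initializes each 1x1 convolution weight from the Q factor of a QR decomposition, flipping one column if the determinant is negative, so training starts with det(W) = +1 and log|det W| = 0 and the determinant only drifts afterwards. A numpy sketch of that idea (the channel count here is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
c = 8  # hypothetical channel count

# Orthogonal init via QR: Q is orthogonal, so |det W| == 1 exactly.
W, _ = np.linalg.qr(rng.normal(size=(c, c)))
if np.linalg.det(W) < 0:
    W[:, 0] = -W[:, 0]  # flip one column so the determinant is +1

print(np.linalg.det(W))               # close to 1.0
print(np.log(abs(np.linalg.det(W))))  # close to 0.0
```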
We trained the model for 540k iterations with batch size 24. Rushing and increasing the learning rate is probably not in your interest.
@rafaelvalle Thanks for the implementation. As you mentioned, on the LJSpeech dataset with a batch size of 24, it needs 540k iterations to reach good performance. I assume you used 10 V100s, as in the paper; how many days did it take to train for 540k iterations? For people who only have a single 1080 Ti, which can only fit a batch size of 1, does that mean I need roughly 540k * 24 = 12,960k iterations (a few months of training time) to make it perform the same?
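One option on a single GPU is gradient accumulation rather than literally running 24x the iterations: if the loss averages over the batch, summing batch-size-1 gradients over 24 steps before each optimizer update reproduces the batch-size-24 gradient. A minimal numpy sketch of that equivalence (toy quadratic loss, not the WaveGlow model):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=3)            # hypothetical model parameters
data = rng.normal(size=(24, 3))   # one "large batch" of 24 samples

def grad(w, batch):
    """Gradient of the mean squared loss 0.5 * mean(||x - w||^2) w.r.t. w."""
    return (w - batch).mean(axis=0)

# Large-batch gradient (batch size 24) ...
g_big = grad(w, data)
# ... equals the average of 24 batch-size-1 gradients, so accumulating
# grads across 24 forward/backward passes emulates batch size 24.
g_acc = np.mean([grad(w, data[i:i + 1]) for i in range(24)], axis=0)

print(np.allclose(g_big, g_acc))  # True
```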
@rafaelvalle Yeah, probably you are right. But it seems everyone else is fine with a 1e-4 learning rate; it's weird that it only happens to me. And in my experience, a too-large learning rate would not cause this kind of crazy loss curve. I might have missed something, but I have no clue. Anyway, I uploaded my implementation on GitHub here. If anyone can help me find the cause, I would be very appreciative.
@yoyololicon Sounds interesting. I have read your PR before. But do you know why it leads to instability?
Actually, I don't know. In this link it says that using sum() to combine multiple losses should not be a problem.
I implemented a waveglow model in my own project. The code is almost the same as this repo's, with some modifications:
The n_channels is 256, so the model is 4~5 times smaller than the original. I ran the model on two 1080 Tis using nn.DataParallel with a batch size of 8. After about 5k steps the loss was around -6 to -7, and I could hear some speech-like sentences in the model outputs. Then the loss value started to go up and down, even above zero, and could not go any further. I added 24 flows to the model; it doesn't help. With a batch size of 32, the problem still exists. Maybe with more steps it will get better, but after 70k steps I still cannot see any improvement. Did anyone have similar problems?
I also want to ask about the final loss value as a reference. In my case, -11 is the smallest value I can get before the aforementioned problem happens. In #5, @azraelkuan got a loss of -18 at 56k steps; is that the normal loss value?
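On why the loss is negative at all: as I understand it, the loss is a negative log-likelihood under a continuous density, which can legitimately go below zero. A simplified numpy sketch of the WaveGlow-style loss (assuming a zero-mean Gaussian prior on z and dropping constants like log(2*pi); the log_s and log_det totals here are made-up numbers, and the real loss also involves spectrogram conditioning):

```python
import numpy as np

def waveglow_style_loss(z, log_s_total, log_det_W_total, n_elements, sigma=1.0):
    """Simplified per-element NLL: Gaussian prior term minus the
    accumulated log|s| and log|det W| terms from the flow layers."""
    return ((z ** 2).sum() / (2 * sigma ** 2)
            - log_s_total - log_det_W_total) / n_elements

rng = np.random.default_rng(2)
z = rng.normal(size=8000)
# Large positive log_s_total / log_det_W_total (hypothetical values) are
# what drive the loss below zero as the flow layers train.
print(waveglow_style_loss(z, log_s_total=30000.0, log_det_W_total=16000.0,
                          n_elements=z.size))
```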