
Why is the 'training' parameter of dropout in Prenet set to True? #247

Closed
jjl1994 opened this issue Aug 1, 2019 · 18 comments

Comments
@jjl1994 commented Aug 1, 2019

In the Prenet code:

def forward(self, x):
    for linear in self.layers:
        x = F.dropout(F.relu(linear(x)), p=0.5, training=True)
    return x

Why is 'training=True'? Shouldn't it be 'training=self.training'? Does that mean we apply dropout at inference? I changed this to 'training=self.training' and the pre-trained model was unable to generate correct audio.
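For context, here is a minimal sketch (not from this repo) of the behavior in question: F.dropout consults only its training argument and ignores the module's train/eval mode, so hard-coding training=True keeps dropout active at inference.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(1, 4)

# training=True: dropout fires regardless of module mode, so two
# "inference" calls give different, randomly masked outputs.
print(F.dropout(x, p=0.5, training=True))
print(F.dropout(x, p=0.5, training=True))

# training=False (what training=self.training becomes after model.eval()):
# dropout is the identity and the output is deterministic.
print(F.dropout(x, p=0.5, training=False))  # tensor([[1., 1., 1., 1.]])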

@yuxinyuan

This is mentioned in the original Tacotron2 paper.

In order to introduce output variation at inference time, dropout with probability 0.5 is applied only to layers in the pre-net of the autoregressive decoder.

The code just follows the specification.

@jjl1994 (Author) commented Aug 2, 2019

@yuxinyuan Hi, I noticed that this is mentioned in the Tacotron2 paper: 'to introduce output variation at inference time'. But why does the model output noise after I set this to False at inference? I don't understand why the model can only work with dropout set to True.

@jjl1994 (Author) commented Aug 2, 2019

@yuxinyuan When I test with the pre-trained model, even if I keep training=True but change the dropout rate to some other value, for example 0.3, the model also generates noise. That's weird. The pre-trained model only works with training=True and droprate=0.5.

@Yeongtae commented Aug 2, 2019

@jjl1994
The model has learned from only half of the information (dropout(0.5)) in the previous mel frame.

Giving it the full information (dropout(0.0)) of the previous mel makes it hard for the decoder to predict correctly: the full input is too much for a prenet that consists of (fc, dropout(0.5)) × 2.

@jjl1994 (Author) commented Aug 2, 2019

@Yeongtae I think the framework already compensates automatically when dropout is enabled: if it did not rescale the surviving activations of the (fc, dropout(0.5)) × 2 stack, it would not get the correct loss and could not update the network properly. This mechanism is mentioned in Fei-Fei Li's course. Also, if the model had only learned from half the information, we would always have to use dropout at inference, but in practice we don't; we only use dropout during training (in maybe 99% of cases).
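For example, a quick check (a sketch, not repo code) that PyTorch's inverted dropout already applies the 1/(1-p) compensation during training:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(1000)

# Inverted dropout: surviving units are scaled by 1 / (1 - p) at training
# time, so the expected activation matches the input and no extra
# rescaling is needed at inference.
y = F.dropout(x, p=0.5, training=True)
print(y[y != 0].unique())  # tensor([2.]) -- survivors are doubled
print(y.mean())            # close to 1.0 in expectation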

@Yeongtae commented Aug 2, 2019

If you set the droprate to 0.15 instead of 0.5 and use training=self.training, you can work around this problem.

The text-to-mel model can then always produce audio for a given utterance, but it is harder to converge and less stable than with droprate=0.5.
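For reference, a minimal sketch of that workaround (the constructor signature is assumed from the usual Tacotron2 prenet, not copied from this repo):

import torch.nn.functional as F
from torch import nn

class Prenet(nn.Module):
    def __init__(self, in_dim, sizes=(256, 256), p=0.15):
        super().__init__()
        in_sizes = [in_dim] + list(sizes[:-1])
        self.layers = nn.ModuleList(
            nn.Linear(in_size, out_size, bias=False)
            for in_size, out_size in zip(in_sizes, sizes))
        self.p = p  # 0.15 as suggested above, instead of the usual 0.5

    def forward(self, x):
        for linear in self.layers:
            # dropout now switches off automatically after model.eval()
            x = F.dropout(F.relu(linear(x)), p=self.p, training=self.training)
        return x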

@jjl1994 (Author) commented Aug 2, 2019

@Yeongtae Hi, maybe (p=0.15 and dropout disabled) at inference will let the prenet give similar results to (p=0.5 and dropout enabled). It seems the autoregressive decoder is extremely sensitive to its input. Dropout before a conventional layer (such as a dense layer) will not affect its output much, but it really does affect the output of the autoregressive decoder.

@rafaelvalle (Contributor)

@jjl1994 You can run an experiment and train the model from scratch without dropout on the prenet.

@wizardk commented Aug 7, 2019

@jjl1994 You are right. Using dropout at inference is not wise. The Mozilla version of Tacotron2 works the way you want. Moreover, using a small dropout rate over so many parameters is not wise either. The original model structure is just a reference; you can optimize it.

@terryyizhong

Are there any experimental results from training the model with training=self.training?

@rafaelvalle (Contributor)

Closing due to inactivity.

@kevinmtian

@terryyizhong I can share the validation loss from setting training=self.training on my end. I trained on LJSpeech from scratch using the identical params provided in master. It looks very strange; I am looking into what could have gone wrong.

Validation loss 200:  8.105130
Validation loss 400: 12.825017
Validation loss 600: 11.753986
Validation loss 800: 14.233746
Validation loss 1000: 14.253099
Validation loss 1200: 18.660198
Validation loss 1400: 17.960465
Validation loss 1600: 19.330160
Validation loss 1800: 22.346097
Validation loss 2000: 23.067725
Validation loss 2200: 25.812730
Validation loss 2400: 26.288597
Validation loss 2600: 29.514675
Validation loss 2800: 27.077643
Validation loss 3000: 27.432822
Validation loss 3200: 29.471922
Validation loss 3400: 30.740887
Validation loss 3600: 30.523686
Validation loss 3800: 31.277980
Validation loss 4000: 31.414633
Validation loss 4200: 31.757557
Validation loss 4400: 30.777057
Validation loss 4600: 32.895072
Validation loss 4800: 33.554407

@terryyizhong

> @kevinmtian: I can share the validation loss from setting training=self.training on my end... (validation loss log quoted in full above)

Thanks for your information. I tried this before and encountered the same problem: the loss keeps going up after several steps. Looking forward to your finding the solution.

@zwlanpishu

> @kevinmtian: I can share the validation loss from setting training=self.training on my end... (validation loss log quoted in full above)

Have you solved the problem? I encountered the same problem when setting training=self.training: the validation loss keeps going up after some steps, especially when the reduction factor r = 1.

@terryyizhong

No, I think this issue should be reopened.

@CookiePPP commented May 14, 2020

@terryyizhong
@zwlanpishu
Do either of you have some alignment and predicted spectrogram pictures you can upload?
(Images tab in Tensorboard)

@zwlanpishu

> @CookiePPP: Do either of you have some alignment and predicted spectrogram pictures you can upload? (Images tab in Tensorboard)

I am still training. It is really hard to converge with a reduction factor r = 1. Usually, how many steps does it take to pick up an alignment on the LJSpeech dataset?

@zwlanpishu

> @CookiePPP: Do either of you have some alignment and predicted spectrogram pictures you can upload? (Images tab in Tensorboard)

When training with the prenet dropout set to training=self.training, the process is the same as before, so it also converges and gets an alignment. But the validation loss easily overfits with dropout disabled: it rises quickly after several epochs. As a result, the model cannot work with the prenet dropout disabled at inference. However, setting the prenet dropout to training=True solves the problem, with a non-overfitting validation loss. As discussed above, maybe (p = 0.15 and dropout disabled) at inference will let the prenet give similar results to (p = 0.5 and dropout enabled).

Training loss: [plot]
Validation loss with dropout disabled at inference: [plot]
