Should there be any noise output? #82

deepglugs · 2020-10-21T14:36:59Z

I'm having trouble getting decent results out of Flowtron and am trying to figure out why. With my fairly small dataset (0.67 hrs) and a warm start from LJS, I get nothing but noise when running inference on my checkpoints. I tried warm-starting from LJS with flow=1 and flow=2 and trained for 240k steps. I've tried adjusting p_arpabet (1.0, 0.5), but no dice. I also tried lowering the learning rate to 1e-5.

Shouldn't I be getting something other than noise at some point?

PyTorch 1.6, Python 3.8: noise up to 200k+ steps

PyTorch 1.3, Python 3.7.4: noise at every checkpoint; output shapes were (102400,) at step 5k, (9984,) at step 10k and (2816,) at step 20k
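The output shapes above are waveform lengths in samples. Converting them to mel frames and seconds makes them easier to compare; this sketch assumes Flowtron's default 22050 Hz sampling rate and a hop length of 256 samples per frame (check `data_config` in your own config.json). Under those assumptions, 102400 samples is exactly 400 frames (about 4.64 s), and if the default `n_frames` in `inference.py` is 400 (an assumption worth verifying), the 5k checkpoint ran to that cap rather than stopping on its own.

```python
# Convert inferred waveform lengths (in samples) into mel frames and seconds.
# ASSUMPTIONS: 22050 Hz sampling rate and a hop length of 256 samples per mel
# frame, as in Flowtron's default config.json; adjust if yours differ.
SAMPLING_RATE = 22050
HOP_LENGTH = 256


def samples_to_frames(n_samples, hop_length=HOP_LENGTH):
    """Number of mel frames the vocoder turned into n_samples of audio."""
    return n_samples // hop_length


def samples_to_seconds(n_samples, sampling_rate=SAMPLING_RATE):
    """Duration of n_samples of audio, in seconds."""
    return n_samples / sampling_rate


if __name__ == "__main__":
    # Output shapes reported above, keyed by training step.
    shapes = {"5k": 102400, "10k": 9984, "20k": 2816}
    for step, n in shapes.items():
        print(f"step {step}: {samples_to_frames(n)} frames, "
              f"{samples_to_seconds(n):.2f} s")
```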

I know the dataset can't be too bad, because deepvoice3 works on it reasonably well...

deepglugs · 2020-10-27T06:13:25Z

A bit of an update: when I train on the LJ dataset I don't get noise, so something about my dataset is troublesome for Flowtron. My data has a lot of short utterances, maybe 2-5 words each. I also noticed that the loss decayed much, much faster: -1.0 loss in under 500 steps, whereas LJ isn't even below 0.9 at 100k steps. I also noticed that my WAVs are 32-bit while LJ's are 16-bit. My data was silently converted when I wrote it out with librosa's WAV writer after trimming silence. Oops! Retraining now; hoping for the best.
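A quick way to catch this kind of silent conversion is to read each file's `fmt ` chunk and check the format code (1 = integer PCM, 3 = IEEE float) and the bit depth. The sketch below uses only the Python standard library, and the helper names `wav_format` and `write_pcm16` are hypothetical. It also shows one way to write float samples in [-1.0, 1.0] back out as 16-bit PCM; with the `soundfile` package, `soundfile.write(path, y, sr, subtype="PCM_16")` does the same job. As for why the bit depth matters, one plausible reason (an assumption, not confirmed in this thread) is that the loader scales samples by a fixed `max_wav_value` of 32768 meant for 16-bit data, which would leave float samples already in [-1.0, 1.0] nearly silent.

```python
import io
import struct
import wave


def wav_format(f):
    """Return (format_code, bits_per_sample, sample_rate) for a RIFF/WAVE file.

    format_code 1 is integer PCM and 3 is IEEE float. (WAVE_FORMAT_EXTENSIBLE
    files report 0xFFFE here; their real format is in the extension block.)
    """
    riff, _riff_size, wave_id = struct.unpack("<4sI4s", f.read(12))
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        header = f.read(8)
        if len(header) < 8:
            raise ValueError("no 'fmt ' chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"fmt ":
            fmt = f.read(chunk_size)
            code, _channels, rate, _byte_rate, _align, bits = struct.unpack(
                "<HHIIHH", fmt[:16])
            return code, bits, rate
        # Skip any other chunk; chunk bodies are padded to an even length.
        f.seek(chunk_size + (chunk_size & 1), io.SEEK_CUR)


def write_pcm16(f, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as 16-bit integer PCM."""
    # Clip out-of-range values, then scale to the int16 range.
    ints = [round(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(f, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(struct.pack(f"<{len(ints)}h", *ints))
```

A result starting with `(3, 32)` flags a 32-bit float file; a 16-bit dataset like LJ Speech should report `(1, 16)`.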

deepglugs · 2020-11-02T23:02:27Z

Another update: it looks like the 32-bit WAV data was my issue. Now I get gibberish output, with the model never attending to the text. The attention weights still look poor after 1.6M steps, similar to #41 and others:

Training loss

I wonder whether my dataset is too small; I have less than 1 hr of audio. Would adding another speaker help? Another difference between my dataset and, say, LJS is that mine has many more short utterances (1-3 words).

I've made another pass over my data to clean it, checking the transcripts and removing things like laughter. In parallel, I'm training another model on this dataset plus one more dataset as an additional speaker, which brings the total to almost 2 hrs.
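Combining two speakers' data into one training list could look like the sketch below. It assumes Flowtron-style filelists with one `audio_path|transcript|speaker_id` entry per line (the layout of the bundled LJS filelists) and simply rewrites the speaker ID field; `merge_filelists` is a hypothetical helper. The speaker count in config.json would also need to cover the new IDs.

```python
def merge_filelists(speaker_lists):
    """Merge per-speaker filelists into one multi-speaker list.

    speaker_lists: a list of lists of "audio_path|transcript|..." lines, one
    inner list per speaker. Each output line gets its speaker's index as ID.
    ASSUMPTION: Flowtron-style "audio_path|transcript|speaker_id" lines whose
    transcripts contain no "|" characters.
    """
    merged = []
    for speaker_id, lines in enumerate(speaker_lists):
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            path, text = line.split("|")[:2]  # drop any old speaker ID
            merged.append(f"{path}|{text}|{speaker_id}")
    return merged
```

Shuffling the merged list before splitting off a validation set keeps both speakers represented in each split.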

deepglugs · 2020-11-06T22:38:48Z

Yet another update: I'm still trying to figure out the differences between my dataset and LJS. Two remaining possibilities come to mind: utterance length and total dataset size. I removed from my training set every sample shorter than 1 s or longer than 10 s, so the min/max duration distribution now roughly matches LJS. However, even after 345k steps, no attention was learned.
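The duration filter described above can be sketched with the standard library, again assuming `audio_path|transcript|speaker_id` filelist lines and integer-PCM WAVs that Python's `wave` module can open. Both helpers are hypothetical, and the duration lookup is passed in as a function so it can be swapped out or cached.

```python
import wave


def wav_duration(path):
    """Duration in seconds of an integer-PCM WAV file (path or file object)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def filter_by_duration(lines, duration_of=wav_duration, min_s=1.0, max_s=10.0):
    """Keep filelist lines whose audio lasts between min_s and max_s seconds.

    ASSUMPTION: each line starts with the audio path, followed by "|".
    """
    kept = []
    for line in lines:
        path = line.split("|", 1)[0]
        if min_s <= duration_of(path) <= max_s:
            kept.append(line)
    return kept
```

Clips exactly at the 1 s or 10 s bounds are kept; tighten the comparisons if you want them dropped.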

I then created an LJS dataset with only 500 samples (~0.9 hrs). Again, no attention after 350k steps. I'll try again with 1k samples (1.71 hrs) and work up from there to find out how much data is required to learn attention.
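When scaling a subset up step by step like this, one option is to shuffle the full filelist once with a fixed seed and take prefixes of increasing length, so every smaller subset is contained in the larger ones. The sketch below does that and computes total hours as summed clip durations divided by 3600; for example, 500 clips averaging about 6.5 s come to roughly 0.9 hrs. The helper names are hypothetical, and durations are supplied by the caller.

```python
import random


def nested_subsets(lines, sizes, seed=1234):
    """Return {size: lines} where each smaller subset is a prefix of larger ones."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # local RNG: reproducible, no global state
    return {n: shuffled[:n] for n in sizes}


def total_hours(lines, duration_of):
    """Sum of clip durations (seconds) for the given lines, in hours.

    ASSUMPTION: each line starts with the audio path, followed by "|".
    """
    return sum(duration_of(line.split("|", 1)[0]) for line in lines) / 3600
```

Nesting the subsets means a change in results between sizes comes from the added data rather than from a different random draw.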

deepglugs · 2020-11-10T16:47:03Z

With 2,500 LJS samples, attention starts to appear at 85k steps. Here's 185k:

rafaelvalle · 2021-03-16T23:29:43Z

Please make sure you set the attention prior to True here:
https://github.com/NVIDIA/flowtron/blob/master/config.json#L34
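Flipping that flag programmatically could look like the sketch below. The key name `use_attn_prior` under `data_config` is an assumption about the linked config.json, so confirm it against line 34 of your copy; `enable_attn_prior` is a hypothetical helper.

```python
import copy


def enable_attn_prior(config):
    """Return a copy of a Flowtron-style config dict with the attention prior on.

    ASSUMPTION: the flag is config["data_config"]["use_attn_prior"], as in the
    linked config.json; verify the key name in your own copy.
    """
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    data_config = updated.get("data_config", {})
    if "use_attn_prior" not in data_config:
        raise KeyError("data_config.use_attn_prior not found; check config.json")
    data_config["use_attn_prior"] = True
    return updated
```

In practice you would `json.load` your config.json, pass it through this function, and `json.dump` the result to the file you train with. Raising on a missing key, rather than silently adding it, avoids setting a flag the training code never reads.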

deepglugs · 2021-03-19T20:53:42Z

That seems to have done the trick! The directions for training from scratch seem to apply to pre-trained models as well.

I'm seeing a lot of stuttering in the audio output, though. What typically causes this? Is more training time needed? Data issues? (sigma = 0.8)

out.mp4

deepglugs changed the title from "Mean, LogVar, Prob = None in compute_validation_loss" to "Should there be any noise output?" on Oct 21, 2020
