Integrating Tacotron and LPCNet: Training tacotron with .f32 features #4

rpratesh · 2019-05-17T05:55:15Z

In the README, it says:

> Convert the data generated at the last step which has .f32 extension to what could be loaded with numpy. I merge it to the Tacotron feeder here and here with the following code.
>
> ```python
> mel_target = np.fromfile(os.path.join(self._mel_dir, meta[0]), dtype='float32')
> mel_target = np.resize(mel_target, (-1, self._hparams.num_mels))
> ```

But meta[0] holds speech-audio-xxxx.npy file names, while self._mel_dir contains speech-mel-xxxx.npy files. So the snippet above searches for the speech-audio files (.npy or .f32) inside mel_dir. Is there anything wrong with this snippet?
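The README snippet treats each .f32 file as a flat run of float32 values and splits it into frames of num_mels values. One caveat worth checking: np.resize is not a reshape. It repeats or truncates the data to fill the requested shape, and recent NumPy versions raise a ValueError for a negative dimension such as -1. Below is a minimal sketch of a loader that uses reshape instead, assuming the file is a frame-by-frame dump of num_dims float32 values in native byte order. The function name load_f32_features is hypothetical, not part of the Tacotron-2 feeder.

```python
import numpy as np


def load_f32_features(path, num_dims):
    """Load a raw .f32 file as a (frames, num_dims) float32 array.

    Assumes the file is a flat, frame-by-frame dump of float32 values in
    native byte order; num_dims plays the role of hparams.num_mels.
    """
    flat = np.fromfile(path, dtype=np.float32)
    if flat.size % num_dims != 0:
        # np.resize would pad or truncate silently; fail loudly instead.
        raise ValueError(
            f"{path}: {flat.size} values is not a multiple of {num_dims}"
        )
    # reshape never invents or drops values, unlike np.resize.
    return flat.reshape(-1, num_dims)
```

In the feeder, the path passed in should point at whichever folder actually holds the .f32 files named in meta[0], which is exactly the mismatch raised above.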

One more question: where should I copy the .f32 files generated in the previous step (the mels, wavs, or linear folder) so that Tacotron can be trained on these features?

Also, in this case, should I use

```
python train.py --model='Tacotron-2'
```

which trains the entire Tacotron + WaveNet pipeline, or

```
python train.py --model='Tacotron'
```

which trains only Tacotron?

Thanks


MlWoo · 2019-05-17T06:36:45Z

About meta[0]:
If you use the Tacotron-2 preprocessing, you get three folders (audio, mel spectrogram, and linear spectrogram) plus a txt file that provides the metadata. Because I do not need the audio files for Tacotron-2 training, I made the audio folder a soft link to the f32 folder. Meanwhile, I can still train Tacotron-2 with mels in the conventional way. Of course, you then need to change the first column (meta[0]) to the actual names of the f32 files. Adjust it to the actual path of your f32 files.
Train only Tacotron: you have a different vocoder, LPCNet, instead of WaveNet.
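The soft-link approach above can be sketched as follows. It assumes the layout produced by the Tacotron-2 preprocessing: a training_data folder with an audio subfolder, and a train.txt whose lines are |-separated with the audio file name in the first column. The naming rule in default_f32_name is a hypothetical placeholder; adapt it to however your .f32 files are actually named. All three helpers are illustrative, not code from either repository.

```python
import os
from pathlib import Path


def default_f32_name(audio_name):
    """Hypothetical naming rule: 'audio-LJ001-0001.npy' -> 'LJ001-0001.f32'.

    Replace this with whatever naming your feature dump actually uses.
    """
    stem = Path(audio_name).stem
    if stem.startswith("audio-"):
        stem = stem[len("audio-"):]
    return stem + ".f32"


def rewrite_metadata(lines, to_f32_name=default_f32_name):
    """Replace the first '|'-separated column (meta[0]) of each line.

    Returns the lines without trailing newlines.
    """
    out = []
    for line in lines:
        fields = line.rstrip("\n").split("|")
        fields[0] = to_f32_name(fields[0])
        out.append("|".join(fields))
    return out


def link_audio_dir_to_f32(training_dir, f32_dir):
    """Point <training_dir>/audio at the folder of .f32 files via a symlink."""
    audio_dir = Path(training_dir) / "audio"
    if audio_dir.is_symlink() or audio_dir.is_file():
        audio_dir.unlink()
    elif audio_dir.exists():
        # Keep the original preprocessed audio instead of deleting it.
        audio_dir.rename(audio_dir.with_name("audio_orig"))
    os.symlink(Path(f32_dir).resolve(), audio_dir, target_is_directory=True)
```

Writing the rewritten lines back to train.txt (after keeping a backup) completes the change. The mels and linear folders are left untouched, so conventional mel training still works.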

alokprasad · 2019-05-17T09:32:25Z

So while training Tacotron-2, I should replace the audio folder of training_data (after preprocessing) with the f32 files, or soft-link it to them, and the first column of train.txt (meta[0]) should be the actual names of the f32 files, right?
If the above is taken care of, the following lines should be added to feeder.py:

```python
mel_target = np.fromfile(os.path.join(self._audio_dir, meta[0]), dtype='float32')
mel_target = np.reshape(mel_target, (-1, self._hparams.num_mels))
```

Basically, which npy should be loaded in the feeder: audio or mel spectrogram?

MlWoo · 2019-05-17T10:00:50Z

Of course, you can do that, as long as the path points to the .f32 file.

superhg2012 · 2019-05-24T08:07:13Z

Hi, the features extracted with the feature_extract.sh script are saved as .f32 files and then used to train Tacotron-2. But normally Tacotron-2 is used to predict a mel spectrogram. Here, with T2 + LPCNet, is the prediction target of T2 changed, or is the mel spectrogram simply replaced with the .f32 features?

alokprasad · 2019-05-24T08:10:19Z

@superhg2012 Replace the audio folder with the f32 files you created.
Check this diff, which is based on MlWoo's changes:
https://github.com/alokprasad/LPCTron/blob/master/Tacotron-2/Tacotron2-lpcnet_changes.diff

superhg2012 · 2019-05-24T08:27:52Z

@alokprasad Thanks a lot!

superhg2012 · 2019-06-13T09:24:51Z

@alokprasad can you post your samples?

alokprasad · 2019-06-13T10:50:21Z

@superhg2012
Please find the recordings (they are not good).
I think we should retrain LPCNet with the f32 features generated by Tacotron-2.
https://vocaroo.com/i/s1Dx9nbKFeuY
https://vocaroo.com/i/s1VRBWayVzrD

superhg2012 · 2019-06-13T11:21:57Z

@alokprasad I cannot reach the links you posted. Please refer to the audio sample posted in [#1]; it seems that the author did not use GTA (ground-truth-aligned) training mode.

MlWoo · 2019-06-13T14:42:03Z

@alokprasad You have a lot of work to do, because you have to calculate the length of the audio from the number of frames and append a tail to the audio. We did not use GTA mode because the job is trivial.
LPCNet is sensitive to the pitch parameters. I think GTA mode will cause a deviation from one baseline to another if T2 is not trained well.
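For GTA-style pairing, each waveform must contain exactly as many samples as its feature file has frames, multiplied by the hop size. A minimal sketch, assuming LPCNet's default 16 kHz sampling rate, 160-sample (10 ms) frames and 16-bit PCM input; zero-padding is just one simple choice of tail, and fit_audio_to_frames is a hypothetical helper, not MlWoo's procedure.

```python
import numpy as np

# Assumption: LPCNet defaults of 16 kHz audio and a 160-sample (10 ms) hop,
# i.e. one feature frame per 160 PCM samples.
FRAME_SIZE = 160


def fit_audio_to_frames(pcm, n_frames, frame_size=FRAME_SIZE):
    """Trim or zero-pad int16 PCM so that len(pcm) == n_frames * frame_size."""
    target = n_frames * frame_size
    pcm = np.asarray(pcm, dtype=np.int16)
    if len(pcm) >= target:
        # Too long: drop samples past the last feature frame.
        return pcm[:target]
    # Too short: append a silent tail up to the frame boundary.
    tail = np.zeros(target - len(pcm), dtype=np.int16)
    return np.concatenate([pcm, tail])
```

Here n_frames would be the number of rows in the corresponding T2 output feature file.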

alokprasad · 2019-06-14T13:23:27Z

@MlWoo Do you mean that every audio file should have the same length, or that its length should be an integral multiple of the frame size?

MlWoo · 2019-06-15T01:23:55Z

@alokprasad There is more work: by default LPCNet cuts off silence in the audio, so you need to modify the LPCNet code to cooperate with the GTA output of T2.

alokprasad · 2019-07-11T07:22:54Z

@MlWoo Can you point to the code in LPCNet where the modification needs to be done?

alokprasad · 2019-07-11T08:04:01Z

@MlWoo I saw that there is silence removal in xiph@554b6df, and it is needed only during training of LPCNet.
Should I remove this code and train LPCNet?

alokprasad · 2019-07-15T05:28:26Z

@MlWoo Can we add this to Tacotron training for silence removal?
gooofy/zamia-tts@66bd10d

MlWoo · 2019-08-08T03:13:09Z

@alokprasad Tacotron training with silence removal may be a good idea for English. It is a bad idea for Chinese (Mandarin), because very short silences are beneficial to the prosody. I am not sure whether it is good for English, because I am not a native English speaker. Removing long silences at the beginning and end of the audio is necessary when training Tacotron-2.
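The rule of thumb above (trim long silences at the start and end, keep short internal pauses) can be sketched with a simple frame-energy threshold. The frame length, hop, -40 dB threshold and one-frame margin are illustrative assumptions, not values taken from zamia-tts or LPCNet, and trim_edge_silence is a hypothetical helper.

```python
import numpy as np


def trim_edge_silence(wav, frame_len=400, hop=160, threshold_db=-40.0, margin=1):
    """Remove silence at the start and end only; keep internal pauses.

    Frames whose RMS is more than |threshold_db| below the loudest frame
    count as silent. All parameter values are illustrative assumptions.
    """
    wav = np.asarray(wav, dtype=np.float32)
    if len(wav) < frame_len:
        return wav
    n_frames = 1 + (len(wav) - frame_len) // hop
    rms = np.array([
        np.sqrt(np.mean(wav[i * hop:i * hop + frame_len] ** 2))
        for i in range(n_frames)
    ])
    ref = rms.max()
    if ref == 0:
        return wav[:0]  # the whole clip is silent
    db = 20 * np.log10(np.maximum(rms, 1e-10) / ref)
    voiced = np.flatnonzero(db > threshold_db)
    # Only the first and last voiced frames matter, so inner pauses survive.
    first = max(voiced[0] - margin, 0)
    last = min(voiced[-1] + margin, n_frames - 1)
    return wav[first * hop:last * hop + frame_len]
```

Because only the outermost voiced frames define the cut, silences between words are kept, which is the property MlWoo considers important for Mandarin prosody.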

lmingde · 2019-08-20T12:08:59Z

> @superhg2012 replace the f32 created to audio folder..
> check this diff based on Mlwoo changes.
> https://github.com/alokprasad/LPCTron/blob/master/Tacotron-2/Tacotron2-lpcnet_changes.diff

I see that you save the audio (from preprocessing) as meta[0], so do you use the audio as mel_target to train Tacotron-2?

lmingde · 2019-08-20T12:10:12Z

> Hi, the feature extracted with feature_extract.sh script is saved as .f32 file and then it's used to train Tacotron2 . But Normally Tacotron2 was used to predict mel spectrogram. Here, T2 + LPCNet ,is the predict target of T2 is changed or just replace mel spectrogram with .f32 feature?

Is this the way to train T2? I don't understand it well.

lmingde · 2019-08-26T08:15:57Z

> Hi, the feature extracted with feature_extract.sh script is saved as .f32 file and then it's used to train Tacotron2 . But Normally Tacotron2 was used to predict mel spectrogram. Here, T2 + LPCNet ,is the predict target of T2 is changed or just replace mel spectrogram with .f32 feature?
>
> Is the way to train T2? I don't understand well.

I see: we use the f32 features to train T2 instead of mel features.

lmingde · 2019-08-26T08:51:49Z

> @MlWoo i saw that mozilla@554b6df there is silence removal here, and this is needed only to
> during training of LPCnet .
> Should i remove this code and Train LPCNet.

I am confused about the trimming.

@alokprasad
I used your LPCTron code, and the output voice is bad. Is it because of the hparams?
By the way, do we need to create mel and linear spectrograms in the Tacotron preprocessing? I don't see the mel or linear spectrograms being used for training when we use Tacotron + LPCNet.

byuns9334 · 2019-09-21T02:53:50Z

@superhg2012 Is the audio quality of TTS + LPCNet good? How did you achieve it?

superhg2012 · 2019-10-10T09:53:23Z

@byuns9334 I don't get good quality with T2 + LPCNet (20 dims), but I get better quality with T1 and LPCNet (55 dims).

@lmingde I put the dumped f32 files into the audio dir; when training T2, the f32 files in the audio dir are fed as mel_target.

alokprasad · 2019-10-14T05:00:09Z

@superhg2012 I guess T1 and T2 are the same except for the vocoder part, and we are using LPCNet as the vocoder anyway.
Can you share the changes for T1 with LPCNet?

superhg2012 · 2019-10-14T08:51:42Z

@alokprasad For LPCNet model training, 55-dim features work better than 20-dim features. For T1, there are no special changes; just train it with 55-dim features.

alokprasad · 2019-10-14T09:00:07Z

nb_features is already 55, so you mean there are no changes in LPCNet, and I should just train LPCNet as is?
https://github.com/mozilla/LPCNet/blob/master/src/train_lpcnet.py

For T1: instead of num_mels = 20, you mean num_mels = 55?

superhg2012 · 2019-10-14T09:08:44Z

Yes, just try it.
When testing synthesis, build without the taco=1 flag:

```
make clean && make test_lpcnet
```

wangfn · 2019-10-18T19:51:23Z

@MlWoo Hi, may I ask: in the training stage, when feeding a batch of samples to Tacotron, what padding value is used so that the f32 features (whether 20 or 55 dims) have the same length? I noticed that -0.1 is used in alokprasad's LPCTron implementation.

MlWoo · 2019-10-19T01:49:02Z

@wangfn I have forgotten. No worries, just mask the padding values when calculating the loss.

wangfn · 2019-10-19T09:19:24Z

@MlWoo Thanks a lot; masking the padding values is indeed the solution.
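Masking makes the padding value irrelevant to the loss. The NumPy sketch below pads a batch with -0.1 (the value reported above for LPCTron) and averages the squared error only over real frames. The helper names are hypothetical; in an actual Tacotron training graph the same mask would be applied to the framework's loss tensors.

```python
import numpy as np

PAD_VALUE = -0.1  # value reported for LPCTron; any value works once masked


def pad_batch(features_list, pad_value=PAD_VALUE):
    """Stack (frames, dims) arrays into (batch, max_frames, dims) plus lengths."""
    lengths = np.array([f.shape[0] for f in features_list])
    dims = features_list[0].shape[1]
    batch = np.full((len(features_list), lengths.max(), dims), pad_value,
                    dtype=np.float32)
    for i, f in enumerate(features_list):
        batch[i, :f.shape[0]] = f
    return batch, lengths


def masked_mse(pred, target, lengths):
    """Mean squared error over real frames only; padded frames get weight 0."""
    max_len = target.shape[1]
    mask = (np.arange(max_len)[None, :] < lengths[:, None]).astype(np.float32)
    sq_err = ((pred - target) ** 2).mean(axis=-1)  # shape (batch, frames)
    return float((sq_err * mask).sum() / mask.sum())
```

A prediction that matches every real frame scores zero under masked_mse even though it differs from the padded positions, which is exactly what an unmasked loss would penalize.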

alokprasad · 2020-04-17T03:39:13Z

> @byuns9334 I don't get good quality with T2 + LPCNet(20dim). But, I get better quality with T1 and LPCNet(55 dim).
>
> @lmingde I put the dumped f32 files into the audio dir,when train T2, the f32 files in audio dir will be feeded as mel_target for training.

@superhg2012 What changes are required in LPCNet to go from 20 to 55 dims? I think it uses 55, but only 20 are needed. Are any changes needed in Tacotron-2 training if we change the dims in LPCNet?
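On the 20 vs 55 question, a sketch under an explicit assumption: that columns 0-17 of a 55-value LPCNet frame hold the 18 Bark-scale cepstral coefficients, that columns 36-37 hold the pitch period and pitch correlation (the indices train_lpcnet.py appears to use), and that the 20-dim taco=1 features are exactly those 20 values. Verify this layout against dump_data.c in your LPCNet checkout before relying on it. Under that assumption, a 55-dim dump can be reduced to 20 Tacotron targets like this (select_taco_features is a hypothetical helper):

```python
import numpy as np

# Assumed column layout of one 55-value LPCNet frame (check your version):
CEPSTRUM = slice(0, 18)  # 18 Bark-scale cepstral coefficients
PITCH = slice(36, 38)    # pitch period, pitch correlation


def select_taco_features(feats55):
    """Reduce (frames, 55) LPCNet features to the assumed (frames, 20) subset."""
    feats55 = np.asarray(feats55, dtype=np.float32)
    if feats55.ndim != 2 or feats55.shape[1] != 55:
        raise ValueError(f"expected (frames, 55), got {feats55.shape}")
    return np.concatenate([feats55[:, CEPSTRUM], feats55[:, PITCH]], axis=1)
```

Going the other way (training Tacotron on all 55 values, as superhg2012 describes for T1) needs no slicing, only num_mels set to 55 and synthesis built without taco=1.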

CJai-K · 2021-06-15T23:41:36Z

Hello all,
I've been working on the Tacotron + LPCNet integration, but the synthesis is very noisy/robotic. Per the README, I run make dump_data to get the LPCNet training material, then make clean, make dump_data taco=1, and make test_lpcnet taco=1 for Tacotron training and synthesis.

I trained both models on the LJSpeech dataset and ended up with this alignment:

and with these synthesis results
sample.zip

I've heard some great results from others so I am wondering where I went wrong.

Thanks!

MlWoo mentioned this issue on May 17, 2019 in: dataset and preprocessing for tacotron2 + lpcnet #3 (Open)

MaisyZhang mentioned this issue on Jun 17, 2020 in: Tacotron Training ValueError: cannot reshape array of size 137996 into shape (20) alokprasad/LPCTron#10 (Open)
