scaling Mel Spectrogram output for Wavenet Vocoder #24

Closed
G-Wang opened this issue May 21, 2018 · 20 comments

@G-Wang

G-Wang commented May 21, 2018

Hello,

First of all thanks for the nice Tacotron 2 implementation.

I'm trying to use the trained Tacotron 2 outputs as inputs to r9y9's WaveNet vocoder. However, his pre-trained WaveNet expects mel spectrograms scaled to [0, 1].

What is the range for this Tacotron 2 implementation? I'm having a hard time finding this out so I can scale the outputs accordingly.

For reference, this is r9y9's normalization function, applied to the mel spectrogram before training, which scales it to [0, 1]:

def _normalize(S):
    return np.clip((S - hparams.min_level_db) / -hparams.min_level_db, 0, 1)

@rafaelvalle
Contributor

rafaelvalle commented May 21, 2018

Our [dynamic range compression](https://github.com/NVIDIA/tacotron2/blob/master/audio_processing.py#L78) just applies a log to clamped values. We also provide a dynamic range decompression function in the same file.
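
In essence, the pair looks something like this (a minimal sketch based on the description above; the exact defaults for C and clip_val live in audio_processing.py):

import torch

def dynamic_range_compression(x, C=1, clip_val=1e-5):
    # log of clamped values: the clamp keeps log() away from zero on silent frames
    return torch.log(torch.clamp(x, min=clip_val) * C)

def dynamic_range_decompression(x, C=1):
    # inverse of the compression above (exact up to the clamping)
    return torch.exp(x) / C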

I think the code below is what you're looking for.

# load the saved Tacotron 2 mel output and undo the log compression
mel = torch.load(conditional_path)
mel = dynamic_range_decompression(mel)
mel = mel.cpu().numpy()
mel = mel.transpose()

# re-normalize to the [0, 1] range that r9y9's vocoder expects
mel = audio._amp_to_db(mel) - hparams.ref_level_db
if not hparams.allow_clipping_in_normalization:
    assert mel.max() <= 0 and mel.min() - hparams.min_level_db >= 0
mel = audio._normalize(mel)
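
For readers without r9y9's repo handy, the audio helpers used above are roughly the following (a sketch only; min_level_db = -100 and ref_level_db = 20 are that repo's default hparams and are assumptions here, check its audio.py for the exact code):

import numpy as np

min_level_db = -100  # assumed default from r9y9's hparams
ref_level_db = 20    # assumed default from r9y9's hparams

def _amp_to_db(x):
    # amplitude -> decibels, floored so log10 never sees zero
    min_level = np.exp(min_level_db / 20 * np.log(10))
    return 20 * np.log10(np.maximum(min_level, x))

def _normalize(S):
    # map [min_level_db, 0] dB onto [0, 1], clipping anything outside
    return np.clip((S - min_level_db) / -min_level_db, 0, 1)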

@rafaelvalle
Contributor

Curious to hear your samples and to know if you trained with the most recent code that updates the input of the attention and decoder.

@G-Wang
Author

G-Wang commented May 22, 2018

Thanks, that worked (my model still needs some more training). I'm starting another run with the updated attention code, and will post back here with the entire pipeline (+ WaveNet vocoder) once that's done.

@G-Wang G-Wang closed this as completed May 22, 2018
@yliess86

Hi @G-Wang! Have you gone further with your training? I am currently doing the same thing as you, and the voice I get from WaveNet sounds as if it has the flu. Did you manage to get good results?

@rafaelvalle
Contributor

@yliess86 can you share the audio and the mel-spectrogram that sounds like it has the flu here?

@G-Wang
Author

G-Wang commented Jun 12, 2018

Hello @yliess86, due to my limited compute I had to set the batch size pretty low to fit everything on the GPU (batch size of 18). The network's loss was stuck around 0.63 for quite a while, so I didn't continue; however, that was an older version of the code. I've been running a new training with the latest code since yesterday.

But note that the solution @rafaelvalle provided works no problem once your network has been trained: I took the ground-truth mel spectrogram data that Tacotron 2 trains on, and it produces very good quality on r9y9's WaveNet vocoder (using the code above).

@yliess86

@G-Wang Ok thank you.
@rafaelvalle Sure! I will share audio and mel-spec today.

@yliess86

yliess86 commented Jun 13, 2018

Here are the text, the corresponding audio, and the corresponding mel-spectrogram:

  • 'This is an example of text to speech synthesis after 9 days training. This may sound awful, but it is a start.'
  • Audio
  • Mel-spectrogram (sorry, forgot to invert the vertical axis):
    Mel-Spec

@rafaelvalle
Contributor

rafaelvalle commented Jun 14, 2018

@yliess86 Can you share the mel-spectrogram file?
Using the WaveNet decoder is essential for good audio quality!

@yliess86

yliess86 commented Jun 14, 2018

@rafaelvalle I plugged the output of Tacotron 2 into the conversion pipeline you described above and then fed it into the r9y9 WaveNet. The image you can see is the plt plot I made just before the conversion. Do you want me to give you the output (mel-spec) as a .npy file, or something else?

# Tacotron 2
mel = taco(sentence)[0]

# Conversion
mel = dynamic_range_decompression(mel)
mel = mel.data.cpu().numpy()
mel = mel.transpose()
mel = audio._amp_to_db(mel) - hparams.ref_level_db
if not hparams.allow_clipping_in_normalization:
    assert mel.max() <= 0 and mel.min() - hparams.min_level_db >= 0
mel = audio._normalize(mel)

# WaveNet vocoder: conditioning features must be (T, num_mels)
if mel.shape[1] != hparams.num_mels:
    mel = np.swapaxes(mel, 0, 1)  # swapaxes is not in-place; assign the result
waveform = wavegen(self.model, c=mel, fast=True, tqdm=tqdm)
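
A quick sanity check before calling wavegen catches both failure modes discussed in this thread (a sketch; hparams.num_mels follows r9y9's naming):

# conditioning features should be (T, num_mels) and normalized to [0, 1]
assert mel.ndim == 2 and mel.shape[1] == hparams.num_mels, mel.shape
assert mel.min() >= 0.0 and mel.max() <= 1.0, (mel.min(), mel.max())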

@rafaelvalle
Contributor

Yes, please do share the mel-spec as a torch or npy file.

@yliess86

yliess86 commented Jun 14, 2018

Here is the (.npy) file: Mel-Spec

@rafaelvalle
Contributor

@yliess86 The model that produced this mel-spectrogram was not trained on LJ Speech dataset, right?

@yliess86

It was. I trained the model on the LJ Speech dataset. This mel spec is the result of 70,000 iterations on that dataset.

@rafaelvalle
Contributor

That's unexpected. Did you train using the default params?

@yliess86

yliess86 commented Jun 14, 2018

Yes, I just changed the batch size to 24. I'll try to download it again and retrain the model. It was the LJ Speech dataset from the link given in the Rayhane repository, so maybe it is not exactly the same.

@rafaelvalle
Contributor

What were the training and validation loss of the model used to produce that mel spectrogram?

@yliess86

The training loss was between 0.3 and 0.5, and the validation loss was 0.46.

@rafaelvalle
Contributor

rafaelvalle commented Jun 14, 2018

Don't retrain the model; the problem is with the mel-spectrogram representation.
Can you please submit a new issue? We'll post a solution there.

@mrgloom

mrgloom commented Mar 24, 2019

Is it possible to use the pretrained WaveNet models from https://github.com/r9y9/wavenet_vocoder with https://github.com/NVIDIA/tacotron2, or do they need to be retrained?
