Tacotron2 Issues with Inference and using a Custom Dataset #1070

@conceptofmind

Description

I believe I am running into an issue when training Tacotron2, both from scratch and from the pre-trained model.

I have collected 14 to 17 hours of pre-processed wav files of Obama speaking. Each file was first normalized with ffmpeg-normalize and then resampled to the recommended 22050 Hz.

I have ensured that:

  • the sampling rate of each wav file is 22050 Hz
  • there is only a single speaker: Obama
  • the speech covers a variety of phonemes
  • each audio file is split into segments of 10 seconds
  • none of the audio segments has silence at the beginning or end of the file
  • none of the audio segments contains long silences
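The sampling-rate and segment-length items in the list above can be checked automatically. This is a minimal sketch, assuming the segments are plain PCM wav files in one directory. The function names and the 0.5-second tolerance are hypothetical and not part of the original setup. It uses only the Python standard library, so it does not check for silence.

```python
import wave
from pathlib import Path

TARGET_RATE = 22050   # Hz, the sampling rate stated above
MAX_SECONDS = 10.0    # segment length stated above
TOLERANCE = 0.5       # hypothetical slack in seconds for segment boundaries


def inspect_wav(path):
    """Return (sample_rate, duration_in_seconds) for a PCM wav file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        seconds = wf.getnframes() / float(rate)
    return rate, seconds


def find_problems(wav_dir, target_rate=TARGET_RATE, max_seconds=MAX_SECONDS):
    """List (file_name, reason) for every wav file that fails a check."""
    problems = []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        rate, seconds = inspect_wav(path)
        if rate != target_rate:
            problems.append((path.name, f"sample rate is {rate} Hz"))
        if seconds > max_seconds + TOLERANCE:
            problems.append((path.name, f"duration is {seconds:.2f} s"))
    return problems
```

Calling `find_problems("wavs")` on the directory that holds the segments (the directory name here is an assumption) returns an empty list when every file passes both checks.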

Here is a link to a drive containing the wav files for inspection:

https://drive.google.com/drive/folders/17RoPoNhcU6ovW0BBkONt3WEXf6ZvuUwF?usp=download

Here is a link to both of the formatted .txt files (train and val):

Train .txt file: https://drive.google.com/file/d/1dxTkagpAT43jP06QAeODWS92GmuqdPqz/view?usp=sharing
Validation .txt file: https://drive.google.com/file/d/1dtaHPWTFdXLM1QdOVb2V9H2a_VMKVWRg/view?usp=sharing

I formatted the .txt files in the same way as the LJSpeech dataset and used wav2vec2.0 for the transcriptions. I removed any spaces at the start and end of each transcription and added a period to the end of each transcript. Each entry is on its own line.
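The formatting steps described above can be sketched as follows. This assumes each filelist line has the pipe-separated form `path|transcript`, which is a guess about the layout the training script expects and is not confirmed in this issue. The function names and the example path are hypothetical.

```python
def format_filelist_line(wav_path, transcript):
    """Build one filelist line, assuming the form `path|transcript`."""
    text = transcript.strip()  # remove spaces at the start and end
    if not text:
        raise ValueError(f"empty transcript for {wav_path}")
    if "|" in text:
        # The pipe is assumed to be the field separator, so it must not
        # appear inside the transcript itself.
        raise ValueError(f"transcript for {wav_path} contains '|'")
    if not text.endswith("."):
        text += "."  # add the trailing period
    return f"{wav_path}|{text}"


def write_filelist(pairs, out_path):
    """Write (wav_path, transcript) pairs to out_path, one entry per line."""
    lines = [format_filelist_line(path, text) for path, text in pairs]
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return lines
```

For example, `format_filelist_line("wavs/obama_0001.wav", "  yes we can ")` returns `wavs/obama_0001.wav|yes we can.`.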

The train.py script runs, and the directory paths and naming conventions are correct.

This is what the graph of the training inference looks like at epochs 0, 50, 100, and 250:

Epoch 0:

[image: training graph, epoch 0]

Epoch 50:

[image: training graph, epoch 50]

Epoch 100:

[image: training graph, epoch 100]

Epoch 250:

[image: training graph, epoch 250]

Is this how the charts should be looking? Any help would be appreciated!
