Description
I believe I am running into an issue when training both from scratch and from the pre-trained Tacotron 2 model.
I have collected roughly 14 to 17 hours of pre-processed wav files of Obama speaking. Each file was first normalized with ffmpeg-normalize and then resampled to the recommended 22050 Hz.
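For reference, here is a minimal sketch of the resampling step (done here with librosa/soundfile rather than ffmpeg; the directory names `wavs_normalized` and `wavs_22050` are placeholders):

```python
# Sketch of the resample step, assuming the loudness-normalized wavs live in
# "wavs_normalized/" and the 22050 Hz copies are written to "wavs_22050/".
import librosa
import soundfile as sf
from pathlib import Path

TARGET_SR = 22050  # sample rate recommended for Tacotron 2

def resample_wavs(in_dir: str, out_dir: str) -> None:
    """Resample every .wav in in_dir to 22050 Hz mono and write it to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for wav_path in sorted(Path(in_dir).glob("*.wav")):
        audio, _ = librosa.load(wav_path, sr=TARGET_SR, mono=True)  # load and resample
        sf.write(out / wav_path.name, audio, TARGET_SR)

if __name__ == "__main__":
    resample_wavs("wavs_normalized", "wavs_22050")
```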
I have ensured that (a quick verification script is sketched below):
- the sampling rate of each wav file is 22050 Hz
- there is only a single speaker: Obama
- the speech contains a variety of phonemes
- each audio file is split into segments of 10 seconds
- no audio segment has silence at the beginning or end of the file
- no audio segment contains long internal silences
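A script along these lines can double-check the items above; the directory name, 0.3 s silence tolerance, and 40 dB trim threshold are placeholders I chose, not values from this repo:

```python
# Quick sanity checks for the segments: sample rate, duration, and
# leading/trailing silence.
import librosa
from pathlib import Path

def check_clip(path, sr_expected=22050, max_len_s=10.0, top_db=40):
    """Return a list of problems found in one wav segment (empty list = OK)."""
    audio, sr = librosa.load(path, sr=None)  # keep the file's native sample rate
    problems = []
    if sr != sr_expected:
        problems.append(f"sample rate {sr} != {sr_expected}")
    duration = len(audio) / sr
    if duration > max_len_s:
        problems.append(f"length {duration:.1f}s > {max_len_s}s")
    # librosa.effects.trim returns the indices of the non-silent core; a large
    # gap on either side means the clip still has leading/trailing silence.
    _, (start, end) = librosa.effects.trim(audio, top_db=top_db)
    if start / sr > 0.3 or (len(audio) - end) / sr > 0.3:
        problems.append("more than 0.3s of silence at the start or end")
    return problems

if __name__ == "__main__":
    for wav in sorted(Path("wavs_22050").glob("*.wav")):
        issues = check_clip(wav)
        if issues:
            print(wav.name, "->", "; ".join(issues))
```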
Here is a link to a drive containing the wav files for inspection:
https://drive.google.com/drive/folders/17RoPoNhcU6ovW0BBkONt3WEXf6ZvuUwF?usp=download
Here are links to the two formatted .txt files (train and val):
Train .txt file: https://drive.google.com/file/d/1dxTkagpAT43jP06QAeODWS92GmuqdPqz/view?usp=sharing
Validation .txt file: https://drive.google.com/file/d/1dtaHPWTFdXLM1QdOVb2V9H2a_VMKVWRg/view?usp=sharing
I formatted the .txt files in the same way as the LJSpeech dataset and used wav2vec 2.0 for the transcriptions. I made sure that any spaces at the start and end of each transcription are removed, that a period is added at the end of each transcript, and that each transcript is on its own line.
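A small helper along these lines produces the filelists; the example path and transcript are hypothetical, and the exact column layout should match whatever the loader used by train.py expects:

```python
# Sketch of writing an LJSpeech-style filelist ("wav_path|transcript",
# one line per clip), with whitespace stripped and a terminal period added.
from pathlib import Path

def write_filelist(pairs, out_path):
    """pairs: iterable of (wav_path, raw_transcript) tuples."""
    lines = []
    for wav_path, text in pairs:
        text = text.strip()          # drop leading/trailing spaces
        if not text.endswith("."):
            text += "."              # make sure every transcript ends with a period
        lines.append(f"{wav_path}|{text}")
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")

if __name__ == "__main__":
    write_filelist(
        [("wavs_22050/obama_0001.wav", " good evening everybody ")],
        "filelists/obama_train.txt",
    )
```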
The train.py script runs without errors, and the directory paths and naming conventions are correct.
Here is what the training inference plots look like at epochs 0, 50, 100, and 250:
[Epoch 0 plot]
[Epoch 50 plot]
[Epoch 100 plot]
[Epoch 250 plot]
Is this how the charts should be looking? Any help would be appreciated!



