
Phrase break prediction for Text-to-Speech systems

This repository contains code to train speaker independent phrasing models for English Text-to-Speech systems. In text, phrase breaks are usually represented by punctuation. Typically, Text-to-Speech systems insert phrase breaks in the synthesized speech whenever they encounter a comma in the text to be synthesized.

Currently, the codebase supports two models:

1. A BLSTM token classification model using task-specific word embeddings trained from scratch
2. A fine-tuned BERT model with a token classification head

Given unpunctuated text as input, these models insert commas, and the punctuated text is then passed to the Text-to-Speech system for synthesis.
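Concretely, the task is framed as token classification: each word in the input is assigned a punctuation label, and the predicted labels are used to reinsert commas. A minimal illustration in Python (the label names here are hypothetical, not necessarily those used by this repository):

    # Illustrative only: each input word receives a punctuation label.
    words = ["hello", "world", "how", "are", "you"]
    labels = ["O", "COMMA", "O", "O", "O"]  # hypothetical label names

    # Reinserting the predicted punctuation yields: "hello world, how are you"
    punctuated = " ".join(
        word + ("," if label == "COMMA" else "")
        for word, label in zip(words, labels)
    )
    print(punctuated)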

The models are trained on the LibriTTS alignments available at kan-bayashi/LibriTTSLabel. The train-clean-360 split is used for training, while the dev-clean and test-clean splits are used for validation and testing, respectively.

Quick start

Download and preprocess the dataset

  1. Download the dataset kan-bayashi/LibriTTSLabel

  2. Preprocess the downloaded LibriTTSLabel dataset and transform it into a format suitable for the models

    python utils/build_libritts_label_dataset.py \
        --dataset_dir <Path to the downloaded dataset> \
        --output_dir <Output dir, where the transformed dataset will be written>
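    For example, with the dataset downloaded to ~/data/LibriTTSLabel (both paths here are placeholders):

    python utils/build_libritts_label_dataset.py \
        --dataset_dir ~/data/LibriTTSLabel \
        --output_dir ~/data/libritts_processed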

BLSTM token classification model using task-specific word embeddings trained from scratch

  1. Build vocabularies of words and punctuation marks from the processed dataset; these are used to train word embeddings from scratch

    python utils/build_vocab_blstm.py \
        --dataset_dir <Directory containing the processed dataset>

    Running this script will save the vocabulary files dataset_dir/vocab/words.txt and dataset_dir/vocab/puncs.txt, containing all the words and punctuation marks in the dataset, respectively. It will also save dataset_dir/vocab/params.json with some additional metadata.
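    These vocabulary files can be read back into index mappings for training; a minimal sketch, assuming one entry per line (the repository's own loading code may differ):

    from pathlib import Path

    def load_vocab(path):
        """Read one vocabulary entry per line into an entry -> index mapping."""
        entries = Path(path).read_text(encoding="utf-8").splitlines()
        return {entry: idx for idx, entry in enumerate(entries)}

    word2idx = load_vocab("dataset_dir/vocab/words.txt")
    punc2idx = load_vocab("dataset_dir/vocab/puncs.txt")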

  2. All model parameters and training hyperparameters are specified in config/blstm.json, which looks like this:

    {
        "embedding_dim": 50,
        "num_blstm_layers": 2,
        "blstm_layer_size": 512,
        "batch_size": 64,
        "lr": 1e-3,
        "num_epochs": 10
    }

To experiment with different model parameters or training hyperparameters, modify this file.
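As an illustration of how these values map onto the model, here is a minimal PyTorch sketch of a BLSTM token classifier built from this config; it is an orientation aid, not necessarily the repository's exact architecture:

    import json

    import torch.nn as nn

    # Load the training configuration shown above
    with open("config/blstm.json") as f:
        cfg = json.load(f)

    class BLSTMTagger(nn.Module):
        """Embed words, run a bidirectional LSTM, classify each token."""

        def __init__(self, vocab_size, num_puncs, cfg):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, cfg["embedding_dim"])
            self.blstm = nn.LSTM(
                cfg["embedding_dim"],
                cfg["blstm_layer_size"],
                num_layers=cfg["num_blstm_layers"],
                batch_first=True,
                bidirectional=True,
            )
            # A bidirectional LSTM emits 2 * hidden_size features per token
            self.classifier = nn.Linear(2 * cfg["blstm_layer_size"], num_puncs)

        def forward(self, token_ids):
            embedded = self.embedding(token_ids)  # (batch, seq, embedding_dim)
            hidden, _ = self.blstm(embedded)      # (batch, seq, 2 * hidden)
            return self.classifier(hidden)        # (batch, seq, num_puncs)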

  3. Train the model

    python train_blstm.py \
        --config_file <Path to file containing the model/training configuration to be loaded> \
        --dataset_dir <Directory containing the processed dataset> \
        --experiment_dir <Directory where training artifacts will be saved>
  4. Evaluate the model on the held-out test set

    python eval_blstm.py \
        --config_file <Path to file containing the model/training configuration to be loaded> \
        --dataset_dir <Directory containing the processed dataset> \
        --model_checkpoint <Path to the checkpoint containing the trained model to be used for eval>
  5. Generate punctuated text using the trained model

    python generate_blstm.py \
        --config_file <Path to file containing model configuration to be loaded> \
        --in_text_file <Path to text file containing unpunctuated text> \
        --vocab_dir <Directory containing vocab files used to train the model> \
        --model_checkpoint <Path to the checkpoint containing the trained model to be used for generation> \
        --out_text_file <Output file where punctuated text will be written>    
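For example (illustrative), if the file passed as --in_text_file contains the unpunctuated line

    hello world how are you

the model would write something like the following to --out_text_file:

    hello world, how are you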

Fine-tuned BERT model with a token classification head

  1. Build vocabularies of words and punctuation marks from the processed dataset, using the same script as in the BLSTM setup

    python utils/build_vocab_blstm.py \
        --dataset_dir <Directory containing the processed dataset>

    Running this script will save the vocabulary files dataset_dir/vocab/words.txt and dataset_dir/vocab/puncs.txt, containing all the words and punctuation marks in the dataset, respectively, as well as dataset_dir/vocab/params.json with some additional metadata. However, only dataset_dir/vocab/puncs.txt is used by this model.
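One standard way to set up such a model is with the Hugging Face transformers library. A minimal sketch, assuming bert-base-uncased and a hypothetical two-label punctuation set (in practice the labels come from dataset_dir/vocab/puncs.txt); this is an orientation aid, not necessarily the repository's exact code:

    import torch
    from transformers import BertForTokenClassification, BertTokenizerFast

    # Hypothetical label set; in practice read from dataset_dir/vocab/puncs.txt
    punc_labels = ["O", "COMMA"]

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertForTokenClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(punc_labels)
    )

    # BERT may split a word into several sub-word pieces; word_ids() maps each
    # piece back to its source word so that every word gets exactly one label.
    words = ["hello", "world", "how", "are", "you"]
    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")

    with torch.no_grad():
        logits = model(**encoding).logits  # (1, seq_len, num_labels)
    piece_preds = logits.argmax(dim=-1)[0].tolist()

    # Keep one prediction per word: the label of each word's first piece
    seen, word_labels = set(), []
    for idx, word_id in enumerate(encoding.word_ids()):
        if word_id is not None and word_id not in seen:
            seen.add(word_id)
            word_labels.append(punc_labels[piece_preds[idx]])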

References

  1. Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis
  2. An investigation of recurrent neural network architectures using word embeddings for phrase break prediction
