TatarTTS Dataset

TatarTTS is an open-source text-to-speech dataset for the Tatar language. The dataset comprises ~70 hours of transcribed audio recordings, featuring two professional speakers (one male and one female).

Paper on TechRxiv

TatarTTS: An Open-Source Text-to-Speech Synthesis Dataset for the Tatar Language

Paper on IEEE

TatarTTS: An Open-Source Text-to-Speech Synthesis Dataset for the Tatar Language

Setup and Requirements

We used the Piper text-to-speech system to train TTS models on our dataset.

sudo apt-get install python3-dev 
git clone https://github.com/rhasspy/piper.git
cd piper/src/python
python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install --upgrade wheel setuptools
pip3 install -e .

Please check the installation guide for more information.

Downloading the dataset

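The download link will be made available here soon. After downloading the dataset, unzip it inside the piper/src/python/ directory. The dataset is in the LJSpeech format: each speaker directory contains a wav folder with the recordings and a metadata.csv file with the transcripts.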

TatarTTS
|-male
  |-wav
    |-0.wav
    |-1.wav
    |-2.wav
    ...
  |-metadata.csv
|-female
  |-wav
    |-0.wav
    |-1.wav
    |-2.wav
    ...
  |-metadata.csv
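
Before pre-processing, it can be worth checking that every row of metadata.csv points at an existing audio file. The sketch below assumes the usual LJSpeech convention of pipe-delimited `id|text` rows with the audio stored at `wav/<id>.wav`; the function name `find_missing_wavs` is ours and is not part of Piper or the dataset.

```python
from pathlib import Path


def find_missing_wavs(speaker_dir):
    """Return ids listed in metadata.csv that have no matching wav/<id>.wav.

    Assumes pipe-delimited rows of the form `id|text` (LJSpeech convention).
    """
    speaker_dir = Path(speaker_dir)
    missing = []
    with open(speaker_dir / "metadata.csv", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            utt_id = line.split("|", 1)[0]
            if not (speaker_dir / "wav" / f"{utt_id}.wav").is_file():
                missing.append(utt_id)
    return missing
```

For example, `find_missing_wavs("TatarTTS/male")` should return an empty list if the archive was unpacked completely and the layout matches the assumption above.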

Pre-processing

cd piper/src/python
mkdir -p TatarTTS_piper/male TatarTTS_piper/female

Pre-processing the male speaker dataset

# Run from piper/src/python; the paths below are relative to that directory.
python3 -m piper_train.preprocess \
  --language tt \
  --input-dir TatarTTS/male \
  --output-dir TatarTTS_piper/male \
  --dataset-format ljspeech \
  --single-speaker \
  --sample-rate 22050

Pre-processing the female speaker dataset

# Run from piper/src/python; the paths below are relative to that directory.
python3 -m piper_train.preprocess \
  --language tt \
  --input-dir TatarTTS/female \
  --output-dir TatarTTS_piper/female \
  --dataset-format ljspeech \
  --single-speaker \
  --sample-rate 22050
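
The two pre-processing commands differ only in the speaker directory, so they can be generated from one helper. The following is a minimal sketch, not part of Piper: the function name `build_preprocess_cmd` and its default directory names are assumptions (the dataset unpacked as TatarTTS and the output written to TatarTTS_piper, both inside piper/src/python), while the flags are copied from the commands above.

```python
import sys


def build_preprocess_cmd(speaker, data_root="TatarTTS", out_root="TatarTTS_piper"):
    """Build the piper_train.preprocess argument list for one speaker."""
    if speaker not in ("male", "female"):
        raise ValueError(f"unknown speaker: {speaker!r}")
    return [
        sys.executable, "-m", "piper_train.preprocess",
        "--language", "tt",
        "--input-dir", f"{data_root}/{speaker}",
        "--output-dir", f"{out_root}/{speaker}",
        "--dataset-format", "ljspeech",
        "--single-speaker",
        "--sample-rate", "22050",
    ]
```

With the virtual environment active and piper/src/python as the working directory, the list can be passed to `subprocess.run(build_preprocess_cmd("male"), check=True)`.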

Training

cd piper/src/python

Training on the male speaker dataset

python3 -m piper_train \
    --dataset-dir TatarTTS_piper/male \
    --accelerator 'gpu' \
    --devices 1 \
    --batch-size 32 \
    --validation-split 0.0 \
    --num-test-examples 0 \
    --max_epochs 1000 \
    --checkpoint-epochs 1 \
    --precision 32

Training on the female speaker dataset

python3 -m piper_train \
    --dataset-dir TatarTTS_piper/female \
    --accelerator 'gpu' \
    --devices 1 \
    --batch-size 32 \
    --validation-split 0.0 \
    --num-test-examples 0 \
    --max_epochs 1000 \
    --checkpoint-epochs 1 \
    --precision 32
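
With `--checkpoint-epochs 1`, a checkpoint is saved after every epoch, so finding the most recent one for export is a common need. The sketch below assumes the checkpoints are `.ckpt` files somewhere under the dataset directory (Piper's trainer is built on PyTorch Lightning, which typically writes them under a lightning_logs subdirectory); the helper name `latest_checkpoint` is hypothetical.

```python
from pathlib import Path


def latest_checkpoint(training_dir):
    """Return the most recently modified .ckpt file under training_dir, or None."""
    checkpoints = list(Path(training_dir).rglob("*.ckpt"))
    if not checkpoints:
        return None
    return max(checkpoints, key=lambda p: p.stat().st_mtime)
```

For example, `latest_checkpoint("TatarTTS_piper/male")` would return the newest checkpoint for the male speaker, or `None` if training has not yet written one.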

Exporting a Model

python3 -m piper_train.export_onnx \
    /path/to/model.ckpt \
    /path/to/model.onnx
    
cp /path/to/training_dir/config.json \
   /path/to/model.onnx.json
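
Piper expects the JSON config to sit next to the exported model, which is why the config is copied to `model.onnx.json`. As a quick sanity check, the sketch below confirms that both files exist and reads the sample rate from the config. It assumes the rate is stored under `audio.sample_rate` in the JSON file; that key layout and the helper name `check_exported_voice` are assumptions, not something this README specifies.

```python
import json
from pathlib import Path


def check_exported_voice(onnx_path):
    """Check that model.onnx and model.onnx.json both exist; return the sample rate.

    Assumes the config stores the rate under config["audio"]["sample_rate"].
    """
    onnx_path = Path(onnx_path)
    config_path = Path(str(onnx_path) + ".json")
    if not onnx_path.is_file():
        raise FileNotFoundError(onnx_path)
    if not config_path.is_file():
        raise FileNotFoundError(config_path)
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    return config["audio"]["sample_rate"]
```

For models trained with the commands in this README, the returned value would be expected to match the `--sample-rate 22050` used during pre-processing.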

Speech Synthesis with Pre-trained Models

Download and unzip pre-trained models (.onnx, .ckpt) for both speakers from Google Drive.

CLI

cd models

echo 'Аның чыраенда тәвәккәллек чагыла иде.' | ./piper --model male/male.onnx --config male/config.json --output_file welcome_male.wav

echo 'Аның чыраенда тәвәккәллек чагыла иде.' | ./piper --model female/female.onnx --config female/config.json --output_file welcome_female.wav

Python

cd piper/src/python_run

echo 'Аның чыраенда тәвәккәллек чагыла иде.' | python3 -m piper --model /path/to/model/.onnx --config /path/to/model/config.json --output-file welcome.wav
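
To confirm that synthesis produced sensible audio, the output file can be inspected with Python's standard `wave` module. This is a minimal sketch; the helper name `wav_info` is ours. For models trained as described in this README, a sample rate of 22050 Hz would be the expected value, though that depends on the model config.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav_file:
        rate = wav_file.getframerate()
        frames = wav_file.getnframes()
    return rate, frames / rate
```

For example, `wav_info("welcome.wav")` returns the sample rate and the length of the synthesized utterance in seconds.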

Authors and Citation

The project was developed as an academic collaboration between ISSAI and the Institute of Applied Semiotics of the Tatarstan Academy of Sciences.

@INPROCEEDINGS{10463261,
  author={Orel, Daniil and Kuzdeuov, Askat and Gilmullin, Rinat and Khakimov, Bulat and Varol, Huseyin Atakan},
  booktitle={2024 International Conference on Artificial Intelligence in Information and Communication (ICAIIC)}, 
  title={TatarTTS: An Open-Source Text-to-Speech Synthesis Dataset for the Tatar Language}, 
  year={2024},
  volume={},
  number={},
  pages={717-721},
  doi={10.1109/ICAIIC60209.2024.10463261}}

References

Piper: https://github.com/rhasspy/piper
Pre-processing, training, and exporting: https://github.com/rhasspy/piper/blob/master/TRAINING.md
