TTS-EgyptianArabic-Tacotron2

Tacotron2 text-to-speech models trained on the EGYARA dataset from the MASRY TTS paper, together with the HiFi-GAN vocoder for direct TTS inference.

Papers:

Tacotron2 | Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions (arXiv)

MASRY TTS | Masry: A Text-to-Speech System for the Egyptian Arabic (SCITEPRESS)

HiFi-GAN | HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis (arXiv)

Quick Setup

Required packages: torch, torchaudio, pyyaml

Additional packages for training: librosa, matplotlib, tensorboard
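Before running anything, you can check that the dependencies above are importable. The snippet below is a minimal sketch using importlib. It is not a script shipped with the repository, and the helper name missing_packages is hypothetical. Note that pyyaml is imported under the module name yaml.

```python
import importlib.util

# Packages listed above; the training extras are only needed for train.py.
INFERENCE_PACKAGES = ["torch", "torchaudio", "yaml"]  # pyyaml is imported as "yaml"
TRAINING_PACKAGES = ["librosa", "matplotlib", "tensorboard"]


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for label, names in (("inference", INFERENCE_PACKAGES), ("training", TRAINING_PACKAGES)):
        missing = missing_packages(names)
        status = "OK" if not missing else "missing: " + ", ".join(missing)
        print(f"{label}: {status}")
```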

Download the pretrained weights for the Egyptian Arabic Tacotron2 model (https://drive.google.com/file/d/1etruUB2hNsYfvn5_zsDrQM6uVJW62u8u/view?usp=drive_link) and place them in the pretrained folder.

We used a diacritization model from Camel Tools (https://github.com/CAMeL-Lab/camel_tools) to diacritize Egyptian Arabic.

Download the HiFi-GAN vocoder weights (link). Then either place them in pretrained/hifigan-asc-v1, or edit the following lines in configs/basic.yaml to point to their location.

# vocoder
vocoder_state_path: pretrained/hifigan-asc-v1/hifigan-asc.pth
vocoder_config_path: pretrained/hifigan-asc-v1/config.json
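If you store the weights elsewhere, it helps to confirm that the two paths in configs/basic.yaml resolve to real files before loading a model. This is a minimal sketch. It assumes the config is a YAML mapping containing the two keys shown above, and the helper names load_config and missing_vocoder_files are hypothetical.

```python
from pathlib import Path

VOCODER_KEYS = ("vocoder_state_path", "vocoder_config_path")


def load_config(path):
    """Parse a YAML config file into a dict (PyYAML is imported lazily)."""
    import yaml  # provided by the pyyaml package

    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)


def missing_vocoder_files(cfg, root="."):
    """Return "key: value" entries for vocoder paths that are unset or not files.

    Relative paths are resolved against `root` (e.g. the repository root).
    """
    missing = []
    for key in VOCODER_KEYS:
        value = cfg.get(key)
        if value is None or not (Path(root) / value).is_file():
            missing.append(f"{key}: {value}")
    return missing


# Usage, from the repository root:
#   cfg = load_config("configs/basic.yaml")
#   print(missing_vocoder_files(cfg) or "vocoder files found")
```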

Using the models

The Tacotron2 and Tacotron2Wave classes in models.tacotron2 are wrappers that simplify inference. Tacotron2 performs text-to-mel inference, while Tacotron2Wave also includes the HiFi-GAN vocoder for direct text-to-speech inference.

Inferring the Mel spectrogram

from models.tacotron2 import Tacotron2
model = Tacotron2('pretrained/tacotron2_ar_adv.pth')
model = model.cuda()
mel_spec = model.ttmel("ازيك عامل ايه")

End-to-end Text-to-Speech

from models.tacotron2 import Tacotron2Wave
model = Tacotron2Wave('pretrained/tacotron2_ar_adv.pth')
model = model.cuda()
wave = model.tts("اَزيك عامل ايه")
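To listen to the result, you need to write the waveform to an audio file. The sketch below uses only the standard-library wave module. It assumes that model.tts returns a mono float signal with values in [-1, 1], so a tensor would be passed as wave.squeeze().cpu().tolist(). It also assumes that the sample rate matches the HiFi-GAN config. The 22050 Hz default is a placeholder, not a value taken from this repository, and save_wav is a hypothetical helper.

```python
import struct
import wave as wave_io  # aliased so it does not clash with the `wave` variable above


def save_wav(samples, path, sample_rate=22050):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file.

    sample_rate=22050 is a placeholder assumption; use the rate from the vocoder config.
    """
    # Clip to [-1, 1] and scale to the signed 16-bit integer range.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave_io.open(str(path), "wb") as f:
        f.setnchannels(1)  # mono
        f.setsampwidth(2)  # 2 bytes = 16-bit samples
        f.setframerate(sample_rate)
        # WAV data is little-endian, hence the explicit "<" in the format string.
        f.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Usage, continuing the example above (assumed tensor output):
#   save_wav(wave.squeeze().cpu().tolist(), "samples/azayak.wav")
```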

By default, Arabic-script input is converted to Buckwalter transliteration. You can also pass Buckwalter-transliterated text directly. If no Arabic script will be used, set arabic_in=False.

Inference from text file

python inference.py
# default parameters:
python inference.py --list data/infer_text.txt --out_dir samples/results --model tacotron2 --checkpoint pretrained/tacotron2_ar_adv.pth --batch_size 2 --denoise 0
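The --list and --batch_size options suggest that the sentences in the text file are synthesized in groups. The sketch below shows one way to read and group such a file. It assumes one utterance per line and ignores blank lines. This is an assumption about the format of data/infer_text.txt, not behavior taken from inference.py, and read_batches is a hypothetical helper.

```python
def read_batches(path, batch_size=2):
    """Read non-empty lines from a UTF-8 text file and group them into batches.

    Assumes one utterance per line; the real format of data/infer_text.txt may differ.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    with open(path, encoding="utf-8") as f:
        lines = [line.strip() for line in f if line.strip()]
    # Slice the list into consecutive chunks; the last chunk may be shorter.
    return [lines[i:i + batch_size] for i in range(0, len(lines), batch_size)]


# Usage:
#   for batch in read_batches("data/infer_text.txt", batch_size=2):
#       print(batch)
```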

Testing the model

To test the model, run:

python test.py
# default parameters:
python test.py --model tacotron2 --checkpoint pretrained/tacotron2_ar_adv.pth --out_dir samples/test

Training the model

Before training, the audio files must be resampled. The pretrained model was trained on files preprocessed with scripts/preprocess_audio.py.
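Because the audio must be resampled before training, a quick sanity check is to list any WAV files whose sample rate differs from the target. This is a minimal sketch using the standard-library wave module, and it does not replace scripts/preprocess_audio.py. The target rate should presumably match the training config (e.g. configs/EGYARA.yaml). The 22050 Hz default below is only a placeholder assumption, and files_with_wrong_rate is a hypothetical helper.

```python
import wave
from pathlib import Path


def files_with_wrong_rate(audio_dir, expected_rate=22050):
    """Return (path, rate) pairs for WAV files under audio_dir not at expected_rate.

    expected_rate=22050 is a placeholder; use the rate the training config expects.
    Note: the wave module reads PCM WAV only; other encodings raise wave.Error.
    """
    mismatches = []
    for path in sorted(Path(audio_dir).rglob("*.wav")):
        with wave.open(str(path), "rb") as f:
            rate = f.getframerate()
        if rate != expected_rate:
            mismatches.append((str(path), rate))
    return mismatches
```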

To train the model with the options specified in the config file, run:

python train.py
# default parameters:
python train.py --config configs/EGYARA.yaml

Repository: bluehybrid/TTS_EgyptianArabic_Tacotron2

Top-level contents: configs, data, models, pretrained, scripts, text, utils, vocoder/hifigan, README.md, inference.py, test.py, train.py
