TTS models (Tacotron2), trained on EGYARA dataset from MASRY TTS paper including the HiFi-GAN vocoder for direct TTS inference.
Papers:
Tacotron2 | Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions (arXiv)
MASRY TTS | Masry: A Text-to-Speech System for the Egyptian Arabic (SCITEPRESS)
HiFi-GAN | HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis (arXiv)
Required packages:
torch torchaudio pyyaml
~ for training: librosa matplotlib tensorboard
Download the pretrained weights for the Tacotron2 model for Egyptian Arabic (https://drive.google.com/file/d/1etruUB2hNsYfvn5_zsDrQM6uVJW62u8u/view?usp=drive_link) then put it in pretrained folder
We used a diacritization model from Camel Tools (https://github.com/CAMeL-Lab/camel_tools) to diacritize Egyptian Arabic.
Download the HiFi-GAN vocoder weights (link). Either put them into pretrained/hifigan-asc-v1
or edit the following lines in configs/basic.yaml
.
# vocoder
vocoder_state_path: pretrained/hifigan-asc-v1/hifigan-asc.pth
vocoder_config_path: pretrained/hifigan-asc-v1/config.json
The Tacotron2
from models.tacotron2
are wrappers that simplify text-to-mel inference. The Tacotron2Wave
models includes the HiFi-GAN vocoder for direct text-to-speech inference.
from models.tacotron2 import Tacotron2
model = Tacotron2('pretrained/tacotron2_ar_adv.pth')
model = model.cuda()
mel_spec = model.ttmel("ازيك عامل ايه")
from models.tacotron2 import Tacotron2Wave
model = Tacotron2Wave('pretrained/tacotron2_ar_adv.pth')
model = model.cuda()
wave = model.tts("اَزيك عامل ايه")
By default, Arabic letters are converted using the Buckwalter transliteration. The transliteration can also be used directly. If no Arabic script is expected to be used you can set arabic_in=False
.
python inference.py
# default parameters:
python inference.py --list data/infer_text.txt --out_dir samples/results --model tacotron2 --checkpoint pretrained/tacotron2_ar_adv.pth --batch_size 2 --denoise 0
To test the model run:
python test.py
# default parameters:
python test.py --model tacotron2 --checkpoint pretrained/tacotron2_ar_adv.pth --out_dir samples/test
Before training, the audio files must be resampled. The model was trained after preprocessing the files using scripts/preprocess_audio.py
.
To train the model with options specified in the config file run:
python train.py
# default parameters:
python train.py --config configs/EGYARA.yaml