Persian FastPitch is a fast, efficient text-to-speech (TTS) model that generates mel-spectrograms from Persian text using the FastPitch architecture. FastPitch, originally developed by NVIDIA, is considerably faster than autoregressive Tacotron-style models because it generates all mel-spectrogram frames in parallel, and it predicts pitch explicitly.
This implementation is based on NVIDIA's FastPitch repository, with the following language-specific changes to handle Persian phonetics:
- Prepare Persian Data: Collect Persian audio files and generate the corresponding phoneme sequence for each one. The model is trained on phonemes rather than raw text, which improves accuracy because Persian script usually leaves short vowels unwritten.
- Resolve Colab Error in data_function.py: Minor edits to address Colab compatibility issues. See issue #1016 for more details.
- Update cleaners.py in common/text/: Adapt character handling to Persian phonemes.
- Customize Training Parameters: Modify scripts/train.sh and train.py to fit Persian data.
- Adjust Inference Parameters: Update scripts/inference_example.sh for Persian-specific inference.
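To make the motivation for phoneme input concrete, the sketch below shows how one written Persian form can correspond to several pronunciations once the unwritten short vowels are filled in. The tiny lexicon, the romanized readings, and the function names are illustrative assumptions, not part of this repository or its phoneme set.

```python
# Hypothetical mini-lexicon: written form -> possible romanized readings.
# Short vowels are not written, so the spelling alone does not pick a reading.
READINGS = {
    "گل": ["gol", "gel"],               # flower / mud
    "کرم": ["kerm", "karam", "kerem"],  # worm / generosity / cream
    "آب": ["ab"],                       # water (single reading)
}


def known_readings(word):
    """Return the readings listed for a written word (empty if unlisted)."""
    return READINGS.get(word, [])


def is_ambiguous(word):
    """True when the spelling maps to more than one listed reading."""
    return len(known_readings(word)) > 1
```

A text-only front end would have to guess among these readings, whereas a phoneme transcription states the intended one directly.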
- Clone this Repository
    git clone https://github.com/Adibian/Persian-FastPitch.git
    cd Persian-FastPitch
- Install Requirements
    pip install -r requirements.txt
- Add Your Data
  - Place audio files in wavs/
  - Add training and validation phoneme transcriptions to filelists/
  - Add test phoneme transcriptions to phrases/
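As a rough aid for this step, the sketch below writes a filelist and reports entries whose audio file is missing. It assumes the pipe-separated `audio_path|phonemes` line format used by the upstream NVIDIA filelists, and that audio paths are relative to the directory later passed as `--dataset-path`; check both against the sample files in filelists/ before relying on it. The helper names and the example phoneme strings are hypothetical.

```python
from pathlib import Path


def write_filelist(entries, filelist_path):
    """Write (audio_path, phonemes) pairs as 'audio_path|phonemes' lines.

    ASSUMPTION: pipe-separated format, as in upstream FastPitch filelists.
    """
    lines = []
    for audio_path, phonemes in entries:
        if "|" in audio_path or "|" in phonemes:
            raise ValueError("fields must not contain the '|' separator")
        # Collapse repeated whitespace between phoneme symbols.
        lines.append(f"{audio_path}|{' '.join(phonemes.split())}")
    target = Path(filelist_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def missing_audio(filelist_path, dataset_root="wavs"):
    """Return audio paths listed in the filelist that do not exist on disk.

    ASSUMPTION: paths in the filelist are relative to dataset_root.
    """
    missing = []
    text = Path(filelist_path).read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        audio_path = line.split("|", 1)[0]
        if not (Path(dataset_root) / audio_path).is_file():
            missing.append(audio_path)
    return missing
```

Running `missing_audio("filelists/audio_text_train.txt")` before pitch extraction gives an early list of broken entries instead of a failure partway through the dataset.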
- Extract Pitch
  - Run this command to extract pitch values for the audio files
      python prepare_dataset.py \
          --wav-text-filelists filelists/audio_text_train.txt \
                               filelists/audio_text_val.txt \
          --n-workers 16 \
          --batch-size 1 \
          --dataset-path 'wavs/' \
          --extract-pitch \
          --f0-method pyin
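For intuition about what a per-frame pitch value is, the sketch below estimates the fundamental frequency (F0) of a single audio frame with plain autocorrelation. This is a deliberately simpler technique than the pYIN method selected by `--f0-method pyin`, and it is not what prepare_dataset.py runs; the function name and default frequency range are illustrative assumptions.

```python
import math


def estimate_f0(frame, sample_rate, fmin=60.0, fmax=500.0):
    """Estimate F0 in Hz for one frame via autocorrelation (NOT pYIN).

    Returns 0.0 when no lag has positive correlation (for example, silence).
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(len(frame) - 1, int(sample_rate / fmin))
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def sine_frame(freq_hz, sample_rate, n_samples):
    """Synthetic test signal: a pure sine wave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]
```

For example, `estimate_f0(sine_frame(220.0, 22050, 2048), 22050)` lands within a few hertz of 220, limited by the integer lag resolution. pYIN adds probabilistic voicing decisions and smoothing over time on top of this basic idea of finding the period of the signal.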
- Install Additional Dependencies
  - Run the following to install NVIDIA Apex and download CMUdict:
      git clone https://github.com/NVIDIA/apex
      cd apex; pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
      cd ..  # return to the repository root so the scripts/ path resolves
      bash scripts/download_cmudict.sh
- Train the Model
  - Use this command to start training. Checkpoints will be saved in output/:
      bash scripts/train.sh
- Download WaveGlow Vocoder
  - WaveGlow is required to convert mel-spectrograms into audio. Download it using:
      bash scripts/download_waveglow.sh
- Run Inference
  - To synthesize audio for a test file in phrases/, run:
      bash scripts/inference_example.sh
  - The synthesized audio will be saved in output/audio_test_file/.
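After inference, a quick sanity check is to list the generated files and their durations. The sketch below does this with Python's standard-library `wave` module. It assumes the outputs are PCM-encoded WAV files; `wave` cannot read other encodings such as 32-bit float, so those files are reported with a duration of `None` rather than raising. The function name is hypothetical.

```python
import wave
from pathlib import Path


def wav_durations(audio_dir):
    """Map each *.wav file name in audio_dir to its duration in seconds.

    Files the `wave` module cannot parse (non-PCM or corrupt) map to None.
    """
    durations = {}
    for wav_path in sorted(Path(audio_dir).glob("*.wav")):
        try:
            with wave.open(str(wav_path), "rb") as wav_file:
                frames = wav_file.getnframes()
                rate = wav_file.getframerate()
                durations[wav_path.name] = frames / rate
        except (wave.Error, EOFError):
            durations[wav_path.name] = None
    return durations
```

Calling `wav_durations("output/audio_test_file")` returns a dictionary that makes empty or suspiciously short outputs easy to spot.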