Persian FastPitch

Persian FastPitch is a fast and efficient TTS model that generates mel-spectrograms from Persian text using the FastPitch architecture. FastPitch, originally developed by NVIDIA, generates all mel-spectrogram frames in parallel rather than autoregressively, which makes it considerably faster than Tacotron-based models. It also predicts pitch explicitly, so pitch can be conditioned on and controlled at synthesis time.

This implementation is based on NVIDIA's FastPitch repository, adapted for Persian TTS. The changes are language-specific adjustments that handle Persian phonetics.

Key Modifications

- Prepare Persian Data: Collect Persian audio files and generate the corresponding phoneme sequence for each one. The model is trained on phonemes rather than raw text because Persian script usually omits short vowels, so the written form alone does not determine the pronunciation.
- Resolve Colab Error in data_function.py: Minor edits to address Colab compatibility issues. See issue #1016 for more details.
- Update cleaners.py in common/text/: Adapt character handling to Persian phonemes.
- Customize Training Parameters: Modify scripts/train.sh and train.py to fit Persian data.
- Adjust Inference Parameters: Update scripts/inference_example.sh for Persian-specific inference.
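
As a rough illustration of the cleaner change, here is a minimal sketch of a phoneme-level cleaner. The function name `persian_phoneme_cleaners` and the symbol set `PHONEME_SYMBOLS` are hypothetical and chosen only for this example; the actual code in common/text/cleaners.py may use a different inventory and different rules.

```python
import re

# Hypothetical phoneme inventory for illustration only; replace it with the
# symbols your phonemizer actually emits.
PHONEME_SYMBOLS = set("abdefghijklmnopqrstuvxyzACSZ?")
_whitespace_re = re.compile(r"\s+")


def persian_phoneme_cleaners(text: str) -> str:
    """Normalize a phoneme string: drop unknown symbols, collapse whitespace."""
    # Keep only known phoneme symbols and whitespace.
    kept = "".join(ch for ch in text if ch in PHONEME_SYMBOLS or ch.isspace())
    # Collapse runs of whitespace into single spaces and trim the ends.
    return _whitespace_re.sub(" ", kept).strip()


if __name__ == "__main__":
    print(persian_phoneme_cleaners("  salAm   donyA! "))
```

Unlike the English cleaners, a phoneme cleaner of this kind does no lowercasing or abbreviation expansion, since case may distinguish phonemes and the input is already a pronunciation.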

How to Use

Clone this Repository

    git clone https://github.com/Adibian/Persian-FastPitch.git
    cd Persian-FastPitch

Install Requirements
```
pip install -r requirements.txt
```
Add Your Data
- Place audio files in wavs/
- Add training and validation phoneme transcriptions to filelists/
- Add test phoneme transcriptions to phrases/
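
Before running the later steps, it can help to sanity-check the filelists. The sketch below assumes each line has the form `<wav path>|<phoneme transcription>`, the convention used by the upstream NVIDIA filelists, and that the wav paths are relative to wavs/. Both are assumptions, and `check_filelist` is a hypothetical helper that is not part of this repository.

```python
from pathlib import Path


def check_filelist(filelist_path, dataset_root="wavs"):
    """Return a list of human-readable problems found in a filelist.

    Assumes one utterance per line in the form `<wav path>|<phoneme string>`.
    """
    problems = []
    lines = Path(filelist_path).read_text(encoding="utf-8").splitlines()
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("|")
        if len(parts) != 2:
            problems.append(f"line {lineno}: expected 2 fields, got {len(parts)}")
            continue
        wav, phonemes = parts
        if not phonemes.strip():
            problems.append(f"line {lineno}: empty transcription")
        if not (Path(dataset_root) / wav).is_file():
            problems.append(f"line {lineno}: missing audio file {wav}")
    return problems


if __name__ == "__main__":
    for name in ("filelists/audio_text_train.txt", "filelists/audio_text_val.txt"):
        if Path(name).is_file():
            for problem in check_filelist(name):
                print(f"{name}: {problem}")
```

If your filelist paths already include the wavs/ prefix, pass `dataset_root="."` instead.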

Extract Pitch

Run this command to extract pitch values for the audio files:

    python prepare_dataset.py \
        --wav-text-filelists filelists/audio_text_train.txt \
                             filelists/audio_text_val.txt \
        --n-workers 16 \
        --batch-size 1 \
        --dataset-path 'wavs/' \
        --extract-pitch \
        --f0-method pyin
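
After extraction finishes, a quick check can confirm that every utterance received a pitch file. The sketch below assumes pitch tensors are written to `<dataset-path>/pitch/<wav stem>.pt`, as in the upstream FastPitch scripts; verify the location in your checkout, since this layout is an assumption. `missing_pitch_files` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def missing_pitch_files(filelist_path, pitch_dir="wavs/pitch"):
    """List utterances from a filelist that have no matching pitch tensor.

    Assumes pitch is stored as `<pitch_dir>/<wav stem>.pt`.
    """
    missing = []
    for line in Path(filelist_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # ignore blank lines
        wav = line.split("|", 1)[0]  # first field is the audio path
        if not (Path(pitch_dir) / f"{Path(wav).stem}.pt").is_file():
            missing.append(wav)
    return missing


if __name__ == "__main__":
    for name in ("filelists/audio_text_train.txt", "filelists/audio_text_val.txt"):
        if Path(name).is_file():
            print(name, "missing pitch for:", missing_pitch_files(name))
```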

Install Additional Dependencies

Run the following to install NVIDIA Apex and download CMUdict:

    git clone https://github.com/NVIDIA/apex
    cd apex
    pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
    # Return to the repository root so the script path below resolves
    cd ..
    bash scripts/download_cmudict.sh

Train the Model
- Use this command to start training. Checkpoints will be saved in output/:
```
bash scripts/train.sh
```
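
To pick a checkpoint for inference, a small helper can find the most recent one. This sketch assumes checkpoints are named `FastPitch_checkpoint_<epoch>.pt`, the naming used by the upstream training script; check the contents of output/ to confirm. `latest_checkpoint` is a hypothetical helper, not part of the repository.

```python
import re
from pathlib import Path


def latest_checkpoint(output_dir="output"):
    """Return the checkpoint with the highest epoch number, or None."""
    pattern = re.compile(r"FastPitch_checkpoint_(\d+)\.pt$")
    best_epoch, best_path = -1, None
    for path in Path(output_dir).glob("FastPitch_checkpoint_*.pt"):
        match = pattern.search(path.name)
        # Compare epochs numerically so that epoch 100 sorts after epoch 99.
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path


if __name__ == "__main__":
    print(latest_checkpoint())
```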
Download WaveGlow Vocoder
- WaveGlow is required to convert mel-spectrograms into audio. Download it using:
```
bash scripts/download_waveglow.sh
```
Run Inference
- To synthesize audio for a test file in phrases/, run:
```
bash scripts/inference_example.sh
```
- The synthesized audio will be saved in output/audio_test_file/.
