GitHub - Adibian/Persian-MultiSpeaker-Tacotron2: Implementation of Transfer Learning from Speaker Verification to Multi-speaker Text-To-Speech Synthesis (SV2TTS) in Persian language.

MultiSpeaker Tacotron2 for Persian Language

This repository contains a Persian language adaptation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS). The core implementation is based on this repository, modified to work with Persian text and phoneme data.

Quickstart

Data Structure

Organize your data as follows:

dataset/persian_data/
    train_data/
        speaker1/book-1/
            sample1.txt
            sample1.wav
            ...
        ...
    test_data/
        ...
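
Before preprocessing, it can help to confirm that every audio file has a transcript next to it. The following is a minimal sketch, not part of the repository: the function name find_unpaired_samples is hypothetical, and it assumes only the layout shown above (each sampleN.wav sits beside a sampleN.txt somewhere under a speaker/book folder).

```python
from pathlib import Path


def find_unpaired_samples(subfolder_root):
    """Return (wavs_without_txt, txts_without_wav) found under subfolder_root.

    Assumption: a sample is a .wav file with a same-named .txt transcript
    in the same directory, as in the tree above.
    """
    root = Path(subfolder_root)
    # Compare files by their path with the extension removed.
    wav_stems = {p.with_suffix("") for p in root.rglob("*.wav")}
    txt_stems = {p.with_suffix("") for p in root.rglob("*.txt")}
    wavs_without_txt = sorted(str(p) + ".wav" for p in wav_stems - txt_stems)
    txts_without_wav = sorted(str(p) + ".txt" for p in txt_stems - wav_stems)
    return wavs_without_txt, txts_without_wav
```

Calling find_unpaired_samples on the train_data folder of your dataset returns two lists of paths; both should be empty if the data is organized as described.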

Preprocessing

Audio Preprocessing

python synthesizer_preprocess_audio.py dataset --datasets_name persian_data --subfolders train_data --no_alignments

Embedding Preprocessing

python synthesizer_preprocess_embeds.py dataset/SV2TTS/synthesizer

Train the Synthesizer

To begin training the synthesizer model:

python synthesizer_train.py my_run dataset/SV2TTS/synthesizer

Inference

To generate a wav file, place all trained models in the saved_models/final_models directory. If you haven’t trained the speaker encoder or vocoder models, you can use pretrained models from saved_models/default.

Using WavRNN as Vocoder

python inference.py --vocoder "WavRNN" --text "یک نمونه از خروجی" --ref_wav_path "/path/to/sample/reference.wav" --test_name "test1"

Using HiFiGAN as Vocoder (Recommended)

WavRNN is an older vocoder. To use HiFiGAN instead, you must first download a HiFiGAN model pretrained on English.

Install Parallel WaveGAN

pip install parallel_wavegan

Download Pretrained HiFiGAN Model

from parallel_wavegan.utils import download_pretrained_model
download_pretrained_model("vctk_hifigan.v1", "saved_models/final_models/vocoder_HiFiGAN")

Run Inference with HiFiGAN

python inference.py --vocoder "HiFiGAN" --text "یک نمونه از خروجی" --ref_wav_path "/path/to/sample/reference.wav" --test_name "test1"
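
To synthesize several sentences with the same reference voice, the command above can be wrapped in a small script. This is a sketch under stated assumptions: build_inference_command and run_batch are hypothetical helpers that do not exist in the repository, they use only the flags shown above, and they assume the script is run from the repository root so that inference.py resolves.

```python
import subprocess
import sys


def build_inference_command(text, ref_wav_path, test_name, vocoder="HiFiGAN"):
    """Build the argument list for one inference.py call (flags as shown above)."""
    return [
        sys.executable, "inference.py",
        "--vocoder", vocoder,
        "--text", text,
        "--ref_wav_path", ref_wav_path,
        "--test_name", test_name,
    ]


def run_batch(sentences, ref_wav_path, vocoder="HiFiGAN", dry_run=True):
    """Build one command per sentence, named test1, test2, ...

    With dry_run=True the commands are only returned, not executed.
    """
    commands = [
        build_inference_command(text, ref_wav_path, f"test{i}", vocoder)
        for i, text in enumerate(sentences, start=1)
    ]
    if not dry_run:
        for cmd in commands:
            # check=True stops the batch at the first failing run.
            subprocess.run(cmd, check=True)
    return commands
```

Passing the arguments as a list avoids shell quoting problems with Persian text. Call run_batch with dry_run=False to actually execute the commands.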

Demo

Check out some audio samples from the trained model in this directory.

