
Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS#7409

Merged
XuesongYang merged 7 commits into NVIDIA-NeMo:main from RobinDong:add_aishell3_dataset
Sep 26, 2023
Conversation

@RobinDong (Contributor) commented Sep 10, 2023

What does this PR do?

Add the AISHELL-3 dataset from OpenSLR for training a FastPitch model.

Changelog

  • Add dataset 'AISHELL-3' from OpenSLR for training Mandarin TTS

Usage

  1. Create the working directories:
mkdir data_aishell3 mani_aishell3 sup_aishell3
  2. Install the NeMo packages (doc).
  3. Run the scripts below to download and process the data, then train the model:
python3 scripts/dataset_processing/tts/aishell3/get_data.py \
    --data-root data_aishell3 \
    --val-size 0.1 \
    --test-size 0.2 \
    --seed-for-ds-split 100 \
    --manifests-path mani_aishell3
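The `--val-size`, `--test-size`, and `--seed-for-ds-split` flags control a seeded train/val/test split of the utterance list. The sketch below illustrates what those parameters mean; it is not the actual `get_data.py` implementation, which may shuffle and partition differently.

```python
import random

def split_manifest(entries, val_size, test_size, seed):
    """Seeded split into train/val/test, mirroring the --val-size,
    --test-size, and --seed-for-ds-split flags (illustrative only)."""
    rng = random.Random(seed)
    shuffled = entries[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_size)
    n_test = int(len(shuffled) * test_size)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test

# 100 hypothetical utterance files, split 70/10/20 with seed 100.
entries = [f"utt_{i:04d}.wav" for i in range(100)]
train, val, test = split_manifest(entries, val_size=0.1, test_size=0.2, seed=100)
print(len(train), len(val), len(test))  # 70 10 20
```

Because the split is seeded, rerunning with the same `--seed-for-ds-split` reproduces the same manifests.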

python3 scripts/dataset_processing/tts/extract_sup_data.py \
    --config-path aishell3/ds_conf \
    --config-name ds_for_fastpitch_align.yaml \
    manifest_filepath=mani_aishell3/train_manifest.json \
    sup_data_path=sup_aishell3

python3 examples/tts/fastpitch.py --config-path conf/zh/ \
    --config-name fastpitch_align_multispeaker_22050.yaml \
    model.train_ds.dataloader_params.batch_size=32 \
    model.validation_ds.dataloader_params.batch_size=8 \
    train_dataset=mani_aishell3/train_manifest.json \
    validation_datasets=mani_aishell3/val_manifest.json \
    sup_data_path=sup_aishell3 \
    exp_manager.exp_dir=result \
    trainer.max_epochs=160 \
    trainer.check_val_every_n_epoch=1 \
    pitch_mean=214.61354064941406 \
    pitch_std=64.61677551269531 \
    +exp_manager.create_wandb_logger=true \
    +exp_manager.wandb_logger_kwargs.name="tutorial" \
    +exp_manager.wandb_logger_kwargs.project="aishell"
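The `pitch_mean` and `pitch_std` values passed above are dataset-level F0 statistics (in Hz) used to z-score pitch during training. A minimal NumPy sketch of that normalization (illustrative only, not NeMo's internal code path):

```python
import numpy as np

# Dataset-level pitch statistics, as passed on the training command line.
pitch_mean = 214.61354064941406
pitch_std = 64.61677551269531

# Hypothetical per-frame F0 values in Hz.
pitch_hz = np.array([180.0, 214.61354064941406, 290.0])

# z-score normalization: a value equal to the mean maps to exactly 0.0.
normalized = (pitch_hz - pitch_mean) / pitch_std
print(normalized)
```

Using statistics computed on a different dataset would shift the normalized pitch distribution, which is why AISHELL-3-specific values are supplied here.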

Sample

The FastPitch model trained on the AISHELL-3 dataset is here. You can download it and run the snippet below:

import os

from nemo.collections.tts.models import FastPitchModel, HifiGanModel

# "These new-generation CPUs are not only amazingly fast, they are also very competitively priced."
text = "这些新一代的中央处理器不只效能惊人,价格也十分有竞争力。"
device = "cpu"

# Load the trained FastPitch model (expand "~" explicitly, since restore_from does not).
fastpitch_model = FastPitchModel.restore_from(os.path.expanduser("~/Downloads/aishell3.nemo")).eval().to(device)
# Pretrained Mandarin HiFi-GAN vocoder.
model = HifiGanModel.from_pretrained(model_name="tts_zh_hifigan_sfspeech").eval().to(device)

# Normalize the text and convert it into individual phonemes/tokens.
tokens = fastpitch_model.parse(text)

# Generate spectrogram from text
spectrogram = fastpitch_model.generate_spectrogram(tokens=tokens, speaker=5)

# Invert the spectrogram into audio samples
audio = model.convert_spectrogram_to_audio(spec=spectrogram)

# Convert output from pytorch tensor to numpy array
audio = audio.cpu().detach().numpy()

import soundfile as sf

sf.write("sample.wav", audio.T, 22050, format="WAV")

The resulting sample.wav is here: aishell3.zip
Changing the speaker ID to 1863 (`speaker=1863`) produces a male voice.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g. Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open a "Draft" PR.

Who can review?

@blisc @okuchaiev
