
Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS#7409

Merged
XuesongYang merged 7 commits into NVIDIA-NeMo:main from RobinDong:add_aishell3_dataset
Sep 26, 2023
Conversation

@RobinDong (Contributor) commented Sep 10, 2023

What does this PR do?

Add the AISHELL-3 dataset from OpenSLR for training a FastPitch model.

Changelog

  • Add dataset 'AISHELL-3' from OpenSLR for training Mandarin TTS

Usage

  1. Create the working directories:
mkdir data_aishell3 mani_aishell3 sup_aishell3
  2. Install the NeMo packages (doc).
  3. Run the scripts below to download and process the data, then train the model:
python3 scripts/dataset_processing/tts/aishell3/get_data.py \
    --data-root data_aishell3 \
    --val-size 0.1 \
    --test-size 0.2 \
    --seed-for-ds-split 100 \
    --manifests-path mani_aishell3
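The `--val-size`, `--test-size`, and `--seed-for-ds-split` flags control a seeded train/val/test split of the utterance list. The sketch below illustrates what those parameters mean; it is not the actual `get_data.py` implementation, which may shuffle and partition differently.

```python
import random

def split_manifest(entries, val_size, test_size, seed):
    """Seeded split into train/val/test, mirroring the --val-size,
    --test-size, and --seed-for-ds-split flags (illustrative only)."""
    rng = random.Random(seed)
    shuffled = entries[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_size)
    n_test = int(len(shuffled) * test_size)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test

# 100 hypothetical utterance files, split 70/10/20 with seed 100.
entries = [f"utt_{i:04d}.wav" for i in range(100)]
train, val, test = split_manifest(entries, val_size=0.1, test_size=0.2, seed=100)
print(len(train), len(val), len(test))  # 70 10 20
```

Because the split is seeded, rerunning with the same `--seed-for-ds-split` reproduces the same manifests.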

python3 scripts/dataset_processing/tts/extract_sup_data.py \
    --config-path aishell3/ds_conf \
    --config-name ds_for_fastpitch_align.yaml \
    manifest_filepath=mani_aishell3/train_manifest.json \
    sup_data_path=sup_aishell3

python3 examples/tts/fastpitch.py --config-path conf/zh/ \
    --config-name fastpitch_align_multispeaker_22050.yaml \
    model.train_ds.dataloader_params.batch_size=32 \
    model.validation_ds.dataloader_params.batch_size=8 \
    train_dataset=mani_aishell3/train_manifest.json \
    validation_datasets=mani_aishell3/val_manifest.json \
    sup_data_path=sup_aishell3 \
    exp_manager.exp_dir=result \
    trainer.max_epochs=160 \
    trainer.check_val_every_n_epoch=1 \
    pitch_mean=214.61354064941406 \
    pitch_std=64.61677551269531 \
    +exp_manager.create_wandb_logger=true \
    +exp_manager.wandb_logger_kwargs.name="tutorial" \
    +exp_manager.wandb_logger_kwargs.project="aishell"
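The `pitch_mean` and `pitch_std` values passed above are dataset-level F0 statistics (in Hz) used to z-score pitch during training. A minimal NumPy sketch of that normalization (illustrative only, not NeMo's internal code path):

```python
import numpy as np

# Dataset-level pitch statistics, as passed on the training command line.
pitch_mean = 214.61354064941406
pitch_std = 64.61677551269531

# Hypothetical per-frame F0 values in Hz.
pitch_hz = np.array([180.0, 214.61354064941406, 290.0])

# z-score normalization: a value equal to the mean maps to exactly 0.0.
normalized = (pitch_hz - pitch_mean) / pitch_std
print(normalized)
```

Using statistics computed on a different dataset would shift the normalized pitch distribution, which is why AISHELL-3-specific values are supplied here.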

Sample

The FastPitch model trained on the AISHELL-3 dataset is here. You can download it and run the snippet below:

import os

from nemo.collections.tts.models import FastPitchModel, HifiGanModel

# "These new-generation CPUs are not only amazingly fast, they are also very competitively priced."
text = "这些新一代的中央处理器不只效能惊人,价格也十分有竞争力。"
device = "cpu"

# Load the trained FastPitch model (expand "~" explicitly, since restore_from does not).
fastpitch_model = FastPitchModel.restore_from(os.path.expanduser("~/Downloads/aishell3.nemo")).eval().to(device)
# Pretrained Mandarin HiFi-GAN vocoder.
model = HifiGanModel.from_pretrained(model_name="tts_zh_hifigan_sfspeech").eval().to(device)

# Normalize the text and convert it into individual phonemes/tokens.
tokens = fastpitch_model.parse(text)

# Generate spectrogram from text
spectrogram = fastpitch_model.generate_spectrogram(tokens=tokens, speaker=5)

# Invert the spectrogram into audio samples
audio = model.convert_spectrogram_to_audio(spec=spectrogram)

# Convert output from pytorch tensor to numpy array
audio = audio.cpu().detach().numpy()

import soundfile as sf

sf.write("sample.wav", audio.T, 22050, format="WAV")

The resulting sample.wav is here: aishell3.zip
Changing the speaker ID to 1863 (`speaker=1863`) produces a male voice.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g. Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open a "Draft" PR.

Who can review?

@blisc @okuchaiev
