Which dataset do you use for VN - Male voice? #2

kingkong135 · 2023-09-09T03:28:25Z

First, thanks for this great repo.
I have a question.
Are you using this viet-tts-dataset? If so, do you have the preprocessing code you ran before feeding the data into model training?

NTT123 · 2023-09-11T03:00:42Z

Hi, I used the VietBibleVox dataset available at https://huggingface.co/datasets/ntt123/VietBibleVox. For preprocessing steps, please refer to this notebook: https://github.com/NTT123/light-speed/blob/main/prepare_vbx_tfdata.ipynb.

kingkong135 · 2023-09-12T15:32:58Z

Thank you very much. Have you tried VITS 2 (for example, https://github.com/p0p4k/vits2_pytorch)?

UncleBob2 · 2023-09-16T05:52:10Z

I am a bit confused here.

If I am running prepare_vbx_tfdata.ipynb, then I don't have to be concerned with prepare_ljs_tfdata.ipynb, correct?
What files or output should I get when I train an MFA model and then align speech and phonemes (creating a timestamp for each phoneme)? Are these JSON files?
I got tfrecord files; however, they are 0 bytes.

Your help is greatly appreciated.

NTT123 · 2023-09-16T06:06:15Z

Hi @UncleBob2,

If I am running the prepare_vbx_tfdata.ipynb, then I don't have to be concerned with the prepare_ljs_tfdata.ipynb, correct?

The VBX notebook is for preprocessing the VietBibleVox (Vietnamese) dataset, while the LJS notebook is for the LJSpeech (English) dataset. If you're focused on using the Vietnamese dataset, then prepare_ljs_tfdata.ipynb is irrelevant.

What files or output should I get when I am training an MFA model, then aligning speech and phonemes (creating a timestamp for each phoneme)? Are these JSON files?

You should expect to see multiple JSON files inside the data/VietBibleVox directory.

I got tfrecord files; however, they are 0 bytes.

This is unexpected. The files should not be empty. Something likely went wrong when you ran the following command:

# replace `nproc` with `sysctl -n hw.physicalcpu` if you are using MacOS
!source miniconda/bin/activate aligner; \
mfa train \
    --num_jobs `nproc` \
    --use_mp \
    --clean \
    --overwrite \
    --no_textgrid_cleanup \
    --single_speaker \
    --output_format json \
    --output_directory VietBibleVox \
    VietBibleVox ./lexicon.txt vbx_mfa
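
To narrow down which step failed, it can help to inspect both outputs directly. The following is a minimal sketch and not part of the notebook: the helper name `summarize_outputs`, the `*.tfrecord*` filename pattern, and the tfrecord directory argument are assumptions made for illustration; only the data/VietBibleVox location comes from the answer above.

```python
from pathlib import Path


def summarize_outputs(align_dir, tfrecord_dir):
    """Count alignment JSON files and find zero-byte tfrecord files.

    Both directory arguments are supplied by the caller; the "*.tfrecord*"
    pattern is an assumption about how the tfrecord shards are named.
    """
    align_dir = Path(align_dir)
    tfrecord_dir = Path(tfrecord_dir)

    # Alignment output: one JSON file per utterance is expected.
    json_files = sorted(align_dir.rglob("*.json"))

    # Dataset shards: a size of 0 bytes means nothing was written to the file.
    tfrecords = sorted(tfrecord_dir.rglob("*.tfrecord*"))
    empty = [p for p in tfrecords if p.stat().st_size == 0]

    return {
        "json_count": len(json_files),
        "tfrecord_count": len(tfrecords),
        "empty_tfrecords": empty,
    }
```

For example, `summarize_outputs("data/VietBibleVox", "path/to/tfrecords")` returns the counts as a dictionary. A `json_count` of 0 would suggest that the alignment step produced nothing, which would be one plausible reason for empty tfrecord files.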

UncleBob2 · 2023-09-16T19:11:32Z

Thanks for your prompt reply. I got it working and now have the JSON files. I am currently running train.py, and it will take some time since my RTX 3060 will not arrive for another 5 days. Once the model is trained, is it correct that I can then run inference.ipynb? By the way, how many epochs are we running? I notice that the code is set to run for up to 100,000 epochs (for epoch in range(_epoch + 1, 100_000):). Do we have a strategy for early stopping?

FYI, I am a newbie at TTS; hence, please bear with me as I am ramping up my understanding. I am looking for your guidance and hope to contribute to your project.

I can see that attentions.py, commons.py, and modules are called from models.py.

Have a great day.

kingkong135 closed this as completed Sep 20, 2023

nganhtua mentioned this issue Sep 26, 2023

Training took forever to finish #4

Open
