Which dataset do you use for VN - Male voice? #2

Closed
kingkong135 opened this issue Sep 9, 2023 · 5 comments

@kingkong135

First, thanks for this great repo.
I have a question.
Are you using this viet-tts-dataset? If so, do you have the preprocessing code that is applied to the data before training?

@NTT123
Owner

NTT123 commented Sep 11, 2023

Hi, I used the VietBibleVox dataset available at https://huggingface.co/datasets/ntt123/VietBibleVox. For preprocessing steps, please refer to this notebook: https://github.com/NTT123/light-speed/blob/main/prepare_vbx_tfdata.ipynb.

@kingkong135
Author

Thank you very much. Have you tried VITS 2, for example https://github.com/p0p4k/vits2_pytorch?

@UncleBob2

I am a bit confused here.

  • If I am running prepare_vbx_tfdata.ipynb, then I don't have to be concerned with prepare_ljs_tfdata.ipynb, correct?

  • What files or output should I get when I train an MFA model and then align speech and phonemes (creating a timestamp for each phoneme)? Are these JSON files?

  • I got tfrecord files; however, they are 0 bytes.

Your help is greatly appreciated.

@NTT123
Owner

NTT123 commented Sep 16, 2023

Hi @UncleBob2,

> If I am running the prepare_vbx_tfdata.ipynb, then I don't have to be concerned with the prepare_ljs_tfdata.ipynb, correct?

The VBX notebook is for preprocessing the VietBibleVox (Vietnamese) dataset, while the LJS notebook is for the LJSpeech (English) dataset. If you're focused on using the Vietnamese dataset, then prepare_ljs_tfdata.ipynb is irrelevant.

> What files or output should I get when I am training an MFA model, then aligning speech and phonemes (creating a timestamp for each phoneme)? Are these JSON files?

You should expect to see multiple JSON files inside the data/VietBibleVox directory.
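
A quick way to confirm that the alignment step produced usable output is to count the JSON files and check that each one parses. The following is a minimal sketch, not code from the repository: the `data/VietBibleVox` path comes from the reply above, while the helper name `check_alignment_json` is hypothetical. It makes no assumption about the fields inside each file.

```python
import json
from pathlib import Path


def check_alignment_json(directory):
    """Return (valid, broken) lists of JSON files found under `directory`."""
    valid, broken = [], []
    for path in sorted(Path(directory).rglob("*.json")):
        try:
            with path.open(encoding="utf-8") as f:
                json.load(f)  # an empty or truncated file raises JSONDecodeError
            valid.append(path)
        except (json.JSONDecodeError, UnicodeDecodeError):
            broken.append(path)
    return valid, broken


if __name__ == "__main__":
    # Path taken from the thread; adjust it to your own layout.
    valid, broken = check_alignment_json("data/VietBibleVox")
    print(f"{len(valid)} readable JSON files, {len(broken)} unreadable")
```

If the count of readable files is zero, the alignment command probably failed before writing its output.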

> I got tfrecord files; however, they are 0 bytes.

This is unexpected; the files should not be empty. Something most likely went wrong when you ran the following command:

# replace `nproc` with `sysctl -n hw.physicalcpu` if you are using macOS
!source miniconda/bin/activate aligner; \
mfa train \
    --num_jobs `nproc` \
    --use_mp \
    --clean \
    --overwrite \
    --no_textgrid_cleanup \
    --single_speaker \
    --output_format json \
    --output_directory VietBibleVox \
    VietBibleVox ./lexicon.txt vbx_mfa
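
To see which output files are affected after re-running the command and the notebook, the zero-byte files can be listed directly. This is a minimal sketch under assumptions: the thread does not show where the notebook writes its tfrecord files or what suffix they carry, so both the directory argument and the `*.tfrecord` pattern are placeholders, and `find_empty_files` is a hypothetical helper.

```python
from pathlib import Path


def find_empty_files(directory, pattern="*.tfrecord"):
    """Return files matching `pattern` under `directory` whose size is 0 bytes."""
    return [
        path
        for path in sorted(Path(directory).rglob(pattern))
        if path.is_file() and path.stat().st_size == 0
    ]


if __name__ == "__main__":
    # "data" is an assumed location; point this at your own output directory.
    for path in find_empty_files("data"):
        print(f"empty: {path}")
```

An empty result after a successful alignment run suggests the earlier failure was in the alignment step and not in the tfrecord writer.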

@UncleBob2

Thanks for your prompt reply. I got it working and now have the JSON files. I am currently running train.py, which will take some time since my RTX 3060 will not arrive for another 5 days. Once the model is trained, is it correct that I can then run inference.ipynb? BTW, how many epochs should we run? I notice that the code is set to run for up to 100,000 epochs (for epoch in range(_epoch + 1, 100_000):). Do we have a strategy for early stopping?
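
For reference, a generic patience-based early-stopping check might look like the sketch below. It is not taken from train.py, and it assumes a validation loss is available once per epoch, which the thread does not confirm; the class name `EarlyStopping` and the loss values are illustrative only.

```python
class EarlyStopping:
    """Signal a stop when a monitored loss has not improved for `patience` checks."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience      # consecutive non-improving checks tolerated
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, loss):
        """Record one loss value; return True when training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


# Usage with made-up loss values (not real training results).
stopper = EarlyStopping(patience=3)
for epoch, val_loss in enumerate([1.0, 0.8, 0.79, 0.81, 0.80, 0.82, 0.5]):
    if stopper.step(val_loss):
        print(f"stopping at epoch {epoch}")
        break
```

For GAN-based TTS models, losses are often a weak proxy for audio quality, so a check like this is usually combined with listening to samples from saved checkpoints.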

FYI, I am a newbie at TTS; hence, please bear with me as I am ramping up my understanding. I am looking for your guidance and hope to contribute to your project.

I can see that attentions.py, commons.py, and modules are used by models.py.

Have a great day.

