Clear process for generating custom voice #284

JRMeyer · 2021-03-07T08:59:49Z

JRMeyer
Mar 7, 2021
Maintainer

>>> vocajon
[October 15, 2020, 12:48pm]

Hi,

Newbie here, so apologies if I'm missing the obvious. I am trying to
achieve the following and think the steps are as follows, but have a few
gaps. Could someone please help me fill in the gaps and answer the inline
questions in italics? And of course suggest any steps that may be
missing. Maybe this can become a clear quick-start guide for this
particular aim. Thank you.

Aim: To install Mozilla TTS on a Linux machine, and fine-tune a
pre-trained LJSpeech model with a new voice of my own.

Steps:
1. Install CUDA (this will allow the NVIDIA GPU to be used):
sudo apt-get install cuda

2. Install Mozilla TTS using the simple packaging method detailed
here:
https://github.com/mozilla/TTS/wiki/Released-Models
Using package:
https://github.com/reuben/TTS/releases/download/ljspeech-fwd-attn-pwgan/TTS-0.0.1+92aea2a-py3-none-any.whl

Check audio synthesis is working by starting the server with:
python3 -m TTS.server.server
Open a web page at http://localhost:5002. Enter some text and check that
the wav file is produced and plays OK.

Check CUDA is working by running:
python3 -m TTS.train
Before the usage prompt you should see some info about CUDA:

> Using CUDA: True
> Number of GPUs: 1

If it's working, Using CUDA should be True, and Number of GPUs should
match what is installed in your machine.
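
The same check can be made directly from Python before running anything from TTS. This is a minimal sketch, assuming PyTorch is the backend the installed package uses; `cuda_summary` is a hypothetical helper name, not part of Mozilla TTS.

```python
def cuda_summary():
    """Return (cuda_available, gpu_count); (False, 0) if PyTorch is not installed."""
    try:
        import torch  # assumption: TTS runs on top of PyTorch
    except ImportError:
        return False, 0
    available = torch.cuda.is_available()
    count = torch.cuda.device_count() if available else 0
    return available, count


if __name__ == "__main__":
    available, count = cuda_summary()
    print(f"Using CUDA: {available}")
    print(f"Number of GPUs: {count}")
```

If this reports False on a machine with an NVIDIA card, the driver or CUDA install is the likely place to look before touching the TTS config.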

3. Record a set of wav files in the LJSpeech format: 22050 Hz, 16-bit,
mono WAV.
Recommended duration: single sentences, 5-10 seconds each.
Remove any silence at the beginning and end of each recording.
Normalize the audio level.
Minimum number of files/combined duration: Help please
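
The format requirements in step 3 can be checked mechanically before training. The sketch below uses only Python's standard `wave` module; `check_wav` is a hypothetical helper, and the 5-10 second window is the recommendation from this post rather than a hard limit of the toolkit.

```python
import wave

EXPECTED_RATE = 22050    # Hz
EXPECTED_WIDTH = 2       # bytes per sample, i.e. 16-bit
EXPECTED_CHANNELS = 1    # mono


def check_wav(path):
    """Return a list of problems found in one WAV file (empty if none)."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        if rate != EXPECTED_RATE:
            problems.append(f"sample rate is {rate} Hz")
        if wav.getsampwidth() != EXPECTED_WIDTH:
            problems.append(f"samples are {8 * wav.getsampwidth()}-bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels")
        duration = wav.getnframes() / rate
        if not 5.0 <= duration <= 10.0:
            problems.append(f"duration {duration:.1f} s is outside 5-10 s")
    return problems
```

Running it over every file in the wavs folder (for example with `pathlib.Path("yourchosenname/wavs").glob("*.wav")`) gives a quick list of recordings to re-export. It does not check silence trimming or loudness normalization.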

4. Store the wavs in a dataset folder with the following structure:
|- yourchosenname
   |- metadata.csv
   |- wavs
      |- xxxx-0001.wav
      |- xxxx-0002.wav

5. Create metadata.csv inside your dataset folder with the following
format:
xxxx-0001|There were 50 people in the room.|There were fifty people in the room.
xxxx-0002|Mr Jones is the friendly local butcher.|Mister Jones is the friendly local butcher.
Pipes separate the 3 columns, and the 3rd column expands numbers,
titles etc.

Help please - is it necessary to split this into metadata_train.csv and
metadata_val.csv? I saw suggestions val should be about 10% of the size
of train?
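
Whether the split is required is still an open question here, but if you do want separate files, a sketch like the following would produce them. It assumes the three-column pipe format from step 5 and the roughly 10% validation share mentioned above; the function names are hypothetical and nothing in it comes from the TTS code base.

```python
import random


def read_metadata(path):
    """Parse pipe-separated lines into (file_id, raw_text, normalized_text) tuples."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue
            fields = line.split("|")
            if len(fields) != 3:
                raise ValueError(
                    f"line {line_number}: expected 3 columns, got {len(fields)}"
                )
            rows.append(tuple(fields))
    return rows


def split_rows(rows, val_fraction=0.1, seed=0):
    """Shuffle with a fixed seed and hold out val_fraction of the rows."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def write_metadata(rows, path):
    """Write rows back out in the same pipe-separated format."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write("|".join(row) + "\n")
```

Typical use would be `train, val = split_rows(read_metadata("metadata.csv"))` followed by two `write_metadata` calls for metadata_train.csv and metadata_val.csv. The column check also catches a stray pipe inside a transcript.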

6. Pre-process. Please help - I think there should be some kind of
preprocessing stage here, but I don't see any .py for doing this in this
repo. Is it done automatically as part of the training? I'm guessing
so, as there are params such as do_trim_silence.

7. Prepare a config.json for your new dataset.
Please help - I am not sure what needs changing in here. I'm
thinking the following?
restore_path - set to the new dataset? But I'm not sure how to get
the dataset into .pth.tar format?
run_name - set to new dataset name (although probably not
mandatory?)
run_description - describe new dataset
mel_fmin - set to ~50 for male, ~95 for female
batch_size - 32 standard. I understand there are issues with GPUs
with smaller amounts of memory, and that you really need 16GB. Are
there any recommendations (e.g. drop this value) if trying to use a
4-8GB GPU? Or are you just wasting your time?
output_path - do these need changing?
datasets - set the name and path to match your new dataset path?
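
For illustration, the field changes listed in step 7 could be applied to a loaded config like this. It is only a sketch: `update_config` is a hypothetical helper, the nesting of `mel_fmin` under an `audio` section and `datasets` being a list of entries with a `path` key are assumptions about the config layout, and none of the values are recommendations.

```python
import json


def update_config(config, run_name, dataset_path, mel_fmin, batch_size, output_path):
    """Return a copy of a config dict with the step 7 fields changed."""
    updated = json.loads(json.dumps(config))  # deep copy of JSON-compatible data
    updated["run_name"] = run_name
    updated["run_description"] = f"Fine-tuning on {run_name}"
    updated["batch_size"] = batch_size
    updated["output_path"] = output_path
    # Assumption: mel_fmin lives in a nested "audio" section.
    updated.setdefault("audio", {})["mel_fmin"] = mel_fmin
    # Assumption: "datasets" is a list of entries that each carry a "path".
    # Other keys in the first entry (such as its name) are left untouched.
    datasets = updated.setdefault("datasets", [])
    if not datasets:
        datasets.append({})
    datasets[0]["path"] = dataset_path
    return updated
```

The dict would come from `json.load` on your config.json and be written back with `json.dump`. That only works if the file is plain JSON; a config containing `//` comments would need them stripped first.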

8. Fine-tune the model:
python3 -m TTS.train --config_path TTS/tts/configs/config.json --restore_path /path/to/your/model.pth.tar

9. Start the server as in step 2 and test the new voice.

[This is an archived TTS discussion thread from discourse.mozilla.org/t/clear-process-for-generating-custom-voice]

JRMeyer · 2021-03-07T08:59:51Z

>>> erogol
[October 25, 2020, 1:13am]

A couple of notes:

For step 1, it is much easier to use conda. That way you don't have to
deal with CUDA manually, which can be a problem.

You also need to find the right audio values for your dataset, like
silence threshold, normalization, etc. Our data analysis notebook in the
mozilla/TTS repository on GitHub can help.

You should also remove noisy samples and do some pruning based on
quality. Again, you can use the CheckSNR notebook as a starting point.

Also, if you create your own dataset, you should perform phoneme
coverage filtering so that your transcript set is as representative as
possible for the target language.
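
One possible reading of coverage filtering is a greedy selection: keep picking the candidate sentence that adds the most units not yet covered. The sketch below illustrates that idea only. It uses letters as a stand-in for phonemes, so a real phonemizer would have to be passed in as `units_of`; both function names are hypothetical and this is not the procedure used by the TTS notebooks.

```python
def character_units(sentence):
    """Stand-in for a phonemizer: the set of letters in the sentence."""
    return {ch for ch in sentence.lower() if ch.isalpha()}


def select_for_coverage(sentences, units_of=character_units, limit=None):
    """Greedily pick sentences that each add the most not-yet-covered units."""
    remaining = list(sentences)
    covered = set()
    chosen = []
    while remaining and (limit is None or len(chosen) < limit):
        best = max(remaining, key=lambda s: len(units_of(s) - covered))
        gain = units_of(best) - covered
        if not gain:
            break  # no remaining sentence adds anything new
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen, covered
```

Comparing `covered` against the full phoneme inventory of the target language shows which sounds the transcript set still lacks. Pure set coverage ignores how often each unit occurs, so it is only a first pass.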

BTW, if there is any volunteer to create a nice script or notebook to
automate these steps to make life easier for beginners, we can work on
that together.

[Archived Post]

JRMeyer · 2021-03-07T08:59:54Z

>>> rdh
[October 30, 2020, 11:48am]

> Minimum number of files/combined duration: Help please

I've been wanting to find the answer to this myself.

In my experience for Dutch, 4,000 recordings were not sufficient for
transfer learning using Tacotron DDC with a near-default configuration.

For completeness, I started off from this model, which was trained with
15,000 fragments.

[Archived Post]

JRMeyer · 2021-03-07T08:59:57Z

>>> erogol
[October 30, 2020, 1:05pm]

Can you share a couple of samples from your dataset so I can see the
quality? Then I can comment better.

[Archived Post]

JRMeyer · 2021-03-07T08:59:59Z

>>> rdh
[October 30, 2020, 1:43pm]

I appended some samples of the second dataset. Samples from the original
dataset can be found on my dataset repo.

[Archived Post]
