
Help making Italian Vocoder/Synthesizer #697

Closed
xzVice opened this issue Mar 8, 2021 · 11 comments

Comments

@xzVice

xzVice commented Mar 8, 2021

Let's suppose I got the Italian dataset from here (the ASR one, in FLAC): http://www.openslr.org/94/
How am I supposed to create all the pretrained models from it (the .pt files for the vocoder, synthesizer, and encoder)?

@ghost

ghost commented Mar 8, 2021

Please start by reading my advice on training. This contains the link to training documentation: #431 (comment)

If I were doing this, I would reuse the encoder and vocoder models. For the synthesizer, you have the option of training from scratch or finetuning the English model. Training from scratch should give better pronunciation and prosody. Finetuning will reduce training time and possibly have better voice similarity. If you finetune, modify the text cleaner to remove diacritics from vowels (change à to a, è and é to e, etc.). This is necessary since the English synthesizer does not include these characters in symbols.py.
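The diacritic removal suggested above could be sketched like this (a hypothetical helper — the function name and its placement in the repo's cleaners are assumptions, not part of the project):

```python
import unicodedata

def remove_diacritics(text):
    """Map accented vowels to their base letters (à -> a, è/é -> e, ...)."""
    # NFD decomposition splits 'è' into 'e' plus a combining grave accent;
    # dropping the combining marks (Unicode category Mn) keeps the base letter.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```

Applied inside the text cleaner, this keeps every transcript character within the English symbol set defined in symbols.py.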

@xzVice
Author

xzVice commented Mar 8, 2021

> Please start by reading my advice on training. This contains the link to training documentation: #431 (comment)
>
> If I were doing this, I would reuse the encoder and vocoder models. For the synthesizer, you have the option of training from scratch or finetuning the English model. Training from scratch should give better pronunciation and prosody. Finetuning will reduce training time and possibly have better voice similarity. If you finetune, modify the text cleaner to remove diacritics from vowels (change à to a, è and é to e, etc.). This is necessary since the English synthesizer does not include these characters in symbols.py.

So, I tried doing what you told me to do and everything went well until the synthesizer_train.py command...
Here is the execution of all the commands listed at https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Training (up to the train one, of course, which threw the error).
Any idea? 🤔

I also noticed some weird symbols inside the SV2TTS/synthesizer/train.txt file...
[screenshot: train.txt showing garbled characters]
Is that normal? I tried editing the symbols.py/cleaners.py files, but that didn't fix it... anyway, this is probably not what's causing the train command to crash...

C:\Users\Workspace\Desktop\Real-Time-Voice-Cloning>py -3.6 synthesizer_preprocess_audio.py datasets_root --datasets_name LibriTTS --subfolders testing --no_alignments
Arguments:
    datasets_root:   datasets_root
    out_dir:         datasets_root\SV2TTS\synthesizer
    n_processes:     None
    skip_existing:   False
    hparams:
    no_alignments:   True
    datasets_name:   LibriTTS
    subfolders:      testing

Using data from:
    datasets_root\LibriTTS\testing
LibriTTS: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:09<00:00,  9.52s/speakers]
The dataset consists of 9 utterances, 7450 mel frames, 1488960 audio timesteps (0.03 hours).
Max input length (text chars): 140
Max mel frames length: 889
Max audio timesteps length: 177600





C:\Users\Workspace\Desktop\Real-Time-Voice-Cloning>python synthesizer_preprocess_embeds.py datasets_root/SV2TTS/synthesizer
Arguments:
    synthesizer_root:      datasets_root\SV2TTS\synthesizer
    encoder_model_fpath:   encoder\saved_models\pretrained.pt
    n_processes:           4

Embedding:   0%|                                                                         | 0/9 [00:00<?, ?utterances/s]Loaded encoder "pretrained.pt" trained to step 1564501
Loaded encoder "pretrained.pt" trained to step 1564501
Loaded encoder "pretrained.pt" trained to step 1564501
Loaded encoder "pretrained.pt" trained to step 1564501
Embedding: 100%|█████████████████████████████████████████████████████████████████| 9/9 [00:05<00:00,  1.73utterances/s]





C:\Users\Workspace\Desktop\Real-Time-Voice-Cloning>python synthesizer_train.py testing datasets_root/SV2TTS/synthesizer
Arguments:
    run_id:          testing
    syn_dir:         datasets_root/SV2TTS/synthesizer
    models_dir:      synthesizer/saved_models/
    save_every:      1000
    backup_every:    25000
    force_restart:   False
    hparams:

Checkpoint path: synthesizer\saved_models\testing\testing.pt
Loading training data from: datasets_root\SV2TTS\synthesizer\train.txt
Using model: Tacotron
Using device: cpu

Initialising Tacotron Model...

Trainable Parameters: 30.876M

Starting the training of Tacotron from scratch

Using inputs from:
        datasets_root\SV2TTS\synthesizer\train.txt
        datasets_root\SV2TTS\synthesizer\mels
        datasets_root\SV2TTS\synthesizer\embeds
Found 9 samples
+----------------+------------+---------------+------------------+
| Steps with r=2 | Batch Size | Learning Rate | Outputs/Step (r) |
+----------------+------------+---------------+------------------+
|   20k Steps    |     12     |     0.001     |        2         |
+----------------+------------+---------------+------------------+

Traceback (most recent call last):
  File "synthesizer_train.py", line 35, in <module>
    train(**vars(args))
  File "C:\Users\Workspace\Desktop\Real-Time-Voice-Cloning\synthesizer\train.py", line 158, in train
    for i, (texts, mels, embeds, idx) in enumerate(data_loader, 1):
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 355, in __iter__
    return self._get_iterator()
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 301, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 914, in __init__
    w.start()
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'train.<locals>.<lambda>'

C:\Users\Workspace\Desktop\Real-Time-Voice-Cloning>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

@ghost

ghost commented Mar 8, 2021

I don't have time to fully troubleshoot issues, but this may help. If not, you'll need to figure it out yourself.

Weird characters in train.txt

The problem may be coming from this line, which reads the transcripts:

with text_fpath.open("r") as text_file:

Try adding utf-8 file encoding.

with text_fpath.open("r", encoding="utf-8") as text_file:

Error running synthesizer_train.py

For a solution to:

AttributeError: Can't pickle local object 'train.<locals>.<lambda>'
EOFError: Ran out of input

Please see #669 (comment) for a workaround. We set num_workers=0 on Windows.
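For background on why the workaround helps: on Windows, DataLoader workers are started with the `spawn` method, which pickles the `collate_fn`; a lambda defined inside `train()` cannot be pickled. With `num_workers=0`, data loading runs in the main process and nothing needs to be pickled. A torch-free sketch of the underlying pickling behavior (illustrative names, not the repo's actual code):

```python
import pickle

def collate_synthesizer(batch):
    # A module-level function pickles by reference, so a spawned DataLoader
    # worker could receive it. (Placeholder body; the real collate_fn pads
    # the texts, mels, and embeds in a batch.)
    return batch

def make_local_collate():
    # Mimics the `collate_fn=lambda ...` defined inside train(): a local
    # object that pickle rejects, producing the AttributeError above.
    return lambda batch: batch

def is_picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False
```

Here `is_picklable(collate_synthesizer)` is True while `is_picklable(make_local_collate())` is False — which is why the workaround either disables workers (`num_workers=0`) or moves the collate function to module level.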

@xzVice
Author

xzVice commented Mar 9, 2021

Thanks! Both errors are solved now... but training is really slow (the 20,000-step train command)... also, I don't know why it says Using device: cpu even though I installed the latest CUDA toolkit and I have a GTX 1050 Ti...

@xzVice
Author

xzVice commented Mar 9, 2021

Never mind, I had the CPU version of PyTorch installed...

@AVTV64

AVTV64 commented Mar 14, 2021

> Let's suppose I got the Italian dataset from here (ASR one, flac) http://www.openslr.org/94/
> How am I supposed to create all the pretrained models from it (the .pt files, for vocoder, synthesizer and encoder)?

Hi, can you release the Italian models you trained? How do I set them up? I want to clone voices in this language.

@frossi65

@ArianaGlande
@ArianaGlande Hello, I am looking for Italian models. Let me know if I can help train the model. I have an RTX 2070 GPU.

@FedericoFedeFede

@ArianaGlande I'm also looking for it. If you managed to do that, it would be very helpful to share it with us. Thanks

@ghost ghost closed this as completed Apr 1, 2021
@TalissaDreossi

I'm trying to do the same, and as @blue-fish said (if I understood correctly) I just need to train the synthesizer, so I can skip the first steps in https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Training#datasets until I reach:
"Begin with the audios and the mel spectrograms:
python synthesizer_preprocess_audio.py <datasets_root>".
Is that right? If so, how should I structure my dataset? I have downloaded the Italian one from http://www.openslr.org/94/ but I don't know whether I have to preprocess it before running the instruction above (in other words, I don't know what is expected in <datasets_root>).
Thanks in advance

@ghost ghost mentioned this issue Oct 8, 2021
@alessandrolamberti

@ArianaGlande Hi, how did you manage to preprocess the Italian dataset into the format the scripts accept?

@Alex2610

Can someone please upload the pretrained models?
