384 repeatable voice cloning #432

ghost · 2020-07-19T18:03:55Z

See #384. This PR adds a "--seed" option to make the toolbox output repeatable. It also implements a workaround for #53 by adding an option to trim silences in the vocoder output (caused by gaps in the spectrograms created during synthesis).
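
To illustrate the silence-trimming idea, here is a minimal sketch that shortens long runs of near-silent samples. It is not the code from this PR: the function name `trim_long_silences`, the amplitude threshold, and the `max_silence` parameter are all assumptions chosen for illustration, and a real implementation would more likely work on audio frames with a voice activity detector.

```python
def trim_long_silences(samples, threshold=0.01, max_silence=3):
    """Shorten every run of near-silent samples to at most max_silence samples.

    Hypothetical helper for illustration only; not the toolbox's actual code.
    """
    out = []
    run = 0  # length of the current run of near-silent samples
    for s in samples:
        if abs(s) < threshold:
            run += 1
            if run > max_silence:
                continue  # drop samples beyond the allowed silence length
        else:
            run = 0
        out.append(s)
    return out


# A toy "waveform" with a gap of five silent samples in the middle.
wav = [0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.3]
trimmed = trim_long_silences(wav, threshold=0.01, max_silence=2)
```

The long gap is cut down to two samples while the short pause near the end is left alone, which is the behaviour one would want when the gaps come from the spectrogram rather than from natural pauses.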

ghost · 2020-07-19T23:31:10Z

Making some more improvements; will mark as ready when complete.

CorentinJ · 2020-07-20T08:24:44Z

Ensuring that the dropout in the prenet is set to inference mode at inference time would work too; it's the only source of randomness in Tacotron.
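
As a rough illustration of why dropout matters here, the sketch below implements a generic inverted dropout with a `training` flag. It is not the repository's Tacotron prenet; the function `prenet_dropout` and its parameters are hypothetical. With `training=False` the layer is an identity and the output is deterministic, while with `training=True` the output depends on the random number generator and is only repeatable if that generator is seeded.

```python
import random


def prenet_dropout(activations, rate=0.5, training=True, rng=random):
    """Inverted dropout: random when training=True, identity when False.

    Hypothetical stand-in for a dropout layer; `rng` is any object with a
    random() method, such as the random module or a random.Random instance.
    """
    if not training:
        return list(activations)
    keep = 1.0 - rate
    # Keep each activation with probability `keep`, rescaling the survivors.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


x = [1.0, 2.0, 3.0, 4.0]

# Inference mode: no randomness at all.
inference_a = prenet_dropout(x, training=False)
inference_b = prenet_dropout(x, training=False)

# Training mode: random, but repeatable when the generator is seeded.
seeded_a = prenet_dropout(x, rng=random.Random(0))
seeded_b = prenet_dropout(x, rng=random.Random(0))
```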

ghost · 2020-07-20T08:28:30Z

> Ensuring that the dropout in the prenet is set to inference mode at inference time would work too, it's the only source of randomness in tacotron

Thanks for the info, I'll implement it and test it out. Much better than the brute force approach.

Edit: Will this suggestion make the synthesizer output unaffected by the state of the random number generator?

Because this tacotron is liable to produce gaps in the output (#53), I think it is preferable to keep the randomness, and allow for controlling it by setting the seed. When using the toolbox I repeatedly click "synthesize only" until a spectrogram with no large gaps appears. Vocoding is reserved for good spectrograms, especially since it is very slow with CPU inference.
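
The workflow described above (retry synthesis until the spectrogram has no large gaps, then vocode) can be sketched with a seed search. Everything in this example is a stand-in: `fake_synthesize` just draws random per-frame energies, and `longest_gap` and `first_good_seed` are hypothetical helpers, not toolbox functions. The point is only that a fixed seed makes a good result reproducible.

```python
import random


def fake_synthesize(text, seed=None):
    """Stand-in for a stochastic synthesizer: one random 'energy' per character."""
    rng = random.Random(seed)  # seed=None gives a different result on each call
    return [rng.random() for _ in range(len(text))]


def longest_gap(frames, floor=0.2):
    """Length of the longest run of consecutive frames below the energy floor."""
    longest = run = 0
    for f in frames:
        run = run + 1 if f < floor else 0
        longest = max(longest, run)
    return longest


def first_good_seed(text, max_gap=1, seeds=range(100)):
    """Return the first seed whose output has no gap longer than max_gap, else None."""
    for seed in seeds:
        if longest_gap(fake_synthesize(text, seed)) <= max_gap:
            return seed
    return None


seed = first_good_seed("hello world")
```

Once a good seed is found, passing it again reproduces the same output, so the slow vocoding step only needs to run on a spectrogram already known to be acceptable.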

ghost · 2020-07-20T10:07:12Z

I have performed additional experimentation to identify the minimum change needed for repeatability. Ready for review.

Some thoughts:

Tacotron's randomness is a feature that is useful for fixing the large gaps that it sometimes creates. I find it useful to control the synthesizer output by adjusting the seed.
It would be nice to get repeatable output without reloading the synthesizer and vocoder models on every use, but the current approach works.

ghost · 2020-07-20T10:19:03Z

This PR resolves #384, and introduces a workaround for the problem identified in #53.
User interface after the proposed changes. "Random seed" and "Enhance vocoder output" are new.

mbdash · 2020-07-20T14:28:13Z

+1 for the repeatability feature. The proposed changes look neat.

@blue-fish I just want to say thank you for your work; you are adding features and fixing bugs that are really helpful and appreciated.
Thank you to @CorentinJ for also allowing blue to make and integrate all the updates.
(Also thanks to the original author of the feature. If I am not mistaken, I saw that someone else made the initial code change suggestion and blue is pimping it out.)

ghost · 2020-07-20T17:24:38Z

Thank you for the kind words, @mbdash. It is nice knowing that others also find these improvements worthwhile. Feel free to provide feedback to help guide development, though as usual we find ourselves long on ideas and short on developers.

ghost · 2020-07-20T18:53:06Z

Just pushed a fix for a small bug found during testing. Tacotron was incorrectly retaining the seed after the "random seed" checkbox transitioned from a checked to unchecked state. No further changes are expected.
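
The kind of fix described above can be sketched as follows. The class and method names (`SynthesizerStub`, `ToolboxStub`, `on_seed_controls_changed`) are hypothetical and do not come from the toolbox source; the sketch only shows the idea of clearing the stored seed whenever the checkbox is unchecked.

```python
class SynthesizerStub:
    """Hypothetical stand-in for the synthesizer; only tracks the seed."""

    def __init__(self):
        self.seed = None


class ToolboxStub:
    """Hypothetical stand-in for the toolbox UI logic."""

    def __init__(self):
        self.synthesizer = SynthesizerStub()

    def on_seed_controls_changed(self, checkbox_checked, seed_text):
        # When the checkbox is unchecked, reset the seed to None so that a
        # previously entered value is not silently retained.
        self.synthesizer.seed = int(seed_text) if checkbox_checked else None


toolbox = ToolboxStub()
toolbox.on_seed_controls_changed(checkbox_checked=True, seed_text="42")
seed_while_checked = toolbox.synthesizer.seed
toolbox.on_seed_controls_changed(checkbox_checked=False, seed_text="42")
seed_after_uncheck = toolbox.synthesizer.seed
```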

mbdash · 2020-07-20T18:59:15Z

I would not have dared to ask for anything, but since you mentioned it...

May I ask for your opinion on two questions I have been thinking about?
(I hope these are not stupid questions.)

Q1

Do you see a way in the future to reduce or tweak the minimum output audio length below the current 5 sec?

For example, something that would allow input text as short as a single word or phrase, such as:

Hi
Hi your-name-here
How are you
I'm fine thank you
yes
no
thank you

My understanding is that the minimum audio output length is around 5 sec.
I have experimented with 90, 70, 60, 50 and 40 characters of input text.
The minimum workable input seems to be 60-70 characters to fill those 5 sec of audio;
below that, the audio output is just weird / creepy.
The sweet spot seems to be a minimum of 80-90 characters to nicely fill the minimum 5 sec of audio output.

Q2
This one is a weird one and might go against the design itself...

Would using a dataset generated purely by a single actor result in better audio output when reproducing solely that actor's voice?

And if so:

Do you have any guess as to how big a dataset would be required to reproduce the voice of a single voice actor, 1 to 1?
This would essentially remove the capacity to reproduce any other voices properly when using that specific model,
for the purpose of achieving better cloning accuracy for a single voice.

For example:
a single voice actor reads 12h of transcript (or more),
then we can generate higher quality TTS for that single actor.

Thank you for any feedback.

ghost · 2020-07-20T19:05:56Z

@mbdash Opened #433 to discuss your questions. Let's continue the conversation there.

CorentinJ · 2020-07-22T07:24:32Z

demo_cli.py

@@ -32,12 +32,13 @@
        "overhead but allows to save some GPU memory for lower-end GPUs.")
    parser.add_argument("--no_sound", action="store_true", help=\
        "If True, audio won't be played.")
+    parser.add_argument("--seed", type=int, default=None, help=\
+        "Optional random number seed value for repeatable output.")


Change "repeatable" to "deterministic" everywhere; that's more precise.
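
For readers who want to see how an optional `--seed` argument like the one in the diff might be consumed, here is a minimal, self-contained sketch. The `--seed` definition and help text are copied from the diff above; `build_parser` and `apply_seed` are hypothetical helpers, and the sketch seeds only Python's built-in `random` module. A real script would presumably also need to seed whichever framework generators it relies on.

```python
import argparse
import random


def build_parser():
    """Build a parser containing only the --seed option shown in the diff."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=None, help=\
        "Optional random number seed value for repeatable output.")
    return parser


def apply_seed(seed):
    """Seed Python's RNG only when a seed was given; return True if applied.

    Hypothetical helper: leaving the RNG untouched when seed is None keeps
    the default, non-deterministic behaviour.
    """
    if seed is None:
        return False
    random.seed(seed)
    return True


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--seed", "123"])
apply_seed(args.seed)
```

Keeping `default=None` lets the script distinguish "no seed requested" from any particular integer seed, including 0.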

blue-fish added 8 commits July 19, 2020 09:02

Changes for repeatable voice cloning

995a1f0

Cleanup

44eb8d0

Remove feature that sets seed on each synthesize. Not needed for repeatability

6380d1c

Revert change to use kernel initializer for tf.nn.rnn_cell.GRUCell() and tf.layers.dense()

2966ce7

Remove change for single-threaded tensorflow

4a49878

Remove another unneeded kernel initializer

74272c3

Add "--repeatable" option for toolbox

08b6fd9

Minor cleanup

8ad7544

ghost requested a review from CorentinJ July 19, 2020 18:06

ghost marked this pull request as draft July 19, 2020 23:30

blue-fish added 6 commits July 19, 2020 18:13

Separate "--repeatable" argument into "--reload_models" and "--seed" flags

5a8c481

Remove "--reload_models" and infer its value based on seed

267a132

Start implementing toolbox UI updates. Add some code for #53 trim silences (still in work)

AttributeError: 'Toolbox' object has no attribute 'seed'

aa49d12

Changes for #53 (fix silences caused by gaps in spectrograms)

2cea297

Make modifications to demo_cli.py for repeatability and excess silence removal

2e98fc7

Minor cleanups, cosmetic changes only

71a2291

Removed unnecessary changes for repeatability

04f4e0b

ghost marked this pull request as ready for review July 20, 2020 10:07

ghost mentioned this pull request Jul 20, 2020

Improving repeatability of voice cloning #384

Closed

In toolbox, reset synthesizer seed to None when random seed is not selected

8c6d428

ghost mentioned this pull request Jul 20, 2020

Questions about the toolbox from @mbdash #433

Closed

CorentinJ approved these changes Jul 22, 2020

ghost mentioned this pull request Jul 22, 2020

Single speaker fine-tuning process and results #437

Closed

Change 'repeatable' to 'deterministic' in comments

cb92557

ghost merged commit eaf5ec4 into CorentinJ:master Jul 22, 2020

ghost deleted the 384_repeatable_voice_cloning branch July 22, 2020 10:58

ghost mentioned this pull request Jul 24, 2020

Trim silences on output wavs by default, in toolbox and demo_cli.py #445

Closed

ghost mentioned this pull request Aug 17, 2020

Pytorch synthesizer #472

Merged

12 tasks

ghost mentioned this pull request Nov 3, 2020

Inconsistend results in the course of 24 hours #588

Closed

This pull request was closed.
