
Cannot change speaker for interpolation #35

Open
DamienToomey opened this issue Jun 18, 2020 · 5 comments

Comments

@DamienToomey

Hello,

I am trying to interpolate between two speakers. I am using the model pretrained on LibriTTS.

I have read the issue "How is interpolation between speakers performed?" #33 but I still cannot manage to make it work.

Here are the steps I have followed (a rough code sketch is given after this list):

  • set gate_threshold = 1 (as mentioned in How is interpolation between speakers performed? #33)
  • set `dummy_speaker_embedding = True` in config.json, since the paper says: "For the experiment without speaker embeddings we interpolate between Sally and Helen using the phrase 'We are testing this model.'"
  • removed the seeding calls torch.manual_seed(seed) and torch.cuda.manual_seed(seed) from inference.py
  • sample z_1 ∼ N(0, 0.5) (as in the paper)
  • sample z_2 ∼ N(0, 0.5) (as in the paper)
  • interpolate between z_1 and z_2
  • reset gate_threshold = 0.5
  • run model.infer
  • run waveglow.infer
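
Roughly, this is what I am doing (a sketch, not the repo's exact inference.py; the model.infer / waveglow.infer signatures, the latent shape, and the sigma handling below are my assumptions):

```python
import torch

sigma = 0.5          # scale used to sample z, as in the paper's N(0, 0.5)
n_frames = 400       # assumed maximum number of mel frames

# sample the two latents
z_1 = torch.randn(1, 80, n_frames).cuda() * sigma
z_2 = torch.randn(1, 80, n_frames).cuda() * sigma

# linear interpolation between the two latents
alpha = 0.5
z = alpha * z_1 + (1 - alpha) * z_2

with torch.no_grad():
    # `speaker_vecs` and `text` are prepared as in inference.py
    mels, attentions = model.infer(z, speaker_vecs, text)   # assumed signature
    audio = waveglow.infer(mels, sigma=0.8)                 # assumed signature
```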

But even when I sample z_1 and z_2 multiple times (and they do take different values), after generating the spectrogram with the pretrained Flowtron and the audio with the pretrained WaveGlow, the speaker sounds the same; only the audio quality seems to vary.

  • Could you tell me which of the above steps I have done wrong or if I have forgotten any steps?
  • Once I have found z_1 and z_2 that I want to interpolate, do I have to reset gate_threshold = 0.5 before interpolation?
  • Why did we have to set gate_threshold = 1 in the first place when looking for z_1 and z_2?

Thanks

@rafaelvalle
Contributor

You need to make sure z_1 and z_2 produce samples from different speakers.
Sample z_1 once, perform inference and memorize the speaker's voice.
Keep sampling z_2, performing inference and listening to the samples produced with z_2 until the speaker you hear is different from the speaker produced with z_1.
You can interpolate once you have z_1 and z_2 values associated with different speakers.
It is safer to leave gate_threshold = 1 and prune the audio later.
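
A minimal sketch of this search, assuming a hypothetical synthesize(z) helper that wraps the Flowtron + WaveGlow calls and returns a NumPy waveform (the sample rate 22050 is also an assumption):

```python
import torch
from scipy.io.wavfile import write

sigma = 0.5
z_1 = torch.randn(1, 80, 400).cuda() * sigma
write("candidate_z1.wav", 22050, synthesize(z_1))   # memorize this voice

# audition several z_2 candidates; keep the one whose voice differs from z_1
for i in range(10):
    z_2 = torch.randn(1, 80, 400).cuda() * sigma
    write(f"candidate_z2_{i}.wav", 22050, synthesize(z_2))
    torch.save(z_2, f"z2_{i}.pt")   # save the latent so it can be reused for interpolation
```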

@DamienToomey
Author

I have also set model_config['dummy_speaker_embedding'] = True.

I keep sampling z_2, performing inference, and listening to the samples produced with z_2, but the speaker's voice sounds the same as the voice produced with z_1. By the way, it is always a female voice. Do you have any idea why this might be happening?

@rafaelvalle
Contributor

Are you using the LibriTTS model?

@DamienToomey
Author

Yes, I am using the LibriTTS model.

@rafaelvalle
Contributor

Hey Damien, the pre-trained LibriTTS model available in our repo has speaker embeddings.

You need to train a model without speaker embeddings, i.e. model_config['dummy_speaker_embedding'] = True, to be able to interpolate between speakers in the latent space.

You can warm-start from the pre-trained LibriTTS model with speaker embeddings.
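
A minimal sketch of flipping that flag in config.json before retraining; the nesting of the key under "model_config" is an assumption, so check the repo's config.json (warm-starting itself is done through train.py as described in the README):

```python
import json

# enable dummy speaker embeddings before (re)training
with open("config.json") as f:
    config = json.load(f)

config["model_config"]["dummy_speaker_embedding"] = True   # assumed key nesting

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```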
