Transfer learning #223
Hi, thanks for the great implementation. I was wondering if you have any guidelines for performing transfer learning to smaller datasets. I have been trying to start from the pre-trained model and then fit a new, smaller (~500 samples) dataset, but the general problem is that the attention alignments become very odd and the model no longer aligns well. Are there any tricks to do this better (lower learning rates, etc.) or ways to diagnose problems with alignment? Thank you.
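(For concreteness, one quick way to inspect alignment failures is to plot the attention matrix for a single utterance; a healthy model shows a roughly diagonal band. This is only a minimal sketch, and the file name and array shape are assumptions, not outputs this repo produces.)

```python
import numpy as np
import matplotlib.pyplot as plt

# "alignment.npy" is a placeholder: a saved (decoder_steps, encoder_steps)
# attention matrix for one utterance.
alignment = np.load("alignment.npy")

plt.imshow(alignment.T, aspect="auto", origin="lower", interpolation="none")
plt.xlabel("Decoder timestep")
plt.ylabel("Encoder timestep")
plt.colorbar(label="Attention weight")
plt.title("Attention alignment")
plt.savefig("alignment.png")
```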
I think maybe you can try some adaptation methods, like Merlin does?
There are a few things we can do:
1) Train on more voices.
2) Penalize the model a bit more with dropout; use extra prenet/dropout layers (see the sketch after this list).
3) In my case, I had to bump up the strength of the CBHG component (I use the Tacotron 1 setup). In Tacotron 2 we might simply want to use more convolutional layers.
4) I am told by a few people (although this is not very helpful...) that other forms of attention might be worth trying, in particular GMM attention as in the VoiceLoop paper.
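As a rough illustration of (2), here is a minimal prenet sketch with an extra dropout-regularized layer. The layer sizes and the always-on dropout are assumptions in the Tacotron style, not this repository's actual module:

```python
import torch.nn.functional as F
from torch import nn

class Prenet(nn.Module):
    """Bottleneck prenet with an extra dropout-regularized layer
    (dimensions are illustrative)."""
    def __init__(self, in_dim=80, sizes=(256, 128, 128), p=0.5):
        super().__init__()
        dims = [in_dim] + list(sizes)
        self.layers = nn.ModuleList(
            nn.Linear(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:])
        )
        self.p = p

    def forward(self, x):
        for linear in self.layers:
            # training=True keeps dropout active even at inference time,
            # following the Tacotron convention of always-on prenet dropout.
            x = F.dropout(F.relu(linear(x)), p=self.p, training=True)
        return x
```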
pravn.wordpress.com
Did you have any luck in getting the transfer to work? I'm looking to do something similar but unsure how to start - do I simply train up the large model and then replace the voice samples?
Yes, you essentially train the large model and then fine-tune it on your smaller dataset (a minimal sketch of that recipe follows after the list). We recently put up our voice conversion work on arXiv: https://arxiv.org/abs/1907.07769
There are several things to note:
1) Sampling rate differences: In my own experiments, I could not get things to work at 16 kHz, so we ran our experiments at 22 kHz.
2) Prosody variations: Typically, we want the adapted dataset to be *similar* to the large training dataset. For example, if you train on female voices (say, with LJSpeech), then your adapted dataset might work better with female voices. Train with more than one dataset.
3) Hyperparameters: Things like hop size matter. YMMV.
4) Attention mechanism: Although we implement additive attention, it is conceivable that better results might be obtained with other mechanisms. In my work, I found multiplicative attention to work better, so we used that. Even so, other people I spoke to suggested that GMM attention worked best for them.
5) Capacity: The more the merrier when it comes to regularizing against the smaller dataset, so feel free to increase the capacity of the preprocessing layers (CBHG, conv layers, etc.).
6) It is almost certain that an additional supervision signal will help the decoder inflate the compressed encoder summary with extra hints. Just wait for the soon-to-be-released multispeaker model :).
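Something like this, where the checkpoint path, stand-in model, and learning rate are placeholders rather than this repo's actual API:

```python
import torch
from torch import nn, optim

# Stand-in module: substitute the actual Tacotron model class here.
model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 80))

# "pretrained.pth" is a placeholder path. strict=False tolerates layers
# you have added or resized (e.g. a beefed-up CBHG) that the checkpoint
# does not contain.
state = torch.load("pretrained.pth", map_location="cpu")
model.load_state_dict(state, strict=False)

# Fine-tune with a learning rate well below the from-scratch value.
optimizer = optim.Adam(model.parameters(), lr=1e-5, weight_decay=1e-6)
```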
pravn.wordpress.com
@pravn
Thanks a lot for the replies. After some unsuccessful results with less drastic approaches, I tried something that might have been unnecessarily complicated. I altered the architecture and the optimization as follows: (1) I added a second attention layer (with the 'v' parameter initialized to zero) and a second linear projection initialized to zero; both are added to the outputs of the original attention layer and linear projection. (2) I added two losses: one computed on the old dataset without the new layers, and one computed on the new dataset with the new layers. They are summed over a mixed batch of old and new voice data (roughly the idea in the sketch below). My hope was that this would regularize the model and help find a model that works on both datasets with minimal changes to the original weights (freezing them was not working for me). I expected it to fail utterly, since I had never tried anything like this, but it seems to capture the new voices and accents quite well, although I'm by no means an expert in TTS, so others might find it less good. At the moment my code is quite messy and I'm quite busy, but if this is interesting, after November I could try to block out some time to clean it up and share it in a branch.
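A minimal sketch of the zero-initialized parallel-layer idea, with illustrative dimensions and placeholder losses; this is a reconstruction of the description above, not the actual code:

```python
import torch
from torch import nn

DIM = 256  # illustrative hidden size

def zero_linear(dim):
    """Linear layer whose weights and bias start at exactly zero, so the
    warm-started model's outputs are initially unchanged."""
    layer = nn.Linear(dim, dim)
    nn.init.zeros_(layer.weight)
    nn.init.zeros_(layer.bias)
    return layer

original_proj = nn.Linear(DIM, DIM)  # pretrained projection
adapter_proj = zero_linear(DIM)      # new, zero-initialized branch

x = torch.randn(4, DIM)
out_old = original_proj(x)                    # path scored on old data
out_new = original_proj(x) + adapter_proj(x)  # path scored on new data

# Over a mixed batch, the two losses are summed (loss fns and targets
# are placeholders):
#   loss = loss_old(out_old, y_old) + loss_new(out_new, y_new)
```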
@diego-s Can you share the loss curves for your warm-started model?
Closing due to inactivity. |
Does anyone know of code for transfer learning?