Transfer learning #223
Hi, thanks for the great implementation. I was wondering if you have any guidelines for performing transfer learning to smaller datasets. I have been trying to start from the pre-trained model and then fit a new, smaller (~500 samples) dataset, but the general problem is that the attention alignments become very odd and the model no longer aligns well. Are there any tricks to do this better (lower learning rates, etc.) or ways to diagnose problems with alignment? Thank you.
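(For concreteness, one quick way to inspect alignment failures is to plot the attention matrix for a single utterance; a healthy model shows a roughly diagonal band. This is only a minimal sketch, and the file name and array shape are assumptions, not outputs this repo produces.)

```python
import numpy as np
import matplotlib.pyplot as plt

# "alignment.npy" is a placeholder: a saved (decoder_steps, encoder_steps)
# attention matrix for one utterance.
alignment = np.load("alignment.npy")

plt.imshow(alignment.T, aspect="auto", origin="lower", interpolation="none")
plt.xlabel("Decoder timestep")
plt.ylabel("Encoder timestep")
plt.colorbar(label="Attention weight")
plt.title("Attention alignment")
plt.savefig("alignment.png")
```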
I think maybe you can try some adaptation methods, like Merlin does?
There are a few things we can do:
1) Train on more voices.
2) Penalize the model a bit more with dropout; use extra prenet/dropout layers (see the sketch after this list).
3) In my case, I had to bump up the strength of the CBHG component (I use the Tacotron 1 setup). In Tacotron 2 we might simply want to use more convolutional layers.
4) I am told by a few people (although this is not very helpful...) that other forms of attention might be worth trying, in particular GMM attention as in the VoiceLoop paper.
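As a rough illustration of (2), here is a minimal prenet sketch with an extra dropout-regularized layer. The layer sizes and the always-on dropout are assumptions in the Tacotron style, not this repository's actual module:

```python
import torch.nn.functional as F
from torch import nn

class Prenet(nn.Module):
    """Bottleneck prenet with an extra dropout-regularized layer
    (dimensions are illustrative)."""
    def __init__(self, in_dim=80, sizes=(256, 128, 128), p=0.5):
        super().__init__()
        dims = [in_dim] + list(sizes)
        self.layers = nn.ModuleList(
            nn.Linear(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:])
        )
        self.p = p

    def forward(self, x):
        for linear in self.layers:
            # training=True keeps dropout active even at inference time,
            # following the Tacotron convention of always-on prenet dropout.
            x = F.dropout(F.relu(linear(x)), p=self.p, training=True)
        return x
```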
pravn.wordpress.com
Did you have any luck in getting the transfer to work? I'm looking to do something similar but unsure how to start - do I simply train up the large model and then replace the voice samples?
Yes, you essentially train the large model and then fine-tune it on your smaller dataset (a minimal sketch of that recipe follows after the list). We recently put up our voice conversion work on arXiv: https://arxiv.org/abs/1907.07769
There are several things to note:
1) Sampling rate differences: In my own experiments, I could not get things to work at 16 kHz, so we ran our experiments at 22 kHz.
2) Prosody variations: Typically, we want the adapted dataset to be *similar* to the large training dataset. For example, if you train on female voices (say, with LJSpeech), then your adapted dataset might work better with female voices. Train with more than one dataset.
3) Hyperparameters: Things like hop size matter. YMMV.
4) Attention mechanism: Although we implement additive attention, it is conceivable that better results might be obtained with other mechanisms. In my work, I found multiplicative attention to work better, so we used that. Even so, other people I spoke to suggested that GMM attention worked best for them.
5) Capacity: The more the merrier when it comes to regularizing against the smaller dataset, so feel free to increase the capacity of the preprocessing layers (CBHG, conv layers, etc.).
6) It is almost certain that an additional supervision signal will help the decoder inflate the compressed encoder summary with extra hints. Just wait for the soon-to-be-released multispeaker model :).
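Something like this, where the checkpoint path, stand-in model, and learning rate are placeholders rather than this repo's actual API:

```python
import torch
from torch import nn, optim

# Stand-in module: substitute the actual Tacotron model class here.
model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 80))

# "pretrained.pth" is a placeholder path. strict=False tolerates layers
# you have added or resized (e.g. a beefed-up CBHG) that the checkpoint
# does not contain.
state = torch.load("pretrained.pth", map_location="cpu")
model.load_state_dict(state, strict=False)

# Fine-tune with a learning rate well below the from-scratch value.
optimizer = optim.Adam(model.parameters(), lr=1e-5, weight_decay=1e-6)
```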
pravn.wordpress.com
@pravn
Thanks a lot for the replies. After some unsuccessful results with less drastic approaches, I tried something that might have been unnecessarily complicated. I altered the architecture and the optimization as follows: (1) I added a second attention layer (with the 'v' parameter initialized to zero) and a second linear projection initialized to zero; both are added to the outputs of the original attention layer and linear projection. (2) I added two losses: one computed on the old dataset without the new layers, and one computed on the new dataset with the new layers. They are summed over a mixed batch of old and new voice data (roughly the idea in the sketch below). My hope was that this would regularize the model and help find a model that works on both datasets with minimal changes to the original weights (freezing them was not working for me). I expected it to fail utterly, since I had never tried anything like this, but it seems to capture the new voices and accents quite well, although I'm by no means an expert in TTS, so others might find it less good. At the moment my code is quite messy and I'm quite busy, but if this is interesting, after November I could try to block out some time to clean it up and share it in a branch.
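A minimal sketch of the zero-initialized parallel-layer idea, with illustrative dimensions and placeholder losses; this is a reconstruction of the description above, not the actual code:

```python
import torch
from torch import nn

DIM = 256  # illustrative hidden size

def zero_linear(dim):
    """Linear layer whose weights and bias start at exactly zero, so the
    warm-started model's outputs are initially unchanged."""
    layer = nn.Linear(dim, dim)
    nn.init.zeros_(layer.weight)
    nn.init.zeros_(layer.bias)
    return layer

original_proj = nn.Linear(DIM, DIM)  # pretrained projection
adapter_proj = zero_linear(DIM)      # new, zero-initialized branch

x = torch.randn(4, DIM)
out_old = original_proj(x)                    # path scored on old data
out_new = original_proj(x) + adapter_proj(x)  # path scored on new data

# Over a mixed batch, the two losses are summed (loss fns and targets
# are placeholders):
#   loss = loss_old(out_old, y_old) + loss_new(out_new, y_new)
```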
@diego-s Can you share the loss curves for your warm-started model?
Closing due to inactivity. |
Does anyone know of code for transfer learning?