detailed work pipeline to train a multi-speaker flowtron model #113
We provide a checkpoint for LibriTTS with over 2k speakers.
Set the attention prior to True before training. After training for some time, set it to False once the model has learned to attend, and resume training.
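The toggle described above can be scripted. A minimal sketch, assuming the flag lives at `data_config.use_attn_prior` in `config.json` (the key location is an assumption; check your checkout):

```python
import json

def set_attn_prior(config_path: str, value: bool) -> None:
    # Load the training config, flip the attention-prior flag, write it back.
    with open(config_path) as f:
        config = json.load(f)
    config["data_config"]["use_attn_prior"] = value  # assumed key location
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)

# Stage 1: train with the attention prior enabled.
# set_attn_prior("config.json", True)
# Stage 2: once attention looks diagonal, disable it and resume training.
# set_attn_prior("config.json", False)
```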
…On Tue, Apr 6, 2021, 5:03 AM JohnHerry ***@***.***> wrote:
Hi, all,
I am new to this; has anybody tried to train Flowtron in
multi-speaker mode?
It seems Flowtron needs TWO-STAGE training, but there is only
one config.json file. I don't know how to modify the config for the
two-stage training.
Is there a demo for a multi-speaker setup?
@rafaelvalle Thanks for your help. I am training Flowtron on a language other than English, so I have to train from scratch. There is no pretrained Tacotron 2 model for me to use as a text encoder. So do I need to train a Tacotron 2 on my multi-speaker corpus first?
No, you will not need Tacotron 2.
@rafaelvalle I changed three values according to my dataset, and I did not set use_attn_prior; instead, I strictly followed the training command in your documentation:
In our dataset there are 67 hours of speech from 142 speakers. Should I first set the parameter "n_flows" to 1 to get good attention, then to 2 in a second stage, and so on? How many steps should I train to get attention in the first stage?
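The staged schedule being asked about can be scripted along the same lines. A minimal sketch, assuming `n_flows` sits under `model_config` and that the config supports a `warmstart_checkpoint_path` under `train_config` for resuming from the previous stage (both key locations are assumptions; verify against your config.json):

```python
import json

def next_flow_stage(config_path: str, checkpoint_path: str) -> None:
    # Bump n_flows by one and warm-start from the previous stage's checkpoint.
    with open(config_path) as f:
        config = json.load(f)
    config["model_config"]["n_flows"] += 1                                  # assumed key
    config["train_config"]["warmstart_checkpoint_path"] = checkpoint_path   # assumed key
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
```

There is no fixed step count for the first stage; the usual advice in this thread is to watch the attention plots and move on once they look roughly diagonal.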
@rafaelvalle What do the x-ticks and y-ticks in the attention plot mean? I see the attention channels are 640, while my attention image above has x-ticks up to 200 and y-ticks up to 70; what do these mean? I had used config.json with n_texts=200; I saw there were few samples whose text length exceeds 160, so I removed samples whose text length is greater than 160. But the attention plot is still not good. Is there any suggestion on how to use the attention plot to find my problems? I think most of these problems are about preprocessing, though.
My corpus has multiple speakers, but my speaker IDs are not consecutive integers. There are 142 distinct speakers, while speaker IDs range from 1 to 240; many speakers in between were deleted due to low sample counts. Is this the reason for the bad attention?
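If the sparse speaker IDs turn out to be the problem, remapping them to a dense 0..N-1 range is straightforward. A minimal sketch, assuming filelist lines of the form `path|text|speaker_id` (a common TTS filelist layout; your format may differ):

```python
def remap_speaker_ids(lines):
    # Map sparse speaker IDs (e.g. 1..240 with gaps) to contiguous 0..N-1,
    # sorting numerically so the mapping is reproducible across runs.
    entries = [line.rstrip("\n").split("|") for line in lines]
    old_ids = sorted({e[2] for e in entries}, key=int)
    mapping = {old: str(new) for new, old in enumerate(old_ids)}
    remapped = ["|".join([e[0], e[1], mapping[e[2]]]) for e in entries]
    return remapped, mapping
```

After remapping, set the speaker-count parameter in the model config (whatever it is named in your checkout) to the number of distinct speakers, 142 here, rather than the old maximum ID.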
Hi, all,
I am new to this; has anybody tried to train Flowtron in multi-speaker mode?
It seems Flowtron needs TWO-STAGE training, but there is only one config.json file. I don't know how to modify the config for the two-stage training. What does "n_flows" mean?
Is there a demo for a multi-speaker setup? And if my language is not English, what steps should I follow?