
detailed work pipeline to train a multi-speaker flowtron model #113

Open
JohnHerry opened this issue Apr 6, 2021 · 7 comments

@JohnHerry

JohnHerry commented Apr 6, 2021

Hi, all,
I am new to this area; has anybody tried to train a Flowtron in multi-speaker mode?
It seems Flowtron needs TWO-STAGE training, but there is only one config.json file and I don't know how to modify this config for the two stages. What does "n_flows" mean?
Is there any demo for a multi-speaker setup? And if my language is not English, what steps should I follow?

@rafaelvalle
Contributor

rafaelvalle commented Apr 6, 2021 via email

@JohnHerry
Author

@rafaelvalle Thanks for your help. I am training Flowtron on a language other than English, so I have to train from scratch. There is no pretrained Tacotron 2 model available to me as a text encoder, so do I need to train a Tacotron 2 on my multi-speaker corpus first?

@rafaelvalle
Contributor

No, you will not need Tacotron 2.
Just make sure to set the attention prior to True until the model learns attention; it's OK to train 2 steps of flow at once.
Then set the attention prior to False and resume training.
https://github.com/NVIDIA/flowtron/blob/master/config.json#L34
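
For anyone following along, one way to script this two-stage schedule is to toggle the flag in config.json between runs. This is a minimal sketch in Python, assuming the data_config.use_attn_prior key at the linked config line; the train_config.checkpoint_path override used for resuming follows the pattern in the repo README and is an assumption here, and the checkpoint path shown is hypothetical. The same flag can also be passed on the command line with -p data_config.use_attn_prior=..., as shown later in this thread.

import json

CONFIG = "config.json"

def set_attn_prior(enabled: bool) -> None:
    """Toggle data_config.use_attn_prior in config.json between training stages."""
    with open(CONFIG) as f:
        cfg = json.load(f)
    cfg["data_config"]["use_attn_prior"] = enabled
    with open(CONFIG, "w") as f:
        json.dump(cfg, f, indent=4)

# Stage 1: train with the attention prior enabled until attention looks diagonal.
set_attn_prior(True)
# run: python train.py -c config.json

# Stage 2: disable the prior and resume from the stage-1 checkpoint.
set_attn_prior(False)
# run: python train.py -c config.json -p train_config.checkpoint_path=outdir/model_100000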

@JohnHerry
Author

@rafaelvalle
My config.json is as follows:
[screenshot of config.json]

I changed three values according to my dataset. I did not set use_attn_prior in the config; instead, I strictly followed the training command in your documentation:

python train.py -c config.json -p data_config.use_attn_prior=1

In our dataset there are 67 hours of speech from 142 speakers.

Should I first set the parameter "n_flows" to 1 until attention is good, then to 2 for the second stage, and so on?

How many training steps should it take to get attention in the first stage?

@JohnHerry
Author

I have run the first stage from scratch for three days on 6 RTX 3090 GPUs in total, but the attention still looks strange. Is there any problem?
[four attention plots from the first training stage]

@JohnHerry
Author

JohnHerry commented Apr 15, 2021

@rafaelvalle What do the x-ticks and y-ticks in the attention plot mean? I see the attention channels are 640, while my attention image above has x-ticks up to 200 and y-ticks up to 70; what do these represent?

I used config.json with n_texts=200. I saw that only a few samples have a text length over 160, so I removed the samples whose text length is greater than 160, but the attention picture is still not good.

[attention plot]

Is there any suggestion on how to use the attention plot to find my problems? I think most of these problems are about preprocessing, though.
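
For reference, dropping filelist entries above a text-length threshold can be done with a short script. This is a minimal sketch, assuming the pipe-separated audio_path|text|speaker_id filelist format used by Flowtron's filelists; the file names are hypothetical.

# Keep only filelist entries whose transcript is at most MAX_TEXT_LEN characters.
MAX_TEXT_LEN = 160  # threshold mentioned above

with open("train_filelist.txt", encoding="utf-8") as src, \
     open("train_filelist_filtered.txt", "w", encoding="utf-8") as dst:
    for line in src:
        parts = line.rstrip("\n").split("|")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        text = parts[1]
        if len(text) <= MAX_TEXT_LEN:
            dst.write(line)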

@JohnHerry
Author

My corpus has multiple speakers, but my speaker IDs are not consecutive integers. There are 142 different speakers, while the speaker IDs range from 1 to 240; many speakers in the middle of the range were removed due to a low sample count. Is this the reason for the bad attention?
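
One way to rule that out is to remap the sparse speaker IDs to consecutive 0-based indices before training, so the speaker embedding table has no unused rows. A minimal sketch, again assuming the pipe-separated audio_path|text|speaker_id filelist format and hypothetical file names; the model's speaker count setting in config.json (n_speakers, if that is the key your config uses) should then match the 142 distinct speakers.

# Remap sparse speaker IDs (e.g. 1..240 with gaps) to consecutive indices 0..N-1.
def remap_speaker_ids(src_path: str, dst_path: str) -> dict:
    with open(src_path, encoding="utf-8") as f:
        rows = [line.rstrip("\n").split("|") for line in f if line.strip()]
    rows = [r for r in rows if len(r) >= 3]  # drop malformed lines

    # Build a stable old-ID -> new-ID mapping over the speakers actually present.
    old_ids = sorted({r[2] for r in rows}, key=int)
    id_map = {old: str(new) for new, old in enumerate(old_ids)}

    with open(dst_path, "w", encoding="utf-8") as f:
        for r in rows:
            f.write(f"{r[0]}|{r[1]}|{id_map[r[2]]}\n")
    return id_map

mapping = remap_speaker_ids("train_filelist.txt", "train_filelist_remapped.txt")
print(f"{len(mapping)} speakers remapped to 0..{len(mapping) - 1}")

The same mapping would need to be applied to the validation filelist so that train and validation speaker IDs stay consistent.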
