
Warm start from published model #200

Open
ksaidin opened this issue May 28, 2020 · 15 comments

@ksaidin commented May 28, 2020

Hi, has anyone tested a warm start from the published model? If so, could you please share your experience? I aim to train my model to improve inference of a male voice in an unseen language. Do we need to convert the model?

@ksaidin commented Jun 1, 2020

Still struggling with this issue. I've tried both with the converted model and without conversion, with no success.

I've removed the iteration and optimizer keys, because the published WaveGlow model doesn't have them, as suggested here.

I've removed some lines for multi-GPU; I opened and closed an issue about it, just for the record, here.

My dataset is >11 hrs of a male speaker; config.json is untouched other than checkpoint_path.

Still, I get just smooth noise at inference. (The loss turns positive in some cases.)

@sharathadavanne, I see you suggesting to use the repo source as-is; do you recall any info that might be useful for me? What is the minimum number of iterations after which I could possibly hear something a little bit comprehensible?

@sharathadavanne

Can you share some audio examples of a) the original, b) synthesized with WaveGlow trained from scratch, and c) synthesized using a warm start?

@sharathadavanne

So from the attachment and the description above, I am guessing 'waveglow_v5' is synthesized using the model trained from scratch, and 'waveglow_train' is synthesized using a warm start?

How long did you train in both cases, a) training from scratch and b) warm start?

Your training recordings sound robotic. Are these real spoken recordings, or are they the output of some parametric TTS model? Anyway, I am more curious about the absolute zero-valued silences. These can hurt your WaveGlow training, since it randomly samples a one-second segment from each of your recordings, and in the scenario where an absolute zero-valued segment is chosen, it can result in weird loss values (like NaN). Did you observe anything weird in your training curve?
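For anyone wanting to screen their data for this failure mode, here is a minimal sketch (not from the repo) that flags clips containing a run of exactly-zero samples at least one second long. It assumes 22050 Hz mono WAV files, matching the repo's default sampling rate; `longest_zero_run` is a hypothetical helper name.

```python
# Sketch: flag clips whose randomly sampled one-second training segment
# could be entirely zero-valued. Assumes 22050 Hz mono WAV input
# (the repo's default); adjust SEG_LEN for other sampling rates.
from scipy.io import wavfile

SEG_LEN = 22050  # one second of samples at 22050 Hz

def longest_zero_run(path):
    """Return the length of the longest run of exactly-zero samples."""
    _, audio = wavfile.read(path)
    run = best = 0
    for sample in audio:
        run = run + 1 if sample == 0 else 0
        best = max(best, run)
    return best

# Example: print files that could yield an all-zero crop.
# for f in wav_paths:
#     if longest_zero_run(f) >= SEG_LEN:
#         print(f)
```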

@ksaidin commented Jun 2, 2020

Yes, it is a real spoken recording, though my dataset was converted from mp3. I hope that doesn't affect the result. I've seen loss values go positive numerous times, but I've never observed NaN.

The waveglow_v5 version is not the one trained from scratch, sorry for the confusion. I don't have a trained-from-scratch version; I am experimenting with the behaviour of the pre-trained models. I trained a Tacotron model from a warm start on this dataset for 50k iterations (250 epochs), and that gave me the waveglow_v5 version (where the vocoder is just the published pretrained model).

As you mentioned here, I hear vowels exploding, or hoarseness as you describe it. Later on, in the follow-ups, you suggest it can be corrected by training WaveGlow, and that it is much faster to train when starting from the pretrained WaveGlow model.

The thing is, you said you trained 500k iterations in 2 days. That is too much GPU power for me.

@AnkurDebnath35

@sharathadavanne @ksaidin I am training WaveGlow for a different language, from scratch. Is there a way to warm-start from a pretrained WaveGlow model like v1-v5?

@sharathadavanne

@AnkurDebnath35 In my experience you don't have to worry about language or gender when training WaveGlow in warm-start mode. So go ahead and train in warm-start mode.

@AnkurDebnath35

> @AnkurDebnath35 In my experience you don't have to worry about language or gender when training WaveGlow in warm-start mode. So go ahead and train in warm-start mode.

I have already started training WaveGlow from scratch on a Hindi dataset of 9k clips, and after 870 epochs the loss has come down to -5.8 to -6.0 in 25k iterations. But what I wanted to ask is how to do a warm start; I couldn't find any parameter for that. Please help.

@sharathadavanne

Download the pre-trained WaveGlow model given in the repo, and update the path of the downloaded model in the 'checkpoint_path' variable of the config file.
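For reference, a minimal sketch of that step, assuming the WaveGlow repo's config.json layout with a "train_config" section (verify the key names against your copy; the checkpoint filename here is only an example):

```python
# Point the WaveGlow config at the downloaded pre-trained checkpoint.
# Key names assume the repo's config.json layout ("train_config" section).
import json

with open('config.json') as f:
    config = json.load(f)

config['train_config']['checkpoint_path'] = 'waveglow_256channels.pt'  # example path

with open('config.json', 'w') as f:
    json.dump(config, f, indent=4)
```

With 'checkpoint_path' non-empty, train.py should then resume from that checkpoint instead of initializing from scratch.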

@AnkurDebnath35

Thanks a lot @sharathadavanne. By the way, does it help the model converge faster? I am running on a single GPU and training is slow, although I can run distributed. So, if I warm-start and run distributed, how many epochs should be sufficient?

@sharathadavanne

It definitely converges faster, no doubt about that. I haven't timed it, though, so I don't have an estimate.

@AnkurDebnath35

I have been reading some issues reported here, and got to know that at least 100 to 500 epochs are needed. Can you suggest something?

@rafaelvalle (Contributor)

@AnkurDebnath35 Warm-start from the pre-trained WaveGlow.
Please post WaveGlow-related issues on the WaveGlow repo.

@rafaelvalle (Contributor)

@ksaidin Please share loss curves, predicted mel-spectrograms and alignments.

@v-nhandt21

> @AnkurDebnath35 Warm-start from the pre-trained WaveGlow.
> Please post WaveGlow-related issues on the WaveGlow repo.

[screenshot: WaveGlow training loss curve after one day]

Hello, my idol!
I am also trying to train WaveGlow from your pretrained model on my language, but the checkpoint does not include the optimizer state, so I use the default like this:

```python
# Recreate the optimizer that the published checkpoint lacks.
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
```

The screenshot above shows my training progress after 1 day. Can you tell me when I should stop, and how many epochs are enough if I start from the pretrained model?

Thank you <3

@crackedeggs1 commented Apr 29, 2022

> Download the pre-trained WaveGlow model given in the repo, and update the path of the downloaded model in the 'checkpoint_path' variable of the config file.

This triggers a KeyError: 'iteration', so there must be more steps you haven't mentioned in your comment.
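That KeyError is consistent with the published checkpoint storing only the model: the repo's checkpoint loader also reads 'iteration' and 'optimizer' keys, which the released file lacks (this matches what ksaidin described above as removing iteration and optimizer). Below is a minimal sketch of a tolerant loader, under that assumption; the function name and the fallback defaults are illustrative, not part of the repo.

```python
import torch

def warm_load_checkpoint(checkpoint_path, model, optimizer):
    """Load a checkpoint, tolerating model-only files like the published WaveGlow."""
    ckpt = torch.load(checkpoint_path, map_location='cpu')
    # The published file is a dict; its 'model' entry may be a state_dict
    # or the saved module itself, depending on the release.
    state = ckpt['model'] if 'model' in ckpt else ckpt
    if hasattr(state, 'state_dict'):  # saved module rather than a state_dict
        state = state.state_dict()
    model.load_state_dict(state)
    if 'optimizer' in ckpt:
        optimizer.load_state_dict(ckpt['optimizer'])
    iteration = ckpt.get('iteration', 0)  # warm start: restart the counter at 0
    return model, optimizer, iteration
```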
