
Fixed loss during prior training #106

Closed
chebmarcel opened this issue Jul 22, 2022 · 10 comments

@chebmarcel

Hello,

It seems that I'm missing something in the training procedure.
I trained a RAVE model for about 650K steps (after which it seemed to plateau).

Then I exported it and started training the prior. Weirdly, I am already at 500K steps and the loss is not decreasing; it seems to be stuck between 3.19 and 3.12.

When I try to generate audio samples in TensorBoard, the prior model outputs noise, whereas the RAVE one is doing OK.

I am not sure what I'm doing wrong. I read some people talking about phase 2 kicking in, but I'm also confused whether this refers to the second step (the GAN training) or to something happening within the first step.

If someone could shed some light on these questions, that would be super helpful.
Thanks!

@moiseshorta

It seems you need to keep training until the second phase of training, which by default kicks in after 1 million steps; you can also set a custom threshold with the --warmup flag.
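
Conceptually, the warmup threshold just flips the training objective once the step counter passes it. Here is a minimal, purely illustrative sketch of that switch (the method names are hypothetical and not RAVE's actual train.py code):

```python
# Illustrative two-phase training switch, assuming hypothetical
# model methods; RAVE's actual train.py organizes this differently.
def training_step(model, batch, step, warmup=1_000_000):
    if step < warmup:
        # Phase 1: representation learning -- reconstruction
        # (multiscale spectral distance) plus a regularization term.
        loss = model.reconstruction_loss(batch) + model.regularization(batch)
    else:
        # Phase 2: adversarial fine-tuning of the decoder, so the
        # curves to watch become the generator/discriminator losses.
        loss = model.generator_loss(batch) + model.feature_matching_loss(batch)
    return loss
```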

@chebmarcel

@moiseshorta Thanks for your reply! Indeed, after phase 2 kicks in I get much better results.
I am still a bit confused about the prior training, since phase 2 already seems to use a GAN framework.
What exactly does the prior training add on top of the RAVE model, and how long do you train it?

@moiseshorta

moiseshorta commented Aug 1, 2022

@chebmarcel The prior is actually another neural network, a type of RNN, which basically tries to predict the most likely next latent variable of your pre-trained RAVE model. It is needed if you want to perform unconditional generation (as opposed to timbre transfer).
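
To make that concrete, unconditional generation looks roughly like the sketch below: the prior RNN rolls out a latent sequence step by step, and the frozen RAVE decoder turns it into audio. All module names here (prior_rnn, rave_decoder) are placeholders, not RAVE's exact API:

```python
import torch

# Minimal sketch of unconditional generation with a latent prior.
# prior_rnn is assumed to return (next_latent, hidden_state);
# rave_decoder is assumed to take latents shaped (batch, dim, time).
@torch.no_grad()
def sample_unconditional(prior_rnn, rave_decoder, n_steps, latent_dim):
    z = torch.zeros(1, 1, latent_dim)      # seed latent frame
    hidden = None
    frames = []
    for _ in range(n_steps):
        z, hidden = prior_rnn(z, hidden)   # predict the next latent
        frames.append(z)
    latents = torch.cat(frames, dim=1)     # (1, n_steps, latent_dim)
    return rave_decoder(latents.transpose(1, 2))
```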

@chebmarcel

@moiseshorta Ok great, it's clear now! Last question: do you train the prior for as many steps as the RAVE model?
Or what would be a good step ratio? Thanks!

@moiseshorta

@chebmarcel That depends on you. I usually train the prior beyond 1M steps, but it really depends on your dataset and how it converges.

chebmarcel reopened this Aug 15, 2022
@chebmarcel

Hi @moiseshorta, sorry for reopening haha. I trained the RAVE model for over 2M steps and it gives me very good results. Nevertheless, when I then train the prior, the loss is still not going down. I don't really understand why; does this mean that prior training is not really effective for my dataset? Thanks!

@lang216

lang216 commented Sep 12, 2022

Hi, do you mind showing the distance graph of your training? I don't know why my distance graph starts increasing after 1M steps.

@chebmarcel

No worries, this is normal. Up to 1M steps it is the warmup phase; the distance logs after 1M don't really matter, as it is a different phase.
[Screenshot: distance curve from TensorBoard]

@lang216

lang216 commented Sep 13, 2022

OMG, it makes so much sense now lol! I just checked train.py and saw that the 1M is hard-coded. Thank you! So do I just need to check the validation graph now?

Also, do you know how the steps (the 3353 below) are calculated? My batch size is 8 (the default) and my sample size is 14376.
Epoch 308: 35% 1160/3353 [09:48<-1:50:12, -3.72it/s, v_num=24]

@chebmarcel

The second phase uses a GAN framework, so you want to check the loss_dis and loss_gen curves now.
An epoch usually means one iteration over all of the training data. For instance, if you have 20,000 examples and a batch size of 100, then an epoch contains 20,000 / 100 = 200 steps; see the quick check below.
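
As a quick sanity check (a minimal sketch, assuming the usual dataloader behavior where a last partial batch still counts as one step):

```python
import math

# Steps per epoch = dataset size / batch size, rounded up.
def steps_per_epoch(n_examples, batch_size):
    return math.ceil(n_examples / batch_size)

print(steps_per_epoch(20_000, 100))  # 200

# Working backwards from the progress bar above: 3353 steps at
# batch size 8 implies roughly 3353 * 8 = 26,824 training examples.
print(3353 * 8)  # 26824
```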
