
pretrained model which can resume training #35

Closed
emmacirl opened this issue Nov 19, 2018 · 25 comments

@emmacirl

@rafaelvalle Thanks for sharing! It helps a lot.
I find training very slow (about 1 epoch/day with batch size = 1) when training on my own data (about 12 hours of audio). Could you provide a pretrained model that training can be resumed from?

Thanks a lot!

@jiqizaisikao

Hi, it seems very slow. How many epochs are needed before we can get a good result?

@candlewill

I used the following method to retrain from the officially released pre-trained model:

  1. Change config.json:
    set "checkpoint_path": "models/waveglow_new.pt"

  2. Change train.py so that load_checkpoint only loads the model weights (restart the iteration counter and do not restore the optimizer state):

def load_checkpoint(checkpoint_path, model, optimizer):
    assert os.path.isfile(checkpoint_path)
    checkpoint_dict = torch.load(checkpoint_path, map_location='cpu')
    # Restart the iteration counter instead of reading it from the checkpoint
    # iteration = checkpoint_dict['iteration']
    iteration = 1
    # Do not restore the optimizer state; start optimization from scratch
    # optimizer.load_state_dict(checkpoint_dict['optimizer'])
    model_for_loading = checkpoint_dict['model']
    model.load_state_dict(model_for_loading.state_dict())
    print("Loaded checkpoint '{}' (iteration {})".format(
          checkpoint_path, iteration))
    return model, optimizer, iteration

The training is still in progress, and I can't tell yet whether this is completely right.
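
For completeness, the resume hook in train.py that picks up the modified load_checkpoint looks roughly like the sketch below, assuming the repo's usual pattern of resuming whenever "checkpoint_path" in config.json is non-empty (treat the exact control flow as an assumption):

# In train() (sketch): resume from the path set in config.json.
if checkpoint_path != "":
    model, optimizer, iteration = load_checkpoint(checkpoint_path,
                                                  model, optimizer)
    iteration += 1  # continue from the next iteration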

@belevtsoff

belevtsoff commented Nov 22, 2018

@candlewill I'm trying to finetune the official pretrained model and I'm getting NaNs because the determinants of the Invertible1x1Conv weight matrices are negative, so the logdet operation produces NaNs. Have you changed something to address that? @rafaelvalle what do you think about it? This apparently doesn't matter for inference, but I suspect you used a slightly different version of the code to train that checkpoint (e.g. taking the abs of the det or something similar).
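
For context, the NaN is easy to reproduce in isolation (a standalone sketch, not code from the repo): torch.logdet of a real matrix with a negative determinant is NaN, while log|det W| stays finite.

import torch

W = torch.tensor([[0., 1.],
                  [1., 0.]])          # det(W) = -1
print(torch.logdet(W))                # tensor(nan)
print(torch.det(W).abs().log())       # tensor(0.) == log|det W|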

@WendongGan

@candlewill Thanks for sharing! How is your retraining going? Does it work well?

@WendongGan

@belevtsoff @candlewill @rafaelvalle I also run into this problem when I try to finetune the official pretrained model.
[screenshot of the error]

Looking forward to your help!

@belevtsoff

belevtsoff commented Nov 23, 2018

@UESTCgan it seems like the problem is twofold:

  1. glow_old.py alternates which half of the channels is split off (in both forward and infer), whereas glow.py always uses the first half of the channels as audio_0. So right now, to finetune you either have to modify glow.py to use the old-style channel split, or uncomment and fix the forward method in glow_old.py and use that instead.
  2. in glow.py, replace torch.logdet(W) with torch.det(W).abs().log(). This makes more sense mathematically and gets rid of the NaNs (see the sketch after this comment).

I'm not sure I've spotted everything that's needed, but at least my finetuned model now produces speech and not just a bunch of noise.
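
A minimal sketch of change 2, assuming a forward method structured like glow.py's Invertible1x1Conv (the surrounding details here are abbreviated guesses, not the repo's exact code):

import torch

class Invertible1x1Conv(torch.nn.Module):
    """1x1 invertible convolution with the log-determinant fix applied."""
    def __init__(self, c):
        super().__init__()
        self.conv = torch.nn.Conv1d(c, c, kernel_size=1, stride=1,
                                    padding=0, bias=False)

    def forward(self, z):
        batch_size, group_size, n_of_groups = z.size()
        W = self.conv.weight.squeeze()
        # Original line, NaN whenever det(W) < 0:
        # log_det_W = batch_size * n_of_groups * torch.logdet(W)
        log_det_W = batch_size * n_of_groups * torch.det(W).abs().log()
        z = self.conv(z)
        return z, log_det_W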

@WendongGan

@belevtsoff Thanks for your reply! I will give it a try and then share my results with you. Thanks again!

@rafaelvalle
Contributor

rafaelvalle commented Nov 28, 2018

@belevtsoff careful with the alterations. We initialize the determinants to be positive, and a determinant crossing between positive and negative values suggests that during optimization one is stepping over infinite error at determinant 0, which is bad. This can be caused by a large update, from either a large learning rate or some outlier batch...
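
For intuition (a rough restatement, not the repo's exact objective): each Invertible1x1Conv contributes a term proportional to -log|det W| to the loss, which diverges to +infinity as det W -> 0, so an update large enough to flip the sign of the determinant has to step across that singularity.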

@belevtsoff

belevtsoff commented Nov 28, 2018

@rafaelvalle Thanks for the response! But actually my point is that if you simply take the matrix W of the InvConv layer from the official pre-trained checkpoint (without any finetuning), its determinant is already negative. How is that possible?
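
This is easy to verify (a sketch; it assumes the released checkpoint stores the pickled model under the 'model' key and that the 1x1 convolutions live in a convinv module list, as in this repo's glow code):

import torch

# Path to the released checkpoint; adjust to where you saved it.
# NOTE: the repo's glow module must be importable so the pickled model can be unpickled.
checkpoint = torch.load("waveglow_old.pt", map_location="cpu")
waveglow = checkpoint["model"]

for k, convinv in enumerate(waveglow.convinv):
    W = convinv.conv.weight.squeeze()
    print(k, torch.det(W).item())  # negative values are what make logdet NaN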

@rafaelvalle
Contributor

Probably because in the old model we did not enforce the determinant to be 1 at initialization.
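
For reference, a sketch of how a positive determinant can be enforced at initialization (this mirrors what the newer glow.py appears to do; treat the exact lines as an assumption): draw W as a random orthogonal matrix via QR and flip one column if the determinant comes out negative.

import torch

c = 8  # number of channels in the 1x1 convolution (example value)

# Random orthogonal matrix: det(W) is either +1 or -1
W = torch.linalg.qr(torch.randn(c, c))[0]

# Flip one column so that det(W) = +1
if torch.det(W) < 0:
    W[:, 0] = -1 * W[:, 0]

print(torch.det(W))  # ~1.0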

@belevtsoff

belevtsoff commented Nov 29, 2018

@rafaelvalle yeah, probably

Anyway, I've successfully finetuned the pretrained checkpoint using the recipe above, although it didn't completely remove that reverb-like effect on a male voice.

@candlewill

After two weeks of fine-tuning, I can also get some reasonably clear voices. However, the problem @belevtsoff mentioned also occurs in my experiment. Maybe more training is needed.

Here are some samples (In Chinese):

nvidia_waveglow_samples.zip

@hdmjdp

hdmjdp commented Dec 6, 2018

@candlewill are your wavs from a female or a male speaker?

@candlewill

@hdmjdp Female.

@li-xx-5

li-xx-5 commented Jan 15, 2019

@UESTCgan, hello. I ran into the same problem. Could you tell me how you solved it? Thank you very much!

@li-xx-5

li-xx-5 commented Jan 15, 2019

Hi @candlewill, I ran into that problem too. How can I resume training from the checkpoint? Thank you very much.

@HashiamKadhim

@UESTCgan it seems like the problem is twofold:

  1. glow_old.py alternates which half of the channels is split off (in both forward and infer), whereas glow.py always uses the first half of the channels as audio_0. So right now, to finetune you either have to modify glow.py to use the old-style channel split, or uncomment and fix the forward method in glow_old.py and use that instead.

@belevtsoff can you please share your fix for this part?

@duvtedudug

Also interested in resuming from the waveglow_old.pt checkpoint.

@belevtsoff @candlewill Can you share your fix?

@rafaelvalle Is there a better way? Or do you have a new checkpoint that works with the current code?

@belevtsoff

@duvtedudug @HashiamKadhim Oh, sorry guys, I forgot about this. I'll share the code as soon as I get to the computer

@rohan6366

rohan6366 commented Mar 2, 2019

Any advice on training for adaptation, where the dataset is small?

Thanks

@belevtsoff

@duvtedudug @HashiamKadhim @doctor-xiang @rafaelvalle OK, I've submitted a pull request that adds the ability to continue training from the official checkpoint: #99. You can use my fork if the PR never gets merged. Let me know if I overlooked something.

@anshshan

anshshan commented Nov 4, 2019

After two weeks of fine-tuning, I can also get some reasonably clear voices. However, the problem @belevtsoff mentioned also occurs in my experiment. Maybe more training is needed.

Here are some samples (In Chinese):

nvidia_waveglow_samples.zip

Which Chinese dataset do you use?

@FadyKhalaf

FadyKhalaf commented Dec 27, 2019

Hello @belevtsoff, your modifications no longer work; I get errors like this whenever I try to load the model:
[screenshot of the error]

@MuyangDu

@UESTCgan it seems like the problem is twofold:

  1. glow_old.py alternates which half of the channels is split off (in both forward and infer), whereas glow.py always uses the first half of the channels as audio_0. So right now, to finetune you either have to modify glow.py to use the old-style channel split, or uncomment and fix the forward method in glow_old.py and use that instead.
  2. in glow.py, replace torch.logdet(W) with torch.det(W).abs().log(). This makes more sense mathematically and gets rid of the NaNs.

I'm not sure I've spotted everything that's needed, but at least my finetuned model now produces speech and not just a bunch of noise.

I have replaced torch.logdet(W) with torch.det(W).abs().log() but still get a NaN loss after several epochs. I have also used VAD to remove all the silence in the wavs and made sure every wav is longer than the segment length. I also used std() to check that no training segment is silent, and tried adding some random noise to the audio samples in the dataloader. None of the above helps: the loss reaches around -4.6 and suddenly becomes NaN. I tried a smaller learning rate, but that just slows down convergence; once the loss reached -4.6, it became NaN again. Any ideas?
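
For reference, here is roughly what the "random noise in the dataloader" tweak looks like (a hypothetical sketch: the helper name, noise amplitude, and where it is called are my own choices, not something from the repo):

import torch

def add_dither(audio, noise_amplitude=1e-5):
    """Add a small amount of uniform noise so segments are never exactly zero."""
    noise = noise_amplitude * (2.0 * torch.rand_like(audio) - 1.0)
    return torch.clamp(audio + noise, -1.0, 1.0)

# Hypothetical usage inside the dataset's __getitem__, after the audio
# segment has been cut and normalized to [-1, 1]:
#   audio = add_dither(audio)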

@rafaelvalle
Contributor

Closing due to inactivity.
