How to use pretrain.model for continuing training? #8

Closed

youyou098888 opened this issue Nov 10, 2021 · 8 comments

Comments

@youyou098888

I want to add some Chinese audio to the training data.

Can I use your pretrain.model and continue training with my data,

or do I have to download all of the VoxCeleb1 data plus my data and train from the beginning?

Thank you for your reply.

@TaoRuijie
Owner

TaoRuijie commented Nov 10, 2021

I think both are OK.

You can use my pretrained model and continue training on the Chinese audio; it will be faster than training from random initialization. I guess you just want to fine-tune, which means the number of Chinese utterances is much smaller than VoxCeleb2, so you need to start with a small learning rate.

You can also download VoxCeleb2 and train on everything together. However, if your Chinese dataset is much smaller than VoxCeleb2, I do not suggest doing that.
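For illustration, a minimal PyTorch sketch of the fine-tuning setup described above (the model here is a generic stand-in, not this repo's actual training code; `pretrain.model` is the checkpoint name used in this thread):

```python
# Hypothetical sketch: fine-tune from a pretrained checkpoint with a reduced
# learning rate instead of training from random initialization.
import torch
import torch.nn as nn

# Generic stand-in model; in practice this would be the same architecture
# that produced pretrain.model.
model = nn.Sequential(nn.Linear(80, 512), nn.ReLU(), nn.Linear(512, 192))

# Start from the pretrained weights rather than random initialization.
state = torch.load("pretrain.model", map_location="cpu")
model.load_state_dict(state, strict=False)  # strict=False ignores keys that do not match

# Use a learning rate well below the from-scratch value (e.g. 1e-4 instead of
# 1e-3) so fine-tuning on the smaller Chinese set does not wipe out the
# pretrained weights.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```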

@youyou098888
Author

The Chinese audio is almost the same size as VoxCeleb2.
In that case, can I use the pretrained model?

@TaoRuijie
Owner

I think you can use the pretrained model, reduce the initial learning rate, and then train only on your data.

You can run experiments to compare. Here is my understanding:

  1. Train from scratch on your Chinese data only.
  2. Train from the pretrained model, then train on your Chinese data only.
  3. Train from scratch on your data and VoxCeleb2 together.

I guess 2 and 3 might give similar results, which might be better than 1.
2 trains much faster than 1; 3 needs quite a long time.

That is my understanding; you can run experiments to verify it.

@youyou098888
Author

Thank you so much, that is very clear.
Another question: are the MUSAN and RIR datasets required in all three experiments?

@TaoRuijie
Owner

Those are used for augmentation.

You can add them or not in all of the experiments: adding them can make the results better,

and removing them makes training faster.
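For reference, a minimal sketch of what MUSAN/RIR augmentation does (a hypothetical helper, not the repo's actual data loader): additive noise at a random SNR plus convolution with a room impulse response, switchable with a flag.

```python
# Hypothetical sketch: optional MUSAN noise + RIR reverberation augmentation
# for a mono waveform (numpy array). File lists are assumed to be prepared
# elsewhere.
import random
import numpy as np
import scipy.signal
import soundfile as sf

def augment(waveform, musan_files, rir_files, use_aug=True):
    """Randomly add a MUSAN noise clip and convolve with a room impulse
    response. Set use_aug=False to skip augmentation (faster training)."""
    if not use_aug:
        return waveform

    # Additive noise at a random SNR between 5 and 20 dB.
    noise, _ = sf.read(random.choice(musan_files))
    noise = np.resize(noise, waveform.shape)
    snr_db = random.uniform(5, 20)
    noise_power = np.mean(noise ** 2) + 1e-8
    signal_power = np.mean(waveform ** 2) + 1e-8
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    waveform = waveform + scale * noise

    # Reverberation: convolve with a normalized random room impulse response.
    rir, _ = sf.read(random.choice(rir_files))
    rir = rir / (np.sqrt(np.sum(rir ** 2)) + 1e-8)
    waveform = scipy.signal.convolve(waveform, rir, mode="full")[: len(waveform)]
    return waveform
```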

@youyou098888
Author

Got it~

@wwyl2000

Thanks for the information about continued training.
I have a question about it: after training a general model on X speakers, we want to adapt or fine-tune it for N new speakers. We have a dataset for those N speakers (X >>> N, and none of the N speakers is included in X). The trained model has X classes. When fine-tuning with the N new speakers, how should the number of classes be set, and what is the relationship between the X speakers and the N new speakers? During fine-tuning, should the number of classes be N (overwriting the original X speakers) or X + N (appending the N new speakers to the original X)?

For a given N, say N = 10, what value of X would be suitable for acceptable performance? Does the size of X influence the model size much?

Thanks!
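For readers weighing these options, here is a minimal PyTorch sketch of the "N classes, new head" variant (hypothetical names, not the author's answer or this repo's API): the classification head is only used during training, so its width can be set to N independently of X, while the pretrained encoder weights are kept.

```python
# Hypothetical sketch of the "re-initialize the head with N classes" option.
# SpeakerEncoder is a stand-in for the pretrained embedding extractor; only
# its weights are taken from the X-speaker checkpoint, and the old X-way
# classifier is discarded.
import torch
import torch.nn as nn

class SpeakerEncoder(nn.Module):
    def __init__(self, embed_dim=192):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(80, 512), nn.ReLU(), nn.Linear(512, embed_dim))

    def forward(self, x):
        return self.net(x)

N = 10  # number of new speakers to fine-tune on
encoder = SpeakerEncoder()

# Keep only the checkpoint weights whose names and shapes match the encoder;
# the X-class classifier weights in the checkpoint are simply ignored.
state = torch.load("pretrain.model", map_location="cpu")
own = encoder.state_dict()
encoder.load_state_dict(
    {k: v for k, v in state.items() if k in own and v.shape == own[k].shape},
    strict=False,
)

# Fresh N-way classification head: its size depends on N, not on X, and it is
# only needed during training (verification scores the embeddings themselves).
classifier = nn.Linear(192, N)
```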
