Unexpected Training Time #43

Open
jhkonan opened this issue Mar 15, 2022 · 0 comments
jhkonan commented Mar 15, 2022

I am trying to get FullSubNet up and running by following the repo instructions. It seems we must create a custom train.toml, in which we specify the relevant file paths, and supply text files listing absolute paths to the data. I am only looking at training with no reverb.
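For concreteness, here is a sketch of what I assume such a custom train.toml looks like. The section and key names below are my guesses based on the README, not verified against the repo's actual schema, and the paths are hypothetical; each `*.txt` file is meant to contain one absolute path per line:

```toml
# Sketch only — section/key names are assumptions, not the repo's verified schema.
[meta]
save_dir = "/home/user/fullsubnet_exp"  # hypothetical output directory for checkpoints/logs

[dataset.train]
# Text files listing one absolute audio path per line (hypothetical locations)
clean_dataset = "/home/user/lists/clean_train.txt"
noise_dataset = "/home/user/lists/noise_train.txt"

[dataset.validation]
dataset = "/home/user/lists/validation.txt"
```

If the repo's example config uses different section or key names, the idea is the same: point every dataset entry at a text file of absolute paths.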

I observe the following training time for one epoch on my system with two 2080 Ti GPUs.

This project contains 1 models, the number of the parameters is: 
        Network 1: 5.637635 million.
The amount of parameters in the project is 5.637635 million.
=============== 1 epoch ===============
[0 seconds] Begin training...
         Saving 1 epoch model checkpoint...
[966 seconds] Training has finished, validation is in progress...
         Saving 1 epoch model checkpoint...
         😃 Found a best score in the 1 epoch, saving...
[1031 seconds] This epoch is finished.

This is much faster than I would expect a 5M-parameter model to train on a dataset of this size. I am also not sure how to use the evaluation logs, since they seem to be in a proprietary format.

Could you tell us how long it takes to train a few epochs and what evaluation results we should expect early on?

Thank you for your help.
