Unexpected Training Time #43

Open
jhkonan opened this issue Mar 15, 2022 · 0 comments
jhkonan commented Mar 15, 2022

I am trying to get FullSubNet up and running by following the repo instructions. It seems we must create a custom train.toml, in which we specify the relevant file paths, and supply text files listing absolute paths to the data. I am only looking at training with no reverb.
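For concreteness, here is a sketch of what I assume such a custom train.toml looks like. The section and key names below are my guesses based on the README, not verified against the repo's actual schema, and the paths are hypothetical; each `*.txt` file is meant to contain one absolute path per line:

```toml
# Sketch only — section/key names are assumptions, not the repo's verified schema.
[meta]
save_dir = "/home/user/fullsubnet_exp"  # hypothetical output directory for checkpoints/logs

[dataset.train]
# Text files listing one absolute audio path per line (hypothetical locations)
clean_dataset = "/home/user/lists/clean_train.txt"
noise_dataset = "/home/user/lists/noise_train.txt"

[dataset.validation]
dataset = "/home/user/lists/validation.txt"
```

If the repo's example config uses different section or key names, the idea is the same: point every dataset entry at a text file of absolute paths.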

I observe the following training time for one epoch on my system with two 2080 Ti GPUs.

This project contains 1 models, the number of the parameters is: 
        Network 1: 5.637635 million.
The amount of parameters in the project is 5.637635 million.
=============== 1 epoch ===============
[0 seconds] Begin training...
         Saving 1 epoch model checkpoint...
[966 seconds] Training has finished, validation is in progress...
         Saving 1 epoch model checkpoint...
         😃 Found a best score in the 1 epoch, saving...
[1031 seconds] This epoch is finished.

This is much faster than I would expect a 5M-parameter model to train on a dataset of this size. I am also not sure how to use the evaluation logs, since they seem to be in a proprietary format.

Could you tell us how long it takes to train a few epochs and what evaluation results we should expect early on?

Thank you for your help.
