
Training time #7

Closed
rohitrango opened this issue Jun 16, 2022 · 5 comments

Comments

@rohitrango

Hi,

Thank you for the amazing work!

I was wondering how long the synthetic data training takes, considering there is no DataParallel/DistributedDataParallel implementation.
Thank you in advance!
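For context, here is a minimal sketch of how a standard PyTorch training script could be wrapped in torch.nn.DataParallel to use multiple GPUs. This is not part of the repo; `model` below is just a placeholder for the actual network.

```python
import torch
import torch.nn as nn

# Placeholder module standing in for the network built by the training script.
model = nn.Linear(10, 10)

if torch.cuda.device_count() > 1:
    # Replicates the module across all visible GPUs and splits each batch
    # between them; gradients are gathered back onto the default device.
    model = nn.DataParallel(model)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
```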

@rohitrango
Author

Just in case, do you also have training loss curves, etc?

@akashsengupta1997
Owner

Hi - sorry for the late reply!

Synthetic training takes around 6 days on a 2080Ti. Unfortunately, it is quite slow because of the matrix-Fisher distributions used, since we need to do rejection sampling from them. (Future work could look at using better distributions over SO(3) with faster sampling.)
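To illustrate why rejection sampling is the bottleneck, here is a minimal sketch of rejection sampling from a matrix-Fisher distribution over SO(3), using a uniform rotation proposal and the envelope tr(FᵀR) ≤ s1 + s2 + s3 (sum of singular values of F). This is an illustration only, not necessarily the sampler used in the repo, which may use a more efficient scheme.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def sample_matrix_fisher(F, max_tries=100_000):
    """Draw one rotation R ~ p(R) ∝ exp(tr(F^T R)) on SO(3) by rejection.

    Proposal: uniform rotations. Envelope: tr(F^T R) <= sum of singular
    values of F, so the log acceptance probability is tr(F^T R) - sum(s).
    Simple to read, but the acceptance rate collapses when F is highly
    concentrated, which is why sampling dominates training time.
    """
    log_envelope = np.linalg.svd(F, compute_uv=False).sum()
    for _ in range(max_tries):
        R = Rotation.random().as_matrix()              # uniform proposal on SO(3)
        log_accept = np.trace(F.T @ R) - log_envelope  # always <= 0
        if np.log(np.random.uniform()) < log_accept:
            return R
    raise RuntimeError("no sample accepted within max_tries")

# Example: mild concentration around the identity rotation.
R_sample = sample_matrix_fisher(2.0 * np.eye(3))
```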

I do have training logs saved - I will email them to you by the end of the week (please remind me if I haven't done it)

@rohitrango
Author

Thank you for your response.

Can you email them to rohitrango@gmail.com?

@rohitrango
Author

Another small question.

The training parameters mentioned in the paper specify 150 epochs in total, but the default config file specifies more epochs (300), with a different split between loss stages 1 and 2 than in the paper. Which configuration works best? TIA.

@akashsengupta1997
Owner

Hi,

The number of epochs in the default config is set higher than needed - I usually track the train/val curves and early-stop the experiment when they seem to have converged. IIRC the metrics in the paper were achieved with ~150 epochs. The weights released with the repo were trained for the full 300 epochs, mostly because I started the experiment and forgot about it for a while 😄 Training for 300 epochs will probably not perform much better on real test data than 150 epochs.
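As a sketch of that early-stopping workflow (an illustration only, not code from the repo): stop once the validation metric has not improved for a number of epochs, and keep the best checkpoint seen so far.

```python
def should_stop_early(val_history, patience=20, min_delta=1e-4):
    """Return True once the last `patience` epochs have not improved on the
    best validation metric (lower is better) seen before them."""
    if len(val_history) <= patience:
        return False
    best_before = min(val_history[:-patience])
    best_recent = min(val_history[-patience:])
    return best_recent > best_before - min_delta

# Example: a validation curve that flattens out after ~150 epochs.
history = [1.0 / (1 + e) for e in range(150)] + [0.0066] * 25
print(should_stop_early(history))  # True -> stop and keep the best checkpoint
```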
