Best hyperparameter #1

Closed
kosuke1701 opened this issue Mar 14, 2021 · 2 comments

Comments

@kosuke1701

Thank you very much for sharing your interesting project!

I'm planning to train a classifier with your classification code on another character face dataset.

> Our best model, ViT L-16 with image size 128x128 and batch size 64

May I ask about the configuration of the other hyperparameters of the best ViT L-16 model (e.g., learning rate, number of epochs, LR decay)?

@arkel23
Owner

arkel23 commented Mar 15, 2021

Hey, I'm glad you find it interesting!

For all of the experiments we use these settings:
Stochastic gradient descent (SGD) with momentum, with an initial learning rate (LR) of 0.001 and momentum of 0.9. We also apply LR decay, reducing the current LR by 1/3 every 20 epochs when training for 50 epochs, and every 50 epochs when training for 200 epochs.
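
As a rough PyTorch sketch of that setup (illustrative only, not the exact training script; the placeholder model and the reading of "reduce by 1/3" as multiplying the LR by 1/3 are assumptions):

```python
import torch

model = torch.nn.Linear(8, 2)  # placeholder model, stand-in for the ViT L-16 classifier

# SGD with momentum: initial LR 0.001, momentum 0.9
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# LR decay every 20 epochs for the 50-epoch setting (use step_size=50 for 200 epochs).
# "Reduce by 1/3" is taken here as multiplying the LR by 1/3; use gamma=2/3 instead
# if the intent was to keep two thirds of the current LR.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=1/3)

for epoch in range(50):
    # ... run one training epoch here ...
    scheduler.step()  # apply the LR decay schedule once per epoch
```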

In general, we also found we could get even better and "faster" results by using a small batch size; in particular, with batch size 16 we could get past 80% top-1 classification accuracy in fewer than 10 epochs on DAF:re.

With respect to the results on page 4 of the paper (https://arxiv.org/pdf/2101.08674.pdf), they were obtained using the 50-epoch setting and saving the model with the best results on the validation set, but the "best" validation accuracy is also reached somewhere around the 20th epoch.
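
The "save the best model on the validation set" part amounts to something like the following (continuing the sketch above; `train_one_epoch` and `validate` are hypothetical helpers, not functions from this repo):

```python
# Keep the checkpoint with the highest top-1 validation accuracy seen so far.
best_acc = 0.0
for epoch in range(50):
    train_one_epoch(model, optimizer)   # hypothetical: one pass over the training set
    acc = validate(model)               # hypothetical: top-1 accuracy on the validation split
    if acc > best_acc:
        best_acc = acc
        torch.save(model.state_dict(), "best_model.pth")
    scheduler.step()
```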

@kosuke1701
Author

I got it! Thank you very much for the detailed description of the hyperparameters.
