For all experiments we use the following settings:
Stochastic gradient descent (SGD) with momentum, with an initial learning rate (LR) of 0.001 and momentum of 0.9. We also apply LR decay, reducing the current LR by 1/3 every 20 epochs when training for 50 epochs, and every 50 epochs when training for 200 epochs.
In general we also found we could get better and "faster" results by using a small batch size: with batch size 16, we surpassed 80% top-1 classification accuracy in under 10 epochs on DAF:re.
The results on page 4 of the paper (https://arxiv.org/pdf/2101.08674.pdf) were obtained using the 50-epoch setting, saving the model with the best validation accuracy; that "best" validation accuracy is typically reached somewhere around the 20th epoch.
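The step-decay schedule described above can be sketched as a small function. Note an ambiguity: "reduce the current LR by 1/3" is read here as multiplying the LR by 1/3 at each step; if the intended meaning is subtracting a third (i.e. multiplying by 2/3), change `gamma` accordingly.

```python
def scheduled_lr(epoch, base_lr=0.001, step=20, gamma=1/3):
    """Learning rate after step decay.

    The LR is multiplied by `gamma` once every `step` epochs,
    matching the 50-epoch setting (step=20); for the 200-epoch
    setting, use step=50.
    """
    return base_lr * gamma ** (epoch // step)

# LR over the first 50 epochs of the 50-epoch setting:
# epochs 0-19 -> 0.001, epochs 20-39 -> ~0.000333, epochs 40-49 -> ~0.000111
```

In PyTorch this corresponds to wrapping the SGD optimizer with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=1/3)` and calling `scheduler.step()` once per epoch (again assuming the multiplicative reading of "by 1/3").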
Thank you very much for sharing your interesting project!
I'm planning to train a classifier with your classification code on another character face dataset.
May I ask for the configuration of the other hyperparameters of the best ViT L-16 model (e.g. learning rate, number of epochs, LR decay)?