Recommendation #8
Comments
Hi, thank you for your help. I followed your recommendations, using 8 heads, 14 layers, an MLP ratio of 3, and embedding dims of 512, 256, and 768, but my results are still much better with vit_lite_7 and cvt_7. I am currently using a dataset of 5.5M images of size 128×128.
Hi, well, the problem could be that your model is larger and is overfitting. The two models you mentioned both have 7 layers, 4 heads, and an MLP ratio of 2, while maintaining a per-head dim of 64 (64 × 4 heads = 256 embedding dim). You increased the number of heads, the dim, and the ratio along with the number of layers, so your new variant is bigger but may suffer from overfitting, which is what I'm assuming is happening here. One suggestion I have is to keep the heads, dim, and ratio fixed while increasing the number of layers. Have you tried the variant I recommended before: 6 heads, 14 layers, 3.0 ratio?
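For concreteness, the relationship described above (embedding dim = heads × per-head dim, with the ratio and layer count scaled separately) can be sketched like this. This is a hypothetical config container for illustration only, not the actual library API:

```python
from dataclasses import dataclass

@dataclass
class ViTConfig:
    # Illustrative hyperparameter container (not the repo's real config class).
    num_layers: int
    num_heads: int
    head_dim: int = 64      # per-head dim kept fixed at 64
    mlp_ratio: float = 2.0

    @property
    def embedding_dim(self) -> int:
        # Embedding dim follows from heads * per-head dim,
        # e.g. 4 heads * 64 = 256 for the ViT-Lite-7 style setup.
        return self.num_heads * self.head_dim

# ViT-Lite-7 / CVT-7 style: 7 layers, 4 heads, ratio 2 -> dim 256
vit_lite_7 = ViTConfig(num_layers=7, num_heads=4, mlp_ratio=2.0)
# Recommended deeper variant: 14 layers, 6 heads, ratio 3 -> dim 384
deeper = ViTConfig(num_layers=14, num_heads=6, mlp_ratio=3.0)

print(vit_lite_7.embedding_dim)  # 256
print(deeper.embedding_dim)      # 384
```

The point of keeping heads, dim, and ratio fixed while only increasing `num_layers` is that depth grows without also widening every layer, which limits how fast the parameter count (and overfitting risk) grows.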
Hi, sorry for the late reply, and thank you for your last message. Yes, I was able to test your recommended configuration with 6 heads, 14 layers, and a ratio of 3, but the results were not better than ViT_Lite_7 and cvt_7. Moreover, compared to the "CNN version" of the implementation I am working on, which uses only 4.6M parameters, the compact transformer method uses more than 17M parameters. I therefore tried to reduce the number of layers while keeping the same ratio and the same number of heads, but the results are still not better than the original ViT_Lite_7 and cvt_7.
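The parameter gap described above can be roughly estimated from the hyperparameters alone. The sketch below uses a standard back-of-the-envelope count for a transformer encoder (4·dim² for the Q/K/V/output projections, 2·ratio·dim² for the MLP), ignoring the tokenizer/embedding, norms, and biases; it is an approximation, not the repo's exact accounting:

```python
def transformer_params(layers: int, dim: int, mlp_ratio: int) -> int:
    """Rough parameter count for a transformer encoder:
    attention = 4 * dim^2 (Wq, Wk, Wv, Wo), MLP = 2 * mlp_ratio * dim^2.
    Embeddings, layer norms, and biases are ignored."""
    per_layer = 4 * dim ** 2 + 2 * mlp_ratio * dim ** 2
    return layers * per_layer

# ViT-Lite-7 style: 7 layers, dim 256 (4 heads * 64), ratio 2
small = transformer_params(7, 256, 2)    # ~3.7M
# Deeper variant: 14 layers, dim 384 (6 heads * 64), ratio 3
large = transformer_params(14, 384, 3)   # ~20.6M

print(f"{small / 1e6:.1f}M vs {large / 1e6:.1f}M")
```

This shows why the deeper/wider variant lands in the ~17M+ range mentioned above while the CNN baseline stays under 5M: the per-layer cost grows quadratically with the embedding dim and linearly with both the ratio and the number of layers.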
Hi, well, that could mean that merely changing the size of the model doesn't have a significant effect. Just out of curiosity, what models have you tried? Are the results a lot better than ViT-Lite and CVT? And when you say the results are not better, does that mean they're all within the same range, or is the bigger one worse?
I'm closing the issue now; if you still have questions, feel free to open it back up.
Thank you for sharing this amazing work. I am currently attempting to apply your ideas to a specific problem with bigger images, sized 128×128. Do you have any recommendations on how to improve the performance of your network on bigger images?