
Recommendation #8

Closed
Babars7 opened this issue May 28, 2021 · 6 comments
Labels: question (Further information is requested)
@Babars7 commented May 28, 2021

Thank you for sharing this amazing work. I am currently attempting to apply your ideas to a specific problem with bigger images, sized 128x128. Do you have any recommendations on how to improve the performance of your network on bigger images?

@alihassanijr (Member) commented

Hi,
Thank you for your interest!
We actually applied bigger variants of CCT to larger datasets, and we recommend trying deeper models (more Transformer encoder layers) with a similar or even smaller number of heads and mlp ratio. 6 heads, an mlp ratio of 3.0, and 14 layers are a good starting point, but of course it really depends on how many images you have to train on.
We also just submitted the paper and will share those results soon.
Looking forward to hearing your results.
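
For concreteness, here is a minimal sketch of that recommended encoder shape in plain PyTorch. The 384 embedding dim assumes 64 dims per head (as in the CCT variants); CCT's convolutional tokenizer and sequence pooling are not shown, so this is only an illustration of the encoder hyperparameters, not the repo's implementation:

```python
# Hedged sketch of the recommended encoder shape (14 layers, 6 heads,
# mlp ratio 3.0) in plain PyTorch. Assumes 64 dims per head (384 total);
# CCT's convolutional tokenizer and sequence pooling are omitted.
import torch
import torch.nn as nn

num_heads = 6
embed_dim = num_heads * 64            # 384
mlp_ratio = 3.0
num_layers = 14

layer = nn.TransformerEncoderLayer(
    d_model=embed_dim,
    nhead=num_heads,
    dim_feedforward=int(embed_dim * mlp_ratio),  # 1152
    activation="gelu",
    batch_first=True,
)
encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

# Sanity check on a dummy token sequence (batch of 2, 256 tokens).
x = torch.randn(2, 256, embed_dim)
print(encoder(x).shape)               # torch.Size([2, 256, 384])
```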

@Babars7 (Author) commented Jun 1, 2021

Hi, thank you for your help. I followed your recommendations, using 8 heads, 14 layers, an mlp ratio of 3, and embedding dims of 512, 256, and 768, but my results are much better with vit_lite_7 and cvt_7. I am currently using a dataset of 5.5M images at size 128x128.

@alihassanijr (Member) commented

Hi,

Well, the problem could be that your model is larger and is overfitting. The two models you mentioned both have 7 layers, 4 heads, and an mlp ratio of 2, with a dim of 64 per head (64 × 4 heads = 256). You increased the number of heads, dim, and ratio along with the number of layers, so your new variant is bigger and may be suffering from overfitting, which is what I assume is happening here. One suggestion I have is to keep the heads, dim, and ratio fixed while increasing the number of layers.

Have you tried the variant I recommended before: 6 heads, 14 layers, a 3.0 ratio?
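
A rough back-of-the-envelope count illustrates why scaling heads, dim, and ratio together grows the model so quickly: per encoder block, the attention projections contribute about 4·d² parameters and the MLP about 2·ratio·d² (biases, norms, the tokenizer, and the classifier head ignored):

```python
# Rough estimate: per encoder block, attention (q/k/v/out projections)
# costs ~4*d^2 parameters and the MLP ~2*mlp_ratio*d^2; biases, layer
# norms, the tokenizer, and the classifier head are ignored.
def approx_encoder_params(dim: int, mlp_ratio: float, num_layers: int) -> int:
    per_block = 4 * dim**2 + 2 * mlp_ratio * dim**2
    return int(per_block * num_layers)

# ViT-Lite-7 / CVT-7 shape: 7 layers, dim 256 (4 heads x 64), ratio 2.
print(approx_encoder_params(256, 2.0, 7))    # -> 3670016  (~3.7M)
# The enlarged variant: 14 layers, dim 512, ratio 3.
print(approx_encoder_params(512, 3.0, 14))   # -> 36700160 (~36.7M)
```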

@Babars7 (Author) commented Jun 8, 2021

Hi, sorry for the late reply, and thank you for your last message. Yes, I was able to test your recommended configuration with 6 heads, 14 layers, and a ratio of 3, but the results were not better than ViT_Lite_7 and cvt_7. Moreover, compared to the "CNN version" of the implementation I am working on, which uses only 4.6M parameters, the compact transformer approach uses more than 17M parameters. I therefore tried reducing the number of layers while keeping the same ratio and number of heads, but the results are still not better than the original ViT_Lite_7 and cvt_7.
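
For reference, a standard way to measure such parameter counts in PyTorch is the idiom below; the `nn.Linear` here is only a placeholder, to be substituted with the actual CCT or ViT-Lite model:

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    # Count trainable parameters only.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Placeholder module for illustration; substitute the real model.
model = nn.Linear(512, 512)
print(f"{count_params(model) / 1e6:.2f}M parameters")  # 0.26M
```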

@alihassanijr (Member) commented

Hi,

Well, that could mean that merely changing the size of the model doesn't have a significant effect. Just out of curiosity, what models have you tried? Are their results a lot better than ViT-Lite and CVT? And when you say the results are not better, does that mean they're all within the same range, or is the bigger one worse? I ask because merely changing the model size may not affect performance much if you're already at a performance ceiling.

@alihassanijr (Member) commented

I'm closing the issue now; if you still have questions, feel free to reopen it.

@alihassanijr added the question (Further information is requested) label on Aug 30, 2021