Recommendation #8
Comments
Hi, thank you for your help. I followed your recommendations, using 8 heads, 14 layers, an MLP ratio of 3, and embedding dims of 512, 256, and 768, but my results are still much better with vit_lite_7 and cvt_7. I am currently using a dataset of 5.5M images of size 128×128.
Hi, well, the problem could be that your model is larger and is overfitting. The two models you mentioned both have 7 layers, 4 heads, and an MLP ratio of 2, while maintaining a per-head dim of 64 (64 × 4 heads = 256 embedding dim). You increased the number of heads, the dim, and the ratio along with the number of layers, so your new variant is bigger but may suffer from overfitting, which is what I'm assuming is happening here. One suggestion I have is to keep the heads, dim, and ratio fixed while increasing the number of layers. Have you tried the variant I recommended before: 6 heads, 14 layers, 3.0 ratio?
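For concreteness, the relationship described above (embedding dim = heads × per-head dim, with the ratio and layer count scaled separately) can be sketched like this. This is a hypothetical config container for illustration only, not the actual library API:

```python
from dataclasses import dataclass

@dataclass
class ViTConfig:
    # Illustrative hyperparameter container (not the repo's real config class).
    num_layers: int
    num_heads: int
    head_dim: int = 64      # per-head dim kept fixed at 64
    mlp_ratio: float = 2.0

    @property
    def embedding_dim(self) -> int:
        # Embedding dim follows from heads * per-head dim,
        # e.g. 4 heads * 64 = 256 for the ViT-Lite-7 style setup.
        return self.num_heads * self.head_dim

# ViT-Lite-7 / CVT-7 style: 7 layers, 4 heads, ratio 2 -> dim 256
vit_lite_7 = ViTConfig(num_layers=7, num_heads=4, mlp_ratio=2.0)
# Recommended deeper variant: 14 layers, 6 heads, ratio 3 -> dim 384
deeper = ViTConfig(num_layers=14, num_heads=6, mlp_ratio=3.0)

print(vit_lite_7.embedding_dim)  # 256
print(deeper.embedding_dim)      # 384
```

The point of keeping heads, dim, and ratio fixed while only increasing `num_layers` is that depth grows without also widening every layer, which limits how fast the parameter count (and overfitting risk) grows.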
Hi, sorry for the late reply, and thank you for your last message. Yes, I was able to test your recommended configuration with 6 heads, 14 layers, and a ratio of 3, but the results were not better than ViT_Lite_7 and cvt_7. Moreover, compared to the "CNN version" of the implementation I am working on, which uses only 4.6M parameters, the compact transformer method uses more than 17M parameters. I therefore tried to reduce the number of layers while keeping the same ratio and the same number of heads, but the results are still not better than the original ViT_Lite_7 and cvt_7.
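The parameter gap described above can be roughly estimated from the hyperparameters alone. The sketch below uses a standard back-of-the-envelope count for a transformer encoder (4·dim² for the Q/K/V/output projections, 2·ratio·dim² for the MLP), ignoring the tokenizer/embedding, norms, and biases; it is an approximation, not the repo's exact accounting:

```python
def transformer_params(layers: int, dim: int, mlp_ratio: int) -> int:
    """Rough parameter count for a transformer encoder:
    attention = 4 * dim^2 (Wq, Wk, Wv, Wo), MLP = 2 * mlp_ratio * dim^2.
    Embeddings, layer norms, and biases are ignored."""
    per_layer = 4 * dim ** 2 + 2 * mlp_ratio * dim ** 2
    return layers * per_layer

# ViT-Lite-7 style: 7 layers, dim 256 (4 heads * 64), ratio 2
small = transformer_params(7, 256, 2)    # ~3.7M
# Deeper variant: 14 layers, dim 384 (6 heads * 64), ratio 3
large = transformer_params(14, 384, 3)   # ~20.6M

print(f"{small / 1e6:.1f}M vs {large / 1e6:.1f}M")
```

This shows why the deeper/wider variant lands in the ~17M+ range mentioned above while the CNN baseline stays under 5M: the per-layer cost grows quadratically with the embedding dim and linearly with both the ratio and the number of layers.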
Hi, well, that could mean that merely changing the size of the model doesn't have a significant effect. Just out of curiosity, what models have you tried? Are the results a lot better than ViT-Lite and CVT? And when you say the results are not better, does that mean they're all within the same range, or is the bigger one worse?
I'm closing the issue now; if you still have questions, feel free to open it back up.
Thank you for sharing this amazing work. I am currently attempting to apply your ideas to a specific problem with bigger images, sized 128×128. Do you have any recommendations on how to improve the performance of your network on bigger images?