TransUNet - Why is the patch_dim set to 1? #10
Comments
Hello @dsitnik, I used it based on the figure, since that matches the shape of the ViT output tokens. There are indeed many details missing from the paper, but I guess this one is OK. What do you think?
I think the patch dimensionality should be a hyperparameter. In the paper (Table 3), they investigated the influence of patch size. If you have an input image of size 3x256x256, the size after the convolution layers would be 1024x16x16. You should then choose a patch size < 16 (e.g. 2, 4, 8). Changes I made:
- Also, I defined

Hope these changes are correct. If you have any opinion about this, I would appreciate it.
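To illustrate the shape reasoning above, here is a minimal numpy sketch (the shapes 3x256x256 and 1024x16x16 come from the comment; the `patchify` helper is hypothetical, written only to show how the token count and token dimension depend on the patch size):

```python
import numpy as np

def patchify(x, p):
    """Split a (c, h, w) feature map into p x p patches, one flattened token per patch."""
    c, h, w = x.shape
    assert h % p == 0 and w % p == 0
    x = x.reshape(c, h // p, p, w // p, p)   # (c, x, px, y, py)
    x = x.transpose(1, 3, 2, 4, 0)           # (x, y, px, py, c)
    return x.reshape((h // p) * (w // p), p * p * c)

# After the CNN encoder, a 3x256x256 image becomes a 1024x16x16 feature map.
feat = np.zeros((1024, 16, 16))

# p=1 gives 16*16 = 256 tokens of dimension 1024; larger p gives
# fewer, higher-dimensional tokens.
for p in (1, 2, 4, 8):
    print(p, patchify(feat, p).shape)
```

This is why any patch size that divides 16 is a valid choice here: only the token count and token dimension change.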
Yes that makes sense! Some remarks:
I am reopening the issue! Thanks for the contribution!!!
It's not clear to me why this change is necessary in the ViT class. Would this change make the ViT class run as before for the other architectures?

```python
# before
img_patches = rearrange(img, 'b c (patch_x x) (patch_y y) -> b (x y) (patch_x patch_y c)',
                        patch_x=self.p, patch_y=self.p)
# after
img_patches = rearrange(img,
                        'b c (patch_x x) (patch_y y) -> b (x y) (patch_x patch_y c)',
                        x=self.img_dim // self.p, y=self.img_dim // self.p,
                        patch_x=self.p, patch_y=self.p)
```

Are you sure that this change is necessary? In einops axis decomposition, specifying one axis length should be enough.
I included x, y, patch_x, and patch_y just to make it clearer from the code what is going on. If self.p is fixed to 1 and this is the only change made, the ViT class should work with the other architectures as before. However, if you include
Yes, exactly!
Is this change just for readability then, or is it necessary for it to work? I don't mind changing this, but I have to check that all the other architectures based on ViT still work fine. Thanks again.
Hello @dsitnik, I added the proposed changes to the TransUNet architecture. Let me know if you find any problems. Cheers,
Hi,
Can you please explain why the patch_dim is set to 1 in the TransUNet class? Thank you in advance!
self-attention-cv/self_attention_cv/transunet/trans_unet.py
Line 54 in 8280009
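For context on what the question is asking: with patch_dim set to 1, every 1x1 spatial position of the CNN feature map becomes a single ViT token. A minimal numpy sketch (the 1024x16x16 shape is the feature-map size discussed in the comments above; the reshape is illustrative, not the repository's code):

```python
import numpy as np

feat = np.zeros((1024, 16, 16))          # (channels, height, width) after the CNN encoder
tokens = feat.reshape(1024, -1).T        # one token per spatial position
print(tokens.shape)                      # 16*16 = 256 tokens, each of dimension 1024
```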