
Hi, some questions about the model! #35 (Closed)

roger-cv opened this issue Dec 2, 2021 · 7 comments

roger-cv commented Dec 2, 2021

Hi, thanks for this nice work. Recently I have been trying to adapt this excellent model to my own task. Why is the "average" operation required here? The shape of the tensor changes from 64×64 to 8×8 after the "average" operation, but according to the description "fusion at (B, 64, 64, 64)" the tensor should stay at 64×64.
[Screenshot attached: QQ截图20211202152824, showing the code with the average operation]

ap229997 (Collaborator) commented Dec 2, 2021

The fusion can also be done at 64×64 resolution, but that would be too computationally expensive since a transformer is used (attention has quadratic complexity in the number of tokens), so I reduce the size to 8×8 at each resolution of the intermediate feature maps.
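
For concreteness, here is a minimal sketch of that pooling step in PyTorch (the tensor names and the exact pooling call are my own illustration, not necessarily the repo's code):

```python
import torch
import torch.nn.functional as F

B, C = 2, 64
img_feat = torch.randn(B, C, 64, 64)  # intermediate feature map at 64x64

# Average-pool to 8x8 before fusion: 64*64 = 4096 tokens -> 8*8 = 64 tokens.
# Attention cost grows quadratically with token count, so this cuts the
# attention cost by roughly (4096 / 64)^2 = 4096x.
pooled = F.adaptive_avg_pool2d(img_feat, output_size=8)  # (B, C, 8, 8)

# Flatten the spatial grid into a token sequence for the transformer.
tokens = pooled.flatten(2).permute(0, 2, 1)              # (B, 64, C)
```

If the fused features are needed back at the original resolution, they can be interpolated up again after the transformer.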

roger-cv (Author) commented Dec 6, 2021

Thanks for your quick reply. So, if I understand correctly, the input feature map to the transformer at each layer is downsampled to 8×8?

ap229997 (Collaborator) commented Dec 6, 2021

That's correct. There are now several transformer variants that address the quadratic complexity of attention (e.g. Linformer), so it may be possible to use the transformer without downsampling.
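
For reference, a minimal single-head sketch of the Linformer idea (my own simplified code, not the official implementation): keys and values are projected from n tokens down to a fixed k, so the attention map is n×k instead of n×n and the cost is linear in n.

```python
import torch
import torch.nn as nn

class LinformerAttention(nn.Module):
    """Single-head Linformer-style attention (simplified sketch).

    Keys/values are compressed along the sequence axis from n tokens to a
    fixed k tokens, so the attention map is (n x k) instead of (n x n).
    """
    def __init__(self, dim, seq_len, k=64):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        # learned sequence-length projections: n -> k
        self.proj_k = nn.Parameter(torch.randn(seq_len, k) * seq_len ** -0.5)
        self.proj_v = nn.Parameter(torch.randn(seq_len, k) * seq_len ** -0.5)

    def forward(self, x):  # x: (B, n, dim)
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        k = torch.einsum('bnd,nk->bkd', k, self.proj_k)  # (B, k, dim)
        v = torch.einsum('bnd,nk->bkd', v, self.proj_v)  # (B, k, dim)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)  # (B, n, k)
        return attn @ v  # (B, n, dim)

# e.g. a full 64x64 feature map flattened to 4096 tokens, no downsampling:
layer = LinformerAttention(dim=64, seq_len=4096, k=256)
out = layer(torch.randn(2, 4096, 64))  # (2, 4096, 64)
```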

roger-cv (Author) commented Dec 8, 2021


OK. Another interesting question: could this transformer-based fusion be replaced with other transformer architectures, such as Swin or PVT? I notice the current transformer is based on GPT, which was designed for NLP.

ap229997 (Collaborator) commented Dec 8, 2021

I agree, architecture design can be improved quite a bit.

roger-cv (Author) commented Dec 9, 2021

OK, nice work. Thanks for your reply.

Kin-Zhang commented

> I agree, architecture design can be improved quite a bit.

But it may require more resources to train...
