Add support of Vision Transformer #133

Open · Tracked by #136
Yung-zi opened this issue Jan 22, 2022 · 6 comments

Labels: module: methods (Related to torchcam.methods), type: improvement (New feature or request)

Yung-zi commented Jan 22, 2022

🚀 Feature

I really appreciate your great work! However, I have a question: can Layer-CAM be used with a Vision Transformer network? If it does work, what would I need to change?

Motivation & pitch

I'm working on a project related to CAM.

Alternatives

No response

Additional context

No response

@Yung-zi Yung-zi added the type: improvement New feature or request label Jan 22, 2022
@frgfm frgfm added this to the 0.4.0 milestone Feb 1, 2022
@frgfm frgfm added the module: methods Related to torchcam.methods label Feb 1, 2022
frgfm (Owner) commented Feb 1, 2022

Hello @Yung-zi 👋

My apologies, I've been busy with other projects lately!
As of right now, the library is designed to work with CNNs. However, its design only relies on forward activation and backpropagated gradient hooks. So to answer your question: I'd need to run some tests, but as long as the output activation of a given layer has shape (N, C, H, W) and the way it is computed doesn't break backpropagation (i.e. it is differentiable), the library should work without much (perhaps any) change 😄
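For reference, here is a minimal sketch of that hook-based workflow on a CNN, following the usual torchcam usage pattern; the ResNet-18 model and the `layer4` target are illustrative choices, not requirements:

```python
import torch
from torchvision.models import resnet18
from torchcam.methods import LayerCAM

# Any model exposing a layer whose output activation is (N, C, H, W) should do
model = resnet18(pretrained=True).eval()
cam_extractor = LayerCAM(model, "layer4")

input_tensor = torch.rand(1, 3, 224, 224)  # stand-in for a preprocessed image
out = model(input_tensor)
# Retrieve the CAM for the top predicted class; its spatial size follows the hooked layer
cams = cam_extractor(out.squeeze(0).argmax().item(), out)
```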

Either way, I intend to spend more time on Vision Transformer compatibility for the next release 👍
If you're interested in helping or providing feedback once it's in progress, let me know!

@frgfm frgfm changed the title Vision Transformer Add support of Vision Transformer Feb 6, 2022
@frgfm frgfm mentioned this issue Feb 6, 2022
Yung-zi (Author) commented Jul 8, 2022

> Hello @Yung-zi 👋
>
> My apologies, I've been busy with other projects lately! As of right now, the library is designed to work with CNNs. However, its design only relies on forward activation and backpropagated gradient hooks. So to answer your question: I'd need to run some tests, but as long as the output activation of a given layer has shape (N, C, H, W) and the way it is computed doesn't break backpropagation (i.e. it is differentiable), the library should work without much (perhaps any) change 😄
>
> Either way, I intend to spend more time on Vision Transformer compatibility for the next release 👍 If you're interested in helping or providing feedback once it's in progress, let me know!

I'm so sorry for the late reply. I tried modifying your code before, but the results didn't look right; maybe I made some mistakes. Have you managed to get it working on a Vision Transformer?

frgfm (Owner) commented Aug 2, 2022

Partially yes!
But I have staged this for the next release anyway so I'll dive into it to make it available :)

frgfm (Owner) commented Dec 31, 2022

Quick update!
As of today, here is the support status of Torchvision transformer architectures:

  • maxvit
  • swin
  • swin_v2
  • vit (so far I can't see a way to make this integration seamless, because of the concatenation on the channel dimension and the dimension swapping; see the sketch below for the kind of reshaping involved)
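As a rough illustration of that dimension juggling, here is a minimal sketch (not part of torchcam; the helper name and the 14×14 default grid for a 224×224 input with 16×16 patches are assumptions) of how ViT token activations could be mapped back to a CNN-style layout:

```python
import torch

def vit_tokens_to_spatial(tokens: torch.Tensor, grid_size: int = 14) -> torch.Tensor:
    """Turn ViT token activations of shape (N, 1 + H*W, C) into a CNN-style
    (N, C, H, W) feature map: drop the class token, then swap the token and
    channel axes before reshaping the token axis into a spatial grid."""
    patch_tokens = tokens[:, 1:, :]  # drop the class token
    n, num_patches, c = patch_tokens.shape
    assert num_patches == grid_size * grid_size, "unexpected number of patch tokens"
    return patch_tokens.transpose(1, 2).reshape(n, c, grid_size, grid_size)
```

A transform along these lines could be applied to the hooked activations and gradients so that the existing (N, C, H, W) machinery keeps working.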

frgfm (Owner) commented Jan 2, 2023

Another update: ViT requires another method, called attention flow!
I'll try to investigate and implement it, but this is a bit more complex than just inverting the axis swap and slicing.
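For context, the paper that introduced attention flow ("Quantifying Attention Flow in Transformers", Abnar & Zuidema, 2020) also describes the simpler attention rollout variant. Here is a minimal, hedged sketch of rollout; the function name and the 0.5/0.5 residual weighting are assumptions, not torchcam code:

```python
import torch

def attention_rollout(attentions):
    """Attention rollout: average each layer's attention over heads, add an
    identity term to account for the residual connection, re-normalize the rows,
    and compose the layers by matrix multiplication.

    attentions: list of (N, num_heads, num_tokens, num_tokens) attention maps,
                ordered from the first to the last encoder block.
    Returns an (N, num_tokens, num_tokens) rollout matrix.
    """
    rollout = None
    for attn in attentions:
        attn = attn.mean(dim=1)                                 # (N, T, T), head average
        eye = torch.eye(attn.size(-1), device=attn.device)
        attn = 0.5 * attn + 0.5 * eye                           # residual connection
        attn = attn / attn.sum(dim=-1, keepdim=True)            # keep rows normalized
        rollout = attn if rollout is None else torch.bmm(attn, rollout)
    return rollout
```

The class-token row of the result (e.g. rollout[:, 0, 1:]) can then be reshaped to the patch grid to obtain a patch-level heatmap.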

@frgfm frgfm modified the milestones: 0.4.0, 0.4.1, 0.5.0 Oct 19, 2023
YAN-0802 commented

Your excellent work has helped me a lot, thank you! However, I have a question. I downloaded torchcam 0.4.0 and got good visualization results on CNN models, but it didn't work on the ViT model. Here's what happened: since I was working offline, I downloaded the ViT weight file and loaded the model using timm. The result was blue pixels covering the entire image, i.e. no heatmap area was found. What do I need to change in the code to make it work? Or, as you mentioned above, are you still working on it? Thank you for taking time out of your busy schedule.
[Attached image: raw_ILSVRC2012_val_00000024.JPEG]
