Hello,

Your work on VTN is excellent; it has inspired me a lot.

In the paper you say you visualize the [CLS] token's attention weights. I am trying to reproduce this visualization, but I struggle to understand what the [CLS] token represents. I can currently get the [CLS] token before the MLP head, with shape (batch_size, 768) — how can I visualize it?

I would appreciate any guidance; a simple example would be great.

Thank you!
We visualize the [CLS] token's attention weights (not the [CLS] embedding itself), since [CLS] is the only token with global attention.

It is also worth mentioning that we used a one-layer Longformer for this purpose.
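Not the authors' code, but a minimal sketch of the general idea, assuming the attention layer exposes its post-softmax attention probabilities and that the [CLS] token sits at index 0 of the sequence (both are assumptions about the layout). You pull out the [CLS] row of the attention map, average over heads, and the resulting vector tells you how strongly [CLS] attends to each frame token — that vector is what gets plotted:

```python
import numpy as np

def cls_attention_per_frame(attn_weights):
    """Extract the [CLS] row of an attention map and average over heads.

    attn_weights: (num_heads, seq_len, seq_len) attention probabilities,
    with the [CLS] token assumed to be at index 0 (a hypothetical layout).
    Returns a (seq_len - 1,) vector: how much [CLS] attends to each frame token.
    """
    cls_row = attn_weights[:, 0, 1:]  # attention from [CLS] to every frame token
    return cls_row.mean(axis=0)       # average over heads

# Toy example: 2 heads, 1 [CLS] token + 16 frame tokens.
# In practice attn would come from the temporal encoder with attention
# outputs enabled, not from random scores.
rng = np.random.default_rng(0)
scores = rng.normal(size=(2, 17, 17))
attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row softmax

per_frame = cls_attention_per_frame(attn)
print(per_frame.shape)  # (16,)
```

The per-frame vector can then be shown as a bar chart or heatmap aligned with the sampled frames. Note this uses the attention *probabilities* from the layer, not the (batch_size, 768) [CLS] embedding you mentioned, which is a feature vector rather than something with a natural per-frame interpretation.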