
External Attention vs. Convolutional Kernel #23

Open
rayleizhu opened this issue Aug 11, 2021 · 1 comment

Comments

@rayleizhu

Intuitively, the memory units serve as prototypes for different patterns and play almost the same role as a convolution kernel (especially a 1*1 conv kernel). From the perspective of the mathematical operation, the output in both cases is the dot product between the feature vector and the memory unit / convolution kernel.

Hence the question: what are the differences between a memory unit and a convolution kernel?
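For concreteness, here is a quick sketch of what I mean (the shapes and variable names are just illustrative, not taken from this repo): applying S memory units of dimension d as dot products against each feature vector produces exactly the same numbers as a 1*1 convolution with S output channels that shares the same weights.

```python
import torch
import torch.nn as nn

# Illustrative sizes: N feature vectors with d channels, S memory units / kernels.
N, d, S = 196, 64, 8
x = torch.randn(1, N, d)                 # features as (batch, tokens, channels)

memory = nn.Linear(d, S, bias=False)     # S memory units, each a d-dim prototype
conv1x1 = nn.Conv2d(d, S, kernel_size=1, bias=False)

# Give both the same weights so the two views can be compared directly.
with torch.no_grad():
    conv1x1.weight.copy_(memory.weight.view(S, d, 1, 1))

out_mem = memory(x)                                   # (1, N, S): dot products with the memory units
x_map = x.transpose(1, 2).reshape(1, d, 14, 14)       # the same features laid out as a 14x14 map
out_conv = conv1x1(x_map).flatten(2).transpose(1, 2)  # (1, N, S)

print(torch.allclose(out_mem, out_conv, atol=1e-6))   # True: numerically the same operation
```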

@tjyuyao

tjyuyao commented Feb 12, 2023

I have had the same question and have convinced myself that the differences do exist. There are two weights, W_k and W_v. W_k generates the affinities, which are then used to form a linear (convex) combination of W_v. Given a fixed number of weights, this can capture a much more flexible mapping than a two-layer convolution, since the result is closer to an interpolation of the free parameters W_v rather than the typical ReLU-based nonlinear projection of the input feature. In my personal understanding, the essence of external attention is a more effective way of performing a nonlinear mapping than a simple ReLU.
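Here is a minimal sketch of how I picture it (my own code, not the repo's implementation; the class name, `n_memory`, and the plain softmax standing in for the paper's double normalization are all simplifications): the normalized affinities produced by W_k mix the rows of W_v, so the nonlinearity comes from the normalization over the memory units rather than from a ReLU on a hidden activation, as in a two-layer 1*1-conv baseline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExternalAttentionSketch(nn.Module):
    """Sketch of the mechanism described above (illustrative, not the repo's code).

    W_k scores each feature against S memory units; after normalization those
    scores act as convex weights that mix the rows of W_v, i.e. the output is
    an interpolation of learned prototypes.
    """
    def __init__(self, d_model: int, n_memory: int = 64):
        super().__init__()
        self.w_k = nn.Linear(d_model, n_memory, bias=False)  # memory keys   (W_k)
        self.w_v = nn.Linear(n_memory, d_model, bias=False)  # memory values (W_v)

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (batch, N, d_model)
        attn = self.w_k(x)                                    # (batch, N, S) affinities
        attn = F.softmax(attn, dim=-1)                        # convex weights over the S memory units
        # (The paper uses a double normalization here; a softmax keeps the sketch simple.)
        return self.w_v(attn)                                 # interpolation of the rows of W_v


# Contrast: a two-layer 1*1-conv / MLP baseline with comparable parameters, where the
# nonlinearity is a ReLU on the hidden activation instead of normalized prototype mixing.
mlp_baseline = nn.Sequential(nn.Linear(64, 8, bias=False), nn.ReLU(), nn.Linear(8, 64, bias=False))

x = torch.randn(2, 196, 64)
print(ExternalAttentionSketch(64, 8)(x).shape)  # torch.Size([2, 196, 64])
print(mlp_baseline(x).shape)                    # torch.Size([2, 196, 64])
```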
