Intuitively, the memory units serve as prototypes for different patterns and play almost the same role as a convolution kernel (especially a 1*1 conv kernel). From the perspective of the mathematical operation, the output in both cases is the dot product between the feature vector and the memory unit / convolution kernel.
Hence the question: what are the differences between a memory unit and a convolution kernel?
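The dot-product equivalence is easy to check numerically. Below is a minimal PyTorch sketch (the shapes are made up for illustration) showing that taking the dot product of each pixel's feature vector with a set of memory units gives exactly the same result as a 1*1 convolution whose kernels are those memory units:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: 8 memory units / kernels over 32 channels.
x = torch.randn(2, 32, 16, 16)   # (batch, channels, height, width) feature map
M = torch.randn(8, 32)           # 8 memory units, each a 32-d prototype vector

# Dot product of every pixel's feature vector with each memory unit...
out_dot = torch.einsum('bchw,kc->bkhw', x, M)

# ...is exactly a 1x1 convolution whose kernels are the memory units.
out_conv = F.conv2d(x, M.view(8, 32, 1, 1))

assert torch.allclose(out_dot, out_conv, atol=1e-5)
```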
I have had the same question and have convinced myself that the differences do exist. There are two weights, W_k and W_v. W_k generates the affinities, which are then used to take a linear (convex) combination of W_v. Given a fixed number of weights, this can capture a much more flexible mapping than a two-layer convolution, since the result is more an interpolation of the free parameters W_v than the typical ReLU-based nonlinear projection of the input feature. In my personal understanding, the essence of external attention is that it is a more effective way of non-linear mapping than a simple ReLU.
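To make the contrast concrete, here is a minimal PyTorch sketch of the mechanism as I described it above. The module name, layer sizes, and the use of a plain softmax to realize the convex combination are my own choices for illustration, not taken from the repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExternalAttention(nn.Module):
    # Minimal sketch: W_k produces affinities between the input feature and the
    # memory units; the softmax-normalized affinities then form a convex
    # combination of the value vectors stored in W_v.
    def __init__(self, d_model=64, n_units=32):
        super().__init__()
        self.w_k = nn.Linear(d_model, n_units, bias=False)  # affinity generation
        self.w_v = nn.Linear(n_units, d_model, bias=False)  # free parameters to interpolate

    def forward(self, x):                      # x: (batch, n_pixels, d_model)
        attn = F.softmax(self.w_k(x), dim=-1)  # convex weights over the memory units
        return self.w_v(attn)                  # interpolation of the value vectors in W_v

x = torch.randn(2, 196, 64)        # (batch, n_pixels, d_model)
y = ExternalAttention()(x)         # same shape out: (2, 196, 64)
```

A two-layer 1*1 convolution would instead compute `w_v(F.relu(w_k(x)))`, so its output is a ReLU-projected mix of the input rather than a convex interpolation of free parameters.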