Intuitively, the memory units serve as prototypes for different patterns and play almost the same role as a convolution kernel (especially a 1*1 conv kernel). From the perspective of the mathematical operation, the output in both cases is the dot product between the feature vector and the memory unit / convolution kernel.
Hence the question: what are the differences between a memory unit and a convolution kernel?
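The dot-product equivalence is easy to check numerically. Below is a minimal PyTorch sketch (the shapes are made up for illustration) showing that taking the dot product of each pixel's feature vector with a set of memory units gives exactly the same result as a 1*1 convolution whose kernels are those memory units:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: 8 memory units / kernels over 32 channels.
x = torch.randn(2, 32, 16, 16)   # (batch, channels, height, width) feature map
M = torch.randn(8, 32)           # 8 memory units, each a 32-d prototype vector

# Dot product of every pixel's feature vector with each memory unit...
out_dot = torch.einsum('bchw,kc->bkhw', x, M)

# ...is exactly a 1x1 convolution whose kernels are the memory units.
out_conv = F.conv2d(x, M.view(8, 32, 1, 1))

assert torch.allclose(out_dot, out_conv, atol=1e-5)
```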
I have had the same question and have convinced myself that the differences do exist. There are two weights, W_k and W_v. W_k generates the affinities, which are then used to take a linear (convex) combination of W_v. Given a fixed number of weights, this can capture a much more flexible mapping than a two-layer convolution, since the result is more an interpolation of the free parameters W_v than the typical ReLU-based nonlinear projection of the input feature. In my personal understanding, the essence of external attention is that it is a more effective way of non-linear mapping than a simple ReLU.
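To make the contrast concrete, here is a minimal PyTorch sketch of the mechanism as I described it above. The module name, layer sizes, and the use of a plain softmax to realize the convex combination are my own choices for illustration, not taken from the repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExternalAttention(nn.Module):
    # Minimal sketch: W_k produces affinities between the input feature and the
    # memory units; the softmax-normalized affinities then form a convex
    # combination of the value vectors stored in W_v.
    def __init__(self, d_model=64, n_units=32):
        super().__init__()
        self.w_k = nn.Linear(d_model, n_units, bias=False)  # affinity generation
        self.w_v = nn.Linear(n_units, d_model, bias=False)  # free parameters to interpolate

    def forward(self, x):                      # x: (batch, n_pixels, d_model)
        attn = F.softmax(self.w_k(x), dim=-1)  # convex weights over the memory units
        return self.w_v(attn)                  # interpolation of the value vectors in W_v

x = torch.randn(2, 196, 64)        # (batch, n_pixels, d_model)
y = ExternalAttention()(x)         # same shape out: (2, 196, 64)
```

A two-layer 1*1 convolution would instead compute `w_v(F.relu(w_k(x)))`, so its output is a ReLU-projected mix of the input rather than a convex interpolation of free parameters.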