Inverting the Inherence of Convolution for Visual Recognition
paper: https://arxiv.org/pdf/2103.06255.pdf
Convolution has been the core ingredient of modern neural networks, triggering the surge of deep learning in vision. However, it is not spatial agnostic and channel-specific.
(i) involution could summarize the context in a wider spatial arrangement, thus overcome the difficulty of modeling long-range interactions well;
(ii) involution could adaptively allocate the weights over different positions, so as to prioritize the most informative visual elements in the spatial domain.