I noticed that the channel shuffle is followed by a 1×1 convolution, which computes a weighted sum over all channels at each position of the feature map. Since that sum runs over every channel, reordering the channels beforehand only changes which weight multiplies which channel, so the result should not depend on the shuffle. I therefore find it hard to see what role the channel shuffle plays.
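To make the observation concrete, here is a minimal NumPy sketch. It assumes the 1×1 convolution is dense, meaning every output channel reads every input channel with no grouping. The helper names `channel_shuffle` and `conv1x1`, the group count, and the tensor sizes are illustrative assumptions, not taken from any particular implementation.

```python
import numpy as np

def channel_shuffle(x, groups):
    # x has shape (C, H, W). Split C into (groups, C // groups),
    # swap those two axes, and flatten back to C channels.
    c, h, w = x.shape
    x = x.reshape(groups, c // groups, h, w)
    return x.transpose(1, 0, 2, 3).reshape(c, h, w)

def conv1x1(x, weight):
    # Dense 1x1 convolution (assumed, no grouping): weight has shape
    # (C_out, C_in); each output is a weighted sum over all input channels.
    return np.einsum("oc,chw->ohw", weight, x)

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4, 4))       # 6 channels, 4x4 feature map
weight = rng.standard_normal((8, 6))     # 8 output channels

# Recover the permutation the shuffle applies: shuffled[i] == x[perm[i]].
perm = channel_shuffle(np.arange(6).reshape(6, 1, 1), groups=3).ravel()

# Path A: shuffle, then apply the 1x1 convolution.
out_shuffled = conv1x1(channel_shuffle(x, groups=3), weight)

# Path B: no shuffle, but reorder the weight columns to match.
weight_permuted = np.empty_like(weight)
weight_permuted[:, perm] = weight
out_plain = conv1x1(x, weight_permuted)

same = np.allclose(out_shuffled, out_plain)
print(perm, same)
```

Under the dense assumption, the shuffle is equivalent to reordering the columns of the weight matrix, which a learned layer could absorb. The sketch does not cover a grouped 1×1 convolution, where each output channel reads only a subset of the input channels.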