Selective Kernel Networks #232

chullhwan-song · 2019-10-25T04:52:48Z

https://arxiv.org/abs/1903.06586

chullhwan-song · 2019-10-28T07:02:14Z

abstract

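In a CNN, the artificial neurons in each layer are, in principle, designed to share receptive fields (RFs) of the same size.
- However, the neuroscience community has long known that the receptive field (RF) size of visual cortical neurons is modulated by the stimulus. This has rarely been considered in CNNs.
So, in this work:
- A dynamic selection mechanism is proposed that lets each neuron adaptively adjust its receptive field size based on multiple scales of input information. (heh, the terminology in this paper is tough~)
- To do this, a building block called Selective Kernel (SK) is designed.
  - Multiple branches with different kernel sizes are fused (integrated) using softmax attention that is guided by the information in these branches.
  - Different attentions on these branches give the neurons in the fusion layer different effective receptive field sizes.
- Multiple SK units are stacked into a deep network called Selective Kernel Networks (SKNets).
- SKNets can capture objects at different scales > this is possible because the neurons adaptively apply receptive fields of different sizes depending on the input.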

Introduction

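From the neuroscience point of view, the visual cortex has neurons with various RF sizes in the same area, and this idea has been applied to real CNNs, e.g. in work such as InceptionNets.
- One example is the "inception" module, which applies convolutions with 3×3, 5×5, and 7×7 kernels to the same input and combines the results (= aggregates multi-scale information).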
However, "some other RF properties of cortical neurons have not been emphasized in designing CNNs, and one such property is the adaptive changing of RF size."
- I take this to mean that cortical neurons can change their RF size adaptively, rather than having fixed RF sizes as in the inception module.
So this work introduces an algorithm that dynamically selects the RF size through the concept of a "Selective Kernel".
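For contrast with SK, the fixed multi-scale aggregation of an inception-style block can be sketched in PyTorch as below. This is a hypothetical simplification: the class name `InceptionLikeBlock` and the channel counts are my own assumptions, and real Inception modules also contain 1×1 convolutions and pooling branches.

```python
import torch
import torch.nn as nn

class InceptionLikeBlock(nn.Module):
    """Hypothetical minimal multi-branch block: fixed 3x3 / 5x5 / 7x7 kernels, outputs concatenated."""

    def __init__(self, in_ch, branch_ch):
        super().__init__()
        # padding = k // 2 keeps the spatial size for odd k at stride 1
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, branch_ch, kernel_size=k, padding=k // 2) for k in (3, 5, 7)]
        )

    def forward(self, x):
        # Every branch sees the same input and the results are simply concatenated,
        # so the mix of RF sizes is fixed and does not depend on the input.
        return torch.cat([b(x) for b in self.branches], dim=1)

block = InceptionLikeBlock(16, 8)
y = block(torch.randn(2, 16, 32, 32))
print(y.shape)  # torch.Size([2, 24, 32, 32])
```

Nothing in this block decides how much each kernel size contributes for a given input; that input-dependent weighting is exactly what SK adds.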

Related Work

Multi-branch convolutional networks > e.g. ResNet
Grouped/depthwise/dilated convolutions > e.g. ResNeXt, MobileNetV2, ...
Attention mechanisms > e.g. SENet, BAM, ...
Dynamic convolutions
- The first three have come up in my earlier reviews, but this concept is unfamiliar to me. I don't know it well, so see the cited papers (I need to read them too).
  - Y. Jeon and J. Kim. Active convolution: Learning the shape of convolution for image classification. In CVPR, 2017.
  - X. Jia, B. De Brabandere, T. Tuytelaars, and L. V. Gool. Dynamic filter networks. In NIPS, 2016.

Methods

Selective Kernel Convolution

To select the RF size automatically and adaptively, the paper proposes the Selective Kernel (SK) convolution mentioned earlier > multiple kernels with different kernel sizes
Selective Kernel > 3 operators
- Split
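  - Apply a 3×3 conv and a 5×5 conv to the same input X (two branches); this is presumably why the step is called "split". The two branch outputs are denoted as follows:
    - In Fig. 1: $\widetilde{\mathcal{F}}: \mathbf{X} \rightarrow \widetilde{\mathbf{U}} \in \mathbb{R}^{H \times W \times C}$ and $\widehat{\mathcal{F}}: \mathbf{X} \rightarrow \widehat{\mathbf{U}} \in \mathbb{R}^{H \times W \times C}$
  - Each transform is a convolution > efficient grouped/depthwise convolutions
  - followed by Batch Normalization & ReLU
  - For the branch with kernel size 5, the following change is made:
    - The 5×5 conv is replaced by a 3×3 dilated convolution with dilation 2 (same 5×5 RF, fewer parameters); the paper says this is more efficient.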
- Fuse
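  - This step is where the RF size is adaptively selected.
  - The basic idea is to use gates that control how much information (feature maps) from each branch, each carrying a different scale, flows into the next layer.
  - So:
    - First, the feature maps from the branches of different scales are combined by element-wise summation: $\mathbf{U} = \widetilde{\mathbf{U}} + \widehat{\mathbf{U}}$.
    - Second, global average pooling over the spatial dimensions gives a channel-wise vector $\mathbf{s} \in \mathbb{R}^{C}$, with $s_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \mathbf{U}_c(i, j)$.
    - Third, a compact feature $\mathbf{z} = \mathcal{F}_{fc}(\mathbf{s}) = \delta(\mathcal{B}(\mathbf{W}\mathbf{s}))$: an fc layer followed by Batch Normalization ($\mathcal{B}$) & ReLU ($\delta$), with $\mathbf{W} \in \mathbb{R}^{d \times C}$.
      - Here a reduction ratio r controls the length of z, i.e. it simply reduces the dimension: $d = \max(C/r, L)$, where L is the minimum length of z (32 in the paper's experiments).

        However, in the end the result has to be mapped back to the channel size C of the input feature map X, judging from the final Eq. 6 (the earlier equations are also written in terms of C). The Select step does this projection back to C.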
- Select
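  - Continuing from the steps above:
    - Fourth, softmax is applied to get the attention vectors.
      - z goes through a separate fc layer per branch ($\mathbf{A}, \mathbf{B} \in \mathbb{R}^{C \times d}$; the for loop over `self.fcs` in the PyTorch source below), and the softmax is taken across the branches, channel by channel (Eq. 5): $a_c = \frac{e^{\mathbf{A}_c \mathbf{z}}}{e^{\mathbf{A}_c \mathbf{z}} + e^{\mathbf{B}_c \mathbf{z}}}$, $b_c = \frac{e^{\mathbf{B}_c \mathbf{z}}}{e^{\mathbf{A}_c \mathbf{z}} + e^{\mathbf{B}_c \mathbf{z}}}$, so $a_c + b_c = 1$.
      - This part confused me a little at first (the figure and the source seemed to disagree), but Eq. 5 is written per branch: each branch gets its own attention vector.
    - Fifth, the softmax outputs are multiplied element-wise with the feature map of each branch, as in the figure, not with the summed map U.
      - Multiply $\widetilde{\mathbf{U}}$ by $\mathbf{a}$ and $\widehat{\mathbf{U}}$ by $\mathbf{b}$.
- Final feature map: $\mathbf{V} = [\mathbf{V}_1, \mathbf{V}_2, \ldots, \mathbf{V}_C]$, with $\mathbf{V}_c = a_c \cdot \widetilde{\mathbf{U}}_c + b_c \cdot \widehat{\mathbf{U}}_c$ (Eq. 6)
  - I first wondered whether everything could be computed on the combined map instead. This equation answers it: in general, $a_c \widetilde{\mathbf{U}}_c + b_c \widehat{\mathbf{U}}_c$ equals a single weight times $\mathbf{U}_c = \widetilde{\mathbf{U}}_c + \widehat{\mathbf{U}}_c$ only when $a_c = b_c$, so weighting U directly is not equivalent.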
- Source
PyTorch > besides the code below, also see the reference link here; that one is more faithful to the paper, though the two differ in small ways.

import torch
import torch.nn as nn

class SKConv(nn.Module):
    def __init__(self, features, WH, M, G, r, stride=1, L=32):
        """ Constructor
        Args:
            features: input channel dimensionality.
            WH: input spatial dimensionality (only used by the commented-out AvgPool2d below).
            M: the number of branches.
            G: num of convolution groups.
            r: the ratio used to compute d, the length of z.
            stride: stride, default 1.
            L: the minimum dim of the vector z in paper, default 32.
        """
        super(SKConv, self).__init__()
        d = max(int(features/r), L)
        self.M = M
        self.features = features
        self.convs = nn.ModuleList([])
        for i in range(M):
            self.convs.append(nn.Sequential(
                nn.Conv2d(features, features, kernel_size=3+i*2, stride=stride, padding=1+i, groups=G),
                nn.BatchNorm2d(features),
                nn.ReLU(inplace=False)
            ))
        # self.gap = nn.AvgPool2d(int(WH/stride))
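        # Note: the paper applies BN + ReLU after this fc (z = ReLU(BN(W s))); this version uses a plain Linear.
        self.fc = nn.Linear(features, d)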
        self.fcs = nn.ModuleList([])
        for i in range(M):
            self.fcs.append(
                nn.Linear(d, features)
            )
        self.softmax = nn.Softmax(dim=1)
        
    def forward(self, x):
        for i, conv in enumerate(self.convs):
            fea = conv(x).unsqueeze_(dim=1)
            if i == 0:
                feas = fea
            else:
                feas = torch.cat([feas, fea], dim=1)
        fea_U = torch.sum(feas, dim=1)
        # fea_s = self.gap(fea_U).squeeze_()
        fea_s = fea_U.mean(-1).mean(-1)
        fea_z = self.fc(fea_s)
        for i, fc in enumerate(self.fcs):
            vector = fc(fea_z).unsqueeze_(dim=1)
            if i == 0:
                attention_vectors = vector
            else:
                attention_vectors = torch.cat([attention_vectors, vector], dim=1)
        attention_vectors = self.softmax(attention_vectors)
        attention_vectors = attention_vectors.unsqueeze(-1).unsqueeze(-1)
        fea_v = (feas * attention_vectors).sum(dim=1)
        return fea_v
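A quick numeric check of the Select step (Eq. 5 and 6), written in the same PyTorch style as the class above. This is a sketch under assumptions: the branch outputs are random stand-ins rather than real conv outputs, BN in the z path is omitted, and the sizes (C = 64, r = 16, L = 32) are illustrative values, not the paper's configuration.

```python
import torch

torch.manual_seed(0)
N, C, H, W, M, r, L = 2, 64, 8, 8, 2, 16, 32
d = max(C // r, L)  # length of z: 64 // 16 = 4, so the lower bound L = 32 wins

# Random stand-ins for the branch outputs U~ and U^ (not real conv outputs)
branches = torch.randn(N, M, C, H, W)
U = branches.sum(dim=1)                    # Fuse: U = U~ + U^
s = U.mean(dim=(2, 3))                     # GAP -> [N, C]
z = torch.relu(torch.nn.Linear(C, d)(s))   # compact feature z (BN omitted in this sketch)

# Select: one fc per branch (A and B in Eq. 5), softmax across the branch axis
fcs = torch.nn.ModuleList([torch.nn.Linear(d, C) for _ in range(M)])
logits = torch.stack([fc(z) for fc in fcs], dim=1)  # [N, M, C]
att = torch.softmax(logits, dim=1)                   # a_c, b_c for every channel
print(torch.allclose(att.sum(dim=1), torch.ones(N, C)))  # True: a_c + b_c = 1

# Eq. 6: V_c = a_c * U~_c + b_c * U^_c
V = (branches * att[..., None, None]).sum(dim=1)     # [N, C, H, W]

# Weighting the summed map U with one attention vector gives a different result
print(torch.allclose(V, att[:, 0, :, None, None] * U))  # False
```

The last line is the point from the Select notes: because $a_c$ and $b_c$ differ, the per-branch weighting in Eq. 6 cannot be folded into a single weight on U.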

tf (TensorFlow 1.x, uses tf.contrib.slim)

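import tensorflow as tf
import tensorflow.contrib.slim as slim  # TF 1.x only

def SKConv(input, M, r, L=32, stride=1, is_training=True):
    input_feature = input.get_shape().as_list()[3]
    d = max(int(input_feature / r), L)
    bn_params = dict(decay=0.9, center=True, scale=True, epsilon=1e-5,
                     updates_collections=tf.GraphKeys.UPDATE_OPS, is_training=is_training)
    branches = []
    # activation_fn=None: BN and ReLU are applied explicitly below (the old version applied ReLU twice)
    with slim.arg_scope([slim.conv2d, slim.fully_connected], activation_fn=None):
        # Split: every branch reads the same input (not the previous branch's output);
        # branch i is a 3x3 conv with dilation rate 1+i (3x3 with rate 2 stands in for 5x5).
        # Note: TF rejects stride > 1 combined with rate > 1.
        for i in range(M):
            net = slim.conv2d(input, input_feature, [3, 3], rate=1+i, stride=stride)
            net = slim.batch_norm(net, **bn_params)
            branches.append(tf.nn.relu(net))
        # Fuse: U = sum of branches -> GAP -> z = ReLU(BN(fc(s)))
        fea_U = tf.add_n(branches)
        gap = tf.reduce_mean(fea_U, axis=[1, 2])                       # [N, C]
        z = tf.nn.relu(slim.batch_norm(slim.fully_connected(gap, d), **bn_params))
        # Select: one fc per branch applied to z, softmax across the branch axis
        logits = tf.stack([slim.fully_connected(z, input_feature) for _ in range(M)], axis=1)  # [N, M, C]
        att = tf.nn.softmax(logits, axis=1)
        att = tf.expand_dims(tf.expand_dims(att, axis=2), axis=2)      # [N, M, 1, 1, C]
        # Eq. 6: weight each branch by its own attention, then sum over branches
        feas = tf.stack(branches, axis=1)                              # [N, M, H, W, C]
        fea_v = tf.reduce_sum(feas * att, axis=1)                      # [N, H, W, C]
    return fea_v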
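The Split step's replacement of the 5×5 kernel by a 3×3 kernel with dilation 2 can be checked quickly in PyTorch. The channel count (64), the input size, and the helper `effective_kernel` are illustrative assumptions; grouped convolution is left out to keep the parameter arithmetic simple.

```python
import torch
import torch.nn as nn

def effective_kernel(k, dilation):
    # A k x k kernel with dilation d spans k + (k - 1) * (d - 1) pixels per side
    return k + (k - 1) * (dilation - 1)

conv5 = nn.Conv2d(64, 64, kernel_size=5, padding=2, bias=False)
conv3_d2 = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2, bias=False)

x = torch.randn(1, 64, 32, 32)
print(effective_kernel(3, 2))                         # 5
print(conv5(x).shape == conv3_d2(x).shape)            # True
print(conv5.weight.numel(), conv3_d2.weight.numel())  # 102400 36864
```

The dilated kernel covers the same 5×5 window with 9 instead of 25 weights per input/output channel pair, which is the efficiency gain the paper refers to.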

Network Architecture

Experiments

ImageNet Classification

Conclusion/Review

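Personally, I think they gave it a fancy(?) name, Selective Kernel (SK), but from the name alone it is easy to mistake it for something that dynamically attaches a kernel of a different size each time (at first I even wondered whether that was possible).
In the end, I felt that "multi-kernel SENet" is a closer description of the concept.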
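To make the "multi-kernel SENet" reading concrete, here are the two gates side by side as a hypothetical PyTorch sketch (`SEGate`, `SKGate`, and the sizes are my own illustrative choices, not code from either paper). The squeeze part is the same; the difference is that SE rescales the channels of a single branch with independent sigmoids, while SK uses a softmax so the branches compete for each channel.

```python
import torch
import torch.nn as nn

class SEGate(nn.Module):
    """Hypothetical SE-style gate: one branch, an independent sigmoid per channel."""

    def __init__(self, C, d):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(C, d), nn.ReLU(), nn.Linear(d, C))

    def forward(self, s):                  # s: [N, C] pooled descriptor
        return torch.sigmoid(self.fc(s))   # [N, C], each weight in (0, 1)

class SKGate(nn.Module):
    """Hypothetical SK-style gate: M branches compete per channel via softmax."""

    def __init__(self, C, d, M):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(C, d), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(d, C) for _ in range(M)])

    def forward(self, s):                  # s: [N, C] pooled from U = sum of branches
        z = self.fc(s)
        logits = torch.stack([h(z) for h in self.heads], dim=1)  # [N, M, C]
        return torch.softmax(logits, dim=1)  # weights over branches sum to 1 per channel

s = torch.randn(4, 64)
se = SEGate(64, 16)(s)
sk = SKGate(64, 32, M=2)(s)
print(se.shape, sk.shape)  # torch.Size([4, 64]) torch.Size([4, 2, 64])
```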

from chullhwan-song/Reading-Paper#232

chullhwan-song added the CNN label Oct 25, 2019

chullhwan-song closed this as completed Oct 25, 2019

chullhwan-song reopened this Oct 28, 2019

chullhwan-song mentioned this issue Apr 25, 2020

ResNeSt: Split-Attention Networks #373

Closed

Hyejin-Koo added a commit to Hyejin-Koo/cloud that referenced this issue Jul 26, 2021

Create skconv.py

e80693f
