Squeeze Excitation Networks #38

chullhwan-song · 2018-08-08T01:16:48Z

chullhwan-song · 2018-08-08T01:21:43Z

What?

A CNN filter (a convolutional filter) is learned while operating over a local receptive field, so its output combines channel-wise and spatial information.
On the spatial side, previous work has already improved results by re-representing (enhancing) the spatial information.
- So what about the channel side?
  - This work models channel relationships by recalibrating the filter responses in two steps (squeeze and excitation).
    - feature recalibration: selectively suppressing less useful features and emphasizing useful ones

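SENet

In one sentence, SENet is a study of channel attention for CNNs.
The Squeeze-and-Excitation block is modular.
It appears to improve performance by being inserted at intermediate points throughout deep networks.
Squeeze-and-Excitation: squeeze (compress) and then excite? Each channel's w×h map is compressed to a single number (this compression seems to be what the paper calls "squeeze").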
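Because the block maps a feature map to a recalibrated map of exactly the same shape, it can be dropped in after almost any convolutional stage. The SE-ResNet variant in the paper applies it to the residual branch before the identity addition. The sketch below is a toy NumPy illustration of that placement only, assuming an NHWC layout; `conv_branch` is a hypothetical stand-in (a 1×1 convolution, i.e. per-pixel channel mixing, plus ReLU) rather than a real 3×3 conv stack, and all weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def se_block(x, w1, w2):
    # Squeeze (global average pool) -> excitation (FC-ReLU-FC-sigmoid) -> scale
    z = x.mean(axis=(1, 2))                                   # (batch, C)
    s = 1.0 / (1.0 + np.exp(-(np.maximum(z @ w1.T, 0.0) @ w2.T)))
    return x * s[:, None, None, :]                            # broadcast over H, W

def conv_branch(x, w):
    # Hypothetical stand-in for the residual branch: a 1x1 convolution
    # (per-pixel channel mixing) followed by ReLU
    return np.maximum(x @ w, 0.0)

def se_residual_block(x, w_conv, w1, w2):
    # SE recalibrates the residual branch, then the identity shortcut is added
    return x + se_block(conv_branch(x, w_conv), w1, w2)

C, r = 8, 2
x = rng.standard_normal((1, 6, 6, C))
params = [(rng.standard_normal((C, C)) * 0.1,
           rng.standard_normal((C // r, C)) * 0.1,
           rng.standard_normal((C, C // r)) * 0.1) for _ in range(3)]

# Stack several SE residual blocks; the shape is preserved at every step
for w_conv, w1, w2 in params:
    x = se_residual_block(x, w_conv, w1, w2)
print(x.shape)  # (1, 6, 6, 8)
```

Since the block never changes the tensor shape, inserting it requires no change to the surrounding layers, which is what makes it modular.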
Squeeze part = Global Information Embedding
- Motivation: Each of the learned filters operates with a local receptive field and consequently each unit of the transformation output U is unable to exploit contextual information outside of this region. This is an issue that becomes more severe in the lower layers of the network whose receptive field sizes are small.
  - ?? A conv operation is inherently local, so a unit cannot use information from regions outside its receptive field; this issue is more severe in the lower layers, where receptive fields are small.
  - To mitigate this problem, we propose to squeeze global spatial information into a channel descriptor. > The proposed remedy is to apply global average pooling. = Global Information Embedding
- The blue part (of the figure) is global average pooling > averaging each channel's w×h map gives a vector whose length equals the number of channels.
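The squeeze step is just a spatial mean per channel, z_c = (1 / (H·W)) · Σ_i Σ_j u_c(i, j). A minimal NumPy sketch, assuming an NHWC feature map of shape (batch, H, W, C) as in the TensorFlow code in this post; the function name `squeeze` is illustrative.

```python
import numpy as np

def squeeze(u: np.ndarray) -> np.ndarray:
    """Global average pooling: (batch, H, W, C) -> (batch, C)."""
    # Average each channel's H x W map into a single number
    return u.mean(axis=(1, 2))

# Example: batch of 2, a 4x4 spatial map, 3 channels
u = np.arange(2 * 4 * 4 * 3, dtype=np.float64).reshape(2, 4, 4, 3)
z = squeeze(u)
print(z.shape)  # (2, 3)
print(z[0])     # [22.5 23.5 24.5]
```

Every spatial position contributes equally, so each entry of z summarizes its whole channel, which is the "global information" that a single local conv unit cannot see.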
Excitation part: feature recalibration
- To make use of the information aggregated in the squeeze operation, we follow it with a second operation which aims to fully capture channel-wise dependencies
- Two criteria for this:
  - first, it must be flexible (in particular, it must be capable of learning a nonlinear interaction between channels): i.e., it should learn nonlinear interactions (?) between channels
  - second, it must learn a non-mutually-exclusive relationship since we would like to ensure that multiple channels are allowed to be emphasised opposed to one-hot activation.
    - Rather than emphasizing only one channel (one-hot activation), we want several channels to be emphasized at once, so the relationship should be learned as non-mutually-exclusive.
  - This is implemented with two FC layers: the first is followed by ReLU and the last by a sigmoid.
- So the structure is two FC layers, each with its own activation function.
- The output still seems to keep the same dimension as in Eq. 2: the first FC layer reduces C channels to C/r, and the second restores C.
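The excitation step described above can be written as s = σ(W2 · δ(W1 · z)), with δ = ReLU, W1 of shape (C/r, C) and W2 of shape (C, C/r), so s has C entries again. The NumPy sketch below uses random weights purely for illustration (in a real network they are learned), omits biases for brevity, and uses row vectors (z @ W1.T). It also contrasts sigmoid with softmax: softmax forces the channel weights to compete and sum to 1, which is the one-hot-like behavior the second criterion rules out, while sigmoid gates each channel independently.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def excitation(z, w1, w2):
    """z: (batch, C); w1: (C // r, C); w2: (C, C // r) -> s: (batch, C) in (0, 1)."""
    # Bottleneck FC (C -> C/r) followed by ReLU
    hidden = relu(z @ w1.T)
    # Expanding FC (C/r -> C) followed by sigmoid: one independent gate per channel
    return sigmoid(hidden @ w2.T)

rng = np.random.default_rng(0)
C, r = 8, 4
z = rng.standard_normal((2, C))          # squeezed descriptors for a batch of 2
w1 = rng.standard_normal((C // r, C))    # illustrative random weights (learned in practice)
w2 = rng.standard_normal((C, C // r))
s = excitation(z, w1, w2)
print(s.shape)                           # (2, 8)

# Sigmoid gates are not mutually exclusive: several can be close to 1 at once.
logits = np.array([3.0, 3.0, 3.0])
print(sigmoid(logits))                   # each about 0.95
# A softmax over channels would instead force the weights to sum to 1.
soft = np.exp(logits) / np.exp(logits).sum()
print(soft)                              # each 1/3

# Parameter count of the two FC layers (no biases): 2 * C * C / r
print(w1.size + w2.size)                 # 32
```

The reduction ratio r trades capacity for cost: the two FC layers hold 2·C²/r weights, so a larger r makes the block cheaper.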
Scale part: each channel of the input is multiplied by its excitation weight.
The source code itself is very simple, as follows:

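```python
import tensorflow as tf

# Method of a model class; Global_Average_Pooling, Fully_connected, Relu and Sigmoid
# are helper wrappers defined elsewhere in the original code (not shown here).
def Squeeze_excitation_layer(self, input_x, out_dim, ratio, layer_name):
    with tf.name_scope(layer_name):
        # Squeeze: global average pooling over H x W -> one value per channel
        squeeze = Global_Average_Pooling(input_x)
        # Excitation: FC (C -> C/r) + ReLU, then FC (C/r -> C) + sigmoid
        # (// keeps the unit count an integer under Python 3)
        excitation = Fully_connected(squeeze, units=out_dim // ratio, layer_name=layer_name + '_fully_connected1')
        excitation = Relu(excitation)
        excitation = Fully_connected(excitation, units=out_dim, layer_name=layer_name + '_fully_connected2')
        excitation = Sigmoid(excitation)
        # Reshape to (batch, 1, 1, C) so the weights broadcast over H and W (NHWC)
        excitation = tf.reshape(excitation, [-1, 1, 1, out_dim])
        # Scale: reweight each channel of the input feature map
        scale = input_x * excitation
        return scale
```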
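For readers without a TensorFlow 1.x setup, here is a self-contained NumPy sketch of the whole squeeze, excitation and scale pipeline, mirroring the TensorFlow function in this post (NHWC layout, reshape to (batch, 1, 1, C), broadcast multiply). The weights are random placeholders, biases are omitted, and `se_block` is a hypothetical name.

```python
import numpy as np

def se_block(x, w1, w2):
    """x: (batch, H, W, C) feature map -> (recalibrated map of the same shape, gates)."""
    # Squeeze: global average pooling -> (batch, C)
    z = x.mean(axis=(1, 2))
    # Excitation: FC + ReLU (C -> C/r), then FC + sigmoid (C/r -> C)
    s = 1.0 / (1.0 + np.exp(-(np.maximum(z @ w1.T, 0.0) @ w2.T)))
    # Scale: reshape to (batch, 1, 1, C) and broadcast over H and W
    return x * s.reshape(-1, 1, 1, x.shape[-1]), s

rng = np.random.default_rng(42)
batch, H, W, C, r = 2, 5, 5, 16, 4
x = rng.standard_normal((batch, H, W, C))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y, s = se_block(x, w1, w2)
print(y.shape)  # (2, 5, 5, 16): same shape as the input
# Each output channel is the input channel times its own scalar gate
print(np.allclose(y[0, :, :, 3], x[0, :, :, 3] * s[0, 3]))  # True
```

The reshape to (batch, 1, 1, C) is what lets a single gate per channel scale every spatial position of that channel.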

chullhwan-song added CNN Attention labels Aug 8, 2018
