R-FCN: Object Detection via Region-based Fully Convolutional Networks #68

chullhwan-song · 2019-01-28T01:07:48Z

https://arxiv.org/abs/1605.06409
https://github.com/daijifeng001/r-fcn

chullhwan-song · 2019-01-28T01:25:43Z

abstract

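Previous region-based detectors: Fast/Faster R-CNN
- They have a per-region sub-network that must be applied hundreds of times (once per region), which is a drawback.
The region-based detector in this work is fully convolutional, sharing almost all computation over the entire image. > Why?
To achieve this, it proposes position-sensitive score maps to resolve the dilemma between translation invariance in image classification and translation variance in object detection.
- translation variance: seems to mean that the detection result should change when an object's position changes... isn't that obvious? :)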
- We argue that the aforementioned unnatural design is caused by a dilemma of increasing translation invariance for image classification vs. respecting translation variance for object detection.
Our result is achieved at a test-time speed of 170 ms per image, 2.5-20× faster than the Faster R-CNN counterpart.

Motivation

We argue that the aforementioned unnatural design is caused by a dilemma of increasing translation invariance for image classification vs. respecting translation variance for object detection.
- With respect to translation, image classification and detection seem to require opposite behavior.
  - image classification: "shift of an object inside an image should be indiscriminative." > Even if the object shifts within the image, it is still the same object (and its parts still belong to it), so it should be classified the same way.
    - Hence, image classification favors translation invariance.
  - Object detection, conversely, requires translation variance.
    - However, current deep networks built for image classification are translation invariant; hence the dilemma.
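A toy NumPy illustration of the dilemma, using a hypothetical 1-D "activation map": a classification-style global average gives the same score when the object shifts (translation invariant), while scores read from the left and right halves of a fixed box change with the object's position (translation variant). The arrays and the half-box scoring are illustrative assumptions, not R-FCN code.

```python
import numpy as np

def place_object(length, start, width=4):
    """Hypothetical 1-D activation map with an 'object' of ones at [start, start + width)."""
    x = np.zeros(length)
    x[start:start + width] = 1.0
    return x

def global_avg_score(x):
    """Classification-style score: global average pooling ignores where the object is."""
    return float(x.mean())

def box_half_scores(x, box_start, box_width):
    """Detection-style scores: mean activation in the left and right halves of a fixed box."""
    half = box_width // 2
    left = x[box_start:box_start + half].mean()
    right = x[box_start + half:box_start + box_width].mean()
    return float(left), float(right)

a = place_object(16, start=4)
b = place_object(16, start=8)  # the same object shifted by 4 positions

print(global_avg_score(a), global_avg_score(b))            # same score for both
print(box_half_scores(a, 4, 8), box_half_scores(b, 4, 8))  # scores swap halves
```

The half-box scores are a crude stand-in for the idea that a detector must know *where* inside a box the object sits.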
This paper proposes the "Region-based Fully Convolutional Network (R-FCN)" for object detection.
- shared, fully convolutional architecture
- construct a set of position-sensitive score maps by using a bank (i.e., a set) of specialized convolutional layers as the FCN output.
  - Each of these score maps encodes the position information with respect to a relative spatial position
  - To this end, a position-sensitive RoI pooling layer is added.
- Need to look closely at the colors of the conv feature maps in Fig. 1. It is still not intuitive to me that they hold relative position information; perhaps the point is that they are trained to do so...
- It appears to build 3x3 position-sensitive score maps that are read relative to each cropped region (region-based); each position seems to act like a heatmap carrying relative-position information.

Our approach

Overview

Based on R-CNN: the two-stage object detection strategy
- region proposal & region classification
- region proposal = a Region Proposal Network (RPN), which has a fully convolutional architecture
The key point is that RPN and R-FCN share features; see the overall architecture figure in the paper.
- Given the proposal regions (RoIs), the R-FCN architecture classifies each RoI into object categories and background.
- In R-FCN, all learnable layers are convolutional and are computed on the entire image, i.e., computed once and shared by all RoIs.
- As shown in Fig. 1, the last conv layer produces a bank of k×k position-sensitive score maps for each of the C categories to be detected.
  - i.e., a k^2(C + 1)-channel output layer, where the +1 is the background
  - With k = 3, the 3x3 score maps encode the relative positions {top-left, top-center, top-right, ..., bottom-right}.
  - These maps are pooled separately for each RoI.
- R-FCN then aggregates them.
  - This layer aggregates the outputs of the last convolutional layer and generates scores for each RoI.
- The position-sensitive RoI layer performs selective pooling:
- each of the k×k bins aggregates responses from only one score map out of the bank of k×k score maps.
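The channel count and one possible channel ordering can be sketched in plain Python. The class-major ordering `channel = (c * k + row) * k + col` and the position names for the middle row are assumptions chosen for illustration; C = 20 (PASCAL VOC) is just an example.

```python
K = 3   # k x k grid of relative positions
C = 20  # number of object categories (example: PASCAL VOC); +1 for background

# Names assume K = 3 (the middle-row names are illustrative).
POSITION_NAMES = [
    "top-left", "top-center", "top-right",
    "middle-left", "center", "middle-right",
    "bottom-left", "bottom-center", "bottom-right",
]

def num_score_channels(k, num_classes):
    """k^2 (C + 1) channels: one k x k bank per category plus background."""
    return k * k * (num_classes + 1)

def score_map_channel(c, row, col, k):
    """Channel of the (row, col) relative-position map of category c.

    Assumed class-major layout: all k*k maps of class 0, then class 1, and so on.
    """
    return (c * k + row) * k + col

print(num_score_channels(K, C))  # 189
for row in range(K):
    for col in range(K):
        ch = score_map_channel(1, row, col, K)
        print(f"class 1, {POSITION_NAMES[row * K + col]}: channel {ch}")
```

Any fixed ordering works as long as the pooling layer reads channels the same way the conv layer is trained to write them.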

Backbone architecture

ResNet-101
Remove the average pooling layer and the fc layer and use only the convolutional layers to compute feature maps.
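Dropping the average pooling and fc layers leaves a spatial feature map instead of a class vector. A pure-Python bookkeeping sketch of its shape, assuming the standard ResNet-101 stage layout and padding such that every stride-2 op maps n to ceil(n / 2); the 600×1000 input is only an example size.

```python
import math

# ResNet-101 convolutional stages kept after removing avg-pool and fc.
# (name, total stride of the stage, output channels, number of residual blocks)
RESNET101_STAGES = [
    ("conv1",   4, 64,   0),  # 7x7 stride-2 conv + stride-2 max pool
    ("conv2_x", 1, 256,  3),
    ("conv3_x", 2, 512,  4),
    ("conv4_x", 2, 1024, 23),
    ("conv5_x", 2, 2048, 3),
]

def feature_map_shape(height, width, stages=RESNET101_STAGES):
    """Return (channels, H, W) of the last conv stage for a given input size."""
    h, w, channels = height, width, 3
    for _, stride, out_channels, _ in stages:
        factor = stride
        while factor > 1:  # each stride-2 op halves the size, rounding up
            h, w = math.ceil(h / 2), math.ceil(w / 2)
            factor //= 2
        channels = out_channels
    return channels, h, w

print(feature_map_shape(600, 1000))  # (2048, 19, 32): total stride 32
```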

Position-sensitive score maps & Position-sensitive RoI pooling.

To encode position information for each RoI, divide the RoI into a k×k regular grid.
For each category, build k×k score maps.
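Position-sensitive RoI pooling operation (Eq. 1):
$$r_c(i,j \mid \Theta) = \sum_{(x,y) \in \mathrm{bin}(i,j)} \frac{1}{n}\, z_{i,j,c}(x + x_0,\, y + y_0 \mid \Theta)$$
- $r_c(i,j \mid \Theta)$: the pooled response in the $(i,j)$-th bin for category $c$
- $z_{i,j,c}$: one of the $k^2(C+1)$ score maps
- $(x_0, y_0)$: the top-left corner of the RoI
- $n$: the number of pixels in the bin
- $\Theta$: all learnable parameters of the network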
Eq. (1) corresponds to Fig. 1:
- each color denotes one grid position (i, j)
- it is average pooling, but max pooling can be applied as well.
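The k^2 position-sensitive scores then vote on the RoI. (What does "vote" mean here?)
- The vote is simply an average of the k^2 scores, giving a (C + 1)-dimensional vector per RoI.
- A softmax across categories follows.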
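A minimal NumPy sketch of Eq. (1) followed by average voting and softmax. Assumptions: the class-major channel layout `(c * k + i) * k + j`, RoIs given in feature-map pixels as `(x0, y0, x1, y1)` with exclusive ends, `i` as the bin row and `j` as the bin column, and floor/ceil bin boundaries. The toy score maps are hypothetical.

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k, num_classes):
    """Position-sensitive RoI average pooling (Eq. 1).

    score_maps: (k*k*(C+1), H, W) array; channel (c * k + i) * k + j holds z_{i,j,c}.
    Returns r of shape (C+1, k, k); r[c, i, j] pools ONLY map z_{i,j,c} over bin (i, j).
    """
    x0, y0, x1, y1 = roi
    w, h = x1 - x0, y1 - y0
    r = np.zeros((num_classes + 1, k, k))
    for c in range(num_classes + 1):
        for i in range(k):      # bin row (y direction)
            for j in range(k):  # bin column (x direction)
                ys = y0 + int(np.floor(i * h / k))
                ye = y0 + int(np.ceil((i + 1) * h / k))
                xs = x0 + int(np.floor(j * w / k))
                xe = x0 + int(np.ceil((j + 1) * w / k))
                z = score_maps[(c * k + i) * k + j]
                r[c, i, j] = z[ys:ye, xs:xe].mean()  # average over the n pixels in the bin
    return r

def vote_and_softmax(r):
    """Vote by averaging the k*k bin scores per class, then softmax across classes."""
    scores = r.mean(axis=(1, 2))
    e = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return e / e.sum()

# Toy maps: class 1's map for bin (i, j) fires only on the matching part of an
# object occupying [0, 9) x [0, 9); background maps are all zero.
k, C, S = 3, 1, 12
maps = np.zeros((k * k * (C + 1), S, S))
for i in range(k):
    for j in range(k):
        maps[(1 * k + i) * k + j, 3 * i:3 * i + 3, 3 * j:3 * j + 3] = 1.0

aligned = vote_and_softmax(ps_roi_pool(maps, (0, 0, 9, 9), k, C))
shifted = vote_and_softmax(ps_roi_pool(maps, (3, 3, 12, 12), k, C))
print(aligned[1], shifted[1])
```

A well-aligned RoI collects evidence from every bin, while the shifted RoI reads each map in the "wrong" place and its object score drops to chance. This is the translation variance the score maps are meant to provide.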
Bounding box regression is the same as in Fast/Faster R-CNN.
The basic idea of position-sensitive score maps is inspired by the work titled "Instance-sensitive Fully Convolutional Networks".
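A sketch of the regression side under stated assumptions: the box branch is taken to produce a (4, k, k) position-sensitive output per RoI that is average-voted into one class-agnostic 4-d delta, which is then decoded with the usual R-CNN (t_x, t_y, t_w, t_h) center/log-size parameterization. Continuous coordinates (no +1 on widths) are assumed; both function names are hypothetical.

```python
import numpy as np

def vote_box_deltas(pooled):
    """Average-vote a (4, k, k) array of position-sensitive box outputs into (tx, ty, tw, th)."""
    return pooled.reshape(4, -1).mean(axis=1)

def decode_box(roi, deltas):
    """Apply R-CNN style deltas to an RoI given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = roi
    w, h = x1 - x0, y1 - y0
    cx, cy = x0 + 0.5 * w, y0 + 0.5 * h
    tx, ty, tw, th = deltas
    new_cx, new_cy = cx + tx * w, cy + ty * h    # shift the center by a fraction of the size
    new_w, new_h = w * np.exp(tw), h * np.exp(th)  # scale the size in log space
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)

pooled = np.zeros((4, 3, 3))
pooled[0] = 0.1  # every bin suggests moving right by 10% of the RoI width
print(decode_box((0.0, 0.0, 100.0, 50.0), vote_box_deltas(pooled)))
```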
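Atrous convolution is applied
- Interestingly, dilated convolution is applied to the last conv stage (conv5), reducing the effective stride from 32 to 16.
- following the approach of FCNs for semantic segmentation...
- It also seems to be used elsewhere, e.g., in DSSD.
- According to the paper, this improves detection accuracy.
The voting concept can be understood from the visualization figures in the paper.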
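A 1-D NumPy sketch of what "atrous" (dilated) convolution does: the kernel taps are spread apart with holes, enlarging the field of view without adding weights or downsampling. Keeping conv5 at stride 1 with dilated filters is how the stride-16 map is obtained; dilation 2 in the example is an assumption matching a halved stride.

```python
import numpy as np

def effective_extent(k, dilation):
    """Input span covered by a k-tap kernel with the given dilation."""
    return (k - 1) * dilation + 1

def dilated_conv1d(x, kernel, dilation=1):
    """'Valid' 1-D cross-correlation with `dilation - 1` holes between kernel taps."""
    k = len(kernel)
    out_len = len(x) - effective_extent(k, dilation) + 1
    return np.array([
        sum(kernel[t] * x[i + t * dilation] for t in range(k))
        for i in range(out_len)
    ])

x = np.arange(10)
print(dilated_conv1d(x, [1, 1, 1], dilation=1))  # taps at i, i+1, i+2
print(dilated_conv1d(x, [1, 1, 1], dilation=2))  # taps at i, i+2, i+4
print(effective_extent(3, 2))                    # a 3-tap kernel now spans 5 inputs
```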

Experiments

chullhwan-song added Localization FCN labels Jan 28, 2019

chullhwan-song closed this as completed Jan 28, 2019

chullhwan-song reopened this Jan 28, 2019
