ACTNET: end-to-end learning of feature activations and multi-stream aggregation for effective instance image retrieval #351

chullhwan-song · 2020-03-28T06:01:43Z

https://arxiv.org/abs/1907.05794

chullhwan-song · 2020-04-03T07:32:17Z

Abstract

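This work is almost a copy of the REMAP study.
- The team that won first place in the 2018 Google Landmark Kaggle competition applied a descriptor called REMAP there. Its actual details are not described very thoroughly, which makes me curious. They also did not rely on the REMAP feature alone but combined a variety of features.
  - This is probably why I am reviewing this author's work from a slightly suspicious angle. :)
Design of a learnable activation layer > CNNs are all..
controlled multi-stream aggregation > means combining features extracted from CNN feature maps at two scales
three non-linear activation functions: Sine-Hyperbolic, Exponential and modified Weibull
- As with REMAP, this seems to be the core of the paper, but it applies a rather rare method (I cannot find it anywhere by searching). > It would be nice if the source were released, but it is not public.
triplet loss
It is called ACTNET, but what does the acronym stand for?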
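The triplet loss named in the abstract can be sketched as follows. This is a minimal, generic version on L2-normalised global descriptors, not the paper's training code; the function name `triplet_loss` and the margin value `0.1` are assumptions made for illustration.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge-style triplet loss on L2-normalised descriptors.

    The margin of 0.1 is an illustrative assumption, not a value from the paper.
    """
    def l2_normalise(v):
        v = np.asarray(v, dtype=float)
        return v / (np.linalg.norm(v) + 1e-12)

    a, p, n = l2_normalise(anchor), l2_normalise(positive), l2_normalise(negative)
    d_pos = np.sum((a - p) ** 2)  # squared distance to the matching image
    d_neg = np.sum((a - n) ** 2)  # squared distance to the non-matching image
    # The loss is zero once the negative is farther than the positive by the margin.
    return max(0.0, d_pos - d_neg + margin)

easy = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])  # negative is far away
hard = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])  # negative equals anchor
```

Only the second triplet produces a gradient signal, which is why hard-negative selection usually matters when training with this loss.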

Architecture

First, the REMAP architecture:
The ACTNET architecture:
The two are almost identical (same authors).
Architecture of this work
- learnable aggregation for deep descriptors
  - Explains max pooling vs average pooling
    - max pooling > the strongest response in each channel > fires only for a specific object in the image (the object of interest), so it is robust to background noise. This is not always guaranteed, however (no guarantee).
    - average pooling > the average of all activations in each channel > includes contaminated features (the noisy parts, contamination).
  - Our aim is to maximise the signal-to-noise ratio (SNR) of the convolutional feature map before aggregation
    - I do not understand this part: it is just a combination of max-pooling over multi-scale maps, so why does it match this concept?
- deep multi-layer architecture
  - Previous aggregation approaches failed here - conventional aggregation methods, such as max- or average-pooling fail to combine such features effectively,
    - Combination of Multiple Global Descriptors for Image Retrieval #313 reports success with this. However, that is not a single feature built from multiple layers but a combination within one layer, a SPoC+MAC kind of case..
- To this end, the convolutional weights, activation parameters and PCA+Whitening weights are jointly trained with triplet loss.
  - Personally I would like to know more about this PCA+Whitening module. It appears to be trained as a learnable layer, since the authors claim the method is end-to-end.
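    - PCA could also be run afterwards as a post-processing step, which bothers me a little. Doesn't reducing the feature dimension with PCA basically lower performance?
    - They claim it is designed to be learnable. In Fine-tuning CNN Image Retrieval with No Human Annotation #153, judging from the source, the layer is learnable but uses weights learned in advance(?). If I instead simply add one wx+b layer and train it, training does not work. (This may be a coding mistake on my part, but the code is so simple..)
    - As mentioned above, does making it learnable improve performance? > In Fine-tuning CNN Image Retrieval with No Human Annotation #153 the performance is reported to be better than the base network..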
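The point about PCA+Whitening as a wx+b layer can be made concrete with a short sketch. It assumes the layer is initialised from an offline PCA on training descriptors and could then be fine-tuned with the rest of the network; that reading follows the note on #153 above and is not confirmed for ACTNET. The helper name `pca_whitening_init` and the random data are hypothetical.

```python
import numpy as np

def pca_whitening_init(X, out_dim, eps=1e-8):
    """Return (W, b) so that y = W @ x + b is the PCA-whitened projection of x.

    X has shape (num_samples, dim). W and b could be copied into a linear
    layer as its initial weights (an assumption, not the paper's procedure).
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:out_dim]   # keep the top out_dim components
    P = eigvecs[:, order]                         # (dim, out_dim)
    scale = 1.0 / np.sqrt(eigvals[order] + eps)   # whitening: unit variance per axis
    W = (P * scale).T                             # (out_dim, dim)
    b = -W @ mean                                 # folds mean subtraction into the bias
    return W, b

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 8))  # correlated toy descriptors
W, b = pca_whitening_init(X, out_dim=4)
Y = X @ W.T + b                                          # same as applying the layer
```

Starting from such an initialisation rather than from random weights may explain why a freshly added wx+b layer is hard to train while a pre-learned one works, though that is a guess.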
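ACTNET is designed to be suitable for mobile devices
- So the authors say, but personally I find this doubtful. > SOTA under this constraint?
Learnable non-linear activation layer > the (suspicious?) part described in this paper; I searched but could not find related work. > Personally, even if these work better than ReLU, I doubt they would deliver the reported performance.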
- Sine-Hyperbolic function (SinH)
- Exponential function (Exp)
- Modified Weibull function (WB)
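To get a feel for the three function families, here is a small sketch. The notes do not give the paper's exact parameterisations, so the forms below are assumptions: plain `sinh(beta * x)`, a shifted exponential, and a curve shaped like the standard Weibull density. The parameter names `beta`, `lam` and `k` are hypothetical stand-ins for whatever the learnable parameters are.

```python
import numpy as np

def sinh_act(x, beta=1.0):
    """Sine-hyperbolic activation; beta would be the learnable parameter."""
    return np.sinh(beta * x)

def exp_act(x, beta=1.0):
    """Exponential activation, shifted so that f(0) = 0 (an assumption)."""
    return np.exp(beta * x) - 1.0

def weibull_act(x, lam=1.0, k=2.0):
    """Curve shaped like the standard Weibull density, for k >= 1.

    The paper's "modified" Weibull form is not given in the notes,
    so this is only a stand-in.
    """
    z = np.maximum(x, 0.0) / lam
    return (k / lam) * z ** (k - 1.0) * np.exp(-(z ** k))

x = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
relu = np.maximum(x, 0.0)
sinh_out = sinh_act(x)
exp_out = exp_act(x)
weibull_out = weibull_act(x)
```

For positive inputs ReLU is linear, whereas sinh and exp grow faster than linearly, so strong responses are amplified relative to weak ones.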
feature aggregation
1. Direct aggregation (DA) [19]: the features from the last convolution layers are aggregated using average pooling > the Neural Codes paper
2. Region of Interest based aggregation (ROIA) [2]: the features are first max-pooled across several multiscale overlapping regions. The regional descriptors are aggregated using sum pooling > the DIR paper
3. SinH aggregation (SinHA): the features are transformed using the Sine-Hyperbolic activation layer before aggregation using average pooling
4. Exponential aggregation (ExpA): the features are passed through the Exponential activation layer before average pooling.
5. Weibull aggregation (WBA): the features are transformed using the Weibull activation layer before average pooling.
  - 3~4 > Why do this? How is it different from just applying ReLU?
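A toy comparison of direct aggregation and SinH aggregation may help with the question above. The feature map, the helper `peak_share` and the fixed `beta` are assumptions for illustration; in the paper the activation parameters are learned.

```python
import numpy as np

def direct_aggregation(fmap):
    """DA: average pooling over spatial positions; fmap has shape (C, H, W)."""
    return fmap.mean(axis=(1, 2))

def sinh_aggregation(fmap, beta=1.0):
    """SinHA: sinh activation followed by average pooling (beta fixed here)."""
    return np.sinh(beta * fmap).mean(axis=(1, 2))

def peak_share(fmap, transform):
    """Fraction of channel 0's pooled value contributed by its strongest location."""
    t = transform(fmap[0])
    return float(t.max() / t.sum())

# One channel: weak background responses plus a single strong response.
fmap = np.full((1, 4, 4), 0.1)
fmap[0, 1, 2] = 5.0

share_avg = peak_share(fmap, lambda a: a)  # plain average pooling
share_sinh = peak_share(fmap, np.sinh)     # sinh before average pooling
```

On this map the strong response accounts for about 77% of the plain average but a larger share after sinh, so the averaged descriptor moves toward what max pooling would give. ReLU leaves positive values unchanged and therefore cannot shift that balance, which may be what the SNR argument refers to.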

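Experimental results

Wow, the performance..

Conclusion

I have only skimmed the paper so far.. I should read it in detail (update planned).
- It outperforms previous work by too much.. > Even a small gain is hard to get, so from a theoretical standpoint (there were already plenty of multi-layer aggregation methods) it is slightly suspicious that it reaches this level of performance (haha) > maybe I am being too negative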

chullhwan-song added Deep Image Feature local descriptor Deep Feature labels Mar 28, 2020

chullhwan-song closed this as completed Mar 28, 2020

chullhwan-song reopened this Apr 4, 2020
