2018 Google Landmark Retrieval Challenge review #105

chullhwan-song commented Feb 28, 2019

https://www.kaggle.com/c/landmark-retrieval-challenge

chullhwan-song commented Feb 28, 2019

Don't confuse this with the Google Landmark Classification Challenge~

1st place

  • Architecture
    [image]

    • Uses an ensemble of features (see the sketch after this list)
      • and a pretty brute-force one at that: everything is simply concatenated
        • XG = [2× ResNeXt+REMAP; 1.5× ResNeXt+RMAC; 1.5× ResNeXt+MAC; 1.5× ResNeXt+SPoC; ResNet+MAC; ResNet+REMAP]
      • six features in total
        • ResNeXt (REMAP, R-MAC, MAC, SPoC) + ResNet (REMAP, MAC)
          • REMAP - appears to be their own method (to be released)
      • no attention-type features.
    • Normalization + whitening of the concatenated feature is the default by now
    • The final descriptor is 4096-dimensional... ah, so they did apply PCA after all
    • QE and augmentation on the DB side gave a fair performance boost.
    • For training, they took the noisy landmark set from Neural Codes (120k images of 650 famous landmarks), built a cleaned version, and used that (DIR and DELF do the same).
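
A minimal NumPy sketch of the pipeline described above: weighted concatenation of several L2-normalized global descriptors, followed by PCA whitening down to 4096 dimensions. The weights follow the XG recipe; everything else (function names, the SVD-based whitening) is my own illustration, not the winners' code.

```python
import numpy as np

def ensemble_descriptor(feats, weights):
    """Weighted concat of per-model global descriptors, then re-normalize."""
    parts = [w * f / np.linalg.norm(f) for f, w in zip(feats, weights)]
    x = np.concatenate(parts)
    return x / np.linalg.norm(x)

# XG weights from the write-up: [2, 1.5, 1.5, 1.5, 1, 1] for
# ResNeXt+REMAP, ResNeXt+RMAC, ResNeXt+MAC, ResNeXt+SPoC, ResNet+MAC, ResNet+REMAP.

def fit_pca_whitening(X, out_dim=4096):
    """Learn a PCA-whitening projection from a descriptor matrix X (n x d)."""
    mu = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    P = Vt[:out_dim] / S[:out_dim, None]  # rows scaled to whiten variances
    return mu, P

def apply_pca_whitening(x, mu, P):
    y = P @ (x - mu)
    return y / np.linalg.norm(y)          # final 4096-dim descriptor
```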
  • Put simply:
    [image]

  • A hint at the identity of the REMAP feature used by 1st place
    https://twitter.com/ducha_aiki/status/1008833406036107270
    [images]

  • Seems to build ROI-wise aggregated features at multiple conv layers (layer by layer)

    • could also be viewed as a multi-layer R-MAC.
    • However, the pipeline includes a step called Entropy Weighting, which is said to boost the features using KL-divergence.. this part isn't described in detail, so..

REMAP paper: REMAP: Multi-layer entropy-guided pooling of dense CNN features for image retrieval

[image]

Practically, the KL-divergence Weighting (KLW) block in the REMAP architecture is implemented
using a 1D convolutional layer with weights initialized by the KL-divergence values and optimized
using Stochastic Gradient Descent (SGD) on the triplet loss function.

...
Our novel component, KL-divergence weighting (KLW), can be implemented using a 1D convolutional
layer, with weights that can be optimized.
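
Based only on the quoted description (a per-channel conv whose weights start from KL-divergence values and are then trained by SGD on a triplet loss), a PyTorch sketch of such a block might look like this; `kl_values` is an assumed precomputed tensor with one scalar per channel, and the depthwise 1x1 conv is my reading of their "1D convolutional layer":

```python
import torch
import torch.nn as nn

class KLWeighting(nn.Module):
    def __init__(self, kl_values):  # kl_values: (C,) tensor of KL-divergence scores
        super().__init__()
        c = kl_values.numel()
        # depthwise 1x1 conv == one learnable scalar weight per channel
        self.conv = nn.Conv2d(c, c, kernel_size=1, groups=c, bias=False)
        with torch.no_grad():
            self.conv.weight.copy_(kl_values.view(c, 1, 1, 1))

    def forward(self, x):  # x: (N, C, H, W) dense feature map
        return self.conv(x)
```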

2nd place

3rd place

4th place - NaverLabs Europe

[image]

  • Based on End-to-end Learning of Deep Visual Representations for Image Retrieval #17 > the SOTA paper among existing global-descriptor approaches.
  • The funny thing is that it's not R-MAC but a triplet-trained GeM (see the sketch below)
  • Personally, shouldn't this be 1st place, given that only a single base network is used? haha. Is winning with several base networks even feasible for a real service?? Big question mark for me (personally, I'd say impossible..)
  • The base (R-MAC) paper used 256/512-dim features, whereas here it is 2k-dim > from this one can guess that a larger feature dim works better.. (this tendency was very strong before deep learning as well)
    • Any connection to the paper's authors??
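
For reference, GeM (generalized mean) pooling from "Fine-tuning CNN Image Retrieval with No Human Annotation" is only a few lines. This is the standard formulation, not NaverLabs' exact code; in the paper p is a learnable parameter, fixed here for brevity:

```python
import torch.nn.functional as F

def gem(x, p=3.0, eps=1e-6):
    """x: (N, C, H, W) conv feature map -> (N, C) global descriptor."""
    pooled = F.avg_pool2d(x.clamp(min=eps).pow(p), (x.size(-2), x.size(-1)))
    return pooled.pow(1.0 / p).flatten(1)
```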

Review of the 4th place in landmark-recognition - caution: this is not Retrieval.

Step 1: Training a usual CNN

  • Networks: ResNet34, ResNet50, ResNet101, ResNet152, ResNeXt101, InceptionResNetV2, DenseNet201.
  • 15k classes
  • augmentation (see the sketch after this list)
    • random 224x224 crops
    • resizes, scales, shifts, rotates, flips
  • Pavel Pleskov
    • CNNs were trained with fast.ai until validation accuracy reached 0.975.
    • He did not use classes with fewer than 10 samples (eight thousand classes remained).
    • However, training a lot of CNNs and merging their predictions is not enough in this competition. There are several challenges:
      • classes with one or two samples in the train dataset > is 1-2 samples too few? Few-shot learning?
      • images from the Google Landmark Retrieval Challenge in the test dataset
      • non-landmark images in the test dataset > it isn't there, so how do you find it??
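
A rough torchvision rendering of the augmentations listed above; the write-up only names the operations, so the exact magnitudes here are my assumptions:

```python
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),        # random 224x224 crops + resizes/scales
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1)),  # shifts + rotates (assumed ranges)
    transforms.RandomHorizontalFlip(),                          # flips
    transforms.ToTensor(),
])
```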

Step 2: Recognizing landmarks from retrieval challenge

  • They used the retrieval challenge data for this..

Step 3: Recognizing images with no landmark

  • A separate landmark vs. non-landmark classifier was built

Step 4: Few-shot learning

  • "Learning Robust Visual-Semantic Embeddings" 연구에서 이런 개념을 저는 들은적이 있음, 즉, Classes with one or two samples in the train dataset 의 개념에서 클래스당 이미지 개수가 1~2개정도의 적은수일 때, 어떻게든(??) 학습이 가능하도록하는 개념.
1. Extract features from hidden layer
2. For each image from test set find K closest images from train set (K=5)
3. For each class_id we computed: scores[class_id] = sum(cos(query_image, index) for index in K_closest_images)
4. For each class_id we normalized its score: scores[class_id] /= min(K, number of samples in train dataset with class=class_id)
5. label = argmax(scores), confidence = scores[label]
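
Steps 1-5 translate almost directly into Python. A sketch, assuming `train_feats` (n x d) and `test_feat` (d,) are L2-normalized hidden-layer features so that a dot product equals cosine similarity:

```python
import numpy as np
from collections import Counter, defaultdict

def few_shot_predict(test_feat, train_feats, train_labels, K=5):
    sims = train_feats @ test_feat                  # cosine similarity to every train image
    top = np.argsort(-sims)[:K]                     # K closest train images (step 2)
    counts = Counter(train_labels)
    scores = defaultdict(float)
    for i in top:                                   # step 3: sum similarities per class
        scores[train_labels[i]] += sims[i]
    for c in scores:                                # step 4: per-class normalization
        scores[c] /= min(K, counts[c])
    label = max(scores, key=scores.get)             # step 5
    return label, scores[label]
```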

Step 5: kNN with features from local crops

  • We extracted 100 augmented local crops from each image. Crops with no landmarks were removed (using CNN from step 3)
    • So: take 100 crops from each image, then discard the non-landmark ones to get ROIs.. wow.. (sketch below)
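
A sketch of that crop-and-filter idea; `is_landmark` is a hypothetical callable wrapping the step-3 classifier, and images are assumed to be at least `size` pixels on each side:

```python
import random
from PIL import Image

def landmark_crops(img: Image.Image, is_landmark, n=100, size=224):
    """Sample n random crops and keep only those classified as landmarks."""
    kept = []
    for _ in range(n):
        x = random.randint(0, img.width - size)
        y = random.randint(0, img.height - size)
        crop = img.crop((x, y, x + size, y + size))
        if is_landmark(crop):                       # step-3 CNN as a filter
            kept.append(crop)
    return kept
```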

Merging

  • The network models above are merged using the following heuristic rules:
1. Compute a confidence score for each label using the predictions from steps 1-4 as follows: score[label] = label_count / models_count + sum(label_confidence for each model) / models_count. Here label_count is the number of models whose max-confidence prediction equals the label.
2. Each prediction from step 5 was also used, with confidence = 1 + confidence_from_step_5 / 100
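
Rule 1 written out as code. This is my reading of the formula; the write-up leaves some details open, e.g. whether confidences are summed over all models or only over the models that voted for the label (assumed the latter here):

```python
from collections import defaultdict

def merge(preds):
    """preds: one (label, confidence) pair per model."""
    models_count = len(preds)
    votes = defaultdict(int)
    conf = defaultdict(float)
    for label, c in preds:
        votes[label] += 1          # label_count: models whose top prediction is this label
        conf[label] += c
    scores = {l: votes[l] / models_count + conf[l] / models_count for l in votes}
    best = max(scores, key=scores.get)
    return best, scores[best]
```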

6th place

[images]

8th place - ETRI

[image]

9th place - Naver

[image]

10th place

[image]

  1. CNN: ResNet, Wide ResNet, Inception-v3, DenseNet - ImageNet-pretrained models > The best result: ResNet-101 pretrained on ImageNet.
    • augmentation: random resized crops, color jittering, horizontal flipping, random resized 224x224 crops
  2. Loss: metric learning, ArcFace: Additive Angular Margin Loss for Deep Face Recognition
  3. Inference - needed because the problem was not cast as softmax classification (see the sketch after this list):
    • randomly select 100 images from the training set
    • extract vectors for them and compute the mean vector
    • normalize this vector
    • finally, apply cosine distance
  4. Ensembles
    • 5-fold training set > 5 models
    • voting over these models' results
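
A sketch of that inference scheme, assuming the 100-image sampling and the mean are done per class (which the cosine-distance matching implies, though the write-up doesn't say so explicitly):

```python
import numpy as np

def class_centroids(feats_by_class, n_sample=100, seed=0):
    """feats_by_class: {class_id: (n, d) array of embeddings}."""
    rng = np.random.default_rng(seed)
    centroids = {}
    for cls, feats in feats_by_class.items():
        idx = rng.choice(len(feats), size=min(n_sample, len(feats)), replace=False)
        mu = feats[idx].mean(axis=0)
        centroids[cls] = mu / np.linalg.norm(mu)    # normalized mean vector
    return centroids

def predict(query, centroids):
    """query: L2-normalized (d,) embedding -> class with max cosine similarity."""
    return max(centroids, key=lambda c: float(centroids[c] @ query))
```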

14th place - 4 main steps: link

  1. Fine-tuned an ImageNet-pretrained PyTorch ResNet50 > trained on the Landmark Recognition Challenge dataset.
  2. kNN over the resulting features - using Facebook's library (see the sketch after this list)
  3. local descriptor + RANSAC verification on the top-100 > apparently SIFT-based matching.
  4. query expansion
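
"Facebook's library" is presumably faiss, in which the kNN of step 2 is only a few lines. A minimal sketch with random stand-in descriptors; the 2048-dim size is an assumption for ResNet50 global features:

```python
import faiss
import numpy as np

d = 2048
db = np.random.rand(10000, d).astype(np.float32)    # stand-in DB descriptors
q = np.random.rand(5, d).astype(np.float32)         # stand-in query descriptors
faiss.normalize_L2(db)
faiss.normalize_L2(q)

index = faiss.IndexFlatIP(d)        # inner product == cosine on unit vectors
index.add(db)
sims, ids = index.search(q, 100)    # top-100 candidates for RANSAC re-ranking (step 3)
```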

17th place

19th place

  • 2 stages
  1. Build a landmark vs. non-landmark classifier
    1-1. Fine-tuned an Xception model (with a sigmoid activation layer) > this yields the landmark vs. non-landmark classifier
    1-2. Applied the generalized average pooling (GeM) technique from "Fine-tuning CNN Image Retrieval with No Human Annotation".
  2. Google DELF features (see the matching sketch below)
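
DELF features, like the step-3 re-ranking in the 14th-place solution above, are local descriptors verified geometrically with RANSAC. A rough OpenCV equivalent, using SIFT as a stand-in for DELF and a homography as the geometric model:

```python
import cv2
import numpy as np

def inlier_count(img1, img2):
    """Match local features and count RANSAC inliers between two images."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe's ratio test
    if len(good) < 4:
        return 0
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return int(mask.sum()) if mask is not None else 0
```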
