Image Retrieval using Deep Learning, Visual Search, Deep Image Feature

chullhwan-song/OLD-Deep-Image-Retrieval

Deep Image Retrieval

Image Retrieval using Deep Feature

My Experiments

Notes
  • In my view, techniques such as multi-scale input images, multiple backbone networks, and query expansion (QE) are impractical tricks that mainly raise benchmark scores. In these experiments I avoid them as much as possible and try to reach SOTA while staying as close to the basics as possible.
  • For each evaluation set (Oxford5k, Paris6k, Holidays), the tables show only the best result among the models saved during training. The numbers in a row therefore do not necessarily come from a single model, although one model may give the best result on every evaluation set.
  • With the npair loss, no normalization is applied in the last layer, because the model did not train when it was applied.
  • Implemented with TensorFlow (tf) and PyTorch (pt).
  • The GeM exponent p is fixed at 3. In the future, I plan to keep tuning it around 3. > hint
  • PyTorch GeM [1-10][2-2]: trained with the code for reproducing finetuned-gem, with some code modified.
  • [2] shows results obtained with finetuned-gem's trained model.
  • [2-3] is based on this paper.
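The "gem", "mac", and "spoc" entries in the feat column below are three ways of pooling a convolutional feature map into one value per channel. As a rough illustration of why p = 3 sits between the other two, here is a minimal pure-Python sketch of GeM pooling. It is not the code used for these experiments; the function names `gem_pool` and `l2_normalize` and the toy feature map are made up for this example, and activations are assumed to be non-negative (post-ReLU).

```python
import math

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean pooling over each channel.

    feature_map: list of channels, each a flat list of activations
    (hypothetical layout, chosen only to keep the sketch dependency-free).
    """
    pooled = []
    for channel in feature_map:
        # Clamp to a small positive value so x ** p is well defined.
        clamped = [max(x, eps) for x in channel]
        mean_pow = sum(x ** p for x in clamped) / len(clamped)
        pooled.append(mean_pow ** (1.0 / p))
    return pooled

def l2_normalize(vec, eps=1e-12):
    """Scale a descriptor to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

# Toy feature map: 2 channels, 4 spatial positions each.
fmap = [[0.0, 1.0, 2.0, 3.0], [4.0, 0.0, 0.0, 0.0]]

spoc_like = gem_pool(fmap, p=1.0)    # p = 1 is average pooling (SPoC-style)
gem = gem_pool(fmap, p=3.0)          # p = 3, the value fixed in these experiments
mac_like = gem_pool(fmap, p=100.0)   # large p approaches max pooling (MAC-style)

descriptor = l2_normalize(gem)
```

With p = 1 the result is the channel mean, and as p grows it moves toward the channel maximum, so p = 3 weights strong activations more heavily than averaging without discarding the rest.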
update : 2020-04-09 (Currently in progress)
NO net feat Holidays Paris6k Oxf5k dim loss trainset pre-trained lib
[1-1] alexnet fc6 0.789 0.557 128 cls nc imgnet
[1-2] res152 gem 0.9026 0.8927 0.7808 1024 npairs nc imgnet tf
[1-3] res152 gem:single 0.9001 0.8927 0.7507 1024 npairs nc imgnet tf
[1-4] res152 mac 0.8983 0.8779 0.7732 1024 npairs nc imgnet tf
[1-5] res152 mac:single 0.8983 0.8702 0.7636 1024 npairs nc imgnet tf
[1-6] res152 spoc 0.8845 0.8322 0.7184 1024 npairs nc imgnet tf
[1-7] res152 spoc:single 0.8813 0.8306 0.7184 1024 npairs nc imgnet tf
[1-8] res101 r-mac 0.8527 0.9104 0.8018 2048 triplet nc imgnet pt
[1-9] res152 r-mac 0.8468 0.935 0.808 2048 triplet nc imgnet pt
[1-10] res101 gem 0.8487 0.7339 2048 contrastive SfM imgnet pt
[2] res101 gem 0.829 0.782 2048 contrastive SfM imgnet pt
[2-3] res101 gem 0.9323 0.9185 512 arcface GDV1 imgnet pt
update : 2020-04-09 (Currently in progress)
NO net feat rox_e rox_m rox_h rpa_e rpa_m rpa_h dim loss trainset pre-trained lib
[2] res101 gem 0.7389 0.539 0.247 0.8467 0.659 0.388 2048 contrastive SfM imgnet pt
[2-1] res101 r-mac 0.6058 0.4156 0.1421 0.828 0.6759 0.4418 2048 triplet nc imgnet pt
[2-2] res101 gem 0.706 0.495 0.19 0.849 0.6757 0.4183 2048 contrastive SfM imgnet pt
[2-3] res101 gem 0.8512 0.7097 0.4665 0.9157 0.8061 0.6319 512 arcface GDV1 imgnet pt
  • Reference for [1-1]: Neural Codes for Image Retrieval : [paper][review]
  • nc: neuralcode clean dataset
  • SfM: retrieval-SfM-120k
  • GDV1 : Google Landmarks V1
  • GDV2 : Google Landmarks V2
  • rox: revisitop_oxford
  • rpa: revisitop_rparis
  • e: easy, m: medium, h: hard
  • triplet : triplet loss
  • imgnet : imagenet
  • tf : tensorflow
  • pt : pytorch
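The README does not name the metric, but these benchmarks are normally scored with mean average precision (mAP), so the numbers above are presumably mAP. The sketch below shows a simple non-interpolated average precision for one query, with an optional set of ignored images that are skipped without counting as errors; in the revisited Oxford/Paris protocol, the easy, medium, and hard settings differ in which images count as positives and which are ignored. The function names and the toy ranking are hypothetical, and the official evaluation scripts may differ in detail (for example, in how the precision-recall curve is interpolated).

```python
def average_precision(ranked_ids, positives, ignored=frozenset()):
    """Non-interpolated AP for one query.

    ranked_ids: database image ids, best match first.
    positives:  ids that count as correct for this query.
    ignored:    ids removed from the ranking before scoring.
    """
    positives = set(positives)
    if not positives:
        return 0.0
    hits = 0
    rank = 0
    precision_sum = 0.0
    for image_id in ranked_ids:
        if image_id in ignored:
            continue  # skipped entirely; does not advance the rank
        rank += 1
        if image_id in positives:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(positives)

def mean_average_precision(per_query_ap):
    """mAP is the mean of the per-query AP values."""
    return sum(per_query_ap) / len(per_query_ap)

# Toy ranking for a single query (ids are made up).
ranking = ["a", "x", "b", "j", "c"]
ap_plain = average_precision(ranking, positives={"a", "b", "c"})
ap_ignoring_j = average_precision(ranking, positives={"a", "b", "c"}, ignored={"j"})
```

Ignoring "j" moves "c" from rank 5 to rank 4, which is why the same ranking can score differently under different difficulty settings.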

Instance benchmark dataset

NO Title Category Link #Categories #Queries #Images Notes
1 Oxford5k landmark link 16 55 5,062
2 Paris6k landmark link 11 55 6,412
3 Holidays landmark link 500 500 1,491
4 Google-Landmarks_V1 landmark link 12,894 100,000 1,060,709 textbysearch
4 Google-Landmarks_V2 landmark link 203,094 117,577 5,012,248 textbysearch
5 UKBench landmark link 2,550 10,200 10,200
6 FlickrLogos-32 logo link 32 500 8,240
7 FlickrLogos-47 logo link 47 ? ?
8 INSTRE Instance link 200 N/A 28,543
9 ZuBuD landmark link 200 115 1,005
10 SMVS covers link 1,200 3,300 1,200
11 DupImage Instance link 33 108 1,104
12 Neural Codes landmark link 672 213,678 textbysearch

Learnable (fine-tuning using targeted datasets)

  • Results that rely on QE are excluded.
  • GD : Global Descriptor
  • LD : Local Descriptor
Paper Oxf5k Par6k Oxf105k Par106k Holidays descriptor Notes
SOTA 86.1 94.5 82.8 90.6 90.3/94.8
[1] 86.1 94.5 82.8 90.6 90.3/94.8 GD DIR, triplet, R-MAC
[2] 83.8 85.0 82.6 81.7 LD delf, softmax
[3] 79.7 83.8 73.9 76.4 82.5 GD siamese, R-MAC
  • [1] End-to-end Learning of Deep Visual Representations for Image Retrieval : [paper][review]
  • [2] Large-Scale Image Retrieval with Attentive Deep Local Features : [paper][review]
    • DELF reached SOTA through its combination with QE and DIR.
  • [3] CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples

Not Learnable (only trained on ImageNet)

Paper Oxf5k Par6k Oxf105k Par106k Holidays Sculp6k UKB descriptor Notes
SOTA 0.712 0.805 0.672 0.733
[1] 0.712 0.805 0.672 0.733 GD CAM
[2] 53.3 67.0 48.9 71.6 37.7 84.2 GD MAC (first paper), Max pooling + l1 dist
  • [1] Class-Weighted Convolutional Features for Visual Instance Search
  • [2] Visual Instance Retrieval with Deep Convolutional Networks
