# Image Retrieval Experiments

In [1]:
import os
import glob
import cv2
import pickle
import numpy as np
import matplotlib.pyplot as plt
from utils import *

### Get Feature Database
The features are generated offline; see GenerateFeautreDatabase.ipynb

In [2]:
with open("FeatureDatabase.pkl","rb") as f:
    FeatureDatabase = pickle.load(f)

### Start Experiments !
Here, we experiment with all sorts of extracted features <br>
We calculate the overall MAP and the MAP for each category <br>

*The result table and some findings are at the very bottom*
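
For reference, here is a minimal sketch of how overall MAP and per-category MAP can be computed from a pairwise distance matrix. It is not the `GetMAP` implementation from `utils`; the function name `mean_average_precision` and the choice to exclude the query image from its own ranking are assumptions made for illustration.

```python
import numpy as np

def mean_average_precision(distances, labels):
    """Overall MAP and per-category MAP from a square distance matrix.

    Assumption: each image queries all *other* images, and a retrieved
    image is relevant when it shares the query's label.
    """
    labels = np.asarray(labels)
    ap_per_query = np.zeros(len(labels))
    for q in range(len(labels)):
        order = np.argsort(distances[q], kind="stable")
        order = order[order != q]                  # drop the query itself
        relevant = labels[order] == labels[q]
        if not relevant.any():
            continue                               # AP stays 0 with no relevant images
        hits = np.cumsum(relevant)
        ranks = np.arange(1, len(order) + 1)
        # Average the precision at each rank where a relevant image appears.
        ap_per_query[q] = (hits[relevant] / ranks[relevant]).mean()
    per_category = {c: ap_per_query[labels == c].mean() for c in np.unique(labels)}
    return ap_per_query.mean(), per_category

# Toy example: two well-separated clusters on a line.
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
toy_labels = ["a", "a", "a", "b", "b", "b"]
toy_dist = np.abs(points[:, None] - points[None, :])
toy_map, toy_cmap = mean_average_precision(toy_dist, toy_labels)
```

In the toy example every query ranks its own cluster first, so both the overall MAP and each category's MAP are 1.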

#### Random - Null Hypothesis Test
First, let's establish a null hypothesis to check whether the methods below actually work <br>
We generate random features for each image, so the retrieval ranking should be random as well <br>
The overall MAP is stable at around 0.04 across runs

In [3]:
DistMetric = ['cosine']

# Map every image index to its category label; this is the same for all runs
Id2Label = {}
Id = 0
for Label in FeatureDatabase:
    for Image in FeatureDatabase[Label]:
        Id2Label[Id] = Label
        Id += 1

for run in range(5):
    # One random 128-D feature vector per image (599 images in this database)
    Features = np.random.rand(len(Id2Label), 128)
    MAP, CMAP = GetMAP([Features], Id2Label, Metrics=DistMetric)
    print("Run %d - MAP: %8f" % (run+1, MAP))
    # Sort the categories by their per-category MAP, ascending
    Labels = []
    CMAPs = []
    for Label in CMAP:
        Labels.append(Label)
        CMAPs.append(CMAP[Label])
    Labels = np.array(Labels)
    CMAPs = np.array(CMAPs)
    Rank = np.argsort(CMAPs)
    # Report the two best and the two worst categories
    print("Best: %s(%8f)" % (Labels[Rank[-1]], CMAPs[Rank[-1]]), end="")
    print(", %s(%8f)" % (Labels[Rank[-2]], CMAPs[Rank[-2]]))
    print("Worst: %s(%8f)" % (Labels[Rank[0]], CMAPs[Rank[0]]), end="")
    print(", %s(%8f)" % (Labels[Rank[1]], CMAPs[Rank[1]]))

Run 1 - MAP: 0.041509
Best: nba_jersey(0.053166), baby_shoes(0.048646)
Worst: chair(0.034581), garment(0.034623)
Run 2 - MAP: 0.040547
Best: cartoon_purse(0.054079), bracelet(0.052686)
Worst: bicycle(0.032513), bottle(0.032546)
Run 3 - MAP: 0.041753
Best: korean_snack(0.053002), children_dress(0.051431)
Worst: women_clothes(0.033087), sprite(0.033302)
Run 4 - MAP: 0.041563
Best: children_dress(0.067377), nba_jersey(0.055482)
Worst: minnie_shoes(0.032202), skirt(0.032747)
Run 5 - MAP: 0.039554
Best: drum(0.052197), skirt(0.048698)
Worst: trousers(0.032239), minnie_dress(0.032474)


### Color Features
Here, we experiment with color features <br>
(See GenerateFeautreDatabase.ipynb and Features.py for details of each feature)

We experiment with 5 different color spaces:
- RGB
- Gray
- HSV
- YUV
- Lab

And also 5 different features:
- Global/Local color histogram
- Global/Local color moment
- Color auto-correlogram

#### Color Histograms
- First quantize the colors, then calculate the histogram
- Local means that we cut the image into a grid and calculate a histogram for each grid cell
- Reference: [Robust image retrieval based on color histogram of local feature regions](https://link.springer.com/article/10.1007/s11042-009-0362-0)
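
The two histogram variants can be sketched roughly as follows. This is an illustration, not the code in Features.py: the function names, the 8 quantization levels per channel, and the 2x2 grid are all assumed values, and the input is assumed to be an 8-bit H x W x C array.

```python
import numpy as np

def global_histogram(image, bins=8):
    """Quantize each channel into `bins` levels and count the joint colors."""
    pixels = image.reshape(-1, image.shape[-1]).astype(np.int64)
    quantized = pixels * bins // 256               # per-channel level in [0, bins)
    # Combine the per-channel levels into one color index per pixel.
    index = np.zeros(len(pixels), dtype=np.int64)
    for channel in range(quantized.shape[1]):
        index = index * bins + quantized[:, channel]
    hist = np.bincount(index, minlength=bins ** quantized.shape[1])
    return hist / hist.sum()

def local_histogram(image, grid=(2, 2), bins=8):
    """Concatenate the histograms of every cell of a grid laid over the image."""
    rows = np.array_split(np.arange(image.shape[0]), grid[0])
    cols = np.array_split(np.arange(image.shape[1]), grid[1])
    cells = [global_histogram(image[r[0]:r[-1] + 1, c[0]:c[-1] + 1], bins)
             for r in rows for c in cols]
    return np.concatenate(cells)

rng = np.random.default_rng(0)
demo_image = rng.integers(0, 256, size=(32, 48, 3), dtype=np.uint8)
global_feat = global_histogram(demo_image)   # 8^3 = 512 bins
local_feat = local_histogram(demo_image)     # 4 cells x 512 bins
```

The local feature is `grid[0] * grid[1]` times longer than the global one, which is consistent with the longer query times of the local variants below.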

In [4]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Global RGB Histogram"], 
              MetricList=['cityblock'])

MAP: 0.216819 - Time: 0.178556s
Best: garment(0.430642), sprite(0.371505)
Worst: nba_jersey(0.054858), chair(0.082266)


In [5]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Local RGB Histogram"], 
              MetricList=['cityblock'])

MAP: 0.227421 - Time: 1.085099s
Best: goggles(0.480197), gge_snack(0.423427)
Worst: nba_jersey(0.061249), trousers(0.074155)


In [6]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Global Gray Histogram"], 
              MetricList=['cityblock'])

MAP: 0.147758 - Time: 0.193506s
Best: minnie_dress(0.416588), garment(0.369363)
Worst: nba_jersey(0.043021), trousers(0.045867)


In [7]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Local Gray Histogram"], 
              MetricList=['cityblock'])

MAP: 0.196801 - Time: 0.738027s
Best: goggles(0.579366), sprite(0.437721)
Worst: nba_jersey(0.057353), trousers(0.062184)


In [8]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Global HSV Histogram"], 
              MetricList=['cityblock'])

MAP: 0.244082 - Time: 0.202458s
Best: minnie_dress(0.489047), korean_snack(0.441652)
Worst: nba_jersey(0.049035), trousers(0.057826)


In [9]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Local HSV Histogram"], 
              MetricList=['cityblock'])

MAP: 0.252586 - Time: 2.307830s
Best: sprite(0.560629), goggles(0.469974)
Worst: trousers(0.056851), nba_jersey(0.057878)


In [10]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Global YUV Histogram"], 
              MetricList=['cityblock'])

MAP: 0.205076 - Time: 0.204488s
Best: minnie_dress(0.554774), goggles(0.453895)
Worst: trousers(0.043189), nba_jersey(0.045316)


In [11]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Local YUV Histogram"], 
              MetricList=['cityblock'])

MAP: 0.248130 - Time: 2.020626s
Best: sprite(0.579795), goggles(0.550990)
Worst: trousers(0.047736), nba_jersey(0.049910)


In [12]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Global Lab Histogram"], 
              MetricList=['cityblock'])

MAP: 0.202028 - Time: 0.211468s
Best: minnie_dress(0.437601), goggles(0.431584)
Worst: nba_jersey(0.043520), trousers(0.051446)


In [13]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Local Lab Histogram"], 
              MetricList=['cityblock'])

MAP: 0.247574 - Time: 2.030536s
Best: sprite(0.563474), aloe_vera_gel(0.550830)
Worst: nba_jersey(0.046656), trousers(0.065445)


#### Color Moments
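
The notebook does not describe this feature, so the sketch below uses a common definition of color moments: the mean, standard deviation, and skewness of each channel. Whether Features.py computes exactly these three moments is an assumption, as is the function name.

```python
import numpy as np

def color_moments(image):
    """Mean, standard deviation and skewness of each channel (H x W x C input)."""
    pixels = image.reshape(-1, image.shape[-1]).astype(np.float64)
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    std = np.sqrt((centered ** 2).mean(axis=0))
    # The signed cube root keeps the skewness in the same units as the pixels.
    skew = np.cbrt((centered ** 3).mean(axis=0))
    return np.concatenate([mean, std, skew])

rng = np.random.default_rng(0)
demo_image = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)
moments = color_moments(demo_image)                              # 3 moments x 3 channels
constant = color_moments(np.full((4, 4, 3), 7, dtype=np.uint8))  # flat image
```

A local variant would apply the same function to each grid cell and concatenate the results.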

In [14]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Global RGB Moment"], 
              MetricList=['cityblock'])

MAP: 0.108482 - Time: 0.210436s
Best: garment(0.281315), goggles(0.217116)
Worst: nba_jersey(0.043861), aloe_vera_gel(0.054339)


In [15]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Local RGB Moment"], 
              MetricList=['cityblock'])

MAP: 0.100542 - Time: 0.499691s
Best: garment(0.222236), gge_snack(0.187341)
Worst: nba_jersey(0.036981), drum(0.036991)


In [16]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Global Gray Moment"], 
              MetricList=['cityblock'])

MAP: 0.084076 - Time: 0.173535s
Best: garment(0.190709), goggles(0.190580)
Worst: clock(0.039349), nba_jersey(0.045735)


In [17]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Local Gray Moment"], 
              MetricList=['cityblock'])

MAP: 0.093259 - Time: 0.443842s
Best: garment(0.186177), gge_snack(0.161884)
Worst: bicycle(0.036053), drum(0.036238)


In [18]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Global HSV Moment"], 
              MetricList=['cityblock'])

MAP: 0.120536 - Time: 0.218442s
Best: skirt(0.268911), goggles(0.236657)
Worst: nba_jersey(0.049542), trousers(0.052583)


In [19]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Local HSV Moment"], 
              MetricList=['cityblock'])

MAP: 0.104022 - Time: 0.486698s
Best: aloe_vera_gel(0.277636), korean_snack(0.245633)
Worst: ice_cream(0.038129), drum(0.041457)


In [20]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Global YUV Moment"], 
              MetricList=['cityblock'])

MAP: 0.114469 - Time: 0.173568s
Best: goggles(0.273059), minnie_dress(0.253180)
Worst: clock(0.035845), bicycle(0.042445)


In [21]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Local YUV Moment"], 
              MetricList=['cityblock'])

MAP: 0.081900 - Time: 0.492682s
Best: garment(0.187043), aloe_vera_gel(0.173401)
Worst: drum(0.024146), bicycle(0.029019)


In [22]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Global Lab Moment"], 
              MetricList=['cityblock'])

MAP: 0.117041 - Time: 0.184500s
Best: minnie_dress(0.337858), garment(0.244982)
Worst: bicycle(0.033868), clock(0.037392)


In [23]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Local Lab Moment"], 
              MetricList=['cityblock'])

MAP: 0.089716 - Time: 0.559494s
Best: aloe_vera_gel(0.296906), garment(0.225646)
Worst: drum(0.021298), bicycle(0.027063)


#### Color Auto-Correlogram
- The auto-correlogram calculates the probability that a color appears near itself (at some given distance)
- The probabilities are approximated by counting only the 8 pixels in the 8 directions, to reduce computation
- Reference: [Image Indexing Using Color Correlograms](http://www.cs.cornell.edu/~rdz/Papers/Huang-CVPR97.pdf)
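
A rough sketch of the 8-direction approximation described above. The function name, the set of distances, and the handling of image borders (only in-bounds neighbors are counted) are assumptions; the input is assumed to be an image already quantized to integer color indices.

```python
import numpy as np

# The 8 directions: horizontal, vertical and diagonal.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def auto_correlogram(quantized, n_colors, distances=(1, 3, 5)):
    """quantized: 2-D integer array of color indices in [0, n_colors)."""
    height, width = quantized.shape
    if max(distances) >= min(height, width):
        raise ValueError("every distance must be smaller than the image size")
    feature = []
    for d in distances:
        same = np.zeros(n_colors)    # neighbor pairs that share color c
        total = np.zeros(n_colors)   # all in-bounds neighbor pairs of color c
        for dy, dx in DIRECTIONS:
            oy, ox = dy * d, dx * d
            # Aligned windows: `neighbor` is `center` shifted by (oy, ox).
            center = quantized[max(0, -oy):height - max(0, oy),
                               max(0, -ox):width - max(0, ox)]
            neighbor = quantized[max(0, oy):height - max(0, -oy),
                                 max(0, ox):width - max(0, -ox)]
            total += np.bincount(center.ravel(), minlength=n_colors)
            same += np.bincount(center[center == neighbor], minlength=n_colors)
        feature.append(same / np.maximum(total, 1))
    return np.concatenate(feature)

uniform = np.zeros((8, 8), dtype=np.int64)
board = np.indices((6, 6)).sum(axis=0) % 2            # 2-color checkerboard
uniform_feat = auto_correlogram(uniform, n_colors=2, distances=(1,))
board_feat = auto_correlogram(board, n_colors=2, distances=(2,))
```

On the uniform image, color 0 always finds itself at distance 1; on the checkerboard, every pixel finds its own color at distance 2 in all 8 directions.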

In [24]:
RunExperiment(FeatureDatabase, 
              FeatureList=["RGB Auto-Correlogram"], 
              MetricList=['cityblock'])

MAP: 0.280854 - Time: 0.286270s
Best: women_clothes(0.721712), minnie_dress(0.671403)
Worst: chair(0.058344), leather_purse(0.103197)


In [25]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Gray Auto-Correlogram"], 
              MetricList=['cityblock'])

MAP: 0.167133 - Time: 0.227401s
Best: minnie_dress(0.623290), garment(0.404961)
Worst: nba_jersey(0.035346), clock(0.040869)


In [26]:
RunExperiment(FeatureDatabase, 
              FeatureList=["HSV Auto-Correlogram"], 
              MetricList=['cityblock'])

MAP: 0.302070 - Time: 0.282244s
Best: minnie_dress(0.748849), korean_snack(0.710094)
Worst: chair(0.068773), bicycle(0.116661)


In [27]:
RunExperiment(FeatureDatabase, 
              FeatureList=["YUV Auto-Correlogram"], 
              MetricList=['cityblock'])

MAP: 0.243780 - Time: 0.287266s
Best: minnie_dress(0.667765), sprite(0.512101)
Worst: chair(0.060503), nba_jersey(0.065502)


In [28]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Lab Auto-Correlogram"], 
              MetricList=['cityblock'])

MAP: 0.233564 - Time: 0.283269s
Best: minnie_dress(0.584799), women_clothes(0.486346)
Worst: chair(0.053759), clock(0.056109)


### Texture/Shape Features 
Here, we experiment with texture and shape features <br>
(See GenerateFeautreDatabase.ipynb and Features.py for details of each feature)

We use gray images for calculating all the features below, <br>
since we experimented with colors in the above experiments already

We experiment with 8 different features:
- Gabor extracted features
- Gabor global/local histogram
- (Grid) Local binary pattern
- (Pyramid) Histogram of oriented gradients
- Shape Index

#### Gabor Extracted Features
- First, the grayscale image is passed through several Gabor filters
- Then we extract the energy, amplitude mean, and amplitude variance of each filtered image
- Reference: [Texture Features for Browsing and Retrieval of Image Data](https://www.csie.ntu.edu.tw/~b97053/paper/Texture%20features%20for%20browsing%20and%20retrieval%20of%20image%20data.pdf)
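
The two steps above can be sketched with NumPy alone. This is a stand-in, not the code in Features.py: the kernel construction, the filter bank (2 wavelengths x 4 orientations), the FFT-based circular convolution, and the definition of energy as the sum of squared amplitudes are all assumptions.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel: a Gaussian envelope times a complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_theta = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * x_theta / wavelength)

def gabor_features(gray, wavelengths=(4, 8), n_orientations=4, size=15):
    """Energy, mean amplitude and amplitude variance for each filter."""
    gray = gray.astype(np.float64)
    spectrum = np.fft.fft2(gray)
    feature = []
    for wavelength in wavelengths:
        for k in range(n_orientations):
            kernel = gabor_kernel(size, wavelength, np.pi * k / n_orientations,
                                  sigma=0.5 * wavelength)
            # Circular convolution via the FFT (wraps around at the borders).
            response = np.fft.ifft2(spectrum * np.fft.fft2(kernel, s=gray.shape))
            amplitude = np.abs(response)
            feature += [np.sum(amplitude ** 2), amplitude.mean(), amplitude.var()]
    return np.array(feature)

# Vertical stripes with a period of 8 pixels.
stripes = np.tile(np.sin(2 * np.pi * np.arange(32) / 8), (32, 1))
stripe_feat = gabor_features(stripes)   # 2 wavelengths x 4 orientations x 3 statistics
```

For the striped test image, the filter whose wavelength and orientation match the stripes responds with far more energy than the perpendicular one, which is what makes these statistics usable as a texture descriptor.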

In [29]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Gabor Features"], 
              MetricList=['cosine'])

MAP: 0.128974 - Time: 0.208448s
Best: garment(0.402665), gge_snack(0.396663)
Worst: clock(0.043924), glasses(0.046348)


#### Gabor Global/Local Histogram
- First, the grayscale image is passed through several Gabor filters
- Then we calculate the histogram of each filtered image
- Local means that we cut the image into a grid and calculate a histogram for each grid cell

In [30]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Gabor Global Histogram"], 
              MetricList=['cityblock'])

MAP: 0.152167 - Time: 0.433875s
Best: garment(0.509958), minnie_dress(0.337481)
Worst: tennis_ball(0.050089), glasses(0.050118)


In [31]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Gabor Local Histogram"], 
              MetricList=['cityblock'])

MAP: 0.184137 - Time: 17.507202s
Best: goggles(0.486365), garment(0.426358)
Worst: trousers(0.072763), glasses(0.078090)


#### Local Binary Pattern
- First generate the local binary pattern with skimage.feature.local_binary_pattern
- Then calculate its histogram
- Grid means that we cut the image into a grid and calculate a histogram for each grid cell
- Reference: [Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns](http://www.ee.oulu.fi/research/mvmp/mvg/files/pdf/pdf_94.pdf)
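
To make the idea concrete, here is a simplified NumPy sketch of a basic 3x3 local binary pattern and its (grid) histogram. It is a stand-in for skimage.feature.local_binary_pattern, not a reimplementation of it, and the function names and grid sizes are assumptions.

```python
import numpy as np

def lbp_codes(gray):
    """Basic 3x3 LBP: one bit per neighbor that is >= the center pixel."""
    center = gray[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = gray[1 + dy:gray.shape[0] - 1 + dy,
                        1 + dx:gray.shape[1] - 1 + dx]
        codes |= (neighbor >= center).astype(np.int64) << bit
    return codes

def lbp_histogram(gray, grid=(1, 1)):
    """Normalized 256-bin histogram of the codes per grid cell, concatenated."""
    codes = lbp_codes(gray)
    feature = []
    for band in np.array_split(codes, grid[0], axis=0):
        for cell in np.array_split(band, grid[1], axis=1):
            hist = np.bincount(cell.ravel(), minlength=256)
            feature.append(hist / hist.sum())
    return np.concatenate(feature)

flat = np.full((10, 10), 5, dtype=np.uint8)
flat_hist = lbp_histogram(flat)                       # global: 256 bins
rng = np.random.default_rng(0)
noisy = rng.integers(0, 256, size=(18, 18), dtype=np.uint8)
grid_hist = lbp_histogram(noisy, grid=(2, 2))         # grid: 4 x 256 bins
```

On a flat image every neighbor ties with the center, so all pixels get the code 255.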

In [32]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Local Binary Pattern"], 
              MetricList=['cityblock'])

MAP: 0.118357 - Time: 0.219415s
Best: goggles(0.273743), garment(0.250648)
Worst: tennis_ball(0.044047), clock(0.050103)


In [33]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Grid Local Binary Pattern"], 
              MetricList=['cityblock'])

MAP: 0.174267 - Time: 1.154907s
Best: goggles(0.454712), garment(0.375908)
Worst: orange(0.065158), drum(0.073253)


#### Histogram of Oriented Gradients
- Generated by skimage.feature.hog
- PHOG is Pyramid + HOG
    - The HOG at each resolution is weighted by the inverse of its resolution
    - Since we use cityblock as the distance metric, the weights are applied when generating the features (this shouldn't be done in practice)
- Reference: [Histograms of Oriented Gradients for Human Detection](https://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf)
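
Applying the weights at feature-generation time works because the cityblock distance is a plain sum of absolute differences, so scaling one level's block by a non-negative weight scales that level's contribution to the distance by the same weight. The check below illustrates this with random stand-in descriptors; the level sizes and the weights 1, 1/2, 1/4 are hypothetical values, not the ones used in Features.py.

```python
import numpy as np

def weighted_pyramid(level_features, weights):
    """Scale each pyramid level by its weight, then concatenate."""
    return np.concatenate([w * f for f, w in zip(level_features, weights)])

rng = np.random.default_rng(0)
level_sizes = (9, 36, 144)           # hypothetical descriptor length per level
weights = (1.0, 1 / 2, 1 / 4)        # hypothetical inverse-resolution weights
image_a = [rng.random(n) for n in level_sizes]
image_b = [rng.random(n) for n in level_sizes]

# Cityblock distance between the pre-weighted, concatenated descriptors ...
fused = np.abs(weighted_pyramid(image_a, weights)
               - weighted_pyramid(image_b, weights)).sum()
# ... equals the weighted sum of the per-level cityblock distances.
per_level = sum(w * np.abs(a - b).sum()
                for a, b, w in zip(image_a, image_b, weights))
```

The same shortcut would not carry over unchanged to metrics such as cosine distance, which is one reason to keep the weighting in the distance computation instead.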

In [34]:
RunExperiment(FeatureDatabase, 
              FeatureList=["HOG"], 
              MetricList=['cityblock'])

MAP: 0.233068 - Time: 1.161912s
Best: goggles(0.660422), gge_snack(0.536469)
Worst: drum(0.087050), glasses(0.091062)


In [35]:
RunExperiment(FeatureDatabase, 
              FeatureList=["PHOG"], 
              MetricList=['cityblock'])

MAP: 0.275521 - Time: 20.433430s
Best: gge_snack(0.797922), goggles(0.663505)
Worst: glasses(0.091843), ice_cream(0.135433)


#### Shape Index
- Generated by skimage.feature.shape_index
- Calculate the histogram of the shape-index-transformed image
- Repeat with different sigmas, then concatenate the histograms
- Reference: [Surface shape and curvature scales](https://www.sciencedirect.com/science/article/pii/026288569290076F)

In [36]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Shape Index"], 
              MetricList=['cityblock'])

MAP: 0.160053 - Time: 0.213428s
Best: goggles(0.394650), garment(0.369391)
Worst: aloe_vera_gel(0.052802), glasses(0.061207)


### Local Features
Here, we experiment with local features <br>
(See GenerateFeautreDatabase.ipynb and Features.py for details of each feature)

Similarly, we only use gray images in the experiments below

We experiment with 3 different features:
- SIFT descriptors
- Dense SIFT descriptors
- Pyramid SIFT descriptors

#### SIFT
- SIFT computes descriptors on detected keypoints
- Dense SIFT first cuts the image into a grid, then computes SIFT descriptors on each grid cell
- Pyramid SIFT cuts the image into grids of different sizes, then computes dense SIFT for each size
- The `match` distance metric for SIFT uses FLANN (Fast Library for Approximate Nearest Neighbors) to find the 2 nearest neighbors of each descriptor, then counts the number of matches that pass Lowe's ratio test (the nearest neighbor must be a lot closer than the second nearest neighbor to count as a "good match"), and finally multiplies the count by -1 (simply because I use a min-favoring distance)
- Reference: [Distinctive Image Features from Scale-Invariant Keypoints](https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf)
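
The `match` metric can be sketched as below. To stay self-contained, the sketch swaps FLANN for an exact brute-force nearest-neighbor search in NumPy, so it shows the ratio test and the sign flip but not the approximate search. The function name and the ratio threshold of 0.7 are assumptions.

```python
import numpy as np

def match_distance(query_desc, target_desc, ratio=0.7):
    """Negative count of query descriptors that pass Lowe's ratio test."""
    if len(query_desc) == 0 or len(target_desc) < 2:
        return 0
    # Pairwise Euclidean distances, shape (n_query, n_target).
    diff = query_desc[:, None, :] - target_desc[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    two_nearest = np.sort(dist, axis=1)[:, :2]
    # Good match: the nearest neighbor is much closer than the second nearest.
    good = two_nearest[:, 0] < ratio * two_nearest[:, 1]
    return -int(good.sum())

rng = np.random.default_rng(0)
target = rng.random((50, 128))                                   # 128-D, like SIFT
query = target[:10] + 0.001 * rng.standard_normal((10, 128))     # noisy copies
score = match_distance(query, target)
```

More good matches give a more negative value, so sorting by this "distance" in ascending order ranks the best-matching images first.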

In [38]:
RunExperiment(FeatureDatabase, 
              FeatureList=["SIFT"], 
              MetricList=['match'])

MAP: 0.241059 - Time: 2751.956974s
Best: gge_snack(0.996176), korean_snack(0.930700)
Worst: ice_cream(0.028946), glasses(0.029925)


In [39]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Dense SIFT"], 
              MetricList=['match'])

MAP: 0.235383 - Time: 5666.827838s
Best: gge_snack(0.543353), cup(0.504652)
Worst: ice_cream(0.035214), trousers(0.060200)


In [40]:
RunExperiment(FeatureDatabase, 
              FeatureList=["Pyramid SIFT"], 
              MetricList=['match'])

MAP: 0.208704 - Time: 34795.211326s
Best: gge_snack(0.525038), aloe_vera_gel(0.485464)
Worst: ice_cream(0.036362), trousers(0.055615)


### Fusion

Here, we fuse several of the methods above to try to get better results

We compute each feature separately and calculate its distances with the provided metric <br>
Then, we normalize the distances of each feature to zero mean and unit norm <br>
Finally, we take the weighted sum of the distances if `WeightList` is provided; otherwise, we simply sum them up

Since there are too many combinations, we only tested a few, choosing the best-performing methods and looking at the best and worst categories of each method (finding diverse and good features!)

We ran this part in a greedy fashion (loosely like gradient descent): add the best remaining feature, adjust its weight, and repeat, trying to find the best weight for each feature. This effectively overfits the weights to the dataset, so it shouldn't be done this way in practice
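
A minimal sketch of this kind of distance-level fusion, assuming each feature yields a query-by-database distance matrix. It is not the `RunExperiment` implementation: the function name is hypothetical, and normalizing each query row to zero mean and unit L2 norm is one reading of the normalization described above.

```python
import numpy as np

def fuse_distances(distance_list, weights=None):
    """Normalize each (n_query, n_database) matrix row-wise, then weighted-sum."""
    if weights is None:
        weights = [1.0] * len(distance_list)
    fused = np.zeros_like(distance_list[0], dtype=np.float64)
    for dist, weight in zip(distance_list, weights):
        centered = dist - dist.mean(axis=1, keepdims=True)
        norm = np.linalg.norm(centered, axis=1, keepdims=True)
        fused += weight * centered / np.maximum(norm, 1e-12)
    return fused

rng = np.random.default_rng(0)
dist_a = rng.random((5, 20))             # e.g. distances from one feature
dist_b = 1000.0 * rng.random((5, 20))    # another feature on a much larger scale
fused = fuse_distances([dist_a, dist_b], weights=[1.0, 0.4])
ranking = np.argsort(fused, axis=1)      # per-query ranking after fusion
```

The normalization is what makes the weights meaningful: without it, a feature whose raw distances happen to be numerically large would dominate the sum regardless of its weight.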

In [41]:
RunExperiment(FeatureDatabase, 
              FeatureList=["HSV Auto-Correlogram"], 
              MetricList=['cityblock'],
              WeightList=[1.0])

MAP: 0.302070 - Time: 0.366021s
Best: minnie_dress(0.748849), korean_snack(0.710094)
Worst: chair(0.068773), bicycle(0.116661)


In [42]:
RunExperiment(FeatureDatabase, 
              FeatureList=["HSV Auto-Correlogram", "RGB Auto-Correlogram"], 
              MetricList=['cityblock', 'cityblock'],
              WeightList=[1.0, 1.0])

MAP: 0.321625 - Time: 0.479729s
Best: minnie_dress(0.795488), women_clothes(0.742243)
Worst: chair(0.063828), nba_jersey(0.120994)


In [43]:
RunExperiment(FeatureDatabase, 
              FeatureList=["HSV Auto-Correlogram", "RGB Auto-Correlogram", "PHOG"], 
              MetricList=['cityblock', 'cityblock', 'cityblock'],
              WeightList=[1.0, 1.0, 0.4])

MAP: 0.397548 - Time: 20.997874s
Best: gge_snack(0.765450), women_clothes(0.721041)
Worst: chair(0.125682), glasses(0.176486)


In [44]:
RunExperiment(FeatureDatabase, 
              FeatureList=["HSV Auto-Correlogram", "RGB Auto-Correlogram", "PHOG", "Local HSV Histogram"], 
              MetricList=['cityblock', 'cityblock', 'cityblock', 'cityblock'],
              WeightList=[1.0, 1.0, 0.4, 0.4])

MAP: 0.405850 - Time: 21.784767s
Best: gge_snack(0.756555), women_clothes(0.711493)
Worst: chair(0.132190), nba_jersey(0.172422)


In [45]:
RunExperiment(FeatureDatabase, 
              FeatureList=["HSV Auto-Correlogram", "RGB Auto-Correlogram", "PHOG", "Local HSV Histogram", "Gabor Local Histogram"], 
              MetricList=['cityblock', 'cityblock', 'cityblock', 'cityblock', 'cityblock'],
              WeightList=[1.0, 1.0, 0.4, 0.4, 0.06])

MAP: 0.405896 - Time: 39.265033s
Best: gge_snack(0.756625), women_clothes(0.711263)
Worst: chair(0.132332), nba_jersey(0.172485)


In [52]:
RunExperiment(FeatureDatabase, 
              FeatureList=["HSV Auto-Correlogram", "RGB Auto-Correlogram", "PHOG", "Local HSV Histogram", "Gabor Local Histogram", "SIFT"], 
              MetricList=['cityblock', 'cityblock', 'cityblock', 'cityblock', 'cityblock', 'match'],
              WeightList=[1.0, 1.0, 0.4, 0.4, 0.06, 1.0])

MAP: 0.407771 - Time: 3079.882237s
Best: gge_snack(0.780173), women_clothes(0.708421)
Worst: chair(0.143022), glasses(0.160553)


### Results

Here, we organize the results into tables <br>
We summarize the MAP, best 2 categories, worst 2 categories, and the inference time

#### Null Hypothesis

The null hypothesis uses randomly generated features, so only the MAP matters here <br>
We can see that the MAP is pretty stable at around 0.04

| Method | MAP | Best 2 Categories | Worst 2 Categories | Inference Time |
|:---------:|:-----:|:-------:|:-----:|:-----:|
| Null Hypothesis Run 1 | 0.041509 | Irrelevant | Irrelevant | Irrelevant |
| Null Hypothesis Run 2 | 0.040547 | Irrelevant | Irrelevant | Irrelevant |
| Null Hypothesis Run 3 | 0.041753 | Irrelevant | Irrelevant | Irrelevant |
| Null Hypothesis Run 4 | 0.041563 | Irrelevant | Irrelevant | Irrelevant |
| Null Hypothesis Run 5 | 0.039554 | Irrelevant | Irrelevant | Irrelevant |

#### Color Features

We experimented with 5 different color spaces (RGB, Gray, HSV, YUV, Lab) <br>
Here are some findings about these color spaces:
- HSV performs the best regardless of which method we choose
- Gray performs the worst overall, since it contains the least information
- YUV and Lab have very similar results

We also experimented with 5 different methods; here are the findings:
- Auto-correlogram performs the best, since it captures not only the colors but also their spatial distribution
- Local histograms performed better than global histograms, since they also contain some spatial information. However, this spatial information comes simply from cutting the image into a grid, so shifted or rotated images could lead to worse results. The database we test on in this homework is pretty stable, with most images in about the same position and rotation, so local histograms didn't lead to disastrous results
- For color moments, the grid method actually made results worse in every color space except Gray, which is pretty surprising; maybe after cutting into grids, the moment statistics became too biased

Findings about the best and worst categories:
- sprite, minnie_dress, korean_snack, goggles, garment, and aloe_vera_gel performed the best
    - Looking at the images, these categories all have similar in-class color distributions
    - goggles is the exception; I think it performs well on color features because of its consistent white background
- nba_jersey, trousers, and chair performed the worst
    - Looking at the images, these categories all have very diverse in-class color distributions

| Method | MAP | Best 2 Categories | Worst 2 Categories | Inference Time |
|:---------:|:-----:|:-------:|:-----:|:-----:|
| Global Color Histogram (RGB) | 0.216819 | garment(0.430642), sprite(0.371505) | nba_jersey(0.054858), chair(0.082266) | 0.178556s |
| Global Color Histogram (Gray) | 0.147758 | minnie_dress(0.416588), garment(0.369363) | nba_jersey(0.043021), trousers(0.045867) | 0.193506s |
| Global Color Histogram (HSV) | **0.244082** | minnie_dress(0.489047), korean_snack(0.441652) | nba_jersey(0.049035), trousers(0.057826) | 0.202458s |
| Global Color Histogram (YUV) | 0.205076 | minnie_dress(0.554774), goggles(0.453895) | trousers(0.043189), nba_jersey(0.045316) | 0.204488s |
| Global Color Histogram (Lab) | 0.202028 | minnie_dress(0.437601), goggles(0.431584) | nba_jersey(0.043520), trousers(0.051446) | 0.211468s |
| Local Color Histogram (RGB) | 0.227421 | goggles(0.480197), gge_snack(0.423427) | nba_jersey(0.061249), trousers(0.074155) | 1.085099s |
| Local Color Histogram (Gray) | 0.196801 | goggles(0.579366), sprite(0.437721) | nba_jersey(0.057353), trousers(0.062184) | 0.738027s |
| Local Color Histogram (HSV) | **0.252586** | sprite(0.560629), goggles(0.469974) | trousers(0.056851), nba_jersey(0.057878) | 2.307830s |
| Local Color Histogram (YUV) | 0.248130 | sprite(0.579795), goggles(0.550990) | trousers(0.047736), nba_jersey(0.049910) | 2.020626s |
| Local Color Histogram (Lab) | 0.247574 | sprite(0.563474), aloe_vera_gel(0.550830) | nba_jersey(0.046656), trousers(0.065445) | 2.030536s |
| Global Color Moments (RGB) | 0.108482 | garment(0.281315), goggles(0.217116) | nba_jersey(0.043861), aloe_vera_gel(0.054339) | 0.210436s |
| Global Color Moments (Gray) | 0.084076 | garment(0.190709), goggles(0.190580) | clock(0.039349), nba_jersey(0.045735) | 0.173535s |
| Global Color Moments (HSV) | **0.120536** | skirt(0.268911), goggles(0.236657) | nba_jersey(0.049542), trousers(0.052583) | 0.218442s |
| Global Color Moments (YUV) | 0.114469 | goggles(0.273059), minnie_dress(0.253180) | clock(0.035845), bicycle(0.042445) | 0.173568s |
| Global Color Moments (Lab) | 0.117041 | minnie_dress(0.337858), garment(0.244982) | bicycle(0.033868), clock(0.037392) | 0.184500s |
| Local Color Moments (RGB) | 0.100542 | garment(0.222236), gge_snack(0.187341) | nba_jersey(0.036981), drum(0.036991) | 0.499691s |
| Local Color Moments (Gray) | 0.093259 | garment(0.186177), gge_snack(0.161884) | bicycle(0.036053), drum(0.036238) | 0.443842s |
| Local Color Moments (HSV) | **0.104022** | aloe_vera_gel(0.277636), korean_snack(0.245633) | ice_cream(0.038129), drum(0.041457) | 0.486698s |
| Local Color Moments (YUV) | 0.081900 | garment(0.187043), aloe_vera_gel(0.173401) | drum(0.024146), bicycle(0.029019) | 0.492682s |
| Local Color Moments (Lab) | 0.089716 | aloe_vera_gel(0.296906), garment(0.225646) | drum(0.021298), bicycle(0.027063) | 0.559494s |
| Color Auto-Correlogram (RGB) | 0.280854 | women_clothes(0.721712), minnie_dress(0.671403) | chair(0.058344), leather_purse(0.103197) | 0.286270s |
| Color Auto-Correlogram (Gray) | 0.167133 | minnie_dress(0.623290), garment(0.404961) | nba_jersey(0.035346), clock(0.040869) | 0.227401s |
| Color Auto-Correlogram (HSV) | **0.302070** | minnie_dress(0.748849), korean_snack(0.710094) | chair(0.068773), bicycle(0.116661) | 0.282244s |
| Color Auto-Correlogram (YUV) | 0.243780 | minnie_dress(0.667765), sprite(0.512101) | chair(0.060503), nba_jersey(0.065502) | 0.287266s |
| Color Auto-Correlogram (Lab) | 0.233564 | minnie_dress(0.584799), women_clothes(0.486346) | chair(0.053759), clock(0.056109) | 0.283269s |

#### Texture / Shape Features

We experimented with 8 different methods; here are some findings: <br>
- HOG performed the best, and adding the pyramid structure improved it quite a bit (MAP 0.233068 to 0.275521)
- goggles performed the best in most methods, maybe due to its consistent shape, but possibly also its consistent white background again
- garment and gge_snack also performed well on texture and shape, maybe due to the shape and texture of the cloth for garment and the unusual texture (the noodles) for gge_snack
- glasses seems to perform the worst on texture/shape features
    - The texture of the background interferes a lot
    - The shape of the glasses isn't consistent
    - The girl wearing the glasses might also interfere a lot
    - Why is there a picture of a cyclist wearing sunglasses in this category? Most of that image is the bicycle and the cyclist; I wouldn't want a search for glasses to return it

| Method | MAP | Best 2 Categories | Worst 2 Categories | Inference Time |
|:---------:|:-----:|:-------:|:-----:|:-----:|
| Gabor Extracted Features | 0.128974 | garment(0.402665), gge_snack(0.396663) | clock(0.043924), glasses(0.046348) | 0.208448s |
| Gabor Global Histogram | 0.152167 | garment(0.509958), minnie_dress(0.337481) | tennis_ball(0.050089), glasses(0.050118) | 0.433875s |
| Gabor Local Histogram | 0.184137 | goggles(0.486365), garment(0.426358) | trousers(0.072763), glasses(0.078090) | 17.507202s |
| Local Binary Pattern | 0.118357 | goggles(0.273743), garment(0.250648) | tennis_ball(0.044047), clock(0.050103) | 0.219415s |
| Grid Local Binary Pattern | 0.174267 | goggles(0.454712), garment(0.375908) | orange(0.065158), drum(0.073253) | 1.154907s |
| Histogram of oriented gradients | 0.233068 | goggles(0.660422), gge_snack(0.536469) | drum(0.087050), glasses(0.091062) | 1.161912s |
| Pyramid Histogram of oriented gradients | **0.275521** | gge_snack(0.797922), goggles(0.663505) | glasses(0.091843), ice_cream(0.135433) | 20.433430s |
| Shape Index | 0.160053 | goggles(0.394650), garment(0.369391) | aloe_vera_gel(0.052802), glasses(0.061207) | 0.213428s |

#### Local Feature

We experimented with 3 different methods; here are some findings:
- The overall MAP for SIFT isn't good, but looking at the best categories, SIFT performed almost perfectly on certain categories and very badly on some others
- Dense SIFT and Pyramid Dense SIFT performed OK as well, but not as well as SIFT with keypoint detection, maybe because of too much noise when calculating the descriptors
- SIFT takes a very long time at inference, since it cannot simply use the cityblock or cosine distance metrics but has to match the descriptors; even with KD-tree-based ANN, we still need about 10 seconds per query, even on a database as small as ours
- If we limit the number of keypoints SIFT detects, the performance drops a bit, but inference speeds up quite a lot (about a 30% speedup when restricting the number of keypoints to 512)
- The matching process could be parallelized well, but I didn't spend much time implementing it
- gge_snack performed extremely well with SIFT, since the words (張君雅小妹妹) are very good keypoints for SIFT descriptors and almost always match correctly
- The logo and the words on aloe_vera_gel are also the reason why SIFT did so well on it
- goggles and orange performed poorly because no good descriptors could be found on these images

| Method | MAP | Best 2 Categories | Worst 2 Categories | Inference Time |
|:---------:|:-----:|:-------:|:-----:|:-----:|
| SIFT | **0.241059** | gge_snack(0.996176), korean_snack(0.930700) | ice_cream(0.028946), glasses(0.029925) | 2751.956974s |
| Dense SIFT | 0.235383 | gge_snack(0.543353), cup(0.504652) | ice_cream(0.035214), trousers(0.060200) | 5666.827838s |
| Pyramid Dense SIFT | 0.208704 | gge_snack(0.525038), aloe_vera_gel(0.485464) | ice_cream(0.036362), trousers(0.055615) | 34795.211326s |

#### Fusion

Fusion increases the performance by a lot, since different features can complement each other <br>
Adding SIFT didn't improve much (MAP 0.405896 to 0.407771), probably because the categories SIFT does well on are already doing pretty well

| Method | MAP | Best 2 Categories | Worst 2 Categories | Inference Time |
|:---------:|:-----:|:-------:|:-----:|:-----:|
| Fusion (without SIFT) | 0.405896 | gge_snack(0.756625), women_clothes(0.711263) | chair(0.132332), nba_jersey(0.172485) | 39.265033s |
| Fusion (with SIFT) | 0.407771 | gge_snack(0.780173), women_clothes(0.708421) | chair(0.143022), glasses(0.160553) | 3079.882237s |