<a href="https://colab.research.google.com/github/AbdullahMakhdoom/Image-Search-Engine/blob/main/improve_search_accuracy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Objective**: In this notebook, we compare the search accuracy of pre-trained Caltech101 features (extracted with a ResNet-50 pre-trained on ImageNet) against features from the fine-tuned ResNet-50.
The fine-tuning was performed in the 'feature-extraction.ipynb' notebook, and the resulting features were saved to Google Drive.

We will also use t-SNE to visualize how fine-tuning produces a cleaner separation of clusters.

In [3]:
# import all required packages
import numpy as np
import pickle
from tqdm import tqdm
from tqdm.notebook import tqdm as tqdm_notebook  # replaces the deprecated tqdm.tqdm_notebook
import random
import time
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA
import PIL
from PIL import Image
from sklearn.neighbors import NearestNeighbors

import glob
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline

Mount Google Drive before loading the '.pickle' files that hold the pre-trained and fine-tuned features.
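
The mounting step itself is not shown in this notebook. A minimal sketch, assuming the notebook runs in Colab and that Drive should appear under `/content/drive` (the prefix used by the paths below); the helper name `mount_drive_if_colab` is hypothetical:

```python
def mount_drive_if_colab(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; do nothing elsewhere."""
    try:
        from google.colab import drive  # only importable inside a Colab runtime
    except ImportError:
        return False
    drive.mount(mount_point)  # prompts for authorization on first use
    return True


mounted = mount_drive_if_colab()
print("Drive mounted:", mounted)
```

Outside Colab the helper simply returns `False`, so the cell can be left in place when the notebook is run locally with the pickle files stored elsewhere.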

In [18]:
# Use context managers so the file handles are closed after loading
with open('/content/drive/MyDrive/Caltech101-features/features-caltech101-resnet.pickle',
          'rb') as f:
    feature_list = pickle.load(f)
with open('/content/drive/MyDrive/Caltech101-features/features-caltech101-resnet-finetuned.pickle',
          'rb') as f:
    finetuned_feature_list = pickle.load(f)


In [6]:
# Also load the filenames and class_ids (same order as the feature lists)
with open('/content/drive/MyDrive/Caltech101-features/filenames-caltech101.pickle', 'rb') as f:
    filenames = pickle.load(f)
with open('/content/drive/MyDrive/Caltech101-features/class_ids-caltech101.pickle', 'rb') as f:
    class_ids = pickle.load(f)

### Helper Functions

In [8]:
# Helper function to get the classname (the parent directory of the image file)
def classname(path):
    return path.split('/')[-2]


# Helper function to get the classname and filename
def classname_filename(path):
    return path.split('/')[-2] + '/' + path.split('/')[-1]


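def calculate_accuracy(feature_list):
    """Print the percentage of nearest neighbors that share the query's class.

    Relies on the global `filenames` list, whose order must match `feature_list`.
    """
    # Every query image is itself in the index, so the closest hit (position 0)
    # is normally the query; only the remaining 4 neighbors are scored.
    num_nearest_neighbors = 5
    correct_predictions = 0
    incorrect_predictions = 0
    neighbors = NearestNeighbors(n_neighbors=num_nearest_neighbors,
                                 algorithm='brute',
                                 metric='euclidean').fit(feature_list)
    for i in tqdm_notebook(range(len(feature_list))):
        distances, indices = neighbors.kneighbors([feature_list[i]])
        for j in range(1, num_nearest_neighbors):
            if classname(filenames[i]) == classname(filenames[indices[0][j]]):
                correct_predictions += 1
            else:
                incorrect_predictions += 1
    total_predictions = correct_predictions + incorrect_predictions
    print("Accuracy is ", round(100.0 * correct_predictions / total_predictions, 2))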
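
To make the metric concrete, here is a small self-contained sketch on synthetic data. The function `neighbor_class_agreement` and the toy clusters are hypothetical and not part of the notebook. It computes the same kind of score as `calculate_accuracy`, but takes the labels as an argument and queries all points in a single `kneighbors` call rather than one at a time.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def neighbor_class_agreement(features, labels, num_neighbors=4):
    """Percent of each point's nearest neighbors (excluding itself) with the same label."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    # Ask for one extra neighbor because each query point is also in the index
    # and normally comes back as its own nearest neighbor.
    index = NearestNeighbors(n_neighbors=num_neighbors + 1,
                             algorithm='brute',
                             metric='euclidean').fit(features)
    _, indices = index.kneighbors(features)   # one batched query for all points
    neighbor_labels = labels[indices[:, 1:]]  # drop column 0 (assumed to be the query)
    matches = neighbor_labels == labels[:, None]
    return 100.0 * matches.mean()


# Toy data: two well-separated 2-D clusters of 20 points each (made up for illustration).
rng = np.random.default_rng(0)
toy_features = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
                          rng.normal(5.0, 0.1, size=(20, 2))])
toy_labels = np.array([0] * 20 + [1] * 20)
toy_accuracy = neighbor_class_agreement(toy_features, toy_labels)
print("Toy accuracy:", round(toy_accuracy, 2))
```

Like the notebook's helper, this sketch assumes the query is returned in the first position, which may not hold when the dataset contains duplicate feature vectors.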

#### 1. Accuracy of Brute Force over Caltech101 features

In [9]:
calculate_accuracy(feature_list[:])



Accuracy is  88.36


#### 2. Accuracy of Brute Force over the PCA compressed Caltech101 features

In [11]:
num_feature_dimensions = 100
pca = PCA(n_components=num_feature_dimensions)
pca.fit(feature_list)
feature_list_compressed = pca.transform(feature_list[:])
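
One way to judge whether 100 components are enough is to check how much variance the projection retains. A minimal sketch on random stand-in data; the array `demo_features` and its shape are made up for illustration, and in the notebook `feature_list` would take its place:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the real features: 500 samples with 256 dimensions (made up).
rng = np.random.default_rng(0)
demo_features = rng.normal(size=(500, 256))

demo_pca = PCA(n_components=100)
demo_compressed = demo_pca.fit_transform(demo_features)

# explained_variance_ratio_ holds the fraction of variance kept by each component.
retained = demo_pca.explained_variance_ratio_.sum()
print("Compressed shape:", demo_compressed.shape)
print("Variance retained: {:.1%}".format(retained))
```

Random noise has no low-dimensional structure, so the retained variance here will be much lower than what real CNN features would typically give; the point is only to show where the number comes from.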

In [12]:
calculate_accuracy(feature_list_compressed[:])


Accuracy is  88.48


#### 3. Accuracy of Brute Force over finetuned Caltech101 features

In [19]:
calculate_accuracy(finetuned_feature_list[:])


Accuracy is  95.52


#### 4. Accuracy of Brute Force over the PCA compressed finetuned Caltech101 features

In [20]:
# Perform PCA on Finetuned features
num_feature_dimensions = 100
pca = PCA(n_components=num_feature_dimensions)
pca.fit(finetuned_feature_list)
feature_list_compressed = pca.transform(finetuned_feature_list[:])

In [21]:
calculate_accuracy(feature_list_compressed[:])



Accuracy is  95.53


## Results

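Accuracy on Caltech101, measured as the percentage of each image's 4 nearest neighbors that share its class:

| Algorithm | Accuracy using pre-trained features (%) | Accuracy using fine-tuned features (%) |
|-------------------|-------|-------|
| Brute Force | 88.36 | 95.52 |
| PCA + Brute Force | 88.48 | 95.53 |

Fine-tuning raises accuracy by roughly 7 percentage points, while PCA compression to 100 dimensions leaves accuracy essentially unchanged.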
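
Another way to read the table is in terms of error rate, meaning the share of retrieved neighbors with a different class. The short sketch below only re-uses the four accuracies reported above; the variable names are illustrative.

```python
# Accuracies (%) copied from the results table above.
results = {
    "Brute Force": (88.36, 95.52),
    "PCA + Brute Force": (88.48, 95.53),
}

error_reduction = {}
for algorithm, (pretrained_acc, finetuned_acc) in results.items():
    pretrained_err = 100.0 - pretrained_acc
    finetuned_err = 100.0 - finetuned_acc
    # Relative reduction in error rate achieved by fine-tuning.
    error_reduction[algorithm] = 100.0 * (pretrained_err - finetuned_err) / pretrained_err
    print(f"{algorithm}: error {pretrained_err:.2f}% -> {finetuned_err:.2f}% "
          f"({error_reduction[algorithm]:.1f}% relative reduction)")
```

In both rows, fine-tuning removes a little over 60% of the retrieval errors made with the pre-trained features.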
