# Laboratory #3_2 : Image Classification using Bag of Visual Words

At the end of this laboratory, you will be familiar with

*   Creating Bag of Visual Words
    *   Feature Extraction
    *   Codebook construction
    *   Classification

**Remember this is a graded exercise.**

*   For every plot, make sure you provide appropriate titles, axis labels, legends, wherever applicable.
*   Create reusable functions wherever possible, so that the code can be reused in different places.
*   Mount your drive to access the images.
*   Add sufficient comments and explanations wherever necessary.

---

In [1]:
%%shell
git clone https://github.com/mariorot/CV-MAI
mv CV-MAI/scripts/* /content/
mv 'CV-MAI/Session 7/101_ObjectCategories/'* /content/
mv 'CV-MAI/Session 7/Caltech_101_subset/' /content/

Cloning into 'CV-MAI'...
remote: Enumerating objects: 21081, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 21081 (delta 3), reused 1 (delta 1), pack-reused 21072
Receiving objects: 100% (21081/21081), 171.02 MiB | 26.83 MiB/s, done.
Resolving deltas: 100% (102/102), done.
Updating files: 100% (21037/21037), done.




In [72]:
# Loading necessary libraries (Feel free to add new libraries if you need for any computation)

import os
import numpy as np
from skimage.feature import ORB
from skimage import feature
from skimage.color import rgb2gray
from skimage.io import imread
from scipy.cluster.vq import vq
from matplotlib import pyplot as plt
from skimage import io
from collections import defaultdict
from sklearn.cluster import MiniBatchKMeans

import custom_plots as cp

## Loading dataset

We will use 3 categories from Caltech 101 objects dataset for this experiment. Upload the dataset to the drive and mount it.

In [48]:
# modify the dataset variable with the path from your drive

#We define an empty dictionary and the root folder of the 3 categories
images_dict = defaultdict(list)
root_folder = '/content/Caltech_101_subset'

In [69]:
categories = ['butterfly', 'kangaroo', 'dalmatian']

*   Create a list of files and the corresponding labels

In [49]:
# solution

#We load the images into the dictionary with the proper label and id
for folder_name in os.listdir(root_folder):
  folder_path = os.path.join(root_folder, folder_name)
  if os.path.isdir(folder_path):
    for filename in os.listdir(folder_path):
      if filename.endswith(('.jpg')):
        image_path = os.path.join(folder_path, filename)
        image_id= os.path.join(folder_name, filename)
        image = io.imread(image_path)
        label = folder_name
        images_dict[image_id].append( (image,label))

In [50]:
print('Total number of images:', len(images_dict))

Total number of images: 244


*   Create a train / test split where the test is 10% of the total data

In [51]:
# solution
from sklearn.model_selection import train_test_split

keys = list(images_dict.keys())
values = list(images_dict.values())

train_keys, test_keys, train_values, test_values = train_test_split(keys, values, test_size=0.1, random_state=7)

x_train = {k: v for k, v in zip(train_keys, train_values)}
x_test = {k: v for k, v in zip(test_keys, test_values)}

print('Train set:', len(x_train))
print('Test set:', len(x_test))

Train set: 219
Test set: 25


*   How do you select the train/test split?

**Solution**

In this case, the classes are reasonably balanced (91 butterfly images, 37%; 67 dalmatian images, 28%; 86 kangaroo images, 35%) and the test set is only 10% of the total data set, so we split the data randomly.

However, a random split may not be appropriate in two scenarios:

- The class distribution is unbalanced, for example when a specific class makes up only 10% of the images. A random split could then leave too few images of the small class in the training set.

- The training set is so small that it could end up containing images of only one class.

Since neither of these scenarios applies here, we split the data randomly.
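
For the first scenario above, a stratified split is one common remedy. The sketch below is only an illustration: it uses synthetic labels that mimic the class counts quoted above, and integer ids as stand-ins for the images. The only change with respect to the random split is the `stratify` argument of `train_test_split`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels mimicking the class counts of this dataset (illustration only)
labels = np.array(['butterfly'] * 91 + ['dalmatian'] * 67 + ['kangaroo'] * 86)
ids = np.arange(len(labels))  # stand-ins for the image ids

# stratify=labels keeps roughly the same class proportions in both splits
train_ids, test_ids, train_labels, test_labels = train_test_split(
    ids, labels, test_size=0.1, random_state=7, stratify=labels)

# Compare the share of each class in the train and test splits
for name in np.unique(labels):
    train_share = (train_labels == name).mean()
    test_share = (test_labels == name).mean()
    print(f'{name}: train {train_share:.2f}, test {test_share:.2f}')
```

With balanced classes, as in this lab, the stratified and the random split should give similar proportions; the difference matters mostly when one class is rare.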



## Feature Extraction using ORB

The first step is to extract descriptors for each image in our dataset. We will use ORB to extract descriptors.

*   Create ORB detector with 64 keypoints.


In [53]:
# solution
def get_ORB(img1, n):
  """Return the ORB descriptors of a grayscale image, using up to n keypoints."""
  descriptor_extractor = feature.ORB(n_keypoints=n)
  descriptor_extractor.detect_and_extract(img1)

  # Only the descriptors are needed to build the codebook
  return descriptor_extractor.descriptors

*   Extract ORB descriptors from all the images in the train set.


In [67]:
# solution
f_desc=[]

for image in x_train:
  img = x_train[image][0][0]
  # ORB works on grayscale images, so color images are converted first
  if len(img.shape)==3:
    img = rgb2gray(img)
  f_desc.append(get_ORB(img,64))

*   What is the size of the feature descriptors? What does each dimension represent in the feature descriptors?

In [78]:
# solution
print(len(f_desc))
f_desc[0].shape

219


(64, 256)

**Solution**

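Each image yields a descriptor array of size 64x256: one row for each of the 64 keypoints, and 256 columns holding the binary ORB descriptor of that keypoint. Each of the 256 dimensions is the boolean result of one intensity comparison between a pair of pixels in the patch around the keypoint. With 219 training images, we obtain 219 such arrays.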



## Codebook Construction

Codewords are vector representations of similar patches, and the set of codewords forms a codebook, similar to a word dictionary. We will create the codebook using the K-Means algorithm.

*   Create a codebook using K-Means with k=number_of_classes*10
*   Hint: Use sklearn.cluster.MiniBatchKMeans for K-Means
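
As a rough illustration of the shapes involved, the sketch below builds a codebook from synthetic binary descriptors. The random data, the number of images, and the fixed `random_state` and `n_init` values are assumptions made only to keep the example small and reproducible; they are not part of the exercise.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for ORB output: 5 images, each with 64 binary descriptors of length 256
descriptor_list = [rng.integers(0, 2, size=(64, 256)).astype(np.float64)
                   for _ in range(5)]

# Stack the descriptors of all images into one (n_descriptors x 256) matrix
all_descriptors = np.concatenate(descriptor_list, axis=0)

n_classes = 3
k = n_classes * 10  # codebook size, as requested in the exercise

# random_state and n_init are fixed here only to make the sketch reproducible
kmeans = MiniBatchKMeans(n_clusters=k, random_state=0, n_init=3)
kmeans.fit(all_descriptors)

codebook = kmeans.cluster_centers_  # one row per codeword
print(all_descriptors.shape, codebook.shape)
```

Each row of `codebook` is one codeword, so the codebook always has `k` rows and as many columns as a single descriptor.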

In [79]:
# solution
patches = np.concatenate(f_desc, axis=0)
print(len(patches))  # total number of descriptors over all training images
k = len(categories) * 10
kmeans = MiniBatchKMeans(n_clusters=k)
kmeans.fit(patches)

codebook = kmeans.cluster_centers_

print("Codebook:")
print(codebook)

14016




Codebook:
[[0.30074349 0.70743494 0.38847584 ... 0.51784387 0.64200743 0.31895911]
 [0.55026455 0.88947678 0.4244562  ... 0.40446796 0.69135802 0.75661376]
 [0.46440307 0.77437021 0.32694414 ... 0.44852136 0.80613363 0.51971522]
 ...
 [0.38973384 0.46197719 0.51901141 ... 0.40494297 0.34648289 0.41397338]
 [0.6408787  0.43027698 0.38729704 ... 0.60840497 0.52292264 0.78510029]
 [0.48008611 0.16576964 0.47847147 ... 0.6926803  0.39181916 0.28040904]]


In [76]:
codebook.shape

(30, 256)

*   Create a histogram using the cluster centers for each image descriptor.
    *   Remember the histogram would be of size *n_images x n_clusters*.
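
To make the idea concrete, here is a small NumPy-only sketch on toy 2-D data. The helper name `bovw_histogram` and the toy values are hypothetical; in the lab, the same assignment step can be done with the trained k-means model.

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Count how many descriptors fall closest to each codeword."""
    # Squared Euclidean distance between every descriptor and every codeword
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    distances = (diffs ** 2).sum(axis=2)
    nearest = distances.argmin(axis=1)  # index of the closest codeword
    # One bin per codeword, including codewords that received no descriptor
    return np.bincount(nearest, minlength=len(codebook))

# Toy example: 3 codewords and the 5 descriptors of a single image
codebook = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
descriptors = np.array([[1.0, 1.0], [9.0, 1.0], [0.5, 0.2], [1.0, 9.0], [8.0, -1.0]])

hist = bovw_histogram(descriptors, codebook)
print(hist)  # [2 2 1]
```

Stacking one such histogram per image gives the *n_images x n_clusters* matrix mentioned above.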

In [None]:
# solution

def build_histograms(descriptor_list, kmeans, k):
    """Return an (n_images x k) array counting the descriptors assigned to each cluster."""
    histograms = []
    for image_descriptors in descriptor_list:
        # Assign each descriptor of this image to its nearest cluster center
        labels = kmeans.predict(image_descriptors)

        # Count the assignments per cluster (one bin per cluster index)
        histogram, _ = np.histogram(labels, bins=np.arange(k + 1))
        histograms.append(histogram)

    # Convert the list of histograms into a 2D NumPy array
    return np.array(histograms)

# Iterate over the per-image descriptors (f_desc), not the stacked `patches`,
# so that the result has one row per training image: 219x30
histograms_array = build_histograms(f_desc, kmeans, k)

# Display the histograms
plt.imshow(histograms_array, aspect='auto', cmap='viridis')
plt.xlabel('Cluster Index')
plt.ylabel('Image Index')
plt.title('Histograms of Cluster Assignments')
plt.colorbar(label='Frequency')
plt.show()



## Creating Classification Model

*   The next step is to create a classification model. We will use a C-Support Vector Classification for creating the model.



In [128]:
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score

*   Use GridSearchCV to find the optimal value of C and Gamma.

In [None]:
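# Labels of the training images, in the same order as the rows of histograms_array
train_labels = [x_train[image][0][1] for image in x_train]

def find_best_params(features, labels, kernel='rbf'):
    """Grid-search C and gamma with 5-fold cross-validation."""
    Cs = [0.5, 0.1, 0.15, 0.2, 0.3]
    gammas = [0.1, 0.11, 0.095, 0.105]
    param_grid = {'C': Cs, 'gamma' : gammas}
    grid_search = GridSearchCV(SVC(kernel=kernel), param_grid, cv=5)
    grid_search.fit(features, labels)
    return grid_search.best_params_

def train_svm(features, labels, C_param, gamma_param, kernel='rbf', class_weight=None):
    """Train the final SVC with the selected hyperparameters."""
    svm = SVC(kernel = kernel, C = C_param, gamma = gamma_param, class_weight = class_weight)
    svm.fit(features, labels)
    return svm

# `kernel` was left undefined in this cell; 'rbf' (the SVC default) is assumed here
best_params = find_best_params(histograms_array, train_labels)
svm = train_svm(histograms_array, train_labels, best_params['C'], best_params['gamma'])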

## Testing the Classification Model

*   Extract descriptors using ORB for the test split
*   Use the previously trained k-means to generate the histogram
*   Use the classifier to predict the label
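
The three steps above can be sketched end to end on synthetic data. Everything in this example is a stand-in: `fake_descriptors` is a hypothetical replacement for ORB extraction, the two classes `'a'` and `'b'` are artificial, and the cluster count and SVC settings are arbitrary. The point is only that the test split reuses the k-means model fitted on the train split instead of fitting a new one.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def fake_descriptors(center, n=20):
    # Hypothetical stand-in for the ORB descriptors of one image
    return center + rng.normal(scale=0.5, size=(n, 8))

def descriptors_to_histograms(descriptor_list, kmeans):
    """Build one histogram per image with an already fitted k-means model."""
    k = kmeans.n_clusters
    return np.array([np.bincount(kmeans.predict(d), minlength=k)
                     for d in descriptor_list])

# Two artificial classes with clearly different descriptors
train_desc = ([fake_descriptors(0.0) for _ in range(10)]
              + [fake_descriptors(5.0) for _ in range(10)])
train_labels = ['a'] * 10 + ['b'] * 10
test_desc = [fake_descriptors(0.0), fake_descriptors(5.0)]

# Fit the codebook and the classifier on the train split only
kmeans = MiniBatchKMeans(n_clusters=4, random_state=0, n_init=3)
kmeans.fit(np.concatenate(train_desc))
clf = SVC(kernel='rbf').fit(descriptors_to_histograms(train_desc, kmeans), train_labels)

# Test split: same k-means model, then predict with the classifier
test_hist = descriptors_to_histograms(test_desc, kmeans)
predictions = clf.predict(test_hist)
print(test_hist.shape, predictions)
```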


In [None]:
# solution



*   Calculate the accuracy score for the classification model
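
As a reminder of what the score measures, the sketch below computes the accuracy by hand and with `accuracy_score`. The labels are hypothetical and chosen only for illustration; they are not results from this lab.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels, for illustration only (not results from this lab)
y_true = np.array(['butterfly', 'kangaroo', 'dalmatian', 'kangaroo', 'butterfly'])
y_pred = np.array(['butterfly', 'kangaroo', 'kangaroo', 'kangaroo', 'dalmatian'])

# Accuracy is the fraction of predictions that match the true label
manual = (y_true == y_pred).mean()
print(manual, accuracy_score(y_true, y_pred))  # 0.6 0.6
```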

In [None]:
# solution


*   Generate the confusion matrix for the classification model
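
A minimal sketch with `sklearn.metrics.confusion_matrix`, again on hypothetical labels rather than results from this lab. Passing `labels=categories` fixes the order of rows (true class) and columns (predicted class).

```python
from sklearn.metrics import confusion_matrix

categories = ['butterfly', 'kangaroo', 'dalmatian']

# Hypothetical labels, for illustration only (not results from this lab)
y_true = ['butterfly', 'kangaroo', 'dalmatian', 'kangaroo', 'butterfly']
y_pred = ['butterfly', 'kangaroo', 'kangaroo', 'kangaroo', 'dalmatian']

# Rows are true classes, columns are predicted classes, both ordered as in `categories`
cm = confusion_matrix(y_true, y_pred, labels=categories)
print(cm)
# [[1 0 1]
#  [0 2 0]
#  [0 1 0]]
```

The diagonal holds the correct predictions, so its sum divided by the total number of samples is the accuracy.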

In [None]:
# solution



*   Why do we use Clustering to create the codebook?
*   What are the other techniques that can be used to create the codebook?

**Solution**

- We use clustering to group similar descriptors together, so that each group is represented by a single codeword. This avoids duplicates and keeps only a limited number of descriptors.

- Since the main objective is to keep the similar information, most common clustering techniques suit this task, such as hierarchical (agglomerative) clustering. We can also use techniques such as PCA, which reduce dimensionality while keeping most of the information in the data.

## Increased Feature Dimensions

*   Repeat the classification using features of 256 ORB keypoints.

In [134]:
# solution

f_desc_2=[]

for image in x_train:
  img = x_train[image][0][0]
  # ORB works on grayscale images, so color images are converted first
  if len(img.shape)==3:
    img = rgb2gray(img)
  f_desc_2.append(get_ORB(img,256))

*   What is the difference in classifier performance between using 64 keypoints and 256 keypoints?

**Solution**

The accuracy increases, since 256 keypoints capture more details of each image than 64 keypoints.

*   Will further adding more keypoints increase the performance of the algorithm?

**Solution**

No. As with most algorithms, more data may imply more precision, but beyond a certain point additional keypoints give no new information and only increase the computation time.
