# Data Science | Lab: Image Processing
**Table of Contents:**  <a name="toc"></a>
1. [Bag of Visual Words](#bovw)
2. [Histogram of Visual Words](#hovw)
3. [Image Classification](#classification)

# Bag of Visual Words
Bag of Visual Words (BoVW) is analogous to the Bag of Words technique we covered in the last lab. This time, we extract "visual" words instead of text words in order to classify pictures.
<a name="bovw"></a>
<div style="width: 500px; text-align: center;">
    <img src="https://customers.pyimagesearch.com/wp-content/uploads/2015/09/bovw_image_example.jpg"/>
    <a href="https://customers.pyimagesearch.com/the-bag-of-visual-words-model/" style="">Source</a>
</div>

For feature point extraction we will rely on the [OpenCV](https://docs.opencv.org/4.x/index.html) library; for clustering we use scikit-learn, with OpenCV as an alternative.

In [None]:
# %pip install opencv-contrib-python
# %pip install imutils

import numpy as np
import matplotlib.pyplot as plt
import os
from sklearn.model_selection import train_test_split
from scipy.cluster.vq import vq
from sklearn.neighbors import NearestCentroid
from sklearn.metrics import accuracy_score
import imutils
import cv2

In [None]:
# Define the image source folder
path = "Caltec_101"
# Choose three different classes individually
use_classes = ['accordion', 'airplanes', 'chair']

In [None]:
# This variable will store paths to each image
X_paths = []
# This variable will store class id as label
y = []

### Constructing the dataset
The following code uses the ``path`` and ``use_classes`` defined above to scan the given folders for pictures and constructs the dataset from all pictures stored in these folders. ``X_paths`` stores the path to each image, and ``y`` assigns each picture a class id starting at 0. We use images from the Caltech 101 dataset, which contains pictures of objects belonging to 101 categories.

In [None]:
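import glob

for class_id, class_name in enumerate(use_classes):
    # Collect the paths of all JPEG images in this class's folder
    class_paths = glob.glob(os.path.join(path, class_name, "*.jpg"))
    X_paths.extend(class_paths)
    # Label every image from this folder with the class id
    y.extend([class_id] * len(class_paths))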
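To see how the dataset loop maps folders to labels, the following sketch reproduces it on a throwaway directory tree. The folder layout, class names, file counts, and file names are hypothetical placeholders mimicking the expected `<path>/<class_name>/*.jpg` structure; the files are empty because only their names are globbed.

```python
import glob
import os
import tempfile

# Hypothetical layout mimicking Caltech 101: <root>/<class_name>/*.jpg
toy_classes = ["accordion", "chair"]
toy_file_counts = {"accordion": 3, "chair": 2}

with tempfile.TemporaryDirectory() as toy_root:
    for name, count in toy_file_counts.items():
        os.makedirs(os.path.join(toy_root, name))
        for i in range(count):
            # Empty placeholder files; only the file names matter for globbing
            open(os.path.join(toy_root, name, f"image_{i:04d}.jpg"), "w").close()

    toy_paths, toy_y = [], []
    for class_id, class_name in enumerate(toy_classes):
        class_paths = sorted(glob.glob(os.path.join(toy_root, class_name, "*.jpg")))
        toy_paths.extend(class_paths)
        # Every image found in a class folder receives that folder's class id
        toy_y.extend([class_id] * len(class_paths))

print(toy_y)  # [0, 0, 0, 1, 1]
```

The `toy_` prefix keeps the sketch from overwriting the notebook's real `X_paths` and `y`.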

### Train/Test split
Perform a train/test split of the constructed dataset, using 80% of the data to train the model. Be sure to use stratified sampling, since not all categories contain the same number of images.

In [None]:
# stratify=y keeps the class proportions equal in the training and test sets
x_train, x_test, y_train, y_test = train_test_split(X_paths, y, train_size=0.8, random_state=42, shuffle=True, stratify=y)
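The effect of ``stratify`` can be checked on a small, hypothetical imbalanced label list (the 80/20 class ratio below is made up for illustration). With stratified sampling, both splits keep the original class proportions.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset: 80 items of class 0, 20 items of class 1
toy_items = list(range(100))
toy_labels = [0] * 80 + [1] * 20

toy_x_train, toy_x_test, toy_y_train, toy_y_test = train_test_split(
    toy_items, toy_labels, train_size=0.8, random_state=42, shuffle=True, stratify=toy_labels
)

toy_train_counts = Counter(toy_y_train)
toy_test_counts = Counter(toy_y_test)
# Both splits preserve the 80/20 ratio: 64/16 for training, 16/4 for testing
print(toy_train_counts, toy_test_counts)
```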


### Extract SIFT Features
Use ``cv2`` ([OpenCV](https://docs.opencv.org/4.x/index.html)) to extract meaningful features, which will serve as visual words later on. Loop through all given image paths and extract the descriptors of the detected keypoints using the ``detectAndCompute()`` function described [here](https://docs.opencv.org/3.4/d0/d13/classcv_1_1Feature2D.html#a8be0d1c20b08eb867184b8d74c15a677). This [tutorial](https://docs.opencv.org/3.4/da/df5/tutorial_py_sift_intro.html) provides further useful information.

In [None]:
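def extract_features(image_paths, sift):
    desc_list = []
    for image_path in image_paths:
        # SIFT works on intensities, so load the image in grayscale
        image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        # desc has shape (n_keypoints, 128); the keypoints themselves are not needed here
        kp, desc = sift.detectAndCompute(image, None)
        # Keep one entry per image so descriptors stay aligned with the labels;
        # images without keypoints get an empty (0, 128) array instead of None
        if desc is None:
            desc = np.zeros((0, 128), dtype=np.float32)
        desc_list.append(desc)

    return desc_list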

In [None]:
sift = cv2.SIFT_create()
train_desc_list = extract_features(x_train,sift)

### Clustering
Similar descriptors form clusters in the 128-dimensional SIFT descriptor space. Use K-Means clustering to group the descriptors from your descriptor list; each cluster center then acts as one visual word. Use sklearn's [KMeans](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html) implementation and parameterize it carefully (i.e., keep ``n_init`` and ``max_iter`` small); otherwise, it will be very slow. OpenCV provides a faster implementation of [K-Means](https://docs.opencv.org/4.x/d1/d5c/tutorial_py_kmeans_opencv.html), which can be used as an alternative. Use ``k=100`` as the number of clusters, i.e., the size of the visual vocabulary.

<div class="alert alert-block alert-info">
    <b>Caution:</b> You will have to stack all the descriptors vertically in a numpy array in order to perform the clustering. 
</div>

In [None]:
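# Stack all the descriptors vertically in a numpy array
descriptors = np.vstack(train_desc_list)
print(train_desc_list[0].shape)  # (n_keypoints, 128) for the first training image
print(train_desc_list[1].shape)  # (n_keypoints, 128) for the second training image
print(descriptors.shape)         # (total number of keypoints, 128)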
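Stacking turns a list of per-image arrays with different row counts into one matrix. The sketch below uses hypothetical arrays (5, 3, and 7 keypoints) in place of real SIFT output and shows how cumulative row offsets can map stacked rows back to their source image, should that ever be needed.

```python
import numpy as np

# Hypothetical per-image descriptor arrays: the number of keypoints varies
# per image, but every SIFT descriptor has 128 dimensions
toy_desc_list = [np.full((n, 128), i, dtype=np.float32) for i, n in enumerate([5, 3, 7])]

toy_stacked = np.vstack(toy_desc_list)
print(toy_stacked.shape)  # (15, 128)

# Row offsets: image i occupies rows toy_offsets[i]:toy_offsets[i + 1]
toy_offsets = np.cumsum([0] + [len(d) for d in toy_desc_list])
print(toy_offsets.tolist())  # [0, 5, 8, 15]
```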

In [None]:
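from sklearn.cluster import KMeans
k = 100
# A single initialization and few iterations keep clustering many descriptors tractable
kmeans = KMeans(n_clusters=k, n_init=1, max_iter=50, random_state=0).fit(descriptors)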
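The following sketch shows that the fitted cluster centers form the visual vocabulary: one 128-dimensional center per visual word. The random descriptors and ``toy_k = 20`` are assumptions for illustration. ``MiniBatchKMeans`` is a different, approximate algorithm that the lab does not prescribe; it is included only as one commonly used option for speeding up clustering on large descriptor sets.

```python
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans

toy_rng = np.random.default_rng(0)
# Hypothetical stand-in for stacked SIFT descriptors: 2000 random 128-d vectors
toy_descriptors = toy_rng.random((2000, 128), dtype=np.float32)
toy_k = 20

# Few restarts and iterations keep K-Means fast
toy_kmeans = KMeans(n_clusters=toy_k, n_init=1, max_iter=50, random_state=0).fit(toy_descriptors)
# Approximate alternative that fits on mini-batches of descriptors
toy_mbk = MiniBatchKMeans(n_clusters=toy_k, n_init=1, batch_size=1024, random_state=0).fit(toy_descriptors)

# Each row of cluster_centers_ is one visual word
print(toy_kmeans.cluster_centers_.shape)  # (20, 128)
print(toy_mbk.cluster_centers_.shape)     # (20, 128)
```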

# Histogram of Visual Words
Now it is time to count the visual words in each picture. In this step, we create a histogram for every single picture. Suppose we chose 100 as the number of clusters for K-Means. After this step, every picture is represented by a vector of 100 elements, where each element counts how many of the picture's descriptors were assigned to the corresponding cluster (visual word).
<a name="hovw"></a>
<div style="width: 500px; text-align: center;">
    <img src="https://miro.medium.com/max/625/1*QgI1t-7yJApi4vQigFgsLQ.jpeg"/>
    <a href="https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb" style="">Source</a>
</div>

### Constructing histogram
Suppose that we computed 100 clusters from the descriptor list and have 250 pictures in the training dataset. The function should create a vector of 100 elements for each picture, hence the resulting array has shape ``(250, 100)``. Every picture contains an arbitrary number of descriptors, thus we need to loop over the pictures separately and compute one histogram per picture.

In [None]:
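def compute_feature_histogram(_model, _desc_list, _k):
    # One row per image, one column per visual word (cluster)
    _bovw_features = np.zeros((len(_desc_list), _k), "float32")
    for i, descr in enumerate(_desc_list):
        # Images without keypoints keep an all-zero histogram
        if descr is None or len(descr) == 0:
            continue
        # Assign every descriptor of image i to its nearest cluster center
        labels = _model.predict(descr)
        # Count how often each visual word occurs in image i
        _bovw_features[i] = np.bincount(labels, minlength=_k)
    return _bovw_features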
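The top cell imports ``vq`` from ``scipy.cluster.vq``, which performs the same nearest-center (Euclidean) assignment as ``_model.predict``; with a fitted model, ``kmeans.cluster_centers_`` would serve as the codebook. The worked example below uses a hypothetical 2-D codebook of four visual words (real SIFT words are 128-D) to make the counting step easy to verify by hand.

```python
import numpy as np
from scipy.cluster.vq import vq

# Hypothetical vocabulary of k = 4 visual words in 2-D
toy_codebook = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
# Hypothetical descriptors of a single image
toy_descr = np.array([[1, 1], [9, 1], [0, 9], [1, 0], [10, 11]], dtype=float)

# vq returns the index of the nearest codebook entry and the distance to it
toy_words, toy_dists = vq(toy_descr, toy_codebook)
print(toy_words.tolist())  # [0, 1, 2, 0, 3]

# The BoVW histogram counts how often each visual word occurs
toy_hist = np.bincount(toy_words, minlength=len(toy_codebook))
print(toy_hist.tolist())  # [2, 1, 1, 1]
```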


In [None]:
bovw_features_train = compute_feature_histogram(kmeans, train_desc_list, k)
test_desc_list = extract_features(x_test, sift)
bovw_features_test = compute_feature_histogram(kmeans, test_desc_list, k)

In [None]:
print(bovw_features_train.shape)
print(bovw_features_test.shape)

In [None]:
len(y_train)

# Image Classification
Use the MinDist classifier to predict the category of each image. Remember that in scikit-learn, the classifier is called [Nearest Centroid](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestCentroid.html?highlight=nearest%20centroid#sklearn.neighbors.NearestCentroid) classifier.

<a name="classification"></a>
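The MinDist rule computes one centroid (the mean histogram) per class and assigns each test histogram to the class with the closest centroid. The sketch below implements this by hand on hypothetical 3-bin histograms and checks that scikit-learn's ``NearestCentroid`` (with its default Euclidean metric) gives the same predictions.

```python
import numpy as np
from sklearn.neighbors import NearestCentroid

# Hypothetical BoVW histograms (k = 3 visual words) for two classes
toy_X_train = np.array([[5, 0, 1], [4, 1, 0], [0, 1, 6], [1, 0, 5]], dtype=float)
toy_y_train = np.array([0, 0, 1, 1])
toy_X_test = np.array([[3, 1, 1], [0, 2, 4]], dtype=float)

# Manual MinDist: mean histogram per class, then the nearest centroid wins
toy_centroids = np.array([toy_X_train[toy_y_train == c].mean(axis=0) for c in np.unique(toy_y_train)])
toy_centroid_dists = np.linalg.norm(toy_X_test[:, None, :] - toy_centroids[None, :, :], axis=2)
toy_manual_pred = toy_centroid_dists.argmin(axis=1)

toy_sk_pred = NearestCentroid().fit(toy_X_train, toy_y_train).predict(toy_X_test)
print(toy_manual_pred.tolist(), toy_sk_pred.tolist())  # [0, 1] [0, 1]
```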

In [None]:
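y_pred = NearestCentroid().fit(bovw_features_train, y_train).predict(bovw_features_test)  # MinDist classifier
prediction_labels = [use_classes[i] for i in y_pred]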

In [None]:
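# Create a resizable display window once, then show each test image with its prediction
cv2.namedWindow("Image", cv2.WINDOW_NORMAL)
for image_path, prediction in zip(x_test, prediction_labels):
    image = cv2.imread(image_path)
    # Draw the predicted class name near the bottom-left corner of the image
    pt = (0, image.shape[0] - 10)
    cv2.putText(image, prediction, pt, cv2.FONT_HERSHEY_DUPLEX, 0.5, (0, 255, 0), 1)
    cv2.imshow("Image", image)
    # Display each image for 1000 ms
    cv2.waitKey(1000)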

In [None]:
cv2.destroyAllWindows()

## Homework Assignment

Extend your code to include the following:
1. Extend your dataset to use 5 different individually chosen categories of images.
2. Set up a grid search over at least three different Ks for K-Means and at least two different distance metrics for the MinDist classifier. Evaluate the grid with 3-fold cross-validation and select the setting with the best accuracy.
3. Try different scaling techniques for your dataset.
4. Plot the confusion matrix for the test dataset using the best setting according to the grid search.
5. Document your findings.
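One possible way to organize the grid search is sketched below, under stated assumptions: the descriptors are synthetic stand-ins for SIFT output, and the candidate vocabulary sizes (10, 20, 40) and scalers are arbitrary examples. K-Means is refit once per ``k`` outside ``GridSearchCV`` because the vocabulary size changes the feature matrix itself, while the scaler and the ``NearestCentroid`` metric ("euclidean" and "manhattan", both supported by ``NearestCentroid``) are searched inside a ``Pipeline``. Fitting the vocabulary on all images before cross-validation leaks some information from the validation folds; a stricter setup would refit K-Means inside each fold.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

toy_rng = np.random.default_rng(0)

# Hypothetical stand-in for per-image SIFT descriptors: 60 images in 3 classes,
# each with 20-39 descriptors whose mean depends on the class
toy_labels = np.arange(60) % 3
toy_desc_list = [
    toy_rng.normal(loc=c, scale=1.0, size=(toy_rng.integers(20, 40), 128)).astype("float32")
    for c in toy_labels
]

def toy_histograms(model, descs, n_words):
    # One BoVW histogram per image: visual-word counts
    return np.array([np.bincount(model.predict(d), minlength=n_words) for d in descs], dtype="float32")

toy_pipe = Pipeline([("scaler", "passthrough"), ("clf", NearestCentroid())])
toy_param_grid = {
    "scaler": ["passthrough", StandardScaler(), MinMaxScaler()],
    "clf__metric": ["euclidean", "manhattan"],
}

toy_results = {}
for n_words in (10, 20, 40):
    # The vocabulary size changes the features, so K-Means is refit per candidate
    toy_vocab = KMeans(n_clusters=n_words, n_init=1, max_iter=50, random_state=0).fit(np.vstack(toy_desc_list))
    toy_X = toy_histograms(toy_vocab, toy_desc_list, n_words)
    toy_search = GridSearchCV(toy_pipe, toy_param_grid, cv=3, scoring="accuracy").fit(toy_X, toy_labels)
    toy_results[n_words] = (toy_search.best_score_, toy_search.best_params_)

toy_best_k = max(toy_results, key=lambda n: toy_results[n][0])
print(toy_best_k, toy_results[toy_best_k])
```

The best setting found this way could then be refit on the full training set and evaluated once on the test set, for example with ``sklearn.metrics.confusion_matrix``.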

## Moodle Upload
This is an **individual** assignment, meaning that you are graded individually. If you have collaborated with colleagues during the lab, make sure to state **all** of their names at the beginning of the document. The final document **must** exhibit individual efforts (structure, variable settings, reasoning, interpretation) despite some inherent similarities.

Upload your notebook as ``firstname_lastname_ip.html`` to Moodle. 

Make sure to consider the following:
* Have all your import statements in one single cell at the top of the notebook.
* Remove unnecessary code.
* Include a markdown cell at the end where you:
    * give a short overview of what your notebook is about
    * be sure to describe BoVW in your own words: Which steps are necessary? How does it relate to the BoW-concept from NLP? What are "words" and "documents" in this context?
    * describe and interpret your settings and justify your choices
    * analyze the final/best results

**Deadline: 07.02.2023 23:59**