# AIN433 Assignment 2 - Image classification using Bag of Visual Words

Ahmet Emre Usta

2200765036


## Introduction

This notebook addresses the task of image classification using the Bag of Visual Words (BOVW) model, leveraging keypoint description methods such as SIFT/SURF and ORB, combined with KMeans clustering. The BOVW model represents images as a collection of distinct features - keypoints and descriptors, facilitating image classification and similarity identification. The assignment explores the process of extracting these features, matching them across images, and classifying the images based on the generated features. This approach is central to many computer vision tasks and is foundational for understanding more complex models.


### Bag of Visual Words (BOVW) Model

The Bag of Visual Words model is an approach used in computer vision for image classification and retrieval. It represents images by aggregating local features: keypoints are detected in each image, a descriptor is computed for every keypoint, and a visual dictionary (or vocabulary) is created by clustering these descriptors. Each image is then represented as a frequency histogram over the visual words of this dictionary, allowing for efficient comparison and classification.
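
As a minimal sketch of the final step described above, the snippet below turns the descriptors of one image into a normalized frequency histogram over a vocabulary. The 2-D toy descriptors, the three-word vocabulary, and the name `bovw_histogram` are illustrative assumptions, not part of the assignment code; real SIFT descriptors have 128 dimensions.

```python
import numpy as np


def bovw_histogram(descriptors, vocabulary):
    """Return the normalized visual-word histogram of one image."""
    # Pairwise Euclidean distances, shape (n_descriptors, n_words)
    diffs = descriptors[:, np.newaxis, :] - vocabulary[np.newaxis, :, :]
    distances = np.linalg.norm(diffs, axis=2)

    # Each descriptor votes for its nearest visual word
    nearest = np.argmin(distances, axis=1)
    counts = np.bincount(nearest, minlength=len(vocabulary)).astype(float)

    # Normalize so images with different keypoint counts stay comparable
    return counts / counts.sum()


# Toy data (assumed): three visual words and four descriptors in 2-D
toy_vocabulary = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
toy_descriptors = np.array([[1.0, 0.0], [0.0, 1.0], [9.0, 1.0], [1.0, 9.0]])
toy_histogram = bovw_histogram(toy_descriptors, toy_vocabulary)
print(toy_histogram)
```

Two of the four toy descriptors fall closest to the first word, so that word receives half of the histogram mass.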


### Objective

The objective of this assignment is to implement the BOVW framework from scratch, focusing on the following steps:

- **Keypoint Detection:** Using SIFT or Harris-Laplacian for identifying distinct points in an image.
- **Feature Extraction:** Computing descriptors for the detected keypoints using methods like SIFT/SURF and ORB.
- **Feature Matching:** Matching features across images based on Euclidean distance.
- **BoW Formation:** Clustering features to form a visual dictionary and quantizing images to histograms.
- **Classification:** Employing the k-NN approach to classify images and evaluating the performance of different visual vocabularies.

Through these steps, we aim to explore the effectiveness of different keypoint description methods and clustering approaches in classifying images and understanding the impact of various factors on the accuracy and runtime of the classification.


### Importance of SIFT/SURF, ORB, and KMeans Clustering

- **SIFT/SURF:** These are feature detection algorithms that identify and describe local features in images. They are robust to changes in scale, rotation, and illumination. SIFT and SURF differ in their complexity and speed, providing a trade-off between accuracy and computational efficiency.
- **ORB:** A fast feature detector and descriptor, ORB is designed to achieve similar performance to SIFT but at a lower computational cost. It is particularly useful for real-time applications.
- **KMeans Clustering:** This clustering method is used to group the extracted features into a set number of clusters, forming the visual vocabulary. KMeans is chosen for its simplicity and efficiency in creating a compact visual dictionary that can represent a wide range of images.
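
To make the clustering step concrete, here is a small sketch of plain Lloyd iterations, the basic procedure behind KMeans. It is a simplified stand-in for illustration only (random initialization, fixed iteration count); the notebook itself relies on `sklearn.cluster.KMeans`. The function name `lloyd_kmeans` and the toy points are assumptions.

```python
import numpy as np


def lloyd_kmeans(points, n_clusters, n_iters=20, seed=0):
    """Plain Lloyd iterations; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Start from distinct, randomly chosen data points
    start = rng.choice(len(points), size=n_clusters, replace=False)
    centers = points[start].astype(float)

    def assign(current_centers):
        # Index of the nearest center for every point
        diffs = points[:, np.newaxis, :] - current_centers[np.newaxis, :, :]
        return np.argmin(np.linalg.norm(diffs, axis=2), axis=1)

    for _ in range(n_iters):
        labels = assign(centers)
        # Move each center to the mean of the points assigned to it
        for j in range(n_clusters):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)

    # Final assignment against the updated centers
    return centers, assign(centers)


# Toy data (assumed): two well-separated groups of 2-D points
toy_points = np.array(
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
)
toy_centers, toy_labels = lloyd_kmeans(toy_points, n_clusters=2)
print(toy_centers)
```

In the BoVW setting the points would be the stacked descriptors of all training images, and the resulting centers would serve as the visual words.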


## Setup


In [None]:
# Import necessary libraries
import cv2
import requests
from tqdm import tqdm
import tarfile
import os
from sklearn.cluster import KMeans
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

In [None]:
# Set working directory
workdir = "/Users/emre/GitHub/HU-AI/AIN433/Spring/Assignment 2/"
DATASET_PATH = os.path.join(workdir, "dataset")
SAMPLE_IMAGE_NUMBER = 5

In [None]:
# Define URL, target path, and expected directory or file name within the dataset
url = "https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz"
imagenette_dir = os.path.basename(url).replace(".tgz", "")


# Function to check if the dataset is already extracted
def is_dataset_extracted(dataset_path, expected_entity):
    """Check if the expected directory or file from the dataset already exists."""
    return os.path.exists(os.path.join(dataset_path, expected_entity))


# Proceed only if the dataset isn't already extracted
if not is_dataset_extracted(DATASET_PATH, imagenette_dir):
    # Ensure the directory for the dataset exists
    os.makedirs(DATASET_PATH, exist_ok=True)

    # Perform the request and check if the response is OK (200)
    with requests.get(url, stream=True) as response:
        response.raise_for_status()  # Raises an HTTPError for bad responses
        content_type = response.headers.get("Content-Type")
        expected_types = ["application/octet-stream", "application/x-tar"]
        if content_type not in expected_types:
            raise ValueError(
                f"Unexpected Content-Type: {content_type}. Expected one of: {expected_types}."
            )

        total_size_in_bytes = int(response.headers.get("content-length", 0))

        # Wrap the raw stream so the progress bar advances on every read
        with tqdm.wrapattr(
            response.raw, "read", total=total_size_in_bytes
        ) as stream:
            # Direct extraction from the response stream
            with tarfile.open(fileobj=stream, mode="r|gz") as tar:
                tar.extractall(DATASET_PATH)

    # Success message
    print(f"Download and extraction of {url} completed successfully!")
else:
    print("Dataset already exists. Skipping download and extraction.")

In [None]:
df_imagenette_paths = pd.read_csv(
    os.path.join(DATASET_PATH, imagenette_dir, "noisy_imagenette.csv")
)
df_imagenette_paths.head()

In [None]:
df_imagenette_paths.info()

In [None]:
# Sample the first SAMPLE_IMAGE_NUMBER images for every label group
sample = df_imagenette_paths.groupby("noisy_labels_0").head(SAMPLE_IMAGE_NUMBER)
num_labels = df_imagenette_paths["noisy_labels_0"].nunique()

In [None]:
# Create subplots
fig, axs = plt.subplots(
    num_labels, SAMPLE_IMAGE_NUMBER, figsize=(20, 2 * num_labels), squeeze=False
)
fig.suptitle("Sample Images from Each Label", fontsize=22, y=1.03)

# Create a mapping from labels to subplot row indices
label_to_index = {
    label: idx
    for idx, label in enumerate(sorted(df_imagenette_paths["noisy_labels_0"].unique()))
}

# Loop through the dataframe and plot
for label, group in sample.groupby("noisy_labels_0"):
    for i, (_, row) in enumerate(group.iterrows()):
        img_path = os.path.join(DATASET_PATH, imagenette_dir, row["path"])
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        row_index = label_to_index[label]
        ax = axs[row_index, i]  # Adjust subplot access
        ax.imshow(img)
        if i == 0:
            ax.set_ylabel(label, rotation=30, size="large", labelpad=60)
# Adjust layout
plt.tight_layout()
plt.show()

## Keypoint Detection

This section covers the detection of keypoints in images, which are distinctive points that can be reliably detected and described. Keypoint detection is crucial for understanding the structure and features of images, forming the foundation for further processing such as feature extraction and matching.
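
The objective lists Harris-Laplacian as an alternative detector. As a rough illustration of what "distinctive" means, the sketch below computes the plain Harris corner response with NumPy. It omits the Laplacian scale selection entirely and uses a simple box window instead of a Gaussian, so it is a simplified assumption-laden sketch rather than the detector used in this notebook; `box_sum`, `harris_response`, and the toy image are hypothetical names and data.

```python
import numpy as np


def box_sum(values, radius=1):
    """Sum each pixel's (2r+1)x(2r+1) neighbourhood with zero padding."""
    padded = np.pad(values, radius, mode="constant")
    out = np.zeros(values.shape, dtype=float)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + values.shape[0], dx:dx + values.shape[1]]
    return out


def harris_response(gray, k=0.04, radius=1):
    """Harris response R = det(M) - k * trace(M)^2 for every pixel."""
    gray = gray.astype(float)
    # Image gradients along rows (iy) and columns (ix)
    iy, ix = np.gradient(gray)

    # Entries of the structure tensor M, summed over a local window
    sxx = box_sum(ix * ix, radius)
    syy = box_sum(iy * iy, radius)
    sxy = box_sum(ix * iy, radius)

    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2


# Toy image (assumed): a bright square on a dark background
toy_image = np.zeros((20, 20))
toy_image[5:15, 5:15] = 1.0
toy_response = harris_response(toy_image)

print(toy_response[5, 5])   # corner of the square: positive response
print(toy_response[5, 10])  # middle of an edge: negative response
print(toy_response[0, 0])   # flat region: zero response
```

Corners give a positive response because the gradient varies in two directions inside the window, whereas edges vary in only one direction and flat regions in none.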


### Define Keypoint Detection Function

We define a function that detects keypoints with SIFT (the default), SURF, or ORB. The function takes an image path as input and returns the image with the keypoints drawn on it, along with the keypoints themselves.


In [None]:
def detect_keypoints(image_path, method="SIFT"):
    # Load the image; cv2.imread returns None instead of raising on failure
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Initialize the keypoint detector
    if method == "SIFT":
        detector = cv2.SIFT_create()
    elif method == "SURF":
        # SURF needs an opencv-contrib build with the non-free modules enabled
        detector = cv2.xfeatures2d.SURF_create()
    elif method == "ORB":
        detector = cv2.ORB_create()
    else:
        raise ValueError(f"Unsupported method: {method}")

    # Detect keypoints only; descriptors are not needed for drawing
    keypoints = detector.detect(gray, None)

    # Draw keypoints (with size and orientation) on the color image
    image_with_keypoints = cv2.drawKeypoints(
        image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS
    )

    return image_with_keypoints, keypoints

### Display Keypoints on Sample Image

Let's apply the keypoint detection function to a sample image and visualize the keypoints.


In [None]:
# Path to a sample image
group_name = sample.iloc[0]["noisy_labels_0"]
sample_image_path = os.path.join(DATASET_PATH, imagenette_dir, sample.iloc[0]["path"])

In [None]:
image_with_keypoints, keypoints = detect_keypoints(sample_image_path, method="SIFT")

# Display the image with keypoints
plt.figure(figsize=(5, 5))
plt.imshow(cv2.cvtColor(image_with_keypoints, cv2.COLOR_BGR2RGB))
plt.title(f"Group: {group_name}")
plt.axis("off")
plt.show()

## Feature Extraction

In this section, we extract features from images using different methods. A descriptor summarizes the neighborhood of each keypoint as a fixed-length vector, so an image can be compared through a compact set of vectors instead of its raw pixels.


### Define Feature Extraction Functions

We define functions to perform feature extraction using SIFT, SURF, and ORB. Each function takes an image path as input and returns the keypoints and descriptors extracted from that image. If no keypoints are found, OpenCV returns `None` for the descriptors.


In [None]:
def extract_features_sift(image_path):
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(image, None)
    return keypoints, descriptors

In [None]:
def extract_features_surf(image_path):
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    surf = cv2.xfeatures2d.SURF_create()
    keypoints, descriptors = surf.detectAndCompute(image, None)
    return keypoints, descriptors

In [None]:
def extract_features_orb(image_path):
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create()
    keypoints, descriptors = orb.detectAndCompute(image, None)
    return keypoints, descriptors
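
Unlike SIFT and SURF, ORB produces binary descriptors (32 bytes per keypoint with OpenCV's defaults), which are usually compared with the Hamming distance, i.e. the number of differing bits. The sketch below shows one way to compute it with NumPy; the function name `hamming_distances` and the two-byte toy descriptors are assumptions for illustration.

```python
import numpy as np


def hamming_distances(descriptors1, descriptors2):
    """Pairwise Hamming distances between packed binary descriptors (uint8)."""
    # XOR leaves a 1 bit wherever two descriptors disagree
    xor = np.bitwise_xor(
        descriptors1[:, np.newaxis, :], descriptors2[np.newaxis, :, :]
    )
    # Unpack every byte into 8 bits and count the differing ones
    return np.unpackbits(xor, axis=2).sum(axis=2)


# Toy data (assumed): descriptors of 2 bytes instead of ORB's 32
toy_a = np.array([[0b00000000, 0b11111111]], dtype=np.uint8)
toy_b = np.array(
    [[0b00000001, 0b11111111], [0b11111111, 0b00000000]], dtype=np.uint8
)
toy_distances = hamming_distances(toy_a, toy_b)
print(toy_distances)
```

The first toy pair differs in a single bit and the second in all sixteen bits.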

### Visualization of Extracted Features

For illustrative purposes, we will display the keypoints detected by each method on a sample image.


In [None]:
def display_keypoints(image_path, method_function, title):
    """
    Displays an image with keypoints overlay.

    - image_path: Path to the image file.
    - method_function: Function to use for keypoint detection.
                            It should return a tuple of keypoints and descriptors.
    - title: Title to display on the image plot.
    """
    keypoints, descriptors = method_function(image_path)
    image = cv2.imread(image_path)
    image_keypoints = cv2.drawKeypoints(
        image, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS
    )
    plt.figure(figsize=(5, 5))
    plt.imshow(cv2.cvtColor(image_keypoints, cv2.COLOR_BGR2RGB))
    plt.title(title)  # Set the passed title for the plot
    plt.axis("off")
    plt.show()

In [None]:
# Get the group name for the sample image
group_name = sample.iloc[0]["noisy_labels_0"]

# Display keypoints for the sample image using SIFT, with group name as the title
display_keypoints(sample_image_path, extract_features_sift, f"Group: {group_name}")

## Feature Matching

Feature matching involves comparing the descriptors of two sets of features (from two images) to find matches between them. This step is critical for tasks such as image recognition and alignment. We'll implement feature matching using Euclidean distance as a metric.


### Define Feature Matching Function

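This function matches two sets of descriptors with a brute-force matcher that uses the Euclidean (L2) distance. For each descriptor in the first set it retrieves the two nearest neighbors in the second set and keeps the best match only if it passes the ratio test, i.e. its distance is below 0.75 times the distance of the second-best match. It returns the matches that survive this test.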


In [None]:
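def match_features(descriptors1, descriptors2, method="BF"):
    # Initialize the matcher
    if method == "BF":
        # Brute-force matcher; the default norm is cv2.NORM_L2 (Euclidean),
        # which suits float descriptors such as SIFT/SURF. Binary descriptors
        # such as ORB are normally matched with cv2.NORM_HAMMING instead.
        matcher = cv2.BFMatcher()
    else:
        raise ValueError(f"Unsupported method: {method}")

    # Find the two nearest neighbors of every descriptor
    matches = matcher.knnMatch(descriptors1, descriptors2, k=2)

    # Apply the ratio test; skip entries with fewer than two neighbors
    good_matches = []
    for pair in matches:
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good_matches.append(pair[0])

    return good_matches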
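
For readers who want to see the matching logic without OpenCV, the following NumPy sketch reproduces the same idea: Euclidean nearest neighbors followed by the ratio test. It is an illustrative re-implementation under simplifying assumptions (dense distance matrix, 2-D toy descriptors); `ratio_test_matches` is a hypothetical name.

```python
import numpy as np


def ratio_test_matches(descriptors1, descriptors2, ratio=0.75):
    """Return (query_index, train_index) pairs that pass the ratio test."""
    # Pairwise Euclidean distances, shape (n1, n2)
    diffs = descriptors1[:, np.newaxis, :] - descriptors2[np.newaxis, :, :]
    distances = np.linalg.norm(diffs, axis=2)

    matches = []
    for i, row in enumerate(distances):
        if len(row) < 2:
            continue  # the ratio test needs two candidates
        best, second = np.argsort(row)[:2]
        # Keep the match only if it is clearly better than the runner-up
        if row[best] < ratio * row[second]:
            matches.append((i, int(best)))
    return matches


# Toy data (assumed): the second query is equally close to two candidates
toy_query = np.array([[0.0, 0.0], [5.0, 5.0]])
toy_train = np.array([[0.1, 0.0], [4.0, 4.0], [6.0, 6.0]])
toy_matches = ratio_test_matches(toy_query, toy_train)
print(toy_matches)
```

The first query has one clearly closest candidate and is kept, while the ambiguous second query is rejected.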

### Perform Feature Matching on Sample Images

We will apply the feature matching function to descriptors extracted from two sample images and visualize the best matches.


In [None]:
def display_matched_features(
    image_path1,
    image_path2,
    feature_extractor,
    feature_matcher,
    title="Matched Features",
):
    """
    Extracts features from two images, matches them, and displays the matched features with an optional title.

    - image_path1: Path to the first image.
    - image_path2: Path to the second image.
    - feature_extractor: Function to extract features. Should return keypoints and descriptors.
    - feature_matcher: Function to match features. Takes two sets of descriptors as input.
    - title: Optional title for the plot.
    """
    # Extract features from both images
    keypoints1, descriptors1 = feature_extractor(image_path1)
    keypoints2, descriptors2 = feature_extractor(image_path2)

    # Perform feature matching
    matches = feature_matcher(descriptors1, descriptors2)

    # Load images and create a matched image
    img1 = cv2.imread(image_path1)
    img2 = cv2.imread(image_path2)
    matched_image = cv2.drawMatches(
        img1,
        keypoints1,
        img2,
        keypoints2,
        matches,
        None,
        flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS,
    )

    # Convert BGR to RGB for displaying
    matched_image_rgb = cv2.cvtColor(matched_image, cv2.COLOR_BGR2RGB)

    # Display the matched features with title
    plt.figure(figsize=(16, 8))
    plt.imshow(matched_image_rgb)
    plt.title(title, fontsize=22)  # Set the plot title
    plt.axis("off")
    plt.show()

In [None]:
# Example usage
sample_image_path1 = os.path.join(DATASET_PATH, imagenette_dir, sample.iloc[0]["path"])
sample_image_path2 = os.path.join(DATASET_PATH, imagenette_dir, sample.iloc[1]["path"])

group_name1 = sample.iloc[0]["noisy_labels_0"]
group_name2 = sample.iloc[1]["noisy_labels_0"]

# Call the function with a specific title
display_matched_features(
    sample_image_path1,
    sample_image_path2,
    extract_features_sift,
    match_features,
    title=f"Matched Features: {group_name1} vs. {group_name2}",
)

## BoW Formation

The Bag of Words (BoW) model in computer vision is a simplification where images are represented as bags of individual features. This part of the notebook covers clustering the features extracted from images to form a BoW dictionary and quantizing images based on this dictionary to create feature histograms.


### Define Functions for BoW Formation

We define functions for clustering features to create the BoW dictionary and for quantizing the images to create histograms based on this dictionary.


In [None]:
def create_bow_dictionary(descriptors, n_clusters=100, descriptor_size=128):
    # Keep only per-image descriptor arrays with the expected descriptor size
    # (128 for SIFT; ORB descriptors have 32 columns)
    valid_descriptors = []

    for d in descriptors:
        if d is None:
            # No keypoints were detected in this image
            continue

        if d.ndim == 2 and d.shape[1] == descriptor_size:
            valid_descriptors.append(d)
        else:
            print(f"Invalid descriptor shape: {d.shape}")

    # Without any valid descriptors there is nothing to cluster
    if not valid_descriptors:
        raise ValueError("No valid descriptors found.")

    # Stack the per-image arrays into one (n_descriptors, descriptor_size) array
    all_descriptors = np.vstack(valid_descriptors)

    # Clustering using KMeans
    kmeans = KMeans(n_clusters=n_clusters, random_state=42)
    kmeans.fit(all_descriptors)

    # The cluster centers are our visual vocabulary
    bow_dictionary = kmeans.cluster_centers_

    return bow_dictionary

In [None]:
def quantize_features(all_bovw, centers):
    # One histogram row per image, one column per visual word
    features = np.zeros((len(all_bovw), len(centers)))

    # Loop through the descriptor array of each image
    for i, bovw in enumerate(all_bovw):
        # Images without descriptors keep an all-zero histogram
        if bovw is None or len(bovw) == 0:
            continue

        # Distance from every cluster center to every descriptor: shape (k, n)
        distances = np.linalg.norm(bovw - centers[:, np.newaxis], axis=2)

        # Count how many descriptors are closest to each center
        nearest = np.argmin(distances, axis=0)
        features[i] = np.bincount(nearest, minlength=len(centers))

    return features

### Create BoW Dictionary from Extracted Features

Using the descriptors extracted from the images in our dataset, we create the BoW dictionary.

### Quantize Features of Each Image

After creating the BoW dictionary, we quantize the features of each image in our dataset.


In [None]:
sift_descriptors = [
    extract_features_sift(os.path.join(DATASET_PATH, imagenette_dir, path))[1]
    for path in df_imagenette_paths["path"][:100]
]

# Create a BoW dictionary using SIFT descriptors
bow_dictionary_sift = create_bow_dictionary(sift_descriptors, n_clusters=100)

# Quantize the SIFT descriptors using the BoW dictionary
quantized_features_sift = quantize_features(sift_descriptors, bow_dictionary_sift)

In [None]:
orb_descriptors = [
    extract_features_orb(os.path.join(DATASET_PATH, imagenette_dir, path))[1]
    for path in df_imagenette_paths["path"][:100]
]

# Create a BoW dictionary using ORB descriptors (32 bytes each, not the default 128)
bow_dictionary_orb = create_bow_dictionary(
    orb_descriptors, n_clusters=100, descriptor_size=32
)

# Quantize the ORB descriptors using the BoW dictionary
quantized_features_orb = quantize_features(orb_descriptors, bow_dictionary_orb)

## Classification

After forming the Bag of Words (BoW) model for our images, the next step is to classify the images based on their BoW histograms. This section covers the implementation of the k-NN (k-Nearest Neighbors) algorithm for image classification and the evaluation of its performance.
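
Since the assignment aims at building the pipeline from scratch, the sketch below shows how k-NN classification of histograms could be written with NumPy alone. It is a minimal illustration with Euclidean distance and majority voting; `knn_predict`, the class names `"a"` and `"b"`, and the toy histograms are assumptions, and the notebook's own cells use `KNeighborsClassifier` from scikit-learn.

```python
from collections import Counter

import numpy as np


def knn_predict(train_features, train_labels, test_features, n_neighbors=3):
    """Classify each test vector by majority vote among its nearest neighbors."""
    predictions = []
    for vector in test_features:
        # Euclidean distance from this test vector to every training vector
        distances = np.linalg.norm(train_features - vector, axis=1)
        nearest = np.argsort(distances)[:n_neighbors]
        # The most frequent label among the neighbors wins
        votes = Counter(train_labels[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy data (assumed): two-bin histograms for two hypothetical classes
toy_train_features = np.array(
    [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
)
toy_train_labels = ["a", "a", "a", "b", "b", "b"]
toy_test_features = np.array([[0.95, 0.05], [0.05, 0.95]])
toy_predictions = knn_predict(
    toy_train_features, toy_train_labels, toy_test_features
)
print(toy_predictions)
```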


### Define Functions for Classification

We define functions for training the k-NN classifier, classifying images, and evaluating the classifier's performance.


In [None]:
def train_knn_classifier(features, labels, n_neighbors=10):
    # Initialize the k-NN classifier
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)

    # Train the classifier
    knn.fit(features, labels)

    return knn

In [None]:
def classify_images(classifier, test_features):
    # Predict the labels for the test features
    predictions = classifier.predict(test_features)

    return predictions

In [None]:
def evaluate_classifier(predictions, true_labels):
    # Generate a classification report
    report = classification_report(true_labels, predictions)

    # Generate a confusion matrix
    confusion_mat = confusion_matrix(true_labels, predictions)

    return report, confusion_mat
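
To clarify how the confusion matrix is read, the sketch below builds one by hand with NumPy, using the same orientation as scikit-learn (rows are true classes, columns are predicted classes), and derives the accuracy from its diagonal. The labels and the name `confusion_counts` are illustrative assumptions, not results from this notebook.

```python
import numpy as np


def confusion_counts(true_labels, predicted_labels, classes):
    """Rows index the true class, columns the predicted class."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = np.zeros((len(classes), len(classes)), dtype=int)
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[index[true], index[predicted]] += 1
    return matrix


# Toy labels (assumed) for two hypothetical classes
toy_true = ["a", "a", "b", "b", "b"]
toy_predicted = ["a", "b", "b", "b", "a"]
toy_matrix = confusion_counts(toy_true, toy_predicted, classes=["a", "b"])

# Correct predictions lie on the diagonal
toy_accuracy = np.trace(toy_matrix) / toy_matrix.sum()
print(toy_matrix)
print(toy_accuracy)
```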

### Train the k-NN Classifier

Using the quantized features and their corresponding labels, we train the k-NN classifier.


In [None]:
# Placeholder for training features and labels
# train_features = ...
# train_labels = ...

# Once the placeholders above are filled in, train the classifier:
# knn = train_knn_classifier(train_features, train_labels)

### Classify Images and Evaluate Performance

We classify the test images using the trained classifier and evaluate its performance by comparing the predicted labels with the true labels.


In [None]:
# Placeholder for test features and true labels
# test_features = ...
# true_labels = ...

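# Classify the test images and evaluate the predictions once the placeholders
# above are filled in (knn is the classifier returned by train_knn_classifier):
# predictions = classify_images(knn, test_features)
# report, confusion_mat = evaluate_classifier(predictions, true_labels)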


## Results and Discussion

This section presents the results obtained from the classification process and discusses the findings. We will look at feature points for example images, runtime and visual comparison of description methods (SIFT/SURF and ORB), and the related confusion matrices and comparison tables.


### Runtime and Visual Comparison

We compare the runtime and visual quality of the SIFT/SURF and ORB keypoint detection methods. For brevity, this section will include placeholders for actual runtime data and visual comparisons.
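
One possible way to collect the runtime figures is a small timing helper built on `time.perf_counter`. The helper below is a sketch, not a measurement from this notebook; `time_call` is a hypothetical name, and the built-in `sum` only stands in for an extraction function.

```python
import time


def time_call(function, *args, repeats=3, **kwargs):
    """Return (best_seconds, last_result) over several runs of function."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = function(*args, **kwargs)
        # Keep the fastest run to reduce the influence of background noise
        best = min(best, time.perf_counter() - start)
    return best, result


# Stand-in workload (assumed); replace with a feature extraction call
elapsed, total = time_call(sum, range(100_000))
print(f"best of 3 runs: {elapsed:.6f} s")
```

In the notebook, a call such as `time_call(extract_features_sift, sample_image_path)` could be repeated for each extraction function to compare the methods on the same image.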


### Confusion Matrices and Comparison Tables

This subsection includes placeholders for confusion matrices and comparison tables between the different techniques employed in the assignment. The tables aim to highlight the differences in classification accuracy, runtime, and other metrics of interest.


## Discussion

The discussion focuses on the implications of the findings, including an analysis of the performance differences between the SIFT/SURF and ORB methods, the impact of different distance measures on classification accuracy, and the overall effectiveness of the Bag of Visual Words model for image classification. Insights gained from the comparison tables and confusion matrices are also discussed here, providing a comprehensive overview of the project's outcomes.
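
As a small aid for the discussion of distance measures, the sketch below compares three common distances on a pair of toy histograms. Which measures the assignment actually evaluates is not specified here, so the selection (Euclidean, Manhattan, cosine) and the function names are assumptions.

```python
import numpy as np


def euclidean_distance(h1, h2):
    """Straight-line (L2) distance between two histograms."""
    return float(np.linalg.norm(h1 - h2))


def manhattan_distance(h1, h2):
    """Sum of absolute bin differences (L1)."""
    return float(np.abs(h1 - h2).sum())


def cosine_distance(h1, h2):
    """One minus the cosine of the angle between the histograms."""
    similarity = np.dot(h1, h2) / (np.linalg.norm(h1) * np.linalg.norm(h2))
    return float(1.0 - similarity)


# Toy histograms (assumed) that share only their middle bin
toy_h1 = np.array([0.5, 0.5, 0.0])
toy_h2 = np.array([0.0, 0.5, 0.5])

print(euclidean_distance(toy_h1, toy_h2))
print(manhattan_distance(toy_h1, toy_h2))
print(cosine_distance(toy_h1, toy_h2))
```

With scikit-learn, the same comparison can be run end to end by passing `metric="euclidean"`, `metric="manhattan"`, or `metric="cosine"` to `KNeighborsClassifier`.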
