Image Classification using Bag of Visual Words (BoVW)
====
- In this notebook, we train binary classifiers for three categories (aeroplane, horse, motorbike) using Bag of Visual Words (BoVW) features and a Support Vector Machine (SVM).

- We extract SIFT descriptors from the images and construct a codebook. We then encode each image as a histogram feature using the codebook and train the classifiers on those features.

- Next, we extract dense SIFT descriptors, rebuild the codebook, and retrain the classifiers. Using the same dense SIFT codebook, we also evaluate spatial pyramid matching.

- Finally, we train a non-linear SVM and evaluate its performance.

## Step 0: Set up the environment
First, we set up the environment needed to train the model.


### 0-1: Install conda and the cyvlfeat library

First, we install conda on Colab and then the cyvlfeat library.

In [None]:
# install conda on colab
!pip install -q condacolab numpy==1.26.4
import condacolab
condacolab.install()
!conda install -c conda-forge cyvlfeat==0.7.1  -y

✨🍰✨ Everything looks OK!
Channels:
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done


    current version: 24.11.3
    latest version: 25.3.1

Please update conda by running

    $ conda update -n base -c conda-forge conda



# All requested packages already installed.



### 0-2: Connect to Google Drive

Mounting Google Drive is required to load the data.


In [None]:
# mount drive https://datascience.stackexchange.com/questions/29480/uploading-images-folder-from-my-system-into-google-colab
import os
from google.colab import drive
drive.mount('/gdrive')

Mounted at /gdrive


### 0-3: Import modules

In [None]:
# Import libraries
import os
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import glob
import cyvlfeat
import time
import scipy
import multiprocessing
from tqdm import tqdm

## Helper functions
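
The `euclidean_dist` helper in the cell below relies on the expansion ||x - y||² = ||x||² + ||y||² - 2·x·y, which avoids materialising every pairwise difference. The following is a minimal sketch of the same idea written with broadcasting and checked against a naive computation; the name `pairwise_euclidean` and the random test data are illustrative and not part of the notebook.

```python
import numpy as np

def pairwise_euclidean(x, y):
    """Return the [m, n] matrix of Euclidean distances between rows of x and y."""
    xx = np.sum(x ** 2, axis=1)[:, None]   # [m, 1] squared norms of x rows
    yy = np.sum(y ** 2, axis=1)[None, :]   # [1, n] squared norms of y rows
    sq = xx + yy - 2.0 * (x @ y.T)         # [m, n] squared distances
    # Round-off can push tiny squared distances slightly below zero.
    return np.sqrt(np.maximum(sq, 0.0))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 128))
y = rng.normal(size=(6, 128))

fast = pairwise_euclidean(x, y)
# Naive reference: explicit difference for every pair of rows.
slow = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
print(fast.shape, np.allclose(fast, slow))
```

Clamping at zero is one way to guard the square root; the notebook's helper adds a small `eps` instead.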

In [None]:
def euclidean_dist(x, y):
    """
    :param x: [m, d]
    :param y: [n, d]
    :return:[m, n]
    """
    m, n = x.shape[0], y.shape[0]
    eps = 1e-6

    xx = np.tile(np.power(x, 2).sum(axis=1), (n,1)) #[n, m]
    xx = np.transpose(xx) # [m, n]
    yy = np.tile(np.power(y, 2).sum(axis=1), (m,1)) #[m, n]
    xy = np.matmul(x, np.transpose(y)) # [m, n]
    dist = np.sqrt(xx + yy - 2*xy + eps)

    return dist

def read_img(image_path):
    img = Image.open(image_path).convert('L')
    img = img.resize((240, 240))
    return np.float32(np.array(img)/255.)

def read_txt(file_path):
    with open(file_path, "r") as f:
        data = f.read()
    return data.split()

def dataset_setup(data_dir):
    train_file_list = []
    val_file_list = []

    for class_name in ['aeroplane','horse','motorbike']:
        train_txt_path = os.path.join(data_dir, class_name+'_train.txt')
        train_file_list.append(np.array(read_txt(train_txt_path)))
        val_txt_path = os.path.join(data_dir, class_name+'_val.txt')
        val_file_list.append(np.array(read_txt(val_txt_path)))

    train_file_list = np.unique(np.concatenate(train_file_list))
    val_file_list = np.unique(np.concatenate(val_file_list))

    f = open(os.path.join(data_dir, "train.txt"), 'w')
    non_existing_data = []
    for i in range(train_file_list.shape[0]):
        if os.path.exists(os.path.join(data_dir+'/images', train_file_list[i]+'.jpg')):
            data = "%s\n" % train_file_list[i]
            f.write(data)
        else:
            non_existing_data.append(train_file_list[i])
    f.close()
    print(f"{len(non_existing_data)} images missing: {non_existing_data}/{train_file_list.shape[0]}")

    f = open(os.path.join(data_dir, "val.txt"), 'w')
    non_existing_data = []
    for i in range(val_file_list.shape[0]):
        if os.path.exists(os.path.join(data_dir+'/images', val_file_list[i]+'.jpg')):
            data = "%s\n" % val_file_list[i]
            f.write(data)
        else:
            non_existing_data.append(val_file_list[i])
    f.close()
    print(f"{len(non_existing_data)} images missing: {non_existing_data}/{val_file_list.shape[0]}")

def load_train_data(data_dir):
    dataset_setup(data_dir)
    num_proc = 12 # num_process

    txt_path = os.path.join(data_dir, 'train.txt')
    file_list = read_txt(txt_path)
    image_paths = [os.path.join(data_dir+'/images', file_name+'.jpg') for file_name in file_list]
    with multiprocessing.Pool(num_proc) as pool:
      imgs = pool.map(read_img, image_paths)
      imgs = np.array(imgs)
      idxs = np.array(file_list)

    return imgs, idxs

def load_val_data(data_dir):
    dataset_setup(data_dir)
    num_proc = 12 # num_process

    txt_path = os.path.join(data_dir, 'val.txt')
    file_list = read_txt(txt_path)
    image_paths = [os.path.join(data_dir+'/images', file_name+'.jpg') for file_name in file_list]
    with multiprocessing.Pool(num_proc) as pool:
      imgs = pool.map(read_img, image_paths)
      imgs = np.array(imgs)
      idxs = np.array(file_list)

    return imgs, idxs

def get_labels(idxs, target_idxs):
    """
    Get the labels from file index(name).

    :param idxs(numpy.array): file index(name). shape:[num_images, ]
    :param target_idxs(numpy.array): target index(name). shape:[num_target,]
    :return(numpy.array): Target label(Binary label consisting of True and False). shape:[num_images,]
    """
    return np.isin(idxs, target_idxs)

def load_train_idxs(data_dir):
    txt_path = os.path.join(data_dir, 'train.txt')
    train_idxs = np.array(read_txt(txt_path))
    return train_idxs

def load_val_idxs(data_dir):
    txt_path = os.path.join(data_dir, 'val.txt')
    val_idxs = np.array(read_txt(txt_path))
    return val_idxs

## Step 1: Load the data

In [None]:
'''
Setting the data path for loading images & labels.
'''

%env CS_DATA_DIR=/gdrive/MyDrive/Computer vision

# Quote the variable because the path contains a space.
!mkdir -p "$CS_DATA_DIR"

# MODIFY_THIS
os.chdir(os.environ["CS_DATA_DIR"])
!wget http://www.di.ens.fr/willow/events/cvml2013/materials/practicals/category-level/practical-category-recognition-2013a-data-only.tar.gz
!tar -zxf practical-category-recognition-2013a-data-only.tar.gz

env: CS_DATA_DIR=/gdrive/MyDrive/Computer vision
--2025-04-09 05:17:17--  http://www.di.ens.fr/willow/events/cvml2013/materials/practicals/category-level/practical-category-recognition-2013a-data-only.tar.gz
Resolving www.di.ens.fr (www.di.ens.fr)... 129.199.99.14
Connecting to www.di.ens.fr (www.di.ens.fr)|129.199.99.14|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.di.ens.fr/willow/events/cvml2013/materials/practicals/category-level/practical-category-recognition-2013a-data-only.tar.gz [following]
--2025-04-09 05:17:17--  https://www.di.ens.fr/willow/events/cvml2013/materials/practicals/category-level/practical-category-recognition-2013a-data-only.tar.gz
Connecting to www.di.ens.fr (www.di.ens.fr)|129.199.99.14|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘practical-category-recognition-2013a-data-only.tar.gz’

practical-category-     [                <=> ] 964.15M  4.1

In [None]:
category = ['aeroplane', 'horse', 'motorbike']
data_dir = os.path.join(os.environ["CS_DATA_DIR"], "practical-category-recognition-2013a", "data")

## Step 2: Bag of Visual Words (BoVW) Construction

### 2-1: SIFT descriptor extraction & Saving the descriptors

In [None]:
def SIFT_extraction(imgs):
    """
    Extract local SIFT descriptors from images using cyvlfeat.sift.sift().
    See https://github.com/menpo/cyvlfeat

    :param imgs(numpy.array): Grayscale images as a NumPy array. shape:[num_images, height, width]
    :return(numpy.array): SIFT descriptors. shape:[num_images, ], **object ndarray holding one descriptor array per image**
    """
    descriptors = []
    for i in range(len(imgs)):
        frame, descriptor = cyvlfeat.sift.sift(imgs[i], compute_descriptor=True, float_descriptors=True)
        descriptors.append(descriptor)
    return np.array(descriptors, dtype=object)

### 2-2: Codebook(Bag of Visual Words) construction
In this step, we will construct the codebook using K-means clustering.
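
To make the clustering step concrete, here is a minimal sketch of k-means (plain Lloyd iterations) in NumPy. It is a toy stand-in for `cyvlfeat.kmeans.kmeans`, not a description of how that library is implemented; `toy_kmeans`, its parameters, and the synthetic blobs are assumptions made for illustration, and the dense distance computation only suits small inputs.

```python
import numpy as np

def toy_kmeans(data, k, num_iters=20, seed=0):
    """Plain Lloyd iterations; returns the [k, d] cluster centres."""
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct data points.
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(num_iters):
        # Assign every descriptor to its nearest centre.
        d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # Move each centre to the mean of its assigned descriptors.
        for c in range(k):
            members = data[assign == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return centers

# Two synthetic blobs standing in for the stacked 128-D SIFT descriptors.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(50, 128))
blob_b = rng.normal(loc=5.0, scale=0.1, size=(50, 128))
descriptors = np.concatenate([blob_a, blob_b], axis=0).astype(np.float32)

codebook = toy_kmeans(descriptors, k=2)
print(codebook.shape)
```

Each row of the returned array plays the role of one visual word, matching the `[k, 128]` codebook shape used in this notebook.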

In [None]:
def get_codebook(des, k):
    """
    Construct the codebook of visual words using k-means clustering.
    In this step, we use cyvlfeat.kmeans.kmeans().
    See https://github.com/menpo/cyvlfeat

    :param des(numpy.array): Descriptors. shape:[num_images, ]
    :param k(int): Number of visual words.
    :return(numpy.array): Codebook of visual words. shape:[k, 128]
    """
    descriptors = np.concatenate(des, axis = 0)
    codebook = cyvlfeat.kmeans.kmeans(descriptors, k)
    return codebook

### 2-3: Encoding images to histogram feature based on codewords
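
Encoding assigns each descriptor to its nearest codeword and counts the assignments per codeword. The sketch below shows this hard-assignment step on a toy example, using `np.bincount` in place of an explicit counting loop; `bovw_histogram` and the 2-D toy data are illustrative names and values, not part of the notebook.

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Count how many descriptors fall on each codeword (hard assignment)."""
    # Squared distances between every descriptor and every codeword: [n, k].
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # [n] codeword index per descriptor
    return np.bincount(nearest, minlength=len(codebook))

# Toy 2-D "descriptors" and a 3-word codebook (real SIFT descriptors are 128-D).
codebook = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
descriptors = np.array([[0.5, 0.2], [9.0, 1.0], [0.1, 0.3], [1.0, 9.5]])

hist = bovw_histogram(descriptors, codebook)
print(hist)  # [2 1 1]
```

Two descriptors lie closest to the first codeword and one to each of the others, so the image is represented by the histogram `[2, 1, 1]`.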

In [None]:
def extract_features(des, codebook):
    """
    Construct the Bag-of-Visual-Words histogram features for images using the codebook.

    :param des(numpy.array): Descriptors. shape:[num_images,]
    :param codebook(numpy.array): Codebook of visual words. shape:[k, 128]
    :return(numpy.array): Histogram features. shape:[num_images, k]
    """
    histogram = np.zeros(shape=(len(des), len(codebook)))
    for i in range(len(des)):
        descriptors = des[i]
        distances = euclidean_dist(codebook, descriptors)
        index = np.argmin(distances, axis = 0)
        for j in index:
          histogram[i][j] += 1
    return histogram

## Step 3: Training the classifiers
Training a classifier using the sklearn library (SVC)
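
Each category gets its own binary classifier: an image is a positive example if its id appears in that class's list and a negative example otherwise. The following is a minimal sketch of that labelling and of fitting `sklearn.svm.SVC` on toy features; the image ids, class list, and 2-bin histograms are made up for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical image ids and the ids listed for one class.
image_ids = np.array(["img_a", "img_b", "img_c", "img_d", "img_e", "img_f"])
horse_ids = np.array(["img_b", "img_d", "img_f"])

# True where the image belongs to the class, False otherwise.
labels = np.isin(image_ids, horse_ids)

# Toy 2-bin histograms: positives have more mass in the second bin.
features = np.array([[5.0, 0.0], [0.0, 5.0], [4.0, 1.0],
                     [1.0, 4.0], [5.0, 1.0], [1.0, 5.0]])

clf = SVC(C=1, kernel="linear")
clf.fit(features, labels)
print(labels)
print(clf.score(features, labels))
```

The toy data is linearly separable, so the training accuracy here says nothing about generalisation; the validation step later in the notebook is what measures that.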

In [None]:
from sklearn.svm import SVC

In [None]:
def train_classifier(features, labels, svm_params):
    """
    Train the SVM classifier using sklearn.svm.SVC.
    See https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

    :param features(numpy.array): Histogram representation. shape:[num_images, dim_feature]
    :param labels(numpy.array): Target label(binary). shape:[num_images,]
    :param svm_params(dict): parameters for classifier training.
        ['C'](float): Regularization parameter.
        ['kernel'](str): Specifies the kernel type to be used in the algorithm.
    :return(sklearn.svm.SVC): Trained classifier
    """
    Classifier = SVC(**svm_params)
    Classifier.fit(features, labels)
    return Classifier

In [None]:
def Trainer(feat_params, svm_params):
    """
    Train one binary SVM classifier per category.

    :param feat_params(dict): parameters for feature extraction.
        ['extractor'](callable): function that extracts local descriptors (e.g. SIFT_extraction, DenseSIFT_extraction).
        ['num_codewords'](int): Number of visual words in the codebook.
        ['result_dir'](str): Directory to save codebooks & results.

    :param svm_params(dict): parameters for classifier training.
        ['C'](float): Regularization parameter.
        ['kernel'](str): Specifies the kernel type to be used in the algorithm.

    :return(dict): trained classifiers (sklearn.svm.SVC), keyed by class name
    """

    extractor = feat_params['extractor']
    k = feat_params['num_codewords']
    result_dir = feat_params['result_dir']

    if not os.path.isdir(result_dir):
        os.mkdir(result_dir)

    print("Load the training data...")
    start_time = time.time()
    train_imgs, train_idxs = load_train_data(data_dir)
    print("{:.4f} seconds".format(time.time()-start_time))

    try:
      train_des = np.load(os.path.join(result_dir, 'train_des.npy'),
                          allow_pickle=True)
      print("Successfully loaded the local descriptors")
    except OSError:  # no cached descriptors yet
      print("Extract the local descriptors...")
      start_time = time.time()
      train_des = extractor(train_imgs)
      np.save(os.path.join(result_dir, 'train_des.npy'), train_des)
      print("{:.4f} seconds".format(time.time()-start_time))

    if train_des.dtype not in [np.float32, np.float64]:
      try:
        train_des = train_des.astype(np.float32)
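      except (ValueError, TypeError):
        # Ragged per-image descriptors cannot be cast as one array; keep the object array.
        pass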

    del train_imgs

    try:
      codebook = np.load(os.path.join(result_dir, 'codebook.npy'),
                          allow_pickle=True)
      print("Successfully loaded the bag of visual words")
    except OSError:  # no cached codebook yet
      print("Construct the bag of visual words...")
      start_time = time.time()
      codebook = get_codebook(train_des, k)
      np.save(os.path.join(result_dir, 'codebook.npy'), codebook)
      print("{:.4f} seconds".format(time.time()-start_time))

    print("Extract the image features...")
    start_time = time.time()
    train_features = extract_features(train_des, codebook)
    np.save(os.path.join(result_dir, 'train_features.npy'), train_features)
    print("{:.4f} seconds".format(time.time()-start_time))

    del train_des, codebook

    print('Train the classifiers...')
    accuracy = 0
    models = {}

    for class_name in tqdm(category):
        target_idxs = np.array([read_txt(os.path.join(data_dir, '{}_train.txt'.format(class_name)))])
        target_labels = get_labels(train_idxs, target_idxs)

        models[class_name] = train_classifier(train_features, target_labels, svm_params)
        train_accuracy = models[class_name].score(train_features, target_labels)
        print('{} Classifier train accuracy:  {:.4f}'.format(class_name, train_accuracy))
        accuracy += train_accuracy

    print('Average train accuracy: {:.4f}'.format(accuracy/len(category)))
    del train_features, target_labels, target_idxs

    return models

In [None]:
feat_params = {'extractor': SIFT_extraction, 'num_codewords':1024, 'result_dir':os.path.join(data_dir,'sift_1024')}
svm_params = {'C': 1, 'kernel': 'linear'}

- The cell below takes about 2 to 10 minutes to run.

In [None]:
models = Trainer(feat_params, svm_params)

Load the training data...
0 images missing: []/371
0 images missing: []/399
2.9784 seconds
Successfully loaded the local descriptors
Successfully loaded the bag of visual words
Extract the image features...
2.3665 seconds
Train the classifiers...


 33%|███▎      | 1/3 [00:00<00:00,  8.44it/s]

aeroplane Classifier train accuracy:  1.0000


100%|██████████| 3/3 [00:00<00:00,  9.63it/s]

horse Classifier train accuracy:  1.0000
motorbike Classifier train accuracy:  1.0000
Average train accuracy: 1.0000





## Step 4: Testing the classifier on validation set



In [None]:
def Test(feat_params, models):
    """
    Test the SVM classifiers on the validation set.

    :param feat_params(dict): parameters for feature extraction.
        ['extractor'](callable): function that extracts local descriptors (e.g. SIFT_extraction, DenseSIFT_extraction).
        ['num_codewords'](int): Number of visual words in the codebook.
        ['result_dir'](str): Directory to load codebooks & save results.

    :param models(dict): dict of classifiers(sklearn.svm.SVC)
    """

    extractor = feat_params['extractor']
    k = feat_params['num_codewords']
    result_dir = feat_params['result_dir']

    print("Load the validation data...")
    start_time = time.time()
    val_imgs, val_idxs = load_val_data(data_dir)
    print("{:.4f} seconds".format(time.time()-start_time))

    try:
      val_des = np.load(os.path.join(result_dir, 'val_des.npy'),
                        allow_pickle=True)
    except OSError:  # no cached descriptors yet
      print("Extract the local descriptors...")
      start_time = time.time()
      val_des = extractor(val_imgs)
      np.save(os.path.join(result_dir, 'val_des.npy'), val_des)
      print("{:.4f} seconds".format(time.time()-start_time))

    if val_des.dtype not in [np.float32, np.float64]:
      try:
        val_des = val_des.astype(np.float32)
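      except (ValueError, TypeError):
        # Ragged per-image descriptors cannot be cast as one array; keep the object array.
        pass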

    del val_imgs
    codebook = np.load(os.path.join(result_dir, 'codebook.npy'),
                       allow_pickle=True)

    print("Extract the image features...")
    start_time = time.time()

    val_features = extract_features(val_des, codebook)
    np.save(os.path.join(result_dir, 'val_features.npy'), val_features)
    print("{:.4f} seconds".format(time.time()-start_time))

    del val_des, codebook

    print('Test the classifiers...')
    accuracy = 0
    for class_name in tqdm(category):
        target_idxs = np.array([read_txt(os.path.join(data_dir, '{}_val.txt'.format(class_name)))])
        target_labels = get_labels(val_idxs, target_idxs)

        val_accuracy = models[class_name].score(val_features, target_labels)
        print('{} Classifier validation accuracy:  {:.4f}'.format(class_name, val_accuracy))
        accuracy += val_accuracy

    del val_features, target_idxs, target_labels

    print('Average validation accuracy: {:.4f}'.format(accuracy/len(category)))

In [None]:
Test(feat_params, models)

Load the validation data...
0 images missing: []/371
0 images missing: []/399
2.9411 seconds
Extract the image features...
2.0054 seconds
Test the classifiers...


 67%|██████▋   | 2/3 [00:00<00:00, 13.56it/s]

aeroplane Classifier validation accuracy:  0.7970
horse Classifier validation accuracy:  0.6491


100%|██████████| 3/3 [00:00<00:00, 14.95it/s]

motorbike Classifier validation accuracy:  0.7168
Average validation accuracy: 0.7210





## Step 5: Implementing Dense SIFT
We replace the feature extractor with dense SIFT and evaluate the performance.
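
Dense SIFT computes a descriptor at every node of a regular grid rather than only at detected keypoints, so every image yields the same number of descriptors. The sketch below only illustrates how such a grid can be laid out; `dense_grid` and the `margin` value are hypothetical, and the exact grid bounds that `cyvlfeat.sift.dsift` uses are not reproduced here.

```python
import numpy as np

def dense_grid(height, width, step, margin):
    """Return an [n, 2] array of (y, x) sample centres on a regular grid."""
    ys = np.arange(margin, height - margin, step)
    xs = np.arange(margin, width - margin, step)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([yy.ravel(), xx.ravel()], axis=1)

# 240x240 matches the resize in read_img and step=12 matches the dsift call;
# margin=6 is a hypothetical value chosen only for this illustration.
grid = dense_grid(240, 240, step=12, margin=6)
print(grid.shape)  # (361, 2)
```

A smaller `step` gives a denser grid and more descriptors per image, at the cost of longer extraction and encoding times.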

In [None]:
def DenseSIFT_extraction(imgs):
    """
    Extract dense SIFT descriptors from images using cyvlfeat.sift.dsift().
    See https://github.com/menpo/cyvlfeat

    :param imgs(numpy.array): Grayscale images as a NumPy array. shape:[num_images, height, width]
    :return(numpy.array): Dense SIFT descriptors. shape:[num_images, num_des_of_each_img, 128]
    """
    descriptors = []
    for i in range(len(imgs)):
        frame, descriptor = cyvlfeat.sift.dsift(imgs[i], step = 12, float_descriptors=True)
        descriptors.append(descriptor)
    return np.array(descriptors, dtype=object)

In [None]:
feat_params = {'extractor': DenseSIFT_extraction, 'num_codewords':1024, 'result_dir':os.path.join(data_dir,'dsift_1024')}
svm_params = {'C': 1, 'kernel': 'linear'}

In [None]:
models = Trainer(feat_params, svm_params)

Load the training data...
0 images missing: []/371
0 images missing: []/399
2.4614 seconds
Successfully loaded the local descriptors
Successfully loaded the bag of visual words
Extract the image features...
2.0652 seconds
Train the classifiers...


 67%|██████▋   | 2/3 [00:00<00:00, 12.59it/s]

aeroplane Classifier train accuracy:  1.0000
horse Classifier train accuracy:  1.0000


100%|██████████| 3/3 [00:00<00:00, 11.99it/s]

motorbike Classifier train accuracy:  1.0000
Average train accuracy: 1.0000





In [None]:
Test(feat_params, models)

Load the validation data...
0 images missing: []/371
0 images missing: []/399
3.7039 seconds
Extract the image features...
2.6293 seconds
Test the classifiers...


100%|██████████| 3/3 [00:00<00:00, 24.49it/s]

aeroplane Classifier validation accuracy:  0.8596
horse Classifier validation accuracy:  0.7343
motorbike Classifier validation accuracy:  0.7393
Average validation accuracy: 0.7778





## Step 6: Implementing the Spatial Pyramid
We replace the plain histogram encoding with spatial pyramid matching and evaluate the performance.
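
A spatial pyramid divides the image into progressively finer grids (1x1, 2x2, 4x4, ...) and concatenates one codeword histogram per grid cell, so the feature keeps coarse layout information. The sketch below is one position-based way to build such a feature. It assumes the (y, x) location of each descriptor is available and applies no per-level weights, and it differs from the `SpatialPyramid` function in the next cell, which slices the descriptor array by index. All names here are illustrative.

```python
import numpy as np

def spatial_pyramid(words, positions, image_size, k, num_levels=3):
    """Concatenate per-cell codeword histograms over 1x1, 2x2, 4x4, ... grids.

    words:      [n] codeword index of each descriptor
    positions:  [n, 2] (y, x) location of each descriptor in pixels
    image_size: (height, width)
    """
    height, width = image_size
    parts = []
    for level in range(num_levels):
        cells = 2 ** level
        # Grid cell of every descriptor at this level.
        row = np.minimum((positions[:, 0] * cells / height).astype(int), cells - 1)
        col = np.minimum((positions[:, 1] * cells / width).astype(int), cells - 1)
        cell = row * cells + col
        # One k-bin histogram per cell, indexed jointly by (cell, word).
        hist = np.bincount(cell * k + words, minlength=cells * cells * k)
        parts.append(hist)
    return np.concatenate(parts)

# Four descriptors in a 240x240 image, assigned to a 2-word codebook.
words = np.array([0, 1, 1, 0])
positions = np.array([[10, 10], [10, 200], [200, 10], [200, 200]])
feat = spatial_pyramid(words, positions, (240, 240), k=2, num_levels=2)
print(feat)  # [2 2 1 0 0 1 0 1 1 0]
```

The first two entries are the whole-image histogram; the remaining eight are the four 2x2 cells in row-major order, each with two bins.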


In [None]:
def SpatialPyramid(des, codebook):
    """
    Extracting image representation with Spatial Pyramid Matching using the DenseSIFT descriptors & codebook.

    :param des: numpy.array, DenseSIFT Descriptors.  Shape: [num_images, num_des_of_each_img, 128]
    :param codebook: numpy.array, Bag of visual words. Shape: [k, 128]
    :return: numpy.array, Image feature using Spatial Pyramid Matching. Shape: [num_images, features_dim]
    """
    # Count the pyramid cells over the 1x1, 2x2, 4x4 and 8x8 grids (1 + 4 + 16 + 64 = 85).
    features_dim = 0
    dimension = 1
    while dimension != 16:
        features_dim += dimension ** 2
        dimension = dimension * 2

    histogram = np.zeros((len(des), len(codebook) * features_dim))

    for i in range(len(des)):
        descriptors = np.array(des[i], dtype=float)
        num_des, dim = descriptors.shape
        Normalizer = 1
        level = 0
        while Normalizer != 16:
            for j in range(Normalizer):
                for k in range(Normalizer):
                    cell_descriptors = descriptors[j*(descriptors.shape[0]//Normalizer): (j + 1) * (descriptors.shape[0]//Normalizer), k * (descriptors.shape[1]//Normalizer) : (k + 1) * (descriptors.shape[1]//Normalizer)]

                    distances = euclidean_dist(cell_descriptors, codebook[:, k *  (descriptors.shape[1]//Normalizer): (k + 1) * (descriptors.shape[1]//Normalizer)])
                    indices = np.argmin(distances, axis=1)
                    cell_hist = np.bincount(indices, minlength=len(codebook)) / Normalizer

                    offset = 0
                    for l in range(level):
                        offset += (2 ** l) ** 2 * len(codebook)
                    cell_index = j * Normalizer + k
                    pos = offset + cell_index * len(codebook)
                    histogram[i, pos : pos + len(codebook)] += cell_hist
            Normalizer = Normalizer * 2
            level += 1
    return histogram


In [None]:
def SP_Trainer(feat_params, svm_params):
    """
    Train one binary SVM classifier per category on spatial pyramid features.

    :param feat_params(dict): parameters for feature extraction.
        ['extractor'](callable): function that extracts local descriptors (e.g. SIFT_extraction, DenseSIFT_extraction).
        ['num_codewords'](int): Number of visual words in the codebook.
        ['result_dir'](str): Directory to save codebooks & results.

    :param svm_params(dict): parameters for classifier training.
        ['C'](float): Regularization parameter.
        ['kernel'](str): Specifies the kernel type to be used in the algorithm.

    :return(dict): trained classifiers (sklearn.svm.SVC), keyed by class name
    """

    extractor = feat_params['extractor']
    k = feat_params['num_codewords']
    result_dir = feat_params['result_dir']

    if not os.path.isdir(result_dir):
        os.mkdir(result_dir)

    print("Load the training data...")
    start_time = time.time()
    train_imgs, train_idxs = load_train_data(data_dir)
    print("{:.4f} seconds".format(time.time()-start_time))

    print("Extract the local descriptors...")
    start_time = time.time()
    #train_des = extractor(train_imgs)
    #np.save(os.path.join(result_dir, 'train_des.npy'), train_des)
    train_des = np.load(os.path.join(result_dir, 'train_des.npy'), allow_pickle=True)
    print("{:.4f} seconds".format(time.time()-start_time))

    del train_imgs

    if train_des.dtype not in [np.float32, np.float64]:
      try:
        train_des = train_des.astype(np.float32)
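      except (ValueError, TypeError):
        # Ragged per-image descriptors cannot be cast as one array; keep the object array.
        pass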

    print("Construct the bag of visual words...")
    start_time = time.time()
    codebook = np.load(os.path.join(result_dir, 'codebook.npy'),
                       allow_pickle=True)
    print("{:.4f} seconds".format(time.time()-start_time))

    print("Extract the image features...")
    start_time = time.time()
    train_features = SpatialPyramid(train_des, codebook)
    np.save(os.path.join(result_dir, 'train_features.npy'), train_features)
    print("{:.4f} seconds".format(time.time()-start_time))

    del train_des, codebook

    print('Train the classifiers...')
    accuracy = 0
    models = {}

    for class_name in tqdm(category):
        target_idxs = np.array([read_txt(os.path.join(data_dir, '{}_train.txt'.format(class_name)))])
        target_labels = get_labels(train_idxs, target_idxs)

        models[class_name] = train_classifier(train_features, target_labels, svm_params)
        train_accuracy = models[class_name].score(train_features, target_labels)
        print('{} Classifier train accuracy:  {:.4f}'.format(class_name, train_accuracy))
        accuracy += train_accuracy

    print('Average train accuracy: {:.4f}'.format(accuracy/len(category)))
    del train_features, target_labels, target_idxs

    return models


In [None]:
def SP_Test(feat_params, models):
    """
    Test the SVM classifier.

    :param feat_params(dict): parameters for feature extraction.
        ['extractor'](function pointer): function for extrat local descriptoers. (e.g. SIFT_extraction, DenseSIFT_extraction, etc)
        ['num_codewords'](int):
        ['result_dir'](str): Diretory to load codebooks & save results.

    :param models(dict): dict of classifiers(sklearn.svm.SVC)
    """

    extractor = feat_params['extractor']
    k = feat_params['num_codewords']
    result_dir = feat_params['result_dir']

    print("Load the validation data...")
    start_time = time.time()
    val_imgs, val_idxs = load_val_data(data_dir)
    print("{:.4f} seconds".format(time.time()-start_time))

    print("Extract the local descriptors...")
    start_time = time.time()
    val_des = extractor(val_imgs)
    np.save(os.path.join(result_dir, 'val_des.npy'), val_des)
    print("{:.4f} seconds".format(time.time()-start_time))

    if val_des.dtype not in [np.float32, np.float64]:
      try:
        val_des = val_des.astype(np.float32)
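      except (ValueError, TypeError):
        # Ragged per-image descriptors cannot be cast as one array; keep the object array.
        pass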

    del val_imgs
    codebook = np.load(os.path.join(result_dir, 'codebook.npy'),
                       allow_pickle=True)

    print("Extract the image features...")
    start_time = time.time()
    val_features = SpatialPyramid(val_des, codebook)
    np.save(os.path.join(result_dir, 'val_features.npy'), val_features)
    print("{:.4f} seconds".format(time.time()-start_time))

    del val_des, codebook

    print('Test the classifiers...')
    accuracy = 0
    for class_name in tqdm(category):
        target_idxs = np.array([read_txt(os.path.join(data_dir, '{}_val.txt'.format(class_name)))])
        target_labels = get_labels(val_idxs, target_idxs)

        val_accuracy = models[class_name].score(val_features, target_labels)
        print('{} Classifier validation accuracy:  {:.4f}'.format(class_name, val_accuracy))
        accuracy += val_accuracy

    del val_features, target_idxs, target_labels

    print('Average validation accuracy: {:.4f}'.format(accuracy/len(category)))

In [None]:
feat_params = {'extractor': DenseSIFT_extraction, 'num_codewords':1024, 'result_dir':os.path.join(data_dir,'dsift_1024')}
svm_params = {'C': 1, 'kernel': 'linear'}
models = SP_Trainer(feat_params, svm_params)
SP_Test(feat_params, models)

Load the training data...
0 images missing: []/371
0 images missing: []/399
2.3087 seconds
Extract the local descriptors...
0.2613 seconds
Construct the bag of visual words...
0.0029 seconds
Extract the image features...
50.0738 seconds
Train the classifiers...


 33%|███▎      | 1/3 [00:31<01:03, 31.88s/it]

aeroplane Classifier train accuracy:  1.0000


 67%|██████▋   | 2/3 [00:57<00:28, 28.23s/it]

horse Classifier train accuracy:  1.0000


100%|██████████| 3/3 [01:24<00:00, 28.06s/it]

motorbike Classifier train accuracy:  1.0000
Average train accuracy: 1.0000
Load the validation data...
0 images missing: []/371
0 images missing: []/399





2.4698 seconds
Extract the local descriptors...
23.0163 seconds
Extract the image features...
61.5619 seconds
Test the classifiers...


 33%|███▎      | 1/3 [00:06<00:12,  6.13s/it]

aeroplane Classifier validation accuracy:  0.8822


 67%|██████▋   | 2/3 [00:33<00:18, 18.67s/it]

horse Classifier validation accuracy:  0.8170


100%|██████████| 3/3 [00:46<00:00, 15.48s/it]

motorbike Classifier validation accuracy:  0.8020
Average validation accuracy: 0.8338





## Step 7: Classification using non-linear SVM
We replace the linear SVM with a non-linear SVM (sigmoid kernel) and evaluate the performance.
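
A non-linear SVM replaces the dot product between feature vectors with a kernel function. scikit-learn documents its sigmoid kernel as tanh(γ⟨x, x'⟩ + r), where r is `coef0`, and documents `gamma='scale'` as 1 / (n_features · X.var()). The sketch below computes that kernel matrix directly in NumPy for a few toy histograms; `sigmoid_kernel` and the feature values are illustrative only.

```python
import numpy as np

def sigmoid_kernel(a, b, gamma, coef0=0.0):
    """Gram matrix K[i, j] = tanh(gamma * <a_i, b_j> + coef0)."""
    return np.tanh(gamma * (a @ b.T) + coef0)

# Toy histogram features for three images.
feats = np.array([[2.0, 0.0, 1.0],
                  [0.0, 3.0, 1.0],
                  [1.0, 1.0, 1.0]])

# Same formula that scikit-learn documents for gamma='scale'.
gamma = 1.0 / (feats.shape[1] * feats.var())
gram = sigmoid_kernel(feats, feats, gamma)
print(gram.shape, np.allclose(gram, gram.T))
```

Because tanh saturates, large dot products between unnormalised histograms all map to values near 1, which is one reason the choice of `gamma` and feature scaling matters for this kernel.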


In [None]:
feat_params = {'extractor': DenseSIFT_extraction, 'num_codewords':1024, 'result_dir':os.path.join(data_dir,'dsift_1024')}
svm_params = {'C': 1, 'kernel': 'sigmoid'}

models = Trainer(feat_params, svm_params)
Test(feat_params, models)

Load the training data...
0 images missing: []/371
0 images missing: []/399
2.4460 seconds
Successfully loaded the local descriptors
Successfully loaded the bag of visual words
Extract the image features...
2.2834 seconds
Train the classifiers...


 67%|██████▋   | 2/3 [00:00<00:00, 11.09it/s]

aeroplane Classifier train accuracy:  0.9084
horse Classifier train accuracy:  0.8868


100%|██████████| 3/3 [00:00<00:00, 10.93it/s]

motorbike Classifier train accuracy:  0.8571
Average train accuracy: 0.8841
Load the validation data...





0 images missing: []/371
0 images missing: []/399
4.0932 seconds
Extract the image features...
2.3811 seconds
Test the classifiers...


100%|██████████| 3/3 [00:00<00:00, 23.10it/s]

aeroplane Classifier validation accuracy:  0.8471
horse Classifier validation accuracy:  0.8045
motorbike Classifier validation accuracy:  0.7494
Average validation accuracy: 0.8003



