# Baseline
# Traditional Features with SVMs

## Introduction

In this notebook, we extract traditional features using the [`skimage`](https://scikit-image.org/) package: HOG from RGB images and LBP from infrared and depth images. We then use support vector machines (SVMs) with the one-vs-one scheme implemented in [`sklearn`](https://scikit-learn.org/) to build a multiclass classifier.
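
To make the two descriptors concrete, here is a minimal sketch of HOG and LBP extraction with `skimage` on a single synthetic grayscale frame. The helper names, the parameter values (cell size, number of LBP points, radius) and the grayscale input are assumptions chosen for illustration; the actual settings live in `features/extractors.py` and may differ.

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern


def hog_descriptor(gray_image):
    """HOG descriptor of a 2-D grayscale image (illustrative parameters)."""
    return hog(
        gray_image,
        orientations=9,
        pixels_per_cell=(8, 8),
        cells_per_block=(2, 2),
        block_norm="L2-Hys",
        feature_vector=True,
    )


def lbp_histogram(gray_image, n_points=8, radius=1):
    """Normalized histogram of uniform LBP codes (illustrative parameters)."""
    codes = local_binary_pattern(gray_image, P=n_points, R=radius, method="uniform")
    n_bins = n_points + 2  # uniform LBP produces n_points + 2 distinct codes
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist / hist.sum()


# Synthetic 64x64 frame standing in for a real image (assumption).
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

hog_vec = hog_descriptor(frame)
lbp_vec = lbp_histogram(frame)
print(hog_vec.shape, lbp_vec.shape)
```

HOG yields a long vector that encodes local gradient structure, while the LBP histogram is a short texture summary, which is one reason the two descriptors suit different modalities.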

For the datasets and classification goals, please refer to "Topic Three" in the README of the [Gitee repository](https://gitee.com/guqingxiang/Pattern_recognition_dataset_download/blob/main/README.md) or the [GitHub repository](https://github.com/qingxiangjia/Pattern_recognition_dataset_download/blob/main/README.md).

In [1]:
# used for data loading
from utils.data_loader import DataLoader

# used for feature extraction
import features.extractors as extractors

# used for model building
from models.svm import SVMModel

# used for late fusion
from models.late_fusion import LateFusion

# used for model pipeline
from utils.pipeline import TrainingPipeline

# other imports
import numpy as np

## 1 Load the data and Extract Features

We will use `DataLoader` in [`utils/data_loader.py`](utils/data_loader.py) to load data and `FeatureExtractor` in [`features/extractors.py`](features/extractors.py) to extract features.

Note that `DataLoader` randomly splits the dataset into a training set (375 samples) and a validation set (125 samples).
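
As a rough illustration of such a split, the sketch below shuffles sample numbers and cuts them into 375 training and 125 validation IDs. The function name `random_split`, the fixed seed and the 1-based numbering are assumptions; the real logic is in `utils/data_loader.py`.

```python
import numpy as np


def random_split(n_samples, n_train, seed=0):
    """Randomly split sample numbers 1..n_samples into train and validation IDs."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples) + 1  # assume sample numbers start at 1
    train_ids = np.sort(shuffled[:n_train])
    val_ids = np.sort(shuffled[n_train:])
    return train_ids, val_ids


train_ids, val_ids = random_split(n_samples=500, n_train=375)
print(len(train_ids), len(val_ids))
```

The training sample lists printed in this notebook differ from run to run, so fixing a seed as in this sketch would be one way to compare experiments on the same split.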

In [2]:
data_loader = DataLoader()
feature_extractor = extractors.FeatureExtractor(data_loader)

Samples that are used for training have those numbers:  [1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 37, 38, 39, 41, 42, 43, 45, 46, 47, 48, 53, 54, 55, 56, 58, 59, 60, 61, 62, 63, 64, 65, 66, 68, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 86, 87, 90, 92, 94, 95, 97, 99, 100, 101, 102, 104, 105, 106, 107, 109, 110, 112, 113, 115, 116, 117, 119, 120, 121, 122, 123, 124, 126, 127, 128, 129, 130, 133, 134, 136, 137, 138, 139, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 167, 169, 170, 171, 172, 173, 176, 177, 179, 180, 181, 183, 184, 185, 186, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 206, 207, 208, 209, 210, 211, 213, 214, 216, 217, 221, 222, 223, 224, 225, 226, 227, 228, 230, 231, 232, 233, 234, 235, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 248, 249, 250, 252, 253, 254, 255, 257, 258, 260, 261, 262, 263,

Then, we extract features separately from the RGB, infrared and depth images. We will try different fusion strategies later.

In [3]:
# extract features for training set
rgb_train_features, depth_train_features, infrared_train_features = feature_extractor.separate_fusion(set_type='train')

# extract features for validation set
rgb_val_features, depth_val_features, infrared_val_features = feature_extractor.separate_fusion(set_type='val')

Using `DataLoader`, we also load the labels for training set and validation set.

In [4]:
# load labels for training set
train_labels = data_loader.get_train_labels()

# load labels for validation set
val_labels = data_loader.get_val_labels()

## 2 Train the SVMs

We will use `SVMModel` in [`models/svm.py`](models/svm.py) to train a multiclass classifier for each of the three modalities. Below we use the linear kernel.

In [5]:
# train SVM model for RGB modality
rgb_svm_model = SVMModel()
rgb_svm_model.train(rgb_train_features, train_labels)

# train SVM model for depth modality
depth_svm_model = SVMModel()
depth_svm_model.train(depth_train_features, train_labels)

# train SVM model for infrared modality
infrared_svm_model = SVMModel()
infrared_svm_model.train(infrared_train_features, train_labels)
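
For readers without access to `models/svm.py`, the following sketch shows what a thin wrapper with the same method names could look like on top of `sklearn.svm.SVC`. The class `SimpleSVM` is hypothetical and is not the author's implementation; `probability=True` is assumed because the notebook later asks the models for class probabilities. `SVC` handles multiclass problems with a one-vs-one scheme internally.

```python
import numpy as np
from sklearn.svm import SVC


class SimpleSVM:
    """Hypothetical stand-in for SVMModel; not the code in models/svm.py."""

    def __init__(self, C=1.0, kernel="linear", gamma="scale"):
        # probability=True enables predict_proba via internal calibration.
        self.clf = SVC(C=C, kernel=kernel, gamma=gamma,
                       probability=True, random_state=0)

    def train(self, X, y):
        self.clf.fit(X, y)

    def predict(self, X):
        return self.clf.predict(X)

    def probability(self, X):
        return self.clf.predict_proba(X)

    def evaluate(self, X, y):
        # Fraction of samples whose predicted label matches the true label.
        return float(np.mean(self.predict(X) == y))


# Synthetic, well-separated 3-class data standing in for real features.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])
y = np.repeat([0, 1, 2], 30)

model = SimpleSVM()
model.train(X, y)
train_acc = model.evaluate(X, y)
proba = model.probability(X)
print("train accuracy:", train_acc, "probability shape:", proba.shape)
```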

We assess those SVMs on the training and validation set afterwards.

In [6]:
# evaluate SVM model for RGB modality on training set and on validation set
rgb_training_accuracy = rgb_svm_model.evaluate(rgb_train_features, train_labels)
print("Training Accuracy (RGB modality): {:.2f}%".format(rgb_training_accuracy * 100))
rgb_validation_accuracy = rgb_svm_model.evaluate(rgb_val_features, val_labels)
print("Validation Accuracy (RGB modality): {:.2f}%".format(rgb_validation_accuracy * 100))

# evaluate SVM model for depth modality on training set and on validation set
depth_training_accuracy = depth_svm_model.evaluate(depth_train_features, train_labels)
print("Training Accuracy (Depth modality): {:.2f}%".format(depth_training_accuracy * 100))
depth_validation_accuracy = depth_svm_model.evaluate(depth_val_features, val_labels)
print("Validation Accuracy (Depth modality): {:.2f}%".format(depth_validation_accuracy * 100))

# evaluate SVM model for infrared modality on training set and on validation set
infrared_training_accuracy = infrared_svm_model.evaluate(infrared_train_features, train_labels)
print("Training Accuracy (Infrared modality): {:.2f}%".format(infrared_training_accuracy * 100))
infrared_validation_accuracy = infrared_svm_model.evaluate(infrared_val_features, val_labels)
print("Validation Accuracy (Infrared modality): {:.2f}%".format(infrared_validation_accuracy * 100))

Training Accuracy (RGB modality): 100.00%
Validation Accuracy (RGB modality): 95.20%
Training Accuracy (Depth modality): 94.93%
Validation Accuracy (Depth modality): 84.00%
Training Accuracy (Infrared modality): 94.93%
Validation Accuracy (Infrared modality): 87.20%


Save all the SVM models just in case.

In [7]:
rgb_svm_model.save_model("rgb_svm_model.pkl")
depth_svm_model.save_model("depth_svm_model.pkl")
infrared_svm_model.save_model("infrared_svm_model.pkl")

Moreover, we can tune `C` and the kernel function of the SVMs. The helper below searches over candidate `C` and `gamma` values (named `sigma` in the code) and returns the combination with the highest validation accuracy.

In [8]:
def dataset3params(X_train, y_train, X_val, y_val, candidate_C, candidate_sigma, kernel='linear'):
    best_C = None
    best_sigma = None
    best_accuracy = 0

    for C in candidate_C:
        for sigma in candidate_sigma:
            model = SVMModel(C=C, kernel=kernel, gamma=sigma)
            model.train(X_train, y_train)
            accuracy = model.evaluate(X_val, y_val)

            if accuracy > best_accuracy:
                best_accuracy = accuracy
                best_C = C
                best_sigma = sigma

    return best_C, best_sigma, best_accuracy

Below we try different values of `C` for the SVM with a linear kernel.

In [9]:
candidate = [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30]

rgb_best_C, nop, rgb_best_accuracy = dataset3params(rgb_train_features, train_labels, rgb_val_features, val_labels, candidate_C=candidate,
            candidate_sigma=['scale'],
            kernel='linear')

depth_best_C, nop, depth_best_accuracy = dataset3params(depth_train_features, train_labels, depth_val_features, val_labels, candidate_C=candidate,
            candidate_sigma=['scale'],
            kernel='linear')

infrared_best_C, nop, infrared_best_accuracy = dataset3params(infrared_train_features, train_labels, infrared_val_features, val_labels, candidate_C=candidate,
            candidate_sigma=['scale'],
            kernel='linear')

print("Best Validation Accuracy (RGB modality): {:.2f}% with C={}".format(rgb_best_accuracy * 100, rgb_best_C))
print("Best Validation Accuracy (Depth modality): {:.2f}% with C={}".format(depth_best_accuracy * 100, depth_best_C))
print("Best Validation Accuracy (Infrared modality): {:.2f}% with C={}".format(infrared_best_accuracy * 100, infrared_best_C))

Best Validation Accuracy (RGB modality): 96.00% with C=0.01
Best Validation Accuracy (Depth modality): 84.80% with C=0.01
Best Validation Accuracy (Infrared modality): 88.00% with C=0.3


What if we change our kernel to RBF?

```python
rgb_best_C, rgb_best_sigma, rgb_best_accuracy = dataset3params(rgb_train_features, train_labels, rgb_val_features, val_labels, candidate_C=candidate,
            candidate_sigma=candidate,
            kernel='rbf')

depth_best_C, depth_best_sigma, depth_best_accuracy = dataset3params(depth_train_features, train_labels, depth_val_features, val_labels, candidate_C=candidate,
            candidate_sigma=candidate,
            kernel='rbf')

infrared_best_C, infrared_best_sigma, infrared_best_accuracy = dataset3params(infrared_train_features, train_labels, infrared_val_features, val_labels, candidate_C=candidate,
            candidate_sigma=candidate,
            kernel='rbf')

print("Best Validation Accuracy (RGB modality with RBF): {:.2f}% with C={} and sigma={}".format(rgb_best_accuracy * 100, rgb_best_C, rgb_best_sigma))
print("Best Validation Accuracy (Depth modality with RBF): {:.2f}% with C={} and sigma={}".format(depth_best_accuracy * 100, depth_best_C, depth_best_sigma))
print("Best Validation Accuracy (Infrared modality with RBF): {:.2f}% with C={} and sigma={}".format(infrared_best_accuracy * 100, infrared_best_C, infrared_best_sigma))
```

It seems the RBF kernel doesn't perform well, so we will use the linear kernel instead.

## 3 Late Fusion

Now we try five different late fusion strategies using the `LateFusion` class in [`models/late_fusion.py`](models/late_fusion.py) to improve our validation accuracy. Below, we prepare all the variables needed for late fusion.

In [None]:
# Generate predictions on validation set
rgb_predictions = rgb_svm_model.predict(rgb_val_features)
depth_predictions = depth_svm_model.predict(depth_val_features)
infrared_predictions = infrared_svm_model.predict(infrared_val_features)
prediction_list = [rgb_predictions, depth_predictions, infrared_predictions]

# Generate probabilities on validation set
rgb_probabilities = rgb_svm_model.probability(rgb_val_features)
depth_probabilities = depth_svm_model.probability(depth_val_features)
infrared_probabilities = infrared_svm_model.probability(infrared_val_features)
probability_list = [rgb_probabilities, depth_probabilities, infrared_probabilities]

# Prepare accuracy list
accuracy_list = [rgb_training_accuracy, depth_training_accuracy, infrared_training_accuracy]

# Prepare prior weight list
weights = [0.4, 0.25, 0.35]  # weights for RGB, Depth, Infrared
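
To illustrate how these inputs are typically combined, here is a NumPy sketch of two of the strategies: majority voting over hard predictions and weighted averaging of class probabilities. The function bodies are assumptions written for illustration and are not taken from `models/late_fusion.py`; labels are assumed to be integers `0..n_classes-1` that match the probability columns.

```python
import numpy as np


def majority_vote(prediction_list, n_classes):
    """Pick the class predicted by most models (ties go to the lowest class index)."""
    preds = np.stack(prediction_list)           # (n_models, n_samples)
    n_samples = preds.shape[1]
    votes = np.zeros((n_samples, n_classes))
    for model_preds in preds:
        votes[np.arange(n_samples), model_preds] += 1
    return votes.argmax(axis=1)


def weighted_probability(probability_list, weights):
    """Average class probabilities with normalized per-model weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    probs = np.stack(probability_list)          # (n_models, n_samples, n_classes)
    fused = np.tensordot(w, probs, axes=1)      # (n_samples, n_classes)
    return fused.argmax(axis=1)


# Toy example: 3 models, 2 samples, 3 classes.
toy_predictions = [np.array([0, 2]), np.array([0, 1]), np.array([1, 1])]
toy_probabilities = [
    np.array([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]),
    np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]),
    np.array([[0.3, 0.6, 0.1], [0.3, 0.4, 0.3]]),
]

mv = majority_vote(toy_predictions, n_classes=3)
wp = weighted_probability(toy_probabilities, weights=[0.4, 0.25, 0.35])
print(mv, wp)
```

In the toy data the two strategies disagree on the second sample: two models vote for class 1, but the first model's confident probability for class 2 outweighs them once probabilities are averaged.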

Now we try five different methods and output the accuracy to see which of them is the best.

In [11]:
# Now, we can perform late fusion using the LateFusion class
late_fusion = LateFusion(n_classes=20)

# Perform majority vote late fusion
majority_vote_predictions = late_fusion.majority_vote(prediction_list)
majority_vote_accuracy = np.mean(majority_vote_predictions == val_labels)
print("Majority Vote Accuracy: {:.2f}%".format(majority_vote_accuracy * 100))

# Perform weighted vote based on accuracy late fusion
weighted_vote_predictions = late_fusion.weighted_vote(prediction_list, accuracy_list)
weighted_vote_accuracy = np.mean(weighted_vote_predictions == val_labels)
print("Weighted Vote Accuracy: {:.2f}%".format(weighted_vote_accuracy * 100))

# Perform average probability late fusion
average_probability_predictions = late_fusion.average_probability(probability_list)
average_probability_accuracy = np.mean(average_probability_predictions == val_labels)
print("Average Probability Accuracy: {:.2f}%".format(average_probability_accuracy * 100))

# Perform weighted probability late fusion
weighted_probability_predictions = late_fusion.weighted_probability(probability_list, accuracy_list)
weighted_probability_accuracy = np.mean(weighted_probability_predictions == val_labels)
print("Weighted Probability Accuracy: {:.2f}%".format(weighted_probability_accuracy * 100))

# Perform prior weighted probability late fusion
prior_weighted_probability_predictions = late_fusion.prior_weighted_probability(probability_list, weights)
prior_weighted_probability_accuracy = np.mean(prior_weighted_probability_predictions == val_labels)
print("Prior Weighted Probability Accuracy: {:.2f}%".format(prior_weighted_probability_accuracy * 100))

# Based on the results, we can choose the best late fusion strategy for our final model.
late_fusion_accuracies = [majority_vote_accuracy, weighted_vote_accuracy, average_probability_accuracy, weighted_probability_accuracy, prior_weighted_probability_accuracy]
late_fusion_methods = ['Majority Vote', 'Weighted Vote', 'Average Probability', 'Weighted Probability', 'Prior Weighted Probability']
best_index = np.argmax(late_fusion_accuracies)
print("Best Late Fusion Method: {} with Accuracy: {:.2f}%".format(late_fusion_methods[best_index], late_fusion_accuracies[best_index] * 100))

Majority Vote Accuracy: 93.60%
Weighted Vote Accuracy: 93.60%
Average Probability Accuracy: 93.60%
Weighted Probability Accuracy: 93.60%
Prior Weighted Probability Accuracy: 93.60%
Best Late Fusion Method: Majority Vote with Accuracy: 93.60%


## 4 Early Fusion

Instead of adopting late fusion, we could also employ an early fusion strategy. I have already implemented `early_fusion` in [`features/extractors.py`](features/extractors.py), so we can call it directly in the next cell:

In [12]:
# fused feature
fused_train_features = feature_extractor.early_fusion(set_type='train')
fused_val_features = feature_extractor.early_fusion(set_type='val')

# train SVM model on fused features
fused_svm_model = SVMModel()
fused_svm_model.train(fused_train_features, train_labels)

# evaluate SVM model on fused features
fused_training_accuracy = fused_svm_model.evaluate(fused_train_features, train_labels)
print("Training Accuracy (Fused features): {:.2f}%".format(fused_training_accuracy * 100))
fused_validation_accuracy = fused_svm_model.evaluate(fused_val_features, val_labels)
print("Validation Accuracy (Fused features): {:.2f}%".format(fused_validation_accuracy * 100))

Training Accuracy (Fused features): 100.00%
Validation Accuracy (Fused features): 89.60%


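Early fusion doesn't work well here: it reaches 100.00% training accuracy but only 89.60% validation accuracy, below both the RGB-only SVM (95.20%) and late fusion (93.60%).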
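
For reference, early fusion is commonly realized as a plain concatenation of the per-modality feature matrices along the feature axis. The sketch below assumes that form; the function here is a hypothetical illustration, and the actual `early_fusion` in `features/extractors.py` may do more (for example, scaling).

```python
import numpy as np


def early_fusion(*modality_features):
    """Concatenate per-modality feature matrices of shape (n_samples, d_i)."""
    return np.concatenate(modality_features, axis=1)


# Toy feature matrices for 4 samples (dimensions are arbitrary).
rgb = np.ones((4, 5))
depth = np.zeros((4, 3))
infrared = np.full((4, 2), 2.0)

fused = early_fusion(rgb, depth, infrared)
print(fused.shape)
```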

## 5 Different Frame Aggregation Strategies

Since a video contains many frames, flattening every frame into a vector and concatenating them into a single feature vector for the video would make its dimensionality extremely high, which is impractical.

Previously, we have been taking the maximum values across all frame vectors; now we will experiment with the other two approaches and employ accuracy-based weighted probability fusion.

The first approach is taking the maximum values across all frame vectors. To avoid writing out the full code in this notebook as we did before, we wrap it in [`utils/pipeline.py`](utils/pipeline.py) and simply call its functions below.

In [13]:
max_model = TrainingPipeline(aggregation_method='max')
max_model.run_validation()

Samples that are used for training have those numbers:  [2, 3, 4, 7, 8, 9, 10, 13, 14, 16, 18, 19, 20, 21, 23, 24, 25, 27, 28, 29, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 47, 50, 51, 52, 53, 54, 55, 57, 58, 61, 62, 63, 64, 66, 69, 70, 71, 72, 73, 74, 76, 77, 78, 79, 81, 83, 84, 86, 87, 88, 89, 90, 92, 93, 94, 97, 98, 100, 101, 102, 103, 105, 106, 107, 108, 110, 111, 113, 116, 118, 119, 121, 122, 123, 124, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 143, 145, 147, 148, 149, 151, 152, 153, 159, 160, 161, 162, 163, 164, 168, 170, 171, 172, 175, 176, 177, 178, 180, 181, 182, 183, 184, 185, 188, 189, 191, 192, 193, 194, 195, 196, 198, 199, 200, 201, 202, 203, 204, 205, 208, 209, 210, 212, 214, 215, 216, 217, 218, 219, 220, 221, 224, 226, 227, 229, 231, 234, 237, 239, 240, 243, 244, 247, 249, 250, 252, 253, 254, 255, 256, 257, 260, 262, 263, 264, 265, 267, 268, 269, 270, 272, 273, 274, 275, 276, 277, 278, 279, 282, 284, 285, 287, 288, 290, 291, 292,

Next, we adopt the method of assembling statistical features from the frame vectors, specifically by concatenating the mean vector, the maximum value vector, and the variance vector.

In [14]:
assembly_model = TrainingPipeline(aggregation_method='stat_concat')
assembly_model.run_validation()

Samples that are used for training have those numbers:  [4, 9, 10, 12, 13, 14, 15, 17, 19, 21, 22, 23, 24, 25, 27, 28, 29, 30, 31, 33, 34, 37, 38, 39, 40, 41, 42, 43, 45, 46, 48, 49, 50, 52, 53, 54, 55, 56, 57, 59, 60, 61, 62, 63, 65, 66, 67, 68, 69, 70, 71, 73, 74, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 94, 95, 96, 97, 98, 100, 102, 103, 105, 106, 107, 110, 111, 113, 114, 116, 117, 119, 120, 121, 123, 125, 127, 130, 131, 136, 137, 138, 139, 140, 141, 142, 144, 146, 147, 148, 149, 150, 153, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 166, 167, 168, 169, 170, 172, 173, 175, 176, 177, 180, 181, 183, 186, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 217, 219, 220, 221, 222, 223, 224, 226, 227, 228, 229, 230, 231, 232, 233, 235, 236, 238, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 256, 257, 259, 261, 262, 263, 264, 265, 266, 267, 268, 270, 271, 272, 273, 274, 275

It seems that there is not much difference between these methods.
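
The aggregation strategies discussed in this section can be sketched as follows. The function name and the dispatch on a string are assumptions that mirror the `aggregation_method` values used in this notebook (`'max'`, `'mean'`, `'stat_concat'`); the real implementation is in the project code and may differ in detail.

```python
import numpy as np


def aggregate_frames(frame_features, method="max"):
    """Reduce per-frame vectors of shape (n_frames, n_features) to one video vector."""
    frames = np.asarray(frame_features, dtype=float)
    if method == "max":
        return frames.max(axis=0)
    if method == "mean":
        return frames.mean(axis=0)
    if method == "stat_concat":
        # Concatenate mean, maximum and variance, as described above.
        return np.concatenate(
            [frames.mean(axis=0), frames.max(axis=0), frames.var(axis=0)]
        )
    raise ValueError("unknown aggregation method: {}".format(method))


# Toy video with 3 frames and 2 features per frame.
frames = np.array([[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]])

max_vec = aggregate_frames(frames, "max")
mean_vec = aggregate_frames(frames, "mean")
stat_vec = aggregate_frames(frames, "stat_concat")
print(max_vec, mean_vec, stat_vec.shape)
```

Both `max` and `mean` keep the per-frame dimensionality, whereas `stat_concat` triples it, which is still far smaller than concatenating every frame.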

## 6 Final Model

Now we train our final model and report its accuracy on the test set, again using [`utils/pipeline.py`](utils/pipeline.py).

In [2]:
final_model = TrainingPipeline(aggregation_method='mean')
final_model.run(save=False)

Samples that are used for training have those numbers:  [1, 2, 3, 4, 7, 8, 9, 13, 16, 20, 21, 22, 23, 24, 25, 29, 32, 33, 34, 35, 36, 37, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 53, 54, 56, 57, 58, 59, 60, 61, 62, 64, 65, 66, 68, 69, 71, 72, 73, 74, 75, 76, 77, 79, 80, 81, 82, 84, 85, 87, 88, 89, 90, 91, 93, 94, 95, 96, 97, 98, 99, 100, 101, 104, 105, 106, 107, 108, 109, 111, 112, 115, 116, 117, 118, 119, 120, 122, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 136, 139, 140, 141, 142, 146, 147, 148, 149, 150, 152, 153, 154, 155, 159, 160, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 181, 182, 185, 188, 189, 190, 192, 193, 195, 196, 197, 198, 199, 200, 202, 203, 204, 206, 207, 208, 209, 211, 212, 215, 216, 217, 219, 220, 221, 222, 223, 224, 227, 228, 229, 230, 231, 232, 233, 234, 237, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 252, 253, 254, 255, 256, 257, 258, 259, 260, 263, 266, 267, 268, 271, 272, 273, 274, 275, 276

(array([16, 15,  3, 11,  7,  2,  8,  9,  4, 16, 17,  8, 15, 10, 15,  2, 15,
         3,  3,  1,  3,  3, 19, 13, 14,  1, 16,  1, 14, 16, 18,  4, 15, 10,
        18, 13, 13,  7,  4, 19,  6,  3,  2, 16, 15,  9, 14,  2, 19,  2,  8,
        13, 15,  9,  1,  2, 14, 15,  2, 19,  2,  8, 19,  6,  3,  3,  3,  9,
         8,  4, 14, 15,  9,  1,  9,  2,  3, 16, 14, 19, 15,  1,  1, 13,  3,
         5, 15, 14,  8, 16,  1, 15, 19,  7,  7, 16,  8,  2, 13,  9,  5, 10,
         3,  1,  2, 19,  4, 13, 12, 18, 19, 10, 11,  1, 15, 17, 13, 14,  1,
        10,  7, 18, 16, 16, 15, 15, 19,  8,  2,  3,  3,  8,  3,  3,  5,  1,
        13, 14, 14, 17,  2, 17, 11,  3,  3, 15, 18, 18, 13, 15,  6,  8, 19,
         4, 17,  5,  9, 16,  4, 17,  7, 14,  8, 19,  3, 16, 14, 19, 19, 18,
         8,  4, 19, 17,  1,  8,  1, 17,  9,  3,  7, 13, 16, 19, 16,  6, 13,
         7,  7, 13,  2,  1,  2,  6,  2, 15, 19, 15,  3,  8]),
 array([16, 15, 11, 11,  7,  2,  8,  9,  4, 16, 17,  8,  0, 10,  2,  0,  9,
         3, 11, 15,  3,  8