# 🔍 Feature Extraction

Now comes the heavy lifter of this project, which will extract highly descriptive features from the preprocessed images.

Let's start by importing the necessary functions and classes.

In [3]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn import svm
from sklearn.metrics import classification_report
from sklearn.preprocessing import MinMaxScaler
import joblib
import pickle

from FeatureExtraction import FeatureExtraction
from utils import *

Load the SIFT + BoVW features extracted earlier.

In [4]:
X_train, y_train = load_features_from_file("Saved Features/BoVW_features_training_2.csv")

Now that our training data is ready, let's train several different classifiers with their default settings (no hyperparameter tuning) and print their classification reports to see how good our results really are.

In [5]:
# Define the classifiers
classifiers = {
    "Logistic Regression": LogisticRegression(),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "SVM": svm.SVC(),
    "Decision Tree": DecisionTreeClassifier()
}

# Train each classifier and evaluate it on the training set
for name, clf in classifiers.items():
    # Train the classifier
    clf.fit(X_train, y_train)
    
    # Predict on the training data
    y_train_pred = clf.predict(X_train)
    
    # Calculate and print the result statistics
    print(f"Classification Report on {name}:\n", classification_report(y_train, y_train_pred))

Classification Report on Logistic Regression:
               precision    recall  f1-score   support

         0.0       0.99      0.94      0.97       575
         1.0       0.97      0.99      0.98       605
         2.0       0.86      0.99      0.92       621
         3.0       0.99      0.86      0.92       597

    accuracy                           0.95      2398
   macro avg       0.95      0.95      0.95      2398
weighted avg       0.95      0.95      0.95      2398

Classification Report on LDA:
               precision    recall  f1-score   support

         0.0       1.00      1.00      1.00       575
         1.0       1.00      1.00      1.00       605
         2.0       0.99      1.00      1.00       621
         3.0       1.00      0.99      1.00       597

    accuracy                           1.00      2398
   macro avg       1.00      1.00      1.00      2398
weighted avg       1.00      1.00      1.00      2398

Classification Report on QDA:
               precision    recall  f1-score   support

         0.0       1.00      1.00      1.00       575
         1.0       1.00      1.00      1.00       605
         2.0       1.00      1.00      1.00       621
         3.0       1.00      1.00      1.00       597

    accuracy                           1.00      2398
   macro avg       1.00      1.00      1.00      2398
weighted avg       1.00      1.00      1.00      2398

Classification Report on SVM:
               precision    recall  f1-score   support

         0.0       1.00      1.00      1.00       575
         1.0       1.00      1.00      1.00       605
         2.0       0.99      1.00      1.00       621
         3.0       1.00      0.99      1.00       597

    accuracy                           1.00      2398
   macro avg       1.00      1.00      1.00      2398
weighted avg       1.00      1.00      1.00      2398

Classification Report on Decision Tree:
               precision    

Quite an amazing job on the training set!
We can't celebrate yet, though; we still need to check whether overfitting has occurred. To that end, let's evaluate on the validation set.
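One simple way to put a number on overfitting is the gap between training and validation accuracy. The snippet below is only a minimal sketch of that idea: it runs on synthetic 4-class data from `make_classification` rather than on our BoVW features, and the names `X_tr`, `X_va`, `models` and `gaps` are made up for this illustration.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic 4-class stand-in for the real features (assumption, not project data)
X, y = make_classification(
    n_samples=800, n_features=20, n_informative=10, n_classes=4, random_state=0
)
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Two of the untuned classifiers used in this notebook
models = {"LDA": LinearDiscriminantAnalysis(), "SVM": SVC()}

gaps = {}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    train_acc = accuracy_score(y_tr, clf.predict(X_tr))
    val_acc = accuracy_score(y_va, clf.predict(X_va))
    # Positive gap: the model does better on data it has already seen
    gaps[name] = train_acc - val_acc
    print(f"{name}: train={train_acc:.3f}  val={val_acc:.3f}  gap={gaps[name]:.3f}")
```

A gap close to zero suggests the model generalizes well, while a large positive gap is a warning sign of overfitting.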

Load the features of the validation data

In [6]:
X_val, y_val = load_features_from_file("Saved Features/BoVW_features_validation_2.csv")

### 🏛 Judgement Time

In [7]:
# Evaluate each already-trained classifier on the validation set
for name, clf in classifiers.items():
    # Predict on the validation data
    y_val_pred = clf.predict(X_val)
    
    # Calculate and print the result statistics
    print(f"Classification Report on {name}:\n", classification_report(y_val, y_val_pred))

Classification Report on Logistic Regression:
               precision    recall  f1-score   support

         0.0       0.99      0.93      0.96       225
         1.0       0.96      0.98      0.97       193
         2.0       0.83      0.99      0.90       179
         3.0       0.99      0.87      0.93       203

    accuracy                           0.94       800
   macro avg       0.94      0.94      0.94       800
weighted avg       0.95      0.94      0.94       800

Classification Report on LDA:
               precision    recall  f1-score   support

         0.0       1.00      0.99      0.99       225
         1.0       1.00      1.00      1.00       193
         2.0       0.97      0.99      0.98       179
         3.0       1.00      0.99      1.00       203

    accuracy                           0.99       800
   macro avg       0.99      0.99      0.99       800
weighted avg       0.99      0.99      0.99       800

Classification Report on QDA:
               precisi

<img src="https://media1.tenor.com/m/QxqYH15_UxYAAAAd/wow-omg.gif" width="250">

Look at this! Even before any hyperparameter tuning, we already have an SVM model that has achieved an accuracy of approximately 100%!
There is still room for improvement though, since the f1-scores on the final 2 classes are not perfect; there are some very rare incorrect classifications. These will be addressed in the next folder.
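To see which classes those rare mistakes fall between, a confusion matrix is a handy companion to the classification report. The following is a hedged sketch on synthetic 4-class data (not our validation set); `X_tr`, `X_va`, `cm` and `wrong` are illustrative names only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic 4-class stand-in for the real features (assumption, not project data)
X, y = make_classification(
    n_samples=800, n_features=20, n_informative=10, n_classes=4, random_state=0
)
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = SVC().fit(X_tr, y_tr)
y_pred = clf.predict(X_va)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_va, y_pred, labels=[0, 1, 2, 3])
print(cm)

# Positions of the misclassified validation samples, for manual inspection
wrong = np.flatnonzero(y_pred != y_va)
print(f"{len(wrong)} misclassified out of {len(y_va)} samples")
```

Off-diagonal cells show which pairs of classes get confused, and the indices in `wrong` can be used to pull up the corresponding samples.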

<div align="center">
    <img src="https://i.imgur.com/LMiA2O5.gif" width=800/>
</div>

### ❓ Question: Can you guess, from these results, which model we shall try first in the "Model Selection & Training" phase?