# Final Project - Training Data

This notebook loads the training dataset, extracts HOG features from the images, and tunes an SVM classifier on them.

The training dataset contains a total of 8443 samples. All groups have been given the same training dataset, and the final report will be graded based on the performance on this training data.

* You should expect the test dataset to have the same format as the training data: a $270{,}000\times M$ `numpy` array, where $M$ is the number of test samples.
* This means that *any* pre-processing applied to the training data should also be applied to the test data.
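As a minimal sketch of what this format implies, the snippet below turns a $270{,}000\times M$ array into a stack of $M$ images. It assumes each column is one flattened $300\times 300\times 3$ image in row-major order, which is the layout the reshape later in this notebook relies on. The helper `columns_to_images` and the synthetic `fake_data` array are illustrative names, not part of the project files.

```python
import numpy as np

# Assumed layout: one flattened 300x300x3 image per column (270,000 rows).
IMAGE_SHAPE = (300, 300, 3)
N_PIXELS = int(np.prod(IMAGE_SHAPE))  # 270000


def columns_to_images(data):
    """Turn a (270000, M) array into an (M, 300, 300, 3) image stack."""
    if data.ndim != 2 or data.shape[0] != N_PIXELS:
        raise ValueError(f"expected shape ({N_PIXELS}, M), got {data.shape}")
    # Transpose first so each row is one sample, then restore the image axes.
    return data.T.reshape(-1, *IMAGE_SHAPE)


# Synthetic stand-in for the real file: M = 2 samples filled with 0 and 1.
fake_data = np.stack(
    [np.zeros(N_PIXELS), np.ones(N_PIXELS)], axis=1
).astype(np.uint8)
images = columns_to_images(fake_data)
print(images.shape)
```

Routing both the training and the test array through one function like this is a simple way to keep the two pre-processing paths identical.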

In [2]:
import cv2
from skimage.feature import hog
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import GridSearchCV

In [3]:
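# Load data; transpose so that each row holds one flattened image
data_array = np.load('data.npy').T
label_array = np.load('labels.npy')

# Hold out 20% of the samples for testing (fixed seed for reproducibility)
train_df, test_df, train_df_labels, test_df_labels = train_test_split(
    pd.DataFrame(data_array),
    pd.DataFrame(label_array),
    test_size=0.2,
    random_state=42,
)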

In [4]:
# Reshape each flattened sample into a 300x300 RGB image: (N, 300, 300, 3)
X_train = train_df.values.reshape(-1, 300, 300, 3)
X_test = test_df.values.reshape(-1, 300, 300, 3)

# Flatten the label columns into 1-D arrays
y_train = train_df_labels.values.ravel()
y_test = test_df_labels.values.ravel()

In [5]:
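def extract_hog_features(images):
    hog_features = []
    for image in images:
        # Convert the image to grayscale. The images are treated as RGB
        # (see the reshape above), so use RGB2GRAY rather than BGR2GRAY.
        gray_image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
        # Compute HOG features; the visualization image is not needed here
        features = hog(gray_image, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
        hog_features.append(features)
    # Stack into a single (n_samples, n_features) array
    return np.array(hog_features)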

# Extract HOG features for training and testing sets
X_train_hog = extract_hog_features(X_train)
X_test_hog = extract_hog_features(X_test)

In [6]:
scaler = StandardScaler()
X_train_hog = scaler.fit_transform(X_train_hog)
X_test_hog = scaler.transform(X_test_hog)

In [7]:
print(X_train_hog.shape)
print(X_test_hog.shape)

(216, 46656)
(54, 46656)
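The feature length of 46656 printed above can be checked by hand. The sketch below assumes `hog` is using its default of 9 orientation bins and that blocks slide one cell at a time, which is how scikit-image builds the descriptor. The helper `hog_feature_length` is an illustrative name, not a library function.

```python
def hog_feature_length(height, width, pixels_per_cell=(8, 8),
                       cells_per_block=(2, 2), orientations=9):
    """Length of the flattened HOG descriptor for one grayscale image."""
    # Whole cells that fit along each axis (leftover pixels are ignored).
    cells_y = height // pixels_per_cell[0]
    cells_x = width // pixels_per_cell[1]
    # Blocks slide one cell at a time across the cell grid.
    blocks_y = cells_y - cells_per_block[0] + 1
    blocks_x = cells_x - cells_per_block[1] + 1
    # Each block contributes one histogram per cell it contains.
    per_block = cells_per_block[0] * cells_per_block[1] * orientations
    return blocks_y * blocks_x * per_block


length = hog_feature_length(300, 300)
print(length)  # 36 * 36 blocks * 36 values per block = 46656
```

With only 216 training rows against 46656 columns, the feature space is far larger than the number of samples.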


In [11]:
# Define the parameter grid
param_grid = {
    'kernel': ['linear', 'rbf', 'poly', 'sigmoid'],
    'C': [0.1, 1, 10, 100]  # Adjust the values based on your preferences
}

# Create an SVM classifier
svm_classifier = SVC()

# Create a GridSearchCV object
grid_search = GridSearchCV(svm_classifier, param_grid, cv=5, scoring='accuracy')

# Fit the GridSearchCV object to the data
grid_search.fit(X_train_hog, y_train)

# Print the best parameters
print("Best Parameters:", grid_search.best_params_)

Best Parameters: {'C': 10, 'kernel': 'sigmoid'}
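In the cells above, the scaler is fit on the whole training split before the grid search runs, so every validation fold has already contributed to the scaling statistics. One possible alternative, sketched here on synthetic data, is to put the scaler and the SVM in a scikit-learn `Pipeline` so that scaling is refit inside each fold. The names `X_demo`, `y_demo`, `demo_grid`, and `demo_search` are placeholders, and the smaller grid is only there to keep the example fast.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the HOG features and labels (assumption).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 20))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

# The scaler is a pipeline step, so it is fit on the training part of each fold only.
pipeline = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# Pipeline parameters are addressed as "<step name>__<parameter>".
demo_grid = {"svc__kernel": ["linear", "rbf"], "svc__C": [0.1, 1, 10]}

demo_search = GridSearchCV(pipeline, demo_grid, cv=5, scoring="accuracy")
demo_search.fit(X_demo, y_demo)

print("Best parameters:", demo_search.best_params_)
print(f"Mean cross-validated accuracy: {demo_search.best_score_:.3f}")
```

The `best_score_` attribute reports the mean cross-validated accuracy of the selected setting, which is usually a more honest figure than accuracy measured on the data the model was fit on.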


In [12]:
# Get the best SVM classifier
best_svm_classifier = grid_search.best_estimator_

In [15]:
y_pred_train = best_svm_classifier.predict(X_train_hog)
accuracy = accuracy_score(y_train, y_pred_train)
print(f"Accuracy in Training: {accuracy * 100:.2f}%")

Accuracy in Training: 100.00%


In [16]:
y_pred = best_svm_classifier.predict(X_test_hog)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy in Test: {accuracy * 100:.2f}%")

Accuracy in Test: 59.26%
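The test split holds only 54 samples, so the reported accuracy carries noticeable sampling noise. As a rough sketch, the snippet below recovers the number of correct predictions from the two figures printed in this notebook and computes a binomial standard error. Treating the predictions as independent trials is an assumption made only for this estimate.

```python
import math

n_test = 54        # test-set size, from the shapes printed earlier
accuracy = 0.5926  # test accuracy reported above

# Recover the count of correct predictions from the rounded percentage.
n_correct = round(accuracy * n_test)
p = n_correct / n_test

# Standard error of a proportion under a binomial model (assumption).
std_err = math.sqrt(p * (1 - p) / n_test)

print(f"{n_correct}/{n_test} correct, "
      f"accuracy {p:.2%} +/- {std_err:.2%} (one standard error)")
```

Set against a training accuracy of 100%, the gap to the test accuracy points to overfitting, and the width of the error bar means small differences between model settings on this split should be read with caution.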


---