#Neural Networks - Deep Learning

##Intermediate Assignment - Multiclass Classification using Nearest Neighbor and Nearest Class Centroid Models
###Dimitrios Tikvinas AEM: 9998

#**Imports**

In [34]:
import tensorflow as tf

from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectPercentile, chi2

import numpy as np
import time


In [35]:
# Load the CIFAR-10 dataset (only way found to load them efficiently)
(X_train, Y_train), (X_test, Y_test) = tf.keras.datasets.cifar10.load_data()

# Keep references to the original, unprocessed images
X_train_un, X_test_un = X_train, X_test

# Define class labels as strings for CIFAR-10
class_labels = [
    'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'
]


#Data Preprocessing

The training data consists of *32x32 RGB images* whose pixel values lie in the range [0, 255]. We normalize this range to [0, 1], because small input values are generally preferred: they favor faster convergence, give every pixel a comparable contribution and tend to help generalization overall.

In [36]:
# Convert to float32 array for easier management
X_train, X_test = np.array(X_train, dtype=np.float32), np.array(X_test, dtype=np.float32)

#  Normalize pixel values to be between 0 and 1
X_train, X_test = X_train / 255.0, X_test / 255.0
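As a side check, dividing every pixel by the same constant rescales all Euclidean distances by that constant, so the nearest neighbor of a sample should not change. This is consistent with the identical accuracy scores reported for raw and normalized inputs in the Conclusions tables. A minimal sketch on random data (the arrays `train` and `query` and the helper `nearest_index` are hypothetical stand-ins, not part of the notebook):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for flattened images with integer values in [0, 255]
train = rng.integers(0, 256, size=(20, 12)).astype(np.float64)
query = rng.integers(0, 256, size=(1, 12)).astype(np.float64)

def nearest_index(train, query):
    # Index of the training row with the smallest Euclidean distance to the query
    distances = np.linalg.norm(train - query, axis=1)
    return int(np.argmin(distances))

idx_raw = nearest_index(train, query)
idx_scaled = nearest_index(train / 255.0, query / 255.0)

# The same training row is the nearest neighbor before and after scaling
print(idx_raw == idx_scaled)
```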

Each image has 32x32 pixels, and each pixel holds 3 values, one per RGB channel. To handle this, we *flatten* every image into a 1-D vector of 32 x 32 x 3 = 3072 values.


In [37]:
# Flatten the images
X_train, X_test = X_train.reshape(X_train.shape[0], -1), X_test.reshape(X_test.shape[0], -1)
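The effect of the reshape can be checked on a small dummy batch. This is only an illustration; the array `images` is a hypothetical stand-in for the CIFAR-10 data:

```python
import numpy as np

# Hypothetical mini-batch of 4 blank images with the layout (height, width, channels)
images = np.zeros((4, 32, 32, 3), dtype=np.float32)

# Keep the sample axis and merge the remaining axes into one
flat = images.reshape(images.shape[0], -1)

print(flat.shape)  # (4, 3072), since 32 * 32 * 3 = 3072
```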

##Feature Extraction

Here, PCA is used to reduce the number of features (dimensions) in the dataset while retaining as much variance as possible. It transforms the original data into a new set of uncorrelated variables called principal components. We set the fraction of variance to be retained to **0.9**, so PCA keeps the smallest number of components needed to explain 90% of the variance.


In [38]:
pca = PCA(n_components=0.9).fit(X_train)
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
print("We extract {} features from the original {}.".format(X_train_pca.shape[1], X_train.shape[1]))

We extract 99 features from the original 3072.
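To see what a fractional `n_components` does, the following sketch fits PCA on synthetic data and inspects the cumulative explained variance of the components that were kept. The data `X` is hypothetical and chosen only so that a few directions dominate the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples, 10 features with decreasing spread
X = rng.normal(size=(200, 10)) * np.arange(10, 0, -1)

pca = PCA(n_components=0.9).fit(X)

# Cumulative share of variance explained by the components that were kept
cumulative = np.cumsum(pca.explained_variance_ratio_)

print(pca.n_components_, cumulative[-1])
```

The last cumulative value is the share of variance actually retained; it reaches the requested 0.9 with fewer components than the original 10 features.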


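We standardize the features by removing the mean and scaling to unit variance with the StandardScaler from sklearn, so that every principal component contributes on the same scale to the distance computations. The scaler is fitted on the training set only and then applied to the test set.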

In [39]:
scaler = StandardScaler()
X_train_pca_scaled = scaler.fit_transform(X_train_pca)
X_test_pca_scaled = scaler.transform(X_test_pca)
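The following sketch shows what the scaler computes, using the training statistics for both sets. The arrays `X_train_demo` and `X_test_demo` are hypothetical and unrelated to the CIFAR-10 features:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical features on very different scales
X_train_demo = rng.normal(loc=[0.0, 50.0], scale=[1.0, 20.0], size=(100, 2))
X_test_demo = rng.normal(loc=[0.0, 50.0], scale=[1.0, 20.0], size=(10, 2))

scaler = StandardScaler().fit(X_train_demo)
X_train_std = scaler.transform(X_train_demo)
X_test_std = scaler.transform(X_test_demo)

# Equivalent manual formula, using the mean and standard deviation of the training set
manual = (X_test_demo - X_train_demo.mean(axis=0)) / X_train_demo.std(axis=0)

print(np.allclose(X_train_std.mean(axis=0), 0.0), np.allclose(X_test_std, manual))
```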

##Feature Selection

Using a criterion (such as information gain, the chi-square test or other statistical criteria) we can eliminate redundant features and keep only the important ones. In this case, we use the chi-square test, which requires non-negative feature values (satisfied by the normalized pixels), and keep only the 60% of the features with the best scores.

In [40]:
features_perc = SelectPercentile(chi2,percentile=60).fit(X_train, Y_train)
X_train_feat_sel = features_perc.transform(X_train)
X_test_feat_sel = features_perc.transform(X_test)
print("We select {} features from the original {}.".format(X_train_feat_sel.shape[1], X_train.shape[1]))

We select 1843 features from the original 3072.
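As a small illustration of percentile-based selection, the sketch below builds synthetic non-negative data in which only the first two features depend on the class, then keeps the top 60% of features by chi-square score. The data and variable names are hypothetical:

```python
import numpy as np
from sklearn.feature_selection import SelectPercentile, chi2

rng = np.random.default_rng(0)

# Hypothetical non-negative data: 300 samples, 10 features, 3 classes
y = rng.integers(0, 3, size=300)
X = rng.random(size=(300, 10))
# Make the first two features depend on the class so they score highly
X[:, 0] += y
X[:, 1] += 2 * y

selector = SelectPercentile(chi2, percentile=60).fit(X, y)
X_sel = selector.transform(X)
kept = selector.get_support(indices=True)

print(X_sel.shape, kept)
```

With 10 features and `percentile=60`, 6 features survive, and the two class-dependent ones are among them.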


#1-Nearest Neighbors Classifier

In [41]:
# Number of nearest neighbors
k = 1

# Initialize the K-Nearest Neighbors classifier
knn_1 = KNeighborsClassifier(n_neighbors=k)

# Start the timer
start = time.time()

# Train the K-Nearest Neighbors model on the training data
knn_1.fit(X_train_pca, Y_train.ravel())

# Stop the timer
end = time.time()

print("Training time: {}s\n".format(end-start))

start = time.time()

# Make predictions on the test data
Y_pred = knn_1.predict(X_test_pca)

end = time.time()

print("Testing time: {}s\n".format(end-start))

# Calculate and store the accuracy of the model on both train and test set
accuracy_train = accuracy_score(Y_train, knn_1.predict(X_train_pca))
accuracy_test = accuracy_score(Y_test, Y_pred)

print(f'Accuracy on train set: {accuracy_train:.2f}')
print(f'Accuracy on test set: {accuracy_test:.2f}')


Training time: 0.014717817306518555s

Testing time: 5.956500291824341s

Accuracy on train set: 1.00
Accuracy on test set: 0.39


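We can see that the kNN algorithm has a negligible training time (about 0.015s), since fitting essentially only stores the training data, but a much longer evaluation time on the test set (about 6s), since every test sample has to be compared against the training samples. The perfect train accuracy is expected for k = 1, because every training sample is its own nearest neighbor. The test accuracy of 0.39 comes close to the 0.40 of the Nearest Class Centroid model.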
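The behavior of the 1-NN model can be reproduced with a brute-force sketch in NumPy. This is an illustration under simplifying assumptions, not scikit-learn's actual implementation; the function `predict_1nn` and the toy data are hypothetical:

```python
import numpy as np

def predict_1nn(X_train, y_train, X_query):
    # Squared Euclidean distances between every query and every training sample
    diffs = X_query[:, None, :] - X_train[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Label of the closest training sample for each query
    return y_train[np.argmin(sq_dists, axis=1)]

rng = np.random.default_rng(0)
# Hypothetical toy data: 50 samples, 5 features, 3 classes
X_toy = rng.normal(size=(50, 5))
y_toy = rng.integers(0, 3, size=50)

# There is no real training step; all the work happens at prediction time
train_accuracy = np.mean(predict_1nn(X_toy, y_toy, X_toy) == y_toy)
print(train_accuracy)  # 1.0: every training sample is its own nearest neighbor
```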

#3-Nearest Neighbors Classifier

In [42]:
# Number of nearest neighbors
k = 3

# Initialize the K-Nearest Neighbors classifier
knn_3 = KNeighborsClassifier(n_neighbors=k)

# Start the timer
start = time.time()

# Train the K-Nearest Neighbors model on the training data
knn_3.fit(X_train_pca, Y_train.ravel())

# Stop the timer
end = time.time()

print("Training time: {}s\n".format(end-start))

start = time.time()

# Make predictions on the test data
Y_pred = knn_3.predict(X_test_pca)

end = time.time()

print("Testing time: {}s\n".format(end-start))

# Calculate and store the accuracy of the model on both train and test set
accuracy_train = accuracy_score(Y_train, knn_3.predict(X_train_pca))
accuracy_test = accuracy_score(Y_test, Y_pred)

print(f'Accuracy on train set: {accuracy_train:.2f}')
print(f'Accuracy on test set: {accuracy_test:.2f}')



Training time: 0.006695985794067383s

Testing time: 4.830379247665405s

Accuracy on train set: 0.62
Accuracy on test set: 0.37


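As we can see, adding more neighbors does not help here: the test accuracy drops from 0.39 to 0.37 compared with the 1-NN model, and the train accuracy falls from 1.00 to 0.62, because a sample's own label can now be outvoted by its neighbors. The evaluation time stays in the same range (4.8s versus 6.0s in these runs).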
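The drop in train accuracy when moving from k = 1 to k = 3 can be seen on a tiny example with majority voting. The sketch below is a brute-force illustration, not scikit-learn's implementation; `predict_knn` and the toy data are hypothetical:

```python
import numpy as np

def predict_knn(X_train, y_train, X_query, k):
    # Squared Euclidean distances between every query and every training sample
    sq_dists = np.sum((X_query[:, None, :] - X_train[None, :, :]) ** 2, axis=2)
    # Indices of the k closest training samples per query
    nearest = np.argsort(sq_dists, axis=1)[:, :k]
    votes = y_train[nearest]
    # Most frequent label among the k neighbors (ties go to the smallest label)
    return np.array([np.bincount(row).argmax() for row in votes])

# Hypothetical toy data: two groups on a line, with one point labeled against its group
X_toy = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y_toy = np.array([0, 0, 1, 1, 1, 1])

pred_k1 = predict_knn(X_toy, y_toy, X_toy, k=1)
pred_k3 = predict_knn(X_toy, y_toy, X_toy, k=3)

print(pred_k1)  # [0 0 1 1 1 1]
print(pred_k3)  # [0 0 0 1 1 1]
```

With k = 1 every training point recovers its own label, while with k = 3 the point at 0.2 is outvoted by its two neighbors, so the train accuracy is no longer perfect.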

# Nearest Class Centroid Classifier


In [43]:
# Initialize the Nearest Centroid classifier
ncc = NearestCentroid()

# Start the timer
start = time.time()

# Train the Nearest Centroid model on the training data
ncc.fit(X_train_pca_scaled, Y_train.ravel())

# Stop the timer
end = time.time()

print("Training time: {}s\n".format(end-start))

start = time.time()

# Make predictions on the test data
Y_pred = ncc.predict(X_test_pca_scaled)

end = time.time()

print("Testing time: {}s\n".format(end-start))

# Calculate and store the accuracy of the model on both train and test set
accuracy_train = accuracy_score(Y_train, ncc.predict(X_train_pca_scaled))
accuracy_test = accuracy_score(Y_test, Y_pred)

print(f'Accuracy on train set: {accuracy_train:.2f}')
print(f'Accuracy on test set: {accuracy_test:.2f}')




Training time: 0.01882338523864746s

Testing time: 0.01365351676940918s

Accuracy on train set: 0.40
Accuracy on test set: 0.40


The benefit of this algorithm is its very fast training and evaluation time (both under 0.02s). It also ended up with the highest accuracy on the test set (0.40, versus 0.39 for 1-NN and 0.37 for 3-NN).
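The speed comes from the fact that each test sample is compared against one centroid per class (10 for CIFAR-10) instead of all 50,000 training samples. A minimal NumPy sketch of the idea, assuming Euclidean distance and plain class means; the helper functions and toy data are hypothetical:

```python
import numpy as np

def fit_centroids(X, y):
    # One mean vector per class, in sorted label order
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_centroid(classes, centroids, X_query):
    # Assign each query to the class whose centroid is closest
    sq_dists = np.sum((X_query[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
    return classes[np.argmin(sq_dists, axis=1)]

# Hypothetical toy data: two classes in 2-D
X_toy = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]])
y_toy = np.array([0, 0, 1, 1])

classes, centroids = fit_centroids(X_toy, y_toy)
queries = np.array([[1.0, 1.0], [4.0, 3.0]])
predictions = predict_centroid(classes, centroids, queries)

print(centroids)    # class 0 -> [0, 1], class 1 -> [5, 4]
print(predictions)  # [0 1]
```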

#Conclusions

The tables below gather the accuracy on the test data and the training and testing durations of each of the 3 models presented above, for different cases of data preprocessing.

1. 1-Nearest Neighbors

Data Processing Technique | Accuracy Score | Training Time | Testing Time
--------------------------|----------------|---------------|-------------
Raw training | 0.35 | 0.0984s | 146.5682s
With Normalization | 0.35 | 0.0980s | 135.9891s
With PCA 0.9 | 0.39 | 0.0313s | 6.3179s
With SelectPercentile 60% | 0.34 | 0.5854s | 80.5707s
With One-Hot Encoding, PCA | 0.39 | 0.0716s | 11.0615s
With Standard Scaler, PCA | 0.33 | 0.0127s | 6.5207s

2. 3-Nearest Neighbors

Data Processing Technique | Accuracy Score | Training Time | Testing Time
--------------------------|----------------|---------------|-------------
Raw training | 0.33 | 0.0822s | 129.7064s
With Normalization | 0.33 | 0.0896s | 135.6514s
With PCA 0.9 | 0.37 | 0.0120s | 5.3075s
With SelectPercentile 60% | 0.32 | 0.5844s | 95.9935s
With One-Hot Encoding, PCA | 0.31 | 0.0539s | 11.0070s
With Standard Scaler, PCA | 0.30 | 0.0150s | 5.4616s


3. Nearest Class Centroid

Data Processing Technique | Accuracy Score | Training Time | Testing Time
--------------------------|----------------|---------------|-------------
Raw training | 0.28 | 0.5726s | 0.4469s
With Normalization | 0.28 | 0.3470s | 0.3088s
With PCA 0.9 | 0.28 | 0.0260s | 0.0089s
With SelectPercentile 60% | 0.27 | 1.4529s | 0.4553s
With One-Hot Encoding, PCA | 0.28 | 0.0230s | 0.0087s
With Standard Scaler, PCA | 0.40 | 0.0246s | 0.0110s

As we can see, each model's accuracy remains rather poor, in the range of 27-40%, even after applying every data preprocessing method mentioned above. The dimensionality reduction achieved by PCA cut the prediction time of the Nearest Neighbors classifiers by about 95% (for 1-NN, from 136.0s with normalization alone to 6.3s). This follows directly from how PCA works: with 99 features instead of 3072, every distance computation becomes far cheaper.
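The size of the speed-up can be recomputed from the testing times in the tables above, comparing the "With Normalization" and "With PCA 0.9" rows. The dictionary layout below is only a convenience for this check:

```python
# Testing times in seconds, copied from the tables above
times = {
    "1-NN": {"normalized": 135.9891, "pca": 6.3179},
    "3-NN": {"normalized": 135.6514, "pca": 5.3075},
}

reductions = {}
for model, t in times.items():
    # Relative reduction in prediction time after applying PCA
    reductions[model] = 1.0 - t["pca"] / t["normalized"]
    print(f"{model}: {reductions[model]:.1%}")
```

Both reductions come out slightly above 95%.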

We end up choosing *Normalization* and *PCA 0.9* for the 1- and 3-Nearest Neighbors classifiers, and *Normalization*, *PCA 0.9* and *Standard Scaler* for the Nearest Centroid classifier.
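One possible way to package the configuration chosen for the Nearest Centroid classifier is a scikit-learn pipeline. This is a sketch rather than the code used in the notebook; the synthetic `X_demo` and `y_demo` are hypothetical stand-ins for flattened, normalized images:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in for flattened, normalized images: 3 classes, 20 features
y_demo = np.repeat(np.arange(3), 40)
X_demo = rng.random(size=(120, 20)) * 0.2 + y_demo[:, None] * 0.3

# PCA keeping 90% of the variance, then standardization, then the classifier
model = make_pipeline(PCA(n_components=0.9), StandardScaler(), NearestCentroid())
model.fit(X_demo, y_demo)

train_score = model.score(X_demo, y_demo)
print(train_score)
```

Wrapping the steps in a pipeline ensures that PCA and the scaler are fitted on the training data only and then reused unchanged at prediction time.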