#### The objective of the project is to learn how to implement a simple image classification pipeline based on a k-Nearest Neighbor (kNN) classifier and a deep neural network. The goals of this assignment are as follows:

● Understand the basic Image Classification pipeline and the data-driven approach (train/predict stages)

● Fetch the data and understand the train/val/test splits.

● Implement and apply an optimal k-Nearest Neighbor (kNN) classifier (7.5 points)

● Print the classification metric report (2.5 points)

● Implement and apply a deep neural network classifier (feedforward neural network, ReLU activations) (5 points)

● Understand and be able to implement (vectorized) backpropagation (stochastic gradient descent, cross-entropy loss, cost functions) (2.5 points)

● Implement batch normalization for training the neural network (2.5 points)

● Understand the differences and trade-offs between traditional and NN classifiers with the help of classification metrics (5 points)

In [23]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import keras as ks
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
import h5py
from sklearn import metrics
from sklearn.metrics import confusion_matrix

In [3]:
# Open the dataset read-only; nothing in this notebook writes to the file
img_df = h5py.File('SVHN_single_grey1.h5', 'r')

In [14]:
# List all groups
print("Keys: %s" % img_df.keys())

Keys: <KeysViewHDF5 ['X_test', 'X_train', 'X_val', 'y_test', 'y_train', 'y_val']>


### Fetching Data

In [15]:
x_text = img_df['X_test']  # test images; the name x_text is kept because later cells use it
x_train = img_df['X_train']
x_val = img_df['X_val']
y_test = img_df['y_test']
y_train = img_df['y_train']
y_val = img_df['y_val']

In [17]:
print(x_text.shape)
print(x_train.shape)
print(x_val.shape)
print(y_test.shape)
print(y_train.shape)
print(y_val.shape)

(18000, 32, 32)
(42000, 32, 32)
(60000, 32, 32)
(18000,)
(42000,)
(60000,)


#### Implement and apply an optimal k-Nearest Neighbor (kNN) classifier (7.5 points)

In [18]:
# Using KNN Classifier
model_knn = KNeighborsClassifier(n_neighbors=17)
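
The cell above fixes `n_neighbors=17` without showing how that value was chosen. One common way to pick k is to compare accuracy on a held-out validation split. The following is a minimal sketch of that idea, not the procedure used in this notebook: the helper name `pick_k`, the candidate list, and the synthetic data from `make_classification` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def pick_k(X_tr, y_tr, X_va, y_va, candidates):
    """Return (best_k, scores), where scores maps k -> validation accuracy."""
    scores = {}
    for k in candidates:
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X_tr, y_tr)
        scores[k] = clf.score(X_va, y_va)  # mean accuracy on the validation split
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for the flattened image data (an assumption for this sketch).
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=0)

best_k, scores = pick_k(X_tr, y_tr, X_va, y_va, candidates=[1, 3, 5, 9, 17])
print(best_k, scores)
```

With the real data, the flattened training images and a separate validation split would take the place of the synthetic arrays, and the test split would stay untouched until the final evaluation.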

In [19]:
# Flatten each 32x32 training image into a 1024-element vector
X_train = np.reshape(x_train, (x_train.shape[0], 32 * 32))

In [20]:
model_knn.fit(X_train,y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=17, p=2,
           weights='uniform')

In [22]:
# Flatten each 32x32 test image into a 1024-element vector, then predict
X_text = np.reshape(x_text, (x_text.shape[0], 32 * 32))
prediction = model_knn.predict(X_text)
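
To make clear what `predict` does conceptually, here is a small vectorized kNN written directly in NumPy. It is a sketch under stated assumptions, not scikit-learn's implementation: `knn_predict` is a hypothetical helper, it uses brute-force Euclidean distance, and its tie-breaking (smallest label wins) may differ from the library's.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k):
    """Predict labels for X_query by majority vote among the k nearest training rows."""
    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2.
    d2 = (
        np.sum(X_query ** 2, axis=1)[:, None]
        - 2.0 * X_query @ X_train.T
        + np.sum(X_train ** 2, axis=1)[None, :]
    )
    # Indices of the k smallest distances per query row (unordered within the k).
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    votes = y_train[nearest]
    n_classes = int(y_train.max()) + 1
    # Count votes per class; argmax breaks ties in favour of the smaller label.
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=n_classes)
    return counts.argmax(axis=1)


# Toy data: two well-separated clusters (illustrative values only).
X_train = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.2, 0.1], [5.5, 5.2]])

print(knn_predict(X_train, y_train, X_query, k=3))  # [0 1]
```

Nothing is learned at fit time in this scheme: every prediction computes a distance to every stored training row, which is why kNN prediction on 42000 images of 1024 pixels is slow.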

#### Print the classification metric report (2.5 points)

In [24]:
# accuracy_score expects (y_true, y_pred); accuracy is the same in either order
print("Accuracy:", metrics.accuracy_score(y_test, prediction))

Accuracy: 0.5287222222222222


In [25]:
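# Arguments are passed as (prediction, y_test), so rows are predicted labels
# and columns are true labels (the transpose of scikit-learn's usual layout)
print("Confusion Matrix:   \n", metrics.confusion_matrix(prediction,y_test))

Confusion Matrix:   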
 [[1274   99   94  129  105  165  319  100  247  323]
 [  66 1335  226  260  249  172  128  209  119  142]
 [  34   55 1000  135   41   47   40  110   68   71]
 [  35   95   95  736   60  269   67   88  114   89]
 [  46   69   44   47 1177   58  135   27   94   71]
 [  45   36   37  160   16  693  116   37  115   93]
 [  98   31   35   36   54  137  744   37  265   55]
 [  39   46  141   57   19   31   21 1128   25   73]
 [  78   26   52   94   43  119  202   24  654  111]
 [  99   36   79   65   48   77   60   48  111  776]]
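
The per-class figures in a classification report can be derived directly from this matrix. The sketch below copies the matrix printed above and assumes its orientation is rows = predicted labels and columns = true labels, which follows from the `(prediction, y_test)` argument order in the cell above; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above: rows = predicted, columns = true.
cm = np.array([
    [1274,   99,   94,  129,  105,  165,  319,  100,  247,  323],
    [  66, 1335,  226,  260,  249,  172,  128,  209,  119,  142],
    [  34,   55, 1000,  135,   41,   47,   40,  110,   68,   71],
    [  35,   95,   95,  736,   60,  269,   67,   88,  114,   89],
    [  46,   69,   44,   47, 1177,   58,  135,   27,   94,   71],
    [  45,   36,   37,  160,   16,  693,  116,   37,  115,   93],
    [  98,   31,   35,   36,   54,  137,  744,   37,  265,   55],
    [  39,   46,  141,   57,   19,   31,   21, 1128,   25,   73],
    [  78,   26,   52,   94,   43,  119,  202,   24,  654,  111],
    [  99,   36,   79,   65,   48,   77,   60,   48,  111,  776],
])

correct = np.diag(cm)
precision = correct / cm.sum(axis=1)  # correct / everything predicted as that class
recall = correct / cm.sum(axis=0)     # correct / everything truly in that class
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / cm.sum()

print(np.round(precision, 2))
print(np.round(recall, 2))
print(np.round(f1, 2))
print(round(accuracy, 4))
```

For class 0 this gives precision 1274 / 2855 ≈ 0.45 and recall 1274 / 1814 ≈ 0.70, and the overall accuracy is 9517 / 18000 ≈ 0.5287, matching the reported accuracy.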


In [27]:
cr = metrics.classification_report(y_test,prediction)
print(cr)

              precision    recall  f1-score   support

           0       0.45      0.70      0.55      1814
           1       0.46      0.73      0.56      1828
           2       0.62      0.55      0.59      1803
           3       0.45      0.43      0.44      1719
           4       0.67      0.65      0.66      1812
           5       0.51      0.39      0.44      1768
           6       0.50      0.41      0.45      1832
           7       0.71      0.62      0.67      1808
           8       0.47      0.36      0.41      1812
           9       0.55      0.43      0.48      1804

   micro avg       0.53      0.53      0.53     18000
   macro avg       0.54      0.53      0.52     18000
weighted avg       0.54      0.53      0.52     18000



#### Implement and apply a deep neural network classifier including (feedforward neural network, RELU activations) (5 points)

In [32]:
#Initialize Sequential model
model_nn = ks.models.Sequential()

In [33]:
#Reshape data from 2D to 1D -> 32X32 to 1024
model_nn.add(ks.layers.Reshape((1024,),input_shape=(32,32)))

#### Implement batch normalization for training the neural network

In [34]:
# Batch-normalize the flattened inputs using per-feature statistics of each mini-batch
model_nn.add(ks.layers.BatchNormalization())
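
As a rough illustration of what a batch normalization layer computes during training, the sketch below normalizes each feature over the batch axis and then applies a learned scale and shift. It is a simplified forward pass only: `batchnorm_forward` is a hypothetical helper, the input values are random, and the moving averages that Keras maintains for inference are not modelled.

```python
import numpy as np


def batchnorm_forward(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization over the batch axis (axis 0)."""
    mean = x.mean(axis=0)                     # per-feature mean of the mini-batch
    var = x.var(axis=0)                       # per-feature variance of the mini-batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, roughly unit variance
    return gamma * x_hat + beta               # learned scale and shift


rng = np.random.default_rng(0)
# A mini-batch of 150 flattened 32x32 inputs (random stand-in values).
batch = rng.normal(loc=100.0, scale=30.0, size=(150, 1024))
out = batchnorm_forward(batch, gamma=np.ones(1024), beta=np.zeros(1024))

print(out.mean(axis=0)[:3], out.std(axis=0)[:3])
```

Placed directly after the reshape, the layer acts as a learned input standardization, so the raw pixel scale does not have to be normalized by hand before training.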

In [35]:
model_nn.add(ks.layers.Dense(150, activation='relu'))
model_nn.add(ks.layers.Dense(100, activation='relu'))

In [36]:
#Output layer
model_nn.add(ks.layers.Dense(10, activation='softmax', name='Output'))
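
The forward pass of the stack defined above (1024 inputs, hidden layers of 150 and 100 ReLU units, 10 softmax outputs) can be written out in NumPy as follows. This is a sketch for intuition only: it omits the batch normalization layer, and the He-style random initialization and the `params` dictionary layout are assumptions, not what Keras uses internally.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(x, params):
    """Two ReLU hidden layers followed by a softmax output layer."""
    h1 = relu(x @ params["W1"] + params["b1"])
    h2 = relu(h1 @ params["W2"] + params["b2"])
    return softmax(h2 @ params["W3"] + params["b3"])


rng = np.random.default_rng(0)
sizes = [1024, 150, 100, 10]
params = {}
for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:]), start=1):
    params[f"W{i}"] = rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out))
    params[f"b{i}"] = np.zeros(n_out)

images = rng.random((4, 32, 32))                           # stand-in for four 32x32 images
probs = forward(images.reshape(len(images), -1), params)   # flatten to (4, 1024) first
print(probs.shape, probs.sum(axis=1))
```

Each output row is a probability distribution over the 10 classes, so the predicted label is the index of the largest entry in that row.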

#### Understand and be able to implement (vectorized) backpropagation (stochastic gradient descent, cross-entropy loss, cost functions) (2.5 points)

In [38]:
model_nn.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
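
`compile` hides the backpropagation step, so it may help to see it written out once. The sketch below implements vectorized gradients for a smaller network (one ReLU hidden layer, softmax output, mean cross-entropy loss) and applies plain gradient-descent updates. The layer sizes, learning rate, random data, and the helper `loss_and_grads` are assumptions for illustration; it also uses the full batch on every step, whereas SGD in Keras updates on mini-batches.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def loss_and_grads(x, y_onehot, W1, b1, W2, b2):
    """Mean cross-entropy of a one-hidden-layer ReLU network and its gradients."""
    n = x.shape[0]
    # Forward pass
    z1 = x @ W1 + b1
    h1 = np.maximum(z1, 0.0)
    probs = softmax(h1 @ W2 + b2)
    loss = -np.sum(y_onehot * np.log(probs + 1e-12)) / n
    # Backward pass: softmax + cross-entropy gives (probs - y) / n at the logits.
    dz2 = (probs - y_onehot) / n
    dW2 = h1.T @ dz2
    db2 = dz2.sum(axis=0)
    dh1 = dz2 @ W2.T
    dz1 = dh1 * (z1 > 0)      # ReLU passes gradient only where its input was positive
    dW1 = x.T @ dz1
    db1 = dz1.sum(axis=0)
    return loss, (dW1, db1, dW2, db2)


rng = np.random.default_rng(0)
x = rng.normal(size=(32, 20))
y = np.eye(3)[rng.integers(0, 3, size=32)]   # random one-hot labels
params = [rng.normal(scale=0.1, size=(20, 16)), np.zeros(16),
          rng.normal(scale=0.1, size=(16, 3)), np.zeros(3)]

lr = 0.1  # illustrative learning rate
losses = []
for _ in range(50):
    loss, grads = loss_and_grads(x, y, *params)
    losses.append(loss)
    params = [p - lr * g for p, g in zip(params, grads)]  # gradient-descent update
print(losses[0], losses[-1])
```

Every gradient is a single matrix expression over the whole batch, which is what "vectorized" backpropagation refers to: there is no Python loop over individual samples.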

In [39]:
# Change train and test labels into one-hot vectors
trainY = ks.utils.to_categorical(y_train, num_classes=10)
testY = ks.utils.to_categorical(y_test, num_classes=10)
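
For reference, one-hot encoding is just an index lookup into an identity matrix. The helper below is a hypothetical NumPy equivalent written for illustration; it is not the Keras implementation.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype="float32")[labels]


encoded = one_hot([3, 0, 9], num_classes=10)
print(encoded.shape)            # (3, 10)
print(encoded.argmax(axis=1))   # [3 0 9]
```

Taking `argmax` along each row recovers the original integer label, which is how class predictions are read back from the softmax output later on.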

In [40]:
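# Train the model
# Note: the test split is passed as validation data, so the validation metrics
# reported during training are test-set metrics. validation_split is dropped
# because Keras ignores it when validation_data is given.
features_val_arr = np.array(x_text)
model_nn.fit(x_train, trainY,
             validation_data=(features_val_arr, testY),
             epochs=75,
             batch_size=150,
             shuffle='batch')  # 'batch' shuffles in batch-sized chunks, for HDF5-backed inputs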

Train on 42000 samples, validate on 18000 samples
Epoch 1/75
...
Epoch 75/75


<keras.callbacks.History at 0x1bdce153ef0>

In [41]:
# Evaluate on the test set; returns [loss, accuracy]
evaluate = model_nn.evaluate(x_text, testY)
print(evaluate)

[0.5984484396908019, 0.8418888888888889]


In [43]:
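# Predict class labels for the test set (argmax over the softmax outputs);
# this also works on Keras versions where predict_classes is not available
y_predict = np.argmax(model_nn.predict(x_text), axis=1)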

In [44]:
cr=metrics.classification_report(y_test,y_predict)
print(cr)

              precision    recall  f1-score   support

           0       0.85      0.89      0.87      1814
           1       0.83      0.86      0.85      1828
           2       0.87      0.85      0.86      1803
           3       0.80      0.78      0.79      1719
           4       0.88      0.87      0.88      1812
           5       0.81      0.83      0.82      1768
           6       0.84      0.83      0.83      1832
           7       0.89      0.87      0.88      1808
           8       0.81      0.81      0.81      1812
           9       0.84      0.83      0.83      1804

   micro avg       0.84      0.84      0.84     18000
   macro avg       0.84      0.84      0.84     18000
weighted avg       0.84      0.84      0.84     18000



#### Understand the differences and trade-offs between traditional and NN classifiers with the help of classification metrics (5 points)

##### KNN performs markedly worse than the DNN on this dataset

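KNN:

Accuracy - 52.87%

Precision (macro/weighted avg) - 54%

Recall (macro/weighted avg) - 53%

F1 Score (macro/weighted avg) - 52%


DNN:

Accuracy - 84.19%

Precision (macro/weighted avg) - 84%

Recall (macro/weighted avg) - 84%

F1 Score (macro/weighted avg) - 84%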

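The metrics above show that the DNN outperforms KNN on every measure, by roughly 30 percentage points. KNN compares raw pixel values using Euclidean distance, so it cannot learn which pixel patterns matter for a class. In the neural network, the hidden layers learn image features by adjusting their weights during training, which lets it classify the images far more accurately. The trade-off is cost: the network had to be trained for 75 epochs, whereas KNN needs no training but must keep the whole training set and search it for every prediction.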
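
Beyond accuracy, the two classifiers also differ in what they have to store. The arithmetic below is a back-of-the-envelope sketch using the layer sizes and training-set shape shown earlier in this notebook; it counts stored numbers only and says nothing about run time, and the four-values-per-feature figure for batch normalization assumes the layer keeps a scale, a shift, a moving mean, and a moving variance.

```python
# Layer sizes taken from the model defined above: 1024 -> 150 -> 100 -> 10.
layer_sizes = [1024, 150, 100, 10]

# Each Dense layer stores a weight matrix plus one bias per output unit.
dense_params = sum(n_in * n_out + n_out
                   for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# BatchNormalization on 1024 features: gamma, beta, moving mean, moving variance.
batchnorm_params = 4 * 1024

dnn_values = dense_params + batchnorm_params

# kNN keeps every flattened training image: 42000 images of 32 * 32 pixels.
knn_values = 42000 * 32 * 32

print(dnn_values)                       # 173956
print(knn_values)                       # 43008000
print(round(knn_values / dnn_values))   # 247
```

Under these assumptions the trained network is a few hundred times smaller than the data KNN has to keep around, in addition to being more accurate on this task.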