### Digit Recognizer with an ANN and Hyperparameter Tuning with Keras


Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel value indicating its lightness or darkness, with higher numbers meaning darker. This pixel value is an integer between 0 and 255, inclusive.
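The 784-column layout is easier to picture as an image. Here is a minimal NumPy sketch that turns one flat row of pixel values back into a 28x28 grid. It assumes the pixels are stored row by row (row-major order), so `pixel{i}` sits at row `i // 28` and column `i % 28`. `flat_row` and `pixel_position` are hypothetical names used only for illustration.

```python
import numpy as np

# Hypothetical flat pixel row standing in for one CSV record (784 values in 0-255).
flat_row = np.arange(784) % 256

# Row-major reshape (assumed layout): pixel index i -> row i // 28, column i % 28.
image = flat_row.reshape(28, 28)

def pixel_position(i):
    """Return the (row, col) location of pixel{i} in the 28x28 grid."""
    return divmod(i, 28)

print(image.shape)          # (28, 28)
print(pixel_position(783))  # (27, 27)
```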

The training data set (train.csv; the notebook reads it as `train_digit.csv`) has 785 columns. The first column, called "label", is the digit that was drawn by the user. The remaining 784 columns contain the pixel values of the associated image.

In [43]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

#### Data Loading and pre-processing

In [48]:
digits= pd.read_csv('train_digit.csv')

In [49]:
digits.head()

,label,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,...,pixel774,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783
0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,4,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


##### Converting the target variable y into one-hot encoded vectors

In [53]:
digits=pd.get_dummies(digits, columns=['label'])

In [56]:
digits.head()

,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,...,label_0,label_1,label_2,label_3,label_4,label_5,label_6,label_7,label_8,label_9
0,0,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
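`get_dummies` turns each label into a 10-element vector that contains a single 1. The following minimal NumPy sketch shows the same idea. It is an alternative to `get_dummies`, not what the notebook runs. It indexes rows of the identity matrix, and `argmax` reverses the encoding, which is also how predictions are turned back into digits later. The labels are the first five from `digits.head()` above.

```python
import numpy as np

# The first five labels shown in digits.head() above.
labels = np.array([1, 0, 1, 4, 0])

# Row k of the 10x10 identity matrix is the one-hot vector for digit k,
# matching the label_0 ... label_9 column order produced by get_dummies.
one_hot = np.eye(10, dtype=np.float32)[labels]

# argmax along each row recovers the original digit.
recovered = one_hot.argmax(axis=1)
print(one_hot[3])   # [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
print(recovered)    # [1 0 1 4 0]
```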


In [60]:
# y: the dependent variable, i.e. the last 10 columns (one-hot labels)
y= digits.iloc[: , -10:]
y

,label_0,label_1,label_2,label_3,label_4,label_5,label_6,label_7,label_8,label_9
0,0,1,0,0,0,0,0,0,0,0
1,1,0,0,0,0,0,0,0,0,0
2,0,1,0,0,0,0,0,0,0,0
3,0,0,0,0,1,0,0,0,0,0
4,1,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...
41995,1,0,0,0,0,0,0,0,0,0
41996,0,1,0,0,0,0,0,0,0,0
41997,0,0,0,0,0,0,0,1,0,0
41998,0,0,0,0,0,0,1,0,0,0


In [61]:
# x: the independent variables, i.e. the 784 pixel columns
x = digits.iloc[:, :-10]

In [62]:
x.head()

,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,...,pixel774,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [12]:
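# Unique values and frequencies of the target variable.
# This output comes from an earlier run (In [12]) in which y was still the single `label` column;
# with the one-hot y defined above, use y.idxmax(axis=1).value_counts() instead.
y.value_counts()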

1    4684
7    4401
3    4351
9    4188
2    4177
6    4137
0    4132
4    4072
8    4063
5    3795
Name: label, dtype: int64

In [18]:
# Check for missing values in every column
digits.isnull().sum()

pixel0      0
pixel1      0
pixel2      0
pixel3      0
pixel4      0
           ..
pixel779    0
pixel780    0
pixel781    0
pixel782    0
pixel783    0
Length: 784, dtype: int64

In [63]:
digits.shape

(42000, 794)

In [65]:
print(y.shape)

(42000, 10)


Scale each pixel value to the range [0, 1] by dividing it by 255.0 (every value lies between 0 and 255), and convert the arrays to float32, the default floating-point type for Keras layers.
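Here is a minimal sketch of the scaling step on a handful of illustrative pixel values. It also shows that `astype` returns a new array instead of converting in place. Its result therefore has to be assigned back to the variable that is used later.

```python
import numpy as np

# A few raw pixel intensities in the 0-255 range (illustrative values).
raw = np.array([0, 51, 128, 255], dtype=np.int64)

# Dividing by 255.0 maps 0 -> 0.0 and 255 -> 1.0; astype returns a new float32 array.
scaled = (raw / 255.0).astype("float32")

print(raw.dtype)     # int64: the original array is left untouched
print(scaled.dtype)  # float32
```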

In [66]:
x_scaled= x.values/255.0

In [67]:
x_scaled.shape

(42000, 784)

In [68]:
# astype returns a new array, so assign the result back to x_scaled
x_scaled = x_scaled.astype('float32')

In [69]:
y=y.astype('float32')

In [77]:
y_ = y.to_numpy()

In [78]:
y_

array([[0., 1., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 1.]], dtype=float32)

In [79]:
#Splitting into train and test.

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test= train_test_split(x_scaled, y_, test_size=0.2, random_state=2)

In [81]:
[i.shape for i in (x_train,y_train,x_test,y_test)]

[(33600, 784), (33600, 10), (8400, 784), (8400, 10)]

#### Building Model and Tuning Hyperparameters

In [104]:
from tensorflow.keras.models import Sequential

In [105]:
from tensorflow.keras.layers import Dense, Flatten, Dropout

In [106]:
from keras_tuner import HyperModel  # base class for defining tunable models

Hyperparameter tuning is at the heart of building an ANN, because the hyperparameters directly affect the model's performance. We can tune hyperparameters such as:

Learning rate: It sets the size of each weight update, and therefore how quickly the model learns, so it must be chosen carefully. If it is too small, the loss decreases toward its minimum very slowly. If it is too large, the updates overshoot the minimum, and the loss may oscillate or even diverge instead of converging. Therefore I chose the candidate values 1e-2, 1e-3 and 1e-4.
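The effect of the learning rate is easy to see on a toy problem. This sketch runs plain gradient descent on the one-dimensional loss f(w) = w², which was chosen for simplicity and is not the network's loss. It uses three illustrative learning rates. With 0.01, w is still far from the minimum at w = 0 after 20 steps. With 0.4, it converges quickly. With 1.1, every step overshoots further and the iteration diverges.

```python
def gradient_descent(lr, steps=20, w0=5.0):
    """Minimize the toy loss f(w) = w**2 (gradient 2*w) with a fixed learning rate."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w   # step against the gradient
    return w

for lr in (0.01, 0.4, 1.1):
    print(lr, gradient_descent(lr))
```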

The number of nodes: Nodes (units) are the neurons in each layer of the ANN. Their number has to be tuned, and common choices are 32, 64, 128, 256 and 512. (The search below samples between 8 and 64 units in steps of 4.)

The number of layers: Like the number of nodes, it determines the complexity of the ANN model. Too many layers can make the model overfit (high variance), while too few, such as 2, may underfit (high bias) on complex, non-linear problems. We use Keras's `Dense` layer to create the layers.

Activation function: Without activation functions, each layer of an ANN only computes a linear transformation (Z = W*X + b), and a stack of linear layers is still linear. Activation functions make the ANN non-linear. The most popular activation functions for hidden layers are ReLU and tanh. For binary classification, the output layer uses the "sigmoid" function. For more than 2 classes, it uses the "softmax" function.
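The claim that stacked linear layers stay linear can be checked numerically. This NumPy sketch uses random toy weights. It shows that two layers without an activation between them equal a single layer with weights `W1 @ W2` and bias `b1 @ W2 + b2`. It also defines ReLU, sigmoid and softmax, and it confirms that softmax outputs sum to 1 in every row. The function names are illustrative, not Keras APIs.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtracting each row's max avoids overflow and leaves the result unchanged.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                           # 4 toy samples, 3 features
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)

# Two stacked layers with no activation in between ...
two_linear = (x @ W1 + b1) @ W2 + b2
# ... are exactly one linear layer with W = W1 @ W2 and b = b1 @ W2 + b2.
one_linear = x @ (W1 @ W2) + (b1 @ W2 + b2)
print(np.allclose(two_linear, one_linear))  # True

# ReLU zeroes negative inputs; sigmoid squashes scores into (0, 1).
print(relu(np.array([-2.0, 0.5])), sigmoid(np.array([0.0])))

# Softmax turns each row of scores into class probabilities that sum to 1.
probs = softmax(x @ W1 + b1)
print(probs.sum(axis=1))
```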

L2 regularization: Regularization is used to reduce the high-variance (overfitting) problem. L2 regularization adds a penalty proportional to the sum of the squared weights to the loss function. This pushes the weights toward zero, which reduces the model's complexity.
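Here is a minimal sketch of the L2 penalty and its effect on a gradient step. `l2_penalty` has the same form, λ·Σw², as `tf.keras.regularizers.l2(λ)`. `sgd_step_with_l2` is a hypothetical helper showing that the gradient of the penalty, 2λw, pulls every weight toward zero. The weights and λ values are illustrative.

```python
import numpy as np

def l2_penalty(weights, lam=0.001):
    """L2 term lam * sum(w**2), the same form as tf.keras.regularizers.l2(lam)."""
    return lam * np.sum(np.square(weights))

def sgd_step_with_l2(w, grad_loss, lr=0.1, lam=0.001):
    """Hypothetical helper: one SGD step on loss + lam * sum(w**2)."""
    # d/dw [lam * w**2] = 2 * lam * w, so the penalty adds a pull toward zero.
    return w - lr * (grad_loss + 2 * lam * w)

w = np.array([3.0, -2.0, 0.5])
print(l2_penalty(w))  # 0.001 * (9 + 4 + 0.25)

# With a zero data gradient, only the penalty acts and every weight shrinks.
w_next = sgd_step_with_l2(w, grad_loss=np.zeros_like(w), lr=0.1, lam=0.1)
print(np.abs(w_next) < np.abs(w))  # all True
```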

Dropout: It is another regularization technique. During training, it randomly deactivates (zeroes) some of the nodes in the chosen layers. Whether each node is dropped is decided by a Bernoulli draw. It is often as effective as L2 regularization, and it is very common to use L2 and dropout together.
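This NumPy sketch shows inverted dropout, which is what Keras's `Dropout` layer does during training. Each unit is kept with probability 1 - rate according to a Bernoulli draw. The kept units are scaled by 1 / (1 - rate) so that the expected activation does not change. At inference time, the layer passes its input through unchanged. `dropout` here is an illustrative function, not the Keras layer.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` during training."""
    if not training or rate == 0.0:
        return activations
    # Bernoulli draw per unit: keep with probability 1 - rate.
    keep_mask = rng.random(activations.shape) >= rate
    # Scale kept units by 1 / (1 - rate) so the expected activation is unchanged.
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
h = np.ones((1000, 44))          # toy activations: 1000 samples, 44 units
out = dropout(h, rate=0.1, rng=rng)

print((out == 0).mean())         # fraction dropped, close to 0.1
print(out.mean())                # close to 1.0, the original mean
```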

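Adam optimization: There are different optimization methods, such as "momentum" and "RMSProp", that speed up training and improve the model's performance. Both keep exponentially weighted averages of past gradients. Momentum averages the gradients themselves, while RMSProp (Root Mean Square Propagation) averages the squared gradients and uses them to scale each step. The "Adam" optimization technique combines momentum and RMSProp.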
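The combination of momentum and RMSProp takes only a few lines to write. This plain-NumPy sketch follows the standard Adam update with bias correction. It uses the same `beta_1`, `beta_2` and `epsilon` values that the notebook passes to `keras.optimizers.Adam`. Keras's own implementation may differ in small numerical details. The quadratic toy loss is an assumption for illustration only.

```python
import numpy as np

def adam_minimize(grad_fn, w0, lr=1e-2, beta_1=0.9, beta_2=0.999, epsilon=1e-7, steps=2000):
    """Plain-NumPy Adam: momentum (m) plus RMSProp-style scaling (v), with bias correction."""
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)   # exponentially weighted average of gradients (momentum)
    v = np.zeros_like(w)   # exponentially weighted average of squared gradients (RMSProp)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta_1 * m + (1 - beta_1) * g
        v = beta_2 * v + (1 - beta_2) * g**2
        m_hat = m / (1 - beta_1**t)    # undo the bias toward zero in early steps
        v_hat = v / (1 - beta_2**t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + epsilon)
    return w

# Toy loss f(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2 and its analytic gradient.
def grad(w):
    return np.array([2 * (w[0] - 3), 20 * (w[1] + 1)])

print(adam_minimize(grad, [0.0, 0.0]))  # approaches [3, -1]
```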

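Batch size: Instead of computing the loss on the whole training set before every update, mini-batch training divides the data into small batches and updates the weights after each one. This speeds up training and can also improve performance. Because the loss on a single mini-batch is noisy, it is usually averaged over many batches (for example with an exponentially weighted average) to see the overall trend.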
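Here is a minimal sketch of how one epoch is split into mini-batches. With 33,600 training rows and `batch_size=100`, as in the notebook's final `fit` call, an epoch consists of 336 weight updates. The indices are shuffled first, as Keras's `fit` does by default. `iterate_minibatches` is a hypothetical helper, and the toy arrays only stand in for `x_train` and `y_train`.

```python
import numpy as np

def iterate_minibatches(x, y, batch_size, rng):
    """Yield shuffled (x_batch, y_batch) pairs that together cover every sample once."""
    indices = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        batch = indices[start:start + batch_size]
        yield x[batch], y[batch]

rng = np.random.default_rng(0)
# Toy stand-ins with the same number of rows as x_train / y_train (33,600).
x_toy = np.arange(33600)
y_toy = np.arange(33600) % 10

batches = list(iterate_minibatches(x_toy, y_toy, batch_size=100, rng=rng))
print(len(batches))  # 336 mini-batches, i.e. 336 weight updates per epoch
```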

In [120]:
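# Subclass KerasTuner's HyperModel: build(hp) returns a compiled model for one trial

class AnnHyperModel(HyperModel):
    def __init__(self, input_shape):
        super().__init__()
        self.input_shape = input_shape

    def build(self, hp):
        model = Sequential()
        # Both hidden layers reuse the names 'units' and 'dense_activation', so they
        # always receive the same values (the default=16 below is therefore ignored)
        model.add(
            layers.Dense(
                units=hp.Int('units', 8, 64, 4, default=8),
                activation=hp.Choice(
                    'dense_activation',
                    values=['relu', 'tanh', 'elu'],
                    default='relu'),
                # activity_regularizer penalizes this layer's outputs;
                # kernel_regularizer would penalize its weights instead
                activity_regularizer=tf.keras.regularizers.l2(0.001),
                input_shape=self.input_shape))

        model.add(
            layers.Dense(
                units=hp.Int('units', 8, 64, 4, default=16),
                activation=hp.Choice(
                    'dense_activation',
                    values=['relu', 'tanh', 'elu'],
                    default='relu')))

        model.add(
            layers.Dropout(
                hp.Float(
                    'dropout',
                    min_value=0.0,
                    max_value=0.1,
                    default=0.005,
                    step=0.01)))

        # Output layer: one softmax unit per digit class
        model.add(layers.Dense(10, activation='softmax'))

        model.compile(
            optimizer=keras.optimizers.Adam(
                hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4]),
                beta_1=0.9,
                beta_2=0.999,
                epsilon=1e-07),
            # 'mse' reproduces the original run; 'categorical_crossentropy' is the
            # usual loss for one-hot targets with a softmax output
            loss='mse',
            metrics=['accuracy'])

        return model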

In [121]:
x_train.shape[1]

784

In [122]:
# Create an instance of the hypermodel class
input_shape = (x_train.shape[1],)
hypermodel = AnnHyperModel(input_shape)

In [123]:
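from keras_tuner.tuners import RandomSearch
# RandomSearch: one of KerasTuner's built-in tuners; it samples hyperparameter combinations at random.
# No directory/project_name is given, so results are saved under ./untitled_project and
# reloaded on re-runs (as the log below shows); pass overwrite=True to start a fresh search.

tuner_rs = RandomSearch(hypermodel,
                        objective='val_accuracy',
                        seed=42,
                        max_trials=12)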

INFO:tensorflow:Reloading Oracle from existing project .\untitled_project\oracle.json


###### Summary of Search space

In [124]:
tuner_rs.search_space_summary()

Search space summary
Default search space size: 4
units (Int)
{'default': 8, 'conditions': [], 'min_value': 8, 'max_value': 64, 'step': 4, 'sampling': None}
dense_activation (Choice)
{'default': 'relu', 'conditions': [], 'values': ['relu', 'tanh', 'elu'], 'ordered': False}
dropout (Float)
{'default': 0.005, 'conditions': [], 'min_value': 0.0, 'max_value': 0.1, 'step': 0.01, 'sampling': None}
learning_rate (Choice)
{'default': 0.01, 'conditions': [], 'values': [0.01, 0.001, 0.0001], 'ordered': True}
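The following simplified pure-Python random search over the same four hyperparameters shows what the tuner does with this search space. It is a sketch, not KerasTuner's actual oracle. `toy_evaluate` is a hypothetical stand-in for training a model and returning its validation accuracy.

```python
import random

# Search space copied from the summary above (both bounds inclusive).
SEARCH_SPACE = {
    "units": list(range(8, 64 + 1, 4)),                   # 8, 12, ..., 64
    "dense_activation": ["relu", "tanh", "elu"],
    "dropout": [round(0.01 * k, 2) for k in range(11)],   # 0.0, 0.01, ..., 0.1
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

def sample_config(rng):
    """Draw one value uniformly at random for each hyperparameter."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def random_search(evaluate, max_trials, seed=42):
    """Evaluate `max_trials` random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(max_trials):
        config = sample_config(rng)
        score = evaluate(config)  # in the notebook: train, then read val_accuracy
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Hypothetical stand-in objective; a real one would train and validate a model.
def toy_evaluate(config):
    return config["units"] / 64 - config["dropout"]

best_config, best_score = random_search(toy_evaluate, max_trials=12)
print(best_config, best_score)
```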


##### Running the search to find the best model

In [126]:
tuner_rs.search(x_train,y_train,epochs=10,validation_data=(x_test, y_test) )

Trial 13 Complete [00h 00m 32s]
val_accuracy: 0.82833331823349

Best val_accuracy So Far: 0.9551190733909607
Total elapsed time: 00h 09m 21s
INFO:tensorflow:Oracle triggered exit


In [174]:
# Retrieve the best model found by the search and evaluate it on the test set
best_model = tuner_rs.get_best_models(num_models=1)[0]
loss, accuracy = best_model.evaluate(x_test, y_test)  # [loss, accuracy], as set in compile()
print(loss, accuracy)

0.009337542578577995 0.9551190733909607


In [128]:
#show layers of the model
best_model.layers

[<tensorflow.python.keras.layers.core.Dense at 0x26435bc7310>,
 <tensorflow.python.keras.layers.core.Dense at 0x2643a7bf9d0>,
 <tensorflow.python.keras.layers.core.Dropout at 0x26496a23670>,
 <tensorflow.python.keras.layers.core.Dense at 0x26496a23d90>]

In [129]:
#show weights of the model
best_model.weights

[<tf.Variable 'dense/kernel:0' shape=(784, 44) dtype=float32, numpy=
 array([[-0.06263542,  0.04531138, -0.0748824 , ...,  0.01153156,
          0.01775699,  0.08125643],
        [-0.04914296,  0.00234093, -0.03429755, ..., -0.04680659,
          0.08407092,  0.00454393],
        [ 0.02987475,  0.02771106, -0.02359401, ..., -0.06580382,
         -0.05348879, -0.02779204],
        ...,
        [ 0.04090622,  0.06988665, -0.0724809 , ..., -0.05513179,
         -0.05573845, -0.04813696],
        [ 0.0077439 , -0.08369192, -0.01565094, ...,  0.02709211,
          0.02910966,  0.00770351],
        [-0.0163511 , -0.05442271,  0.05490044, ..., -0.03002822,
         -0.08381375,  0.01675011]], dtype=float32)>,
 <tf.Variable 'dense/bias:0' shape=(44,) dtype=float32, numpy=
 array([-0.09811703,  0.06942645,  0.04843093, -0.04841722, -0.2773593 ,
         0.0653546 ,  0.11984874,  0.05258157, -0.04089143, -0.07047771,
        -0.14025517, -0.01583971, -0.00043606,  0.01668673,  0.07405388,
      

In [131]:
#Summary of the best model
best_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 44)                34540     
_________________________________________________________________
dense_1 (Dense)              (None, 44)                1980      
_________________________________________________________________
dropout (Dropout)            (None, 44)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                450       
Total params: 36,970
Trainable params: 36,970
Non-trainable params: 0
_________________________________________________________________
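The parameter counts in this summary follow from a simple formula. A `Dense` layer with n inputs and m units has n × m weights plus m biases, and `Dropout` has no parameters. A quick check in Python reproduces every number above.

```python
def dense_params(n_inputs, n_units):
    """Weights (one per input-unit pair) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Input size followed by the units of dense, dense_1 and dense_2; Dropout adds nothing.
layer_sizes = [784, 44, 44, 10]
counts = [dense_params(n_in, n_out) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
print(counts)       # [34540, 1980, 450]
print(sum(counts))  # 36970
```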


In [133]:
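# Fit the best model further on the training set (it starts from the weights found during the search)
# steps per epoch = len(x_train) / batch_size = 33600 / 100 = 336
best_model.fit(x_train, y_train,
               batch_size=100, epochs=10)   # epochs = number of full passes over the training data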

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x2649059fbe0>

In [172]:
#making predictions

y_pred= best_model.predict(x_test)
print(y_pred) #gives probabilities of each class per record
print(y_pred.shape)

[[5.72151795e-04 1.43536163e-04 1.87523983e-04 ... 2.57512129e-05
  4.56654932e-04 2.00557071e-04]
 [4.89107094e-07 2.76402176e-07 1.96263343e-07 ... 1.68985847e-04
  4.11843757e-06 9.92800117e-01]
 [4.21707558e-10 2.19558416e-08 8.04107270e-09 ... 4.22072510e-08
  6.71924694e-09 2.73353589e-06]
 ...
 [4.34491980e-08 4.15987688e-06 6.42450977e-05 ... 1.75582613e-06
  5.85219588e-08 4.38976855e-08]
 [2.20294183e-04 1.40872013e-04 1.04570245e-04 ... 6.99141992e-06
  1.30251155e-03 3.10806645e-05]
 [4.12998052e-05 3.43647685e-06 3.06684524e-06 ... 1.20361015e-01
  4.32738807e-06 8.79384518e-01]]
(8400, 10)


In [142]:
#converting predictions into actual output classes
y_pred_classes= np.argmax(y_pred,axis=1)
print(y_pred_classes,y_pred_classes.shape)

#converting y_test into actual output classes to compare with the corresponding predictions
y_test_classes= np.argmax(y_test,axis=1)
print(y_test_classes,y_test_classes.shape)


[6 9 5 ... 3 6 9] (8400,)
[6 9 5 ... 3 6 9] (8400,)
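Comparing the two `argmax` arrays element by element gives the accuracy. Here is a minimal sketch with three toy rows (illustrative values, not notebook output). For one-hot targets, this should correspond to the `accuracy` metric that Keras reports during training.

```python
import numpy as np

def accuracy_from_probabilities(y_prob, y_onehot):
    """Fraction of rows whose highest-probability class matches the one-hot label."""
    return np.mean(np.argmax(y_prob, axis=1) == np.argmax(y_onehot, axis=1))

# Three toy predictions (each row sums to 1) and their one-hot labels.
y_prob = np.array([[0.1, 0.7, 0.2],
                   [0.8, 0.1, 0.1],
                   [0.3, 0.3, 0.4]])
y_true = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 1]])
print(accuracy_from_probabilities(y_prob, y_true))  # 2 of 3 rows correct
```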


###### Analysing model's performance

In [146]:
from sklearn.metrics import confusion_matrix, classification_report

In [147]:
cmf= confusion_matrix(y_test_classes,y_pred_classes)
cr = classification_report(y_test_classes,y_pred_classes)
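Each number in the classification report can be derived from the confusion matrix. Recall divides the diagonal by the row sums (all samples of a true digit). Precision divides it by the column sums (all predictions of a digit). Accuracy is the diagonal total divided by the number of samples. This NumPy sketch recomputes these values from the confusion matrix printed by the next cell.

```python
import numpy as np

# Confusion matrix as printed by the next cell (rows: true digit, columns: predicted digit).
cm = np.array([
    [804,   1,   0,   1,   0,   0,   1,   0,  13,   1],
    [  0, 936,   6,   3,   0,   0,   1,   4,  11,   1],
    [  1,   1, 795,   6,   7,   0,   1,  10,   6,   2],
    [  1,   2,  11, 793,   1,  10,   1,   8,  19,  18],
    [  5,   2,   3,   1, 813,   0,   3,   3,   6,  20],
    [  5,   0,   4,   9,   1, 684,   1,   4,  14,   7],
    [ 10,   2,   5,   2,   0,  11, 793,   0,  16,   0],
    [  1,   1,   7,   1,   3,   0,   0, 846,   5,   9],
    [  6,   2,   4,   2,   0,   8,   1,   3, 764,   3],
    [  4,   1,   0,   2,  10,   5,   0,  17,  17, 778],
])

correct = np.diag(cm)                  # correctly classified samples per digit
support = cm.sum(axis=1)               # samples per true digit
recall = correct / support             # correct / all samples of that digit
precision = correct / cm.sum(axis=0)   # correct / all predictions of that digit
accuracy = correct.sum() / cm.sum()

print(support)                         # compare with the report's support column
print(np.round(recall, 2))
print(np.round(precision, 2))
print(round(accuracy, 4))
```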

In [165]:
print('Confusion Matrix is : \n\n{} \n\nClassification Report is : \n\n{}' .format(cmf,cr))

Confusion Matrix is : 

[[804   1   0   1   0   0   1   0  13   1]
 [  0 936   6   3   0   0   1   4  11   1]
 [  1   1 795   6   7   0   1  10   6   2]
 [  1   2  11 793   1  10   1   8  19  18]
 [  5   2   3   1 813   0   3   3   6  20]
 [  5   0   4   9   1 684   1   4  14   7]
 [ 10   2   5   2   0  11 793   0  16   0]
 [  1   1   7   1   3   0   0 846   5   9]
 [  6   2   4   2   0   8   1   3 764   3]
 [  4   1   0   2  10   5   0  17  17 778]] 

Classification Report is : 

              precision    recall  f1-score   support

           0       0.96      0.98      0.97       821
           1       0.99      0.97      0.98       962
           2       0.95      0.96      0.96       829
           3       0.97      0.92      0.94       864
           4       0.97      0.95      0.96       856
           5       0.95      0.94      0.95       729
           6       0.99      0.95      0.97       839
           7       0.95      0.97      0.96       873
           8       0.88    