# MNIST data set processing - classifying handwritten digits


Let's classify some handwritten digits! In this kernel, I implement two neural nets: the classic LeNet-5 architecture and a second, fairly simple convolutional architecture.


## Prepare data, define functions etc.
First, let's preprocess and prepare the data a bit.

In [1]:
import pandas as pd
import numpy as np

def create_submission(test_preds, file_name = "submission.csv"):
    submission = pd.concat([pd.Series(np.arange(1,len(test_preds) + 1)),pd.Series(test_preds)], axis = 1)
    submission.columns = ['ImageId','Label']
    submission.to_csv(file_name, index = False)



In [2]:
import pandas as pd
import numpy as np
train = pd.read_csv("digit-recognizer/train.csv")
test = pd.read_csv("digit-recognizer/test.csv")

In [3]:
import keras
from keras.regularizers import l2 # Perhaps not use?
from keras.models import Model, Sequential
from keras.preprocessing.sequence import pad_sequences
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.layers import Dropout, Dense, Activation, Embedding, Input, Reshape, Flatten, UpSampling2D,AveragePooling2D,Layer
from keras.utils import to_categorical


x_train = np.array(train.drop(["label"], axis = 1))
y_train = np.array(train['label'])
x_test = np.array(test)
# Convert into a nice input shape for the neural net. 
x_train = x_train.reshape(42000,28,28,1)
x_test = x_test.reshape(28000,28,28,1)

# Convert the training labels to categorical. 
y_train = to_categorical(y_train)
#y_test = to_categorical(y_test)

Using TensorFlow backend.
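
As a rough illustration of what the reshaping and label conversion above do, here is a NumPy-only sketch on a tiny made-up batch. The names `flat_pixels`, `labels` and `one_hot` are hypothetical. `one_hot` only mimics what `to_categorical` does for integer labels and is not the Keras implementation. Passing `-1` as the first dimension lets NumPy infer the number of images instead of spelling it out.

```python
import numpy as np

# Hypothetical stand-in for a few rows of train.csv: 3 images, 784 pixels each.
flat_pixels = np.arange(3 * 784).reshape(3, 784) % 256
labels = np.array([5, 0, 9])

# -1 lets NumPy infer the number of images from the data.
images = flat_pixels.reshape(-1, 28, 28, 1)

def one_hot(y, num_classes=10):
    """Return a (len(y), num_classes) array with a 1.0 at each label index."""
    encoded = np.zeros((len(y), num_classes), dtype="float32")
    encoded[np.arange(len(y)), y] = 1.0
    return encoded

encoded_labels = one_hot(labels)
print(images.shape)          # (3, 28, 28, 1)
print(encoded_labels.shape)  # (3, 10)
```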


## Implementing the first architecture

Let's implement the first architecture from the task. I am not sure who first came up with it; if you know, please tell me so that I can credit them. The implementation is below, followed by a summary of the network that makes its structure clear.

In [4]:
model = Sequential()
model.add(Conv2D(filters=4, kernel_size = (5,5), strides = 1, padding = "same",  input_shape=(28,28,1), activation ="tanh"))
model.add(Conv2D(filters=8, kernel_size = (4,4), strides = 2, padding = "same", activation = "relu"))
model.add(Conv2D(filters=12, kernel_size = (4,4), strides = 2, padding = "same", activation = "relu"))
model.add(Flatten())
model.add(Dense(units = 200))
model.add(Dropout(0.5))
model.add(Dense(units = 10, activation = "softmax"))
model.compile(loss='categorical_crossentropy',optimizer=keras.optimizers.Adam(), metrics=['accuracy'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 28, 28, 4)         104       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 14, 14, 8)         520       
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 7, 7, 12)          1548      
_________________________________________________________________
flatten_1 (Flatten)          (None, 588)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 200)               117800    
_________________________________________________________________
dropout_1 (Dropout)  
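
The parameter counts in the summary can be checked by hand. The sketch below assumes the usual counting rule (one weight per kernel element, input channel and filter, plus one bias per filter or unit). The helper names `conv2d_params` and `dense_params` are made up for this example.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units

conv1 = conv2d_params(5, 5, 1, 4)    # 5x5 kernel, 1 input channel, 4 filters
conv2 = conv2d_params(4, 4, 4, 8)    # 4x4 kernel, 4 input channels, 8 filters
conv3 = conv2d_params(4, 4, 8, 12)   # 4x4 kernel, 8 input channels, 12 filters
flat_units = 7 * 7 * 12              # size of the flattened (7, 7, 12) feature map
dense1 = dense_params(flat_units, 200)

print(conv1, conv2, conv3, flat_units, dense1)  # 104 520 1548 588 117800
```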



In [5]:
n_epochs = 10
model.fit(x_train,y_train, epochs = n_epochs)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x123583cc0>

In [7]:
y_preds = model.predict(x_test)
from sklearn.metrics import accuracy_score, confusion_matrix
import numpy as np
def get_preds(preds):
    # Pick the index of the most probable class for each row of predictions.
    return np.argmax(preds, axis=1)


create_submission(get_preds(y_preds), file_name = "submission_regular_CNN.csv")

Okay, that is actually a great score! The last few percentage points could probably be gained by:

- Applying regularization methods, such as L2 regularization and dropout layers.
- Using a more thorough validation method during training, such as K-fold cross-validation, or simply holding out a validation set.
- Optimizing hyperparameters, such as the learning rate for *Adam*.

I leave those improvements for a day with more time. I also think this architecture is not as good as the next one, so I will leave it for now.
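
For the validation idea in the list above, a minimal NumPy sketch of K-fold index splitting could look like this. The function `k_fold_indices` and its parameters are hypothetical and not part of the kernel. It only produces index arrays; training a fresh model per fold is left out.

```python
import numpy as np

def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, n_folds)
    for i in range(n_folds):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, val_idx

splits = list(k_fold_indices(n_samples=42000, n_folds=5))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))  # 33600 8400 for each fold
```

Each fold would then train on `x_train[train_idx]` and be scored on `x_train[val_idx]`. For the simpler hold-out option, Keras's `fit` accepts a `validation_split` argument.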

## LeNet-5 Architecture

Next, let's implement the LeNet-5 architecture. It is a well-known architecture, and more information on it can be found [here](https://engmrk.com/lenet-5-a-classic-cnn-architecture/). However, that link implements it differently from the way I have: it uses a softmax activation function in the last layer, and I am not sure whether that is correct. 


First, some googling showed me that Keras has no predefined RBF layer, which is the activation used in the last layer of the LeNet-5 architecture, so we need to define our own. There was a [finished implementation on StackOverflow](https://stackoverflow.com/questions/53855941/how-to-implement-rbf-activation-function-in-keras), so I simply took that. Thank you, [today@StackOverflow](https://stackoverflow.com/users/2099607/today). 

In [8]:
# This implementation is taken from StackOverflow and was posted by today@StackOverflow. All kudos to him. 
# It can be found at https://stackoverflow.com/questions/53855941/how-to-implement-rbf-activation-function-in-keras
from keras.layers import Layer
from keras import backend as K

class RBFLayer(Layer):
    def __init__(self, units, gamma, **kwargs):
        super(RBFLayer, self).__init__(**kwargs)
        self.units = units
        self.gamma = K.cast_to_floatx(gamma)

    def build(self, input_shape):
        self.mu = self.add_weight(name='mu',
                                  shape=(int(input_shape[1]), self.units),
                                  initializer='uniform',
                                  trainable=True)
        super(RBFLayer, self).build(input_shape)

    def call(self, inputs):
        diff = K.expand_dims(inputs) - self.mu
        l2 = K.sum(K.pow(diff,2), axis=1)
        res = K.exp(-1 * self.gamma * l2)
        return res

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.units)
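
To make the `call` method above easier to follow, here is a NumPy sketch of the same computation: each output unit is `exp(-gamma * ||x - mu_j||^2)`, where `mu_j` is one column of `mu`. The function `rbf_forward` and the toy arrays are made up for illustration.

```python
import numpy as np

def rbf_forward(inputs, mu, gamma):
    """inputs: (batch, in_dim), mu: (in_dim, units) -> output: (batch, units)."""
    diff = inputs[:, :, np.newaxis] - mu   # broadcast to (batch, in_dim, units)
    sq_dist = np.sum(diff ** 2, axis=1)    # squared distance to each centre
    return np.exp(-gamma * sq_dist)

# Toy example: two 2-D inputs and two centres, (0, 0) and (1, 1), stored as columns.
inputs = np.array([[0.0, 0.0],
                   [1.0, 1.0]])
mu = np.array([[0.0, 1.0],
               [0.0, 1.0]])
out = rbf_forward(inputs, mu, gamma=0.5)
print(out.shape)  # (2, 2)
```

An input sitting exactly on a centre gives 1.0 for that unit, and the value decays towards 0 with distance. Unlike softmax, the outputs in a row are not forced to sum to 1.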


In [9]:
le_net5 = Sequential()
le_net5.add(Conv2D(filters = 6, kernel_size = (5,5), strides = 1,activation = "tanh", input_shape=(28,28,1), padding = "same"))
le_net5.add(AveragePooling2D(pool_size=(2,2), strides = 2, padding = "valid"))
le_net5.add(Conv2D(filters = 16, kernel_size = (5,5), strides = 1, activation = "tanh"))
le_net5.add(Dropout(0.05))
le_net5.add(AveragePooling2D(pool_size=(2,2), strides = 2, padding = "same"))
le_net5.add(Conv2D(filters = 120, kernel_size = (5,5), strides = 1, activation = "tanh"))
le_net5.add(Dropout(0.05))
le_net5.add(Flatten())
le_net5.add(Dense(units = 84, activation = "tanh"))
le_net5.add(RBFLayer(10, gamma=0.5))
le_net5.compile(loss='categorical_crossentropy',optimizer=keras.optimizers.Adam(), metrics=['accuracy'])
le_net5.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_4 (Conv2D)            (None, 28, 28, 6)         156       
_________________________________________________________________
average_pooling2d_1 (Average (None, 14, 14, 6)         0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 10, 10, 16)        2416      
_________________________________________________________________
dropout_2 (Dropout)          (None, 10, 10, 16)        0         
_________________________________________________________________
average_pooling2d_2 (Average (None, 5, 5, 16)          0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 1, 1, 120)         48120     
_________________________________________________________________
dropout_3 (Dropout)          (None, 1, 1, 120)         0         
__________
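
The spatial sizes in the summary follow from the padding rules Keras uses: `"same"` gives `ceil(n / stride)`, while `"valid"` (the `Conv2D` default) gives `floor((n - kernel) / stride) + 1`. The sketch below replays that arithmetic for the layers listed above. The helper `out_size` is hypothetical.

```python
import math

def out_size(n, kernel, stride, padding):
    """Output width/height of a conv or pooling layer for a square input of size n."""
    if padding == "same":
        return math.ceil(n / stride)
    if padding == "valid":
        return (n - kernel) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")

# (layer name, kernel size, stride, padding) as defined in the model above.
layers = [
    ("conv2d_4", 5, 1, "same"),
    ("average_pooling2d_1", 2, 2, "valid"),
    ("conv2d_5", 5, 1, "valid"),
    ("average_pooling2d_2", 2, 2, "same"),
    ("conv2d_6", 5, 1, "valid"),
]

size = 28
sizes = []
for name, kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    sizes.append(size)
    print(name, size)
# sizes is [28, 14, 10, 5, 1]
```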

In [10]:
le_net5.fit(x_train, y_train, epochs = n_epochs)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x127dd6160>

In [12]:
y_preds = le_net5.predict(x_test)

create_submission(get_preds(y_preds), "submission_leNet55.csv")

This network also reached a great accuracy. Nice. However, the accuracy seems to plateau at around 0.989, close to 0.99, on the training set, which suggests it might be stuck in a local optimum. I applied dropout, but nothing I tried raised the test score above 0.985. If anyone has feedback or suggestions for improvement, I would be delighted to receive them! 

Thank you!