# Introduction

The goal in this competition is to take an image of a handwritten single digit, and determine what that digit is.  

The data is taken from the MNIST dataset. The MNIST ("Modified National Institute of Standards and Technology") dataset is a classic within the Machine Learning community that has been extensively studied.  More detail about the dataset, including Machine Learning algorithms that have been tried on it and their levels of success, can be found [here][1].


  [1]: http://yann.lecun.com/exdb/mnist/index.html

# Loading the data

In [2]:
import numpy as np # Array manipulation
import pandas as pd # Dataframe manipulation
import tensorflow as tf
# Multilayer perceptron Neural Network
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.utils import np_utils

In [4]:
from google.colab import files
uploaded = files.upload()

Saving train.csv to train.csv


In [5]:
# Load data
train = pd.read_csv('train.csv') 
test = pd.read_csv('test.csv')

# Fix random seeds for reproducibility
# (NumPy and TensorFlow use separate random generators)
seed = 7
np.random.seed(seed)
tf.random.set_seed(seed)

Extract the feature matrix X (the pixel columns) and convert it to an array of floats, then extract the labels from the first column.

In [6]:
# Extract image pixels (every column after the first) as floats
images = train.iloc[:,1:].values
images = images.astype(np.float64)  # np.float was deprecated in NumPy 1.20

# Extract digit labels (first column)
labels = train.iloc[:,0].values


# Multilayer Perceptron

## Preprocessing

The pixel values are grayscale intensities between 0 and 255. It is almost always a good idea to scale input values when using neural network models. Because the scale is well known and well behaved, we can very quickly **normalize** the pixel values to the range 0 to 1 by dividing each value by the maximum of 255.

Also, the output variable is an integer from 0 to 9. This is a multi-class classification problem. As such, it is good practice to use a **one hot encoding** of the class values, transforming the vector of class integers into a binary matrix. We can easily do this using the built-in np_utils.to_categorical() helper function in Keras.

In [7]:
# Normalize input from 0-255 to 0-1
images = images / 255.0
num_pixels =  images.shape[1]

# one hot encode outputs
labels = np_utils.to_categorical(labels)
num_classes = labels.shape[1]
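To make the two preprocessing steps concrete, here is a minimal NumPy-only sketch on toy data. The arrays `toy_pixels` and `toy_labels` are made-up stand-ins, not rows from train.csv, and the identity-matrix trick is just one way to reproduce the kind of binary matrix that `to_categorical` returns.

```python
import numpy as np

# Toy stand-ins (hypothetical data, not taken from train.csv)
toy_pixels = np.array([[0, 128, 255],
                       [255, 0, 51]], dtype=np.float64)
toy_labels = np.array([2, 0])

# Normalize: scale pixel intensities from [0, 255] to [0, 1]
toy_scaled = toy_pixels / 255.0

# One-hot encode: row i of the identity matrix encodes class i
n_classes = 3
toy_one_hot = np.eye(n_classes)[toy_labels]

print(toy_scaled)
print(toy_one_hot)
```

Each row of `toy_one_hot` has a single 1 in the column of its class, so label 2 becomes `[0, 0, 1]` and label 0 becomes `[1, 0, 0]`.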

We are now ready to create our simple neural network model. We will define the model in a function, which is handy if you want to extend the example later and try to get a better score.

The model is a **simple neural network** with **one hidden layer** with the same **number of neurons as there are inputs (784)**. A **rectifier activation function** is used for the neurons in the hidden layer.
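As a rough sanity check on the size of this architecture, the sketch below counts the trainable parameters of the two fully connected layers. The helper `dense_params` is a hypothetical name introduced here for illustration; it assumes each layer has one weight per input-unit pair plus one bias per unit.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units


n_inputs, n_hidden, n_outputs = 784, 784, 10

hidden_params = dense_params(n_inputs, n_hidden)   # hidden layer
output_params = dense_params(n_hidden, n_outputs)  # output layer
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 615440 7850 623290
```

Almost all of the parameters sit in the hidden layer, which is worth keeping in mind when experimenting with its width.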

A **softmax activation function** is used on the output layer to turn the outputs into probability-like values and allow one class of the 10 to be selected as the model’s output prediction. **Logarithmic loss** is used as the loss function (called **categorical_crossentropy** in Keras) and the efficient **ADAM gradient descent algorithm** is used to **learn the weights**.
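The following NumPy sketch illustrates what softmax and logarithmic loss compute for a single example. It is an illustration under simplifying assumptions (three classes, made-up scores), not the Keras implementation; the function names and the `eps` clipping value are choices made here.

```python
import numpy as np


def softmax(logits):
    # Subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip to avoid log(0), then take the negative log-probability
    # assigned to the true class
    y_prob = np.clip(y_prob, eps, 1.0)
    return -np.sum(y_true * np.log(y_prob), axis=-1)


logits = np.array([[2.0, 1.0, 0.1]])   # made-up output-layer scores
target = np.array([[1.0, 0.0, 0.0]])   # one-hot label for class 0

probs = softmax(logits)
loss = categorical_crossentropy(target, probs)

print(probs, probs.sum())
print(loss)
```

The probabilities sum to 1, and the loss shrinks toward 0 as the probability assigned to the true class approaches 1.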

## Model

In [17]:
# Define the baseline model
def mlp_model():
    # Create the model: one hidden layer with 784 ReLU units,
    # followed by a 10-way softmax output layer
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Dense(
        units=784,
        activation=tf.keras.activations.relu))
    model.add(tf.keras.layers.Dense(
        units=10,
        activation=tf.keras.activations.softmax))

    # Compile the model with the Adam optimizer and log loss
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    return model

We can now fit the model and watch its training accuracy. The model is fit **over 10 epochs, with a weight update after every batch of 200 images**. A verbose value of 2 is used to reduce the output to one line for each training epoch.

In [18]:
# build the model
model = mlp_model()
# Fit the model
model.fit(images,labels,
          batch_size=200,
          epochs=10,
          verbose=2)

Epoch 1/10
210/210 - 3s - loss: 0.3179 - accuracy: 0.9113 - 3s/epoch - 14ms/step
Epoch 2/10
210/210 - 2s - loss: 0.1312 - accuracy: 0.9625 - 2s/epoch - 11ms/step
Epoch 3/10
210/210 - 2s - loss: 0.0878 - accuracy: 0.9754 - 2s/epoch - 10ms/step
Epoch 4/10
210/210 - 2s - loss: 0.0599 - accuracy: 0.9826 - 2s/epoch - 10ms/step
Epoch 5/10
210/210 - 2s - loss: 0.0441 - accuracy: 0.9878 - 2s/epoch - 10ms/step
Epoch 6/10
210/210 - 2s - loss: 0.0326 - accuracy: 0.9914 - 2s/epoch - 10ms/step
Epoch 7/10
210/210 - 2s - loss: 0.0245 - accuracy: 0.9940 - 2s/epoch - 11ms/step
Epoch 8/10
210/210 - 2s - loss: 0.0187 - accuracy: 0.9957 - 2s/epoch - 10ms/step
Epoch 9/10
210/210 - 2s - loss: 0.0134 - accuracy: 0.9975 - 2s/epoch - 10ms/step
Epoch 10/10
210/210 - 2s - loss: 0.0107 - accuracy: 0.9981 - 2s/epoch - 10ms/step


<keras.callbacks.History at 0x7fb6ec4fced0>
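The `210/210` in each log line is the number of batches, and therefore weight updates, per epoch. The short sketch below shows the arithmetic. The training-set size of 42,000 rows is an assumption (it is consistent with the log but is not stated in this notebook), and `steps_per_epoch` is a hypothetical helper name.

```python
import math


def steps_per_epoch(n_samples, batch_size):
    # The last batch may be smaller than batch_size, so round up
    return math.ceil(n_samples / batch_size)


n_train = 42000  # assumption: number of rows in train.csv
batch_size = 200

print(steps_per_epoch(n_train, batch_size))  # 210
```

Over 10 epochs this amounts to 2,100 weight updates in total.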

Finally, we predict with the model, convert the one-hot style (binary matrix) results to a vector of labels from 0 to 9, and save the results in a submission file.
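A minimal sketch of these steps might look like the following. The `probabilities` array is a hypothetical stand-in for the output of `model.predict` on the test images (three classes only, to keep it short), and the `ImageId` and `Label` column names are an assumption about the competition's submission format.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for model.predict(test_images):
# one row of class probabilities per test image
probabilities = np.array([
    [0.05, 0.90, 0.05],
    [0.70, 0.20, 0.10],
    [0.10, 0.10, 0.80],
])

# Pick the most probable class for each row
predicted_labels = np.argmax(probabilities, axis=1)

# Assumed submission layout: 1-based image id plus predicted label
submission = pd.DataFrame({
    "ImageId": np.arange(1, len(predicted_labels) + 1),
    "Label": predicted_labels,
})

# Without a path, to_csv returns the CSV text instead of writing a file
csv_text = submission.to_csv(index=False)
print(csv_text)
```

Passing a file name to `to_csv` instead would write the submission file to disk.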

## Evaluation

In [20]:
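import pickle
import joblib

# Serialize the trained model to an in-memory byte string
saved_model = pickle.dumps(model)
# Write the model to disk with joblib at the highest compression level
joblib.dump(model, 'model.pkl', compress=9)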

INFO:tensorflow:Assets written to: ram://d41d128e-59b5-44e4-9a84-5f0ac54cee24/assets
INFO:tensorflow:Assets written to: ram://7a18c1e3-a132-4f37-ad84-8a7d15d22f18/assets


['model.pkl']