# MNIST Baseline Convolutional Neural Network
*Anders Poirel, 04-10-2019*

Data from the Kannada MNIST competition on Kaggle. As with the original MNIST, the goal is to correctly classify handwritten digits, here written in the Kannada script.

In [1]:
import pandas as pd
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Dropout, Flatten, MaxPooling2D
from tensorflow.keras.utils import to_categorical, normalize
import matplotlib.pyplot as plt 
import seaborn as sns

## Preparing the data

In [6]:
data = pd.read_csv('../data/raw/train.csv')

In [7]:
data.head()

Unnamed: 0,label,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,...,pixel774,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,3,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,4,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


As in the simple NN example, we normalize the pixel values and one-hot encode the labels.

In [8]:
y_train = data['label']
y_train = to_categorical(y_train)
X_train = data.drop('label', axis = 1).values
X_train = normalize(X_train)
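
To make the two preprocessing steps concrete, here is a small NumPy sketch of what they compute. It assumes the default behaviour of `tensorflow.keras.utils.normalize` (L2 normalization of each row, i.e. of each image) and uses a made-up three-sample array rather than the competition data; `one_hot` and `l2_normalize_rows` are hypothetical helper names, not part of the notebook.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (n_samples, num_classes) array with a single 1 per row."""
    encoded = np.zeros((len(labels), num_classes), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

def l2_normalize_rows(x):
    """Divide each row by its Euclidean norm (all-zero rows are left unchanged)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    norms[norms == 0] = 1.0
    return x / norms

labels = np.array([0, 2, 1])            # toy labels, not real data
pixels = np.array([[3.0, 4.0],          # toy "images" with two pixels each
                   [0.0, 5.0],
                   [0.0, 0.0]])

y = one_hot(labels, num_classes=3)
X = l2_normalize_rows(pixels)
print(y)
print(X)
```

Note that this row-wise normalization is not the same as the common alternative of dividing pixel values by 255. Either is a reasonable choice as long as the same transformation is applied to the training, validation and test data.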

Convolutional layers in Keras expect each image as a tensor of shape (pixel_height, pixel_width, number_of_colors). Each image is currently a 1D array of 784 values, so we need to reshape it.
The data description says that each image is monochrome and 28x28, so each data point becomes a 28x28x1 tensor.

In [11]:
X_train = np.reshape(X_train, (len(data.index), 28, 28, 1))

In [12]:
X_train.shape

(60000, 28, 28, 1)
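
As a quick sanity check on the reshape, the sketch below uses a synthetic batch (not the Kaggle data) to show that, with NumPy's default row-major order, flat pixel `k` lands at row `k // 28`, column `k % 28`. It assumes the CSV stores each image row by row, which the notebook does not state explicitly.

```python
import numpy as np

n_images, side = 4, 28                  # toy batch size; 28x28 as in the data description
flat = np.arange(n_images * side * side, dtype=np.float32).reshape(n_images, side * side)

# Reshape to (batch, height, width, channels)
images = flat.reshape(n_images, side, side, 1)
print(images.shape)

k = 57                                  # an arbitrary flat pixel index
row, col = divmod(k, side)
# The same value is found at (row, col) after reshaping
print(flat[0, k] == images[0, row, col, 0])
```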

## Training the model

Here a very standard CNN architecture is used (the 16-32-64 architecture is known to perform well on simple image classification tasks). Dropout is added to reduce overfitting, though with 60k training samples overfitting may not be much of an issue regardless.

In [13]:
model = Sequential()
model.add(Conv2D(16, (3, 3), activation = 'relu', input_shape = (28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.2))
model.add(Conv2D(32, (3,3), activation = 'relu'))
model.add(MaxPooling2D((2,2)))
model.add(Dropout(0.2))
model.add(Conv2D(64, (3,3), activation = 'relu'))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(64, activation = 'relu'))
model.add(Dense(10, activation = 'softmax'))
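
For intuition about the `Dropout(0.2)` layers, the following NumPy sketch mimics what dropout does during training: each activation is zeroed with probability 0.2 and the survivors are scaled by 1 / 0.8 so that the expected value is unchanged. It is an illustration only (the function `dropout_train` is hypothetical), not the Keras implementation.

```python
import numpy as np

def dropout_train(activations, rate, rng):
    """Zero each entry with probability `rate`; rescale the rest by 1 / (1 - rate)."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 64), dtype=np.float32)   # toy activations

dropped = dropout_train(activations, rate=0.2, rng=rng)
print("fraction zeroed:", np.mean(dropped == 0))
print("mean activation:", dropped.mean())
```

At inference time dropout is disabled, so `model.predict` uses all activations.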


In [14]:
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

In [15]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 16)        160       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 16)        0         
_________________________________________________________________
dropout (Dropout)            (None, 13, 13, 16)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 32)        4640      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 32)          0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 5, 5, 32)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          1
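
The output shapes and parameter counts of the convolutional layers can be reproduced by hand from the model definition. The sketch below assumes the Keras defaults that apply to that definition ('valid' padding and stride 1 for the convolutions, non-overlapping 2x2 pooling); the helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool

def conv_params(kernel, in_channels, filters):
    """Weights (kernel * kernel * in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

size = 28
size = conv_out(size)          # conv2d:          28 -> 26
size = pool_out(size)          # max_pooling2d:   26 -> 13
size = conv_out(size)          # conv2d_1:        13 -> 11
size = pool_out(size)          # max_pooling2d_1: 11 -> 5
size = conv_out(size)          # conv2d_2:         5 -> 3

print(size)                        # 3
print(conv_params(3, 1, 16))       # 160, first convolution
print(conv_params(3, 16, 32))      # 4640, second convolution
print(conv_params(3, 32, 64))      # 18496, third convolution, computed from the layer definition
print(size * size * 64)            # 576 features after Flatten
```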

In [16]:
history = model.fit(X_train, y_train,
         validation_split = 0.2, epochs = 15)

Train on 48000 samples, validate on 12000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<tensorflow.python.keras.callbacks.History at 0x20e8121cc18>

### Evaluating model performance

For evaluating performance we can use the same code as in the dense neural network example.

We examine how training and validation loss and accuracy evolve over the epochs. Note: the validation curves are only available because validation_split = 0.2 is passed to model.fit above. To train the final model on the entire dataset, remove that parameter.
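
With `validation_split = 0.2`, Keras holds out the last 20% of the rows passed to `fit`, taken before any shuffling, which is where the 48000 / 12000 split comes from. A minimal NumPy sketch of that slicing (the variable names are illustrative, and the rounding is a simplification rather than the exact Keras code):

```python
import numpy as np

n_samples = 60000
validation_split = 0.2

# Number of rows held out for validation, taken from the end of the array
n_val = round(n_samples * validation_split)
split_at = n_samples - n_val

indices = np.arange(n_samples)
train_idx, val_idx = indices[:split_at], indices[split_at:]
print(len(train_idx), len(val_idx))   # 48000 12000
```

Because the held-out rows are simply the tail of the array, a dataset sorted by label would give an unrepresentative validation set; shuffling the rows beforehand avoids that.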

In [17]:
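sns.set()

def plot_loss(history):
    """Plot training and validation loss per epoch."""
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('Model loss')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Validation'], loc='upper right')

def plot_acc(history):
    """Plot training and validation accuracy per epoch."""
    # The metric key is 'acc' in older tf.keras releases and 'accuracy' in TensorFlow 2.x
    key = 'acc' if 'acc' in history.history else 'accuracy'
    plt.plot(history.history[key])
    plt.plot(history.history['val_' + key])
    plt.title('Model accuracy')
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Validation'], loc='lower right')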

In [18]:
plot_loss(history)

NameError: name 'history' is not defined

In [None]:
plot_acc(history)

### Making predictions for Kaggle

In [23]:
submission = pd.read_csv('../data/raw/sample_submission.csv')
X_test = pd.read_csv('../data/raw/test.csv')
# normalize expects a NumPy array, so convert the DataFrame with .values
X_test = X_test.drop('id', axis = 1).values
X_test = normalize(X_test)
# Reshape to (n_images, 28, 28, 1) to match the model's input shape
X_test = np.reshape(X_test, (len(X_test), 28, 28, 1))

In [None]:
preds = model.predict(X_test)
# Take the most probable class for each image as the predicted label
submission['label'] = np.argmax(preds, axis = 1)
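
Converting softmax outputs to labels amounts to taking, for each row, the index of the largest probability. A small sketch with made-up probabilities (the `probs` array is hypothetical, not actual model output):

```python
import numpy as np

# Three fake predictions over four classes; each row sums to 1
probs = np.array([[0.10, 0.70, 0.15, 0.05],
                  [0.60, 0.20, 0.10, 0.10],
                  [0.05, 0.05, 0.10, 0.80]])

labels = np.argmax(probs, axis=1)   # index of the most probable class per row
print(labels)                       # [1 0 3]
```

`np.argmax` returns integer labels directly, whereas `idxmax` on a DataFrame with string column names returns strings that then need to be cast to integers.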

In [None]:
# index = False avoids writing the DataFrame index as an extra column
submission.to_csv('../output/base_submission.csv', index = False)

#### Predictions on alternate validation set

We can now check if the CNN architecture performs better on the alternate validation set.

In [24]:
val = pd.read_csv('../data/raw/Dig-MNIST.csv')
X_val = val.drop('label', axis = 1)
y_val = val['label']

In [26]:
X_val.shape

(10240, 784)

In [27]:
X_val = normalize(X_val.values)

In [31]:
X_val = np.reshape(X_val, (len(val), 28, 28, 1))

In [32]:
y_pred = model.predict(X_val)
# np.int has been removed from recent NumPy releases; argmax already returns integer labels
y_pred = np.argmax(y_pred, axis = 1)
y_val = y_val.values

In [34]:
from sklearn.metrics import accuracy_score
accuracy_score(y_val, y_pred)

0.789453125

This ~79% accuracy compares favorably with the ~62% accuracy of the standard NN model.
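
The accuracy reported by `accuracy_score` is simply the fraction of predictions that match the true labels, which can be checked with plain NumPy. The arrays below are toy values, not the Dig-MNIST results:

```python
import numpy as np

y_true = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])   # toy labels
y_hat = np.array([0, 1, 2, 3, 4, 5, 6, 7, 9, 8])    # two mistakes

accuracy = np.mean(y_true == y_hat)   # fraction of matching entries
print(accuracy)                       # 0.8
```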