<h1 id="Contents">Contents<a href="#Contents"></a></h1>
        <ol>
        <li><a class="" href="#Imports">Imports</a></li>
<li><a class="" href="#Loading-Data">Loading Data</a></li>
<ol><li><a class="" href="#Getting-X-and-y">Getting X and y</a></li>
<li><a class="" href="#Getting-Train-and-Test-Datasets">Getting Train and Test Datasets</a></li>
<li><a class="" href="#Converting-the-Array-to-Tensorflow-Dataset">Converting the Array to Tensorflow Dataset</a></li>
</ol><li><a class="" href="#Modeling">Modeling</a></li>
<ol><li><a class="" href="#Creating-Callbacks">Creating Callbacks</a></li>
<li><a class="" href="#Base-Model">Base Model</a></li>
<li><a class="" href="#Model_1">Model 1</a></li>
<li><a class="" href="#Model-2">Model 2</a></li>
<li><a class="" href="#Final-Model">Final Model</a></li>
<li><a class="" href="#Making-Submission-File">Making Submission File</a></li>
</ol>
</ol>

# Imports

In [2]:
import matplotlib.pyplot as plt
import os
import numpy as np
import pandas as pd
from sklearn.metrics import ConfusionMatrixDisplay, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from tensorflow.keras.models import Sequential, Model
import tensorflow.keras.layers as tfl
import tensorflow as tf

import warnings
warnings.filterwarnings('ignore')

# Loading Data

In [3]:
df = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')
test_df = pd.read_csv('/kaggle/input/digit-recognizer/test.csv')

## Getting X and y

In [4]:
X = df.iloc[:, 1:].values
y = df.iloc[:,0].values

In [5]:
test_data = test_df.values
test_data.shape

(28000, 784)

## Getting Train and Test Datasets

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

We'll normalize the data by scaling the pixel values from [0, 255] down to [0, 1]:

In [7]:
X_train = X_train/255.
X_test = X_test/255.
test_data = test_data/255.

In [8]:
X_train.max(), X_test.max(), test_data.max()

(1.0, 1.0, 1.0)

## Converting the Array to Tensorflow Dataset

In [13]:
X_train_ds = tf.data.Dataset.from_tensor_slices(X_train)
y_train_ds = tf.one_hot(y_train, depth=10)
y_train_ds = tf.data.Dataset.from_tensor_slices(y_train_ds)

X_test_ds = tf.data.Dataset.from_tensor_slices(X_test)
y_test_ds = tf.one_hot(y_test, depth=10)
y_test_ds = tf.data.Dataset.from_tensor_slices(y_test_ds)

train_ds = tf.data.Dataset.zip((X_train_ds, y_train_ds))
test_ds = tf.data.Dataset.zip((X_test_ds, y_test_ds))

test_data_ds = tf.data.Dataset.from_tensor_slices(test_data)
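
`tf.one_hot(y, depth=10)` turns each integer label into a length-10 vector holding a single 1. As a minimal NumPy sketch of the same encoding (an illustration only, not how TensorFlow implements it), indexing an identity matrix with the label array yields the same rows; `float32` is used to match `tf.one_hot`'s default output dtype:

```python
import numpy as np

def one_hot(labels, depth=10):
    # Row i of the identity matrix is the one-hot vector for class i,
    # so fancy-indexing with the label array picks one row per label.
    return np.eye(depth, dtype=np.float32)[labels]

labels = np.array([2, 0, 9])
encoded = one_hot(labels)
print(encoded.shape)           # (3, 10)
print(encoded.argmax(axis=1))  # argmax undoes the encoding: [2 0 9]
```

The same `argmax` trick is what later turns the model's 10 softmax outputs back into a digit.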

In [14]:
batch_size = 64
train_ds = train_ds.batch(batch_size)
test_data_ds = test_data_ds.batch(batch_size)
test_ds = test_ds.batch(batch_size)

In [15]:
for data, label in train_ds.take(1):
    input_shape = data[0].shape
    output_shape = label[0].shape
    print(data[0].shape)
    print(label[0].shape)

(784,)
(10,)


Excellent!
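
With `batch_size = 64`, `Dataset.batch()` groups consecutive examples and keeps a smaller final batch unless `drop_remainder=True` is passed. The sketch below works out how many steps each pass over the data takes. It assumes `train.csv` has 42,000 labelled rows (the size of the Kaggle Digit Recognizer training file); the 28,000 test rows match the shape printed earlier, and the split size mirrors scikit-learn's rule of rounding a fractional `test_size` up:

```python
import math

def batches_per_epoch(n_examples, batch_size=64):
    # batch() keeps the final partial batch unless drop_remainder=True
    full, last = divmod(n_examples, batch_size)
    return full + (1 if last else 0), (last or batch_size)

# Assumption: train.csv has 42,000 rows; test.csv has 28,000 (shape shown above)
n_total, test_size = 42_000, 0.1
n_val = math.ceil(test_size * n_total)  # fractional test_size is rounded up
n_train = n_total - n_val

for name, n in [("train", n_train), ("validation", n_val), ("test", 28_000)]:
    steps, last = batches_per_epoch(n)
    print(f"{name}: {n} examples -> {steps} batches (last batch has {last})")
```

Under that assumption, each training epoch runs 591 steps, and validation takes 66 batches.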

# Modeling

## Creating Callbacks

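We'll be creating two callbacks:
1. `ModelCheckpoint`: This saves the model whenever `val_accuracy` improves. With `save_best_only=True` and `save_weights_only=False`, the directory always holds the full model (architecture and weights) from the best epoch so far, which can be loaded later.
2. `EarlyStopping`: This stops training once `val_loss` stops improving and, with `restore_best_weights=True`, restores the weights from the best epoch.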

In [16]:
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.callbacks import EarlyStopping

def model_checkpoint(name, directory="MNIST"):
    # Save the full model to <directory>/<name> whenever val_accuracy improves
    filepath = os.path.join(directory, name)  # avoid shadowing the built-in dir()
    mch = ModelCheckpoint(
        filepath, monitor='val_accuracy', verbose=1, save_best_only=True,
        save_weights_only=False, mode='auto', save_freq='epoch',
    )
    return mch

# Stop when val_loss stops improving and roll back to the best weights
est = EarlyStopping(
    monitor="val_loss",
    verbose=1,
    restore_best_weights=True,
)
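
Two things are worth knowing about `est`: `EarlyStopping`'s `patience` argument defaults to 0, so training can stop as soon as the monitored value fails to improve, and `est` is not passed to any of the `fit` calls in this notebook, so it never takes effect. The pure-Python sketch below simulates a simplified patience rule on made-up validation losses (illustrative numbers, not results from this notebook); it is modelled on, but not identical to, Keras's implementation:

```python
def early_stopping_epoch(val_losses, patience=0, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-indexed, for a loss-like monitor.

    Simplified sketch: `wait` counts consecutive epochs without an improvement
    larger than `min_delta`; training stops once wait >= patience (so patience=0
    and patience=1 both stop at the first non-improving epoch).
    """
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:   # improvement: remember it, reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch   # never triggered: ran all epochs


losses = [0.50, 0.40, 0.45, 0.38, 0.39, 0.40, 0.41]  # made-up validation losses
for p in (0, 2, 10):
    stop, best = early_stopping_epoch(losses, patience=p)
    # with restore_best_weights=True, the model ends up with the weights of `best`
    print(f"patience={p}: stops after epoch {stop}, best epoch {best}")
```

With the default `patience=0` the single bump at epoch 3 ends training, while `patience=2` rides it out and finds the better epoch 4. To actually use `est`, it would go in the callback list next to the checkpoint, e.g. `callbacks=[model_checkpoint("model_0"), est]`.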

## Base Model

We'll start with a baseline neural network (we already had a baseline ML model, and even that performed very well). Here the baseline is a simple feed-forward network with two hidden `Dense` layers.

In [12]:
inputs = tfl.Input(shape= input_shape, name="input")
x = tfl.Dense(64, name="dense_1", activation="relu")(inputs)
x = tfl.Dense(128, name="dense_2", activation="relu")(x)
outputs = tfl.Dense(10, activation="softmax", name="output")(x)

model_0 = Model(inputs=inputs, outputs=outputs, name="base_model")

model_0.summary()

Model: "base_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input (InputLayer)           [(None, 784)]             0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                50240     
_________________________________________________________________
dense_2 (Dense)              (None, 128)               8320      
_________________________________________________________________
output (Dense)               (None, 10)                1290      
Total params: 59,850
Trainable params: 59,850
Non-trainable params: 0
_________________________________________________________________
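
The `Param #` column follows from a simple rule: a `Dense` layer with $n_{in}$ inputs and $n_{out}$ units has an $n_{in} \times n_{out}$ weight matrix plus $n_{out}$ biases. A quick check against the summary above:

```python
def dense_params(n_in, n_out):
    # weight matrix (n_in x n_out) plus one bias per unit
    return n_in * n_out + n_out

layers = [("dense_1", 784, 64), ("dense_2", 64, 128), ("output", 128, 10)]
counts = {name: dense_params(n_in, n_out) for name, n_in, n_out in layers}
print(counts)                # {'dense_1': 50240, 'dense_2': 8320, 'output': 1290}
print(sum(counts.values()))  # 59850
```

About 84% of the parameters sit in the first layer, because every one of the 784 pixels connects to every hidden unit.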


In [13]:
model_0.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="categorical_crossentropy",
    metrics=['accuracy'],
)

In [14]:
EPOCHS = 10
history = model_0.fit(
    train_ds,
    validation_data=test_ds,
    epochs=EPOCHS,
    callbacks=[model_checkpoint("model_0")],
)

Epoch 1/10

Epoch 00001: val_accuracy improved from -inf to 0.93357, saving model to MNIST/model_0
Epoch 2/10

Epoch 00002: val_accuracy improved from 0.93357 to 0.95214, saving model to MNIST/model_0
Epoch 3/10

Epoch 00003: val_accuracy improved from 0.95214 to 0.96000, saving model to MNIST/model_0
Epoch 4/10

Epoch 00004: val_accuracy improved from 0.96000 to 0.96095, saving model to MNIST/model_0
Epoch 5/10

Epoch 00005: val_accuracy did not improve from 0.96095
Epoch 6/10

Epoch 00006: val_accuracy did not improve from 0.96095
Epoch 7/10

Epoch 00007: val_accuracy improved from 0.96095 to 0.96429, saving model to MNIST/model_0
Epoch 8/10

Epoch 00008: val_accuracy improved from 0.96429 to 0.96452, saving model to MNIST/model_0
Epoch 9/10

Epoch 00009: val_accuracy improved from 0.96452 to 0.96571, saving model to MNIST/model_0
Epoch 10/10

Epoch 00010: val_accuracy improved from 0.96571 to 0.96976, saving model to MNIST/model_0


In [17]:
model_0.evaluate(test_ds), model_0.evaluate(train_ds)



([0.12471525371074677, 0.9697619080543518],
 [0.029692815616726875, 0.9896560907363892])

Even this simple model performs very well: about 97.0% accuracy on the validation split and 99.0% on the training data.

## Model_1

This will be a shallow CNN. We'll reshape each flat row of 784 pixel values into a $28\times 28\times 1$ image so that the convolutional layers can work on it.

In [21]:
inputs = tfl.Input(shape= input_shape, name="input")
x = tfl.Reshape((28, 28, 1), name="reshape")(inputs)
x = tfl.Conv2D(32, 5, name="conv_1")(x)
x = tfl.Conv2D(64, 5, name="conv_2")(x)
x = tfl.GlobalAveragePooling2D(name="global_avg_pooling")(x)
x = tfl.Dense(64, activation="relu")(x)
outputs = tfl.Dense(10, activation="softmax", name="output")(x)
model_1 = Model(inputs=inputs, outputs=outputs, name="simple_conv_model")

model_1.summary()

Model: "simple_conv_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input (InputLayer)           [(None, 784)]             0         
_________________________________________________________________
reshape (Reshape)            (None, 28, 28, 1)         0         
_________________________________________________________________
conv_1 (Conv2D)              (None, 24, 24, 32)        832       
_________________________________________________________________
conv_2 (Conv2D)              (None, 20, 20, 64)        51264     
_________________________________________________________________
global_avg_pooling (GlobalAv (None, 64)                0         
_________________________________________________________________
dense (Dense)                (None, 64)                4160      
_________________________________________________________________
output (Dense)               (None, 10)          
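
The shapes and counts in this summary follow from two rules for a `Conv2D` layer with the default `padding='valid'` and stride 1: each spatial side shrinks from $n$ to $n - k + 1$ for a $k \times k$ kernel, and the layer has $k \cdot k \cdot c_{in} \cdot c_{out} + c_{out}$ parameters (one bias per filter). A small sketch that reproduces the two convolution rows:

```python
def conv2d_valid(n, k, c_in, c_out, stride=1):
    # 'valid' padding: the kernel must fit entirely inside the input
    out = (n - k) // stride + 1
    params = k * k * c_in * c_out + c_out  # kernel weights + one bias per filter
    return out, params

side, channels = 28, 1
for name, k, filters in [("conv_1", 5, 32), ("conv_2", 5, 64)]:
    side, params = conv2d_valid(side, k, channels, filters)
    channels = filters
    print(f"{name}: ({side}, {side}, {channels}), {params} params")
```

`GlobalAveragePooling2D` then collapses the $20 \times 20$ grid into one value per filter, which is why the next `Dense` layer sees a 64-dimensional vector.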

In [22]:
EPOCHS = 10
model_1.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="categorical_crossentropy",
    metrics=['accuracy'],
)


history_1 = model_1.fit(
    train_ds,
    validation_data=test_ds,
    epochs=EPOCHS,
#     callbacks=[model_checkpoint(model_1.name)],
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [23]:
model_1.evaluate(test_ds), model_1.evaluate(train_ds)



([0.6770902276039124, 0.7752380967140198],
 [0.6592001914978027, 0.7746560573577881])

With both validation and training accuracy stuck around 77.5%, the model is underfitting, so it seems we need to go deeper; that will be our next model. Before that, let's try the same architecture as `model_1` with `GlobalAveragePooling2D` replaced by `GlobalMaxPooling2D`.

In [24]:
inputs = tfl.Input(shape= input_shape, name="input")
x = tfl.Reshape((28, 28, 1), name="reshape")(inputs)
x = tfl.Conv2D(32, 5, name="conv_1")(x)
x = tfl.Conv2D(64, 5, name="conv_2")(x)
x = tfl.GlobalMaxPooling2D(name="global_max_pooling")(x)
x = tfl.Dense(64, activation="relu")(x)
outputs = tfl.Dense(10, activation="softmax", name="output")(x)
model_1 = Model(inputs=inputs, outputs=outputs, name="simple_conv_model")


EPOCHS = 10
model_1.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="categorical_crossentropy",
    metrics=['accuracy'],
)

history_1 = model_1.fit(
    train_ds,
    validation_data=test_ds,
    epochs=EPOCHS,
#     callbacks=[model_checkpoint(model_1.name)],
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [25]:
model_1.evaluate(test_ds), model_1.evaluate(train_ds)



([0.1085703894495964, 0.9671428799629211],
 [0.07677440345287323, 0.9744709134101868])

OMG! Just one change results in a massive improvement: validation accuracy jumps from about 77.5% to 96.7%. Based on this experiment, we'll use `GlobalMaxPooling2D` instead of `GlobalAveragePooling2D` from now on.
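
One plausible intuition, not a proof of what happened here: global average pooling divides a strong but localized activation by the number of spatial positions, while global max pooling keeps it intact. A NumPy sketch on a toy $4 \times 4$ feature map with two channels (made-up values, in the `(batch, height, width, channels)` layout Keras uses by default):

```python
import numpy as np

# Toy feature map: batch of 1, 4x4 spatial grid, 2 channels
fmap = np.zeros((1, 4, 4, 2))
fmap[0, 1, 2, 0] = 8.0   # channel 0: one strong, localized response
fmap[0, :, :, 1] = 0.5   # channel 1: weak response everywhere

gap = fmap.mean(axis=(1, 2))  # what GlobalAveragePooling2D computes
gmp = fmap.max(axis=(1, 2))   # what GlobalMaxPooling2D computes
print(gap)
print(gmp)
```

Average pooling maps both channels to 0.5, making them indistinguishable, whereas max pooling returns 8.0 and 0.5, so the localized feature still stands out.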

## Model 2

In [46]:
inputs = tfl.Input(shape= input_shape, name="input")
x = tfl.Reshape((28, 28, 1), name="reshape")(inputs)
x = tfl.Conv2D(32, 3, name="conv_11", activation='relu')(x)
x = tfl.Conv2D(64, 3, name="conv_12", activation='relu')(x)
x = tfl.MaxPool2D(pool_size=(2, 2), name="maxpool_1")(x)
x = tfl.BatchNormalization(axis=-1, name="batchnorm_1")(x)

x = tfl.Conv2D(128, 3, name="conv_21", activation='relu')(x)
x = tfl.Conv2D(256, 3, name="conv_22", activation='relu')(x)
x = tfl.MaxPool2D(pool_size=(2, 2), name="maxpool_2")(x)
x = tfl.Dropout(0.25, name="dropout_21")(x)
x = tfl.BatchNormalization(axis=-1, name="batchnorm_2")(x)

x = tfl.Conv2D(512, 2, name="conv_31", activation='relu')(x)
x = tfl.Conv2D(1024, 2, name="conv_32", activation='relu')(x)
# x = tfl.Conv2D(1024, 2, name="conv_33")(x)
# x = tfl.MaxPool2D(pool_size=(2, 2), name="maxpool3")(x)
# x = tfl.BatchNormalization(axis=-1, name="batchnorm3")(x)

# x = tfl.Conv2D(512, 2, name="conv_31")(x)
# x = tfl.Conv2D(1024, 2, name="conv_32")(x)
# x = tfl.MaxPool2D(pool_size=(2, 2), name="maxpool3")(x)


x = tfl.GlobalMaxPooling2D(name="global_max_pooling")(x)
# x = tfl.Flatten(name="flatten")(x)
# x = tfl.BatchNormalization(axis=-1, name="batchnorm3")(x)
# x = tfl.Dense(64, activation="relu", name="dense_1")(x)
x = tfl.Dense(128, activation="relu", name="dense_1")(x)
x = tfl.Dropout(0.25, name="dropout_1")(x)
x = tfl.Dense(256, activation="relu", name="dense_2")(x)
x = tfl.Dropout(0.4, name="dropout_2")(x)
outputs = tfl.Dense(10, activation="softmax", name="output")(x)
model_2 = Model(inputs=inputs, outputs=outputs, name="deep_conv_model")

model_2.summary()

Model: "deep_conv_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input (InputLayer)           [(None, 784)]             0         
_________________________________________________________________
reshape (Reshape)            (None, 28, 28, 1)         0         
_________________________________________________________________
conv_11 (Conv2D)             (None, 26, 26, 32)        320       
_________________________________________________________________
conv_12 (Conv2D)             (None, 24, 24, 64)        18496     
_________________________________________________________________
maxpool_1 (MaxPooling2D)     (None, 12, 12, 64)        0         
_________________________________________________________________
batchnorm_1 (BatchNormalizat (None, 12, 12, 64)        256       
_________________________________________________________________
conv_21 (Conv2D)             (None, 10, 10, 128)   
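
The summary above is cut off after `conv_21`. The sketch below applies the same size rules to the rest of the stack; it is derived from the model code, not recovered output. Valid 3×3 convolutions shrink each side by 2, 2×2 max pooling halves it (rounding down), and the 2×2 convolutions in the last block shrink it by 1. Dropout and batch normalization leave shapes unchanged, so they are omitted:

```python
def trace(side, layers, channels=1):
    """Follow the square feature-map size through valid convs and pooling."""
    for name, kind, size, filters in layers:
        if kind == "conv":    # 'valid' padding, stride 1
            side, channels = side - size + 1, filters
        elif kind == "pool":  # pool_size == strides, 'valid' padding
            side //= size
        print(f"{name:<10} -> ({side}, {side}, {channels})")
    return side, channels

# Layer sizes copied from the model_2 code above
layers_2 = [
    ("conv_11", "conv", 3, 32), ("conv_12", "conv", 3, 64), ("maxpool_1", "pool", 2, None),
    ("conv_21", "conv", 3, 128), ("conv_22", "conv", 3, 256), ("maxpool_2", "pool", 2, None),
    ("conv_31", "conv", 2, 512), ("conv_32", "conv", 2, 1024),
]
final_shape = trace(28, layers_2)
```

The trace reproduces the `(10, 10, 128)` shown for `conv_21` and ends at a $2 \times 2 \times 1024$ map, so `GlobalMaxPooling2D` picks each channel's maximum over just four positions.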

In [47]:
EPOCHS = 10
model_2.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="categorical_crossentropy",
    metrics=['accuracy'],
)


history_2 = model_2.fit(
    train_ds,
    validation_data=test_ds,
    epochs=EPOCHS,
#     callbacks=[model_checkpoint(model_2.name)],
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


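Let's train for some more epochs. Calling `fit` again continues from the current weights instead of starting over, so this adds 15 epochs on top of the first 10, now with the checkpoint callback saving the best model.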

In [48]:
EPOCHS = 15

history_2 = model_2.fit(
    train_ds,
    validation_data=test_ds,
    epochs=EPOCHS,
    callbacks=[model_checkpoint(model_2.name)],
)

Epoch 1/15

Epoch 00001: val_accuracy improved from -inf to 0.98810, saving model to MNIST/deep_conv_model
Epoch 2/15

Epoch 00002: val_accuracy improved from 0.98810 to 0.99000, saving model to MNIST/deep_conv_model
Epoch 3/15

Epoch 00003: val_accuracy improved from 0.99000 to 0.99143, saving model to MNIST/deep_conv_model
Epoch 4/15

Epoch 00004: val_accuracy did not improve from 0.99143
Epoch 5/15

Epoch 00005: val_accuracy did not improve from 0.99143
Epoch 6/15

Epoch 00006: val_accuracy improved from 0.99143 to 0.99167, saving model to MNIST/deep_conv_model
Epoch 7/15

Epoch 00007: val_accuracy did not improve from 0.99167
Epoch 8/15

Epoch 00008: val_accuracy did not improve from 0.99167
Epoch 9/15

Epoch 00009: val_accuracy did not improve from 0.99167
Epoch 10/15

Epoch 00010: val_accuracy did not improve from 0.99167
Epoch 11/15

Epoch 00011: val_accuracy did not improve from 0.99167
Epoch 12/15

Epoch 00012: val_accuracy did not improve from 0.99167
Epoch 13/15

Epoch 00013

In [49]:
model_2.evaluate(test_ds), model_2.evaluate(train_ds)



([0.06591577082872391, 0.9902380704879761],
 [0.011813807301223278, 0.9976984262466431])

In [50]:
best_model_2 = tf.keras.models.load_model("MNIST/deep_conv_model")

best_model_2.evaluate(test_ds), best_model_2.evaluate(train_ds)



([0.048807624727487564, 0.9916666746139526],
 [0.009835764765739441, 0.9976190328598022])

In [25]:
y_pred = model_2.predict(test_data_ds)
y_pred = np.argmax(y_pred, axis = -1)
submission = pd.DataFrame({'ImageId': range(1, test_data.shape[0]+1), 'Label': y_pred})
submission.groupby('Label').count()

In [26]:
y_pred

array([2, 0, 9, ..., 3, 9, 2])
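
`predict` returns one row of 10 softmax probabilities per test image, and `np.argmax(..., axis=-1)` picks the most probable digit in each row; the submission then pairs each prediction with a 1-based `ImageId`, as in the code above. A self-contained sketch with made-up probabilities for three images, chosen to mirror the first three predictions shown:

```python
import numpy as np
import pandas as pd

# Made-up softmax outputs for 3 images (each row sums to 1)
probs = np.full((3, 10), 0.01)
probs[0, 2] = probs[1, 0] = probs[2, 9] = 0.91

labels = np.argmax(probs, axis=-1)  # most probable class per row
submission = pd.DataFrame({
    "ImageId": range(1, len(labels) + 1),  # IDs start at 1, not 0
    "Label": labels,
})
print(submission)
```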

## Final Model

In [51]:
model_3 = Sequential(name="final_model")
model_3.add(tfl.Reshape((28, 28, 1), name="reshape", input_shape=input_shape))
model_3.add(tfl.Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same'))
model_3.add(tfl.Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same'))
model_3.add(tfl.AveragePooling2D(pool_size=(2, 2)))
model_3.add(tfl.BatchNormalization())
model_3.add(tfl.Dropout(0.3))
model_3.add(tfl.Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'))
model_3.add(tfl.Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'))
model_3.add(tfl.AveragePooling2D(pool_size=(2, 2)))
model_3.add(tfl.BatchNormalization())
model_3.add(tfl.Dropout(0.3))
model_3.add(tfl.Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same'))
model_3.add(tfl.Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same'))
model_3.add(tfl.AveragePooling2D(pool_size=(2, 2)))
model_3.add(tfl.BatchNormalization())

model_3.add(tfl.Flatten())
model_3.add(tfl.Dropout(0.35))
model_3.add(tfl.Dense(128, activation='relu'))
model_3.add(tfl.BatchNormalization())
model_3.add(tfl.Dense(10, activation='softmax'))

model_3.compile(optimizer='adam', loss='categorical_crossentropy', 
    metrics=['accuracy'])

model_3.summary()

Model: "final_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
reshape (Reshape)            (None, 28, 28, 1)         0         
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 28, 28, 32)        320       
_________________________________________________________________
conv2d_19 (Conv2D)           (None, 28, 28, 32)        9248      
_________________________________________________________________
average_pooling2d_9 (Average (None, 14, 14, 32)        0         
_________________________________________________________________
batch_normalization_12 (Batc (None, 14, 14, 32)        128       
_________________________________________________________________
dropout_9 (Dropout)          (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_20 (Conv2D)           (None, 14, 14, 64)        
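
Unlike `model_2`, the final model uses `padding='same'`, so each 3×3 convolution keeps the spatial size and only the three 2×2 `AveragePooling2D` layers shrink it (rounding down, since pooling uses `'valid'` padding by default). The summary above is truncated, so here is a sketch of the resulting `Flatten` length and the parameter count of the 128-unit `Dense` layer, derived from the code rather than recovered output:

```python
side = 28
for _ in range(3):
    # two 'same'-padded convs keep `side`; 2x2 average pooling halves it (floor)
    side //= 2
channels = 128                    # filters in the last conv block
flat = side * side * channels     # Flatten output length
dense_params = flat * 128 + 128   # Dense(128): weights + biases
print(side, flat, dense_params)   # 3 1152 147584
```

Because `Flatten` keeps all 3×3 positions instead of pooling them away, the `Dense(128)` layer alone carries about 147.6k parameters.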

In [52]:
history_3 = model_3.fit(train_ds, epochs=10, validation_data=test_ds,
#         callbacks=[model_checkpoint(model_3.name)]
                      )

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [53]:
history_3 = model_3.fit(train_ds, epochs=30, validation_data=test_ds,
        callbacks=[model_checkpoint(model_3.name)]
                      )

Epoch 1/30

Epoch 00001: val_accuracy improved from -inf to 0.99167, saving model to MNIST/final_model
Epoch 2/30

Epoch 00002: val_accuracy did not improve from 0.99167
Epoch 3/30

Epoch 00003: val_accuracy did not improve from 0.99167
Epoch 4/30

Epoch 00004: val_accuracy improved from 0.99167 to 0.99286, saving model to MNIST/final_model
Epoch 5/30

Epoch 00005: val_accuracy improved from 0.99286 to 0.99357, saving model to MNIST/final_model
Epoch 6/30

Epoch 00006: val_accuracy did not improve from 0.99357
Epoch 7/30

Epoch 00007: val_accuracy improved from 0.99357 to 0.99429, saving model to MNIST/final_model
Epoch 8/30

Epoch 00008: val_accuracy did not improve from 0.99429
Epoch 9/30

Epoch 00009: val_accuracy did not improve from 0.99429
Epoch 10/30

Epoch 00010: val_accuracy did not improve from 0.99429
Epoch 11/30

Epoch 00011: val_accuracy did not improve from 0.99429
Epoch 12/30

Epoch 00012: val_accuracy did not improve from 0.99429
Epoch 13/30

Epoch 00013: val_accuracy d

In [54]:
model_3.evaluate(test_ds), model_3.evaluate(train_ds)



([0.02114895172417164, 0.9947618842124939],
 [0.0009168467367999256, 0.9997090101242065])

## Making Submission File

In [55]:
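# Note: this predicts with model_2 (last-epoch weights), not model_3, which had the highest validation accuracy
y_pred = model_2.predict(test_data_ds)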
y_pred = np.argmax(y_pred, axis = -1)
submission = pd.DataFrame({'ImageId': range(1, test_data.shape[0]+1), 'Label': y_pred})
submission.groupby('Label').count()

| Label | ImageId (count) |
|------:|----------------:|
| 0 | 2759 |
| 1 | 3200 |
| 2 | 2806 |
| 3 | 2776 |
| 4 | 2753 |
| 5 | 2519 |
| 6 | 2735 |
| 7 | 2880 |
| 8 | 2808 |
| 9 | 2764 |


In [58]:
submission.to_csv("submission.csv", index = False)

In [59]:
from IPython.display import FileLink
FileLink("submission.csv")