This notebook is purely for my own learning.

**Aims:**

- Implement a CNN using the Kaggle ['Digit Recognizer'](https://www.kaggle.com/c/digit-recognizer/overview) competition dataset.
- Use TPU.
- Explore data augmentation.


**Outcomes so far:**

Successful implementation of a CNN using the TPU.

I was unable to augment the data inside the model as suggested by the TensorFlow [documentation](https://www.tensorflow.org/tutorials/images/data_augmentation). I would like the augmentation layers to be saved with the model, since they seem to me to be an integral part of it.

I have, however, managed to augment the data without the TPU (not in this notebook), which coincidentally gave the best results.

One consideration is that the model's train and validation accuracy peaks at around 4 epochs. This suggests that the TPU might be overkill, and that training on the CPU with data augmentation might serve the model better.
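
Augmenting outside the model, as described above, can be as simple as transforming the NumPy arrays before they reach `tf.data`. A minimal sketch of one such transform, a random translation with zero padding, follows. `shift_image` and `random_shift` are hypothetical helpers, not the code used in the other notebook, and the 2-pixel limit is an arbitrary choice.

```python
import numpy as np

def shift_image(image, dy, dx):
    """Translate an (H, W, C) image by dy rows and dx columns, zero-filling the gap."""
    h, w = image.shape[:2]
    shifted = np.zeros_like(image)
    # Windows of the source and destination that still overlap after the shift.
    src_rows = slice(max(0, -dy), min(h, h - dy))
    dst_rows = slice(max(0, dy), min(h, h + dy))
    src_cols = slice(max(0, -dx), min(w, w - dx))
    dst_cols = slice(max(0, dx), min(w, w + dx))
    shifted[dst_rows, dst_cols] = image[src_rows, src_cols]
    return shifted

def random_shift(image, max_shift, rng):
    """Shift by a random offset in [-max_shift, max_shift] along each axis."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return shift_image(image, int(dy), int(dx))

# Toy image with a single bright pixel (assumption: same 28x28x1 layout as train_x).
rng = np.random.default_rng(0)
image = np.zeros((28, 28, 1), dtype=np.float32)
image[5, 5, 0] = 1.0

moved = shift_image(image, dy=2, dx=-3)      # pixel moves to row 7, column 2
augmented = random_shift(image, max_shift=2, rng=rng)
```

Applied to each training image per epoch, this would give the network slightly different inputs each time without touching the model definition.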

**Future Endeavours:**

- Get data augmentation and TPU working optimally.
- Save best model in training.

**Resources:**

- [One](https://www.kaggle.com/bryanb/keras-cnn-for-mnist-digit-recognition-with-tpus#1.-Load-libraries-and-check-TPU-settings)
- [Two](https://www.kaggle.com/mgornergoogle/getting-started-with-100-flowers-on-tpu)
- [Three](https://www.tensorflow.org/tutorials/load_data/images)

# Imports

In [None]:
from keras.preprocessing.image import ImageDataGenerator
import math
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import pandas as pd
from numpy.random import randint
import seaborn as sns
from sklearn.metrics import confusion_matrix
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow import keras
import tensorflow_datasets as tfds
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing
from tensorflow.keras.callbacks import ModelCheckpoint
import time, os
import torch
from torch.utils import data
import torchvision
import torchvision.transforms as transforms

# TPU

In [None]:
# Detect and init the TPU
tpu = tf.distribute.cluster_resolver.TPUClusterResolver.connect()

# Instantiate a distribution strategy
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)

AUTO = tf.data.experimental.AUTOTUNE

# Variables

In [None]:
# Global batch size: 32 per replica. num_replicas_in_sync is 8 on a TPU v3-8 and 1 on CPU or GPU.
BATCH_SIZE = 32 * tpu_strategy.num_replicas_in_sync
IMAGE_SIZE = [28, 28]
HEIGHT = 28
WIDTH = 28
CHANNELS = 1
EPOCHS = 30
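
The global batch size is the per-replica batch size times the number of replicas, and the `steps_per_epoch` value used in training follows from it. A rough sketch of that arithmetic, assuming 8 replicas (a TPU v3-8), the 42,000-row `train.csv` and a 20% hold-out. The constant names and row counts here are illustrative stand-ins, not values read from the runtime.

```python
# Assumed values for illustration only.
PER_REPLICA_BATCH = 32
NUM_REPLICAS = 8          # 1 on CPU or a single GPU
TOTAL_ROWS = 42_000       # rows in train.csv
DEV_PERCENT = 20          # share held out for the dev set

global_batch = PER_REPLICA_BATCH * NUM_REPLICAS
train_rows = TOTAL_ROWS - TOTAL_ROWS * DEV_PERCENT // 100
steps_per_epoch = train_rows // global_batch
# Samples that do not fill a whole batch within one epoch.
leftover = train_rows - steps_per_epoch * global_batch

print(global_batch, train_rows, steps_per_epoch, leftover)
```

Because the training dataset is repeated indefinitely, the leftover samples are not lost; they simply roll over into the next epoch.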

# Load data in

In [None]:
# Load in data.
train_data = pd.read_csv('../input/digit-recognizer/train.csv')
test_data = pd.read_csv('../input/digit-recognizer/test.csv')

In [None]:
# Split and reshape data.
train_y = train_data.label.to_numpy()
train_x = train_data.to_numpy()[0:,1:].reshape(len(train_data),28,28,1)
test_x = test_data.to_numpy().reshape(len(test_data),28,28,1)

# Normalise to speed up training: values in [0, 1] work better than values in [0, 255].
train_x = train_x/255
test_x = test_x/255

train_x, dev_x, train_y, dev_y = train_test_split(train_x, train_y, test_size=0.2, random_state=42)

# Exploration

The competition description reads as follows:

"MNIST ("Modified National Institute of Standards and Technology") is the de facto “hello world” dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike.

In this competition, your goal is to correctly identify digits from a dataset of tens of thousands of handwritten images. We’ve curated a set of tutorial-style kernels which cover everything from regression to neural networks. We encourage you to experiment with different algorithms to learn first-hand what works well and how techniques compare."

In [None]:
# Check one image and its actual label
index = randint(0, len(train_x))
image = train_x[index]
plt.imshow(image.squeeze())
print('Label =', train_y[index])

In [None]:
# Check labels and the distribution of them.
x, y = np.unique(train_y, return_counts=True)
plt.figure(figsize = (10,7))
plt.title('Class Distribution')
plt.bar(x,y)
plt.ylabel('Count')
plt.xlabel('Digit')
plt.xticks(np.arange(0, 10, step=1))
plt.show()

# Make Tensorflow dataset

In [None]:
# Put data in a tensor format for parallelization

train_dataset = (
    tf.data.Dataset.from_tensor_slices((train_x.astype(np.float32), train_y.astype(np.float32)))
    .repeat()
    .shuffle(2048)
    .batch(BATCH_SIZE)
    .prefetch(AUTO)
)

dev_dataset = (
    tf.data.Dataset.from_tensor_slices((dev_x.astype(np.float32), dev_y.astype(np.float32)))
    .batch(BATCH_SIZE)
    .cache()
    .prefetch(AUTO)
)

test_dataset = (
    tf.data.Dataset.from_tensor_slices(test_x.astype(np.float32)).batch(BATCH_SIZE)
)

# Model

In [None]:
# instantiating the model in the strategy scope creates the model on the TPU
with tpu_strategy.scope():
    model = keras.Sequential([
                layers.InputLayer(input_shape=[28,28,1]),
                
                # Preprocessing - Augmentation
                preprocessing.RandomContrast(factor=0.10),

# #                 preprocessing.RandomWidth(factor=0.15), # horizontal stretch
#                 preprocessing.RandomRotation(factor=0.20),
# #                 preprocessing.RandomTranslation(height_factor=0.1, width_factor=0.1),
#                 layers.experimental.preprocessing.Rescaling(1./255),
        
                # First Convolutional Block
                layers.BatchNormalization(renorm=True),
                layers.Conv2D(filters=64, kernel_size=3, activation="relu", padding='same'),
                layers.MaxPool2D(2),

                # Second Convolutional Block
                layers.BatchNormalization(renorm=True),
                layers.Conv2D(filters=128, kernel_size=3, activation="relu", padding='same'),
                layers.MaxPool2D(2),

                # Third Convolutional Block
                layers.BatchNormalization(renorm=True),
                layers.Conv2D(filters=256, kernel_size=3, activation="relu", padding='same'),
                layers.MaxPool2D(2),

                # Fourth Convolutional Block
                layers.BatchNormalization(renorm=True),
                layers.Conv2D(filters=512, kernel_size=3, activation="relu", padding='same'),
                layers.MaxPool2D(2),

                layers.Flatten(),
                layers.Dropout(.2),

                layers.Dense(64,activation='relu'),
                # Softmax so the 10 outputs form a probability distribution, as the loss expects
                layers.Dense(10,activation='softmax')])
    
    model.compile(
    optimizer=tf.keras.optimizers.Adam(epsilon=0.01),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
    steps_per_execution=32)
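
`sparse_categorical_crossentropy` treats each output row as a probability distribution over the ten digits and takes the negative log of the entry for the true label. Below is a plain-Python sketch of that calculation. It also shows why a softmax output satisfies the assumption while independent sigmoids need not. The three-class logits are made-up values, and the helper names are mine, not Keras APIs.

```python
import math

def softmax(logits):
    """Exponentiate and normalise so the outputs sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Squash each logit independently into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def sparse_cross_entropy(probs, label):
    """Negative log-probability assigned to the integer class label."""
    return -math.log(probs[label])

logits = [2.0, 0.5, -1.0]   # made-up scores for three classes
p_soft = softmax(logits)
p_sig = sigmoid(logits)

print(sum(p_soft), sum(p_sig))
print(sparse_cross_entropy(p_soft, 0), sparse_cross_entropy(p_soft, 2))
```

The softmax outputs sum to 1, whereas the sigmoid outputs here sum to more than 1, so they cannot be read directly as class probabilities.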

# Training

In [None]:
history = model.fit(
    train_dataset,
    validation_data=dev_dataset,
    # batch_size is not passed here: train_dataset is already batched
    steps_per_epoch = train_x.shape[0]//BATCH_SIZE,
    epochs=4,
    verbose=1,
)

In [None]:
def plot_history(model_history):

    plt.figure(figsize = (20,15))
    
    plt.subplot(221)
    # summarize history for accuracy
    plt.plot(model_history.history['accuracy'])
    plt.plot(model_history.history['val_accuracy'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'validation'], loc='upper left')
    plt.grid()
    
    plt.subplot(222)
    # summarize history for loss
    plt.plot(model_history.history['loss'])
    plt.plot(model_history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'validation'], loc='upper left')
    plt.grid()
    
    plt.show()

In [None]:
plot_history(history)

The model appears reasonably well fit: the training and validation curves have converged for both accuracy and loss.
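
Convergence can also be checked numerically from the `history.history` dictionary that `model.fit` returns, by finding the epoch with the best validation metric. A small sketch follows. `best_epoch` is a hypothetical helper, and the numbers are invented for illustration, not results from this run.

```python
def best_epoch(history, metric="val_accuracy"):
    """Return (1-based epoch, value) of the best entry for the given metric."""
    values = history[metric]
    # Lower is better for losses, higher is better for accuracy-like metrics.
    pick = min if "loss" in metric else max
    best_value = pick(values)
    return values.index(best_value) + 1, best_value

# Invented history in the same shape as history.history.
example_history = {
    "val_accuracy": [0.962, 0.981, 0.986, 0.989, 0.988, 0.987],
    "val_loss": [0.130, 0.071, 0.052, 0.041, 0.043, 0.047],
}

print(best_epoch(example_history))
print(best_epoch(example_history, "val_loss"))
```

On a real run the call would be `best_epoch(history.history)`.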

In [None]:
dev_preds = model.predict(dev_dataset)
dev_preds = np.argmax(dev_preds, axis=1)

In [None]:
# Plot confusion matrix (row-normalised, so each row sums to 1)
fig, ax = plt.subplots(figsize=(12, 12))
cm = confusion_matrix(dev_y, dev_preds, normalize='true')
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=list(range(10)))
disp.plot(ax=ax)
ax.set_title("Confusion Matrix")
plt.show()

While not completely accurate, the results with no data augmentation seem reasonable.
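
With `normalize='true'`, each row of the confusion matrix is divided by the number of samples that truly belong to that class, so the diagonal reads as per-digit recall. A dependency-free sketch of that normalisation on made-up labels follows. It uses three classes instead of ten, and the helper names are hypothetical, not scikit-learn functions.

```python
def confusion_counts(y_true, y_pred, n_classes):
    """Count how often each true class (row) was predicted as each class (column)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def normalise_rows(matrix):
    """Divide every row by its total, leaving all-zero rows as zeros."""
    normalised = []
    for row in matrix:
        total = sum(row)
        normalised.append([c / total if total else 0.0 for c in row])
    return normalised

# Made-up labels and predictions for three classes.
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 0, 2]

counts = confusion_counts(y_true, y_pred, 3)
rates = normalise_rows(counts)
for row in rates:
    print(row)
```

Reading the diagonal of `rates` gives the fraction of each class that was classified correctly, which is easier to compare across classes than raw counts when the class sizes differ.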

# Submission Predictions

In [None]:
# test_data_new = test_data.to_numpy().reshape(len(test_data),28*28)
test_preds = model.predict(test_x)
test_preds = np.argmax(test_preds, axis=1)
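# ImageId is 1-based and derived from the number of predictions rather than hardcoded
output = pd.DataFrame({'ImageId': range(1, len(test_preds) + 1), 'Label': test_preds})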
output.to_csv('First.csv', index=False)
print("Your submission was successfully saved!")