# A SIMPLE CONVOLUTIONAL NEURAL NETWORK FROM SCRATCH

_**Building a Simple Convolutional Neural Network (CNN) from Scratch on the MNIST dataset.**_

**The Experiment:**
- Load the MNIST dataset, which contains 60,000 28x28 grayscale training images of the 10 digits, along with a test set of 10,000 images.
- Scale the data so that pixel values lie between 0 and 1.
- Split the data into train, validation and test sets.
- Add a channel dimension to each dataset so it can be fed into the neural network.
- Apply one-hot encoding to the labels so the model's output can be compared with them.
- Create an appropriate convolutional neural network.
- Train the model with checkpointing and early stopping, measuring performance on the validation data during training.
- Analyze the learning curves and evaluate the model's performance on the test set.

## Importing Packages

In [None]:
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow as tf
import matplotlib.pyplot as plt

## Data Acquisition & Analysis

In [22]:
# Loads MNIST dataset
# NOTE: Downloading for the first time may take a few minutes to complete

(X_train_full, y_train_full), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

In [None]:
# Checks the shape of the datasets

print("Full training set shape:", X_train_full.shape)
print("Test set shape:", X_test.shape)

In [None]:
# Checks the type of the array
X_train_full.dtype

## Data Preprocessing

In [None]:
# The pixel values are stored as 'uint8', so they range from 0 to 255.
# Dividing by 255. rescales them to the range [0, 1],
# which helps neural network training.

X_train_full = X_train_full / 255.
X_test = X_test / 255.

In [None]:
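# Separate out 5000 instances from the train set, stratified by label, to be used as the validation set
# NOTE: random_state=42 is an assumption added here for reproducibility

X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=5000, stratify=y_train_full, random_state=42)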

In [None]:
# To match the input shape requirement of the CNN model,
# add the channel as the third (and last) dimension of each image.
# Use the `np.expand_dims` function with -1 to indicate the last axis

X_train = np.expand_dims(X_train, -1)
X_val = np.expand_dims(X_val, -1)
X_test = np.expand_dims(X_test, -1)

In [None]:
# Checks for the updated shape
X_train.shape
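
To make the effect of the extra axis concrete, here is a minimal sketch on a toy array. The name `toy_batch` is hypothetical and only stands in for the MNIST arrays.

```python
import numpy as np

# Hypothetical toy batch: 2 grayscale "images" of 28x28 pixels
toy_batch = np.zeros((2, 28, 28))

# axis=-1 appends a new axis of length 1 at the end (the channel axis)
toy_with_channel = np.expand_dims(toy_batch, -1)

print(toy_batch.shape)         # (2, 28, 28)
print(toy_with_channel.shape)  # (2, 28, 28, 1)
```

The pixel values are unchanged; only the shape gains a trailing channel axis of length 1.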

In [None]:
# Convert the class vectors of each dataset to binary class matrices
# (one-hot encoding) by calling the `tf.keras.utils.to_categorical` function,
# passing the labels and the number of distinct classes in the dataset.

num_classes = 10

y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_val = tf.keras.utils.to_categorical(y_val, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)

In [None]:
# Checks for the updated shape
y_train.shape
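
As an illustration of what one-hot encoding produces, the following is a NumPy-only sketch of the same idea. It is not how `to_categorical` is implemented, and the names `toy_labels` and `n_classes` are hypothetical.

```python
import numpy as np

# Hypothetical integer labels for three samples
toy_labels = np.array([0, 3, 9])
n_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
toy_one_hot = np.eye(n_classes)[toy_labels]

print(toy_one_hot.shape)  # (3, 10)
print(toy_one_hot[1])     # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```

Each row contains a single 1 at the index of its class, which is the form the softmax output is compared against.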

## Modeling

In [None]:
# Sets the global random seed for operations that rely on a random seed
tf.random.set_seed(42)

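# Create a list of the following layers and pass it to `tf.keras.Sequential` to
# initialize the CNN model.
# Note that all the layers can be accessed through the module `tf.keras.layers`
# 1. `Input` layer that accepts input of `shape` (28, 28, 1)
# 2. `Conv2D` layer with 32 filters, `kernel_size` of (3, 3) and "relu" `activation`
# 3. `MaxPooling2D` layer with `pool_size` of (2, 2)
# 4. `Conv2D` layer with 64 filters, `kernel_size` of (3, 3) and "relu" `activation`
# 5. `MaxPooling2D` layer with `pool_size` of (2, 2)
# 6. `Flatten` layer
# 7. `Dropout` layer with 0.5 dropout rate
# 8. `Dense` layer with 10 output units and "softmax" as `activation`


model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),
])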

In [None]:
# Shows the model summary
model.summary(show_trainable=True)
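
The shapes and parameter counts reported by the summary can be checked by hand. The sketch below does the arithmetic in plain Python for the eight layers listed above, assuming the Keras defaults ("valid" padding and stride 1 for `Conv2D`, stride equal to `pool_size` for `MaxPooling2D`). The helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel):
    # "valid" padding, stride 1: each side shrinks by kernel - 1
    return size - kernel + 1

def pool_out(size, pool):
    # Non-overlapping pooling: integer division drops any remainder
    return size // pool

side = 28
side = conv_out(side, 3)   # 26 after the first Conv2D
side = pool_out(side, 2)   # 13 after the first MaxPooling2D
side = conv_out(side, 3)   # 11 after the second Conv2D
side = pool_out(side, 2)   # 5 after the second MaxPooling2D
flat_units = side * side * 64  # Flatten: 5 * 5 * 64 = 1600

# Parameters = kernel weights + one bias per filter (or per output unit)
conv1_params = 3 * 3 * 1 * 32 + 32
conv2_params = 3 * 3 * 32 * 64 + 64
dense_params = flat_units * 10 + 10
total_params = conv1_params + conv2_params + dense_params

print(flat_units, total_params)  # 1600 34826
```

Pooling, flattening and dropout add no trainable parameters, so the total comes only from the two convolutional layers and the dense layer.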

## Training the Model

In [None]:
# Compile the model with `categorical_crossentropy` as `loss`,
# `adam` as `optimizer` and `["accuracy"]` as `metrics`
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
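
For intuition about the `categorical_crossentropy` loss, here is a minimal sketch that computes it by hand for a single sample. The values in `predicted` and `target` are made up, and the real Keras loss also clips probabilities and averages over the batch.

```python
import math

# Hypothetical softmax output for one sample over 3 classes
predicted = [0.1, 0.7, 0.2]
# One-hot target: the true class is index 1
target = [0.0, 1.0, 0.0]

# Categorical cross-entropy: -sum(target_i * log(predicted_i))
loss = -sum(t * math.log(p) for t, p in zip(target, predicted))

print(round(loss, 4))  # 0.3567
```

Because the target is one-hot, the sum reduces to the negative log of the probability assigned to the true class, so a more confident correct prediction gives a smaller loss.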

In [None]:
# Configures the callbacks for checkpoints and early stopping

callbacks=[
    tf.keras.callbacks.ModelCheckpoint("./models/mnist/checkpoints/mnist.weights.keras", save_best_only=True),
    tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
]
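
The patience rule used by early stopping can be sketched in plain Python. This is a simplified illustration of the idea, not the Keras implementation; `simulate_early_stopping` and `toy_val_losses` are hypothetical names, and the losses are made up.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based, for a list of validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # Stop here; the best weights are those from best_epoch
                return epoch, best_epoch
    # Ran through every epoch without exhausting the patience
    return len(val_losses), best_epoch

toy_val_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38]
stop_epoch, best_epoch = simulate_early_stopping(toy_val_losses, patience=3)
print(stop_epoch, best_epoch)  # 6 3
```

With `restore_best_weights=True`, the model kept after stopping corresponds to `best_epoch` rather than to the last epoch that ran.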

In [None]:
# Call the `fit` method of the model, passing
# the train set (features and labels separated by a comma), 64 to `batch_size`,
# 50 to `epochs`, the configured callbacks to `callbacks` and
# the validation data (a tuple of features and labels) to `validation_data`

history = model.fit(X_train, y_train, batch_size=64, epochs=50,
                    callbacks=callbacks, validation_data=(X_val, y_val))

In [43]:
# Saves the trained model for later reference
# NOTE: Make sure the folder "models/mnist" exists under the current working directory

model.save("./models/mnist/mnist.keras")

## Model Evaluation
Visualize the learning progress on the training and validation sets.

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True, figsize=(14,4))

ax1.plot(history.history["loss"], "b-", label="Train loss")
ax1.plot(history.history["val_loss"], "r-", label="Validation loss")
ax1.set_xlabel("Epoch")
ax1.set_ylabel("Loss")
ax1.legend()
ax1.set_title("Train vs. Validation Loss")

ax2.plot(history.history["accuracy"], "b-", label="Train accuracy")
ax2.plot(history.history["val_accuracy"], "r-", label="Validation accuracy")
ax2.set_xlabel("Epoch")
ax2.set_ylabel("Accuracy")
ax2.legend()
ax2.set_title("Train vs. Validation Accuracy")

fig.suptitle("Learning Curves")

In [None]:
print(f"Model was at its best (lowest validation loss) at epoch {np.argmin(history.history['val_loss']) + 1}")

In [None]:
# Evaluates the model on test set
model_test_performance = model.evaluate(X_test, y_test)

print("Test Performance: "
      f"Loss: {model_test_performance[0]:.2f}, Accuracy: {model_test_performance[1]*100:.2f}%")

**Observations:**

- Did the model overfit? Write observations on this.

- At the end of which epoch was the model at its best (lowest validation loss)?

- Was the training stopped early? If so, at which epoch did it stop, and why?

- How was the test performance?