# Assignment 3
Start with the sample machine learning program for handwritten digit recognition on the MNIST dataset.
Change the network architecture of the model so that its training accuracy is at least 99.30%.

## Prerequisite
The first step is loading the MNIST dataset in Keras. The `keras.datasets` module provides ready-made datasets for machine learning tasks. MNIST is a large database of handwritten digits that is widely used for training image processing systems. The `load_data()` function loads the dataset into memory and returns two tuples: one for the training data and one for the test data.

Next the data is preprocessed. The reshape adds a channel dimension, turning each 28x28 image into a 28x28x1 tensor, which is the input shape the convolutional layers expect. The data type is then changed from integer to float32 to suit the floating-point computation in the neural network. Finally, dividing by 255 (the maximum value of a pixel) normalizes the pixel values to the range [0, 1].
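
As a rough illustration of these three steps, the sketch below applies them to a small random array standing in for MNIST. The `fake_images` name and the batch of 4 images are assumptions made for the example and are not part of the assignment code.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 4 random 28x28 grayscale images (uint8, 0-255).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Add a trailing channel axis, convert to float32, then scale into [0, 1].
prepared = fake_images.reshape((-1, 28, 28, 1)).astype("float32") / 255

print(prepared.shape)  # (4, 28, 28, 1)
print(prepared.dtype)  # float32
print(float(prepared.min()) >= 0.0, float(prepared.max()) <= 1.0)
```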

One-hot encoding transforms categorical labels into a binary matrix where each class is represented by a vector with a single high (1) value and the rest low (0). The `to_categorical` function is used for this purpose, where `train_labels` are the original labels in integer format and `10` is the number of classes.
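
The idea can be sketched in plain NumPy. The `one_hot` helper below is a hypothetical illustration of the encoding, not the Keras implementation, and the three labels are made up for the example.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a matrix with a single 1 per row, placed at the label's index."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Row i gets a 1 in column labels[i]; every other entry stays 0.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot([5, 0, 9], 10)
print(example)
```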

In [2]:
from tensorflow.keras.datasets import mnist
from tensorflow import keras

# Load data
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Preprocess
train_images = train_images.reshape((60000, 28, 28, 1)).astype("float32") / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype("float32") / 255

train_labels = keras.utils.to_categorical(train_labels, 10)
test_labels = keras.utils.to_categorical(test_labels, 10)

## Model one

### Model architecture
- Convolutional Layers: Three Conv2D layers with increasing filters (32, 64, 128) to capture spatial features.
- Batch Normalization: Applied after each Conv2D layer to stabilize and accelerate training.
- Pooling Layers: MaxPooling2D layers to reduce spatial dimensions and computational complexity.
- Global Average Pooling: Replaces Flatten to reduce the number of parameters and prevent overfitting.
- Dense Layers: A Dense layer with 128 units followed by a Dropout layer, then a final Dense layer with 10 units for classification.
- Regularization: Dropout at a rate of 0.4 to prevent overfitting, plus early stopping and learning rate reduction during training.
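
To see how the spatial size shrinks through this stack, the sketch below traces the feature-map width layer by layer. It assumes the Keras defaults that the model code relies on (3x3 convolutions with `valid` padding and stride 1, 2x2 pooling with stride 2); the helper names are made up for the example.

```python
def conv_out(size, kernel=3):
    # 'valid' padding with stride 1 trims kernel - 1 pixels from each dimension.
    return size - kernel + 1

def pool_out(size, pool=2):
    # 2x2 max pooling with stride 2 halves the size, rounding down.
    return size // pool

size = 28
trace = [size]
for step in (conv_out, pool_out, conv_out, pool_out, conv_out):
    size = step(size)
    trace.append(size)

print(trace)  # [28, 26, 13, 11, 5, 3]
```

Under these assumptions the last convolution produces a 3x3 map with 128 channels, and global average pooling reduces it to a single 128-element vector.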

In [7]:
import tensorflow as tf

def create_model_one():
    model = keras.Sequential([
        keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        keras.layers.BatchNormalization(),
        keras.layers.MaxPooling2D((2, 2)),

        keras.layers.Conv2D(64, (3, 3), activation='relu'),
        keras.layers.BatchNormalization(),
        keras.layers.MaxPooling2D((2, 2)),

        keras.layers.Conv2D(128, (3, 3), activation='relu'),
        keras.layers.BatchNormalization(),
        keras.layers.GlobalAveragePooling2D(),

        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dropout(0.4),

        keras.layers.Dense(10, activation='softmax')
    ])

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    
    return model

model_one = create_model_one()
model_one.summary()

### Training model one

In [8]:
import time

def train_model_one(model):

    early_stopping = keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=5, restore_best_weights=True)
    reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3)

    start_time = time.time()
    history = model.fit(train_images, train_labels, 
                        epochs=40, 
                        batch_size=256, 
                        validation_split=0.2, 
                        callbacks=[early_stopping, reduce_lr])

    end_time = time.time()
    elapsed_time = end_time - start_time
    minutes, seconds = divmod(elapsed_time, 60)

    # stopped_epoch stays 0 if early stopping never triggers, so count the epochs actually run.
    print(f"Training stopped at epoch {len(history.history['loss'])}")
    print(f"Training time: {int(minutes)} minutes and {int(seconds)} seconds")
    
    return history
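
The interplay of the two callbacks can be sketched without Keras. The function below is a simplified re-implementation of their bookkeeping for illustration only: it ignores details such as `min_delta` and `cooldown`, and the validation curves passed to it are invented numbers, not results from this notebook.

```python
def simulate_callbacks(val_losses, val_accs, lr=0.001,
                       lr_factor=0.5, lr_patience=3, stop_patience=5):
    """Return (stopped_epoch, best_epoch, final_lr) using 1-based epochs."""
    best_loss, lr_wait = float("inf"), 0
    best_acc, stop_wait, best_epoch = float("-inf"), 0, 0
    for epoch, (loss, acc) in enumerate(zip(val_losses, val_accs), start=1):
        # Early stopping watches validation accuracy (higher is better).
        if acc > best_acc:
            best_acc, best_epoch, stop_wait = acc, epoch, 0
        else:
            stop_wait += 1
        # Learning rate reduction watches validation loss (lower is better).
        if loss < best_loss:
            best_loss, lr_wait = loss, 0
        else:
            lr_wait += 1
            if lr_wait >= lr_patience:
                lr, lr_wait = lr * lr_factor, 0
        if stop_wait >= stop_patience:
            return epoch, best_epoch, lr
    return len(val_losses), best_epoch, lr

# Hypothetical validation curves that peak at epoch 4 and then stall.
fake_val_accs = [0.10, 0.90, 0.98, 0.99, 0.989, 0.988, 0.987, 0.986, 0.985, 0.984]
fake_val_losses = [5.0, 0.5, 0.06, 0.04, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]

stopped, best, final_lr = simulate_callbacks(fake_val_losses, fake_val_accs)
print(stopped, best, final_lr)
```

With these made-up curves the learning rate is halved once after three epochs without a lower validation loss, and training stops after five epochs without a higher validation accuracy. With `restore_best_weights=True`, Keras would then roll the model back to the weights from the best epoch.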


### Testing model one

In [9]:
model_one = create_model_one()   
history_one = train_model_one(model_one) 
test_loss, test_acc = model_one.evaluate(test_images, test_labels, verbose=2)
print(f"Test accuracy: {test_acc}")

Epoch 1/40
188/188 ━━━━━━━━━━━━━━━━━━━━ 42s 203ms/step - accuracy: 0.8481 - loss: 0.5611 - val_accuracy: 0.1060 - val_loss: 5.2147 - learning_rate: 0.0010
Epoch 2/40
188/188 ━━━━━━━━━━━━━━━━━━━━ 66s 337ms/step - accuracy: 0.9832 - loss: 0.0585 - val_accuracy: 0.3114 - val_loss: 2.3516 - learning_rate: 0.0010
Epoch 3/40
188/188 ━━━━━━━━━━━━━━━━━━━━ 62s 231ms/step - accuracy: 0.9885 - loss: 0.0383 - val_accuracy: 0.9436 - val_loss: 0.1793 - learning_rate: 0.0010
Epoch 4/40
188/188 ━━━━━━━━━━━━━━━━━━━━ 49s 262ms/step - accuracy: 0.9920 - loss: 0.0265 - val_accuracy: 0.9833 - val_loss: 0.0561 - learning_rate: 0.0010
Epoch 5/40
188/188 ━━━━━━━━━━━━━━━━━━━━ 50s 263ms/step - accuracy: 0.9936 - loss: 0.0200 - val_accuracy: 0.9896 - val_loss: 0.0401 - learning_rate: 0.0010
Epoch 6/40
188/188 ━━━━━━━━━━━━━━━━━━━━

### Results
| Trainable parameters | Accuracy | Time taken by training    | Optimizer | Loss   | Epochs             | Batch Size |
|----------------------|----------|---------------------------|-----------|--------|--------------------|------------|
| 110,922              | 99.31%   | 23 minutes and 12 seconds | Adam      | 0.0253 | 28 (stopped early) | 256        |
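
Two numbers in this section can be checked by hand: the trainable parameter count in the table and the `188/188` step counter in the training log. The sketch below is a back-of-the-envelope check, assuming the standard per-layer parameter formulas; the helper names are made up for the example.

```python
def conv2d_params(in_channels, out_channels, kernel=3):
    # One kernel x kernel filter per (input, output) channel pair, plus one bias per output channel.
    return kernel * kernel * in_channels * out_channels + out_channels

def batchnorm_trainable_params(channels):
    # gamma and beta are trainable; the moving mean and variance are not.
    return 2 * channels

def dense_params(inputs, units):
    # One weight per (input, unit) pair, plus one bias per unit.
    return inputs * units + units

model_one_params = sum([
    conv2d_params(1, 32), batchnorm_trainable_params(32),
    conv2d_params(32, 64), batchnorm_trainable_params(64),
    conv2d_params(64, 128), batchnorm_trainable_params(128),
    dense_params(128, 128),  # global average pooling feeds 128 values into the Dense layer
    dense_params(128, 10),
])
print(model_one_params)  # 110922

# Steps per epoch: 20% of the 60,000 training images are held out for validation.
train_samples, validation_percent, batch_size = 60000, 20, 256
fit_samples = train_samples * (100 - validation_percent) // 100
steps_per_epoch = -(-fit_samples // batch_size)  # ceiling division
print(steps_per_epoch)  # 188
```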

## Model two

In [25]:
import tensorflow as tf

def create_model_two():
    model = keras.Sequential([
        keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        keras.layers.BatchNormalization(),
        keras.layers.MaxPooling2D((2, 2)),

        keras.layers.Conv2D(64, (3, 3), activation='relu'),
        keras.layers.BatchNormalization(),
        keras.layers.MaxPooling2D((2, 2)),

        keras.layers.Conv2D(128, (3, 3), activation='relu'),
        keras.layers.BatchNormalization(),
        keras.layers.GlobalAveragePooling2D(),

        keras.layers.Dense(128, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)),
        keras.layers.BatchNormalization(),
        keras.layers.Dropout(0.3),

        keras.layers.Dense(10, activation='softmax')
    ])

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    
    return model

model_two = create_model_two()
model_two.summary()
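
Compared with model one, this model adds an L2 kernel regularizer to the 128-unit Dense layer, inserts a BatchNormalization layer after it, and lowers the dropout rate to 0.3. In Keras, `regularizers.l2(0.001)` adds 0.001 times the sum of the squared kernel weights to the training loss. The sketch below shows that penalty on a hypothetical kernel; the constant weight value 0.05 is an assumption chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def l2_penalty(weights, l2=0.001):
    # Penalty term: l2 * sum of squared weights.
    return l2 * float(np.sum(np.square(weights)))

# Hypothetical 128x128 Dense kernel (128 pooled features in, 128 units out).
fake_kernel = np.full((128, 128), 0.05)

penalty = l2_penalty(fake_kernel)
print(round(penalty, 5))  # 0.04096
```

Because this term is part of the loss reported during training, loss values for model two are not directly comparable with those of model one.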

### Training model two

In [26]:
import time

def train_model_two(model):

    early_stopping = keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=8, restore_best_weights=True)
    reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2)

    start_time = time.time()
    history = model.fit(train_images, train_labels, 
                        epochs=40, 
                        batch_size=256, 
                        validation_split=0.2, 
                        callbacks=[early_stopping, reduce_lr])

    end_time = time.time()
    elapsed_time = end_time - start_time
    minutes, seconds = divmod(elapsed_time, 60)

    # stopped_epoch stays 0 if early stopping never triggers, so count the epochs actually run.
    print(f"Training stopped at epoch {len(history.history['loss'])}")
    print(f"Training time: {int(minutes)} minutes and {int(seconds)} seconds")
    
    return history

### Testing model two

In [27]:
model_two = create_model_two()   
history_two = train_model_two(model_two) 
test_loss, test_acc = model_two.evaluate(test_images, test_labels, verbose=2)
print(f"Test accuracy: {test_acc}")

Epoch 1/40
188/188 ━━━━━━━━━━━━━━━━━━━━ 37s 176ms/step - accuracy: 0.8587 - loss: 0.5859 - val_accuracy: 0.1060 - val_loss: 5.0469 - learning_rate: 0.0010
Epoch 2/40
188/188 ━━━━━━━━━━━━━━━━━━━━ 32s 168ms/step - accuracy: 0.9816 - loss: 0.1629 - val_accuracy: 0.3362 - val_loss: 1.8166 - learning_rate: 0.0010
Epoch 3/40
188/188 ━━━━━━━━━━━━━━━━━━━━ 31s 165ms/step - accuracy: 0.9885 - loss: 0.1134 - val_accuracy: 0.9166 - val_loss: 0.3296 - learning_rate: 0.0010
Epoch 4/40
188/188 ━━━━━━━━━━━━━━━━━━━━ 31s 165ms/step - accuracy: 0.9910 - loss: 0.0842 - val_accuracy: 0.9843 - val_loss: 0.0962 - learning_rate: 0.0010
Epoch 5/40
188/188 ━━━━━━━━━━━━━━━━━━━━ 33s 174ms/step - accuracy: 0.9929 - loss: 0.0622 - val_accuracy: 0.9868 - val_loss: 0.0733 - learning_rate: 0.0010
Epoch 6/40
188/188 ━━━━━━━━━━━━━━━━━━━━

### Results
| Trainable parameters | Accuracy | Time taken by training    | Optimizer | Loss   | Epochs             | Batch Size |
|----------------------|----------|---------------------------|-----------|--------|--------------------|------------|
| 111,178              | 99.39%   | 13 minutes and 19 seconds | Adam      | 0.0259 | 25 (stopped early) | 256        |

## Model three

In [28]:
import tensorflow as tf

def create_model_three():
    model = keras.Sequential([
        keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        keras.layers.BatchNormalization(),
        keras.layers.MaxPooling2D((2, 2)),

        keras.layers.Conv2D(64, (3, 3), activation='relu'),
        keras.layers.BatchNormalization(),
        keras.layers.MaxPooling2D((2, 2)),

        keras.layers.Conv2D(128, (3, 3), activation='relu'),
        keras.layers.BatchNormalization(),
        keras.layers.GlobalAveragePooling2D(),

        # Flatten is a no-op here: GlobalAveragePooling2D already outputs a flat 128-element vector.
        keras.layers.Flatten(),
        keras.layers.Dense(256, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)),
        keras.layers.BatchNormalization(),
        keras.layers.Dropout(0.5),

        keras.layers.Dense(128, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)),
        keras.layers.BatchNormalization(),
        keras.layers.Dropout(0.4),

        keras.layers.Dense(10, activation='softmax')
    ])

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    
    return model

model_three = create_model_three()
model_three.summary()

### Training model three

In [29]:
import time

def train_model_three(model):

    early_stopping = keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=8, restore_best_weights=True)
    reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3)

    start_time = time.time()
    history = model.fit(train_images, train_labels, 
                        epochs=40, 
                        batch_size=256, 
                        validation_split=0.2, 
                        callbacks=[early_stopping, reduce_lr])

    end_time = time.time()
    elapsed_time = end_time - start_time
    minutes, seconds = divmod(elapsed_time, 60)

    # stopped_epoch stays 0 if early stopping never triggers, so count the epochs actually run.
    print(f"Training stopped at epoch {len(history.history['loss'])}")
    print(f"Training time: {int(minutes)} minutes and {int(seconds)} seconds")
    
    return history

### Testing model three

In [30]:
model_three = create_model_three()   
history_three = train_model_three(model_three) 
test_loss, test_acc = model_three.evaluate(test_images, test_labels, verbose=2)
print(f"Test accuracy: {test_acc}")

Epoch 1/40
188/188 ━━━━━━━━━━━━━━━━━━━━ 48s 213ms/step - accuracy: 0.7978 - loss: 0.9872 - val_accuracy: 0.1060 - val_loss: 5.2523 - learning_rate: 0.0010
Epoch 2/40
188/188 ━━━━━━━━━━━━━━━━━━━━ 44s 232ms/step - accuracy: 0.9748 - loss: 0.3582 - val_accuracy: 0.1576 - val_loss: 4.4078 - learning_rate: 0.0010
Epoch 3/40
188/188 ━━━━━━━━━━━━━━━━━━━━ 44s 232ms/step - accuracy: 0.9845 - loss: 0.2566 - val_accuracy: 0.8962 - val_loss: 0.4817 - learning_rate: 0.0010
Epoch 4/40
188/188 ━━━━━━━━━━━━━━━━━━━━ 47s 248ms/step - accuracy: 0.9884 - loss: 0.1855 - val_accuracy: 0.9824 - val_loss: 0.1704 - learning_rate: 0.0010
Epoch 5/40
188/188 ━━━━━━━━━━━━━━━━━━━━ 42s 222ms/step - accuracy: 0.9916 - loss: 0.1306 - val_accuracy: 0.9868 - val_loss: 0.1256 - learning_rate: 0.0010
Epoch 6/40
188/188 ━━━━━━━━━━━━━━━━━━━━

### Results
| Trainable parameters | Accuracy | Time taken by training    | Optimizer | Loss   | Epochs             | Batch Size |
|----------------------|----------|---------------------------|-----------|--------|--------------------|------------|
| 161,098              | 99.36%   | 22 minutes and 58 seconds | Adam      | 0.0329 | 34 (stopped early) | 256        |
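
For a quick side-by-side view, the sketch below collects the figures from the three results tables and derives the average wall-clock time per epoch. The dictionary layout and helper name are assumptions made for the example, and the averages only reflect the reported totals, which depend on the machine the notebook ran on.

```python
# Figures copied from the three results tables.
results = {
    "model one": {"accuracy": 99.31, "minutes": 23, "seconds": 12, "epochs": 28},
    "model two": {"accuracy": 99.39, "minutes": 13, "seconds": 19, "epochs": 25},
    "model three": {"accuracy": 99.36, "minutes": 22, "seconds": 58, "epochs": 34},
}

def seconds_per_epoch(entry):
    # Total training time in seconds divided by the number of epochs run.
    return (entry["minutes"] * 60 + entry["seconds"]) / entry["epochs"]

pace = {name: round(seconds_per_epoch(entry), 1) for name, entry in results.items()}
best = max(results, key=lambda name: results[name]["accuracy"])

print(pace)
print(best)
```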