# Convolutional Neural Networks
You should build an end-to-end machine learning pipeline using a convolutional neural network model. In particular, you should do the following:
- Load the `mnist` dataset using [Pandas](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html). You can find this dataset in the datasets folder.
- Split the dataset into training and test sets using [Scikit-Learn](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html).
- Build an end-to-end machine learning pipeline, including a [convolutional neural network](https://keras.io/examples/vision/mnist_convnet/) model.
- Optimize your pipeline by validating your design decisions.
- Test the best pipeline on the test set and report various [evaluation metrics](https://scikit-learn.org/0.15/modules/model_evaluation.html).  
- Check the documentation to identify the most important hyperparameters, attributes, and methods of the model. Use them in practice.

### Import Libraries

In [49]:
import numpy as np
import keras
from keras import layers
import pandas as pd
from sklearn.model_selection import train_test_split
from keras.layers import BatchNormalization

### Loading data

In [50]:
df = pd.read_csv("https://raw.githubusercontent.com/m-mahdavi/teaching/refs/heads/main/datasets/mnist.csv")

In [51]:
df.head(4)

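| | id | class | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | ... | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 | pixel784 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 31953 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 34452 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 60897 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 36953 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |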


### Split dataset

In [52]:
df.drop("id", axis=1, inplace=True)
X = df.drop("class", axis=1).values
y = df["class"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

### Data Preprocessing

In [53]:
X_train = X_train.astype("float32") / 255
X_test = X_test.astype("float32") / 255
X_train = np.expand_dims(X_train, -1)
X_test = np.expand_dims(X_test, -1)
print("X_train shape:", X_train.shape)
print(X_train.shape[0], "train samples")
print(X_test.shape[0], "test samples")

X_train shape: (3200, 784, 1)
3200 train samples
800 test samples


In [54]:
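# The pixels were already scaled to [0, 1] in the previous cell, so only reshape here.
# Dividing by 255 a second time would squash every value into [0, 1/255].
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)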
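
The preprocessing above can be sanity-checked on synthetic data. The sketch below assumes each row holds 784 pixel values in [0, 255] stored in row-major order, which is an assumption about the CSV layout. The array `fake_pixels` is made up for illustration. It scales the values once and restores the `(28, 28, 1)` layout that the model expects.

```python
import numpy as np

# Hypothetical stand-in for the pixel columns: 5 images, 784 values each in [0, 255].
rng = np.random.default_rng(0)
fake_pixels = rng.integers(0, 256, size=(5, 784))

# Scale to [0, 1] exactly once, then restore the 28x28 grid plus a channel axis.
images = fake_pixels.astype("float32") / 255.0
images = images.reshape(-1, 28, 28, 1)

print(images.shape)  # (5, 28, 28, 1)
print(images.min() >= 0.0, images.max() <= 1.0)

# Row-major reshape: pixel k of a flat row lands at position (k // 28, k % 28).
k = 300
flat_value = np.float32(fake_pixels[0, k]) / np.float32(255.0)
print(images[0, k // 28, k % 28, 0] == flat_value)
```

Checking the value range after preprocessing is a cheap way to catch an accidental second division by 255.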

In [55]:
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
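
To make the one-hot step concrete, here is a minimal NumPy sketch of the same idea. The helper `one_hot` is a hypothetical illustration, not the Keras implementation. The labels are the four classes shown by `df.head(4)`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (n_samples, num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

labels = np.array([5, 8, 5, 0])
encoded = one_hot(labels, num_classes=10)
print(encoded.shape)   # (4, 10)

# argmax along the class axis recovers the integer labels.
decoded = encoded.argmax(axis=1)
print(decoded)         # [5 8 5 0]
```

The `argmax` step is also how one-hot targets and softmax outputs can be converted back to class indices before computing Scikit-Learn metrics.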

### Building the CNN Model

In [56]:
input_shape = (28, 28, 1)
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        BatchNormalization(),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        BatchNormalization(),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(128, kernel_size=(3, 3), activation="relu"),
        BatchNormalization(),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.summary()
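
The shapes and parameter counts that `model.summary()` reports can be derived by hand. The sketch below redoes that arithmetic in plain Python, assuming the Keras defaults used above: `"valid"` padding with stride 1 for `Conv2D`, and a pooling stride equal to the pool size for `MaxPooling2D`. The helper names are illustrative.

```python
def conv_out(size, kernel=3):
    # "valid" padding, stride 1: the kernel must fit entirely inside the input.
    return size - kernel + 1

def pool_out(size, pool=2):
    # 2x2 max pooling with stride 2 drops the last row/column of odd-sized inputs.
    return size // pool

size, channels = 28, 1
total_params = 0
non_trainable = 0
for filters in (32, 64, 128):
    conv_params = 3 * 3 * channels * filters + filters  # weights + biases
    bn_params = 4 * filters  # gamma, beta, moving mean, moving variance
    total_params += conv_params + bn_params
    non_trainable += 2 * filters  # moving statistics are not trained
    size = pool_out(conv_out(size))
    channels = filters
    print(f"after block with {filters} filters: {size}x{size}x{channels}")

flat_features = size * size * channels
total_params += flat_features * 10 + 10  # Dense(10): weights + biases
print(flat_features, total_params, non_trainable)  # 128 94858 448
```

After three blocks the feature map shrinks to 1x1x128, so the `Flatten` layer passes only 128 features to the classifier.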

In [57]:
# Import from the standalone `keras` package, consistent with the other imports.
from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True
)
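
The patience rule behind this callback can be illustrated in plain Python. This is a simplified re-implementation for illustration, not the Keras code: it ignores options such as `min_delta`, `baseline`, and `start_from_epoch`. The input values are the first six validation losses from the training log in this notebook.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    stop_epoch is None when the losses never trigger an early stop.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # Improvement: remember this epoch and reset the counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

logged = [2.3010, 2.2980, 2.2982, 2.3010, 2.3062, 2.3131]
print(early_stopping_epoch(logged, patience=5))  # (None, 2)
print(early_stopping_epoch(logged, patience=3))  # (5, 2)
```

With `restore_best_weights=True`, the weights from `best_epoch` would be restored after stopping. On this log that would be epoch 2, so the patience value needs care when the validation loss improves slowly in early epochs.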

In [58]:
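batch_size = 128
epochs = 150  # upper bound on the number of training epochs

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Hold out the last 10% of the training data for validation.
# `early_stop` is defined above but not passed via `callbacks`, so all epochs run.
model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)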

Epoch 1/150
23/23 ━━━━━━━━━━━━━━━━━━━━ 8s 195ms/step - accuracy: 0.5539 - loss: 1.4667 - val_accuracy: 0.0938 - val_loss: 2.3010
Epoch 2/150
23/23 ━━━━━━━━━━━━━━━━━━━━ 3s 148ms/step - accuracy: 0.9291 - loss: 0.2429 - val_accuracy: 0.1375 - val_loss: 2.2980
Epoch 3/150
23/23 ━━━━━━━━━━━━━━━━━━━━ 7s 241ms/step - accuracy: 0.9640 - loss: 0.1418 - val_accuracy: 0.1375 - val_loss: 2.2982
Epoch 4/150
23/23 ━━━━━━━━━━━━━━━━━━━━ 8s 154ms/step - accuracy: 0.9839 - loss: 0.0814 - val_accuracy: 0.1375 - val_loss: 2.3010
Epoch 5/150
23/23 ━━━━━━━━━━━━━━━━━━━━ 7s 226ms/step - accuracy: 0.9921 - loss: 0.0459 - val_accuracy: 0.1375 - val_loss: 2.3062
Epoch 6/150
23/23 ━━━━━━━━━━━━━━━━━━━━ 3s 149ms/step - accuracy: 0.9978 - loss: 0.0302 - val_accuracy: 0.1375 - val_loss: 2.3131
Epoch 7/150
23/23

<keras.src.callbacks.history.History at 0x7defc6559010>
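
The `23/23` step counter in the log follows from the split sizes. The sketch below reproduces the arithmetic, using the 3200 training samples reported earlier and the `batch_size` and `validation_split` values passed to `fit`.

```python
import math

n_train_rows = 3200      # from the "3200 train samples" output
validation_split = 0.1
batch_size = 128

# Keras takes the validation samples from the end of the arrays, before shuffling.
n_fit = int(n_train_rows * (1 - validation_split))
n_val = n_train_rows - n_fit

# The last batch of each epoch is smaller when n_fit is not a multiple of batch_size.
steps_per_epoch = math.ceil(n_fit / batch_size)
print(n_fit, n_val, steps_per_epoch)  # 2880 320 23
```

Because the held-out slice is simply the last 320 rows, it is not stratified by class the way the `train_test_split` call is.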

In [62]:
score = model.evaluate(X_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

Test loss: 0.17919975519180298
Test accuracy: 0.9587500095367432
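
Accuracy alone hides per-class behaviour. As a minimal sketch of further evaluation metrics, the block below computes a confusion matrix with per-class precision, recall, and F1 in NumPy. The labels are made up for a 3-class problem and are not results from this notebook. The helper `confusion_matrix` is written here for illustration.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(matrix, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return matrix

# Made-up labels, for illustration only.
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 0])
y_pred = np.array([0, 1, 1, 1, 2, 2, 2, 2, 0, 0])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # per predicted class (columns)
recall = true_positives / cm.sum(axis=1)     # per true class (rows)
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cm.sum()

print(cm)
print("precision:", precision.round(3))
print("recall:   ", recall.round(3))
print("f1:       ", f1.round(3))
print("accuracy: ", accuracy)
```

In this notebook the inputs would be `y_test.argmax(axis=1)` and `model.predict(X_test).argmax(axis=1)`, since the targets are one-hot encoded. Scikit-Learn's `confusion_matrix` and `classification_report` in `sklearn.metrics` report the same quantities.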


| Model | Changes made | Result |
|---|---|---|
| **Model A** | **Initial model:** 2 Conv2D layers + Dropout(0.5) + Dense(10, activation="softmax"), trained for 100 epochs | **accuracy**: 0.9144, **loss**: 0.2634, **val_accuracy**: 0.9312, **val_loss**: 0.1900 |
| **Model B** | Removed Dropout(0.5) | **accuracy**: 0.9695, **loss**: 0.1118, **val_accuracy**: 0.9469, **val_loss**: 0.1712 |
| **Model C** | Added a third convolutional layer, Conv2D(128, (3,3), activation='relu') | **accuracy**: 0.9573, **loss**: 0.1380, **val_accuracy**: 0.9094, **val_loss**: 0.3078 |
| **Model D** | Added BatchNormalization after each convolutional layer | **accuracy**: 1.0000, **loss**: 3.1927e-05, **val_accuracy**: 0.9781, **val_loss**: 0.0861 |
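
One way to read the table is to compare validation accuracy and the gap between training and validation accuracy for each variant. The sketch below copies the accuracy values from the table. The dictionary layout and the choice of validation accuracy as the selection criterion are assumptions made for illustration.

```python
# (training accuracy, validation accuracy) copied from the table above
results = {
    "Model A": (0.9144, 0.9312),
    "Model B": (0.9695, 0.9469),
    "Model C": (0.9573, 0.9094),
    "Model D": (1.0000, 0.9781),
}

for name, (train_acc, val_acc) in results.items():
    gap = train_acc - val_acc  # a large positive gap hints at overfitting
    print(f"{name}: val_accuracy={val_acc:.4f}, train-val gap={gap:+.4f}")

# Select on validation accuracy so the test set stays untouched until the end.
best = max(results, key=lambda name: results[name][1])
print("Highest validation accuracy:", best)  # Model D
```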
