# Building a CNN for MNIST Handwritten Digit Classification

## Introduction

Welcome! In this assignment, you will build a Convolutional Neural Network (CNN) to classify handwritten digits from the famous MNIST dataset. This dataset is a classic in the field of computer vision and provides a great starting point for understanding image classification with deep learning.

This notebook is structured to guide you step-by-step through the process. You will load the data, preprocess it, define a CNN model, train it, and evaluate its performance.  Throughout the assignment, you will have opportunities to experiment and deepen your understanding of the concepts.

Remember to:

*   **Read all instructions carefully.**
*   **Execute the code cells in order.**
*   **Fill in the missing code sections marked as "Students: Fill in the blanks".**
*   **Answer the reflection questions in the designated Markdown cell.**
*   **Experiment and explore!**  Change parameters, layers, and observe the effects.

Let's get started and build our MNIST digit classifier!

## Section 1: Setting Up - Imports

Before we dive into building our CNN, we need to import the necessary libraries.  These libraries provide pre-built tools and functions that will make our work much easier.

**Instructions:**

1.  **Carefully review the code cell below.** It imports libraries from TensorFlow and Keras, which are powerful frameworks for building and training neural networks.
2.  **Execute the code cell by selecting it and pressing [Shift + Enter] (or the "Run" button).**
3.  **Ensure there are no error messages after running the cell.** If you encounter errors, double-check that you have TensorFlow and Keras installed in your environment.

In [1]:
# Cell 1: Imports and Setup
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import Callback, EarlyStopping, ModelCheckpoint, TensorBoard
import time
import datetime

**Explanation of Imports:**

*   **`tensorflow as tf` and `keras`:** TensorFlow is the main deep learning framework, and Keras is its high-level API that simplifies building and training models. We import TensorFlow as `tf` and Keras directly for easy access to their functionalities.
*   **`from tensorflow.keras import layers`:**  This imports the `layers` module from Keras, which provides various layers for building neural networks (like convolutional layers, dense layers, etc.).
*   **`from tensorflow.keras.datasets import mnist`:**  This imports the MNIST dataset directly from Keras datasets.  This is very convenient for loading and using the MNIST data.
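*   **`from tensorflow.keras.utils import to_categorical`:**  This imports the `to_categorical` function, which we will use to perform one-hot encoding of our labels.
*   **`from tensorflow.keras.callbacks import Callback, EarlyStopping, ModelCheckpoint, TensorBoard`:**  These callbacks are used in Section 5 to time each epoch, stop training when the validation loss stops improving, save the best model, and log metrics for TensorBoard.
*   **`time` and `datetime`:**  Standard-library modules used to measure how long each epoch takes and to timestamp the TensorBoard log directory.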

## Section 2: Data Loading and Preprocessing

In this section, we will load the MNIST dataset and prepare it for training our CNN model.  Preprocessing steps are crucial to ensure our data is in the right format for the model to learn effectively.

**Instructions:**

1.  **Read through the code in the cell below.**  Understand how it loads the MNIST dataset and what preprocessing steps are applied.
2.  **Execute the code cell.**
3.  **Examine the comments in the code** to understand each preprocessing step in detail.

In [2]:
## Cell 2: Data Loading and Preprocessing
# -------------------------------
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize the pixel values to the range [0, 1]
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Reshape the data to include the channel dimension (28x28x1)
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

# One-hot encode the labels
num_classes = 10
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 2s 0us/step


**Explanation of Data Preprocessing:**

*   **Loading the MNIST dataset:** `mnist.load_data()` loads the MNIST dataset, which is already split into training and testing sets (`(x_train, y_train), (x_test, y_test)`). `x_train` and `x_test` contain the images (pixel data), and `y_train` and `y_test` contain the corresponding labels (digits 0-9).
*   **Normalization:** `x_train = x_train.astype("float32") / 255.0` and `x_test = x_test.astype("float32") / 255.0` normalize the pixel values.  Pixel values in images are typically in the range 0-255. Dividing by 255 scales them to the range 0-1. This normalization helps the neural network train faster and more effectively.
*   **Adding Channel Dimension:** `x_train = x_train.reshape(-1, 28, 28, 1)` and `x_test = x_test.reshape(-1, 28, 28, 1)` reshape the data to add a channel dimension.  Even though MNIST images are grayscale (single channel), CNNs in Keras expect input data to have a channel dimension.  We reshape from `(number_of_images, 28, 28)` to `(number_of_images, 28, 28, 1)`. The `-1` in `reshape` means "infer the dimension based on the size of the array."
*   **One-Hot Encoding:** `y_train = to_categorical(y_train, num_classes)` and `y_test = to_categorical(y_test, num_classes)` perform one-hot encoding on the labels.  Instead of representing the digit '3' as a single number, one-hot encoding converts it into a vector `[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]`, where the 4th position (index 3) is 'hot' (value 1), and all other positions are 'cold' (value 0). This is a standard way to represent categorical labels for neural networks in multi-class classification problems. `num_classes = 10` specifies that we have 10 classes (digits 0-9).
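
The three preprocessing steps described above can be sketched in plain Python on a tiny made-up image. This is only an illustration of the arithmetic, not how NumPy or Keras implement it; the 2x2 pixel values and the helper names `normalize`, `add_channel`, and `one_hot` are assumptions introduced for this example.

```python
# Plain-Python illustration of normalization, channel reshaping, and one-hot encoding.
# The 2x2 "image" below is hypothetical; real MNIST images are 28x28.

def normalize(image):
    """Scale 0-255 pixel values into the range [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in image]

def add_channel(image):
    """Wrap each pixel in a list so the shape becomes (height, width, 1)."""
    return [[[pixel] for pixel in row] for row in image]

def one_hot(label, num_classes=10):
    """Return a vector with 1.0 at index `label` and 0.0 everywhere else."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

tiny_image = [[0, 255], [51, 102]]
scaled = normalize(tiny_image)
with_channel = add_channel(scaled)

print(scaled)
print(with_channel)
print(one_hot(3))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The last line reproduces the vector for the digit 3 given in the explanation above, with floating-point entries.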

## Section 3: Model Definition - Building the CNN

Now we will define the architecture of our Convolutional Neural Network (CNN).  You will be building a sequential model using Keras layers.

**Instructions:**

1.  **Carefully examine the code in the cell below.** Notice the structure of the `keras.Sequential` model.
2.  **Fill in the missing parts** marked with `# Students: Fill in the blanks` to complete the model definition.
3.  **Experiment!** You are encouraged to try different configurations for the layers, such as changing the number of filters in the convolutional layers, or adding more layers.

In [3]:
## Cell 3: Data Augmentation
# -------------------------------
# Set up data augmentation to expand the training dataset
datagen = keras.preprocessing.image.ImageDataGenerator(
    rotation_range=10,         # Rotate images up to 10 degrees
    zoom_range=0.1,            # Zoom images up to 10%
    width_shift_range=0.1,     # Shift images horizontally by 10%
    height_shift_range=0.1     # Shift images vertically by 10%
)
datagen.fit(x_train)

# Model Definition

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),  # Input layer for 28x28 grayscale images

    # First Convolutional Block
    layers.Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D(pool_size=(2, 2)),

    # Second Convolutional Block
    layers.Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D(pool_size=(2, 2)),

    # Third Convolutional Block
    layers.Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D(pool_size=(2, 2)),

    # Flatten the output and add Dense layers
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes, activation='softmax')
])

**Explanation of Layers:**

*   **`keras.Input(shape=(28, 28, 1))`:** This is the input layer of our model. It specifies the shape of the input images, which are 28x28 pixels with 1 channel (grayscale).
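*   **`layers.Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same')`:** This is a 2D Convolutional layer. The second and third blocks use the same settings with 64 and 128 filters.
    *   `32`: This is the number of filters (also called kernels). Each filter learns to detect specific features in the input image.
    *   `kernel_size=(3, 3)`: This defines the size of the convolutional filter as 3x3 pixels.
    *   `activation='relu'`:  ReLU (Rectified Linear Unit) is the activation function. It introduces non-linearity into the model, allowing it to learn complex patterns.
    *   `padding='same'`: The input is zero-padded so that the output feature map has the same height and width as the input. Only the pooling layers reduce the spatial size.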
*   **`layers.MaxPooling2D(pool_size=(2, 2))`:** This is a Max Pooling layer.
    *   `pool_size=(2, 2)`:  It reduces the spatial dimensions of the feature maps by taking the maximum value within each 2x2 window. This helps to reduce the number of parameters, control overfitting, and make the model more robust to small shifts and distortions in the input.
*   **`layers.Flatten()`:** This layer flattens the 2D feature maps from the convolutional and pooling layers into a 1D vector. This is necessary to connect the convolutional part of the network to the fully connected (Dense) layers.
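*   **`layers.Dropout(0.5)`:** This is a Dropout layer.
    *   `0.5`: This sets the dropout rate to 50%. During training, each input unit is set to 0 with probability 0.5 at every update, and the remaining units are scaled up by 1 / (1 - 0.5) = 2 so that the expected sum is unchanged. At inference time the layer passes its input through untouched. This is a regularization technique that helps to prevent overfitting.
*   **`layers.Dense(128, activation='relu')`:** This is a hidden Dense (fully connected) layer with 128 units. It combines the flattened features before the output layer.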
*   **`layers.Dense(num_classes, activation="softmax")`:** This is the output Dense (fully connected) layer.
    *   `num_classes`:  This is set to 10 because we have 10 classes (digits 0-9).
    *   `activation="softmax"`: Softmax activation ensures that the output values are probabilities, and they sum up to 1 across all classes.  The output will be a vector of 10 probabilities, where each probability represents the model's confidence that the input image belongs to that specific digit class.
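
To see how the layers above change the data, the feature-map shapes and trainable parameter counts can be traced with simple arithmetic. The sketch below is plain Python, not a Keras call; the helper names are hypothetical, and it assumes the Keras defaults used in the model (stride 1 for `Conv2D`, non-overlapping 2x2 windows with no padding for `MaxPooling2D`).

```python
# Trace feature-map shapes and parameter counts for the Sequential model above.
# Plain arithmetic only; helper names are hypothetical.

def conv_same(shape, filters):
    """'same' padding with stride 1 keeps height and width; channels become `filters`."""
    height, width, _ = shape
    return (height, width, filters)

def max_pool(shape, pool=2):
    """Non-overlapping pooling divides height and width by `pool`, rounding down."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)

def conv_params(in_channels, filters, kernel=3):
    """kernel x kernel x in_channels weights per filter, plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """One weight per (input, unit) pair, plus one bias per unit."""
    return inputs * units + units

shape = (28, 28, 1)
total = 0
shapes = []
for filters in (32, 64, 128):
    total += conv_params(shape[2], filters)
    shape = max_pool(conv_same(shape, filters))
    shapes.append(shape)
    print(f"after Conv2D({filters}) + MaxPooling2D: {shape}")

flat = shape[0] * shape[1] * shape[2]
total += dense_params(flat, 128) + dense_params(128, 10)
print("flattened length:", flat)        # 1152
print("trainable parameters:", total)   # 241546
```

Dropout and Flatten add no parameters. Comparing these numbers with the output of `model.summary()` is a quick way to check the trace.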

## Section 4: Model Compilation - Choosing Loss and Optimizer

Before we can train our model, we need to compile it.  Compilation involves choosing an optimizer, a loss function, and metrics to evaluate the model's performance.

**Instructions:**

1.  **Examine the code cell below.** You need to fill in the blanks for the `loss` and `optimizer` parameters in `model.compile()`.
2.  **Choose an appropriate loss function and optimizer** for this multi-class classification problem.
3.  **In the Markdown cell after the code, explain your choices.** Why are these choices suitable for this task?

In [4]:
# Compile the model with appropriate loss function and optimizer
model.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["accuracy"]
)

**Explanation of Choices (To be filled by students in the reflection section):**

*   **Loss Function:** You need to choose a loss function that is appropriate for multi-class classification. Think about what kind of error we are trying to minimize when classifying digits into 10 categories.
*   **Optimizer:** You need to choose an optimizer that will efficiently update the model's weights to minimize the loss function.  Consider common optimizers used in deep learning.
*   **Metrics:** We are using "accuracy" as a metric to evaluate the model's performance. Accuracy is a common metric for classification tasks, representing the percentage of correctly classified images.
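
As a hint for thinking about the loss, the sketch below computes softmax and categorical cross-entropy by hand for a single example. It is a plain-Python illustration, not the Keras implementation (Keras clips predictions slightly differently); the raw scores in `logits` are made-up values, and `eps` is an assumed guard against `log(0)`.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Negative log-probability assigned to the true class of a one-hot label."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

# Hypothetical raw scores for one image; the largest score is at index 3.
logits = [0.1, 0.2, 0.1, 3.0, 0.1, 0.3, 0.1, 0.2, 0.1, 0.1]
probs = softmax(logits)

label_3 = [1.0 if i == 3 else 0.0 for i in range(10)]
label_5 = [1.0 if i == 5 else 0.0 for i in range(10)]

loss_if_true_is_3 = categorical_crossentropy(label_3, probs)
loss_if_true_is_5 = categorical_crossentropy(label_5, probs)
print(f"loss when the true digit is 3: {loss_if_true_is_3:.4f}")
print(f"loss when the true digit is 5: {loss_if_true_is_5:.4f}")
```

The loss is small when the model puts high probability on the correct digit and large when it does not, which is the error signal the optimizer minimizes.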

## Section 5: Model Training - Fitting the Model to the Data

Now it's time to train our CNN model using the training data. Training involves feeding the training data to the model and adjusting its weights to minimize the loss function.

**Instructions:**

1.  **Examine the code cell below.** You need to fill in the blanks for `batch_size` and `epochs` in `model.fit()`.
2.  **Choose appropriate values for `batch_size` and `epochs`.**
3.  **Run the code cell to start training.** Observe the training progress, especially the loss and accuracy on both the training and validation sets.
4.  **Experiment!** Change the `batch_size` and `epochs` and see how it affects the training process and the final performance.

In [5]:
# Cell 5: Model Training

# Callback Setup

# Custom callback to log the duration of each epoch
class TimeHistory(Callback):
    def on_epoch_begin(self, epoch, logs=None):
        self.epoch_start_time = time.time()
    def on_epoch_end(self, epoch, logs=None):
        elapsed_time = time.time() - self.epoch_start_time
        print(f"Epoch {epoch + 1} took {elapsed_time:.2f} seconds")

# Early stopping callback to prevent overfitting by stopping training when the validation loss stops improving
early_stop = EarlyStopping(monitor='val_loss', patience=3, verbose=1)

# Model checkpoint callback to save the best model based on validation loss
checkpoint = ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True, verbose=1)

# TensorBoard callback for monitoring training metrics and visualizing the model architecture
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = TensorBoard(log_dir=log_dir, histogram_freq=1)

# Combine all callbacks into a list
callbacks = [TimeHistory(), early_stop, checkpoint, tensorboard_callback]

# Model Training

# Define training parameters
batch_size = 64
epochs = 15

# Train the model using the augmented data generator
history = model.fit(
    datagen.flow(x_train, y_train, batch_size=batch_size),
    steps_per_epoch=x_train.shape[0] // batch_size,
    epochs=epochs,
    validation_data=(x_test, y_test),
    callbacks=callbacks
)

Epoch 1/15
Epoch 1 took 32.88 seconds

Epoch 1: val_loss improved from inf to 0.03365, saving model to best_model.h5
937/937 ━━━━━━━━━━━━━━━━━━━━ 33s 29ms/step - accuracy: 0.7987 - loss: 0.6121 - val_accuracy: 0.9883 - val_loss: 0.0336
Epoch 2/15
Epoch 2 took 0.66 seconds

Epoch 2: val_loss did not improve from 0.03365
937/937 ━━━━━━━━━━━━━━━━━━━━ 1s 825us/step - accuracy: 0.9531 - loss: 0.1795 - val_accuracy: 0.9881 - val_loss: 0.0346
Epoch 3/15
Epoch 3 took 23.16 seconds

Epoch 3: val_loss improved from 0.03365 to 0.02443, saving model to best_model.h5
937/937 ━━━━━━━━━━━━━━━━━━━━ 23s 25ms/step - accuracy: 0.9667 - loss: 0.1063 - val_accuracy: 0.9910 - val_loss: 0.0244
Epoch 4/15
Epoch 4 took 1.30 seconds

Epoch 4: val_loss improved from 0.02443 to 0.02295, saving model to best_model.h5
937/937 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9531 - loss: 0.1497 - val_accuracy: 0.9920 - val_loss: 0.0230
Epoch 5/15
Epoch 5 took 38.00 seconds

Epoch 5: val_loss did not improve from 0.02295
937/937 ━━━━━━━━━━━━━━━━━━━━ 38s 23ms/step - accuracy: 0.9753 - loss: 0.0794 - val_accuracy: 0.9903 - val_loss: 0.0286
Epoch 6/15
Epoch 6 took 1.30 seconds

Epoch 6: val_loss did not improve from 0.02295
937/937 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9844 - loss: 0.0472 - val_accuracy: 0.9909 - val_loss: 0.0273
Epoch 7/15
937/937 ━━━━━━━━━━━━━━━━━━━━ 40s 24ms/step - accuracy: 0.9796 - loss: 0.0658 - val_accuracy: 0.9931 - val_loss: 0.0218
Epoch 8/15
Epoch 8 took 0.73 seconds

Epoch 8: val_loss did not improve from 0.02181
937/937 ━━━━━━━━━━━━━━━━━━━━ 1s 906us/step - accuracy: 1.0000 - loss: 0.0073 - val_accuracy: 0.9929 - val_loss: 0.0220
Epoch 9/15
Epoch 9 took 40.10 seconds

Epoch 9: val_loss improved from 0.02181 to 0.02014, saving model to best_model.h5
937/937 ━━━━━━━━━━━━━━━━━━━━ 40s 24ms/step - accuracy: 0.9819 - loss: 0.0572 - val_accuracy: 0.9929 - val_loss: 0.0201
Epoch 10/15
Epoch 10 took 1.30 seconds

Epoch 10: val_loss improved from 0.02014 to 0.02009, saving model to best_model.h5
937/937 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9844 - loss: 0.0330 - val_accuracy: 0.9926 - val_loss: 0.0201
Epoch 11/15
Epoch 11 took 22.54 seconds

Epoch 11: val_loss improved from 0.02009 to 0.01936, saving model to best_model.h5
937/937 ━━━━━━━━━━━━━━━━━━━━ 23s 24ms/step - accuracy: 0.9824 - loss: 0.0557 - val_accuracy: 0.9934 - val_loss: 0.0194
Epoch 12/15
Epoch 12 took 1.30 seconds

Epoch 12: val_loss did not improve from 0.01936
937/937 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 1.0000 - loss: 0.0033 - val_accuracy: 0.9935 - val_loss: 0.0197
Epoch 13/15
Epoch 13 took 22.71 seconds

Epoch 13: val_loss did not improve from 0.01936
937/937 ━━━━━━━━━━━━━━━━━━━━ 23s 24ms/step - accuracy: 0.9856 - loss: 0.0480 - val_accuracy: 0.9929 - val_loss: 0.0209
Epoch 14/15
  1/937 ━━━━━━━━━━━━━━━━━━━━ 7s 8ms/step - ac

**Explanation of Training Parameters:**

*   **`batch_size`:** This determines the number of training samples processed in each mini-batch during training. A larger batch size can speed up training but might require more memory. A smaller batch size can lead to more noisy updates but might generalize better.
*   **`epochs`:**  One epoch represents one complete pass through the entire training dataset.  More epochs can potentially lead to better training but also increase the risk of overfitting, where the model learns the training data too well and performs poorly on unseen data.
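*   **`validation_data=(x_test, y_test)`:**  The training call passes the test set as validation data; it does not use `validation_split`. After each epoch, the model's loss and accuracy are computed on this set, and the `EarlyStopping` and `ModelCheckpoint` callbacks monitor the resulting `val_loss`. Because the same images are used again in Section 6, the final test accuracy is not a fully independent estimate. Holding out part of the training data for validation would avoid this.
*   **`steps_per_epoch`:**  `x_train.shape[0] // batch_size` is 60000 // 64 = 937 batches per epoch. In the training log, every second epoch finishes in about one second instead of 20 to 40 seconds, which suggests the generator ran out of batches partway through. Omitting `steps_per_epoch` and letting Keras infer the epoch length from the generator is one way to avoid this.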
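
The batch arithmetic and the early-stopping rule can be checked by hand. The sketch below is plain Python and only mimics the patience logic of `EarlyStopping(monitor='val_loss', patience=3)` under the assumption of the default `min_delta` of 0; the function name `early_stop_epoch` is hypothetical. The first list holds the `val_loss` values from the training log for epochs 1 to 13, rounded to four decimals; the second list is made up to show a case where training would stop.

```python
import math

num_samples, batch_size = 60000, 64
steps_floor = num_samples // batch_size                   # value passed as steps_per_epoch
batches_available = math.ceil(num_samples / batch_size)   # batches in one full pass
leftover = num_samples % batch_size                       # samples in the last, smaller batch
print(steps_floor, batches_available, leftover)           # 937 938 32

def early_stop_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:        # improvement: remember it and reset the counter
            best, wait = loss, 0
        else:                  # no improvement: count one more epoch of waiting
            wait += 1
            if wait >= patience:
                return epoch
    return None

logged = [0.0336, 0.0346, 0.0244, 0.0230, 0.0286, 0.0273, 0.0218,
          0.0220, 0.0201, 0.0201, 0.0194, 0.0197, 0.0209]
hypothetical = [0.50, 0.40, 0.41, 0.42, 0.43, 0.30]

print(early_stop_epoch(logged))        # None: never 3 epochs in a row without improvement
print(early_stop_epoch(hypothetical))  # 5: epochs 3, 4 and 5 fail to beat 0.40
```

With the logged values the counter never reaches 3, which is consistent with training continuing past epoch 13.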

## Section 6: Model Evaluation - Assessing Performance on Test Data

After training, we need to evaluate our model's performance on the test dataset.  This gives us an estimate of how well the model generalizes to unseen data.

**Instructions:**

1.  **Run the code cell below.**
2.  **Observe the output.**  It will print the test loss and test accuracy.
3.  **Think about the results.** Is the test accuracy satisfactory?  How does it compare to the training and validation accuracy you observed during training?

In [6]:
# Cell 6: Model Evaluation
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"Test loss: {test_loss:.4f}")
print(f"Test accuracy: {test_accuracy:.4f}")

Test loss: 0.0200
Test accuracy: 0.9936
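
To make the accuracy figure concrete, the sketch below shows how accuracy is computed from one-hot labels and predicted probabilities, and converts the reported 0.9936 into a count of test images. It is a plain-Python illustration with made-up three-class predictions; the helper names are hypothetical. The MNIST test set contains 10,000 images.

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(y_true_one_hot, y_pred_probs):
    """Fraction of samples whose most probable class matches the label."""
    correct = sum(argmax(t) == argmax(p) for t, p in zip(y_true_one_hot, y_pred_probs))
    return correct / len(y_true_one_hot)

# Hypothetical three-class example: the third prediction is wrong.
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
preds = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]
print(accuracy(labels, preds))  # 0.75

# Reported test accuracy converted to image counts.
test_images = 10000
correct_images = round(0.9936 * test_images)
print(correct_images, test_images - correct_images)  # 9936 64
```

So the reported accuracy corresponds to 64 misclassified test images.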


## Section 7: Reflection and Answers to Questions
This is an important section! Take some time to reflect on what you have learned and answer the following questions in detail. Your thoughtful answers will demonstrate your understanding of the concepts covered in this assignment.

**Reflection Questions:**

1.  **Conv2D Layer:** What is the role of the Conv2D layer? How do the `kernel_size` and the number of filters affect the learning process? *Hint: Experiment by changing these values in Cell 3.*

- The Conv2D layer is my network’s feature extractor. It scans images using filters to detect edges, textures, and patterns. The kernel_size—for example, 3×3 versus 5×5—determines how much of the image each filter covers; smaller kernels capture fine details, while larger ones pick up more context. Increasing the number of filters (from 32 to 64 to 128) lets the model learn more features, speeding up convergence, though with MNIST the jump from 64 to 128 only gave marginal gains.

2.  **MaxPooling2D Layer:** What is the purpose of the MaxPooling2D layer? How does it contribute to the model's performance?  *Hint:  Try removing or adding a MaxPooling2D layer and see what happens.*

- MaxPooling2D downsamples the feature maps by taking the maximum value in each window (e.g., 2×2). This reduction decreases computation and makes the features less sensitive to small shifts in the image. In my experiments, adjusting the pooling layers changed the training speed and helped control overfitting by focusing on the most important features.

3.  **One-Hot Encoding:** Why do we use one-hot encoding for the labels?

- One-hot encoding converts the labels (digits 0–9) into a binary vector format, so each label becomes a vector with a single 1 and the rest 0s. This matches the 10-way softmax output of the final Dense layer and is the label format that `categorical_crossentropy` expects. It also avoids implying an order between classes (for example, that 8 is "closer" to 9 than to 1).


4.  **Flatten Layer:** Why do we need the Flatten layer before the Dense layer?

- The Flatten layer converts the multi-dimensional output of the Conv2D and pooling layers into a one-dimensional vector, which is necessary before feeding the data into the Dense layers for classification.

5.  **Optimizer and Loss Function:** What optimizer and loss function did you choose in Cell 4? Explain your choices.  Why is categorical cross-entropy a suitable loss function for this task?  Why is Adam a good choice of optimiser?

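- I used `categorical_crossentropy` as the loss function and Adam as the optimizer. Categorical cross-entropy compares the probability distribution from the softmax layer with the one-hot label and penalizes low probability on the correct class, which suits multi-class tasks like MNIST. Adam adapts the step size for each parameter during training, which usually gives fast convergence with little tuning.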

6.  **Batch Size and Epochs:** How did you choose the batch size and number of epochs in Cell 5? What are the effects of changing these parameters?  *Hint:  Experiment!*

- I set the batch size to 64 and trained for 15 epochs. A batch size of 64 strikes a good balance between fast training and stable updates, while 15 epochs were enough to let the model converge without overfitting. Experimenting with different sizes showed how these parameters can affect both training speed and generalization.

7.  **Dropout:**  Why is the Dropout layer included in the model?

- The Dropout layer randomly disables a percentage of neurons during training. This prevents overfitting by forcing the network to learn more robust, redundant features that generalize better to new data.

8.  **Model Architecture:**  Describe the overall architecture of your CNN. How many convolutional layers did you use?  How many max pooling layers?  What is the final dense layer doing?

My CNN architecture consists of:

- Convolutional Layers: Three Conv2D layers with 32, 64, and 128 filters (3×3 kernels, ReLU, `same` padding) for progressively extracting more complex features.
- Pooling Layers: Three MaxPooling2D layers (2×2), one after each convolution, to reduce the spatial dimensions from 28×28 to 3×3.
- Flatten Layer: To convert 2D feature maps into a 1D vector.
- Dropout Layer: A 50% Dropout layer to mitigate overfitting.
- Dense Layers: A hidden Dense layer with 128 ReLU units, followed by a final Dense layer with softmax activation that outputs the probability for each of the 10 digit classes.

9.  **Performance:** What accuracy did you achieve on the test set?  Are you happy with the result? Why or why not?  If you're not happy, what could you try to improve the performance?

- I achieved 99.36% accuracy on the test set (test loss 0.0200), which is solid for MNIST and close to the validation accuracy seen during training, so the model generalizes well. For further improvements, I might explore additional data augmentation, adjust learning rates, or try more advanced regularization techniques to push the accuracy even higher.

**Tips and Explanations:**

*   **Normalization:**  Dividing the pixel values by 255 normalizes them to the range [0, 1]. Keeping inputs in a small, consistent range helps gradient-based training converge faster and more reliably.

*   **Reshaping:**  The `reshape` operation adds a channel dimension to the images.  For grayscale images, the channel dimension is 1.

*   **One-Hot Encoding:** `to_categorical` converts the class labels (0-9) into one-hot encoded vectors.

*   **Conv2D Parameters:** The `kernel_size` determines the size of the convolutional filter (e.g., 3x3). The number of filters determines how many different features are learned.

*   **MaxPooling2D Parameters:** The `pool_size` determines the size of the pooling window (e.g., 2x2).

*   **Optimizer:** The optimizer is the algorithm used to update the model's weights during training.

*   **Loss Function:** The loss function measures the error between the model's predictions and the true labels.

*   **Batch Size:** The batch size is the number of samples processed in each training iteration.

*   **Epochs:** An epoch is one complete pass through the entire training dataset.

*   **Dropout:** Dropout is a regularization technique that helps prevent overfitting.
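
The pooling and dropout tips above can be illustrated with a small plain-Python sketch. It is not the Keras implementation: the 4x4 feature map is made up, the function names are hypothetical, and the dropout function only mirrors the usual "inverted dropout" behaviour in which surviving values are scaled by 1 / (1 - rate) during training.

```python
import random

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]

def dropout(values, rate, training, rng):
    """Zero each value with probability `rate` and rescale the rest (training only)."""
    if not training:
        return list(values)            # inference: pass values through unchanged
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * scale for v in values]

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 8]]

rng = random.Random(0)                 # fixed seed so the run is repeatable
activations = [1.0] * 8
dropped = dropout(activations, rate=0.5, training=True, rng=rng)
print(dropped)                         # a mix of 0.0 and 2.0
print(dropout(activations, rate=0.5, training=False, rng=rng))
```

Pooling halves each spatial dimension, and dropout changes which units are active on every training step while leaving inference untouched.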

Remember to run each cell to see its output.  Experiment with the code and try to understand how different parameters affect the model's performance.  Good luck!

## Conclusion and Submission

Congratulations on completing this notebook assignment! You have successfully built and trained a Convolutional Neural Network to classify handwritten digits from the MNIST dataset. You've explored key concepts like convolutional layers, pooling layers, activation functions, optimizers, loss functions, and training procedures. To further solidify your understanding, consider the following:
*   **Review your notebook:** Go back through each section, reread the explanations, and make sure you understand the code and the concepts.
*   **Experiment further:** Try different CNN architectures, add more layers, change hyperparameters, and see how it affects the performance. Explore other optimizers or loss functions.
*   **Reflect on your learning:**  Think about the challenges you faced and how you overcame them. What were the most important takeaways for you from this assignment?

**Submission Instructions**

To submit your assignment:

1.  **Save your notebook:** Ensure all your work, including code cells, outputs, and answers to reflection questions, is saved in the notebook.
2.  **Print the notebook as a `.pdf` file** and submit it to Canvas.

**Deadline:** February 12