
## Setup


Every neural network is made up of layers, and each performs a specific job:
•	Conv2D: Extracts image features
•	MaxPooling2D: Downsamples the feature maps
•	Flatten: Converts the stacked feature maps into a 1D vector
•	Dense: Makes the final decision (classification)




In [34]:
import numpy as np        # numerical computation, e.g. handling image pixels, labels, and features
import keras              # build, train, and test neural networks
from keras import layers  # building blocks of the network: Conv2D, MaxPooling2D, Flatten, Dense

## Prepare the data

"Why is num_classes 10?"
🧠 Explain:
Because the MNIST dataset has 10 classes: the digits 0 to 9.
Each image is a handwritten digit.
"And what about (28, 28, 1)? Why do we have that extra 1?"
💡 Answer:
•	28 x 28 = image size (pixels).
•	1 = a single grayscale channel (MNIST images are grayscale).
If it were a color image, it would be (28, 28, 3) for RGB.


In [35]:
num_classes = 10
input_shape = (28, 28, 1)

Keras automatically downloads and loads the MNIST dataset for you!
It gives you:
•	x_train: training images
•	y_train: correct digit labels for training
•	x_test and y_test: for testing the model later


In [36]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

"Why do we divide by 255?"

🧠 Explain:
Each pixel’s intensity ranges from 0 to 255.
Dividing by 255 rescales it to the 0–1 range, which makes it easier for the neural network to learn.
(Small, consistently scaled inputs keep the gradients well behaved, so training is usually faster and more stable.)

"Think of it like normalizing marks from 0–100 to 0–1 — easier to compare!"


In [37]:
# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
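
To see what the scaling does without loading MNIST, here is a minimal sketch on a toy array. The 2x2 `pixels` array and its values are made up for illustration; only the `astype("float32") / 255` step comes from the cell above.

```python
import numpy as np

# Toy 2x2 "image" with raw 8-bit intensities (hypothetical values).
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Convert to float32 first so the division happens in floating point,
# then rescale from [0, 255] to [0, 1].
scaled = pixels.astype("float32") / 255

print(scaled.dtype)                # float32
print(scaled.min(), scaled.max())  # 0.0 1.0
```

Casting before dividing matters: the raw arrays hold integers, and the network needs fractional values.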

**Adding the Channel Dimension**


"Why are we adding one more dimension?"
🧠 Explain:

Originally, the shape of x_train is (60000, 28, 28) → number of images, height, and width.
But Conv2D layers expect each image to have the shape (height, width, channels).

So we add that 1 channel for grayscale using np.expand_dims.


In [38]:
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
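
A small sketch of the same step on stand-in data may help. The `batch` of five blank images is hypothetical; it only shows how `axis=-1` appends the channel axis.

```python
import numpy as np

# Stand-in batch of 5 blank 28x28 grayscale images (hypothetical data).
batch = np.zeros((5, 28, 28), dtype="float32")

# axis=-1 appends a new axis of length 1 at the end: the channel axis.
with_channel = np.expand_dims(batch, -1)
print(with_channel.shape)  # (5, 28, 28, 1)

# Indexing with np.newaxis is an equivalent spelling of the same operation.
same = batch[..., np.newaxis]
print(same.shape == with_channel.shape)  # True
```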

**Checking Data Shapes**

In [39]:
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")


x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples


**Converting Labels to One-Hot Encoding**

The categorical_crossentropy loss used for training expects each label as a vector with one entry per class, not as a single number.
Example:
Digit 3 → [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
Digit 7 → [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]


This is called One-Hot Encoding 🔥

It helps the network treat all classes equally, instead of assuming that numerically close labels (like 8 and 9) are more similar than distant ones.


In [40]:
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
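
What the conversion produces can be sketched in plain NumPy. This is an illustration of the idea using an identity matrix, not how `keras.utils.to_categorical` is implemented internally.

```python
import numpy as np

num_classes = 10
labels = np.array([3, 7])  # integer labels, as in the examples above

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix with the labels encodes them all at once.
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(one_hot[0])  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
print(one_hot[1])  # [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]

# argmax recovers the original integer labels.
print(one_hot.argmax(axis=1))  # [3 7]
```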

## Build the model

Step 1:


model = keras.Sequential([...])

Sequential means exactly what the name suggests:

each layer is stacked in order, and the output of one becomes the input of the next.

Step 2:


keras.Input(shape=input_shape)
🧠 Explain:

This defines the shape of each input image: (28, 28, 1) = height, width, and grayscale channel.

💬 Analogy:
“This is like telling the model: ‘Hey, every image you’ll see is 28x28 pixels and grayscale!’”




🎨 Step 3: First Convolutional Layer
layers.Conv2D(32, kernel_size=(3, 3), activation="relu")

🧠 Explain :

"Think of this layer as 32 small scanners (filters) sliding over the image — each trying to detect different patterns like edges, curves, or corners."

•	32 → number of filters (features the model will learn)

•	(3, 3) → size of each filter (like a small 3x3 window)

•	ReLU → sets negative values to zero and passes positive values through unchanged, so only positive responses move on

💬 Analogy:
“It’s like shining 32 tiny flashlights on different parts of the image to detect unique features.”
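
To make the "sliding scanner" idea concrete, here is a rough sketch of what one filter computes, written in plain NumPy. It assumes no padding (the Keras default, `padding="valid"`) and leaves out the bias term. The `conv2d_valid` helper and the hand-picked vertical-edge kernel are hypothetical; in the real layer the 32 kernels are learned during training.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one kernel over a 2D image without padding, then apply ReLU."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype="float32")
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # weighted sum of the 3x3 window
    return np.maximum(out, 0.0)  # ReLU: negative responses become 0

# Toy image: dark left half, bright right half, i.e. one vertical edge.
image = np.zeros((28, 28), dtype="float32")
image[:, 14:] = 1.0

# Hand-picked vertical-edge detector (illustrative only).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype="float32")

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (26, 26)
print(feature_map.max())  # 3.0
```

The output is 26x26 rather than 28x28 because a 3x3 window fits in only 26 positions along each side. The filter responds only where the window straddles the edge and stays at zero elsewhere.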


🌀 Step 4: First Pooling Layer
layers.MaxPooling2D(pool_size=(2, 2))


🧠 Explain:
This layer shrinks each feature map while keeping its strongest responses.

•	Takes each 2×2 patch → keeps only the maximum value, which halves the height and width

•	Reduces computation and helps the model focus on key patterns


💬 Analogy:
“Think of it like zooming out of a photo — you lose some detail, but you still recognize what’s important.”
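
The pooling step can be sketched in a few lines of NumPy. The `max_pool_2x2` helper and the 4x4 input are hypothetical. The sketch assumes non-overlapping 2×2 windows with no padding, which is what `MaxPooling2D(pool_size=(2, 2))` does by default.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    # Drop an odd trailing row/column, since no padding is applied.
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    # Split rows and columns into (block index, position inside block).
    blocks = trimmed.reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]], dtype="float32")

pooled = max_pool_2x2(fm)
print(pooled.tolist())  # [[4.0, 2.0], [2.0, 8.0]]
```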

🎨 Step 5: Second Convolutional Layer
layers.Conv2D(64, kernel_size=(3, 3), activation="relu")

🧠 Explain:
Now the model learns more complex patterns using 64 filters.

After the first layer learned simple edges, this one can detect shapes, loops, or digit structures.

💬 Analogy:
“The model is now learning to recognize numbers, not just lines — like a student going from alphabets to words.”



🌀 Step 6: Second Pooling Layer
layers.MaxPooling2D(pool_size=(2, 2))


🧠 Explain:
Again reduces the size, keeping only essential patterns.


Now the image is small, but contains deep, meaningful information.


🧾 Step 7: Flatten Layer
layers.Flatten()


🧠 Explain:
This takes the stack of 2D feature maps (5 × 5 × 64 at this point) and flattens it into a single 1D vector of 1,600 values that the Dense layer can read.
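
The size of the flattened vector can be checked by hand. This sketch traces the height and width through the layers, assuming 3x3 convolutions without padding and 2x2 pooling as in this model. The helper names `conv_out` and `pool_out` are made up for illustration.

```python
def conv_out(size, kernel=3):
    # A kernel of width k fits in (size - k + 1) positions without padding.
    return size - kernel + 1

def pool_out(size, pool=2):
    # Non-overlapping pooling divides the size, discarding any remainder.
    return size // pool

size = 28
size = conv_out(size)  # 26 after the first Conv2D
size = pool_out(size)  # 13 after the first MaxPooling2D
size = conv_out(size)  # 11 after the second Conv2D
size = pool_out(size)  # 5 after the second MaxPooling2D

flat_length = size * size * 64  # 64 feature maps from the second Conv2D
print(size, flat_length)  # 5 1600
```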



💧 Step 8: Dropout Layer
layers.Dropout(0.5)


🧠 Explain:
This randomly “turns off” 50% of the values passing through it on every training step to reduce overfitting. At prediction time nothing is dropped.


💬 Ask students:
“Why would we want to drop neurons?”


✅ To make sure the model doesn’t memorize the training data and can generalize better
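
A minimal sketch of the idea, assuming the common "inverted dropout" scheme in which the surviving values are scaled up by 1 / (1 - rate) so the expected total stays the same. The `dropout` function here is a hypothetical helper, not the Keras implementation.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero out a random fraction `rate` of values; rescale the survivors."""
    if not training:
        return activations  # nothing is dropped at prediction time
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
x = np.ones(10, dtype="float32")

dropped = dropout(x, rate=0.5, rng=rng)
print(dropped)  # a mix of 0.0 (dropped) and 2.0 (kept and rescaled)

unchanged = dropout(x, rate=0.5, rng=rng, training=False)
print(np.array_equal(unchanged, x))  # True
```

Because a different random subset is dropped on every step, the model cannot rely on any single feature, which pushes it toward patterns that generalize.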



🧠 Step 9: Output Layer
layers.Dense(num_classes, activation="softmax")


🧠 Explain:
•	Dense = fully connected layer (every input value connects to every output neuron)


•	num_classes = 10 (digits 0–9)

•	Softmax → converts outputs into probabilities (like: 80% chance of being “3”, 15% chance of being “5”, etc.)
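
Softmax itself is a short formula: exponentiate each raw score, then divide by the sum. The sketch below uses three made-up scores instead of ten to keep the output readable. Subtracting the maximum first is a standard trick to avoid overflow and does not change the result.

```python
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)  # numerical stability only
    exps = np.exp(shifted)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical raw scores for 3 classes
probs = softmax(logits)

print(probs)           # three positive values; the largest score gets the largest share
print(probs.sum())     # sums to 1 (up to floating-point rounding)
print(probs.argmax())  # 0, the index of the predicted class
```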


In [41]:
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.summary()
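
The parameter counts that `model.summary()` reports can be reproduced by hand, which is a useful sanity check. The sketch assumes each filter has one weight per kernel cell per input channel plus one bias, and each Dense unit has one weight per input plus one bias. The helper names are hypothetical.

```python
def conv_params(kernel, in_channels, filters):
    # weights: kernel x kernel x in_channels per filter, plus one bias per filter
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    # weights: one per (input, unit) pair, plus one bias per unit
    return inputs * units + units

conv1 = conv_params(3, 1, 32)         # 320
conv2 = conv_params(3, 32, 64)        # 18496
dense = dense_params(5 * 5 * 64, 10)  # 16010

total = conv1 + conv2 + dense
print(total)  # 34826
```

Pooling, Flatten, and Dropout contribute nothing to the total because they have no trainable weights.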

## Train the model




batch_size = 128
epochs = 15


•	Batch size (128):
The model doesn’t look at all 60,000 images at once (that’s too heavy!).

Instead, it studies 128 images at a time, learns from them, updates weights, and repeats.

💬 Analogy:
“Think of it like studying in small groups instead of the entire class at once.”


•	Epochs (15):
One epoch = the model has seen all training images once.


So, with 15 epochs, it studies the dataset 15 times, improving its understanding each round.


💬 Analogy:
“Like rereading your notes 15 times — you understand better with every pass!”

“Why do we need to compile before training?”

🧠 Explain:
This tells the model how to learn — like giving instructions before starting a class.

•	loss="categorical_crossentropy"

→ This measures how wrong the model’s predictions are (for multi-class classification).
The model tries to minimize this loss.



•	optimizer="adam"

→ Adam is an optimizer that adapts the step size of each weight individually, using running averages of recent gradients.

In practice it often helps the model converge quickly with little tuning.

💬 Analogy:
“Adam is like an intelligent coach: it adjusts how big a step each weight takes as training goes on.”

•	metrics=["accuracy"]
→ We track accuracy during training: the fraction of predictions that are correct.
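
To get a feel for the loss, here is a sketch of categorical cross-entropy for a single image in NumPy. The two prediction vectors are made up, and the small `eps` clip that guards against log(0) is an assumption of this sketch.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    # Only the entry of the true class survives the multiplication.
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# True label: digit 3, one-hot encoded.
y_true = np.array([[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]], dtype="float32")

# Hypothetical prediction that is confident and correct.
confident = np.full((1, 10), 0.01)
confident[0, 3] = 0.91

# Hypothetical prediction that spreads probability evenly.
unsure = np.full((1, 10), 0.1)

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)  # the confident, correct prediction has the lower loss
```

The loss is the negative log of the probability assigned to the correct class, so it approaches 0 as that probability approaches 1.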



“What do you think happens when we call fit()?”


🧠 Explain:
This is where the real training happens.


•	The model takes input images, predicts outputs, compares them with the correct labels (y_train),
and updates itself to reduce the loss — over and over again.


•	validation_split=0.1 means:
The last 10% of the training data is set aside for validation (to check how well the model generalizes while learning) and is never used to update the weights.
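
Batch size, epochs, and the validation split together determine how many weight updates happen. A quick back-of-the-envelope sketch, assuming the 60,000 training images are split 90/10:

```python
import math

num_train = 60000
validation_split = 0.1
batch_size = 128
epochs = 15

num_val = int(num_train * validation_split)        # 6000 images held out
num_fit = num_train - num_val                      # 54000 images used for updates
steps_per_epoch = math.ceil(num_fit / batch_size)  # 422 batches (the last one is smaller)
total_updates = steps_per_epoch * epochs           # 6330 weight updates in total

print(steps_per_epoch, total_updates)  # 422 6330
```

The 422 here is the same number that appears as `422/422` in the progress bar of the training output.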


In [42]:
batch_size = 128
epochs = 15
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)

Epoch 1/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 8s 12ms/step - accuracy: 0.7608 - loss: 0.7656 - val_accuracy: 0.9750 - val_loss: 0.0881
Epoch 2/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9622 - loss: 0.1236 - val_accuracy: 0.9843 - val_loss: 0.0567
Epoch 3/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9718 - loss: 0.0914 - val_accuracy: 0.9870 - val_loss: 0.0449
Epoch 4/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9782 - loss: 0.0699 - val_accuracy: 0.9875 - val_loss: 0.0416
Epoch 5/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9790 - loss: 0.0664 - val_accuracy: 0.9880 - val_loss: 0.0423
Epoch 6/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9829 - loss: 0.0545 - val_accuracy: 0.9897 - val_loss: 0.0365
Epoch 7/15
422/422

<keras.src.callbacks.history.History at 0x79fff05df680>

## Evaluate the trained model

*  “So, we trained the model on training data — but how do we know if it really understands digits and isn’t just memorizing them?”


🧠 Explain:
That’s exactly what model.evaluate() does.
It checks how well the model performs on test data — data it has never seen before.


•	score[0] → Test loss
Measures how much error the model still makes on unseen data.


➤ Lower = better
•	score[1] → Test accuracy
Tells what percentage of images the model classified correctly.


➤ Closer to 1 (or 100%) = better



In [43]:
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

Test loss: 0.023872407153248787
Test accuracy: 0.9922999739646912
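
How the accuracy number is derived can be sketched on a tiny made-up example: take the most probable class for each sample and compare it with the true class. The four predictions and three classes below are hypothetical, chosen so that one prediction is wrong.

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])

# One-hot true labels for the same 4 samples.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])

predicted = probs.argmax(axis=1)  # most probable class per sample
actual = y_true.argmax(axis=1)    # position of the 1 in each one-hot row

accuracy = np.mean(predicted == actual)
print(predicted.tolist(), actual.tolist())  # [0, 1, 2, 0] [0, 1, 2, 1]
print(accuracy)                             # 0.75
```

By the same logic, a test accuracy of about 0.9923 on 10,000 test images means roughly 77 digits were misclassified.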
