# Classification of handwritten digits

In this activity we will use the MNIST dataset:
https://www.tensorflow.org/datasets/catalog/mnist

The goal is to develop a classifier of handwritten digits, as shown in the following figure.

<div>
<img src="attachment:image.png" width="500"/>
</div>

This activity is modified from https://keras.io/examples/vision/mnist_convnet/


# Title: Simple MNIST convnet

# Description: A simple convnet that achieves ~99% test accuracy on MNIST.


# **Setup**

In [1]:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# **Prepare the data**

**Model / data parameters**



In [2]:
num_classes = 10
input_shape = (28, 28, 1)

**Load the data, split between train and test sets**

In [3]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

**Scale images to the [0, 1] range**

In [4]:
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255

**Make sure images have shape (28, 28, 1)**


In [5]:
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples


A single channel means the image is grayscale: each pixel stores one intensity value, ranging from black through shades of gray to white.

**Convert class vectors to binary class matrices**


In [6]:
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

In [7]:
y_train

array([[0., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.]], shape=(60000, 10))
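
The binary class matrix printed above is a one-hot encoding: each row holds a single 1 in the column of its digit. The sketch below reproduces the idea with plain NumPy, assuming a hypothetical helper `one_hot` and three made-up labels. It illustrates the encoding and is not the Keras implementation of `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype="float32")[labels]

labels = np.array([5, 0, 4])        # hypothetical integer labels
encoded = one_hot(labels, 10)
decoded = encoded.argmax(axis=1)    # argmax recovers the integer labels

print(encoded.shape)  # (3, 10)
print(decoded)        # [5 0 4]
```

Taking `argmax` along the class axis is also how a softmax prediction is turned back into a digit.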

# Build the model

In [None]:
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.summary()
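
To make the numbers reported by `model.summary()` easier to follow, the sketch below recomputes the feature-map sizes and parameter counts by hand. It assumes the Keras defaults used in the model above (stride-1 convolutions with `valid` padding, non-overlapping pooling). The helper names are hypothetical.

```python
def conv_out(size, kernel):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

def pool_out(size, pool):
    """Spatial size after non-overlapping max pooling."""
    return size // pool

def conv_params(kernel, in_channels, out_channels):
    """Kernel weights plus one bias per output filter."""
    return kernel * kernel * in_channels * out_channels + out_channels

size = 28
size = conv_out(size, 3)    # 26 after Conv2D(32, 3x3)
size = pool_out(size, 2)    # 13 after MaxPooling2D(2x2)
size = conv_out(size, 3)    # 11 after Conv2D(64, 3x3)
size = pool_out(size, 2)    # 5 after MaxPooling2D(2x2)
flat = size * size * 64     # length of the flattened vector

dense_params = flat * 10 + 10   # weights plus biases of the softmax layer
total = conv_params(3, 1, 32) + conv_params(3, 32, 64) + dense_params
print(size, flat, total)    # 5 1600 34826
```

Flatten and Dropout add no trainable parameters, so the total comes only from the two convolutions and the final dense layer.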

# Train the model

In [9]:
batch_size = 128
epochs = 15

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)

Epoch 1/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 22s 45ms/step - accuracy: 0.8949 - loss: 0.3525 - val_accuracy: 0.9780 - val_loss: 0.0781
Epoch 2/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 19s 44ms/step - accuracy: 0.9678 - loss: 0.1067 - val_accuracy: 0.9832 - val_loss: 0.0566
Epoch 3/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 24s 56ms/step - accuracy: 0.9746 - loss: 0.0826 - val_accuracy: 0.9872 - val_loss: 0.0454
Epoch 4/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 40s 53ms/step - accuracy: 0.9784 - loss: 0.0695 - val_accuracy: 0.9888 - val_loss: 0.0385
Epoch 5/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 38s 47ms/step - accuracy: 0.9803 - loss: 0.0615 - val_accuracy: 0.9897 - val_loss: 0.0372
Epoch 6/15
422/422 ━━━━━━━━━━━━━━━━━━━━ 20s 46ms/step - accuracy: 0.9830 - loss: 0.0546 - val_accuracy: 0.9900 - val_loss: 0.0357
Epoch 7/15
…

<keras.src.callbacks.history.History at 0x217bec28f70>
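
The `422/422` counter in the training log is the number of batches per epoch. A minimal sketch of the arithmetic, assuming the 60000 training images, `validation_split=0.1` and `batch_size=128` used above (variable names are illustrative):

```python
import math

n_samples = 60000     # training images loaded from MNIST
val_percent = 10      # validation_split=0.1 written as an integer percentage
batch_size = 128

n_val = n_samples * val_percent // 100   # images held out for validation
n_fit = n_samples - n_val                # images used for weight updates
steps_per_epoch = math.ceil(n_fit / batch_size)

print(n_fit, n_val, steps_per_epoch)     # 54000 6000 422
```

The last batch of each epoch is smaller than 128 because 54000 is not a multiple of the batch size.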

# Evaluate the trained model

In [10]:
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

Test loss: 0.025071822106838226
Test accuracy: 0.9918000102043152
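
The accuracy reported by `model.evaluate` is the fraction of images whose most probable predicted class matches the true class. The sketch below computes the same quantity with NumPy on four made-up predictions. The function name `accuracy` and the toy arrays are assumptions for illustration only.

```python
import numpy as np

def accuracy(y_true_onehot, y_prob):
    """Fraction of rows where the predicted class equals the true class."""
    true_class = y_true_onehot.argmax(axis=1)
    pred_class = y_prob.argmax(axis=1)
    return float(np.mean(true_class == pred_class))

# Hypothetical 3-class example: true classes are 0, 1, 2, 1
y_true = np.eye(3)[[0, 1, 2, 1]]
y_prob = np.array([
    [0.8, 0.1, 0.1],   # predicted 0 (correct)
    [0.2, 0.7, 0.1],   # predicted 1 (correct)
    [0.3, 0.3, 0.4],   # predicted 2 (correct)
    [0.6, 0.3, 0.1],   # predicted 0 (wrong, true class is 1)
])

acc = accuracy(y_true, y_prob)
print(acc)  # 0.75
```

By the same definition, a test accuracy of about 0.9918 on 10000 test images corresponds to roughly 82 misclassified digits.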


----

# Hyperparameter optimization with KerasTuner

In [11]:
# KerasTuner is published on PyPI as keras-tuner and imported as keras_tuner
%pip install keras-tuner

Note: you may need to restart the kernel to use updated packages.


In [15]:
x_train.shape

(60000, 28, 28, 1)

In [None]:
def build_model(hp):
    """First draft: tunable CNN with optional extra conv blocks and dense layer."""
    model = keras.Sequential()

    model.add(keras.Input(shape=(28, 28, 1)))

    model.add(layers.Conv2D(
        filters=hp.Int('conv1_filters', 32, 128, step=32),
        kernel_size=hp.Choice('conv1_kernel', [3, 5]),  # the method is hp.Choice, not hp.choice
        activation='relu'
        #,padding='same'
    ))

    model.add(layers.MaxPooling2D(2))

    model.add(layers.Conv2D(
        filters=hp.Int('conv2_filters', 64, 256, step=32),
        kernel_size=hp.Choice('conv2_kernel', [3, 5]),
        activation='relu'
        #,padding='same'
    ))

    # padding='same' in the optional blocks keeps the small feature maps
    # from shrinking below the kernel size
    if hp.Boolean('add_conv3'):
        model.add(layers.Conv2D(
            filters=hp.Int('conv3_filters', 32, 256, step=32),
            kernel_size=hp.Choice('conv3_kernel', [3, 5]),
            activation=hp.Choice('conv3_activation', ['relu', 'tanh', 'leaky_relu']),
            padding='same'
        ))
        model.add(layers.MaxPooling2D(2))

        if hp.Boolean('add_conv4'):
            model.add(layers.Conv2D(
                filters=hp.Int('conv4_filters', 64, 128, step=32),
                kernel_size=hp.Choice('conv4_kernel', [3, 5]),
                activation=hp.Choice('conv4_activation', ['relu', 'tanh', 'leaky_relu']),
                padding='same'
            ))
    model.add(layers.Flatten())

    if hp.Boolean('add_dense'):
        model.add(layers.Dense(
            units=hp.Int('dense_units', 128, 512, step=64),
            activation='relu'
        ))

    model.add(layers.Dense(10, activation='softmax'))

    # Compile and return the model so a tuner can train it (labels are one-hot encoded)
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [None]:
def build_model(hp):
    model = keras.Sequential()
    model.add(keras.Input(shape=(28, 28, 1)))
    model.add(layers.Conv2D(
        filters=hp.Int('conv1_filters', 32, 128, step=32),
        kernel_size=hp.Choice('conv1_kernel', [3, 5]),
        activation=hp.Choice('activation1', ['relu', 'tanh', 'leaky_relu']),
        padding='same'
    ))
    model.add(layers.MaxPooling2D(2))
    model.add(layers.Conv2D(
        filters=hp.Int('conv2_filters', 64, 256, step=32),
        kernel_size=hp.Choice('conv2_kernel', [3, 5]),
        activation=hp.Choice('activation2', ['relu', 'tanh', 'leaky_relu']),
        padding='same'
    ))
    model.add(layers.MaxPooling2D(2))
    model.add(layers.Flatten())
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(
        units=hp.Int('dense_units', 128, 512, step=64),
        activation='relu'
    ))
    model.add(layers.Dense(10, activation='softmax'))
    model.compile(
        optimizer=keras.optimizers.Adam(hp.Choice('learning_rate', [1e-2, 1e-3])),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )
    return model

In [None]:
def build_improved_model(hp):
    model = keras.Sequential()
    
    # === INPUT LAYER ===
    model.add(keras.Input(shape=(28, 28, 1)))
    
    # === CONV BLOCK 1 ===
    model.add(layers.Conv2D(
        filters=hp.Int('conv1_filters', 32, 128, step=32),  # 32, 64, 96, 128
        kernel_size=hp.Choice('conv1_kernel', [3, 5]),      # Must be hp.Choice (not hp.choice)
        activation='relu',
        padding='same'  # ← ADDED: Preserves spatial dimensions
    ))
    model.add(layers.MaxPooling2D(2))  # Reduces 28x28 → 14x14
    
    # === CONV BLOCK 2 ===  
    model.add(layers.Conv2D(
        filters=hp.Int('conv2_filters', 64, 256, step=64),  # 64, 128, 192, 256
        kernel_size=hp.Choice('conv2_kernel', [3, 5]),
        activation='relu',
        padding='same'  # ← ADDED
    ))
    model.add(layers.MaxPooling2D(2))  # Reduces 14x14 → 7x7
    
    # === OPTIONAL DEEPER CONV BLOCKS ===
    if hp.Boolean('add_conv3'):
        model.add(layers.Conv2D(
            filters=hp.Int('conv3_filters', 128, 512, step=64),
            kernel_size=hp.Choice('conv3_kernel', [3, 5]),
            activation=hp.Choice('conv3_activation', ['relu', 'tanh', 'leaky_relu']),
            padding='same'
        ))
        # Optional pooling for conv3
        if hp.Boolean('add_pool_after_conv3'):
            model.add(layers.MaxPooling2D(2))  # Reduces 7x7 → 3x3 (careful!)
    
    # === FLATTEN ===
    model.add(layers.Flatten())  # Converts 2D → 1D for dense layers
    
    # === DROPOUT FOR REGULARIZATION ===
    model.add(layers.Dropout(
        rate=hp.Float('dropout_rate', 0.2, 0.5, step=0.1)  # 0.2, 0.3, 0.4, 0.5
    ))
    
    # === OPTIONAL DENSE LAYER ===
    if hp.Boolean('add_dense'):  # ← FIXED NAME from 'add_conv5'
        model.add(layers.Dense(
            units=hp.Int('dense_units', 128, 512, step=64),
            activation='relu'
        ))
        # Additional dropout for dense layer
        model.add(layers.Dropout(
            rate=hp.Float('dense_dropout', 0.1, 0.4, step=0.1)
        ))
    
    # === OUTPUT LAYER ===
    model.add(layers.Dense(10, activation='softmax'))
    
    # === COMPILE WITH TUNABLE LEARNING RATE ===
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Float('learning_rate', 1e-4, 1e-2, sampling='log')
        ),
        loss='categorical_crossentropy',  # labels are one-hot encoded, so the sparse variant does not apply
        metrics=['accuracy']
    )
    
    return model
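
The learning rate above is searched with `sampling='log'`, which spreads trials evenly across orders of magnitude instead of evenly across raw values. The sketch below illustrates that idea with the standard library only. The helper `sample_log_uniform` is hypothetical and is not how KerasTuner is implemented internally.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)
samples = [sample_log_uniform(1e-4, 1e-2, rng) for _ in range(1000)]

# With log sampling, about half of the draws fall in [1e-4, 1e-3]
# and about half in [1e-3, 1e-2]
share_below_1e3 = sum(s < 1e-3 for s in samples) / len(samples)
print(min(samples), max(samples), share_below_1e3)
```

With linear sampling over the same range, only about 9% of the draws would land below `1e-3`, so small learning rates would rarely be tried.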

In [18]:
from sklearn.model_selection import train_test_split

x_train_final, X_val, y_train_final, y_val = train_test_split(
    x_train, y_train, test_size=0.1, random_state=2, stratify=y_train
)

# loop through a list of epoch counts and batch sizes

model.fit(x_train_final, y_train_final,
          validation_data=(X_val, y_val),
          epochs=100,
          batch_size=32)

Epoch 1/100
5400/5400 ━━━━━━━━━━━━━━━━━━━━ 45s 8ms/step - accuracy: 0.9816 - loss: 0.0591 - val_accuracy: 0.9883 - val_loss: 0.0329
Epoch 2/100
5400/5400 ━━━━━━━━━━━━━━━━━━━━ 37s 7ms/step - accuracy: 0.9846 - loss: 0.0486 - val_accuracy: 0.9907 - val_loss: 0.0294
Epoch 3/100
5400/5400 ━━━━━━━━━━━━━━━━━━━━ 45s 8ms/step - accuracy: 0.9859 - loss: 0.0456 - val_accuracy: 0.9903 - val_loss: 0.0295
Epoch 4/100
5400/5400 ━━━━━━━━━━━━━━━━━━━━ 80s 8ms/step - accuracy: 0.9871 - loss: 0.0408 - val_accuracy: 0.9915 - val_loss: 0.0274
Epoch 5/100
5400/5400 ━━━━━━━━━━━━━━━━━━━━ 62s 12ms/step - accuracy: 0.9879 - loss: 0.0372 - val_accuracy: 0.9908 - val_loss: 0.0291
Epoch 6/100
5400/5400 ━━━━━━━━━━━━━━━━━━━━ 53s 10ms/step - accuracy: 0.9887 - loss: 0.0364 - val_accuracy: 0.9908 - val_loss: 0.0292
…

KeyboardInterrupt: 

In [None]:
#OR

#from sklearn.utils import shuffle
#x_train, y_train = shuffle(x_train, y_train, random_state=2)

#model.fit(x_train, y_train, validation_split=0.1, epochs=15)

# Shuffling is necessary here if validation_split is used instead of train_test_split.
# validation_split takes the LAST 10% of the samples as validation data, before any shuffling,
# so shuffling first makes the validation set a random 10% with the classes well mixed.
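
To see why shuffling matters when the validation set is taken from the tail of the data, the sketch below uses a made-up label array that is sorted by class. This sorted ordering is an assumption chosen to show the worst case and is not a claim about how MNIST is ordered.

```python
import numpy as np

# Hypothetical labels sorted by class: 100 samples for each digit 0..9
labels = np.repeat(np.arange(10), 100)
split_at = len(labels) * 9 // 10          # first 90% train, last 10% validation

tail_unshuffled = labels[split_at:]
classes_before = np.unique(tail_unshuffled)

# Shuffle with a fixed seed, then take the same tail slice
rng = np.random.default_rng(2)
perm = rng.permutation(len(labels))
tail_shuffled = labels[perm][split_at:]
classes_after = np.unique(tail_shuffled)

print(classes_before)       # [9]
print(len(classes_after))   # number of distinct digits in the shuffled tail
```

Without shuffling, the validation slice contains only the last class. After shuffling, it contains a mix of digits. When images and labels live in separate arrays, the same permutation has to be applied to both.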


----

### Why does `input_shape` have (28, 28, 1)?

- **28, 28:**  
  The MNIST dataset images are **28 pixels wide** and **28 pixels tall**.  
  Each image is a small square (28x28 grid).

- **1:**  
  The last dimension is the **number of channels**.  
  For MNIST, images are **grayscale** (not color), so there is **1 channel**.

**Summary:**  
`input_shape = (28, 28, 1)` means each input image is 28x28 pixels, with 1 grayscale channel.
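
The raw MNIST arrays have no channel axis, which is why the notebook calls `np.expand_dims` before training. A small sketch with a made-up batch of five blank images shows the shape change:

```python
import numpy as np

# Hypothetical stand-in for five MNIST images without a channel axis
batch = np.zeros((5, 28, 28), dtype="uint8")

# Append a trailing axis of length 1 to act as the grayscale channel
with_channel = np.expand_dims(batch, -1)

print(batch.shape)         # (5, 28, 28)
print(with_channel.shape)  # (5, 28, 28, 1)
```

Indexing with `batch[..., np.newaxis]` produces the same result.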

---

### Why divide by 255?

- **Pixel values in MNIST:**  
  Each pixel is an integer between **0** (black) and **255** (white).
- **Dividing by 255:**  
  Converts pixel values to floats in the range **[0, 1]**. Keeping inputs small and on a common scale generally makes gradient-based training more stable and faster to converge.

**Example:**  
A pixel value of 128 becomes `128 / 255 ≈ 0.502`.
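
The same scaling can be checked on a few sample pixel values. This is a minimal sketch using three hand-picked values rather than real image data:

```python
import numpy as np

pixels = np.array([0, 128, 255], dtype="uint8")   # black, mid-gray, white

# Cast to float32 first, as in the notebook, then scale into [0, 1]
scaled = pixels.astype("float32") / 255

print(scaled.dtype)                # float32
print(scaled.min(), scaled.max())  # 0.0 1.0
print(round(float(scaled[1]), 3))  # 0.502
```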

---

### What does grayscale mean?

- **Grayscale image:**  
  Each pixel represents only **brightness** (no color).
  - 0 = black
  - 255 = white
  - Values in between = shades of gray

- **Color image:**  
  Has **3 channels** (Red, Green, Blue), e.g. `(28, 28, 3)`.
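
The difference between the two layouts is easiest to see in array shapes. The sketch below builds a made-up uniform RGB image and collapses it to one channel with a plain average. Averaging is a simplification chosen for illustration. Image libraries usually apply perceptual weights to the three colors instead.

```python
import numpy as np

# Hypothetical 28x28 color image with constant R, G, B values
rgb = np.zeros((28, 28, 3), dtype="float32")
rgb[..., 0] = 0.9   # red
rgb[..., 1] = 0.6   # green
rgb[..., 2] = 0.3   # blue

# Average over the channel axis and keep it as an axis of length 1
gray = rgb.mean(axis=-1, keepdims=True)

print(rgb.shape)    # (28, 28, 3)
print(gray.shape)   # (28, 28, 1)
```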

---

### What does "1 channel" mean?

- **Channel:**  
  Represents a layer of information for each pixel.
- **1 channel:**  
  Only brightness (grayscale).
- **3 channels:**  
  RGB color (each pixel has 3 values).

**In MNIST:**  
- Each image is `(28, 28, 1)` → 28x28 pixels, 1 value per pixel (grayscale).

---

**Summary Table:**

| Shape           | Meaning                        |
|-----------------|-------------------------------|
| (28, 28, 1)     | 28x28 pixels, grayscale image |
| (28, 28, 3)     | 28x28 pixels, RGB color image |
