# Keras example - MNIST dataset

This lesson uses the [MNIST digits dataset](course_datasets.md#mnist-digits). The Keras documentation has a similar tutorial [here](https://keras.io/examples/vision/mnist_convnet/).

In [None]:
#  Required installations
# !pip install tensorflow keras

In [None]:
import numpy as np
import keras
from keras import layers
import matplotlib.pyplot as plt

from sklearn.metrics import confusion_matrix

In [None]:
num_classes = 10
input_shape = (28, 28, 1)

Keras provides a built-in loader for MNIST (`keras.datasets.mnist.load_data()`), which downloads the data on first use and caches it. The labels (y) are integers between 0 and 9. Load the data into training and test datasets.

In [None]:
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
print(f'X_train.shape: {X_train.shape}, y_train.shape: {y_train.shape}, X_test.shape: {X_test.shape}, y_test.shape: {y_test.shape}')

We can look at the first of the 60,000 training images. It is a 28 x 28 array of integer pixel values between 0 (black) and 255 (white).

In [None]:
print(X_train[0])

We can plot the image using matplotlib.

In [None]:
plt.imshow(X_train[0], cmap="gray")

Let's have a look at some more of these images. The next code cell finds the first image of each class (0 through 9) in the training data and plots it with the label above.

In [None]:
fig, ax = plt.subplots(1, num_classes, figsize=(20, 3))

# Loop over the 10 classes (digits 0 to 9)
for i in range(num_classes):
    sample = X_train[y_train == i][0]  # first training image whose label is i
    ax[i].imshow(sample, cmap="gray")  # show the sample image
    ax[i].set_title(f"Label: {i}")     # use the class label as the title
    ax[i].axis("off")                  # hide the pixel-index ticks

We need to one-hot encode the y values so that each value (digit) becomes a vector of 10 values (9 zeros and a single 1), because the `categorical_crossentropy` loss used below expects labels in this form. For example, 2 becomes [0,0,1,0,0,0,0,0,0,0].

One-hot encoding is a technique used in machine learning and data preprocessing to represent categorical data as binary vectors. Because each category gets its own position, the encoding does not imply any order or distance between categories: as a class label, digit 7 is no "closer" to 8 than it is to 1.
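As a sketch of the encoding that `keras.utils.to_categorical` produces, the same vectors can be built in NumPy by picking rows of an identity matrix. The labels here are made up for illustration; the notebook itself uses `to_categorical`.

```python
import numpy as np

num_classes = 10
labels = np.array([5, 0, 4, 2])  # made-up digit labels

# Row k of the 10 x 10 identity matrix is the one-hot vector for digit k,
# so indexing with the label array picks one row per label.
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(one_hot.shape)  # (4, 10)
print(one_hot[3])     # digit 2: a single 1 at index 2

# argmax reverses the encoding and recovers the original labels
recovered = np.argmax(one_hot, axis=1)
print(recovered)
```

Each row sums to 1, which is why the model's softmax output (also summing to 1) can be compared directly with these labels.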

In [None]:
for i in range(7):
    print(f"Before : {y_train[i]}")

y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

for i in range(7):
    print(f"After : {y_train[i]}")


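We normalise the pixel values so they lie between 0 and 1 instead of 0 and 255. Small, consistently scaled inputs generally make gradient-based training more stable. More generally, normalisation ensures that when several variables have different ranges, no single variable dominates the ML calculation because of its scale.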
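A small sketch of the scaling step on a made-up 2 x 2 "image" stored as `uint8`, the same dtype as the MNIST pixels. Converting to `float32` and dividing by 255 maps the darkest value to 0.0 and the brightest to 1.0.

```python
import numpy as np

# A tiny made-up image with the same dtype as MNIST pixels (uint8, 0-255)
pixels = np.array([[0, 51], [204, 255]], dtype=np.uint8)

# Convert to float32 first, then divide, as the notebook does
scaled = pixels.astype("float32") / 255.0

print(scaled.dtype)                # float32
print(scaled.min(), scaled.max())  # 0.0 1.0
print(scaled)
```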

In [None]:
X_train.shape, y_train.shape

In [None]:
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

X_train[0] # now has values between 0 and 1 rather than 0 and 255

In [None]:
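# Add a trailing channel axis: Conv2D expects (height, width, channels),
# and these greyscale images have a single channel
X_train = np.expand_dims(X_train, -1)
X_test = np.expand_dims(X_test, -1)
print(f'X_train.shape:\n{X_train.shape}\nX_test.shape:\n{X_test.shape}')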

Build the model using a Convolutional Neural Network (CNN).

**How it works:**
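1. **Conv2D layers** slide small 3 x 3 filters across the image to detect local patterns (like edges, curves and strokes)
2. **MaxPooling2D** keeps only the largest value in each 2 x 2 window, halving the width and height while keeping the strongest responses
3. **Flatten** converts the processed image into a single list of numbers
4. **Dropout** randomly sets about half of those values (rate 0.5) to zero at each training step, so the model cannot rely on any single feature and is less likely to memorize the training data; it is switched off at prediction time
5. **Dense (output)** makes the final prediction: a softmax over 10 outputs gives one probability per digit (0-9)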

**In simple terms:** The model learns to recognize handwritten digits by extracting features from the image and passing them through layers that gradually learn what each digit looks like.
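The layer list above implies a specific chain of array shapes. A hand calculation, assuming the Keras defaults (`padding="valid"` and stride 1 for `Conv2D`, stride equal to `pool_size` for `MaxPooling2D`), reproduces the output sizes and parameter counts that `model.summary()` should report for the model built in the next cell. The helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel=3):
    # Hypothetical helper: a 3 x 3 "valid" convolution with stride 1 loses kernel - 1 pixels
    return size - kernel + 1

def pool_out(size, pool=2):
    # Hypothetical helper: 2 x 2 max pooling (stride 2) halves the size, rounding down
    return size // pool

h = 28
h = conv_out(h)    # 26 after the first Conv2D
h = pool_out(h)    # 13 after the first MaxPooling2D
h = conv_out(h)    # 11 after the second Conv2D
h = pool_out(h)    # 5 after the second MaxPooling2D
flat = h * h * 64  # values produced by Flatten (64 filters)

# Trainable parameters = weights + biases (pooling, Flatten and Dropout have none)
conv1_params = 3 * 3 * 1 * 32 + 32   # 1 input channel, 32 filters
conv2_params = 3 * 3 * 32 * 64 + 64  # 32 input channels, 64 filters
dense_params = flat * 10 + 10        # one weight per input per class, plus biases
total = conv1_params + conv2_params + dense_params

print(h, flat, total)  # 5 1600 34826
```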

In [None]:
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()


Fit the model to the training data. Training adjusts the weights between nodes and the bias for each node to minimise a loss function (here `categorical_crossentropy`) that measures the difference between the model's predictions and the training y values.
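For a single image, categorical cross-entropy is minus the log of the probability the model assigns to the true class, since the one-hot label zeroes out every other term. A sketch with made-up numbers (Keras averages this value over the images in each batch):

```python
import numpy as np

y_true = np.array([0.0, 0.0, 1.0])  # one-hot label: the correct class is index 2
y_pred = np.array([0.1, 0.2, 0.7])  # made-up softmax output

# Only the true-class term survives the multiplication by the one-hot vector
loss = -np.sum(y_true * np.log(y_pred))
print(round(loss, 4))  # 0.3567

# A more confident correct prediction gives a smaller loss
better = -np.sum(y_true * np.log(np.array([0.05, 0.05, 0.9])))
print(better < loss)  # True
```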

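`BATCH_SIZE` (the number of images used for each weight update) and `epochs` (the number of complete passes over the training data) control how long the process takes. These values are fairly arbitrary.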
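The arithmetic behind these two settings, as a quick sketch: with 60,000 training images and batches of 512, each epoch is split into a fixed number of weight updates, and the last batch of each epoch is smaller.

```python
import math

n_train = 60_000  # training images in MNIST
BATCH_SIZE = 512
epochs = 11

# 60,000 is not a multiple of 512, so the final batch of each epoch is partial
steps_per_epoch = math.ceil(n_train / BATCH_SIZE)
total_updates = steps_per_epoch * epochs
last_batch = n_train - (steps_per_epoch - 1) * BATCH_SIZE

print(steps_per_epoch, total_updates, last_batch)  # 118 1298 96
```

A smaller batch size means more (noisier) updates per epoch; more epochs means more passes over the same images.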

In [None]:
BATCH_SIZE = 512
epochs = 11
model.fit(x=X_train, y=y_train, batch_size=BATCH_SIZE, epochs=epochs)

In [None]:
predicted = model.predict(X_test)
print(f'predicted.shape: {predicted.shape}\n first value:\n{predicted[0]}')

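An example of how `np.argmax` works: it returns the index of the largest value. Because output index k corresponds to digit k, that index is the predicted digit.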

In [None]:
first_prediction = predicted[0]
print(f'first_prediction: {first_prediction}')
first_predicted_digit = np.argmax(first_prediction)  # index of the highest probability
print(f'first_predicted_digit: {first_predicted_digit}')
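The next cell applies `np.argmax` to every row at once with `axis=1`. A small sketch with made-up probabilities for 3 images and 4 classes shows the difference between a single row, `axis=1`, and no axis at all:

```python
import numpy as np

# Made-up probabilities: one row per image, one column per class
probs = np.array([
    [0.10, 0.70, 0.10, 0.10],
    [0.05, 0.05, 0.80, 0.10],
    [0.40, 0.30, 0.20, 0.10],
])

print(np.argmax(probs[0]))       # 1: index of the largest value in one row
print(np.argmax(probs, axis=1))  # [1 2 0]: one predicted class per row
print(np.argmax(probs))          # 6: without axis, argmax searches the flattened array
```

Forgetting `axis=1` is an easy mistake: it returns a single position in the flattened array rather than one class per image.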

In [None]:
y_actual = np.argmax(y_test, axis=1)
y_pred_classes = np.argmax(predicted, axis=1)
print(f'y_actual shape: {y_actual.shape}, y_pred_classes shape: {y_pred_classes.shape}')

Plot the i-th test image, with its actual and predicted digit as the title.

In [None]:
i = 5
plt.figure(figsize=(4, 4))
plt.imshow(X_test[i].reshape(28, 28), cmap="gray")
plt.title(f"Actual: {y_actual[i]}, Predicted: {y_pred_classes[i]}")
plt.axis('off')

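See how well the model does on the test data. These 10,000 images were not used during training, so the test accuracy is a better guide to real-world performance than the training accuracy reported by `fit`.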

In [None]:
test_loss, test_acc = model.evaluate(X_test, y_test)

print(f"Test Loss: {test_loss}, Test Accuracy: {test_acc}")
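The notebook imports `confusion_matrix` but never calls it. A confusion matrix breaks the single accuracy figure down by class, showing which digits are mistaken for which. A minimal NumPy sketch of what it counts, using a hypothetical helper `confusion` and made-up labels for three classes:

```python
import numpy as np

def confusion(y_true, y_pred, num_classes):
    # Hypothetical helper: row = true class, column = predicted class
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Made-up labels for six images
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 0])

cm = confusion(y_true, y_pred, 3)
print(cm)

# Correct predictions sit on the diagonal, so accuracy = trace / total
print(np.trace(cm) / cm.sum())
```

On the notebook's own arrays, `confusion_matrix(y_actual, y_pred_classes)` from `sklearn.metrics` uses the same row/column convention and should give the 10 x 10 table for the test set.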

END OF TUTORIAL

This shows how to read the softmax output. The final layer returns 10 probabilities for each image (they sum to 1), and `np.argmax` picks the digit with the highest probability.
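A sketch of what softmax itself computes, on made-up raw scores for three classes: it exponentiates the scores and divides by their sum, so the outputs are positive, sum to 1, and keep the original ordering. That is why `np.argmax` gives the same answer on the scores and on the probabilities.

```python
import numpy as np

def softmax(z):
    # Subtracting the max avoids overflow in exp without changing the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # made-up raw scores for 3 classes
p = softmax(scores)

print(p.round(3))  # probabilities, largest first
print(p.sum())     # 1.0 up to floating-point rounding
print(np.argmax(p), np.argmax(scores))  # same index: softmax preserves the order
```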

In [None]:
y_pred = model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)  # for each row (image), take the index of the highest probability

# print vector of probabilities
print(f"What Softmax predicted: {y_pred}")

# print predicted number
print(f"What Softmax actually means: {y_pred_classes}")

In [None]:
y_test

In [None]:
random_num = np.random.choice(len(X_test))
X_sample = X_test[random_num]

# save true label of this sample in a variable
y_actual = np.argmax(y_test, axis=1)
y_sample_actual = y_actual[random_num]

# save a predicted label of this sample in a variable
y_sample_pred_class = y_pred_classes[random_num]

In [None]:
plt.title(f"random_num {random_num}  Predicted: {y_sample_pred_class}, True:{y_sample_actual}")
plt.imshow(X_sample.reshape(28, 28), cmap="gray");