# Keras example: build a neural net

This lesson uses the MNIST digits dataset, a collection of 70,000 handwritten digits (0-9), with 60,000 for training and 10,000 for testing. Each digit is a 28x28 pixel grayscale image. MNIST is a well-known benchmark dataset in machine learning.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

from keras import layers, models
from keras.datasets import mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense

from sklearn.metrics import confusion_matrix

The MNIST dataset is available through the datasets module of the keras package. The labels (y) are integers between 0 and 9. Load the data into training and test datasets.

In [None]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [None]:
X_train.shape, y_train.shape, X_test.shape, y_test.shape

We can look at the first (of 60,000) training images. It is a 28 x 28 array of values between 0 (black) and 255 (white).

In [None]:
X_train[0]

We can plot the image using matplotlib.

In [None]:
plt.imshow(X_train[0], cmap="gray")

In [None]:
y_train[y_train == 0] # returns all the zeros in the y_train
y_train[y_train == 1] # returns all the ones in the y_train

Let's have a look at some more of these images. The next code cell finds the first image of each class (0 through 9) in the training data and plots it with the label above.

In [None]:
num_classes = 10

# Create a subplot
fig, ax = plt.subplots(1, num_classes, figsize=(20,20))  

# Loop through the 10 classes in the training data and plot the first image of each
for i in range(num_classes): # 0 to 9
  sample = X_train[y_train == i][0] # Get first image from each class
  ax[i].imshow(sample, cmap="gray") # Show sample image
  ax[i].set_title(f"Label:{i}") # Set title as class label

We need to one-hot encode the classes so that each label becomes a vector of ten 0/1 values.
For example, 2 becomes [0,0,1,0,0,0,0,0,0,0].

One-hot encoding is a technique used in machine learning and data preprocessing to represent categorical data as binary vectors. It is particularly useful when dealing with categorical features or labels in a machine learning model.
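
To make the idea concrete, here is a minimal NumPy sketch of one-hot encoding. The labels are made up for illustration, and indexing an identity matrix is just one way to do it; it is not necessarily how the Keras helper used in the next cell works internally.

```python
import numpy as np

# Made-up labels for illustration (not taken from MNIST)
labels = np.array([2, 0, 9])
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype=int)[labels]
print(one_hot[0])  # the vector for label 2: [0 0 1 0 0 0 0 0 0 0]

# np.argmax along each row recovers the original integer labels
recovered = np.argmax(one_hot, axis=1)
print(recovered)
```

The same argmax trick is used later in this lesson to turn one-hot vectors back into class numbers.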

In [None]:
for i in range(7):
    print(f"Before : {y_train[i]}")

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

for i in range(7):
    print(f"After : {y_train[i]}")


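We normalise the data so that all values lie between 0 and 1. In general, scaling ensures that when several variables have different ranges, no single variable dominates the ML calculation. Here every pixel already shares the same 0-255 range, but small input values still tend to make training better behaved.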

In [None]:
X_train = X_train/255.0
X_test = X_test/255.0

X_train[0] # now has values between 0 and 1 rather than 0 and 255

In [None]:
X_train.shape

Each image has 28 * 28 = 784 values. Flatten each image into a one-dimensional vector of length 784 so that it matches the input layer of the neural net.
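
As a small illustration of the reshape call used in the next cell, the sketch below flattens three dummy images. The array of zeros is a stand-in for real data.

```python
import numpy as np

# Three dummy 28x28 "images" filled with zeros (stand-ins for real data)
images = np.zeros((3, 28, 28))

# Keep the first axis (number of images); -1 asks NumPy to infer the rest
flat = images.reshape(images.shape[0], -1)
print(flat.shape)  # (3, 784)
```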

In [None]:
X_train = X_train.reshape(X_train.shape[0], -1)
X_test = X_test.reshape(X_test.shape[0], -1)

In [None]:
X_train.shape, X_test.shape

Build the model. This has two hidden layers. The choice of units (nodes) in each layer is arbitrary. The output layer has 10 units corresponding to the 10 classes.

In [None]:
model = Sequential()

model.add(Dense(units=128, input_shape=(784, ), activation="relu"))
model.add(Dense(units=16, activation="relu"))
model.add(Dense(units=10, activation="softmax"))

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])  # metrics is passed as a list
model.summary()
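
The parameter counts reported by the summary can be checked by hand. The sketch below assumes the layer sizes from the model above (784 inputs, then 128, 16 and 10 units) and uses the fact that a fully connected layer has one weight per input-unit pair plus one bias per unit.

```python
# Layer sizes taken from the model above: 784 inputs -> 128 -> 16 -> 10
layer_sizes = [784, 128, 16, 10]

# A Dense layer has (inputs * units) weights plus one bias per unit
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)  # [100480, 2064, 170]
print(total_params)      # 102714
```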

Fit the model against the training data. This sets the weights between nodes and the bias for each node to minimise a loss function between the predicted values and the training y values.
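
The loss named in the compile step is categorical cross-entropy. The following is a minimal NumPy sketch of that formula on made-up probabilities; the function and the clipping constant eps are illustrative assumptions, and the Keras implementation may differ in its details.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid taking log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Made-up example with 3 classes; the true class is the last one
y_true = np.array([[0.0, 0.0, 1.0]])
confident = np.array([[0.05, 0.05, 0.90]])
unsure = np.array([[0.40, 0.40, 0.20]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The loss is small when the model puts high probability on the true class and grows as that probability shrinks, which is what training tries to reduce.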

BATCH_SIZE and epochs control how long the process takes. These values are fairly arbitrary.

In [None]:
BATCH_SIZE = 512
epochs = 11
model.fit(x=X_train, y=y_train, batch_size = BATCH_SIZE, epochs = epochs )
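
To get a feel for what these two settings mean, the arithmetic below works out how many weight updates the fit performs. It assumes one update per batch and that the final, smaller batch of each epoch is still used.

```python
import math

n_samples = 60_000   # MNIST training images
batch_size = 512     # BATCH_SIZE above
epochs = 11

# The last batch of an epoch is smaller when the sizes do not divide evenly
steps_per_epoch = math.ceil(n_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 118
print(total_updates)    # 1298
```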

See how well the model does against the training data and, more importantly, the test data.

In [None]:
train_loss, train_acc = model.evaluate(X_train, y_train)
test_loss, test_acc = model.evaluate(X_test, y_test)

# Print results
print(f"Train Loss: {train_loss}, Train Accuracy: {train_acc}")
print(f"Test Loss: {test_loss}, Test Accuracy: {test_acc}")

This shows how the softmax output is turned into a prediction: softmax gives 10 probability values for each image, and we choose the class with the biggest one.
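
As a rough sketch of what the softmax activation computes, the NumPy function below turns raw scores into probabilities. The scores are made up, and the helper is an illustration rather than the Keras implementation.

```python
import numpy as np

def softmax(z):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z)  # subtracting the max improves numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)

# Made-up raw scores for 3 classes
scores = np.array([1.0, 3.0, 0.5])
probs = softmax(scores)

# The predicted class is the index of the largest probability
predicted_class = int(np.argmax(probs))
print(probs, predicted_class)
```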

In [None]:
y_pred = model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1) # for each row (axis=1), np.argmax returns the index of the highest probability

# print vector of probabilities
print(f"What Softmax predicted: {y_pred}")

# print predicted number
print(f"What Softmax actually means: {y_pred_classes}")

In [None]:
y_test

In [None]:
random_num = np.random.choice(len(X_test))
X_sample = X_test[random_num]

# save true label of this sample in a variable
y_actual = np.argmax(y_test, axis=1)
y_sample_actual = y_actual[random_num]

# save a predicted label of this sample in a variable
y_sample_pred_class = y_pred_classes[random_num]

In [None]:
plt.title(f"random_num {random_num}  Predicted: {y_sample_pred_class}, True:{y_sample_actual}")
plt.imshow(X_sample.reshape(28, 28), cmap="gray");