In [31]:
import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np

In [32]:
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

MNIST is a dataset of 28x28 grayscale images of handwritten digits (0-9), split into 60,000 training images and 10,000 test images.

In [33]:
train_images = train_images / 255.0
test_images = test_images / 255.0

Pixel values range from 0 to 255, so dividing by 255.0 scales them to between 0 and 1.
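A quick way to check the range is to scale a few values by hand. This is only a sketch in plain Python with made-up pixel values (not the MNIST arrays), so it runs without TensorFlow; `sample_pixels` and `normalize` are illustrative names, not part of the notebook.

```python
# Hypothetical sample of raw 8-bit pixel values (not taken from MNIST).
sample_pixels = [0, 64, 128, 255]

def normalize(pixels, max_value=255.0):
    """Scale raw pixel intensities into the range [0, 1]."""
    return [p / max_value for p in pixels]

scaled = normalize(sample_pixels)
print(scaled)  # the smallest value maps to 0.0 and the largest to 1.0
```

Dividing by 256 instead would leave the brightest pixel slightly below 1, which is why 255.0 is the divisor.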

In [34]:
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])

This builds the neural network model.

The first part flattens each 28x28 image into a 1D array of 784 pixels.
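To see what flattening does, here is a small sketch in plain Python. It assumes row-major order (rows placed one after another), and uses a stand-in grid rather than a real MNIST image; `flatten` is an illustrative helper, not the Keras layer itself.

```python
# Build a stand-in 28x28 "image" whose pixel value encodes its position.
ROWS, COLS = 28, 28
image = [[r * COLS + c for c in range(COLS)] for r in range(ROWS)]

def flatten(grid):
    """Concatenate the rows of a 2D grid into one flat list (row-major order)."""
    return [pixel for row in grid for pixel in row]

flat = flatten(image)
print(len(flat))                # 784
print(flat[28] == image[1][0])  # True: row 1 starts right after row 0 ends
```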

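The second part is a fully connected layer with 128 neurons and ReLU activation. A neuron's activation is how strongly it fires: ReLU outputs 0 when the neuron's weighted input is negative and passes positive values through unchanged, so the output is not limited to 0 or 1. Each neuron learns to respond to some pattern in the pixels, such as a stroke or an edge in the handwriting.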
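As a rough illustration of a single neuron in that layer, the sketch below computes a weighted sum plus a bias and applies ReLU. The inputs, weights and bias are made-up numbers, and `relu` and `neuron` are hypothetical helper names; the real layer does this for 128 neurons over 784 inputs at once.

```python
def relu(x):
    """ReLU: zero for negative inputs, identity for positive ones."""
    return max(0.0, x)

def neuron(inputs, weights, bias):
    """One dense-layer neuron: weighted sum of inputs plus bias, then ReLU."""
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return relu(z)

# Hypothetical normalized pixel values and weights, for illustration only.
inputs = [0.0, 0.5, 1.0]
active = neuron(inputs, [0.2, 0.4, 0.6], bias=0.1)    # positive sum passes through (about 0.9)
silent = neuron(inputs, [0.2, -0.4, -0.6], bias=0.1)  # negative sum is clipped to 0
print(active, silent)
```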

The last part is the output layer with 10 neurons, one per digit (0-9). It uses softmax because we want a probability distribution over the ten classes.
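Softmax turns the ten raw output scores into positive numbers that sum to 1. A minimal sketch in plain Python, with made-up scores (`logits` is an assumed name for the pre-softmax values):

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the max does not change the result but avoids overflow in exp.
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for digits 0-9; digit 7 has the largest score.
logits = [0.5, -1.0, 0.2, 1.1, -0.3, 0.0, -2.0, 4.0, 0.7, 1.5]
probs = softmax(logits)
print(sum(probs))                # 1.0 up to floating-point rounding
print(probs.index(max(probs)))   # 7
```

The largest score always gets the largest probability, so softmax changes the scale of the outputs but not which digit wins.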

In [35]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

Adam (Adaptive Moment Estimation) accelerates learning by keeping a running average of past gradients (momentum). It also borrows from RMSProp, which adapts the learning rate for each weight based on the magnitude of recent gradients. The optimizer uses both to update the weights and minimize the loss.
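To make the two ingredients concrete, here is a sketch of the Adam update rule for a single number, minimizing the toy function f(x) = x^2 (gradient 2x). This is not Keras's implementation; `adam_minimize` is a hypothetical helper, the beta values are the commonly used defaults, and the learning rate of 0.1 is chosen just for this toy problem.

```python
import math

def adam_minimize(grad_fn, x, steps=200, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run a few Adam steps on a single scalar parameter x."""
    m = 0.0  # running average of gradients (the momentum part)
    v = 0.0  # running average of squared gradients (the RMSProp part)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction: both averages start at 0 and are too small early on.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy problem: f(x) = x^2 has gradient 2x and its minimum at x = 0.
x_final = adam_minimize(lambda x: 2 * x, x=1.0)
print(x_final)  # ends up close to 0
```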

sparse_categorical_crossentropy is a loss function for multi-class classification problems. Each input belongs to exactly one class, and the labels are provided as integers rather than one-hot vectors.

    Loss = -log(predicted probability for the true class)
    If the true label is 3 and the model predicts [0.1, 0.2, 0.1, 0.6]
    for classes [0, 1, 2, 3], the loss is -log(0.6) ≈ 0.51
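The worked example above can be checked in a few lines of plain Python. This is a sketch of the per-example formula using the natural log, not the Keras function itself (which also averages over a batch); the second set of probabilities is made up to show how a less confident prediction is penalized.

```python
import math

def sparse_categorical_crossentropy(true_label, probs):
    """Per-example loss: negative log of the probability given to the true class."""
    return -math.log(probs[true_label])

# The example from the text: true label 3, predicted probabilities for classes 0-3.
probs = [0.1, 0.2, 0.1, 0.6]
loss = sparse_categorical_crossentropy(3, probs)
print(round(loss, 4))  # 0.5108

# Hypothetical worse prediction: only 0.1 assigned to the true class.
bad_loss = sparse_categorical_crossentropy(3, [0.6, 0.2, 0.1, 0.1])
print(round(bad_loss, 4))  # 2.3026
```

The integer label is used directly as an index, which is why no one-hot vector is needed.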

In [36]:
print("Model Training")
model.fit(train_images, train_labels, epochs=20)

Model Training
Epoch 1/20
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 1ms/step - accuracy: 0.8765 - loss: 0.4367
Epoch 2/20
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.9649 - loss: 0.1191
Epoch 3/20
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 959us/step - accuracy: 0.9762 - loss: 0.0805
Epoch 4/20
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.9823 - loss: 0.0592
Epoch 5/20
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.9863 - loss: 0.0444
Epoch 6/20
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.9898 - loss: 0.0349
Epoch 7/20
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.9929 - loss: 0.0243
Epoch 8/20
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 1ms/step - accuracy: 0.9939 - loss: 0.0206
Epoch 9

<keras.src.callbacks.history.History at 0x326499040>

An epoch is one full pass over the training set. For simple models like this, 15-20 epochs is plenty: training accuracy is already above 99% by epoch 7.
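The 1875 in the progress bar is the number of batches per epoch. The sketch below reproduces it, assuming Keras's default batch size of 32 (the notebook does not set `batch_size`) and the standard MNIST split sizes; `steps_per_epoch` is an illustrative helper name.

```python
import math

def steps_per_epoch(num_samples, batch_size=32):
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(60000))  # 1875 training batches per epoch
print(steps_per_epoch(10000))  # 313 batches for the 10,000 test images
```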

In [37]:
print("Model Evaluation")
test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print(f"Test Accuracy: {test_accuracy:.2f}")

Model Evaluation
313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 475us/step - accuracy: 0.9765 - loss: 0.1181
Test Accuracy: 0.98


Here we evaluate how accurate the model is on test images it never saw during training. (It should be very accurate since MNIST is an easy dataset.)
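The accuracy metric itself is just the fraction of predictions that match the true labels. A minimal sketch with made-up labels (not actual model output); `accuracy` is an illustrative helper, not the Keras metric.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that equal the true label."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Hypothetical predictions and labels: four of the five match.
predicted = [7, 2, 1, 0, 4]
actual = [7, 2, 1, 0, 9]
print(accuracy(predicted, actual))  # 0.8
```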

In [38]:
print("Model Predictions")
predictions = model.predict(test_images)
predicted_digit = np.argmax(predictions[0])
print(f"Predicted digit: {predicted_digit}")

Model Predictions
313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 352us/step
Predicted digit: 7


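Then the model makes predictions on test_images. Each prediction is a list of 10 probabilities, one per digit, and np.argmax picks the most likely digit for the first test image.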
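To show what np.argmax is doing in that cell, here is a plain-Python equivalent applied to a made-up probability vector (these are not the model's real outputs, and `argmax` here is a hand-written stand-in for the NumPy function).

```python
def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical softmax output for one image: most of the probability is on digit 7.
probs = [0.01, 0.01, 0.02, 0.03, 0.01, 0.01, 0.0, 0.88, 0.01, 0.02]
predicted_digit = argmax(probs)
print(f"Predicted digit: {predicted_digit}")  # Predicted digit: 7
```

Because the output layer has one neuron per digit, the index of the largest probability is the predicted digit.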

In [39]:
model.save('first_model.keras')