# MNIST Classifier

<center>
    <img alt="MNIST Handwritten Digits Image" src="https://upload.wikimedia.org/wikipedia/commons/f/f7/MnistExamplesModified.png">
</center>

In this example, we will predict handwritten digits in the [MNIST dataset](https://en.wikipedia.org/wiki/MNIST_database) using a [multi-layer perceptron (MLP)](https://en.wikipedia.org/wiki/Multilayer_perceptron). This is similar to the tutorial notebook, but adds a comparison with a standard model.

Of course, other neural network architectures such as [convolutional neural networks (CNNs)](https://en.wikipedia.org/wiki/Convolutional_neural_network) are better suited for this task, but for this example we will stick with MLPs.

## Setup

First, let's prepare the imports and set the Keras backend. The backend must be set before `keras` is imported.

```python
import os
os.environ["KERAS_BACKEND"] = "tensorflow"
```

```python
import keras
import numpy as np
```

Define constants relating to the data.

```python
NUM_CLASSES = 10        # 10 distinct classes, 0 to 9
INPUT_SHAPE = (28, 28)  # 28 x 28 greyscale images
```

Load the data from the `mnist` dataset.

```python
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
```

Preprocess the data: scale the pixel values from the range [0, 255] to [0, 1], and one-hot encode the integer labels.

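```python
# Scale pixel values from [0, 255] to [0, 1]
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255

# One-hot encode the integer labels 0-9 into vectors of length NUM_CLASSES
y_train = keras.utils.to_categorical(y_train, NUM_CLASSES)
y_test = keras.utils.to_categorical(y_test, NUM_CLASSES)
```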
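To make the two preprocessing steps concrete, here is a small NumPy-only sketch on made-up data (two tiny 2 x 2 "images", not real MNIST samples). The identity-matrix indexing trick for one-hot encoding is a stand-in chosen for illustration; it should produce the same 0/1 values as `keras.utils.to_categorical`, although the dtype may differ.

```python
import numpy as np

NUM_CLASSES = 10

# A tiny stand-in for MNIST: two 2x2 "images" with uint8 pixel values
# and their integer labels (made-up values, for illustration only).
images = np.array([[[0, 255], [128, 64]],
                   [[255, 255], [0, 0]]], dtype=np.uint8)
labels = np.array([3, 7])

# Scale pixel intensities from [0, 255] to [0.0, 1.0], as in the cell above.
scaled = images.astype("float32") / 255

# One-hot encode: row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(NUM_CLASSES, dtype="float32")[labels]

print(scaled.min(), scaled.max())  # 0.0 1.0
print(one_hot[0])                  # 1.0 at index 3, zeros elsewhere
print(one_hot.shape)               # (2, 10)
```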

## Defining the Model

As mentioned, we will be using an MLP for the model. However, instead of Keras's default `Dense` layer, we will use `keras_mml`'s `DenseMML` layer (short for Dense Matrix-Multiplication-Less). `DenseMML` is designed as a drop-in replacement for `Dense` in fully-connected layers, so the model architecture barely changes.
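This notebook does not show how `DenseMML` avoids matrix multiplications internally. One common approach, and the assumption behind the sketch below, is to constrain the weights to the ternary values {-1, 0, +1} using BitNet b1.58-style absmean quantization, so that each output unit becomes a sum of some inputs minus a sum of others, followed by a single scalar rescale. The helpers `ternarize` and `matmul_free_dense` are hypothetical and are not part of `keras_mml`'s API; the library's actual implementation may differ.

```python
import numpy as np

def ternarize(w, eps=1e-5):
    """Absmean ternary quantization (BitNet b1.58 style, an assumption here):
    divide by the mean absolute weight, then round and clip to {-1, 0, +1}."""
    scale = np.mean(np.abs(w)) + eps
    w_t = np.clip(np.round(w / scale), -1, 1)
    return w_t, scale

def matmul_free_dense(x, w_t, scale):
    """Compute x @ (scale * w_t) with only additions and subtractions:
    for each output unit, add the inputs whose weight is +1 and subtract
    the inputs whose weight is -1 (inputs with weight 0 are skipped)."""
    out = np.zeros((x.shape[0], w_t.shape[1]), dtype=x.dtype)
    for j in range(w_t.shape[1]):
        plus = w_t[:, j] == 1
        minus = w_t[:, j] == -1
        out[:, j] = x[:, plus].sum(axis=1) - x[:, minus].sum(axis=1)
    return out * scale  # one scalar multiply per layer, not per weight

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8)).astype("float32")  # batch of 4 inputs, 8 features
w = rng.normal(size=(8, 3)).astype("float32")  # full-precision weights, 3 units

w_t, scale = ternarize(w)
y_addsub = matmul_free_dense(x, w_t, scale)
y_matmul = x @ (w_t * scale)  # reference: ordinary matmul with the same weights
print(np.allclose(y_addsub, y_matmul, atol=1e-5))  # True
```

The loop is written for readability rather than speed; the point is only that, once weights are ternary, the per-weight multiplications disappear.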

```python
import keras_mml
```

Define the `Sequential` model.

```python
model = keras.Sequential(
    [
        keras.Input(shape=INPUT_SHAPE),
        keras.layers.Flatten(),
        keras_mml.layers.DenseMML(256),
        keras_mml.layers.DenseMML(256),
        keras_mml.layers.DenseMML(256),
        keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # The last layer needs to be `Dense` for the output to work
    ],
    name="Classifier-MML"
)

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
```

```python
model.summary()
```

We can now train the model.

```python
model.fit(x_train, y_train, batch_size=128, epochs=20, validation_split=0.1)
```

```
Epoch 1/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.6726 - loss: 1.5881 - val_accuracy: 0.8920 - val_loss: 0.4137
Epoch 2/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.8812 - loss: 0.4276 - val_accuracy: 0.9143 - val_loss: 0.3017
Epoch 3/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9019 - loss: 0.3471 - val_accuracy: 0.9253 - val_loss: 0.2700
Epoch 4/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9074 - loss: 0.3183 - val_accuracy: 0.9205 - val_loss: 0.2707
Epoch 5/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9149 - loss: 0.2957 - val_accuracy: 0.9300 - val_loss: 0.2495
Epoch 6/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9173 - loss: 0.2861 - val_accuracy: 0.9315 - val_loss: 0.2346
Epoch 7/20
422/422
```
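The `422/422` counter in the log is the number of batches per epoch. A quick check of that arithmetic, assuming the standard 60,000-image MNIST training split and Keras's behaviour of holding out the last `validation_split` fraction of the samples for validation:

```python
import math

NUM_TRAIN = 60_000        # size of the MNIST training split returned by load_data()
VALIDATION_SPLIT = 0.1    # same value passed to model.fit above
BATCH_SIZE = 128          # same value passed to model.fit above

num_val = int(NUM_TRAIN * VALIDATION_SPLIT)  # samples held out for validation
num_fit = NUM_TRAIN - num_val                # samples used for gradient updates
steps_per_epoch = math.ceil(num_fit / BATCH_SIZE)  # last batch is partial

print(num_val, num_fit, steps_per_epoch)  # 6000 54000 422
```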

Once the model is trained, let's evaluate it on the test set.

```python
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])
```

```
Test loss: 0.24721881747245789
Test accuracy: 0.9258999824523926
```
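With `loss="categorical_crossentropy"` and one-hot labels, the two reported numbers are, roughly, the mean categorical cross-entropy and the fraction of samples whose highest-probability class matches the label. The NumPy sketch below illustrates both on made-up predictions for three samples; it is not Keras's own code, and the clipping constant `1e-7` is an assumption mirroring a common default.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy over a batch of one-hot targets.
    Predictions are clipped away from 0 and 1 to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose highest-probability class is the true class."""
    return float(np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)))

# Made-up softmax outputs for three samples (three classes for brevity).
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype="float32")
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.3, 0.6, 0.1],
                   [0.5, 0.2, 0.3]], dtype="float32")

acc = categorical_accuracy(y_true, y_pred)      # about 0.667: the third sample is misclassified
ce = categorical_crossentropy(y_true, y_pred)   # mean of -log(0.8), -log(0.6), -log(0.3), about 0.646
print(acc, ce)
```

Note that the loss still penalises the correctly classified samples for not being fully confident, which is why loss and accuracy can move somewhat independently.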


Not bad!

## A Comparison - An MLP Using Normal `Dense` Layers

Let's compare our model's performance to a model that uses the regular `Dense` layers.

```python
model = keras.Sequential(
    [
        keras.Input(shape=INPUT_SHAPE),
        keras.layers.Flatten(),
        # No activation is set on these hidden layers, so they are linear and
        # together compose to a single linear map.
        keras.layers.Dense(256),
        keras.layers.Dense(256),
        keras.layers.Dense(256),
        keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ],
    name="Classifier-Normal"
)

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
```

```python
model.summary()
```
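The summary table is not reproduced in this notebook. As a cross-check, each standard `Dense` layer has `inputs x units` weights plus `units` biases, so the parameter count of `Classifier-Normal` can be computed by hand. The figures below apply only to the standard model; `DenseMML` layers may carry a different number of parameters.

```python
# Parameter count of the standard `Dense` model:
# each Dense layer has (inputs x units) weights plus `units` biases.
INPUT_SHAPE = (28, 28)
NUM_CLASSES = 10

flat = INPUT_SHAPE[0] * INPUT_SHAPE[1]  # Flatten: 28 x 28 -> 784 features
layer_sizes = [flat, 256, 256, 256, NUM_CLASSES]

params = [n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
print(params)       # [200960, 65792, 65792, 2570]
print(sum(params))  # 335114
```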

We'll train the model using the same `batch_size`, `epochs`, and `validation_split`.

```python
model.fit(x_train, y_train, batch_size=128, epochs=20, validation_split=0.1)
```

```
Epoch 1/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8436 - loss: 0.5161 - val_accuracy: 0.9205 - val_loss: 0.2837
Epoch 2/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9053 - loss: 0.3372 - val_accuracy: 0.9183 - val_loss: 0.2760
Epoch 3/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9090 - loss: 0.3166 - val_accuracy: 0.9277 - val_loss: 0.2631
Epoch 4/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9105 - loss: 0.3160 - val_accuracy: 0.9240 - val_loss: 0.2677
Epoch 5/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9162 - loss: 0.2970 - val_accuracy: 0.9245 - val_loss: 0.2659
Epoch 6/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9172 - loss: 0.2966 - val_accuracy: 0.9305 - val_loss: 0.2568
Epoch 7/20
422/422
```

Again, we evaluate the model.

```python
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])
```

```
Test loss: 0.30356961488723755
Test accuracy: 0.9171000123023987
```


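Notice that the test accuracy of the standard model is quite close to that of the MML model; in this run, the MML model actually scored slightly higher (about 92.6% versus 91.7%) with a lower test loss. At the same time, the `DenseMML` hidden layers are designed to avoid matrix multiplications and to reduce memory usage (the final output layer is still a regular `Dense` layer).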
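A back-of-the-envelope look at both points, using the rounded test accuracies reported above and the standard 10,000-image MNIST test split. The memory figures assume ternary hidden-layer weights packed at 2 bits each (log2(3), about 1.58 bits, is the theoretical minimum). This packing is an assumption for illustration; `keras_mml` may store weights differently, and full-precision copies are typically kept during training.

```python
# Rough comparison of the two runs above, plus a weight-storage estimate
# for the three hidden layers (weight matrices only, biases ignored).
TEST_SIZE = 10_000  # images in the MNIST test split

acc_mml, acc_dense = 0.9259, 0.9171  # test accuracies reported above, rounded
extra_correct = round((acc_mml - acc_dense) * TEST_SIZE)
print(extra_correct)  # 88 more test images classified correctly by the MML model

hidden_weights = 784 * 256 + 256 * 256 + 256 * 256  # weights in the three hidden layers
float32_bytes = hidden_weights * 4                  # 32 bits (4 bytes) per weight
ternary_bytes = hidden_weights * 2 // 8             # ASSUMPTION: 2 bits per packed ternary weight
print(hidden_weights, float32_bytes, ternary_bytes)  # 331776 1327104 82944
print(float32_bytes / ternary_bytes)                 # 16.0
```

A difference of 88 images out of 10,000 comes from a single training run of each model, so it should not be read as a reliable gap in either direction.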