# MNIST Classifier

<center>
    <img alt="MNIST Handwritten Digits Image" src="https://upload.wikimedia.org/wikipedia/commons/f/f7/MnistExamplesModified.png">
</center>

In this example, we will predict handwritten digits in the [MNIST dataset](https://en.wikipedia.org/wiki/MNIST_database) using a [multi-layer perceptron (MLP)](https://en.wikipedia.org/wiki/Multilayer_perceptron). It is similar to the tutorial notebook, but adds a comparison with a standard model.

Of course, other neural network architectures such as [convolutional neural networks (CNNs)](https://en.wikipedia.org/wiki/Convolutional_neural_network) are better suited for this task, but for this example we will stick with MLPs.

## Setup

First, let's prepare the imports.

In [1]:
import keras
import numpy as np


Define constants relating to the data.

In [2]:
NUM_CLASSES = 10        # 10 distinct classes, 0 to 9
INPUT_SHAPE = (28, 28)  # 28 x 28 greyscale images

Load the data from the `mnist` dataset.

In [3]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

Preprocess the data: scale the pixel values to the range [0, 1] and one-hot encode the labels.

In [4]:
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255

y_train = keras.utils.to_categorical(y_train, NUM_CLASSES)
y_test = keras.utils.to_categorical(y_test, NUM_CLASSES)
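
To make the preprocessing concrete, here is a minimal NumPy-only sketch of the same two steps on a tiny made-up batch. It assumes that `keras.utils.to_categorical` produces one-hot rows for integer labels and mimics it with `np.eye`; the `fake_images` and `fake_labels` arrays are hypothetical stand-ins for the real data.

```python
import numpy as np

NUM_CLASSES = 10

# Hypothetical stand-ins for a tiny batch of MNIST data (not the real dataset)
fake_images = np.array([[0, 128, 255]], dtype="uint8")
fake_labels = np.array([3, 0, 9])

# Scale pixel intensities from [0, 255] to [0, 1], as in the cell above
scaled = fake_images.astype("float32") / 255

# One-hot encode the labels; indexing an identity matrix mimics `to_categorical`
one_hot = np.eye(NUM_CLASSES, dtype="float32")[fake_labels]

print(scaled.min(), scaled.max())
print(one_hot.shape)
print(one_hot[0])
```

Each row of `one_hot` contains a single 1 at the index of its label, which is the target format that the `categorical_crossentropy` loss expects.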

## Defining the Model

As mentioned, we will be using an MLP for the model. However, instead of using `keras`'s default `Dense` layer, we will use `keras_mml`'s `DenseMML` layer (which stands for Dense Matrix-Multiplication-less). `DenseMML` is designed to be a drop-in replacement for `Dense` in fully-connected networks, so we don't have to change the architecture of the model much.

In [5]:
import keras_mml
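
The internals of `DenseMML` are not covered in this example. As rough intuition only, one common way to avoid multiplications in a dense layer is to restrict the weights to the values -1, 0 and 1, so that each output is just a sum and difference of selected inputs. The NumPy sketch below illustrates that idea under this assumption; the function `ternary_dense` is hypothetical and is not part of `keras_mml`.

```python
import numpy as np

def ternary_dense(x, w_ternary):
    """Compute x @ w_ternary using only additions and subtractions.

    Hypothetical illustration: assumes every weight is -1, 0 or 1.
    """
    out = np.zeros((x.shape[0], w_ternary.shape[1]), dtype=x.dtype)
    for j in range(w_ternary.shape[1]):
        plus = x[:, w_ternary[:, j] == 1].sum(axis=1)    # inputs with weight +1
        minus = x[:, w_ternary[:, j] == -1].sum(axis=1)  # inputs with weight -1
        out[:, j] = plus - minus                          # inputs with weight 0 are skipped
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8)).astype("float32")
w = rng.integers(-1, 2, size=(8, 3))  # random weights in {-1, 0, 1}

# The addition-only result should match an ordinary matrix product
print(np.allclose(ternary_dense(x, w), x @ w, atol=1e-5))
```

Whether `DenseMML` follows this exact scheme is an assumption; the sketch only shows why a layer with such weights needs no multiplications.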

Define the `Sequential` model.

In [6]:
model = keras.Sequential(
    [
        keras.Input(shape=INPUT_SHAPE),
        keras.layers.Flatten(),
        keras_mml.layers.DenseMML(256),
        keras_mml.layers.DenseMML(256),
        keras_mml.layers.DenseMML(256),
        keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # The last layer needs to be `Dense` for the output to work
    ],
    name="Classifier-MML"
)

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

In [7]:
model.summary()

We can now train the model.

In [8]:
model.fit(x_train, y_train, batch_size=128, epochs=20, validation_split=0.1)

Epoch 1/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6657 - loss: 1.6001 - val_accuracy: 0.8880 - val_loss: 0.4313
Epoch 2/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.8852 - loss: 0.4263 - val_accuracy: 0.9192 - val_loss: 0.3044
Epoch 3/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9054 - loss: 0.3349 - val_accuracy: 0.9218 - val_loss: 0.2759
Epoch 4/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9117 - loss: 0.3086 - val_accuracy: 0.9293 - val_loss: 0.2561
Epoch 5/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9150 - loss: 0.2938 - val_accuracy: 0.9285 - val_loss: 0.2475
Epoch 6/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9177 - loss: 0.2811 - val_accuracy: 0.9282 - val_loss: 0.2503
Epoch 7/20
422/422

<keras.src.callbacks.history.History at 0x7feaa6720970>
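
The `422/422` in each progress line is the number of batches per epoch. Here is a quick check of that figure, assuming the standard 60,000-image MNIST training set and that `validation_split=0.1` holds out 10% of it before batching:

```python
import math

TRAIN_SAMPLES = 60_000   # size of the standard MNIST training set
VALIDATION_SPLIT = 0.1
BATCH_SIZE = 128

# Samples left for training after the validation split is held out
fit_samples = TRAIN_SAMPLES - int(TRAIN_SAMPLES * VALIDATION_SPLIT)

# The last batch is smaller than BATCH_SIZE, so round up
steps_per_epoch = math.ceil(fit_samples / BATCH_SIZE)

print(fit_samples)      # 54000
print(steps_per_epoch)  # 422
```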

Once the model is trained, let's evaluate it.

In [9]:
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

Test loss: 0.25758329033851624
Test accuracy: 0.9247000217437744


## A Comparison - An MLP Using Normal `Dense` Layers

Let's compare our model's performance to that of a model that uses regular `Dense` layers.

In [10]:
model = keras.Sequential(
    [
        keras.Input(shape=INPUT_SHAPE),
        keras.layers.Flatten(),
        keras.layers.Dense(256),
        keras.layers.Dense(256),
        keras.layers.Dense(256),
        keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ],
    name="Classifier-Normal"
)

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

In [11]:
model.summary()
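
The summary output is not reproduced here, but the parameter count of this standard model can be worked out by hand: a `Dense` layer with a bias holds `inputs * units + units` trainable values. The sketch below applies that formula to the four layers; the helper `dense_params` is hypothetical.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully-connected layer."""
    return inputs * units + units

flattened = 28 * 28  # `Flatten` turns each 28 x 28 image into 784 values
layer_units = [256, 256, 256, 10]

total = 0
inputs = flattened
for units in layer_units:
    total += dense_params(inputs, units)
    inputs = units  # the output of one layer feeds the next

print(total)  # 335114
```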

We'll train the model using the same `batch_size`, `epochs`, and `validation_split`.

In [12]:
model.fit(x_train, y_train, batch_size=128, epochs=20, validation_split=0.1)

Epoch 1/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8484 - loss: 0.5044 - val_accuracy: 0.9283 - val_loss: 0.2678
Epoch 2/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9054 - loss: 0.3314 - val_accuracy: 0.9208 - val_loss: 0.2657
Epoch 3/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9108 - loss: 0.3111 - val_accuracy: 0.9260 - val_loss: 0.2655
Epoch 4/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9124 - loss: 0.3117 - val_accuracy: 0.9270 - val_loss: 0.2533
Epoch 5/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9164 - loss: 0.2950 - val_accuracy: 0.9275 - val_loss: 0.2644
Epoch 6/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9170 - loss: 0.2921 - val_accuracy: 0.9227 - val_loss: 0.2803
Epoch 7/20
422/422

<keras.src.callbacks.history.History at 0x7fea8c13e3e0>

Again, we evaluate the model.

In [13]:
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

Test loss: 0.30302900075912476
Test accuracy: 0.9185000061988831


Notice that the normal model is actually slightly less accurate on the test set than the MML model. Regardless, this shows that, even though its hidden layers do not use matrix multiplications at all, our model performs similarly to the standard model.
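
To put the gap in perspective, the two reported test accuracies can be converted into image counts, assuming the standard 10,000-image MNIST test set:

```python
TEST_SAMPLES = 10_000  # size of the standard MNIST test set

acc_mml = 0.9247000217437744     # test accuracy reported for the MML model
acc_normal = 0.9185000061988831  # test accuracy reported for the normal model

# Number of correctly classified test images for each model
correct_mml = round(acc_mml * TEST_SAMPLES)
correct_normal = round(acc_normal * TEST_SAMPLES)

print(correct_mml, correct_normal)   # 9247 9185
print(correct_mml - correct_normal)  # 62
```

A difference of 62 images out of 10,000 from a single training run is small, so it is safer to read the two results as comparable than to conclude that one model is better.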