## Demo 3: HKR classifier on MNIST dataset

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deel-ai/deel-lip/blob/master/docs/notebooks/demo3.ipynb)

This notebook demonstrates how to learn a binary classification task on the MNIST0-8 dataset, i.e. MNIST restricted to the digits 0 and 8.


In [1]:
# pip install git+https://github.com/deel-ai/deel-lip.git@keras3 -qqq

In [2]:
import keras
import keras.ops as K
from keras.layers import Input, Flatten
from keras.optimizers import Adam
from keras.metrics import BinaryAccuracy
from keras.models import Sequential

from deel.lip.layers import SpectralDense, FrobeniusDense
from deel.lip.activations import GroupSort, GroupSort2
from deel.lip.losses import HKR, KR, HingeMargin


### Data preparation

For this task we will select two classes: 0 and 8.

In [3]:
from keras.datasets import mnist

# first we select the two classes
selected_classes = [0, 8]  # must be two classes as we perform binary classification


def prepare_data(x, y, class_a=0, class_b=8):
    """
    Convert the MNIST data to make it suitable for our binary classification
    setup.
    """
    # select items from the two selected classes
    mask = (y == class_a) | (y == class_b)  # True for items of class_a or class_b
    x = x[mask]
    y = y[mask]
    x = x.astype("float32")
    y = y.astype("float32")
    # convert from range int[0,255] to float32[0,1]
    x /= 255
    x = x.reshape((-1, 28, 28, 1))
    # change labels to binary {0, 1}: class_a -> 1, class_b -> 0
    # (a single comparison avoids overwriting labels when class_b == 1)
    y = (y == class_a).astype("float32")
    return x, y.reshape((-1, 1))


# now we load the dataset
(x_train, y_train_ord), (x_test, y_test_ord) = mnist.load_data()

# prepare the data
x_train, y_train = prepare_data(
    x_train, y_train_ord, selected_classes[0], selected_classes[1]
)
x_test, y_test = prepare_data(
    x_test, y_test_ord, selected_classes[0], selected_classes[1]
)

# display infos about dataset
print(
    "train set size: %i samples, classes proportions: %.3f percent"
    % (y_train.shape[0], 100 * y_train[y_train == 1].sum() / y_train.shape[0])
)
print(
    "test set size: %i samples, classes proportions: %.3f percent"
    % (y_test.shape[0], 100 * y_test[y_test == 1].sum() / y_test.shape[0])
)

train set size: 11774 samples, classes proportions: 50.306 percent
test set size: 1954 samples, classes proportions: 50.154 percent
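
To make the selection and relabelling steps easier to follow, here is a minimal sketch on toy data. The arrays `x` and `y` below are made-up stand-ins for MNIST (hypothetical 2x2 "images"), not the real dataset; only the masking, rescaling and label conversion mirror what `prepare_data` does.

```python
import numpy as np

# Toy labels standing in for MNIST digit labels (hypothetical values).
y = np.array([0, 3, 8, 8, 1, 0])
# Fake 2x2 "images", one per label.
x = np.arange(len(y) * 4, dtype="uint8").reshape(len(y), 2, 2)

class_a, class_b = 0, 8
mask = (y == class_a) | (y == class_b)   # keep only the two selected digits
x_sel = x[mask].astype("float32") / 255  # rescale pixels to [0, 1]
# class_a -> 1.0, class_b -> 0.0, as a column vector
y_sel = (y[mask] == class_a).astype("float32").reshape(-1, 1)

print(x_sel.shape, y_sel.ravel())
```

Four of the six toy samples survive the mask, and their labels become 1 for `class_a` and 0 for `class_b`.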


### Build 1-Lipschitz Model

Let's first make the parameters of this experiment explicit.


In [4]:
# training parameters
epochs = 10
batch_size = 128

# network parameters
activation = GroupSort  # other options: ReLU, MaxMin, GroupSort2 (the model below instantiates GroupSort2 directly)

# loss parameters
min_margin = 1.0
alpha = 10.0

Now we can build the network. Here the experiment is done with an MLP, but `deel-lip`
also provides state-of-the-art 1-Lipschitz convolutions.


In [5]:
keras.utils.clear_session()
# build the 1-Lipschitz MLP
model = Sequential(
    layers=[
        Input((28, 28, 1)),
        Flatten(),
        SpectralDense(32, GroupSort2(), use_bias=True),
        SpectralDense(16, GroupSort2(), use_bias=True),
        FrobeniusDense(1, activation=None, use_bias=False),
    ],
    name="lipModel",
)
model.summary()
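
To illustrate what "1-Lipschitz" means for a layer like the ones above, here is a rough NumPy sketch. It is not how `deel-lip` implements `SpectralDense` or `GroupSort2` (the library's layers may compute the normalization differently, for example iteratively); `group_sort2` and `spectral_normalize` are hypothetical helpers written for this example. The idea is that dividing a weight matrix by its largest singular value, then sorting activations in pairs, never increases the Euclidean distance between two inputs.

```python
import numpy as np

rng = np.random.default_rng(0)


def group_sort2(x):
    """Sort activations within consecutive pairs (assumes an even feature count)."""
    pairs = x.reshape(x.shape[0], -1, 2)
    return np.sort(pairs, axis=-1).reshape(x.shape)


def spectral_normalize(w):
    """Divide a weight matrix by its largest singular value."""
    return w / np.linalg.norm(w, ord=2)


# Hypothetical dense layer: 8 inputs, 4 outputs, spectral norm equal to 1.
w = spectral_normalize(rng.normal(size=(8, 4)))


def layer(x):
    return group_sort2(x @ w)


# Compare distances between pairs of random inputs before and after the layer.
a = rng.normal(size=(5, 8))
b = rng.normal(size=(5, 8))
in_dist = np.linalg.norm(a - b, axis=1)
out_dist = np.linalg.norm(layer(a) - layer(b), axis=1)
print(np.all(out_dist <= in_dist + 1e-9))
```

Because each layer is non-expansive, so is their composition, which is why the whole network can be treated as a 1-Lipschitz function.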

In [6]:
optimizer = Adam(learning_rate=0.001)

In [7]:
model.compile(
    loss=HKR(
        alpha=alpha, min_margin=min_margin
    ),  # HKR stands for the hinge regularized KR loss
    metrics=[
        KR,  # shows the KR term of the loss
        HingeMargin(min_margin=min_margin),  # shows the hinge term of the loss
        BinaryAccuracy(threshold=0),  # shows the classification accuracy
    ],
    optimizer=optimizer,
)
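
As a rough guide to what the loss and metrics above measure, the following NumPy sketch combines a KR term and a hinge term as `alpha * hinge - KR`. This is a simplified illustration under stated assumptions, not the library's code: the helper names are hypothetical, the hinge uses the textbook form `max(0, min_margin - sign * output)`, and `deel-lip` may scale the margin or reduce over batches differently. The last line mimics an accuracy thresholded at 0 for labels in {0, 1}.

```python
import numpy as np


def kr_term(y_true, y_pred):
    """Mean output on the positive class minus mean output on the other class."""
    pos = y_true.ravel() > 0
    out = y_pred.ravel()
    return out[pos].mean() - out[~pos].mean()


def hinge_term(y_true, y_pred, min_margin=1.0):
    """Average hinge penalty; labels in {0, 1} are mapped to signs {-1, +1}."""
    sign = np.where(y_true.ravel() > 0, 1.0, -1.0)
    return np.maximum(0.0, min_margin - sign * y_pred.ravel()).mean()


def hkr_sketch(y_true, y_pred, alpha=10.0, min_margin=1.0):
    """Hinge-regularized KR: penalize margin violations, reward class separation."""
    return alpha * hinge_term(y_true, y_pred, min_margin) - kr_term(y_true, y_pred)


# Hypothetical labels and raw model outputs for four samples.
y_true = np.array([1.0, 1.0, 0.0, 0.0])
y_pred = np.array([2.0, 0.5, -3.0, 0.2])

print("KR:", kr_term(y_true, y_pred))
print("hinge:", hinge_term(y_true, y_pred))
print("HKR sketch:", hkr_sketch(y_true, y_pred))
# Accuracy with a decision threshold at 0 on the raw output.
accuracy = np.mean((y_pred > 0) == (y_true > 0))
print("accuracy:", accuracy)
```

A larger `alpha` puts more weight on classifying samples with the requested margin, while the KR term pushes the outputs of the two classes apart, which is why the total loss can be negative.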

### Learn classification on MNIST

Now that the model is built, we can learn the task.


In [8]:
model.fit(
    x=x_train,
    y=y_train,
    validation_data=(x_test, y_test),
    batch_size=batch_size,
    shuffle=True,
    epochs=epochs,
    verbose=1,
)

Epoch 1/10
92/92 ━━━━━━━━━━━━━━━━━━━━ 7s 40ms/step - HingeMargin: 0.0941 - KR: 3.4458 - binary_accuracy: 0.9218 - loss: -2.5046 - val_HingeMargin: 0.0264 - val_KR: 5.5877 - val_binary_accuracy: 0.9800 - val_loss: -5.3403
Epoch 2/10
92/92 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - HingeMargin: 0.0392 - KR: 5.4768 - binary_accuracy: 0.9725 - loss: -5.0849 - val_HingeMargin: 0.0350 - val_KR: 5.6396 - val_binary_accuracy: 0.9754 - val_loss: -5.3006
Epoch 3/10
92/92 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - HingeMargin: 0.0354 - KR: 5.5531 - binary_accuracy: 0.9742 - loss: -5.1991 - val_HingeMargin: 0.0258 - val_KR: 5.5907 - val_binary_accuracy: 0.9811 - val_loss: -5.3503
Epoch 4/10
92/92 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - HingeMargin: 0.0353 - KR: 5.4961 - binary_accuracy: 0.9722 - loss: -5.1434 - val_HingeMargin: 0.0324 - val_KR: 5.6472 - val_binary_accuracy: 0.9754 - val_loss: -5

<keras.src.callbacks.history.History at 0x7fe1140219c0>

As we can see, the model reaches about 98% validation accuracy on this task (`val_binary_accuracy` between 0.975 and 0.981 in the epochs shown).
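
As a quick sanity check on how the logged quantities relate, the sketch below recomputes `alpha * HingeMargin - KR` from the training columns of epochs 1 to 4. The numbers are copied from the training log above; the relation itself is an observation from these values, not a statement about the library's internals.

```python
alpha = 10.0

# (HingeMargin, KR, loss) from the training columns of epochs 1 to 4.
logged = [
    (0.0941, 3.4458, -2.5046),
    (0.0392, 5.4768, -5.0849),
    (0.0354, 5.5531, -5.1991),
    (0.0353, 5.4961, -5.1434),
]

diffs = []
for epoch, (hinge, kr, loss) in enumerate(logged, start=1):
    recomputed = alpha * hinge - kr
    diffs.append(abs(recomputed - loss))
    print(f"epoch {epoch}: logged {loss:.4f}, alpha * hinge - KR = {recomputed:.4f}")
```

The recomputed values agree with the logged training loss to within rounding of the displayed digits, which is consistent with the loss being the hinge term weighted by `alpha` minus the KR term.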
