# 👋 Getting started 1: Creating a 1-Lipschitz neural network

The goal of this series of tutorials is to show the different usages of `deel-lip`.

In this first notebook, our objective is to show how to create 1-Lipschitz neural networks with `deel-lip`. 

In particular, we will cover the following: 
1. [📚 Theoretical background](#theoretical_background)    
A brief theoretical background on Lipschitz continuous functions. This section can be safely skipped if one is not interested in the theory.
2. [🧱 Creating a 1-Lipschitz neural network with `deel-lip` and `keras`](#deel_keras)       
An example of how to create a 1-Lipschitz neural network with `deel-lip` and `keras`.
3. [🔨 Design rules for 1-Lipschitz neural networks with `deel-lip`](#design)   
A set of neural network design rules that one must respect in order to enforce the 1-Lipschitz constraint.



## 📚 Theoretical background <a id='theoretical_background'></a> <a name='theoretical_background'></a>
### What is a Lipschitz constant
The `deel-lip` package lets you control the Lipschitz constant of a single layer or of a whole neural network. The Lipschitz constant is a mathematical property of a function (here, a layer or a model) that bounds how much the output of the function can change in response to a change in its input.

In mathematical terms, a function $f$ is Lipschitz continuous with a **Lipschitz constant K** or more simply **K-Lipschitz** if for any given pair of points $x_1,x_2$, $K$ provides a bound on the rate of change of $f$:  

$$||f(x_1)-f(x_2)||\leq K||x_1-x_2||.$$

For instance, given a 1-Lipschitz dense layer (a.k.a fully connected layer) with a weight matrix $W$ and a bias vector $b$, we have for any two inputs $x_1$ and $x_2$: $$||W.x_1+b-(W.x_2+b)|| \leq 1||x_1-x_2||.$$
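
The inequality above can be checked numerically. The snippet below is a minimal sketch that uses plain NumPy rather than `deel-lip`: it assumes a hypothetical dense layer with 8 inputs and 4 outputs, and makes it 1-Lipschitz by dividing $W$ by its largest singular value (one simple way to obtain a 1-Lipschitz affine map, not necessarily the one used by the library).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dense layer: 8 inputs, 4 outputs (sizes chosen for illustration).
W = rng.normal(size=(4, 8))
b = rng.normal(size=4)

# Divide W by its largest singular value so that its spectral norm is 1.
W = W / np.linalg.norm(W, ord=2)

def dense(x):
    """Affine map x -> W x + b."""
    return W @ x + b

# Largest ratio ||f(x1) - f(x2)|| / ||x1 - x2|| observed over random input pairs.
max_ratio = 0.0
for _ in range(1000):
    x1, x2 = rng.normal(size=8), rng.normal(size=8)
    ratio = np.linalg.norm(dense(x1) - dense(x2)) / np.linalg.norm(x1 - x2)
    max_ratio = max(max_ratio, ratio)

print(f"largest observed ratio: {max_ratio:.4f}")  # stays at or below 1
```

Note that the bias $b$ cancels in the difference, so only $W$ matters for the Lipschitz constant of the layer.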

💡 The norm we refer to throughout our notebooks is the Euclidean norm (L2). This is because `deel-lip` operates with this norm. You will find more information about the role of the norm in the context of adversarially robust 1-Lipschitz deep learning models in the notebook titled 'Getting Started 2'.

### A simple requirement for creating 1-Lipschitz neural networks
The composition property of Lipschitz continuous functions states that if a function $f$ is $K_1$-Lipschitz and another function $g$ is $K_2$-Lipschitz, then their composition $h = f \circ g$, which applies $f$ after $g$, is also Lipschitz continuous, with a Lipschitz constant $K \leq K_1 K_2$.
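
The composition property follows from applying the definition twice, first to $f$ and then to $g$. For any two inputs $x_1$ and $x_2$:

$$||f(g(x_1))-f(g(x_2))||\leq K_1||g(x_1)-g(x_2)||\leq K_1 K_2||x_1-x_2||.$$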

A neural network is essentially a stack of layers that transform the output of the layer before them and whose output is fed to the layer after them. 

By the composition property of Lipschitz functions, *if each of the $n$ layers of a neural network is 1-Lipschitz, then the whole model is 1-Lipschitz*.

For instance, given a 1-Lipschitz dense layer parametrized by $(W_1,b_1)$, and a ReLU (Rectified Linear Unit) activation layer which is naturally 1-Lipschitz, the combination of the two is also 1-Lipschitz.   
This is shown in the equations below, where we have for any two inputs $x_1$ and $x_2$:

$$||W_1.x_1+b_1-(W_1.x_2+b_1)||\leq 1||x_1-x_2||,$$
$$||ReLU(x_1)-ReLU(x_2)||\leq 1||x_1-x_2||,$$
and:
$$||ReLU(W_1.x_1+b_1)-ReLU(W_1.x_2+b_1)||\leq 1||W_1.x_1+b_1-(W_1.x_2+b_1)||\leq 1^2||x_1-x_2||.$$
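
The same chain of inequalities holds for unconstrained layers, with the product of the per-layer constants as the bound. The sketch below uses plain NumPy and a hypothetical two-layer network (sizes 16, 8 and 4 are arbitrary) to compare this composition bound with the ratios actually observed on random input pairs.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical layer sizes 16 -> 8 -> 4, chosen only for illustration.
W1, b1 = rng.normal(size=(8, 16)), rng.normal(size=8)
W2, b2 = rng.normal(size=(4, 8)), rng.normal(size=4)

def network(x):
    """Dense -> ReLU -> Dense, without any Lipschitz constraint."""
    return W2 @ relu(W1 @ x + b1) + b2

# Composition bound: product of the per-layer Lipschitz constants
# (spectral norm for each dense layer, 1 for ReLU).
bound = np.linalg.norm(W1, ord=2) * 1.0 * np.linalg.norm(W2, ord=2)

max_ratio = 0.0
for _ in range(1000):
    x1, x2 = rng.normal(size=16), rng.normal(size=16)
    ratio = np.linalg.norm(network(x1) - network(x2)) / np.linalg.norm(x1 - x2)
    max_ratio = max(max_ratio, ratio)

print(f"composition bound: {bound:.3f}, largest observed ratio: {max_ratio:.3f}")
```

The bound is an upper bound only: the observed ratios can be noticeably smaller than the product of the layer constants.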


The `deel-lip` package lets users create 1-Lipschitz neural networks by providing the means to enforce a Lipschitz constant of 1 on a selected set of layers (such as dense layers).   
It also ensures that 1-Lipschitz continuity is retained while the model is being trained by managing the changes to trainable parameters.
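
This notebook does not detail how the constraint is maintained during training. As a rough illustration only, and not `deel-lip`'s actual implementation, one common way to bound the Lipschitz constant of a dense layer is to estimate the largest singular value of its weight matrix by power iteration and divide the weights by it. The helper `spectral_normalize` below is hypothetical and written in plain NumPy.

```python
import numpy as np

def spectral_normalize(W, n_iter=100, seed=0):
    """Rescale W so that its largest singular value is approximately 1."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iter):
        # Alternate between right and left singular vector estimates.
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # estimate of the largest singular value
    return W / sigma

rng = np.random.default_rng(42)
W = rng.normal(size=(32, 64))
W_normalized = spectral_normalize(W)

print("spectral norm before:", np.linalg.norm(W, ord=2))
print("spectral norm after: ", np.linalg.norm(W_normalized, ord=2))  # close to 1
```

In a training loop, such a rescaling would have to be reapplied after each weight update, since a gradient step can push the spectral norm above 1 again.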


## 🧱 Creating a 1-Lipschitz neural network with `deel-lip` and `keras` <a id='deel_keras'></a> <a name='deel_keras'></a>
`keras` is an open-source, high-level deep learning API written in Python. It lets you build, train, and deploy deep learning models.

A neural network architecture can be defined in `keras` with a few lines of code, as shown in the toy multi-layer perceptron (MLP) example below:

In [4]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Model

input_shape = (28, 28, 1)
num_classes=10

# a basic model that does not follow any Lipschitz constraint
model = keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(32, activation='relu'),
        layers.Dense(num_classes, activation='softmax')
    ])
    
model.compile(optimizer='adam',
          loss='categorical_crossentropy',
          metrics=['accuracy'])

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense (Dense)               (None, 64)                50240     
                                                                 
 dense_1 (Dense)             (None, 32)                2080      
                                                                 
 dense_2 (Dense)             (None, 10)                330       
                                                                 
Total params: 52650 (205.66 KB)
Trainable params: 52650 (205.66 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


Alternatively, it is equivalent to write:

In [10]:
inputs = keras.layers.Input(input_shape)
x = keras.layers.Flatten()(inputs)
x = layers.Dense(64, activation='relu')(x)
x = layers.Dense(32, activation='relu')(x)
y = layers.Dense(num_classes, activation='softmax')(x)
model = Model(inputs=inputs, outputs=y)
model.summary()

Model: "model_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_8 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 flatten_4 (Flatten)         (None, 784)               0         
                                                                 
 dense_18 (Dense)            (None, 64)                50240     
                                                                 
 dense_19 (Dense)            (None, 32)                2080      
                                                                 
 dense_20 (Dense)            (None, 10)                330       
                                                                 
Total params: 52650 (205.66 KB)
Trainable params: 52650 (205.66 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


`deel-lip` extends `keras`' capabilities by introducing custom `layers` and `model` modules, to provide the ability to control the Lipschitz constant of layers objects or of complete neural networks, while keeping a user-friendly interface.

Below is a 1-Lipschitz replication of the previous MLP toy-example, using `deel-lip`:

In [2]:
import deel
from deel import lip

activation=lip.activations.GroupSort2

In [12]:
K1_model = lip.model.Sequential([    
        keras.layers.Input(shape=input_shape),
        keras.layers.Flatten(),
        lip.layers.SpectralDense(64, activation=activation()),
        lip.layers.SpectralDense(32, activation=activation()),
        lip.layers.SpectralDense(num_classes, activation=None)
    ],
    
    k_coef_lip=1,

)

K1_model.compile(optimizer='adam',
          loss='categorical_crossentropy',
          metrics=['accuracy'])

K1_model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_6 (Flatten)         (None, 784)               0         
                                                                 
 spectral_dense_3 (Spectral  (None, 64)                100481    
 Dense)                                                          
                                                                 
 spectral_dense_4 (Spectral  (None, 32)                4161      
 Dense)                                                          
                                                                 
 spectral_dense_5 (Spectral  (None, 10)                661       
 Dense)                                                          
                                                                 
Total params: 105303 (411.34 KB)
Trainable params: 52650 (205.66 KB)
Non-trainable params: 52653 (205.68 KB)
___________

Alternatively, it is equivalent to write:

In [11]:
inputs = keras.layers.Input(input_shape)
x = keras.layers.Flatten()(inputs)
x = lip.layers.SpectralDense(64, activation=activation(),k_coef_lip=1.)(x)
x = lip.layers.SpectralDense(32, activation=activation(),k_coef_lip=1.)(x)
y = lip.layers.SpectralDense(num_classes, activation=None,k_coef_lip=1.)(x)
K1_model = lip.model.Model(inputs=inputs, outputs=y)
K1_model.summary()



Model: "model_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_9 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 flatten_5 (Flatten)         (None, 784)               0         
                                                                 
 spectral_dense (SpectralDe  (None, 64)                100481    
 nse)                                                            
                                                                 
 spectral_dense_1 (Spectral  (None, 32)                4161      
 Dense)                                                          
                                                                 
 spectral_dense_2 (Spectral  (None, 10)                661       
 Dense)                                                          
                                                           

To summarize, there are two ways to set the Lipschitz constant of a model to 1:
- by specifying the Lipschitz constant of each of its layers through their `k_coef_lip` parameter, when using a `lip.model.Model` object:

In [6]:
inputs = keras.layers.Input(input_shape)
x = keras.layers.Flatten()(inputs)
# k_coef_lip sets the Lipschitz constant of the layer. Its value is 1 by default.
x = lip.layers.SpectralDense(64, activation=activation(),k_coef_lip=1.)(x)
x = lip.layers.SpectralDense(32, activation=activation(),k_coef_lip=1.)(x)
y = lip.layers.SpectralDense(num_classes, activation=None,k_coef_lip=1.)(x)

In [7]:
K1_model = lip.model.Model(inputs=inputs, outputs=y)

- by specifying the Lipschitz constant of the whole model through the `k_coef_lip` attribute of a `Sequential` object, e.g.:

In [8]:
K1_model = lip.model.Sequential([    
        # ... (layers, as in the example above)
    ],
    # This parameter sets the Lipschitz constant of the whole model. Its value is 1 by default.
    k_coef_lip=1,
)

💡
Keep in mind that all the classes above inherit from their respective `keras` equivalents (e.g. `SpectralDense` inherits from `Dense`).   <br>
As a result, these objects conveniently use the same interface and the same parameters as their `keras` equivalents, with the additional parameter `k_coef_lip` that controls the Lipschitz constant.

## 🔨 Design rules for 1-Lipschitz neural networks with `deel-lip`  <a id='design'></a> <a name='design'></a>
**Layer selection: `deel-lip` vs `keras`**  
<br/> 
In our 1-Lipschitz MLP examples above, we have used a mixture of objects from the `layers` submodules of both `keras` and `deel-lip` (e.g. the `Input` layer from `keras`, the `SpectralDense` layer from `deel-lip`).

More generally, for the particular types of layers that do not interfere with the Lipschitz property of any neural network they belong to, no alternative has been coded in `deel-lip` and the existing `keras` layer object can be used. 

This is the case for the following keras layers: `MaxPooling`, `GlobalMaxPooling`, `Flatten` and `Input`.

Below is the full list of `keras` layers for which `deel-lip` provides a Lipschitz equivalent. If one wants to ensure a model's Lipschitz continuity, the alternative `deel-lip` layers must be employed instead of the original `keras` counterparts.

| tensorflow.keras.layers | deel.lip.layers |
| --------------- | --------------- |
| `Dense`    | `SpectralDense`<br>|
| `Conv2D`   | `SpectralConv2D`<br>  |
|  `AveragePooling2D`<br>`GlobalAveragePooling2D` | `ScaledAveragePooling2D`<br>`ScaledGlobalAveragePooling2D`|

<br/>

💡 Although there are additional Lipschitz continuous layers available in `deel-lip`, the ones mentioned above are perfectly suitable and recommended for practical use. Interested readers can find information about the other layers [here](#documentation).
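
To give some intuition for the "scaled" pooling layers in the table above: plain average pooling over non-overlapping $p \times p$ windows has a Lipschitz constant of $1/p$ in the L2 norm, so multiplying its output by $p$ brings the constant back to exactly 1. The sketch below checks this in plain NumPy for $p = 2$. That `ScaledAveragePooling2D` uses exactly this factor is an assumption here; the API documentation is the reference. `average_pool` and `lipschitz_ratio` are hypothetical helpers.

```python
import numpy as np

def average_pool(x, p=2):
    """Non-overlapping p x p average pooling of an (H, W) array.

    H and W must be multiples of p.
    """
    h, w = x.shape
    return x.reshape(h // p, p, w // p, p).mean(axis=(1, 3))

def lipschitz_ratio(f, x1, x2):
    """Ratio ||f(x1) - f(x2)|| / ||x1 - x2|| in the Euclidean norm."""
    return np.linalg.norm(f(x1) - f(x2)) / np.linalg.norm(x1 - x2)

# Worst case for average pooling: two constant images.
x1, x2 = np.ones((4, 4)), np.zeros((4, 4))

plain = lipschitz_ratio(average_pool, x1, x2)
scaled = lipschitz_ratio(lambda x: 2.0 * average_pool(x), x1, x2)
print(plain, scaled)  # 0.5 1.0
```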

<br>  


🚨 **Note:** *When creating a 1-Lipschitz neural network, one should avoid using the following layers:*<br> 
- `Dropout`: Our current recommendation is to avoid using it, as we have not yet fully understood how it affects learning of 1-Lipschitz neural networks
- `BatchNormalization`: It is not 1-Lipschitz
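
To see why `BatchNormalization` breaks the constraint, recall that at inference time it applies a fixed per-feature affine map whose slope is $\gamma / \sqrt{\sigma^2 + \epsilon}$. The sketch below, in plain NumPy with hypothetical statistics (a single feature with a small moving variance), shows that this slope can be much larger than 1.

```python
import numpy as np

def batch_norm_inference(x, gamma, beta, moving_mean, moving_var, eps=1e-3):
    """Batch normalization at inference time: a fixed per-feature affine map."""
    return gamma * (x - moving_mean) / np.sqrt(moving_var + eps) + beta

# Hypothetical statistics: one feature whose moving variance is small.
gamma, beta = 1.0, 0.0
moving_mean, moving_var = 0.0, 0.01

x1, x2 = np.array([0.0]), np.array([1.0])
y1 = batch_norm_inference(x1, gamma, beta, moving_mean, moving_var)
y2 = batch_norm_inference(x2, gamma, beta, moving_mean, moving_var)

ratio = np.linalg.norm(y1 - y2) / np.linalg.norm(x1 - x2)
print(ratio)  # about 9.5, well above 1
```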


**Activation function selection:**

The ReLU and softmax activation functions are both Lipschitz continuous with a Lipschitz constant of 1. 

However, as can be seen in our examples, the following is perfectly suitable and recommended for practical use:
- using the `GroupSort2` activation function stored in the `activations` submodule of `deel-lip` for the intermediate layers of a 1-Lipschitz neural network.
- not using any activation function for the last layer of 1-Lipschitz neural networks.
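
For intuition about `GroupSort2`, the sketch below implements a group-sort activation with groups of size 2 in plain NumPy: the units are split into consecutive pairs and each pair is sorted. The exact pairing and ordering used by `deel-lip` are assumptions here, and `group_sort2` is a hypothetical helper, not the library's code. Since the operation only permutes its inputs, it preserves the Euclidean norm, and it is 1-Lipschitz.

```python
import numpy as np

def group_sort2(x):
    """Sort each consecutive pair of entries of a 1-D array in ascending order."""
    pairs = x.reshape(-1, 2)  # requires an even number of entries
    return np.sort(pairs, axis=1).reshape(x.shape)

x = np.array([3.0, -1.0, 0.5, 2.0])
y = group_sort2(x)
print(y)  # pairs (3, -1) and (0.5, 2) become (-1, 3) and (0.5, 2)

# The output is a permutation of the input, so the norm is unchanged.
print(np.isclose(np.linalg.norm(x), np.linalg.norm(y)))

# Empirical check of the 1-Lipschitz property on random pairs of inputs.
rng = np.random.default_rng(0)
max_ratio = 0.0
for _ in range(1000):
    a, b = rng.normal(size=8), rng.normal(size=8)
    ratio = np.linalg.norm(group_sort2(a) - group_sort2(b)) / np.linalg.norm(a - b)
    max_ratio = max(max_ratio, ratio)
print(max_ratio)  # stays at or below 1
```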

💡 Interested readers can find information relevant to other 1-Lipschitz activation functions that exist within `deel-lip` [here](https://deel-ai.github.io/deel-lip/api/layers/).


**Loss function selection:**

One can use `keras` loss functions to train 1-Lipschitz neural networks. Doing so will not interfere with the 1-Lipschitz continuity of the model.  

💡 `deel-lip` also has a `losses` submodule that contains several loss functions. They are parametrized to let the user create adversarially robust functions, which is not the case with `keras` loss functions. 


## 🎉 Congratulations
You now know how to create 1-Lipschitz neural networks!

In the next tutorial, we will see how to train and assess adversarially robust 1-Lipschitz neural networks on a classification task, using `deel-lip`'s `losses` submodule.