# MNIST tutorial

This notebook introduces the basic usage of the library.

## Imports

The library is based on tensorflow.

In [1]:
import tensorflow as tf

### lip-dp dependencies

We need a model, `DP_Sequential`, that handles the noisification of gradients. It is composed of layers from `layers` and trained with a loss from `losses`. The model is configured through the convenience object `DPParameters`.

In [2]:
from deel.lipdp import layers
from deel.lipdp import losses
from deel.lipdp.model import DP_Sequential
from deel.lipdp.model import DPParameters
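
To make "noisification of gradients" concrete, here is a minimal sketch of the Gaussian mechanism in plain Python. It is not the code used by `DP_Sequential`: the function name `noisify_gradient` and the `sensitivity` argument are hypothetical, and the sensitivity is simply passed in as a number rather than derived from the model.

```python
import random


def noisify_gradient(gradient, sensitivity, noise_multiplier, rng):
    """Add i.i.d. Gaussian noise with std = noise_multiplier * sensitivity.

    `sensitivity` is an assumed upper bound on how much one sample can
    change the gradient; here it is given by the caller.
    """
    std = noise_multiplier * sensitivity
    return [g + rng.gauss(0.0, std) for g in gradient]


rng = random.Random(0)  # seeded only to make the illustration repeatable
clean = [0.5, -1.0, 2.0]
noisy = noisify_gradient(clean, sensitivity=0.1, noise_multiplier=2.0, rng=rng)
print(noisy)
```

A larger noise multiplier gives stronger privacy at the cost of a noisier update; with a multiplier of zero the gradient is returned unchanged.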

The `DP_Accountant` callback keeps track of the $(\epsilon,\delta)$-DP guarantees epoch after epoch. In practice we are often interested in reaching the best validation accuracy under a privacy budget $\epsilon$: the convenience function `get_max_epochs` helps with this by running a bisection search over the number of epochs to find how long we can train within that budget.

In [3]:
from deel.lipdp.model import DP_Accountant
from deel.lipdp.sensitivity import get_max_epochs

The framework requires control over the maximum norm of the inputs. This can be enforced, for example, with input clipping via `bound_clip_value`.

In [4]:
from deel.lipdp.pipeline import bound_clip_value
from deel.lipdp.pipeline import load_and_prepare_data

## Setup DP Lipschitz model

Here we apply the "global" strategy, with a noise multiplier of $2.0$. Note that for MNIST the dataset size is $N=60,000$, and it is recommended that $\delta<\frac{1}{N}$. We therefore propose a value of $\delta=10^{-5}$.

In [5]:
dp_parameters = DPParameters(
    noisify_strategy="global",
    noise_multiplier=2.0,
    delta=1e-5,
)

epsilon_max = 3.0
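
As a quick sanity check of the recommendation $\delta<\frac{1}{N}$, the arithmetic can be spelled out in a few lines. The variable names below are illustrative and not part of the library.

```python
dataset_size = 60_000  # MNIST training set size, as stated above
delta = 1e-5           # value passed to DPParameters

threshold = 1.0 / dataset_size  # roughly 1.67e-5
print(f"1/N = {threshold:.3e}, delta = {delta:.1e}")
print("delta < 1/N:", delta < threshold)
```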

### Loading the data

We clip the elementwise input upper-bound to $20.0$.

In [6]:
import warnings
warnings.filterwarnings("ignore")

# The data loader returns dataset_metadata, which holds the
# information required for privacy accounting
# (dataset size, number of samples, max input bound...)
input_upper_bound = 20.0
ds_train, ds_test, dataset_metadata = load_and_prepare_data(
    "mnist",
    batch_size=1000,
    drop_remainder=True,  # accounting assumes fixed batch size
    bound_fct=bound_clip_value(  # other strategies are possible, like normalization.
        input_upper_bound
    ),  # clipping preprocessing allows to control input bound
)
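
One common way to enforce a bound on the input norm is to rescale any sample whose L2 norm exceeds the bound. The sketch below illustrates that idea on a flat list of values; it is an assumption about what such a clipping step can look like, not the implementation of `bound_clip_value`, and the helper name `clip_l2_norm` is hypothetical.

```python
import math


def clip_l2_norm(sample, upper_bound):
    """Rescale `sample` so that its L2 norm is at most `upper_bound`."""
    norm = math.sqrt(sum(v * v for v in sample))
    if norm <= upper_bound:
        return list(sample)  # already within the bound, left untouched
    scale = upper_bound / norm
    return [v * scale for v in sample]


small = clip_l2_norm([3.0, 4.0], upper_bound=20.0)    # norm 5, unchanged
large = clip_l2_norm([30.0, 40.0], upper_bound=20.0)  # norm 50, rescaled
print(small, large)
```

Samples already inside the ball of radius `upper_bound` are left as they are, so the direction of every sample is preserved and only its magnitude can shrink.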



### Build the DP model

We imitate the Keras interface. We use common layers found in deel-lip, wrapped so that they also handle the bound propagation.

In [7]:
# construct DP_Sequential
model = DP_Sequential(
    # works like usual sequential but requires DP layers
    layers=[
        # BoundedInput works like Input, but performs input clipping to guarantee input bound
        layers.DP_BoundedInput(
            input_shape=dataset_metadata.input_shape, upper_bound=input_upper_bound
        ),
        layers.DP_QuickSpectralConv2D( # Reshaped Kernel Orthogonalization (RKO) convolution.
            filters=32,
            kernel_size=3,
            kernel_initializer="orthogonal",
            strides=1,
            use_bias=False,  # No biases since the framework handles a single tf.Variable per layer.
        ),
        layers.DP_GroupSort(2),  # GNP activation function.
        layers.DP_ScaledL2NormPooling2D(pool_size=2, strides=2),  # GNP pooling.
        layers.DP_QuickSpectralConv2D( # Reshaped Kernel Orthogonalization (RKO) convolution.
            filters=64,
            kernel_size=3,
            kernel_initializer="orthogonal",
            strides=1,
            use_bias=False,  # No biases since the framework handles a single tf.Variable per layer.
        ),
        layers.DP_GroupSort(2),  # GNP activation function.
        layers.DP_ScaledL2NormPooling2D(pool_size=2, strides=2),  # GNP pooling.
        
        layers.DP_Flatten(),   # Convert feature maps to a flat vector.
        
        layers.DP_QuickSpectralDense(512),  # GNP layer with orthogonal weight matrix.
        layers.DP_GroupSort(2),
        layers.DP_QuickSpectralDense(dataset_metadata.nb_classes),
    ],
    dp_parameters=dp_parameters,
    dataset_metadata=dataset_metadata,
)
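
The activation used above is described as gradient norm preserving (GNP). A minimal sketch of a group-sort activation with group size 2, assuming it sorts each consecutive pair of features in ascending order, shows why: the output is a permutation of the input, so the norm cannot change. The function `group_sort2` is a hypothetical stand-in, not the `DP_GroupSort` layer.

```python
def group_sort2(features):
    """Sort each consecutive pair of features in ascending order."""
    if len(features) % 2 != 0:
        raise ValueError("the number of features must be even")
    out = []
    for i in range(0, len(features), 2):
        a, b = features[i], features[i + 1]
        out.extend([min(a, b), max(a, b)])
    return out


x = [3.0, -1.0, 0.5, 2.0]
y = group_sort2(x)
print(y)
# The values are only reordered, so the squared norm is preserved.
print(sum(v * v for v in x), sum(v * v for v in y))
```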

We compile the model with:
* any first-order optimizer (e.g., SGD). No adaptation or special optimizer is needed.
* a loss with a known Lipschitz constant, e.g., categorical cross-entropy with temperature.
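
To illustrate what a cross-entropy "with temperature" can look like, here is a small sketch assuming the usual formulation: the logits are multiplied by a temperature `tau` before the softmax, and the resulting loss is divided by `tau`. This is an assumption about the definition, not a copy of `DP_TauCategoricalCrossentropy`, and the function name is hypothetical.

```python
import math


def tau_categorical_crossentropy(y_true, logits, tau):
    """Cross-entropy of softmax(tau * logits) against y_true, divided by tau."""
    scaled = [tau * z for z in logits]
    # Numerically stable log-sum-exp.
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    log_probs = [s - log_norm for s in scaled]
    cce = -sum(t * lp for t, lp in zip(y_true, log_probs))
    return cce / tau


y_true = [0.0, 1.0, 0.0]
print(tau_categorical_crossentropy(y_true, [0.1, 0.3, -0.2], tau=1.0))
print(tau_categorical_crossentropy(y_true, [0.1, 0.3, -0.2], tau=18.0))
```

Under this formulation the gradient with respect to the logits is `softmax(tau * logits) - y_true`, whose norm stays bounded whatever the value of `tau`.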

In [8]:
model.compile(
    # Compile model using DP loss
    loss=losses.DP_TauCategoricalCrossentropy(18.0),
    # this method is compatible with any first order optimizer
    optimizer=tf.keras.optimizers.SGD(learning_rate=2e-4, momentum=0.9),
    metrics=["accuracy"],
)
model.summary()

Model: "dp__sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dp__bounded_input (DP_Bound  (None, 28, 28, 1)        0         
 edInput)                                                        
                                                                 
 dp__quick_spectral_conv2d (  (None, 26, 26, 32)       288       
 DP_QuickSpectralConv2D)                                         
                                                                 
 dp__group_sort (DP_GroupSor  (None, 26, 26, 32)       0         
 t)                                                              
                                                                 
 dp__scaled_l2_norm_pooling2  (None, 13, 13, 32)       0         
 d (DP_ScaledL2NormPooling2D                                     
 )                                                               
                                                    

Note that the model contains about $843$K parameters. Since training does not rely on gradient clipping, such architectures can be trained with batch sizes as large as $1000$ on a standard GPU.

Next, we compute the number of epochs. The final value of epsilon depends on `dp_parameters` and on the number of epochs, so to keep epsilon under control we compute the adequate number of epochs.

In [9]:
num_epochs = get_max_epochs(epsilon_max, model)

epoch bounds = (0, 512.0) and epsilon = 7.994426666195571 at epoch 512.0
epoch bounds = (0, 256.0) and epsilon = 5.34128917907949 at epoch 256.0
epoch bounds = (0, 128.0) and epsilon = 3.631964622805248 at epoch 128.0
epoch bounds = (64.0, 128.0) and epsilon = 2.4829841192119444 at epoch 64.0
epoch bounds = (64.0, 96.0) and epsilon = 3.089635897639078 at epoch 96.0
epoch bounds = (80.0, 96.0) and epsilon = 2.796528753679695 at epoch 80.0
epoch bounds = (88.0, 96.0) and epsilon = 2.952713799856404 at epoch 88.0
epoch bounds = (88.0, 92.0) and epsilon = 3.0216241846349847 at epoch 92.0
epoch bounds = (90.0, 92.0) and epsilon = 2.987618328313939 at epoch 90.0
epoch bounds = (90.0, 91.0) and epsilon = 3.0046212568846444 at epoch 91.0
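
The trace above halves the interval of candidate epochs at each step. A minimal sketch of such a bisection is given below, assuming that epsilon never decreases as the number of epochs grows. Both `max_epochs_under_budget` and the toy accountant `toy_epsilon` are hypothetical; the library's own routine relies on its privacy accountant and may treat the boundary differently.

```python
import math


def max_epochs_under_budget(epsilon_max, epsilon_at, upper=512):
    """Largest n in [0, upper] with epsilon_at(n) <= epsilon_max.

    Assumes epsilon_at is nondecreasing and that zero epochs cost no privacy.
    """
    if epsilon_at(upper) <= epsilon_max:
        return upper
    lo, hi = 0, upper  # invariant: budget holds at lo and is exceeded at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if epsilon_at(mid) <= epsilon_max:
            lo = mid
        else:
            hi = mid
    return lo


def toy_epsilon(epochs):
    """Toy stand-in for a privacy accountant (not a real DP bound)."""
    return 0.31 * math.sqrt(epochs)


num_epochs_sketch = max_epochs_under_budget(3.0, toy_epsilon)
print(num_epochs_sketch, toy_epsilon(num_epochs_sketch))
```

Each iteration costs one evaluation of the accountant, so an upper bound of 512 epochs needs about ten evaluations.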


## Train the model

The model can be trained, and the DP Accountant will automatically track the privacy loss.

In [10]:
hist = model.fit(
    ds_train,
    epochs=num_epochs,
    validation_data=ds_test,
    callbacks=[
        # accounting is done thanks to a callback
        DP_Accountant(log_fn="logging"),  # wandb.log also available.
    ],
)

Epoch 1/91




 (0.3227333785403041, 1e-05)-DP guarantees for epoch 1 

Epoch 2/91
 (0.41135036253440604, 1e-05)-DP guarantees for epoch 2 

Epoch 3/91
 (0.4972854400421322, 1e-05)-DP guarantees for epoch 3 

Epoch 4/91
 (0.5737399623472044, 1e-05)-DP guarantees for epoch 4 

Epoch 5/91
 (0.6418194146435952, 1e-05)-DP guarantees for epoch 5 

Epoch 6/91
 (0.7042008802236781, 1e-05)-DP guarantees for epoch 6 

Epoch 7/91
 (0.7616059152520757, 1e-05)-DP guarantees for epoch 7 

Epoch 8/91
 (0.8155744676428971, 1e-05)-DP guarantees for epoch 8 

Epoch 9/91
 (0.8666021691681208, 1e-05)-DP guarantees for epoch 9 

Epoch 10/91
 (0.9152742048884784, 1e-05)-DP guarantees for epoch 10 

Epoch 11/91
 (0.9617965624530973, 1e-05)-DP guarantees for epoch 11 

Epoch 12/91
 (1.0059716506359193, 1e-05)-DP guarantees for epoch 12 

Epoch 13/91
 (1.049398006635733, 1e-05)-DP guarantees for epoch 13 

Epoch 14/91
 (1.090263192229449, 1e-05)-DP guarantees for epoch 14 

Epoch 15/91
 (1.131126828240101, 1e-05)-DP guarant

The model can be further improved by tuning various hyper-parameters or by adding layers (see the `advanced_cifar10.ipynb` tutorial).