# CIFAR CLASSIFICATION

### Imports

In [18]:
! set_cuda_version 11.2 8.1.0

Using CUDA 11.2 at /usr/local/cuda-11.2.
Using CUDNN 8.1.0 at /usr/local/cudnn/11.2-v8.1.0


In [19]:
import tensorflow as tf
import keras_cv

import deel.lipdp.layers as DP_layers
import deel.lipdp.losses as DP_losses
from deel.lipdp.pipeline import bound_clip_value
from deel.lipdp.pipeline import load_and_prepare_data
from deel.lipdp.sensitivity import get_max_epochs
from deel.lipdp.model import DP_Accountant
from deel.lipdp.model import DP_Sequential
from deel.lipdp.model import DPParameters
from deel.lipdp.model import AdaptiveLossGradientClipping

### Loading the data :

It is important to import the data with the right DP parameters to account properly for the privacy guarantees of the trained model.

In [20]:
# augmentations = [
#     keras_cv.layers.RandomRotation(0.2, fill_mode="reflect", interpolation="bilinear"),
#     keras_cv.layers.RandomTranslation(
#         0.2, 0.2, fill_mode="reflect", interpolation="bilinear"
#     ),
# ]

ds_train, ds_test, dataset_metadata = load_and_prepare_data(
    "cifar10",
    batch_size=750,
    colorspace="HSV",
    # augmentations=augmentations,
    drop_remainder=True,  # accounting assumes fixed batch size
    bound_fct=bound_clip_value(
        10.0
    ),  # clipping during preprocessing bounds the input norm
)

Note that the effective batch size in memory will be `batch_size` $\times$ `len(augmentations)`.
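
As a quick sanity check of this arithmetic, the sketch below computes the effective batch size and the number of steps per epoch. It assumes that the two commented-out `keras_cv` augmentations are enabled and that the CIFAR-10 training split holds 50,000 images; the variable names are illustrative and not part of the lipdp API.

```python
# Illustrative arithmetic only; these names are not part of the lipdp API.
batch_size = 750
num_augmentations = 2        # assumption: both commented-out augmentations are enabled
num_train_samples = 50_000   # size of the CIFAR-10 training split

# Effective batch size in memory, following the rule stated above.
effective_batch_size = batch_size * num_augmentations

# With drop_remainder=True, the last incomplete batch of each epoch is discarded.
steps_per_epoch = num_train_samples // batch_size
dropped_samples = num_train_samples - steps_per_epoch * batch_size

print(effective_batch_size, steps_per_epoch, dropped_samples)
```

With these assumptions, each epoch runs full batches only, which is what the fixed-batch-size accounting relies on.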

### Declaring the DP parameters :

We also need to declare explicitly the parameters of the DP training process.

In [21]:
dp_parameters = DPParameters(
    noisify_strategy="global",
    noise_multiplier=0.75,
    delta=1e-5,
)
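
For intuition about `noise_multiplier`, here is a minimal sketch of the generic Gaussian mechanism, in which the noise standard deviation is the noise multiplier times the $\ell_2$ sensitivity of the released quantity. This is a textbook illustration under that assumption, not a description of how lipdp applies noise internally; `add_gaussian_noise` and its arguments are hypothetical names.

```python
import random


def add_gaussian_noise(values, l2_sensitivity, noise_multiplier, rng):
    """Return `values` perturbed by i.i.d. Gaussian noise.

    The standard deviation is noise_multiplier * l2_sensitivity, as in the
    generic Gaussian mechanism (an assumption about the setting, not a
    description of lipdp internals).
    """
    sigma = noise_multiplier * l2_sensitivity
    return [v + rng.gauss(0.0, sigma) for v in values]


rng = random.Random(0)        # fixed seed so the sketch is reproducible
gradient = [0.3, -0.1, 0.2]   # toy gradient, for illustration only
noisy = add_gaussian_noise(
    gradient, l2_sensitivity=1.0, noise_multiplier=0.75, rng=rng
)
print(noisy)
```

A larger noise multiplier gives stronger privacy per step at the cost of noisier updates, which is why it interacts with the epoch budget computed later.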

### Defining the model :

We use a simple convolutional network to classify the CIFAR-10 dataset. We add a loss gradient clipping layer at the end of the network to tighten the upper bound on the gradient. Because the clipping constant is chosen dynamically, this gives better results with one less hyperparameter to tune.

In [22]:
layers = [
    DP_layers.DP_BoundedInput(
        input_shape=dataset_metadata.input_shape,
        upper_bound=dataset_metadata.max_norm,
    ),
    DP_layers.DP_SpectralConv2D(
        filters=32, kernel_size=3, use_bias=False, kernel_initializer="orthogonal"
    ),
    DP_layers.DP_Flatten(),
    DP_layers.DP_SpectralDense(
        units=512, use_bias=False, kernel_initializer="orthogonal"
    ),
    DP_layers.DP_GroupSort(2),
    DP_layers.DP_SpectralDense(
        units=10, use_bias=False, kernel_initializer="orthogonal"
    ),
    DP_layers.DP_ClipGradient(
        epsilon=1, mode="dynamic_svt", patience=5
    )
]

model = DP_Sequential(
    layers=layers, dp_parameters=dp_parameters, dataset_metadata=dataset_metadata
)

loss = DP_losses.DP_TauCategoricalCrossentropy(14.5)

# Compatible with any kind of non-private optimizer : 
opt = tf.keras.optimizers.SGD(learning_rate=1e-2)

model.compile(
    loss=loss,
    optimizer=opt,
    metrics=["accuracy"],
    run_eagerly=False,
)

  warn(_msg_not_lip.format(layer.name))
  warn(_msg_not_lip.format(layer.name))


### Define the desired DP guarantees :

We compute the maximum number of epochs that keeps training within the desired DP guarantees (here, a target $\epsilon$ of 8.0) :

In [23]:
num_epochs = get_max_epochs(8.0, model)

epoch bounds = (0, 512.0) and epsilon = 121.24060720498198 at epoch 512.0
epoch bounds = (0, 256.0) and epsilon = 66.38218380431994 at epoch 256.0
epoch bounds = (0, 128.0) and epsilon = 37.92323089410726 at epoch 128.0
epoch bounds = (0, 64.0) and epsilon = 23.425990726099066 at epoch 64.0
epoch bounds = (0, 32.0) and epsilon = 14.191037676444132 at epoch 32.0
epoch bounds = (0, 16.0) and epsilon = 9.561652236093542 at epoch 16.0
epoch bounds = (8.0, 16.0) and epsilon = 6.218921347499752 at epoch 8.0
epoch bounds = (8.0, 12.0) and epsilon = 8.108216363628433 at epoch 12.0
epoch bounds = (10.0, 12.0) and epsilon = 7.6115104852854385 at epoch 10.0
epoch bounds = (11.0, 12.0) and epsilon = 7.859863424456936 at epoch 11.0
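
The log above is consistent with a bisection over the number of epochs. The sketch below reproduces that search with a plain integer bisection, using the logged epsilon values as a stand-in for the real accountant. It is an assumption about the search strategy, not the actual `get_max_epochs` implementation; `epsilon_at` and `max_epochs_within_budget` are hypothetical names.

```python
# Epsilon values copied from the log above, keyed by epoch count.
LOGGED_EPSILON = {
    512: 121.24060720498198,
    256: 66.38218380431994,
    128: 37.92323089410726,
    64: 23.425990726099066,
    32: 14.191037676444132,
    16: 9.561652236093542,
    8: 6.218921347499752,
    12: 8.108216363628433,
    10: 7.6115104852854385,
    11: 7.859863424456936,
}


def epsilon_at(epochs):
    """Stand-in for the real accountant: look up a logged value."""
    return LOGGED_EPSILON[epochs]


def max_epochs_within_budget(target_epsilon, epsilon_fn, upper=512):
    """Largest epoch count whose epsilon stays at or below the target.

    Assumes epsilon_fn is non-decreasing in the number of epochs, that zero
    epochs cost nothing, and that epsilon_fn(upper) exceeds the target.
    """
    if epsilon_fn(upper) <= target_epsilon:
        raise ValueError("upper bound is still within budget; raise `upper`")
    low, high = 0, upper  # invariant: eps(low) <= target < eps(high)
    while high - low > 1:
        mid = (low + high) // 2
        if epsilon_fn(mid) <= target_epsilon:
            low = mid
        else:
            high = mid
    return low


sketch_epochs = max_epochs_within_budget(8.0, epsilon_at)
print(sketch_epochs)
```

The search relies on epsilon growing monotonically with the number of epochs, so each evaluation halves the remaining interval.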


### Train the model : 

Training is launched through the `model.fit` method. We use the following callbacks :

- **DP_Accountant** (log_fn) : accounts for the privacy guarantees after each epoch of training (*log_fn* makes it compatible with W&B logging).
- **AdaptiveLossGradientClipping** (ds_train) : automatically updates the loss's gradient clipping constant every *patience* epochs, where *patience* is set on the `DP_ClipGradient` layer.


In [24]:
callbacks = [
    DP_Accountant(log_fn="logging"),
    AdaptiveLossGradientClipping(
        ds_train=ds_train
    ),  # DO NOT USE THIS CALLBACK WHEN mode != "dynamic_svt"
]

hist = model.fit(
    ds_train,
    epochs=num_epochs,
    validation_data=ds_test,
    callbacks=callbacks,
)

On train begin : 
Initial value is now equal to lipschitz constant of loss:  tf.Tensor(1.4142135, shape=(), dtype=float32)
Epoch 1/11


 (3.334448232597648, 1e-05)-DP guarantees for epoch 1 

updated_clip_value :  1.006512355202473
Epoch 2/11
 (3.5845675174158074, 1e-05)-DP guarantees for epoch 2 

Epoch 3/11
 (3.832920482489297, 1e-05)-DP guarantees for epoch 3 

Epoch 4/11
 (4.081273443586172, 1e-05)-DP guarantees for epoch 4 

Epoch 5/11
 (5.225509601937803, 1e-05)-DP guarantees for epoch 5 

Epoch 6/11
 (5.47386252998526, 1e-05)-DP guarantees for epoch 6 

updated_clip_value :  1.0305780514590723
Epoch 7/11
 (5.722215469156756, 1e-05)-DP guarantees for epoch 7 

Epoch 8/11
 (5.970568408328255, 1e-05)-DP guarantees for epoch 8 

Epoch 9/11
 (6.218921347499752, 1e-05)-DP guarantees for epoch 9 

Epoch 10/11
 (7.363157546113941, 1e-05)-DP guarantees for epoch 10 

Epoch 11/11
 (7.6115104852854385, 1e-05)-DP guarantees for epoch 11 

updated_clip_value :  1.027746793075943
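
To check a run against the budget programmatically, the guarantee lines can be parsed from captured output. This is a minimal sketch, assuming the log format shown above and a target epsilon of 8.0; `parse_guarantees` is a hypothetical helper rather than a lipdp function, and only three of the logged lines are reproduced.

```python
import re

# Three guarantee lines copied from the training output above.
LOG = "\n".join(
    [
        "(3.334448232597648, 1e-05)-DP guarantees for epoch 1",
        "(7.363157546113941, 1e-05)-DP guarantees for epoch 10",
        "(7.6115104852854385, 1e-05)-DP guarantees for epoch 11",
    ]
)

PATTERN = re.compile(r"\(([^,]+),\s*([^)]+)\)-DP guarantees for epoch (\d+)")


def parse_guarantees(text):
    """Map epoch number -> (epsilon, delta) for each guarantee line."""
    return {
        int(m.group(3)): (float(m.group(1)), float(m.group(2)))
        for m in PATTERN.finditer(text)
    }


guarantees = parse_guarantees(LOG)
final_epsilon, final_delta = guarantees[max(guarantees)]
print(final_epsilon <= 8.0)
```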

