# Differential Privacy in flwr Part 2
## Step by step guide

In this tutorial, we will learn the recommended best practices to build an effective differential privacy setting in federated learning using Flower. 

This notebook provides a step-by-step guide, code examples, and best practices for implementing user-level differential privacy guarantees in federated learning systems.

You may want to look at the previous tutorial, where we introduced Differential Privacy in federated learning with Flower ([part 1](Flower-5-Differential-Privacy-in-flwr-part-1.ipynb)), the introductory notebook (again, using [Flower](https://flower.dev/) and [PyTorch](https://pytorch.org/)).


> [Star Flower on GitHub](https://github.com/adap/flower) ⭐️ and join the Flower community on Slack to connect, ask questions, and get help: [Join Slack](https://flower.dev/join-slack) 🌼 We'd love to hear from you in the `#introductions` channel! And if anything is unclear, head over to the `#questions` channel.


In this notebook, we will demonstrate the recommended best practice for training models with user-level Differential Privacy.

We will train a model (30 rounds) with a good trade-off between utility and privacy on CIFAR-10, partitioning the training set into smaller per-client datasets. With more training rounds we could obtain a higher-accuracy private model, but not one as accurate as a model trained without DP.

Let's get started!

### Installing dependencies

Next, we install the necessary packages for PyTorch (`torch` and `torchvision`) and Flower (`flwr`):

In [1]:
# Run this cell first
%pip install -q "flwr[simulation]" torch torchvision matplotlib


In [None]:
# !pip install  flwr[simulation] torch torchvision matplotlib

Now that we have all dependencies installed, we can import everything we need for this tutorial:

In [3]:
from collections import OrderedDict
from typing import Dict, List, Optional, Tuple

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import CIFAR10

import flwr as fl
from flwr.common import Metrics
from flwr.server.history import History

DEVICE = torch.device("cpu")  # Try "cuda" to train on GPU
print(
    f"Training on {DEVICE} using PyTorch {torch.__version__} and Flower {fl.__version__}"
)



### Downloading and preprocessing the data

We will train a convolutional neural network (CNN) on the popular CIFAR-10 dataset (10 classes) in a federated learning setting with 50 clients.

In [None]:
#Setting our constants

NUM_CLIENTS = 50
BATCH_SIZE = 32


def load_datasets(num_clients: int):
    # Download and transform CIFAR-10 (train and test)
    transform = transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
    )
    trainset = CIFAR10("./dataset", train=True, download=True, transform=transform)
    testset = CIFAR10("./dataset", train=False, download=True, transform=transform)

    # Split the training set into `num_clients` partitions, one per simulated client
    partition_size = len(trainset) // num_clients
    lengths = [partition_size] * num_clients
    datasets = random_split(trainset, lengths, torch.Generator().manual_seed(42))

    # Split each partition into train/val and create DataLoader
    trainloaders = []
    valloaders = []
    for ds in datasets:
        len_val = len(ds) // 10  # 10 % validation set
        len_train = len(ds) - len_val
        lengths = [len_train, len_val]
        ds_train, ds_val = random_split(ds, lengths, torch.Generator().manual_seed(42))
        trainloaders.append(DataLoader(ds_train, batch_size=BATCH_SIZE, shuffle=True))
        valloaders.append(DataLoader(ds_val, batch_size=BATCH_SIZE))
    testloader = DataLoader(testset, batch_size=BATCH_SIZE)
    return trainloaders, valloaders, testloader


trainloaders, valloaders, testloader = load_datasets(NUM_CLIENTS)
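
As a quick sanity check of the split above, the integer arithmetic can be reproduced without downloading anything. This is a minimal sketch that assumes the standard CIFAR-10 training set size of 50,000 images; `partition_sizes` is a hypothetical helper introduced only for illustration and is not part of the notebook.

```python
def partition_sizes(num_train: int, num_clients: int, val_divisor: int = 10):
    """Mirror the integer arithmetic used in `load_datasets`."""
    partition_size = num_train // num_clients  # examples per client
    len_val = partition_size // val_divisor    # 10 % validation split
    len_train = partition_size - len_val       # remaining training examples
    return partition_size, len_train, len_val


# 50,000 CIFAR-10 training images shared by 50 clients
print(partition_sizes(50_000, 50))  # (1000, 900, 100)
```

Under these assumptions each client ends up with 900 training and 100 validation examples.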


### Model, Training and Evaluation functions

Let's define our model (including `set_parameters` and `get_parameters`), training and test functions:

In [None]:
class Net(nn.Module):
    def __init__(self) -> None:
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


def get_parameters(net) -> List[np.ndarray]:
    return [val.cpu().numpy() for _, val in net.state_dict().items()]


def set_parameters(net, parameters: List[np.ndarray]):
    params_dict = zip(net.state_dict().keys(), parameters)
    state_dict = OrderedDict({k: torch.Tensor(v) for k, v in params_dict})
    net.load_state_dict(state_dict, strict=True)


def train(net, trainloader, epochs: int):
    """Train the network on the training set."""
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(net.parameters())
    net.train()
    for epoch in range(epochs):
        correct, total, epoch_loss = 0, 0, 0.0
        for images, labels in trainloader:
            images, labels = images.to(DEVICE), labels.to(DEVICE)
            optimizer.zero_grad()
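            outputs = net(images)
            # Reuse the forward pass instead of running the network twice
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            # Metrics (use .item() so we accumulate a float, not a graph-attached tensor)
            epoch_loss += loss.item()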
            total += labels.size(0)
            correct += (torch.max(outputs.data, 1)[1] == labels).sum().item()
        epoch_loss /= len(trainloader.dataset)
        epoch_acc = correct / total
        print(f"Epoch {epoch+1}: train loss {epoch_loss}, accuracy {epoch_acc}")


def test(net, testloader):
    """Evaluate the network on the entire test set."""
    criterion = torch.nn.CrossEntropyLoss()
    correct, total, loss = 0, 0, 0.0
    net.eval()
    with torch.no_grad():
        for images, labels in testloader:
            images, labels = images.to(DEVICE), labels.to(DEVICE)
            outputs = net(images)
            loss += criterion(outputs, labels).item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    loss /= len(testloader.dataset)
    accuracy = correct / total
    return loss, accuracy
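
The `16 * 5 * 5` input size of `fc1` in `Net` follows from the usual output-size formula for convolution and pooling layers applied to 32x32 CIFAR-10 images. The sketch below only reproduces that arithmetic; `conv_out` and `pool_out` are hypothetical helpers, and no padding with stride 1 for the convolutions is assumed, matching the `nn.Conv2d(…, 5)` calls above.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, kernel: int = 2, stride: int = 2) -> int:
    """Spatial output size of a max-pooling layer along one dimension."""
    return (size - kernel) // stride + 1


size = 32                           # CIFAR-10 images are 32x32
size = pool_out(conv_out(size, 5))  # conv1 (5x5) + 2x2 pool: 32 -> 28 -> 14
size = pool_out(conv_out(size, 5))  # conv2 (5x5) + 2x2 pool: 14 -> 10 -> 5
flat_features = 16 * size * size    # 16 output channels from conv2
print(flat_features)  # 400
```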

### Flower client

We implement a regular `flwr.client.NumPyClient` with the three methods `get_parameters`, `fit`, and `evaluate`, and then wrap it in `flwr.client.DPFedAvgNumPyClient` inside `client_fn`. In the Flower 1.x releases that ship the DP-FedAvg strategies, `DPFedAvgNumPyClient` is a wrapper around an existing `NumPyClient`: it calls the wrapped client's `fit` and then clips (and optionally noises) the resulting update. Overriding `fit` in a direct subclass would bypass that clipping. Here, we also pass the `cid` to the client and use it to log additional details:

In [None]:
class FlowerClient(fl.client.NumPyClient):
    def __init__(self, cid, net, trainloader, valloader):
        self.cid = cid
        self.net = net
        self.trainloader = trainloader
        self.valloader = valloader

    def get_parameters(self, config):
        print(f"[Client {self.cid}] get_parameters")
        return get_parameters(self.net)

    def fit(self, parameters, config):
        print(f"[Client {self.cid}] fit, config: {config}")
        set_parameters(self.net, parameters)
        # Use the number of local epochs sent by the server, defaulting to 1
        train(self.net, self.trainloader, epochs=int(config.get("local_epochs", 1)))
        return get_parameters(self.net), len(self.trainloader), {}

    def evaluate(self, parameters, config):
        print(f"[Client {self.cid}] evaluate, config: {config}")
        set_parameters(self.net, parameters)
        loss, accuracy = test(self.net, self.valloader)
        return float(loss), len(self.valloader), {"accuracy": float(accuracy)}


def client_fn(cid) -> fl.client.NumPyClient:
    net = Net().to(DEVICE)
    trainloader = trainloaders[int(cid)]
    valloader = valloaders[int(cid)]
    # Wrap the plain client so that its updates are clipped (and optionally noised)
    return fl.client.DPFedAvgNumPyClient(FlowerClient(cid, net, trainloader, valloader))

# **Strategy customization**

## Simplifying Assumptions

### DP-FedAvg Modification
DP-FedAvg was originally proposed by McMahan, H. Brendan, et al., "Learning differentially private recurrent language models", and later extended by [Andrew et al. 2021, Differentially Private Learning with Adaptive Clipping](https://arxiv.org/abs/1905.03871). It is essentially FedAvg with a few modifications.

To guarantee DP at the user level, we need to apply the following modifications to the Federated Averaging algorithm:

 - **Clipping:** The clients' model updates must be clipped before transmission to the server, bounding the maximum influence of any one client. The distribution of the update norm has been shown to vary from task to task and to evolve as training progresses. Therefore, we use the adaptive approach of [Andrew et al. 2021, Differentially Private Learning with Adaptive Clipping](#dp-fedavg-modification), which continuously adjusts the clipping threshold to track a prespecified quantile of the update norm distribution.
 - **Noising:** Gaussian noise must be added by either the server or the clients. Adding noise can degrade the utility of the model, but we can control this through the standard deviation of the Gaussian noise added to the sum and the number of clients sampled at each round.

 > *We provide users with the flexibility to set up the training such that each client independently adds a small amount of noise to the clipped update, with the result that simply aggregating the noisy updates is equivalent to the explicit addition of noise to the non-noisy aggregate at the server.*


 > *Remarks:* To do this, we first need to determine how much noise the model can tolerate with the chosen (relatively small) number of clients per round while keeping the loss in model utility acceptable. We will eventually train the final model with an increased amount of noise and a proportionally increased number of clients.
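
The clipping and noising steps described above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not Flower's implementation: `l2_norm`, `clip_by_l2`, and `add_gaussian_noise` are helpers defined here, updates are flat lists of floats, and the per-client noise scale relies on the standard fact that the sum of `m` independent Gaussians with standard deviation `s / sqrt(m)` has standard deviation `s`.

```python
import math
import random


def l2_norm(update):
    return math.sqrt(sum(x * x for x in update))


def clip_by_l2(update, clip_norm):
    """Scale `update` down so that its L2 norm is at most `clip_norm`."""
    norm = l2_norm(update)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def add_gaussian_noise(update, stddev, rng):
    """Add independent zero-mean Gaussian noise to every coordinate."""
    return [x + rng.gauss(0.0, stddev) for x in update]


rng = random.Random(0)
clip_norm, noise_multiplier, num_sampled_clients = 0.3, 1.0, 25

# Noise the summed update should carry, and the per-client share that adds up to it
server_stddev = noise_multiplier * clip_norm
client_stddev = server_stddev / math.sqrt(num_sampled_clients)

# Made-up 4-dimensional updates standing in for real model deltas
updates = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(num_sampled_clients)]
clipped = [clip_by_l2(u, clip_norm) for u in updates]
noisy = [add_gaussian_noise(u, client_stddev, rng) for u in clipped]

# Unweighted average of the noisy, clipped updates
aggregate = [sum(col) / num_sampled_clients for col in zip(*noisy)]
print(max(l2_norm(u) for u in clipped) <= clip_norm + 1e-12)  # True
```

Because each clipped update has norm at most `clip_norm`, adding or removing one client changes the sum by at most `clip_norm`, which is what the Gaussian noise is calibrated against.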

### Simplifying assumptions for the training process

To ensure that the training process realises the $(\epsilon, \delta)$ guarantees at the user level, we make the following assumptions:

- **Fixed-size subsampling:** Fixed-size subsamples of the clients must be taken at each round, as opposed to variable-sized Poisson subsamples.

- **Unweighted averaging:** The contributions from all the clients must be weighted equally in the aggregate, which eliminates the requirement for the server to know in advance the sum of the weights of all clients available for selection.

- **No client failures:** The set of available clients must stay constant across all rounds of training. In other words, clients cannot drop out or fail.
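
To make the first two assumptions concrete, the sketch below contrasts fixed-size with Poisson subsampling and shows an unweighted mean. It is a toy illustration in plain Python; the helper names `sample_fixed_size`, `sample_poisson`, and `unweighted_mean` are hypothetical and do not correspond to Flower functions.

```python
import random


def sample_fixed_size(client_ids, num_sampled, rng):
    """Fixed-size subsampling: exactly `num_sampled` distinct clients per round."""
    return rng.sample(client_ids, num_sampled)


def sample_poisson(client_ids, q, rng):
    """Poisson subsampling: each client joins independently with probability q."""
    return [cid for cid in client_ids if rng.random() < q]


def unweighted_mean(updates):
    """Every client counts once, regardless of how many examples it holds."""
    return [sum(col) / len(updates) for col in zip(*updates)]


rng = random.Random(42)
client_ids = list(range(50))

fixed_sizes = [len(sample_fixed_size(client_ids, 25, rng)) for _ in range(5)]
poisson_sizes = [len(sample_poisson(client_ids, 0.5, rng)) for _ in range(5)]
print(fixed_sizes)    # the cohort size is 25 in every round
print(poisson_sizes)  # the cohort size can differ from round to round

print(unweighted_mean([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```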


 > *Note:* The dataset should be large enough to support the selected number of clients. Also note that, because we use adaptive clipping and the noise multiplier is the ratio of the noise standard deviation to the clipping norm, the magnitude of the noise will change from round to round.
 >
 > *The above assumptions are in line with the constraints imposed by [Andrew et al.](#dp-fedavg-modification)*
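
Why the noise magnitude changes from round to round can be seen from the geometric clipping-norm update described by Andrew et al.: the norm is multiplied by `exp(-lr * (b - target_quantile))`, where `b` is the fraction of sampled clients whose update was not clipped. The sketch below is a simplified illustration, not Flower's code: it ignores the noise that the full method adds to `b`, and the learning rate `lr=0.2` and the sequence of unclipped fractions are assumptions chosen for the example.

```python
import math


def update_clip_norm(clip_norm, unclipped_fraction, target_quantile=0.5, lr=0.2):
    """Geometric update that moves the clip norm toward the target quantile."""
    return clip_norm * math.exp(-lr * (unclipped_fraction - target_quantile))


def noise_stddev(noise_multiplier, clip_norm):
    """The noise multiplier is the ratio of noise stddev to clipping norm."""
    return noise_multiplier * clip_norm


clip_norm = 0.3
for unclipped_fraction in [0.1, 0.2, 0.4, 0.5]:  # hypothetical per-round values
    clip_norm = update_clip_norm(clip_norm, unclipped_fraction)
    print(f"clip_norm={clip_norm:.4f}, noise stddev={noise_stddev(1.0, clip_norm):.4f}")
```

When fewer clients than the target quantile stay under the threshold, the clipping norm grows, and with a fixed noise multiplier the noise standard deviation grows with it.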


# Determine the noise sensitivity of the model

For our setting we have 50 clients. With the 50,000 CIFAR-10 training images split evenly and a 10% validation split, each client holds 900 training and 100 validation examples. Since the number of training examples on each client is small, let's configure the clients to perform 3 local training epochs. We also adjust the fraction of clients selected for training during each round (we don't want all 50 clients participating in every round): we set `fraction_fit` to `0.5`, which means that only 50% of the available clients (25 clients) will be selected for training each round.


> *NB:* All these specifications are wrapped in a custom function called `dp_experiment`, so that the experiment is easy to repeat with different settings.
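
The number of clients selected per round follows from the fraction and the minimum set on the strategy. The sketch below assumes the selection rule used by Flower's `FedAvg` in the 1.x releases (truncate `fraction * available` to an integer, then enforce the minimum); `num_selected` is a hypothetical helper written for this illustration.

```python
def num_selected(num_available: int, fraction: float, min_clients: int) -> int:
    """Clients sampled per round: truncated fraction, but never below the minimum."""
    return max(int(num_available * fraction), min_clients)


print(num_selected(50, 0.5, 3))   # 25 clients for training
print(num_selected(50, 0.25, 3))  # 12 clients for evaluation
```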


In [None]:
# Define metric aggregation function
def unweighted_average(metrics: List[Tuple[int, Metrics]]) -> Metrics:
    # Give every client the same weight (1), regardless of its number of examples
    num_examples = 1
    accuracies = [num_examples * m["accuracy"] for _, m in metrics]
    examples = [num_examples for _ in metrics]

    # Aggregate and return custom metric (unweighted average)
    return {"accuracy": sum(accuracies) / sum(examples)}


# Configure the clients to perform 3 local training epochs
def fit_config(server_round: int):
    config = {
        "server_round": server_round,
        "local_epochs": 3,
    }
    return config

def dp_experiment(NUM_CLIENTS, NUM_ROUNDS, noise_multiplier)-> History:
    
    strategy1 = fl.server.strategy.FedAvg(
        fraction_fit=0.5,  # Train on 50% of the clients each round (25 of 50)
        fraction_evaluate=0.25,  # Evaluate on 25% of the clients each round (12 of 50)
        min_fit_clients=3,
        min_evaluate_clients=3,
        min_available_clients=NUM_CLIENTS,
        evaluate_metrics_aggregation_fn=unweighted_average,
        on_fit_config_fn=fit_config,
        initial_parameters=fl.common.ndarrays_to_parameters(get_parameters(Net())),
    )

    strategy = fl.server.strategy.DPFedAvgAdaptive(
        strategy=strategy1,
        num_sampled_clients=strategy1.num_fit_clients(NUM_CLIENTS)[0],
        init_clip_norm=0.3,
        noise_multiplier=noise_multiplier,  # Must be chosen carefully, together with clip_count_stddev below
        server_side_noising=False,
        clip_norm_target_quantile=0.5,
        clip_count_stddev=None,  # If not specified, it takes the default value `self.num_sampled_clients / 20.0`
    )

    # Specify client resources if you need GPU (defaults to 1 CPU and 0 GPU)
    client_resources = {"num_cpus": 1, "num_gpus": 0}
    if DEVICE.type == "cuda":
        client_resources = {"num_gpus": 1}

    # Start simulation
    return fl.simulation.start_simulation(
        client_fn=client_fn,
        num_clients= NUM_CLIENTS,
        config=fl.server.ServerConfig(num_rounds=NUM_ROUNDS),  # Number of federated rounds
        strategy=strategy,
        client_resources=client_resources,
    )

In [None]:
record = []
NUM_ROUNDS = 5

for noise_multiplier in [0.0, 0.5, 0.75, 1.0]:
  print(f'Starting training with noise multiplier: {noise_multiplier}')
  hist = dp_experiment(NUM_CLIENTS, NUM_ROUNDS, noise_multiplier)
  record.append({"Noise multiplier": noise_multiplier, "History": hist})
# print(record)

In [None]:
print([i['History'].metrics_distributed for i in record])
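
The raw printout above is hard to compare across noise levels. As a hedged sketch, the helpers below pull the last recorded accuracy out of each run, assuming that `History.metrics_distributed` maps a metric name to a list of `(round, value)` pairs. `final_accuracy` and `summarize` are hypothetical names introduced here.

```python
def final_accuracy(metrics_distributed, key="accuracy"):
    """Return the last (round, value) pair recorded for `key`, or None if absent."""
    values = metrics_distributed.get(key, [])
    return values[-1] if values else None


def summarize(record):
    """Map each noise multiplier to the final (round, accuracy) pair of its run."""
    return {
        entry["Noise multiplier"]: final_accuracy(entry["History"].metrics_distributed)
        for entry in record
    }
```

Once the training loop above has finished, `summarize(record)` gives one entry per noise multiplier, which makes it easier to see at which noise level the accuracy starts to degrade.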

### Setting up the $(\epsilon, \delta)$ analysis for scaled FL


Two metrics are used to express the DP guarantee of an ML algorithm:

1.   Epsilon ($\epsilon$) - This is the privacy budget. It measures the strength of the privacy guarantee by bounding how much the probability of a particular model output can vary by including (or excluding) a single training point. A smaller value for $\epsilon$ implies a better privacy guarantee. However, the $\epsilon$ value is only an upper bound and a large value could still mean good privacy in practice.

2. Delta ($\delta$) - Bounds the probability of the privacy guarantee not holding. A general rule is to set it to be less than the inverse of the size of the training dataset. We set it to $10^{-4}$ here; note that CIFAR-10 has 50,000 training points, so this rule of thumb would call for a value below $1/50{,}000 = 2 \times 10^{-5}$.
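
For reference, the two quantities above come from the standard definition of $(\epsilon, \delta)$-differential privacy, stated here in its general textbook form rather than as anything specific to Flower. A randomized mechanism $M$ is $(\epsilon, \delta)$-differentially private if, for all neighbouring datasets $D$ and $D'$ and every set of outputs $S$:

$$
\Pr[M(D) \in S] \le e^{\epsilon} \, \Pr[M(D') \in S] + \delta
$$

For user-level DP, the assumption is that $D$ and $D'$ are neighbouring when they differ in all the data of a single client rather than in a single training example.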


> We use `dp_accounting.calibrate_dp_mechanism` to search over the number of clients per round. The privacy accountant (`RdpAccountant`) we use to estimate privacy given a `dp_accounting.DpEvent` is based on [Wang et al. (2018)](https://arxiv.org/abs/1808.00087) and [Mironov et al. (2019)](https://arxiv.org/pdf/1908.10530.pdf).

In [None]:
%pip install --quiet --upgrade dp-accounting

In [None]:
import dp_accounting

In [None]:
total_clients = 50000
noise_to_clients_ratio = 0.005
target_delta = 1e-4
target_eps = 2
rounds = 100

# Initialize arguments to dp_accounting.calibrate_dp_mechanism.

# No-arg callable that returns a fresh accountant.
make_fresh_accountant = dp_accounting.rdp.RdpAccountant

# Create function that takes expected clients per round and returns a 
# dp_accounting.DpEvent representing the full training process.
def make_event_from_param(clients_per_round):
  q = clients_per_round / total_clients
  noise_multiplier = clients_per_round * noise_to_clients_ratio
  gaussian_event = dp_accounting.GaussianDpEvent(noise_multiplier)
  sampled_event = dp_accounting.PoissonSampledDpEvent(q, gaussian_event)
  composed_event = dp_accounting.SelfComposedDpEvent(sampled_event, rounds)
  return composed_event

# Create object representing the search range [1, total_clients].
bracket_interval = dp_accounting.ExplicitBracketInterval(1, total_clients)

# Perform search for smallest clients_per_round achieving the target privacy.
clients_per_round = dp_accounting.calibrate_dp_mechanism(
    make_fresh_accountant, make_event_from_param, target_eps, target_delta,
    bracket_interval, discrete=True
)

noise_multiplier = clients_per_round * noise_to_clients_ratio
print(f'To get ({target_eps}, {target_delta})-DP, use {clients_per_round} '
      f'clients with noise multiplier {noise_multiplier}.')
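
To see what the search above is varying, it can help to tabulate the two quantities that `make_event_from_param` derives from `clients_per_round`. The sketch below repeats only that arithmetic with the same constants and does not compute any privacy guarantee; `accounting_inputs` is a hypothetical helper, and the candidate values 25, 100, and 400 are arbitrary.

```python
total_clients = 50_000
noise_to_clients_ratio = 0.005


def accounting_inputs(clients_per_round: int):
    """Sampling probability and noise multiplier implied by a cohort size."""
    q = clients_per_round / total_clients
    noise_multiplier = clients_per_round * noise_to_clients_ratio
    return q, noise_multiplier


for n in (25, 100, 400):  # arbitrary candidate cohort sizes
    q, z = accounting_inputs(n)
    print(f"clients_per_round={n}: q={q:.4f}, noise_multiplier={z:.3f}")
```

Holding `noise_to_clients_ratio` fixed means the noise multiplier grows in proportion to the cohort size, while the sampling probability stays small as long as the cohort is a small share of `total_clients`.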