# Federated Learning with Differential Privacy

Please make sure you set up a virtual environment and follow the [example root readme](../../README.md) before starting this notebook.
Then, install the requirements.

<div class="alert alert-block alert-info"> <b>NOTE</b> Some of the cells below generate long text output.  We're using <pre>%%capture --no-display --no-stderr cell_output</pre> to suppress this output.  Comment or delete this line in the cells below to restore full output.</div>

In [None]:
%%capture --no-display --no-stderr cell_output
import sys
!{sys.executable} -m pip install -r requirements.txt

### Differential Privacy (DP)
[Differential Privacy (DP)](https://arxiv.org/abs/1910.00962) [1] is a rigorous mathematical framework designed to provide strong privacy guarantees when handling sensitive data. In the context of Federated Learning (FL), DP safeguards user information by introducing randomness into the training process, typically by adding carefully calibrated noise to the model updates (such as gradients or weights), for example before they are transmitted from clients to the central server. This makes it statistically difficult to infer whether any individual data point contributed to a particular update, thereby protecting user-specific information.

By integrating DP into FL, even if an adversary gains access to the aggregated updates or models, the added noise limits how accurately they can deduce sensitive details about any individual client's data. Common approaches include:

1. **Local Differential Privacy (LDP)**, where noise is added directly on the client side before updates are sent.
2. **Global Differential Privacy (GDP)**, where noise is injected after aggregation at the server.

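The balance between privacy and model utility is typically managed through a privacy budget (ϵ), which bounds how much the output of a computation can depend on any single individual's data. A smaller ϵ gives a stronger privacy guarantee but requires more noise, which usually lowers model accuracy.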
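
To make the role of ϵ concrete, the sketch below uses the classic Laplace mechanism. It is not part of NVFlare and is included only for illustration: a numeric query with sensitivity Δ is released with added noise drawn from $\mathrm{Lap}(\Delta/\epsilon)$. The helper names `laplace_mechanism` and `mean_abs_error` and the example count are hypothetical.

```python
import numpy as np


def laplace_mechanism(value, sensitivity, epsilon, rng, size=None):
    """Release `value` with epsilon-DP by adding Laplace noise of scale sensitivity / epsilon."""
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=size)


def mean_abs_error(epsilon, sensitivity=1.0, n=10_000, seed=0):
    """Empirical mean absolute error of the Laplace mechanism for a given epsilon."""
    rng = np.random.default_rng(seed)
    true_count = 42.0  # hypothetical count query (sensitivity 1)
    noisy = laplace_mechanism(true_count, sensitivity, epsilon, rng, size=n)
    return float(np.mean(np.abs(noisy - true_count)))


for eps in (10.0, 1.0, 0.1):
    # Smaller epsilon -> larger noise scale -> larger error (stronger privacy).
    print(f"epsilon={eps}: mean abs error ~ {mean_abs_error(eps):.2f}")
```

Because the mean absolute value of Laplace noise equals its scale, the error grows roughly as 1/ϵ: each tenfold decrease in ϵ gives about ten times more noise.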


As a first example, we show you how to add **local** DP filters to your FL training in NVFlare. Here, we use the Sparse Vector Technique (SVT), implemented by the [SVTPrivacy](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.filters.svt_privacy.html) filter, as utilized in [Li et al. 2019](https://arxiv.org/abs/1910.00962) [1] (see [Lyu et al. 2016](https://arxiv.org/abs/1603.01699) [2] for more information).

DP is added as an NVFlare `Filter` using the [FedJob API](https://nvflare.readthedocs.io/en/main/programming_guide/fed_job_api.html#fedjob-api) introduced in prior chapters.

#### Sparse Vector Technique

The [Sparse Vector Technique](https://arxiv.org/abs/1603.01699) (SVT) enhances privacy by applying noise and thresholding so that only a subset of the model weights or updates, $x$, is shared. The process consists of two main steps:

1. **Noisy thresholding:** Laplace noise is added to the absolute value of each candidate weight, and only weights whose noisy magnitude exceeds a threshold are selected for sharing:

$|x| + \mathrm{Lap}(s)$

2. **Noisy release and clipping:** Laplace noise is added to the selected values, which are then clipped to the predefined range $[-\gamma, \gamma]$ before being shared:

$\mathrm{clip}(x + \mathrm{Lap}(s), \gamma)$

Here, $|x|$ denotes the absolute value, $\mathrm{Lap}(s)$ is noise sampled from a Laplace distribution with scale $s$, $\gamma$ is the predefined clipping bound, and $\mathrm{clip}(x, \gamma)$ restricts values to the range $[-\gamma, \gamma]$.
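
The two SVT steps can be sketched in NumPy as follows. This is a simplified illustration written for this notebook, not the source of NVFlare's `SVTPrivacy` filter: the helper `svt_share`, the threshold `tau`, and the separate noise scales for the threshold, the queries, and the release are hypothetical parameters that are not calibrated to a specific privacy budget.

```python
import numpy as np


def svt_share(x, fraction, tau, gamma, threshold_scale, query_scale, release_scale, rng):
    """Simplified sparse-vector-style sharing of a flat update vector `x`.

    All noise scales are illustrative assumptions, not calibrated to a specific epsilon.
    Returns a vector of the same shape where unselected entries are zero.
    """
    x = np.clip(x, -gamma, gamma)  # bound each coordinate's contribution
    n_share = int(np.ceil(fraction * x.size))  # maximum number of entries to share
    noisy_threshold = tau + rng.laplace(scale=threshold_scale)

    # Step 1: noisy thresholding, compare |x| + Lap(s) against the noisy threshold.
    noisy_mag = np.abs(x) + rng.laplace(scale=query_scale, size=x.size)
    above = np.flatnonzero(noisy_mag >= noisy_threshold)
    selected = rng.permutation(above)[:n_share]  # random subset of the candidates

    # Step 2: add fresh Laplace noise to the selected entries and clip to [-gamma, gamma].
    shared = np.zeros_like(x)
    noisy_vals = x[selected] + rng.laplace(scale=release_scale, size=selected.size)
    shared[selected] = np.clip(noisy_vals, -gamma, gamma)
    return shared


rng = np.random.default_rng(0)
update = rng.normal(scale=1e-3, size=1000)  # stand-in for a flattened weight difference
shared = svt_share(update, fraction=0.1, tau=1e-4, gamma=1e-3,
                   threshold_scale=1e-4, query_scale=1e-4, release_scale=1e-4, rng=rng)
print(np.count_nonzero(shared), "of", update.size, "entries shared")
```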

Experimental results (see [1]) show a trade-off between model performance and privacy preservation: stronger privacy guarantees may degrade model performance more severely.

## Run experiments with FL simulator
For simplicity, we focus on training a small CNN for federated CIFAR-10 classification (see its definition in [net.py](src/net.py)). The FL simulator is used to run the FL experiments.

The experiments are separated into three parts:

1. Train a model using the FedAvg algorithm with two clients without DP.
2. Train the same model using DP added as an NVFlare `Filter`.
3. Train the same model using [Opacus'](https://opacus.ai) PrivacyEngine on the client to implement local [DP-SGD](https://arxiv.org/abs/1607.00133) [3]. In this case, DP noise is added during each optimization step of local training, so we can skip the additional DP filter.

## 0. Download the CIFAR-10 data
First, we download the CIFAR-10 dataset once, up front, so that clients don't overwrite each other's local copy of the dataset during this simulation.

In [None]:
import torchvision
DATASET_PATH = "/tmp/nvflare/data"
torchvision.datasets.CIFAR10(root=DATASET_PATH, download=True)

## 1. Train without DP
#### 1.1 Define a FedJob
The `FedJob` is used to define how controllers and executors are placed within a federated job using the `to(object, target)` routine.

Here we use a PyTorch `BaseFedJob`, where we can define the job name and the initial global model.
The `BaseFedJob` automatically configures components for model persistence, model selection, and TensorBoard streaming for convenience.

In [None]:
from src.net import Net

from nvflare.app_common.workflows.fedavg import FedAvg
from nvflare.app_opt.pt.job_config.base_fed_job import BaseFedJob
from nvflare.job_config.script_runner import ScriptRunner

job = BaseFedJob(
    name="cifar10_fedavg",
    initial_model=Net(),
)

#### 1.2 Define the Controller Workflow
Define the controller workflow and send it to the server. For simplicity, we run the simulation for only a few rounds, but you can increase `num_rounds` so that the model converges.

In [None]:
n_clients = 2

controller = FedAvg(
    num_clients=n_clients,
    num_rounds=3,  # 30 rounds should converge
)
job.to(controller, "server")

That completes the components that need to be defined on the server.
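
For intuition about what the server-side `FedAvg` controller computes each round, here is a NumPy sketch of FedAvg aggregation as a weighted average of client parameters. Weighting by local sample counts is an assumption made for illustration; NVFlare's controller may use different weights, and `fedavg_aggregate` is a hypothetical helper.

```python
import numpy as np


def fedavg_aggregate(client_weights, num_samples):
    """Weighted average of client parameter dicts (name -> np.ndarray).

    Assumption: each client is weighted by its number of local training samples.
    """
    total = float(sum(num_samples))
    return {
        name: sum(w[name] * (n / total) for w, n in zip(client_weights, num_samples))
        for name in client_weights[0]
    }


site1 = {"fc.weight": np.array([1.0, 2.0]), "fc.bias": np.array([0.0])}
site2 = {"fc.weight": np.array([3.0, 4.0]), "fc.bias": np.array([1.0])}
global_model = fedavg_aggregate([site1, site2], num_samples=[100, 300])
print(global_model)  # fc.weight ≈ [2.5, 3.5], fc.bias ≈ [0.75]
```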

#### 1.3 Add clients
Next, we can use the `ScriptRunner` and send it to each of the clients to run our training script.

Note that our script could have additional input arguments, such as batch size or data path, but we don't use them here for simplicity.

In [None]:
for i in range(n_clients):
    runner = ScriptRunner(
        script="src/cifar10_fl.py"
    )
    job.to(runner, f"site-{i+1}")

That's it!

#### 1.4 Optionally export the job
Now, we could export the job and submit it to a real NVFlare deployment using the [Admin client](https://nvflare.readthedocs.io/en/main/real_world_fl/operation.html) or [FLARE API](https://nvflare.readthedocs.io/en/main/real_world_fl/flare_api.html).

In [None]:
job.export_job("job_configs")

#### 1.5 Run FL Simulation
Finally, we can run our FedJob in simulation using NVFlare's [simulator](https://nvflare.readthedocs.io/en/main/user_guide/nvflare_cli/fl_simulator.html) under the hood.

The results will be saved in the specified `workdir`.

In [None]:
job.simulator_run(f"/tmp/nvflare/{job.name}")

## 2. Add DP as an NVFlare Filter
#### 2.1 Run FL Simulation with DP
Run the FL simulator with two clients for federated learning with differential privacy. The key difference is that we now use the `job.to()` method to add a filter to each client that applies DP before the model updates are sent back to the server.

Let's create a new FedJob with DP added through the [SVTPrivacy](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.filters.html#nvflare.app_common.filters.SVTPrivacy) filter, which implements the **sparse vector technique** for differential privacy [[2]](https://arxiv.org/abs/1603.01699). Note that the `epsilon` parameter used here is different from the epsilon defining the privacy budget in DP. See [[1]](https://arxiv.org/abs/1910.00962) and [[2]](https://arxiv.org/abs/1603.01699) for more details on its usage.

> **Note:** Use `filter_type=FilterType.TASK_RESULT` as we are adding the filter on top of the model updates after local training.
> 
> Furthermore, this filter was developed for use with weight differences. So, we use `params_transfer_type=TransferType.DIFF` here when specifying the `ScriptRunner`.
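
To see what `TransferType.DIFF` means conceptually, the NumPy sketch below has each client send the difference between its locally trained weights and the global weights it received, and the server add the average difference back to the global model. The helpers `compute_diff` and `apply_mean_diff` are hypothetical, and the unweighted mean is a simplification; a filter such as `SVTPrivacy` would modify each client's diff before it is sent.

```python
import numpy as np


def compute_diff(local, global_):
    """Client side: local weights minus the global weights received at the start of the round."""
    return {name: local[name] - global_[name] for name in global_}


def apply_mean_diff(global_, diffs):
    """Server side: add the (unweighted) mean of the client diffs to the global weights."""
    return {name: global_[name] + np.mean([d[name] for d in diffs], axis=0) for name in global_}


global_w = {"w": np.array([1.0, 1.0])}
local_a = {"w": np.array([1.5, 0.5])}  # site-1 after local training
local_b = {"w": np.array([1.0, 2.0])}  # site-2 after local training

diffs = [compute_diff(local_a, global_w), compute_diff(local_b, global_w)]
new_global = apply_mean_diff(global_w, diffs)
print(new_global["w"])  # [1.25 1.25]
```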

In [None]:
from nvflare import FilterType
from nvflare.client.config import TransferType
from nvflare.app_common.filters import SVTPrivacy

# Create BaseFedJob with the initial model
job = BaseFedJob(
    name="cifar10_fedavg_dp",
    initial_model=Net(),
)

# Define the controller and send to server
controller = FedAvg(
    num_clients=n_clients,
    num_rounds=3,  # 100 rounds should converge
)
job.to_server(controller)

# Add clients
for i in range(n_clients):
    runner = ScriptRunner(
        script="src/cifar10_fl.py",
        params_transfer_type=TransferType.DIFF
    )
    job.to(runner, f"site-{i+1}")

    # add privacy filter.
    dp_filter = SVTPrivacy(fraction=0.9, epsilon=0.1, noise_var=0.1, gamma=1e-5)
    job.to(dp_filter, f"site-{i+1}", tasks=["train"], filter_type=FilterType.TASK_RESULT)

# Optionally export the configuration
job.export_job("job_configs")

Next, start the training.

In [None]:
job.simulator_run(f"/tmp/nvflare/{job.name}")

> **Note:** You can also combine this filter with other privacy filters or customize it. For example, use the [PercentilePrivacy](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.filters.html#nvflare.app_common.filters.PercentilePrivacy) filter based on Shokri and Shmatikov ([Privacy-preserving deep learning, CCS '15](https://dl.acm.org/doi/abs/10.1145/2810103.2813687)) or the [ExcludeVars](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.filters.html#nvflare.app_common.filters.ExcludeVars) filter to exclude variables that shouldn't be shared with the server.
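
The idea behind percentile-based sharing is to release only the largest-magnitude entries of an update and bound them. The NumPy sketch below is a simplified illustration written for this notebook, not the `PercentilePrivacy` source; the helper `percentile_share` is hypothetical and the exact semantics of its `percentile` and `gamma` arguments are assumptions.

```python
import numpy as np


def percentile_share(delta, percentile=10.0, gamma=0.01):
    """Zero out entries whose magnitude is at or below the given percentile of |delta|,
    then clip the remaining entries to [-gamma, gamma]."""
    cutoff = np.percentile(np.abs(delta), percentile)
    shared = np.where(np.abs(delta) > cutoff, delta, 0.0)
    return np.clip(shared, -gamma, gamma)


delta = np.array([0.001, -0.02, 0.005, 0.03, -0.0005])
result = percentile_share(delta, percentile=40, gamma=0.01)
print(result)  # ≈ [0, -0.01, 0.005, 0.01, 0]
```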

## 3. Run DP-SGD with Privacy Budgeting during local training
To implement local DP-SGD during client training, we can use [Opacus' PrivacyEngine](https://opacus.ai/). For that, we modify our training script to attach the privacy engine to the model, optimizer, and data loader. For example:
```python
from opacus import PrivacyEngine

# `model`, `optimizer`, and `data_loader` are the regular PyTorch objects
# already defined in the training script.
privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.1,  # Gaussian noise std, relative to max_grad_norm
    max_grad_norm=1.0,  # per-sample gradient clipping bound (L2 norm)
)
```

The rest of the training code stays the same. To enable DP-SGD in this example, we pass a `--target_epsilon` argument to our [training script](src/cifar10_fl.py) via `script_args` when creating the `ScriptRunner`.
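
Conceptually, DP-SGD [3] replaces the averaged mini-batch gradient with a privatized one: each per-example gradient is clipped to L2 norm `max_grad_norm` (C), the clipped gradients are summed, Gaussian noise with standard deviation `noise_multiplier * max_grad_norm` (σC) is added, and the result is divided by the batch size. The NumPy sketch below mirrors only that arithmetic for a single step. It is not how Opacus is implemented internally (Opacus computes per-sample gradients for PyTorch modules and tracks the privacy budget for you), and `dp_sgd_gradient` is a hypothetical helper.

```python
import numpy as np


def dp_sgd_gradient(per_example_grads, max_grad_norm, noise_multiplier, rng):
    """Privatized gradient for one DP-SGD step.

    per_example_grads: array of shape (batch_size, num_params).
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Scale down any per-example gradient whose L2 norm exceeds max_grad_norm.
    clipped = per_example_grads * np.minimum(1.0, max_grad_norm / np.maximum(norms, 1e-12))
    # Gaussian noise with std sigma * C is added once to the summed gradient.
    noise = rng.normal(0.0, noise_multiplier * max_grad_norm, size=per_example_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]


rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0], [0.3, 0.4]])  # per-example L2 norms: 5.0 and 0.5

# Sanity check without noise: [3, 4] is clipped to [0.6, 0.8]; [0.3, 0.4] is unchanged.
g_clip_only = dp_sgd_gradient(grads, max_grad_norm=1.0, noise_multiplier=0.0, rng=rng)
print(g_clip_only)  # ≈ [0.45, 0.6]

# With noise, using the same values as the Opacus snippet above.
g_private = dp_sgd_gradient(grads, max_grad_norm=1.0, noise_multiplier=1.1, rng=rng)
print(g_private)
```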

In [None]:
from nvflare.client.config import TransferType

# Create BaseFedJob with the initial model
job = BaseFedJob(
    name="cifar10_fedavg_dpsgd",
    initial_model=Net(),
)

# Define the controller and send to server
controller = FedAvg(
    num_clients=n_clients,
    num_rounds=3,  # 100 rounds should converge
)
job.to_server(controller)

# Add clients
for i in range(n_clients):
    runner = ScriptRunner(
        script="src/cifar10_fl.py",
        script_args="--target_epsilon=50.0",  # lower epsilon will increase privacy but impact accuracy more
        params_transfer_type=TransferType.DIFF
    )
    job.to(runner, f"site-{i+1}")

# Optionally export the configuration
job.export_job("job_configs")

Again, we can start the training using the simulator call.

In [None]:
job.simulator_run(f"/tmp/nvflare/{job.name}")

## 4. Visualize the results
Finally, you can plot the results by running `tensorboard --logdir /tmp/nvflare` in a new terminal. In this notebook, we only run a few FL rounds for simplicity. If you set `num_rounds` to the recommended values noted in the comments of the FedAvg controller definitions, you can run the experiments until convergence. As the curves show, the model trained with DP (red) takes more rounds to reach comparable training performance, but it carries less risk of leaking private information than the model trained without DP (orange). For more details on how to apply this filter in a medical imaging use case, see [Li et al. 2019](https://arxiv.org/abs/1910.00962) [1].

![TensorBoard Training curve of FedAvg without and with DP](tb_curve_dp.png)

### Summary
This notebook explores two methods for adding Differential Privacy (DP) noise to model training: using an NVFlare `Filter` or integrating Opacus within your local training script. These approaches enhance privacy by reducing the risk of memorizing individual data points. 

DP remains an active research area, and the optimal choice of parameters depends on your specific problem and risk tolerance. A smaller epsilon provides stronger privacy but introduces more noise, potentially lowering accuracy. This trade-off should be carefully navigated. A recommended technique from [Opacus](https://opacus.ai/tutorials/building_image_classifier#Tips-and-Tricks) is to pre-train on public data before fine-tuning on private data.  

Further research into quantifying model memorization and data leakage during training can provide deeper insights into privacy risks. For more details, refer to our paper on [gradient inversion](../../../../../../research/quantifying-data-leakage/README.md) [[4]](https://arxiv.org/abs/2202.06924), which is also implemented in NVFlare.

For training large language models with DP, refer to the latest Opacus examples, which can be seamlessly integrated into NVFlare deployments. If you are using TensorFlow, you can achieve similar privacy protections with [TensorFlow Privacy](https://www.tensorflow.org/responsible_ai/privacy/guide).

Next, we will learn how to protect the model updates using [homomorphic encryption](../05.3_homomorphic_encryption/05.3.1_privacy_with_homormorphic_encryption.ipynb).

#### References
[1] Li, W., Milletarì, F., Xu, D., Rieke, N., Hancox, J., Zhu, W., Baust, M., Cheng, Y., Ourselin, S., Cardoso, M. J., & Feng, A. (2019). Privacy-preserving federated brain tumour segmentation. In International Workshop on Machine Learning in Medical Imaging (pp. 133-141). Springer, Cham.

[2] Lyu, M., Su, D., & Li, N. (2016). Understanding the sparse vector technique for differential privacy. arXiv preprint arXiv:1603.01699.

[3] Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., & Zhang, L. (2016). Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (pp. 308-318).

[4] Hatamizadeh, A., Yin, H., Molchanov, P., Myronenko, A., Li, W., Dogra, P., ... & Roth, H. R. (2023). Do gradient inversion attacks make federated learning unsafe?. IEEE Transactions on Medical Imaging, 42(7), 2044-2056.