Skip to content


Fast Differential Privacy

Fast Differential Privacy (fastDP) is a library that allows differentially private optimization of PyTorch models, with a few additional lines of code. The goal of this library is to make DP deep learning as similar to the standard non-private learning as possible, in terms of speed, memory cost, scalability, accuracy and hyperparameter-tuning. It supports all PyTorch optimizers, popular models in TIMM, torchvision, HuggingFace (up to supported modules), multiple privacy accountants, multiple clipping functions/styles, most parameter-efficient training methods, and distribute solutions such as DeepSpeed and FSDP. The library has provably little overhead in terms of training time and memory cost, compared with the standard non-private optimization.


To install the library after Git clone, run

python -m setup develop

⚠️ NOTE: We strongly recommend Python>=3.8 and torch<=1.11 (it is a known issue that torch 1.12 can slow down as much as 3 times).

Getting started

To train a model with differential privacy, simply create a PrivacyEngine and continue the standard training pipeline:

from fastDP import PrivacyEngine
optimizer = SGD(model.parameters(), lr=0.05)
privacy_engine = PrivacyEngine(
# attaching to optimizers is not needed for multi-GPU distributed learning

#----- standard training pipeline
loss = F.cross_entropy(model(batch), labels)

We provide details about our privacy engine in fastDP/, including the supported modules and the arguments. By default, we use the 'MixOpt' (hybrid book-keeping [4]) clipping mode (which enjoys almost the same time complexity as non-private optimization), and the automatic clipping function [8] (which does not need to tune the clipping threshold max_grad_norm). We support RDP and GLW privacy accountant, and additional accountants can be used through the argument noise_multiplier, after its calculation from [Automating differential privacy computation] library.

Specifically, we allow the gradient accumulation to use very large batch size, which is beneficial to DP optimization:

for i, batch in enumerate(dataloader):
    loss = F.cross_entropy(model(batch), labels)
    if i % gradient_accumulation_steps == 0:


  1. This library enables large model training in the multi-GPU distributed setting and supports mixed precision training under DeepSpeed and FSDP.

The scalability has been tested on 100B models with 512 GPUs.

  1. This library enables DP training to have almost the same time and space complexity as the standard non-private training. This is achieved by three key techniques as described in [4]: mixed ghost norm, book-keeping, and ghost differentiation. In practice, we observe <20% memory overhead and <25% slowdown across different tasks.

  1. Specifically, this library overcomes the severe memory issues of large model (commonly encountered by Opacus, which computes the per-sample gradients) and high dimensional data (commonly encountered by ghost clipping, e.g. in Private transformers), by leveraging the mixed ghost norm trick [3,8].

  1. We support all optimizers in torch.optim (SGD, Adam, AdaGrad,...) and a wide range of models (BERT, RoBERTa, GPT, ViT, BEiT, CrossViT, DEiT, ResNet, VGG, DenseNet,...), including their parameter-efficient variants. For example, one can run DP bias-term fine-tuning (DP-BiTFiT) by simply freezing non-bias terms, as in examples/image_classification.

Full fine-tuning results on a single A100 GPU

Datasets ε Setting Model Accuracy Time(min)/epoch
CIFAR10 2 [6] ViT-large 98.9 7.0
CIFAR100 2 [6] BEiT-large 88.7 6.5
CelebA 3 [6] ResNet18 88.2 2.7
SST2 3 [8] RoBERTa-large 93.9 13.5
QNLI 3 [8] RoBERTa-large 91.0 20.2
QQP 3 [8] RoBERTa-large 86.8 70.0
MNLI 3 [8] RoBERTa-large 86.3/86.7 77.1

More datasets, epsilon budgets, models, fine-tuning styles, and different hyperparamters can be found in the related papers.


The examples folder covers tasks on the table-to-text (E2E and DART datasets with GPT2 models), the text classification (SST2/QNLI/QQP/MNLI datasets with BERT/RoBERTa models), and the image classification (CIFAR10/CIFAR100/CelebA datasets with TIMM/torchvision models). Detailed can be found in each sub-folder. These examples can be used to reproduce the results in [2,3,4,6,8].


Please consider citing the following if you want to use this library in your works:

  title={Differentially private optimization on large model at small cost},
  author={Bu, Zhiqi and Wang, Yu-Xiang and Zha, Sheng and Karypis, George},
  booktitle={International Conference on Machine Learning},

  title={Zero redundancy distributed learning with differential privacy},
  author={Bu, Zhiqi and Chiu, Justin and Liu, Ruixuan and Zha, Sheng and Karypis, George},
  booktitle={ICLR 2023 Workshop on Pitfalls of limited data and computation for Trustworthy ML},
  journal={arXiv preprint arXiv:2311.11822},

  title={Differentially Private Bias-Term Fine-tuning of Foundation Models},
  author={Bu, Zhiqi and Wang, Yu-Xiang and Zha, Sheng and Karypis, George},
  booktitle={Workshop on Trustworthy and Socially Responsible Machine Learning, NeurIPS 2022},


This codebase is largely inspired by [Opacus (v0.15)], [Private transformers (v0.2.3)], [Private Vision], and [FastGradClip].


[1] Goodfellow, Ian. "Efficient per-example gradient computations." arXiv preprint arXiv:1510.01799 (2015).

[2] Li, Xuechen, Florian Tramer, Percy Liang, and Tatsunori Hashimoto. "Large language models can be strong differentially private learners." arXiv preprint arXiv:2110.05679 (2021).

[3] Bu, Zhiqi, Jialin Mao, and Shiyun Xu. "Scalable and Efficient Training of Large Convolutional Neural Networks with Differential Privacy." arXiv preprint arXiv:2205.10683 (2022).

[4] Bu, Zhiqi, Yu-Xiang Wang, Sheng Zha, and George Karypis. "Differentially Private Optimization on Large Model at Small Cost." arXiv preprint arXiv:2210.00038 (2022).

[5] Yousefpour, Ashkan, Igor Shilov, Alexandre Sablayrolles, Davide Testuggine, Karthik Prasad, Mani Malek, John Nguyen et al. "Opacus: User-friendly differential privacy library in PyTorch." arXiv preprint arXiv:2109.12298 (2021).

[6] Bu, Zhiqi, Yu-Xiang Wang, Sheng Zha, and George Karypis. "Differentially Private Bias-Term only Fine-tuning of Foundation Models." arXiv preprint arXiv:2210.00036 (2022).

[7] Abadi, Martin, et al. "Deep learning with differential privacy." Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security.

[8] Bu, Zhiqi, Yu-Xiang Wang, Sheng Zha, and George Karypis. "Automatic clipping: Differentially private deep learning made easier and stronger." arXiv preprint arXiv:2206.07136 (2022).