JinseongP/DPTrainer

Overview

Official PyTorch implementation of "In-distribution Public Data Synthesis with Diffusion Models for Differentially Private Image Classification", CVPR 2024.

Jinseong Park*, Yujin Choi*, and Jaewook Lee
*Equal contribution

Paper link

Step-by-Step Algorithm

0. Environment configuration
1. Train EDM models with 4% of the data, or download the generated images
2. Train warm-up classifiers with standard training
3. DP-SGD

0. Environment configuration

Create a Docker container (or an equivalent virtual environment with CUDA 11.8 and PyTorch 1.13.0):

```
sudo docker run -i -t --ipc=host --name dptrainer --gpus=all anibali/pytorch:1.13.0-cuda11.8 /bin/bash
```

Install the required packages:
```
pip install -r requirements.txt
```
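To confirm that the container or virtual environment matches the versions above, a small check such as the following can help. This is a sketch of our own, not part of the repository; the helper names torch_minor, matches_expected and report are hypothetical. It compares only the major.minor PyTorch version, because the CUDA version a PyTorch build reports (torch.version.cuda) can differ from the CUDA runtime installed in the image.

```python
import re


def torch_minor(version: str) -> str:
    """Return the major.minor part of a PyTorch version string, e.g. '1.13'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def matches_expected(version: str, expected: str = "1.13") -> bool:
    """True if the installed PyTorch is the expected major.minor release.

    Local build tags such as '+cu117' are ignored because only the leading
    'major.minor' digits are parsed.
    """
    return torch_minor(version) == expected


def report() -> None:
    """Print the live environment's versions (requires PyTorch to be installed)."""
    import torch

    print("torch", torch.__version__, "| matches 1.13:", matches_expected(torch.__version__))
    # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
    print("built with CUDA", torch.version.cuda, "| GPU visible:", torch.cuda.is_available())
```

Calling report() inside the environment prints the installed PyTorch version, the CUDA version it was built with, and whether a GPU is visible.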

1. Training EDM models with 4% of the data

We provide the images generated by EDM models trained on 4% of the data (the public data) in DATADRIVE.

For CIFAR-10, the number following "weight" in each file name (e.g., cifar10_data_sampled_weight0.npz) indicates the weight of the discriminator in discriminator guidance (DG).

Place the synthetic data and the public-data indices in the directory structure shown below.

```
${project_page}/DPTrainer/
├── data
│   ├── cifar-10-edm
│   │   ├── cifar10_data_sampled_index.pt
│   │   ├── cifar10_data_sampled_weight0.npz
│   │   ├── ...
│   ├── cifar-100-edm
├── ...
```
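Before training, it can be worth verifying this layout. The sketch below assumes only the file and folder names shown in the tree; missing_paths and dg_weights are hypothetical helpers, and reading the DG weight from the number after "weight" in each .npz file name is an assumption based on the naming convention described above.

```python
import re
from pathlib import Path

# Paths taken from the directory tree above, relative to ${project_page}/DPTrainer/.
EXPECTED_PATHS = [
    "data/cifar-10-edm/cifar10_data_sampled_index.pt",
    "data/cifar-100-edm",
]


def missing_paths(project_root):
    """Return the expected paths that do not exist under project_root."""
    root = Path(project_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


def dg_weights(folder):
    """Map each '*_weight<w>.npz' file name to the DG weight <w> encoded in it."""
    weights = {}
    for path in sorted(Path(folder).glob("*_weight*.npz")):
        # path.stem drops only the final '.npz', so fractional weights like '0.5' survive.
        match = re.search(r"weight(\d+(?:\.\d+)?)$", path.stem)
        if match:
            weights[path.name] = float(match.group(1))
    return weights
```

An empty list from missing_paths means the indices and synthetic data are where the training scripts are expected to look for them.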

Alternatively, you can train the EDM models yourself, end to end, on 4% of the data.

0) Follow the requirements of EDM

Please refer to the official code of EDM: https://github.com/NVlabs/edm

[Reference] Karras, Tero, et al. "Elucidating the design space of diffusion-based generative models." Advances in Neural Information Processing Systems 35 (2022): 26565-26577.

1) Prepare subsampled dataset

For CIFAR-10, download cifar10_data_sampled_4percent.zip from DATADRIVE.
For CIFAR-100, download cifar100_data_sampled_4percent.zip from DATADRIVE.
Each zip file contains class-balanced PNG images and a dataset.json file.
Place the zip files in the same directory as EDM's train.py.
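The provided zip files already contain the class-balanced subset. For illustration only, the sketch below shows one way to draw a class-balanced 4% subsample and to write labels in the dataset.json layout that, to our understanding, EDM's image-folder loader reads ({"labels": [[filename, label], ...]}). balanced_subsample and edm_dataset_json are hypothetical helpers; they do not reproduce the authors' split, whose indices are the public-data index files described above. For CIFAR-10 (5,000 training images per class), 4% means 200 images per class, 2,000 in total; for CIFAR-100 (500 per class), it means 20 per class, again 2,000 in total.

```python
import json
import random
from collections import defaultdict


def balanced_subsample(labels, fraction=0.04, seed=0):
    """Pick the same fraction of indices from every class (hypothetical helper)."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    chosen = []
    for label in sorted(by_class):
        members = by_class[label]
        k = round(len(members) * fraction)  # e.g. 5000 * 0.04 = 200 for CIFAR-10
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


def edm_dataset_json(filenames, labels):
    """Serialize (filename, label) pairs as {"labels": [[filename, label], ...]}."""
    pairs = [[name, int(label)] for name, label in zip(filenames, labels)]
    return json.dumps({"labels": pairs})
```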

2) Train EDM model

To train the EDM model on 4% of the CIFAR-10 dataset, run:

```
torchrun --standalone --nproc_per_node=4 train.py --outdir=training-runs --data=cifar10_data_sampled_4percent.zip --cond=1 --arch=ddpmpp
```

To train the EDM model on 4% of the CIFAR-100 dataset, run:

```
torchrun --standalone --nproc_per_node=4 train.py --outdir=training-runs --data=cifar100_data_sampled_4percent.zip --cond=1 --arch=ddpmpp
```
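The two commands differ only in the dataset zip, so they can also be assembled programmatically, for example to loop over both datasets. The sketch below is hypothetical (edm_train_command and run are not part of EDM or this repository). In these commands, --nproc_per_node is the number of processes torchrun launches, typically one per GPU, and --cond=1 asks EDM to train a class-conditional model. By default, run only prints the command.

```python
import shlex
import subprocess


def edm_train_command(zip_name, nproc=4, outdir="training-runs"):
    """Assemble the torchrun command shown above for a given dataset zip."""
    return [
        "torchrun", "--standalone", f"--nproc_per_node={nproc}",
        "train.py", f"--outdir={outdir}", f"--data={zip_name}",
        "--cond=1", "--arch=ddpmpp",
    ]


def run(cmd, dry_run=True):
    """Print the command; execute it (from the EDM directory) only if dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for name in ("cifar10_data_sampled_4percent.zip", "cifar100_data_sampled_4percent.zip"):
        run(edm_train_command(name))  # dry run: prints both commands
```

Passing dry_run=False launches the training runs one after the other.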

3) Generate EDM samples

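To generate samples with the trained model, run the command below. EDM's generate.py produces one image per seed, so --seeds=0-999 yields 1,000 images; use --seeds=0-49999 to obtain 50k samples.

```
torchrun --standalone --nproc_per_node=2 generate.py --outdir=out --seeds=0-999 --batch=64 --network=./training-runs/PATH/network.pkl
```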
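The number of generated images follows directly from the seed range. The sketch below mirrors, as we understand it, how EDM's generate.py expands --seeds (comma-separated values, with inclusive a-b ranges) and splits the seeds into batches across processes; parse_seeds and batches_per_rank are hypothetical names, and the batch count is an approximation.

```python
import math


def parse_seeds(spec):
    """Expand a seed spec such as '0-999' or '0-9,20' into a list of ints (ranges inclusive)."""
    seeds = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            seeds.extend(range(int(lo), int(hi) + 1))
        else:
            seeds.append(int(part))
    return seeds


def batches_per_rank(n_seeds, batch, n_gpus):
    """Approximate number of sampling batches each process runs (even split, rounded up)."""
    return math.ceil(n_seeds / (batch * n_gpus))
```

For example, with --seeds=0-999, --batch=64 and two GPUs, each process runs about 8 batches; with --seeds=0-49999 it runs about 391.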

4) (Optional) Discriminator Guidance

Follow the instructions in https://github.com/alsdudrla10/DG, using the trained EDM model and its synthetic data.

Warning: the discriminator (and the classifier it is built on) must be trained on the 4% public data only.

[Reference] Kim, Dongjun, et al. "Refining Generative Process with Discriminator Guidance in Score-based Diffusion Models." International Conference on Machine Learning. PMLR, 2023.
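For reference, DG as described by Kim et al. (2023) adds to the score of the pretrained diffusion model a correction computed from a discriminator d_φ, which estimates the probability that a noisy sample x_t at time t is real (here, drawn from the 4% public data) rather than generated. The scalar w is the discriminator weight, presumably the quantity given by the "weight" number in the CIFAR-10 file names. This is a summary in our own notation, not a restatement of this repository's code:

```latex
\mathbf{s}_{\theta}(\mathbf{x}_t, t) \;+\; w\,\nabla_{\mathbf{x}_t}
\log \frac{d_{\phi}(\mathbf{x}_t, t)}{1 - d_{\phi}(\mathbf{x}_t, t)}
```

Setting w = 0 removes the correction term and recovers ordinary EDM sampling.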

2. Warm-up Training

Please refer to the /examples/ folder in this repository and follow the instructions in /examples/cifar10_warmup.ipynb and /examples/cifar100_warmup.ipynb.

3. DP-SGD

Run DP-SGD training with main.py, filling in the placeholders in braces (inside a Jupyter notebook, run the command as a single line prefixed with !):

```
python main.py --gpu {GPU} \
    --max_grad_norm {MAX_GRAD_NORM} --epsilon {EPSILON} --delta {DELTA} \
    --data {DATA} --optimizer "{OPTIMIZER}" --epochs {EPOCHS} \
    --batch_size {BATCH_SIZE} --max_physical_batch_size {MAX_PHYSICAL_BATCH_SIZE} \
    --model_name {MODEL_NAME} --n_class {N_CLASSES} --augmult {N_AUGMULT} \
    --path {PATH} --name {NAME} --memo {MEMO} \
    --public_batch_size {PUBLIC_BATCH_SIZE} --extender {EXTENDER} \
    --pretrained_dir {WARMUP_PATH}
```

For concrete usage, follow the instructions in /examples/cifar10_dpsgd.ipynb and /examples/cifar100_dpsgd.ipynb.

For details on each parameter, refer to main.py.
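The core DP-SGD update can be summarized in a few lines. The NumPy sketch below is our own illustration and assumes main.py follows the standard recipe: average each example's gradients over its augmented copies (--augmult), clip each example's gradient to L2 norm --max_grad_norm, sum the clipped gradients across physical batches of at most --max_physical_batch_size examples, then add Gaussian noise once per logical batch of --batch_size examples. The noise multiplier is taken as given; in practice a privacy accountant (such as the one in Opacus) derives it from --epsilon, --delta, --epochs and the sampling rate. The function names are hypothetical, and --public_batch_size and --extender are not modeled.

```python
import numpy as np


def clip(grads, max_grad_norm):
    """Scale each row (one per-example gradient) to L2 norm at most max_grad_norm."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    factors = np.minimum(1.0, max_grad_norm / np.maximum(norms, 1e-12))
    return grads * factors


def dp_sgd_update(per_sample_grads, max_grad_norm, noise_multiplier,
                  max_physical_batch_size, rng):
    """One DP-SGD step for a logical batch.

    per_sample_grads has shape (batch_size, augmult, n_params): one gradient
    per augmented copy of each example.
    """
    batch_size = per_sample_grads.shape[0]
    # Augmentation multiplicity: average over augmented copies *before* clipping,
    # so each example still contributes a single bounded gradient.
    averaged = per_sample_grads.mean(axis=1)
    total = np.zeros(averaged.shape[1])
    # Physical batches only bound memory; clipped gradients are summed across them.
    for start in range(0, batch_size, max_physical_batch_size):
        chunk = averaged[start:start + max_physical_batch_size]
        total += clip(chunk, max_grad_norm).sum(axis=0)
    # Gaussian noise is added once per logical batch, scaled to the clipping bound.
    noise = rng.normal(0.0, noise_multiplier * max_grad_norm, size=total.shape)
    return (total + noise) / batch_size
```

Because clipping is applied per example, splitting a logical batch into physical batches changes memory use but not the update itself; only the noise, added once per logical batch, depends on the privacy budget.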

4. Citation

```
@InProceedings{park2024indistribution,
    author    = {Park, Jinseong and Choi, Yujin and Lee, Jaewook},
    title     = {In-distribution Public Data Synthesis with Diffusion Models for Differentially Private Image Classification},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {12236-12246}
}
```

The backbone trainer architecture of this code is based on adversarial-defenses-pytorch; for more on using the trainer, please refer to that repository.
Furthermore, we use Opacus for differentially private training, building upon the trainer library of DPSAT.
