Commit 8ace747

release: private evolution for images using simulators
1 parent 254bd78 commit 8ace747

60 files changed, with 2575 additions and 17 deletions

README.md

Lines changed: 6 additions & 2 deletions
@@ -8,6 +8,9 @@ This repo is a Python library to **generate differentially private (DP) syntheti
 * **Differentially Private Synthetic Data via Foundation Model APIs 2: Text**
 [[paper (ICML 2024 Spotlight)]](https://proceedings.mlr.press/v235/xie24g.html) [[paper (arxiv)](https://arxiv.org/abs/2403.01749)] [[website](https://alphapav.github.io/augpe-dpapitext)]
 **Authors:** [[Chulin Xie](https://alphapav.github.io/)], [[Zinan Lin](https://zinanlin.me/)], [[Arturs Backurs](https://www.mit.edu/~backurs/)], [[Sivakanth Gopi](https://www.microsoft.com/en-us/research/people/sigopi/)], [[Da Yu](https://dayu11.github.io/)], [[Huseyin Inan](https://www.microsoft.com/en-us/research/people/huinan/)], [[Harsha Nori](https://www.microsoft.com/en-us/research/people/hanori/)], [[Haotian Jiang](https://jhtdavid96.wixsite.com/jianghaotian)], [[Huishuai Zhang](https://huishuai-git.github.io/)], [[Yin Tat Lee](https://yintat.com/)], [[Bo Li](https://aisecure.github.io/)], [[Sergey Yekhanin](http://www.yekhanin.org/)]
+* **Differentially Private Synthetic Data via APIs 3: Using Simulators Instead of Foundation Models**
+[[paper (arxiv)](https://arxiv.org/abs/2502.05505)]
+**Authors:** [[Zinan Lin](https://zinanlin.me/)], [[Tadas Baltrusaitis](https://www.microsoft.com/en-us/research/people/tabaltru/)], [[Sergey Yekhanin](http://www.yekhanin.org/)]
 
 Please refer to [this repo](https://github.com/fjxmlzn/private-evolution-papers) for the full list of Private Evolution papers and code repositories related to PE.
 
@@ -16,8 +19,9 @@ Please refer to the [documentation](https://microsoft.github.io/DPSDA/) for more
 
 ## News
 
-* `1/8/2025`: **Text generation** based on the paper [`Differentially Private Synthetic Data via Foundation Model APIs 2: Text`](https://arxiv.org/abs/2403.01749) has been integrated into the library! If you want to reproduce the results in the [paper](https://arxiv.org/abs/2403.01749), please refer to [our original codebase](https://github.com/AI-secure/aug-pe).
-* `11/21/2024`: The refactored codebase for **image generation** based on the paper [`Differentially Private Synthetic Data via Foundation Model APIs 1: Images`](https://arxiv.org/abs/2305.15560) has been released! It is completely refactored to be more modular and easier to use and extend. The code originally published with the [paper](https://arxiv.org/abs/2305.15560) has been moved to the [deprecated](https://github.com/microsoft/DPSDA/tree/deprecated) branch in this repository, which is no longer maintained.
+* `2/11/2025`: **Image generation with simulator APIs** based on the paper [`Differentially Private Synthetic Data via APIs 3: Using Simulators Instead of Foundation Models`](https://arxiv.org/abs/2502.05505) has been released in this library!
+* `1/8/2025`: **Text generation with foundation model APIs** based on the paper [`Differentially Private Synthetic Data via Foundation Model APIs 2: Text`](https://arxiv.org/abs/2403.01749) has been integrated into the library! If you want to reproduce the results in the [paper](https://arxiv.org/abs/2403.01749), please refer to [our original codebase](https://github.com/AI-secure/aug-pe).
+* `11/21/2024`: The refactored codebase for **image generation with foundation model APIs** based on the paper [`Differentially Private Synthetic Data via Foundation Model APIs 1: Images`](https://arxiv.org/abs/2305.15560) has been released! It is completely refactored to be more modular and easier to use and extend. The code originally published with the [paper](https://arxiv.org/abs/2305.15560) has been moved to the [deprecated](https://github.com/microsoft/DPSDA/tree/deprecated) branch in this repository, which is no longer maintained.
 
 ## Contributing
 

doc/build_autodoc.sh

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
-sphinx-apidoc -e -f --module-first -d 7 -o source/api ../pe ../pe/*/test* ../pe/*/bk*
+sphinx-apidoc -e -f --module-first -d 7 -o source/api ../pe ../pe/*/test* ../pe/*/*/test* ../pe/*/bk*
 make clean html
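
The change above adds one exclusion pattern, `../pe/*/*/test*`. As a rough illustration of why it matters, the sketch below compares the old and new pattern lists against two paths. It assumes shell-style matching in which `*` does not cross a `/`; the file paths are hypothetical and chosen only to show the two nesting depths, and the helper names are not part of the repository.

```python
from fnmatch import fnmatchcase


def glob_match(path: str, pattern: str) -> bool:
    """Match segment by segment so that '*' never crosses a '/'."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts))


def is_excluded(path: str, patterns: list) -> bool:
    """Return True if any exclusion pattern matches the path."""
    return any(glob_match(path, pattern) for pattern in patterns)


# Exclusion patterns copied from the old and new sphinx-apidoc command lines.
OLD_EXCLUDES = ["../pe/*/test*", "../pe/*/bk*"]
NEW_EXCLUDES = ["../pe/*/test*", "../pe/*/*/test*", "../pe/*/bk*"]

# Hypothetical paths: one test file directly under a subpackage, one a level deeper.
shallow = "../pe/api/test_api.py"
nested = "../pe/api/image/test_api.py"

print(is_excluded(shallow, OLD_EXCLUDES))  # True
print(is_excluded(nested, OLD_EXCLUDES))   # False
print(is_excluded(nested, NEW_EXCLUDES))   # True
```

Under these assumptions, test files two levels below `pe` are only kept out of the generated API docs once the extra pattern is present.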
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+pe.api.image.avatar\_api module
+===============================
+
+.. automodule:: pe.api.image.avatar_api
+   :members:
+   :undoc-members:
+   :show-inheritance:
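
Every new module page in this commit follows the same seven-line shape as the stub above. The sketch below reproduces that shape for an arbitrary module name. It is an illustration of the pattern, not the generator the project actually uses; the function name is hypothetical, and the three-space option indent is an assumption because this page view does not preserve leading whitespace.

```python
def render_module_stub(module: str) -> str:
    """Build a module page in the shape of the stubs added in this commit."""
    # Underscores are escaped in the title, as in "pe.api.image.avatar\_api module".
    title = module.replace("_", r"\_") + " module"
    lines = [
        title,
        "=" * len(title),  # underline as long as the title
        "",
        f".. automodule:: {module}",
        "   :members:",
        "   :undoc-members:",
        "   :show-inheritance:",
    ]
    return "\n".join(lines) + "\n"


print(render_module_stub("pe.api.image.avatar_api"))
```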
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+pe.api.image.draw\_text\_api module
+===================================
+
+.. automodule:: pe.api.image.draw_text_api
+   :members:
+   :undoc-members:
+   :show-inheritance:
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+pe.api.image.nearest\_image\_api module
+=======================================
+
+.. automodule:: pe.api.image.nearest_image_api
+   :members:
+   :undoc-members:
+   :show-inheritance:

doc/source/api/pe.api.image.rst

Lines changed: 3 additions & 0 deletions
@@ -20,5 +20,8 @@ Submodules
 .. toctree::
    :maxdepth: 7
 
+   pe.api.image.avatar_api
+   pe.api.image.draw_text_api
    pe.api.image.improved_diffusion_api
+   pe.api.image.nearest_image_api
    pe.api.image.stable_diffusion_api
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+pe.callback.common.compute\_precision\_recall module
+====================================================
+
+.. automodule:: pe.callback.common.compute_precision_recall
+   :members:
+   :undoc-members:
+   :show-inheritance:

doc/source/api/pe.callback.common.rst

Lines changed: 1 addition & 0 deletions
@@ -13,4 +13,5 @@ Submodules
    :maxdepth: 7
 
    pe.callback.common.compute_fid
+   pe.callback.common.compute_precision_recall
    pe.callback.common.save_checkpoints
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+pe.callback.image.dpimagebench\_classify\_images module
+=======================================================
+
+.. automodule:: pe.callback.image.dpimagebench_classify_images
+   :members:
+   :undoc-members:
+   :show-inheritance:
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+pe.callback.image.dpimagebench\_lib.densenet module
+===================================================
+
+.. automodule:: pe.callback.image.dpimagebench_lib.densenet
+   :members:
+   :undoc-members:
+   :show-inheritance:
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+pe.callback.image.dpimagebench\_lib.ema module
+==============================================
+
+.. automodule:: pe.callback.image.dpimagebench_lib.ema
+   :members:
+   :undoc-members:
+   :show-inheritance:
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+pe.callback.image.dpimagebench\_lib.resnet module
+=================================================
+
+.. automodule:: pe.callback.image.dpimagebench_lib.resnet
+   :members:
+   :undoc-members:
+   :show-inheritance:
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+pe.callback.image.dpimagebench\_lib.resnext module
+==================================================
+
+.. automodule:: pe.callback.image.dpimagebench_lib.resnext
+   :members:
+   :undoc-members:
+   :show-inheritance:
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+pe.callback.image.dpimagebench\_lib package
+===========================================
+
+.. automodule:: pe.callback.image.dpimagebench_lib
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+Submodules
+----------
+
+.. toctree::
+   :maxdepth: 7
+
+   pe.callback.image.dpimagebench_lib.densenet
+   pe.callback.image.dpimagebench_lib.ema
+   pe.callback.image.dpimagebench_lib.resnet
+   pe.callback.image.dpimagebench_lib.resnext
+   pe.callback.image.dpimagebench_lib.wrn
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+pe.callback.image.dpimagebench\_lib.wrn module
+==============================================
+
+.. automodule:: pe.callback.image.dpimagebench_lib.wrn
+   :members:
+   :undoc-members:
+   :show-inheritance:

doc/source/api/pe.callback.image.rst

Lines changed: 9 additions & 0 deletions
@@ -6,11 +6,20 @@ pe.callback.image package
    :undoc-members:
    :show-inheritance:
 
+Subpackages
+-----------
+
+.. toctree::
+   :maxdepth: 7
+
+   pe.callback.image.dpimagebench_lib
+
 Submodules
 ----------
 
 .. toctree::
    :maxdepth: 7
 
+   pe.callback.image.dpimagebench_classify_images
    pe.callback.image.sample_images
    pe.callback.image.save_all_images
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+pe.data.image.celeba module
+===========================
+
+.. automodule:: pe.data.image.celeba
+   :members:
+   :undoc-members:
+   :show-inheritance:
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+pe.data.image.digiface1m module
+===============================
+
+.. automodule:: pe.data.image.digiface1m
+   :members:
+   :undoc-members:
+   :show-inheritance:
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+pe.data.image.mnist module
+==========================
+
+.. automodule:: pe.data.image.mnist
+   :members:
+   :undoc-members:
+   :show-inheritance:

doc/source/api/pe.data.image.rst

Lines changed: 3 additions & 0 deletions
@@ -14,5 +14,8 @@ Submodules
 
    pe.data.image.camelyon17
    pe.data.image.cat
+   pe.data.image.celeba
    pe.data.image.cifar10
+   pe.data.image.digiface1m
    pe.data.image.image
+   pe.data.image.mnist
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+pe.embedding.image.fld\_inception module
+========================================
+
+.. automodule:: pe.embedding.image.fld_inception
+   :members:
+   :undoc-members:
+   :show-inheritance:

doc/source/api/pe.embedding.image.rst

Lines changed: 1 addition & 0 deletions
@@ -12,4 +12,5 @@ Submodules
 .. toctree::
    :maxdepth: 7
 
+   pe.embedding.image.fld_inception
    pe.embedding.image.inception

doc/source/conf.py

Lines changed: 3 additions & 0 deletions
@@ -69,4 +69,7 @@
     ("py:class", "improved_diffusion.respace.SpacedDiffusion"),
     ("py:class", "improved_diffusion.unet.UNetModel"),
     ("py:class", "omegaconf.dictconfig.DictConfig"),
+    ("py:class", "python_avatars.Avatar"),
+    ("py:class", "torch.utils.data.DataLoader"),
+    ("py:class", "torch.nn.Module"),
 ]
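
The hunk above does not show which variable this list is bound to. Assuming it is Sphinx's `nitpick_ignore` setting (a list of `(type, target)` tuples whose unresolved references are not reported in nitpicky mode), the sketch below shows the lookup these entries are meant to enable. The `is_ignored` helper is hypothetical and only mimics the membership check; it is not Sphinx's implementation.

```python
# Assumed variable name: the diff hunk starts below the assignment, so it is not visible.
nitpick_ignore = [
    ("py:class", "omegaconf.dictconfig.DictConfig"),
    # Entries added in this commit:
    ("py:class", "python_avatars.Avatar"),
    ("py:class", "torch.utils.data.DataLoader"),
    ("py:class", "torch.nn.Module"),
]


def is_ignored(ref_type: str, target: str) -> bool:
    """Hypothetical helper: True if a missing reference would be silenced."""
    return (ref_type, target) in nitpick_ignore


print(is_ignored("py:class", "torch.nn.Module"))  # True
print(is_ignored("py:class", "numpy.ndarray"))    # False
```

The three new entries line up with the new avatar API and DPImageBench callback modules, whose signatures presumably mention these external classes.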

doc/source/getting_started/examples.rst

Lines changed: 22 additions & 2 deletions
@@ -10,7 +10,13 @@ Images
 
 * **CIFAR10 dataset**: `This example <CIFAR10 example_>`__ shows how to generate differentially private synthetic images for the `CIFAR10 dataset`_ using the APIs from a pre-trained `ImageNet diffusion model`_.
 * **Camelyon17 dataset**: `This example <Camelyon17 example_>`__ shows how to generate differentially private synthetic images for the `Camelyon17 dataset`_ using the APIs from a pre-trained `ImageNet diffusion model`_.
-* **Cat dataset**: `This example <Cat example_>`__ shows how to generate differentially private synthetic images of the `Cat dataset`_ using the APIs from `Stable Diffusion`_.
+* **Cat dataset**: `This example <Cat example_>`__ shows how to generate differentially private synthetic images for the `Cat dataset`_ using the APIs from `Stable Diffusion`_.
+
+* Using **simulators** as the APIs. These examples follow the experimental settings in the paper `Differentially Private Synthetic Data via APIs 3: Using Simulators Instead of Foundation Models <pe3_paper_>`__.
+
+* **MNIST dataset**: `This example <MNIST example_>`__ shows how to generate differentially private synthetic images for the `MNIST dataset`_ using a text render.
+* **CelebA dataset (simulator-generated data)**: `This example <CelebA DigiFace1M example_>`__ shows how to generate differentially private synthetic images for the `CelebA dataset`_ using `the generated data from a computer graphics-based renderer for face images <DigiFace1M_>`__.
+* **CelebA dataset (weak simulator)**: `This example <CelebA avatar example_>`__ shows how to generate differentially private synthetic images for the `CelebA dataset`_ using `a rule-based avatar generator <python_avatar_>`__.
 
 Text
 ----
@@ -36,22 +42,36 @@ These examples follow the experimental settings in the paper `Differentially Pri
 .. _ImageNet diffusion model: https://github.com/openai/improved-diffusion
 .. _Stable Diffusion: https://huggingface.co/CompVis/stable-diffusion-v1-4
 
+.. _DigiFace1M: https://github.com/microsoft/DigiFace1M
+.. _python_avatar: https://github.com/ibonn/python_avatars
+
 .. _Cat dataset: https://www.kaggle.com/datasets/fjxmlzn/cat-cookie-doudou
 .. _CIFAR10 dataset: https://www.cs.toronto.edu/~kriz/cifar.html
 .. _Camelyon17 dataset: https://camelyon17.grand-challenge.org/
+
+.. _MNIST dataset: https://pytorch.org/vision/main/generated/torchvision.datasets.MNIST.html
+.. _CelebA dataset: https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
+
 .. _Yelp dataset: https://github.com/AI-secure/aug-pe/tree/main/data
 .. _OpenReview dataset: https://github.com/AI-secure/aug-pe/tree/main/data
 .. _PubMed dataset: https://github.com/AI-secure/aug-pe/tree/main/data
 
 .. _CIFAR10 example: https://github.com/microsoft/DPSDA/blob/main/example/image/diffusion_model/cifar10_improved_diffusion.py
 .. _Camelyon17 example: https://github.com/microsoft/DPSDA/blob/main/example/image/diffusion_model/camelyon17_improved_diffusion.py
 .. _Cat example: https://github.com/microsoft/DPSDA/blob/main/example/image/diffusion_model/cat_stable_diffusion.py
+
+.. _MNIST example: https://github.com/microsoft/DPSDA/blob/main/example/image/simulator/mnist_text_render.py
+.. _CelebA DigiFace1M example: https://github.com/microsoft/DPSDA/blob/main/example/image/simulator/celeba_digiface1m.py
+.. _CelebA avatar example: https://github.com/microsoft/DPSDA/blob/main/example/image/simulator/celeba_avatar.py
+
 .. _Yelp OpenAI example: https://github.com/microsoft/DPSDA/blob/main/example/text/yelp_openai/main.py
 .. _Yelp Huggingface example: https://github.com/microsoft/DPSDA/blob/main/example/text/yelp_huggingface/main.py
 .. _Openreview OpenAI example: https://github.com/microsoft/DPSDA/blob/main/example/text/openreview_openai/main.py
 .. _Openreview Huggingface example: https://github.com/microsoft/DPSDA/blob/main/example/text/openreview_huggingface/main.py
 .. _PubMed OpenAI example: https://github.com/microsoft/DPSDA/blob/main/example/text/pubmed_openai/main.py
 .. _PubMed Huggingface example: https://github.com/microsoft/DPSDA/blob/main/example/text/pubmed_huggingface/main.py
 
+
 .. _pe1_paper: https://arxiv.org/abs/2305.15560
-.. _pe2_paper: https://arxiv.org/abs/2403.01749
+.. _pe2_paper: https://arxiv.org/abs/2403.01749
+.. _pe3_paper: https://arxiv.org/abs/2502.05505
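
Each new bullet in `examples.rst` points at a named hyperlink target defined further down the file, so a bullet and its `.. _name: URL` line have to be added together. The sketch below is a simplified check of that pairing. It is not how docutils resolves references: the regular expressions only cover the two forms that appear in this diff, and the function names are hypothetical.

```python
import re

# Matches target definitions such as ".. _pe3_paper: https://arxiv.org/abs/2502.05505".
TARGET = re.compile(r"^\.\. _(?P<name>[^:]+):\s+(?P<url>\S+)$")
# Matches embedded references such as "`This example <MNIST example_>`__".
REFERENCE = re.compile(r"`[^`<]*<(?P<name>[^>]+)_>`__")


def collect_targets(lines):
    """Map lower-cased target names to their URLs."""
    targets = {}
    for line in lines:
        match = TARGET.match(line.strip())
        if match:
            targets[match.group("name").lower()] = match.group("url")
    return targets


def unresolved_references(text, targets):
    """List referenced names that have no matching target definition."""
    names = [m.group("name") for m in REFERENCE.finditer(text)]
    return [name for name in names if name.lower() not in targets]


# Two of the target lines and one bullet fragment added by this commit.
definitions = [
    ".. _MNIST example: https://github.com/microsoft/DPSDA/blob/main/example/image/simulator/mnist_text_render.py",
    ".. _pe3_paper: https://arxiv.org/abs/2502.05505",
]
bullet = "`This example <MNIST example_>`__ and `the paper <pe3_paper_>`__"

targets = collect_targets(definitions)
print(unresolved_references(bullet, targets))                    # []
print(unresolved_references("`x <CelebA example_>`__", targets))  # ['CelebA example']
```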

doc/source/getting_started/intro.rst

Lines changed: 5 additions & 2 deletions
@@ -14,15 +14,15 @@ Key Features
 
 Compared to other DP synthetic data alternatives, **Private Evolution** has the following key features:
 
-* ✅ **No training needed!** **Private Evolution** only requires the inference APIs of foundation models. Therefore, it can leverage any state-of-the-art black-box models (e.g., GPT-4) and open-source models (e.g., Stable Diffusion, Llama).
+* ✅ **No training needed!** **Private Evolution** only requires the inference APIs of foundation models or non-neural-network data synthesis tools. Therefore, it can leverage any state-of-the-art black-box models (e.g., GPT-4), open-source models (e.g., Stable Diffusion, Llama), or tools (e.g., computer graphics-based image synthesis tools).
 * ✅ **Protects privacy even from the API provider.** Even when using APIs from a third-party provider, you can rest assured that the information of individuals in the original dataset is still protected, as all API queries made from **Private Evolution** are also differentially private.
 * ✅ **Works across images, text, etc.** **Private Evolution** can generate synthetic data for various data types, including images and text. More data modalities are coming soon!
 * ✅ **Could even match/beat SoTA training-based methods in data quality.** **Private Evolution** can generate synthetic data that is statistically similar to the original data, and in some cases, it can even match or beat the state-of-the-art training-based methods in data quality even though it does not require any training.
 
 What This Library Provides
 --------------------------
 
-**This library is the official Python package of Private Evolution**. It allows you to generate differentially private synthetic data (e.g., images, text) using the **Private Evolution** algorithm. This library is designed to be easy to use, flexible, modular, and extensible. It provides several popular foundation model APIs, and you can easily extend it to work with your own foundation models (and/or APIs), data types, or new **Private Evolution** algorithms if needed.
+**This library is the official Python package of Private Evolution**. It allows you to generate differentially private synthetic data (e.g., images, text) using the **Private Evolution** algorithm. This library is designed to be easy to use, flexible, modular, and extensible. It provides several popular foundation models and data synthesis tools, and you can easily extend it to work with your own foundation models (and/or APIs), data synthesis tools, data types, or new **Private Evolution** algorithms if needed.
 
 The source code of this **Private Evolution** library is available at https://github.com/microsoft/DPSDA.
 
@@ -37,4 +37,7 @@ If you use **Private Evolution** in your research or work, please cite the follo
 .. literalinclude:: pe2.bib
    :language: bibtex
 
+.. literalinclude:: pe3.bib
+   :language: bibtex
+
 Please see https://github.com/fjxmlzn/private-evolution-papers for the full list of **Private Evolution** papers and code repositories done by the community.

doc/source/getting_started/pe1.bib

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-@article{lin2023differentially,
+@article{pe1,
 title={Differentially private synthetic data via foundation model apis 1: Images},
 author={Lin, Zinan and Gopi, Sivakanth and Kulkarni, Janardhan and Nori, Harsha and Yekhanin, Sergey},
 journal={arXiv preprint arXiv:2305.15560},

doc/source/getting_started/pe2.bib

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-@article{xie2024differentially,
+@article{pe2,
 title={Differentially private synthetic data via foundation model apis 2: Text},
 author={Xie, Chulin and Lin, Zinan and Backurs, Arturs and Gopi, Sivakanth and Yu, Da and Inan, Huseyin A and Nori, Harsha and Jiang, Haotian and Zhang, Huishuai and Lee, Yin Tat and others},
 journal={arXiv preprint arXiv:2403.01749},

doc/source/getting_started/pe3.bib

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+@article{pe3,
+title={Differentially Private Synthetic Data via APIs 3: Using Simulators Instead of Foundation Models},
+author={Lin, Zinan and Baltrusaitis, Tadas and Yekhanin, Sergey},
+journal={arXiv preprint arXiv:2502.05505},
+year={2025}
+}
