
DPBayes/NAPSU-MQ-experiments


This repository contains code to replicate the experiments in the paper Noise-Aware Statistical Inference with Differentially Private Synthetic Data (AISTATS 2023).

A more user-friendly implementation of the core algorithm is available in the Twinify library.

Installing Dependencies

Python Dependencies

conda env create
conda activate max-ent-env

Submodules

Some of the dependencies are git submodules. Run

git submodule init
git submodule update

to fetch them.
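Equivalently, both steps can be combined into one command (which also covers any nested submodules):

git submodule update --init --recursive

When cloning the repository from scratch, git clone --recurse-submodules fetches the submodules as part of the clone.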

R Dependencies

The privLCM package is written in R and requires R (>= 4.0) along with several R packages. The packages can be installed by starting R with the command

R

and running

install.packages("plyr")
install.packages("data.table")
install.packages("Rfast")
install.packages("parallel")
install.packages("./BayesLCM")

Other code

The files lib/mst.py and lib/privbayes.py are from https://github.com/ryan112358/private-pgm, with minor modifications.

Running the Code

We use Snakemake to manage our experiments. The command

snakemake -j 6

runs all of our experiments using 6 cores in parallel. The number after -j sets the number of cores. Note that this will take several days on a single computer.
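To preview the jobs without running anything, Snakemake's dry-run mode can be used first:

snakemake -n

This lists the rules that would be executed, which is useful for checking the configuration before starting a multi-day run.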

Figures for the toy data experiment are placed in latex/figures; figures for the Adult and US Census experiments are placed in subdirectories of latex/figures. The figures also appear in the generated notebooks processed_report-toy-data.py.ipynb, processed_report-adult-reduced.py.ipynb and processed_report-us-census.py.ipynb.

The file workflow/Snakemake specifies which Adult data experiments are run via the repeats, epsilons and algorithms variables. These can be edited to run a subset of the experiments or a smaller number of repeats. The toy data experiment is controlled in the same way by workflow/rules/toy-data.smk, and the US Census experiment by workflow/rules/us-census.smk.
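As a rough sketch only (the actual variable definitions and values in workflow/Snakemake may look different), restricting the Adult experiments to a smaller configuration could be done along these lines:

repeats = list(range(3))      # 3 repeats per setting instead of the full number
epsilons = [1.0]              # privacy budgets to run; placeholder value
algorithms = ["max-ent"]      # algorithms to run; the exact names used by the workflow may differ

The same idea applies to the corresponding variables in workflow/rules/toy-data.smk and workflow/rules/us-census.smk.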

If the plotting notebooks workflow/scripts/report.py.ipynb, workflow/scripts/report-adult-reduced.py.ipynb and workflow/scripts/us-census/report.py.ipynb fail to run under Snakemake, they can be opened and run manually.
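For example, assuming Jupyter and nbconvert are available in the conda environment, a notebook can be executed from the command line (the output filename here is arbitrary):

jupyter nbconvert --to notebook --execute workflow/scripts/report.py.ipynb --output report-executed.ipynb

The notebooks presumably still expect the result files produced by the earlier Snakemake steps, so run the experiments first.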

US Census Data

The original data for the US Census experiment is fairly large, so it is omitted from the repository. It can be downloaded from the UCI repository at https://archive.ics.uci.edu/ml/machine-learning-databases/census1990-mld/ (file USCensus1990.data.txt).

The subset of the dataset used in the experiment is included, so downloading the original dataset is not necessary to run the experiments.

Running Data Pre-Processing

The results of pre-processing the datasets are included in the repository, so the pre-processing steps are not re-run automatically. They can be re-run by deleting the files in the datasets/adult-reduced and datasets/us-census directories and running snakemake as above. Note that the full US Census data must be downloaded and placed in datasets if datasets/us-census/reduced.csv is deleted.
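As a sketch of the full re-run (double-check the exact paths against the Snakemake rules before deleting anything; the download target datasets/ follows the note above):

rm datasets/adult-reduced/*
rm datasets/us-census/*
wget -P datasets https://archive.ics.uci.edu/ml/machine-learning-databases/census1990-mld/USCensus1990.data.txt
snakemake -j 6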

Navigating the Code

The implementation of NAPSU-MQ is in the lib directory. The code refers to NAPSU-MQ as "maximum entropy", shortened to "max ent".

The workflow directory contains scripts that run the experiments using Snakemake.

adult-test.ipynb and max-ent-test.ipynb are example notebooks for adult data and toy data experiments, respectively.
