
Evaluating Differentially Private Machine Learning in Practice

This repository contains the code used in the paper "Evaluating Differentially Private Machine Learning in Practice" (https://arxiv.org/abs/1902.08874). The code evaluates the utility and privacy leakage of several differentially private machine learning algorithms.

The code is adapted from the code base (https://github.com/csong27/membership-inference) of the membership inference attack work by Shokri et al. (https://ieeexplore.ieee.org/document/7958568).

Requirements

Pre-Processing Data Sets

A pre-processed CIFAR-100 data set is provided in the dataset/ folder. The Purchase-100 data set can be downloaded from the Kaggle web site (https://www.kaggle.com/c/acquire-valued-shoppers-challenge/data) and pre-processed using the preprocess_purchase.py script provided in the repository. To pre-process any other data set, bound the L2 norm of each record to 1 and pickle the features and labels separately into $dataset_feature.p and $dataset_labels.p files in the dataset/ folder, where $dataset is a placeholder for the data set name (e.g. for the Purchase-100 data set, $dataset is purchase_100).
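For example, a minimal sketch of the Purchase-100 preparation; the raw file name and the script's input location are assumptions, so check preprocess_purchase.py for the exact input it expects:

```bash
# Download the "Acquire Valued Shoppers Challenge" transactions file from Kaggle
# and place it where preprocess_purchase.py expects it (an assumption -- verify
# the path inside the script), then run:
python preprocess_purchase.py
# Expected output, following this README's naming convention:
#   dataset/purchase_100_feature.p   (features, each record bounded to unit L2 norm)
#   dataset/purchase_100_labels.p    (labels)
```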

Training the Non-Private Baseline Models

When running the code on a data set for the first time, run `python attack_tf.py $dataset --save_data=1` in a terminal. This will split the data set into random subsets for training and testing of the target, shadow and attack models.

Then run `python attack_tf.py $dataset --target_model=$model --target_l2_ratio=$lambda` in a terminal to train the target model.
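For instance, a first run on CIFAR-100 followed by baseline training (the hyperparameter values are the ones quoted below):

```bash
# One-time step per data set: generate the random train/test splits for the
# target, shadow and attack models.
python attack_tf.py cifar_100 --save_data=1

# Train the non-private neural network baseline.
python attack_tf.py cifar_100 --target_model='nn' --target_l2_ratio=1e-4
```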

For training the optimal non-private baseline neural network on the CIFAR-100 data set, we set $dataset='cifar_100', $model='nn' and $lambda=1e-4. For the logistic regression model, we set $dataset='cifar_100', $model='softmax' and $lambda=1e-5.

For training the optimal non-private baseline neural network on the Purchase-100 data set, we set $dataset='purchase_100', $model='nn' and $lambda=1e-8. For the logistic regression model, we set $dataset='purchase_100', $model='softmax' and $lambda=1e-5.
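Concretely, for Purchase-100 these settings translate to:

```bash
# Neural network baseline on Purchase-100
python attack_tf.py purchase_100 --target_model='nn' --target_l2_ratio=1e-8

# Logistic regression (softmax) baseline on Purchase-100
python attack_tf.py purchase_100 --target_model='softmax' --target_l2_ratio=1e-5
```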

Training the Differentially Private Models

Run `python attack_tf.py $dataset --target_model=$model --target_l2_ratio=$lambda --target_privacy='grad_pert' --target_dp=$dp --target_epsilon=$epsilon` in a terminal, where $dp can be set to 'dp' for naive composition, 'adv_cmp' for advanced composition, 'zcdp' for zero-concentrated DP, or 'rdp' for Rényi DP, and $epsilon controls the privacy budget. Refer to the main block of attack_tf.py for the other command-line arguments.
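For example, to train the CIFAR-100 neural network with gradient perturbation under Rényi DP (the epsilon value here is illustrative, not a setting prescribed by the paper):

```bash
# Gradient-perturbation training with Renyi DP accounting at epsilon = 1.0
python attack_tf.py cifar_100 --target_model='nn' --target_l2_ratio=1e-4 \
    --target_privacy='grad_pert' --target_dp='rdp' --target_epsilon=1.0
```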

Simulating the Experiments from the Paper

Update the $dataset, $model and $lambda variables in run_experiment.sh accordingly and run ./run_experiment.sh in a terminal. Results will be stored in the results/$dataset folder.
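For instance, for the CIFAR-100 neural network experiments (exactly where the variables are defined inside the script is an assumption; adjust as needed):

```bash
# Set $dataset='cifar_100', $model='nn' and $lambda=1e-4 inside run_experiment.sh,
# then launch the full experiment suite from the repository root.
./run_experiment.sh
# Results accumulate under results/cifar_100/
```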

Run `python interpret_results.py $dataset --model=$model --l2_ratio=$lambda` to obtain the plots and tabular results (a sample invocation follows the list below). The other command-line arguments are as follows:

  • `--function` prints the plots if set to 1 (default), or gives the membership revelation results if set to 2.
  • `--plot` specifies the type of plot to be printed:
    • 'acc' prints the accuracy loss comparison plot (default)
    • 'attack' prints the privacy leakage due to the Shokri et al. membership inference attack
    • 'mem' prints the privacy leakage due to the Yeom et al. membership inference attack
    • 'attr' prints the privacy leakage due to the Yeom et al. attribute inference attack
  • `--silent` specifies whether the plot values are displayed (0) or not (1, default)
  • `--fpr_threshold` sets the false positive rate threshold (refer to the paper)
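A sample invocation (the flag values are illustrative):

```bash
# Plot the privacy leakage of the Shokri et al. membership inference attack for
# the CIFAR-100 neural network baseline, and also print the plotted values.
python interpret_results.py cifar_100 --model='nn' --l2_ratio=1e-4 --plot='attack' --silent=0
```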
