This repository contains the code to reproduce the results from the NeurIPS 2022 paper "Algorithms that Approximate Data Removal: New Results and Limitations".
This code is heavily based on the code for Certified Data Removal from Machine Learning Models.
Dependencies: torch, torchvision, scikit-learn, pytorch-dp
We assume the following project directory structure:
<root>/
--> save/
--> final_results/
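If these directories do not exist yet, they can be created from the project root; the names below are taken directly from the structure above.

```shell
# Create the output directories assumed by the training and removal scripts
mkdir -p save final_results
```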
Training a (0.1, 1e-5)-differentially private feature extractor for SVHN:
python train_svhn.py --data-dir <SVHN path> --train-mode private --std 6 --delta 1e-5 --normalize --save-model
Extracting features using the differentially private extractor:
python train_svhn.py --data-dir <SVHN path> --test-mode extract --std 6 --delta 1e-5
Training a removal-enabled binary logistic regression classifier for MNIST 3 vs. 8 and removing 1000 training points:
python ./scripts/test_removal_<method>.py --data-dir <MNIST path> --verbose --extractor none --dataset MNIST --train-mode binary --std 0.01 --lam 1e-3 --num-steps 100
Training a removal-enabled binary logistic regression classifier for SVHN and removing 1000 training points:
python ./scripts/test_removal_<method>.py --data-dir <SVHN path> --verbose --extractor none --dataset SVHN --train-mode binary --std 0.01 --lam 1e-3 --num-steps 2500
Training a removal-enabled binary logistic regression classifier for SVHN with the proximal variant and removing 1000 training points:
python ./scripts/test_removal_<method>_prox.py --data-dir <SVHN path> --verbose --extractor none --dataset SVHN --train-mode binary --std 0.01 --lam 1e-3 --num-steps 1000
where the <method> tag is one of: exact (full retraining), sekhari, or IJ (our method).
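To run the same experiment under all three methods, a small shell loop can help. This is only a sketch: it prints each command rather than executing it, and DATA_DIR is a placeholder you must point at the actual dataset directory.

```shell
# Sketch: print the removal command for each method tag (exact, sekhari, IJ).
# DATA_DIR is a placeholder; replace it with your MNIST directory before executing.
DATA_DIR="<MNIST path>"
for method in exact sekhari IJ; do
  echo "python ./scripts/test_removal_${method}.py --data-dir ${DATA_DIR}" \
       "--verbose --extractor none --dataset MNIST --train-mode binary" \
       "--std 0.01 --lam 1e-3 --num-steps 100"
done
```

Dropping the echo (and substituting a real path) runs the three experiments back to back.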
This code builds on code from the following paper:
Chuan Guo, Tom Goldstein, Awni Hannun, and Laurens van der Maaten. Certified Data Removal from Machine Learning Models. ICML 2020.