This repository is the Python implementation of the paper Classification Auto-Encoder based Detector against Diverse Data Poisoning Attacks. This work leverages auto-encoder-based detectors as a defense against a variety of poisoning attacks, and shows that auto-encoders can mitigate these attacks without prior knowledge of the attack type and without access to any trusted clean data in advance. If you use this code, please cite:
@article{razmi2021classification,
title={Classification Auto-Encoder based Detector against Diverse Data Poisoning Attacks},
author={Razmi, Fereshteh and Xiong, Li},
journal={arXiv preprint arXiv:2108.04206},
year={2021}
}
The code consists of four sequential parts: generating indices, conducting the poisoning attacks, training the detectors, and finally utilizing the detectors for the defense. The following commands replicate the experiments in the paper on the CIFAR-10 dataset.
First, the indices of the training/test/validation data for cross-validation across all runs (60 in the paper) are generated by:
python attack.py --attack_step index_generation --dataset cifar10
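The index-generation step presumably draws disjoint train/validation/test index sets for each run. A minimal sketch of the idea with NumPy (the function name, split sizes, and fixed seed are illustrative, not the repository's actual values):

```python
import numpy as np

def generate_run_indices(n_samples, n_runs, n_train, n_val, n_test, seed=0):
    """Draw disjoint train/val/test index sets for each cross-validation run.

    Hypothetical helper: sizes and seed are illustrative; attack.py may
    use different values and a different file layout for saving them.
    """
    rng = np.random.default_rng(seed)
    runs = []
    for _ in range(n_runs):
        # one fresh permutation per run, then carve out disjoint slices
        perm = rng.permutation(n_samples)
        runs.append({
            "train": perm[:n_train],
            "val": perm[n_train:n_train + n_val],
            "test": perm[n_train + n_val:n_train + n_val + n_test],
        })
    return runs

# e.g. 60 runs, as in the paper
runs = generate_run_indices(n_samples=50000, n_runs=60,
                            n_train=1000, n_val=200, n_test=500)
```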
These commands generate poisoned points for each attack indicated in the paper:
python attack.py --attack_step poison_generation --dataset cifar10 --attack_type flipping
python attack.py --attack_step poison_generation --dataset cifar10 --attack_type optimal
python attack.py --attack_step poison_generation --dataset cifar10 --attack_type opt-notlabel
python attack.py --attack_step poison_generation --dataset cifar10 --attack_type mixed
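Of these attacks, label flipping is the simplest: the labels of a fraction of training points are changed to a wrong class. A hedged sketch of that idea (the repository's exact flipping strategy may differ, e.g. it may target specific class pairs rather than flipping uniformly):

```python
import numpy as np

def flip_labels(y, poison_fraction, n_classes, seed=0):
    """Flip the labels of a random fraction of training points.

    Illustrative version: each poisoned label is shifted by a random
    non-zero offset, so the new label is always wrong but otherwise
    uniform over the remaining classes.
    """
    rng = np.random.default_rng(seed)
    y_poisoned = y.copy()
    n_poison = int(round(poison_fraction * len(y)))
    idx = rng.choice(len(y), size=n_poison, replace=False)
    offsets = rng.integers(1, n_classes, size=n_poison)
    y_poisoned[idx] = (y[idx] + offsets) % n_classes
    return y_poisoned, idx

y = np.repeat(np.arange(10), 100)            # toy CIFAR-10-like labels
y_p, poison_idx = flip_labels(y, poison_fraction=0.1, n_classes=10)
```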
The next step is to train the detectors: CAE+ (comprising RAE and CAE) and MagNet (comprising two models). Training is done on different percentages of poisoned data and different attack types; all of it can be run with the shell script below. This step takes a while to complete, so it is better to run it on a GPU machine.
./run_train.sh
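The core idea behind the auto-encoder detectors is reconstruction-error thresholding: an auto-encoder trained on the (mostly clean) data reconstructs typical points well, so points with unusually high reconstruction error are flagged as poison. A minimal sketch of that principle, using a linear auto-encoder (PCA) as a stand-in for the paper's convolutional CAE/RAE models:

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """Fit a rank-k linear auto-encoder (equivalent to PCA)."""
    mean = X.mean(axis=0)
    # top-k principal directions act as tied encoder/decoder weights
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def reconstruction_error(X, mean, W):
    Z = (X - mean) @ W.T             # encode
    X_hat = Z @ W + mean             # decode
    return np.linalg.norm(X - X_hat, axis=1)

def flag_poison(errors, percentile=95):
    """Flag points whose error exceeds a percentile threshold."""
    return errors > np.percentile(errors, percentile)

# toy data: 200 inliers near a 2-D subspace of 10-D, plus 10 outliers
rng = np.random.default_rng(0)
basis = np.linalg.qr(rng.normal(size=(10, 2)))[0]
inliers = (rng.normal(scale=5.0, size=(200, 2)) @ basis.T
           + rng.normal(scale=0.1, size=(200, 10)))
outliers = rng.uniform(-10, 10, size=(10, 10))
X = np.vstack([inliers, outliers])

mean, W = fit_linear_autoencoder(X, k=2)
errors = reconstruction_error(X, mean, W)
flagged = flag_poison(errors)
```

The actual detectors add a classification loss (CAE) and robust training (RAE) on top of this principle, but the flag-by-high-error decision rule is the same in spirit.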
Finally, the detectors' performance can be assessed with:
python assess_detectors.py --dataset cifar10
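Detector quality boils down to how well the flagged indices match the truly poisoned ones. A hedged sketch of the standard precision/recall/F1 computation on index sets (assess_detectors.py may report different or additional metrics):

```python
def detection_metrics(flagged, true_poison):
    """Precision, recall, and F1 of a detector's flagged indices.

    Illustrative helper, not the repository's actual evaluation code.
    """
    flagged, true_poison = set(flagged), set(true_poison)
    tp = len(flagged & true_poison)             # correctly flagged poison
    precision = tp / len(flagged) if flagged else 0.0
    recall = tp / len(true_poison) if true_poison else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# e.g. detector flagged {1, 2, 3, 8}; true poison was {2, 3, 5, 8}
p, r, f1 = detection_metrics({1, 2, 3, 8}, {2, 3, 5, 8})
# p = 0.75, r = 0.75, f1 = 0.75
```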
We modified code from the SecML repository to generate the optimal attacks; the modified version is included in this repository.