Implementations of the ICML 2017 paper (with Yarin Gal)

Dropout + BB-alpha for detecting adversarial examples

Thank you for your interest in our paper:

Yingzhen Li and Yarin Gal

Dropout inference in Bayesian neural networks with alpha-divergences

International Conference on Machine Learning (ICML), 2017

Please consider citing the paper if you use any of this material in your research.

Contributions: Yarin wrote most of the functions in BBalpha_dropout.py, and Yingzhen (me) derived the loss function and implemented the adversarial attack experiments.

How to use this code for your research

I've received quite a few emails asking how to incorporate our method into existing Keras code. So here I also provide a template file; follow the comments inside to plug in your favourite model and dropout method.

template file: template_model.py
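
In case it helps to see the overall shape of the approach before adapting the template, below is a minimal, self-contained sketch written with tf.keras. It does not use the repo's helpers in BBalpha_dropout.py, and every name and hyperparameter value in it is illustrative only: dropout is kept active at test time, K_mc stochastic forward passes share the same weights, and a simplified Monte Carlo approximation of the BB-alpha energy is used as the training loss.

```python
# Illustrative sketch only -- not the repo's API. The hyperparameters mirror the
# train_model.py arguments, but the values chosen here are arbitrary.
import numpy as np
from tensorflow.keras import layers, Model, Input
import tensorflow.keras.backend as K

K_mc, alpha, p = 10, 0.5, 0.5                  # MC samples, alpha-divergence parameter, dropout rate
n_in, n_hidden, n_classes = 784, 1000, 10

# Layers are created once and reused, so all K_mc passes share the same weights.
drop1, dense1 = layers.Dropout(p), layers.Dense(n_hidden, activation='relu')
drop2, dense2 = layers.Dropout(p), layers.Dense(n_classes, activation='softmax')

inputs = Input(shape=(n_in,))
mc_samples = []
for _ in range(K_mc):
    h = drop1(inputs, training=True)           # training=True keeps dropout stochastic at test time
    h = dense1(h)
    h = drop2(h, training=True)
    mc_samples.append(dense2(h))
# Stack the K_mc stochastic passes into shape (batch, K_mc, n_classes).
mc_probs = layers.Lambda(lambda ts: K.stack(ts, axis=1))(mc_samples)
model = Model(inputs, mc_probs)

def bbalpha_loss(alpha, k_mc):
    """Simplified Monte Carlo BB-alpha energy for softmax outputs
    (the dropout / weight-decay regularisation term is omitted here)."""
    def loss(y_true, y_pred):
        # y_true and y_pred both have shape (batch, K_mc, n_classes);
        # labels must be tiled along the MC axis before calling fit().
        log_p = K.sum(y_true * K.log(y_pred + 1e-10), axis=-1)        # (batch, K_mc)
        # -(1/alpha) * log( 1/K * sum_k p_k^alpha )
        return K.mean(-(K.logsumexp(alpha * log_p, axis=1) - float(np.log(k_mc))) / alpha)
    return loss

model.compile(optimizer='adam', loss=bbalpha_loss(alpha, K_mc))
# y_mc = np.tile(y_onehot[:, None, :], (1, K_mc, 1))   # tile labels, then model.fit(x, y_mc, ...)
```

At test time the same model returns K_mc stochastic predictions per input, which is what the uncertainty estimates in the adversarial experiments are computed from.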

Reproducing the adversarial attack experiments

We also provide the adversarial attack detection code. The attack implementation was adapted from the cleverhans toolbox (version 1.0), and I rewrote the targeted attack to make it an iterative method.
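
For reference, the core of an iterative targeted FGSM looks roughly like the sketch below. It is written with TF2's GradientTape purely for illustration (the repo's attacks.py / attacks_tf.py follow the cleverhans 1.0 conventions instead, and the eps values here are made up): each step nudges the input so as to decrease the loss on the attacker's chosen target class.

```python
# Illustrative sketch of an iterative targeted FGSM -- not the code in attacks_tf.py.
import tensorflow as tf

def iterative_targeted_fgsm(model, x, y_target, eps=0.3, eps_step=0.01, n_steps=40):
    """x: images in [0, 1]; y_target: one-hot targets chosen by the attacker.
    Assumes model(x) returns per-class probabilities of shape (batch, n_classes),
    e.g. the MC-averaged predictive distribution."""
    x = tf.cast(x, tf.float32)
    x_adv = tf.identity(x)
    loss_fn = tf.keras.losses.CategoricalCrossentropy()
    for _ in range(n_steps):
        with tf.GradientTape() as tape:
            tape.watch(x_adv)
            loss = loss_fn(y_target, model(x_adv))
        grad = tape.gradient(loss, x_adv)
        x_adv = x_adv - eps_step * tf.sign(grad)             # step *towards* the target class
        x_adv = tf.clip_by_value(x_adv, x - eps, x + eps)    # stay within an eps-ball of x
        x_adv = tf.clip_by_value(x_adv, 0.0, 1.0)            # keep valid pixel values
    return x_adv
```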

To reproduce the experiments, first train a model on MNIST:

python train_model.py K_mc alpha nb_layers nb_units p model_arch

with K_mc the number of MC samples used during training, alpha the alpha-divergence parameter, nb_layers the number of layers of the network, nb_units the number of hidden units in each hidden layer, p the dropout rate (between 0 and 1), and model_arch = mlp or cnn.
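
For example, the following command (the argument values are purely illustrative) trains an MLP with nb_layers = 3, 1000 hidden units per layer, dropout rate 0.5, alpha = 0.5, and 10 MC samples:

python train_model.py 10 0.5 3 1000 0.5 mlp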

This will train a model on MNIST data for 500 iterations and save the model. Then to test the FGSM attack, run

python adversarial_test.py

and change the settings in that Python file to pick a saved model for testing. To test the targeted attack instead, run

python adversarial_test_targeted.py

Both files will produce a PNG file visualising the accuracy, the predictive entropy, and some samples of the adversarial images (aligned with the x-axis in the plots).
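
For intuition on what is being plotted, the uncertainty statistic is essentially the entropy of the MC-averaged predictive distribution. Below is a minimal sketch of that computation (illustrative, not the scripts' exact code; it assumes a model that outputs the stacked MC softmax samples as in the sketch above, and some adversarial inputs x_adv).

```python
# Illustrative sketch of the detection statistic -- not the scripts' exact code.
import numpy as np

def predictive_entropy(mc_probs):
    """mc_probs: array of shape (N, K_mc, n_classes) of stochastic softmax outputs."""
    p_mean = mc_probs.mean(axis=1)                               # MC-averaged predictive distribution
    return -np.sum(p_mean * np.log(p_mean + 1e-10), axis=-1)     # entropy per input

mc_probs = model.predict(x_adv)                  # model and x_adv assumed from the sketches above
entropy = predictive_entropy(mc_probs)           # tends to rise as the attack strength grows
preds = mc_probs.mean(axis=1).argmax(axis=-1)    # class predictions from the predictive mean
```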