Neural networks are vulnerable to adversarial attacks: attackers can deliberately design inputs that cause a model to make mistakes. In our work, we attack convolutional neural networks (CNNs) that classify images from the MNIST dataset.
Many adversarial attacks exist (Biggio, Szegedy’s L-BFGS, Goodfellow’s FGSM), and they can generate adversarial examples quickly. However, defensive distillation, a defense proposed by Papernot et al. (2016), was shown to be effective against these attacks. Carlini and Wagner later proposed an attack that defeats defensive distillation.
In our work, we implement Carlini and Wagner’s attack on a CNN that classifies digits from the MNIST dataset. The Carlini-Wagner (C&W) attack is a state-of-the-art attack in adversarial machine learning and is frequently used to benchmark the robustness of machine learning models.
Below is the paper that introduces this algorithm for generating adversarial examples. It is one of the more recent and powerful attacks and is used as a benchmark for testing a model's robustness.
2017 - "Towards Evaluating the Robustness of Neural Networks" - Carlini & Wagner
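The core of the C&W L2 attack can be sketched in a few lines. This is a minimal sketch under stated assumptions, not this repository's implementation: it uses a hypothetical toy linear classifier with logits Z(x) = W x + b in place of the MNIST CNN. It follows the paper's tanh change of variables and its margin loss f(x') = max(max_{i≠t} Z(x')_i - Z(x')_t, -κ). However, it replaces the paper's Adam optimizer and binary search over the constant c with plain gradient descent at a fixed c. The function name `cw_l2_attack` and its parameters are illustrative only.

```python
import numpy as np


def cw_l2_attack(x, target, W, b, c=1.0, kappa=0.0, lr=0.05, steps=1000):
    """Targeted C&W-style L2 sketch against a toy linear model Z(x) = W @ x + b.

    W and b are a hypothetical stand-in for the CNN's logit layer.
    Plain gradient descent with a fixed c is a simplification of the paper
    (which uses Adam plus a binary search over c).
    Returns the closest successful adversarial example found, or None.
    """
    # Change of variables: x_adv = 0.5 * (tanh(w) + 1) always lies in [0, 1].
    w = np.arctanh(np.clip(2 * x - 1, -1 + 1e-6, 1 - 1e-6))
    best, best_dist = None, np.inf
    for _ in range(steps):
        x_adv = 0.5 * (np.tanh(w) + 1)
        z = W @ x_adv + b  # logits
        # Strongest competing class: argmax over i != target of Z_i
        other = int(np.argmax(np.where(np.arange(len(z)) == target, -np.inf, z)))
        margin = z[other] - z[target]
        dist = np.sum((x_adv - x) ** 2)
        # Keep the smallest perturbation whose target logit beats all others by more than kappa.
        if margin < -kappa and dist < best_dist:
            best, best_dist = x_adv.copy(), dist
        # Gradient of ||x_adv - x||_2^2 + c * max(margin, -kappa) w.r.t. x_adv
        grad_x = 2 * (x_adv - x)
        if margin > -kappa:
            grad_x = grad_x + c * (W[other] - W[target])
        # Chain rule through the tanh change of variables: dx_adv/dw = 0.5 * (1 - tanh(w)^2)
        grad_w = grad_x * 0.5 * (1 - np.tanh(w) ** 2)
        w = w - lr * grad_w
    return best
```

Raising `kappa` asks for a more confident misclassification at the cost of a larger perturbation. In the paper, c is chosen by binary search so that the perturbation stays as small as possible while the attack still succeeds.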
Below are reviews of adversarial attacks and defenses. One of the papers has an associated GitHub repository.
GitHub - A Review of Adversarial Attack and Defense for Classification
2021 - "A Review of Adversarial Attack and Defense for Classification" - Li
2019 - "Adversarial Attacks and Defenses in Images, Graphs and Text: A Review" - Xu
The papers below describe physical-world attacks. These are the "tape on a stop sign" papers.
2018 - "Physical Adversarial Examples for Object Detectors" - Eykholt
2018 - "Robust Physical-World Attacks on Deep Learning Visual Classification" - Eykholt
The work below describes a defense that is effective against earlier attacks (Goodfellow's FGSM and Szegedy’s L-BFGS).
Earlier works proposed that adversarial examples exist because neural networks do not generalize well, possibly due to their high complexity. However, the following work shows that even linear models are vulnerable, and argues that adversarial examples arise from the models' largely linear behavior in high-dimensional input spaces. The authors describe a simple method for generating adversarial examples, the Fast Gradient Sign Method (FGSM), and use adversarial training to reduce the error rate on MNIST classification.
2015 - "Explaining and Harnessing Adversarial Examples" - Goodfellow
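FGSM takes one step of size ε in the direction of the sign of the loss gradient: x_adv = x + ε · sign(∇_x J(θ, x, y)). The sketch below illustrates the linearity argument. It is a minimal sketch assuming a hypothetical logistic-regression model p(y=1 | x) = sigmoid(w · x + b) instead of a neural network. Clipping the result to the valid pixel range [0, 1] is an added assumption, and the names `sigmoid` and `fgsm_logistic` are illustrative.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fgsm_logistic(x, y, w, b, eps):
    """FGSM against a toy logistic-regression model p(y=1 | x) = sigmoid(w @ x + b).

    x: input with pixels in [0, 1]; y: true label in {0, 1}; eps: L-infinity budget.
    """
    p = sigmoid(w @ x + b)
    # For the binary cross-entropy loss J, dJ/dx = (p - y) * w.
    grad_x = (p - y) * w
    # Single step of size eps along the gradient's sign, then keep pixels in [0, 1].
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)
```

For example, take 784 inputs (one per MNIST pixel), weights of magnitude 0.01 with alternating signs, every pixel at 0.5, and b = 0.5, so the clean logit is 0.5. Then ε = 0.1 shifts the logit by ε · ||w||₁ = 0.784 even though no pixel moves by more than 0.1, which flips the prediction. Small per-pixel changes add up across many input dimensions, which is why a purely linear model is already vulnerable.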