Adversarial Attack for Image Classification

Click the emoji for Writeup: 😈

Neural networks are vulnerable to adversarial attacks: an attacker can deliberately craft inputs that cause a model to make mistakes. In our work, we attack a convolutional neural network (CNN) used to classify images in the MNIST dataset.

Many adversarial attacks exist (Biggio et al., Szegedy’s L-BFGS, Goodfellow’s FGSM) and can generate adversarial examples quickly. However, defensive distillation, the defense strategy proposed by Papernot et al. (2016), was shown to be successful against these attacks. The attack proposed by Carlini and Wagner, in turn, defeats defensive distillation.

In our work, we implement Carlini and Wagner’s attack on a CNN that classifies digits from the MNIST dataset. The Carlini-Wagner attack is a state-of-the-art attack in the field of adversarial machine learning and is frequently used to benchmark the robustness of machine learning models.
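
As a rough illustration of the optimization behind the Carlini-Wagner L2 attack, the sketch below minimizes `||x' - x||^2 + c * f(x')`, where `x' = 0.5 * (tanh(w) + 1)` keeps every pixel in [0, 1] and `f` is a hinge on the logit margin between the strongest non-target class and the target class. This is not the code in this repository. It assumes a toy linear classifier (`W`, `b`) instead of a CNN, uses plain gradient descent where the paper uses Adam, and skips the paper's binary search over `c`. All function names and parameter values are hypothetical.

```python
import numpy as np

def cw_loss_and_grad(w, x, target, W, b, c, kappa):
    """Targeted C&W L2 objective and its gradient w.r.t. w (toy linear model)."""
    t = np.tanh(w)
    x_adv = 0.5 * (t + 1.0)                 # change of variables keeps x_adv in [0, 1]
    logits = W @ x_adv + b                  # assumption: linear logits Z(x) = Wx + b
    masked = logits.copy()
    masked[target] = -np.inf
    j = int(np.argmax(masked))              # strongest class other than the target
    margin = logits[j] - logits[target]
    delta = x_adv - x
    loss = np.sum(delta ** 2) + c * max(margin, -kappa)
    grad_x = 2.0 * delta                    # gradient of the squared L2 distortion
    if margin > -kappa:                     # hinge term contributes only while active
        grad_x = grad_x + c * (W[j] - W[target])
    grad_w = grad_x * 0.5 * (1.0 - t ** 2)  # chain rule through tanh
    return loss, grad_w

def cw_attack(x, target, W, b, c=1.0, kappa=0.1, lr=0.05, steps=500):
    """Plain gradient descent on w (the paper uses Adam; simplified here)."""
    w = np.arctanh(np.clip(2.0 * x - 1.0, -0.999999, 0.999999))
    for _ in range(steps):
        _, grad_w = cw_loss_and_grad(w, x, target, W, b, c, kappa)
        w = w - lr * grad_w
    return 0.5 * (np.tanh(w) + 1.0)

# Made-up two-class, two-feature example.
W = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.zeros(2)
x = np.array([0.7, 0.3])
x_adv = cw_attack(x, target=1, W=W, b=b)
print("clean prediction:", int(np.argmax(W @ x + b)))
print("adversarial prediction:", int(np.argmax(W @ x_adv + b)))
print("L2 distortion:", float(np.linalg.norm(x_adv - x)))
```

The confidence parameter `kappa` asks for the target logit to exceed the others by a margin, which keeps the result on the target side of the decision boundary even though gradient descent oscillates slightly around the hinge.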

References

Below is an algorithm that generates adversarial examples. It is one of the more powerful recent attacks and is used as a benchmark to test a model's robustness.

GitHub - Carlini & Wagner

2017 - "Towards Evaluating the Robustness of Neural Networks" - Carlini & Wagner

Below are reviews of adversarial attacks. One of the papers has an associated GitHub repository.

GitHub - A Review of Adversarial Attack and Defense for Classification

2021 - "A Review of Adversarial Attack and Defense for Classification" - Li

2019 - "Adversarial Attacks and Defenses in Images, Graphs and Text: A Review" - Xu

The papers below describe physical attacks. These are the "tape on a stop sign" papers.

2018 - "Physical Adversarial Examples for Object Detectors" - Eykholt

2018 - "Robust Physical-World Attacks on Deep Learning Visual Classification" - Eykholt

The paper below describes a defense that is effective against earlier attacks (Goodfellow’s FGSM and Szegedy’s L-BFGS).

2016 - "Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks" - Papernot
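
For context, defensive distillation relies on a softmax with temperature `T`, i.e. `softmax(z / T)`. In the paper above, a first network is trained at a high temperature, its softened output probabilities become the training labels for a second network trained at the same temperature, and the distilled network is then used at `T = 1`. The sketch below shows only the temperature softmax, not the training procedure. The function name and the logits are made up for illustration.

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Softmax of logits / T; a larger T gives a softer (flatter) distribution."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [5.0, 1.0, 0.0]          # hypothetical logits for three classes
p_hard = softmax_with_temperature(logits, T=1.0)
p_soft = softmax_with_temperature(logits, T=20.0)
print("T = 1: ", p_hard)
print("T = 20:", p_soft)
```

Raising `T` leaves the predicted class unchanged but spreads probability mass across the other classes, which is what produces the soft labels used to train the distilled network.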

Earlier work proposed that adversarial examples arise in neural networks because the models do not generalize well, which may be due to their high complexity. However, the following work shows that even linear models are vulnerable. The authors describe a simple method, the Fast Gradient Sign Method (FGSM), to generate adversarial examples, and use adversarial training to reduce error on MNIST classification.

2015 - "Explaining and Harnessing Adversarial Examples" - Goodfellow
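
The FGSM step from the paper above is `x_adv = x + eps * sign(grad_x J(theta, x, y))`. A minimal sketch on a logistic-regression model follows. A linear model is used here because its input gradient has a closed form and because the paper's point is that linear models are vulnerable too. The weights, input, and `eps` are made up, and the clipping to [0, 1] assumes pixel values in that range.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_linear(x, y, w, b, eps):
    """One FGSM step against logistic regression with cross-entropy loss."""
    p = sigmoid(w @ x + b)        # predicted probability of class 1
    grad_x = (p - y) * w          # gradient of the loss w.r.t. the input x
    x_adv = x + eps * np.sign(grad_x)
    return np.clip(x_adv, 0.0, 1.0)   # assumption: valid pixels lie in [0, 1]

# Hypothetical three-pixel input with true label 1.
w = np.array([2.0, -2.0, 1.0])
b = 0.0
x = np.array([0.6, 0.4, 0.5])
y = 1.0
x_adv = fgsm_linear(x, y, w, b, eps=0.25)
print("clean p(y=1):", float(sigmoid(w @ x + b)))
print("adversarial p(y=1):", float(sigmoid(w @ x_adv + b)))
```

Each pixel moves by at most `eps`, yet the changes all push the logit in the same direction, so their effect adds up across the input dimensions.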

Mike-Do/adversarial-attack
