This repository implements and demonstrates adversarial attacks on image classification models across multiple datasets.
We explore two types of adversarial attacks:
- FGSM (Fast Gradient Sign Method) - A fast, single-step attack
- C&W (Carlini & Wagner L2) - A powerful optimization-based attack
Both attacks are implemented as:
- Untargeted - Cause any misclassification
- Targeted - Force classification to a specific class
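The untargeted/targeted distinction amounts to flipping the sign of the gradient step. A minimal FGSM sketch illustrating this (an assumption-level illustration, not the repo's `fgsm_attack.py`; assumes a model returning logits and inputs in [0, 1]):

```python
import torch
import torch.nn as nn

def fgsm(model, x, y, eps, targeted=False):
    """One-step FGSM: perturb x along the sign of the loss gradient.
    Untargeted: ascend the loss on the true label y.
    Targeted: descend the loss on the target label y."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    step = eps * grad.sign()
    x_adv = x + (-step if targeted else step)
    # keep the adversarial image a valid image
    return x_adv.clamp(0, 1).detach()
```

Because the step is `eps * sign(grad)`, the perturbation is bounded by `eps` per pixel (an L∞ budget).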
| Dataset | Image Size | Channels | Classes | Training Samples |
|---|---|---|---|---|
| MNIST | 28×28 | Grayscale | 10 digits | 60,000 |
| CIFAR-10 | 32×32 | RGB | 10 objects | 50,000 |
| STL-10 | 96×96 | RGB | 10 objects | 5,000 |
├── models/
│ ├── model.py # MNIST model (SimpleNet)
│ ├── cifar_model.py # CIFAR-10 model (CIFARNet)
│ └── stl_model.py # STL-10 model (STLNet)
│
├── training/
│ ├── train_model.py # Train MNIST classifier
│ ├── train_cifar.py # Train CIFAR-10 classifier
│ └── train_stl.py # Train STL-10 classifier
│
├── attacks/
│ ├── fgsm_attack.py # FGSM attack on MNIST
│ ├── fgsm_attack_cifar.py # FGSM attack on CIFAR-10
│ ├── fgsm_attack_stl.py # FGSM attack on STL-10
│ ├── cw_attack.py # C&W attack on MNIST
│ ├── cw_attack_cifar.py # C&W attack on CIFAR-10
│ ├── cw_attack_stl.py # C&W attack on STL-10
│ ├── targeted_attack.py # Targeted attacks on MNIST
│ └── targeted_attack_stl.py # Targeted attacks on STL-10
│
├── results/ # Generated images from attacks
│
├── data/ # Downloaded datasets (gitignored)
├── requirements.txt
└── README.md
pip install -r requirements.txt
Requirements:
- PyTorch >= 2.0.0
- torchvision >= 0.15.0
- matplotlib >= 3.7.0
- numpy >= 1.24.0
# Train MNIST model (~99% accuracy)
python train_model.py
# Train CIFAR-10 model (~75-80% accuracy)
python train_cifar.py
# Train STL-10 model (~60-70% accuracy)
python train_stl.py
# FGSM attacks
python fgsm_attack.py # MNIST
python fgsm_attack_cifar.py # CIFAR-10
python fgsm_attack_stl.py # STL-10
# C&W attacks
python cw_attack.py # MNIST
python cw_attack_cifar.py # CIFAR-10
python cw_attack_stl.py # STL-10# Force MNIST digits to classify as "1"
python targeted_attack.py
# Force STL-10 images to classify as "bird"
python targeted_attack_stl.py
| Aspect | FGSM | C&W |
|---|---|---|
| Speed | ~0.001 ms/sample | ~100-500 ms/sample |
| Perturbation | Larger, visible | Smaller, imperceptible |
| Success Rate | Lower | Higher |
| Targeted | Less effective | More effective |
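C&W trades the single FGSM step for an explicit optimization: minimize the L2 distance to the original image plus a hinge on the logit margin. A minimal untargeted C&W-L2 sketch (an illustrative simplification, not the repo's `cw_attack.py`; the binary search over `c` used in the full attack is omitted):

```python
import torch

def cw_l2(model, x, y, steps=100, lr=0.01, c=1.0, kappa=0.0):
    """Minimal C&W L2 (untargeted): optimize in tanh space so the
    adversarial image stays in [0, 1]; c balances distance vs. attack loss."""
    # change of variables: x_adv = 0.5 * (tanh(w) + 1) is always in [0, 1]
    w = torch.atanh((x * 2 - 1).clamp(-0.999, 0.999)).detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    one_hot = torch.nn.functional.one_hot(y, model(x).shape[1]).float()
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)
        true_logit = (logits * one_hot).sum(1)
        other_logit = (logits - 1e9 * one_hot).max(1).values  # best wrong class
        # hinge: push the true logit below the best other logit (margin kappa)
        adv_loss = (true_logit - other_logit + kappa).clamp(min=0)
        l2 = ((x_adv - x) ** 2).flatten(1).sum(1)
        loss = (l2 + c * adv_loss).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()
```

The targeted variant swaps the hinge so the target class logit is pushed above all others; the per-step optimization is what makes C&W orders of magnitude slower than FGSM.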
- Higher resolution = less visible perturbation: STL-10 (96×96) adversarial examples are nearly indistinguishable from the originals, while MNIST (28×28) perturbations are more noticeable.
- C&W is slower but more powerful: C&W achieves higher success rates with smaller perturbations, especially for targeted attacks.
- Epsilon scaling: larger images require smaller epsilon values:
  - MNIST: ε = 0.1–0.3
  - CIFAR-10: ε = 0.01–0.1
  - STL-10: ε = 0.005–0.05
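Whatever epsilon is chosen, the perturbation must be projected back into the L∞ ball of that radius and into the valid pixel range. A small helper sketch (the per-dataset values below mirror the ranges above; the names are illustrative, not from the repo):

```python
import torch

# Illustrative mid-range epsilons from the list above
EPS = {"mnist": 0.2, "cifar10": 0.05, "stl10": 0.02}

def project_linf(x_adv, x, eps):
    """Clip the perturbation to the L-infinity ball of radius eps
    around x, then clamp into the valid pixel range [0, 1]."""
    return (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
```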
Attack scripts save results to results/:
- results/fgsm_*_clean_vs_adv.png - FGSM attack comparisons
- results/cw_*_clean_vs_adv.png - C&W attack comparisons
- results/targeted_*.png - Targeted attack results
All scripts automatically detect and use:
- CUDA (NVIDIA GPUs)
- MPS (Apple Silicon)
- CPU (fallback)
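The fallback order above can be expressed in a few lines (a sketch of the common pattern; the scripts' actual helper may differ):

```python
import torch

def pick_device():
    """Prefer CUDA, then Apple MPS, then CPU, in that order."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```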