# Adversarial Training

**Goal:** Train a robust CIFAR-10 classifier that maintains accuracy under attack

---

### Overview

In this notebook, we will:
1. Load clean training images (45,000 samples)
2. Load FGSM adversarial images (45,000 samples, ε=0.03)
3. Load PGD adversarial images (45,000 samples, ε=0.03)
4. Create an augmented training set by mixing 50% clean + 25% FGSM + 25% PGD images
5. Train a new robust classifier on this mixed dataset
6. Save the robust model for evaluation
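
The mixing in step 4 can be sketched as follows. This is a minimal NumPy illustration, not the notebook's actual implementation: the function name `mix_training_set` is hypothetical. It assumes that the three arrays are index-aligned (so `clean[i]`, `fgsm[i]`, and `pgd[i]` share `labels[i]`) and that the mixed set keeps the size of the clean set, with each original image appearing exactly once in one of its three versions.

```python
import numpy as np


def mix_training_set(clean, fgsm, pgd, labels,
                     fractions=(0.5, 0.25, 0.25), seed=111):
    """Draw a mixed training set of len(clean) samples from three sources.

    Assumption: clean[i], fgsm[i] and pgd[i] are versions of the same
    image and therefore share labels[i].
    """
    rng = np.random.default_rng(seed)
    n = len(clean)
    order = rng.permutation(n)

    # Partition the shuffled indices: each index is used exactly once.
    n_clean = int(fractions[0] * n)
    n_fgsm = int(fractions[1] * n)
    clean_idx = order[:n_clean]
    fgsm_idx = order[n_clean:n_clean + n_fgsm]
    pgd_idx = order[n_clean + n_fgsm:]  # remainder goes to PGD

    images = np.concatenate([clean[clean_idx], fgsm[fgsm_idx], pgd[pgd_idx]])
    mixed_labels = np.concatenate(
        [labels[clean_idx], labels[fgsm_idx], labels[pgd_idx]]
    )

    # Shuffle again so the three sources are interleaved within batches.
    shuffle = rng.permutation(n)
    return images[shuffle], mixed_labels[shuffle]
```

With 45,000 samples per source, this reading gives 22,500 clean, 11,250 FGSM, and 11,250 PGD images. If the intended mix instead concatenates full copies of each set, the fractions and total size would differ.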

---

### Why Adversarial Training?

Standard models are vulnerable because they see only clean data during training. Adversarial training adds attacked inputs, paired with their correct labels, to the training set so the model learns to classify them correctly. **It is among the most effective empirical defenses against adversarial examples.**

**Trade-off:**
- Typically somewhat lower clean accuracy (the model becomes more conservative)
- Much higher accuracy on adversarial inputs (the model learns more robust features)
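
To make "exposing the model to attacks" concrete, here is a minimal NumPy sketch of the FGSM step, which moves every pixel by ε in the direction that increases the loss: `x_adv = x + ε · sign(∇ₓ loss)`. It uses a toy linear softmax classifier because the input gradient then has a closed form. It is not the project's `FGSMAttack` class, whose interface is not shown here, and the names `fgsm_linear`, `weights`, and `bias` are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def fgsm_linear(images, labels, weights, bias, epsilon=0.03):
    """One FGSM step against a linear softmax classifier (toy stand-in).

    images:  float array in [0, 1], shape (n, H, W, C)
    labels:  integer class ids, shape (n,)
    weights: shape (H*W*C, num_classes); bias: shape (num_classes,)
    """
    flat = images.reshape(len(images), -1)
    probs = softmax(flat @ weights + bias)
    one_hot = np.eye(weights.shape[1])[labels]

    # Gradient of the cross-entropy loss w.r.t. the input: (p - y) W^T
    grad = (probs - one_hot) @ weights.T

    # Step by epsilon along the sign of the gradient, then keep pixels valid.
    adversarial = flat + epsilon * np.sign(grad)
    return np.clip(adversarial, 0.0, 1.0).reshape(images.shape)
```

PGD repeats a smaller version of this step several times and projects the result back into the ε-ball around the original image after each iteration. For a deep network the gradient would come from automatic differentiation rather than the closed form used here.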

### Setup and Imports

In [1]:
import sys
sys.path.append('../')
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
from src.data.cifar10 import load_cifar10
from src.attacks.fgsm import FGSMAttack
from src.attacks.pgd import PGDAttack

np.random.seed(111)
tf.random.set_seed(111)
print("Successfully imported everything")

Successfully imported everything


### Load Clean Training Data

Let's start with the original CIFAR-10 training set (45,000 images remain after 10% of the 50,000 are held out for validation):

In [2]:
(
    (training_images, training_labels),
    (validation_images, validation_labels),
    (testing_images, testing_labels),
    class_names,
) = load_cifar10(validation_split=0.1)

print(f"Clean training images: {training_images.shape}")
print(f"Clean training labels: {training_labels.shape}")
print(f"Validation images: {validation_images.shape}")
print(f"Testing images: {testing_images.shape}")


Final splits:
Training:   45000 samples
Validation: 5000 samples
Testing:    10000 samples
Clean training images: (45000, 32, 32, 3)
Clean training labels: (45000,)
Validation images: (5000, 32, 32, 3)
Testing images: (10000, 32, 32, 3)
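
The internals of `load_cifar10` are not shown in this notebook, so the following is only a hypothetical sketch of how `validation_split=0.1` could turn the 50,000 CIFAR-10 training images into the 45,000 / 5,000 split printed above. The function name `split_train_validation` and the use of a seeded shuffle are assumptions.

```python
import numpy as np


def split_train_validation(images, labels, validation_split=0.1, seed=111):
    """Hold out a random fraction of the samples for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))

    # Number of validation samples, e.g. 0.1 * 50,000 = 5,000.
    n_val = int(round(validation_split * len(images)))
    val_idx, train_idx = order[:n_val], order[n_val:]

    return (
        (images[train_idx], labels[train_idx]),
        (images[val_idx], labels[val_idx]),
    )
```

Whatever the real implementation does, the adversarial image sets loaded later need to come from the same 45,000 training indices. Otherwise their labels will not line up with `training_labels`.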


### Load Standard Classifier

This is the model we trained in **03_adversarial_attacks.ipynb**. We'll use it to generate fresh adversarial examples for training:

In [3]:
standard_classifier = keras.models.load_model('../results/models/standard_classifier.h5')

# Quick sanity check on the first 1,000 test images only (not the full test set)
test_loss, test_accuracy = standard_classifier.evaluate(
    testing_images[:1000],
    testing_labels[:1000],
    verbose=0
)

print("Standard Model on Clean Test Data (first 1,000 images):")
print(f"Test accuracy: {test_accuracy:.2%}")
print(f"Test loss: {test_loss:.4f}")

FileNotFoundError: [Errno 2] Unable to synchronously open file (unable to open file: name = '../results/models/standard_classifier.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
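
The `FileNotFoundError` above means that `'../results/models/standard_classifier.h5'` did not resolve to a file from the notebook's working directory. The most likely causes are that the notebook which saves the model has not been run yet, or that Jupyter was started from a different folder. A small standard-library helper can make the failure easier to diagnose before calling `keras.models.load_model`. The name `find_model_file` is hypothetical, and this is a sketch rather than part of the project.

```python
from pathlib import Path


def find_model_file(path, search_root="."):
    """Return `path` as a Path if the file exists.

    Otherwise raise FileNotFoundError with the resolved absolute path,
    the current working directory, and any .h5 files found under
    `search_root`, so a misplaced model is easy to spot.
    """
    candidate = Path(path)
    if candidate.is_file():
        return candidate

    found = sorted(str(p) for p in Path(search_root).rglob("*.h5"))
    hint = "\n  ".join(found) if found else "(no .h5 files found)"
    raise FileNotFoundError(
        f"Model file not found: {candidate.resolve()}\n"
        f"Working directory: {Path.cwd()}\n"
        f".h5 files under {search_root!r}:\n  {hint}"
    )


# Intended use in the cell above (requires the saved model to exist):
# model_path = find_model_file('../results/models/standard_classifier.h5',
#                              search_root='..')
# standard_classifier = keras.models.load_model(model_path)
```

If the helper lists no `.h5` files at all, the standard classifier has to be trained and saved first. The later steps of this notebook depend on it.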