
# Adversarial Detection

## 0) Objective

* Create, analyze, and detect adversarial inputs against image classifiers.
* Experience both the attack (Fast Gradient Sign Method, FGSM) and the defense perspectives.
* Learn simple but practical ML security concepts: vulnerability, detection, robustness.

## 1) Setup
* Install and import the required libraries.

In [None]:
!pip install tensorflow matplotlib numpy
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

* Load the CIFAR-10 dataset (airplanes, cars, birds, etc.) and scale the test pixels to [0, 1]. This lab uses a small CNN that is defined in Part 1; a pretrained network such as tf.keras.applications.MobileNetV2 is an alternative.

In [None]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_test = x_test.astype("float32") / 255.0
y_test = y_test.flatten()

## 2) Part 1: Generate Adversarial Examples

### a) Define the FGSM attack



In [None]:
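loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def fgsm_attack(image, label, model, eps=0.02):
    """Return an FGSM adversarial copy of `image` (a batched float tensor in [0, 1])."""
    with tf.GradientTape() as tape:
        # `image` is a constant tensor, not a variable, so it must be watched explicitly.
        tape.watch(image)
        prediction = model(image, training=False)
        loss = loss_object(label, prediction)
    # Gradient of the loss with respect to the input pixels, not the weights.
    gradient = tape.gradient(loss, image)
    signed_grad = tf.sign(gradient)
    # Move each pixel by eps in the direction that increases the loss.
    adv_image = image + eps * signed_grad
    # Clip so the result is still a valid image.
    return tf.clip_by_value(adv_image, 0, 1)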
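
For reference, the update computed by `fgsm_attack` can be written as a single formula. The symbols are notation chosen here, not taken from the notebook: $J$ is the cross-entropy loss, $\theta$ the model weights, $x$ the input image, $y$ its label, and $\epsilon$ corresponds to `eps`.

$$
x_{\text{adv}} = \operatorname{clip}_{[0,1]}\bigl(x + \epsilon \cdot \operatorname{sign}\bigl(\nabla_x J(\theta, x, y)\bigr)\bigr)
$$

Since each pixel moves by at most $\epsilon$, the perturbation satisfies $\lVert x_{\text{adv}} - x \rVert_\infty \le \epsilon$.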

### b) Apply FGSM

In [None]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Train briefly so the gradients used by FGSM are meaningful; an untrained
# network predicts near-random labels and the attack demonstrates nothing.
# Training pixels get the same [0, 1] scaling that was applied to x_test.
model.fit(x_train.astype("float32") / 255.0, y_train.flatten(),
          epochs=3, batch_size=64, validation_split=0.1, verbose=2)

In [None]:
idx = 0
image = tf.convert_to_tensor(x_test[idx:idx+1])
label = tf.convert_to_tensor(y_test[idx:idx+1])
adv = fgsm_attack(image, label, model)


### c) Visualize and compare

In [None]:
plt.subplot(1,2,1); plt.imshow(image[0]); plt.title("Original")
plt.subplot(1,2,2); plt.imshow(adv[0]); plt.title("Adversarial")
plt.show()


## 3) Part 2: Understand the Attack

### a) Measure Perturbation

In [None]:
l2 = np.linalg.norm((adv - image).numpy())
print("L2 perturbation:", l2)
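
FGSM bounds the change to each pixel by `eps`, so the L-infinity norm is the more direct measure of its budget, while the L2 norm grows with the number of pixels changed. The sketch below reports both on stand-in arrays; `perturbation_norms`, `clean_demo`, and `adv_demo` are hypothetical names introduced for illustration and are not part of the notebook.

```python
import numpy as np

def perturbation_norms(clean, adversarial):
    """Summarize the size of the perturbation between two arrays of equal shape."""
    delta = np.asarray(adversarial, dtype=np.float64) - np.asarray(clean, dtype=np.float64)
    return {
        "l2": float(np.linalg.norm(delta.ravel())),   # overall energy of the change
        "linf": float(np.max(np.abs(delta))),         # largest change to any single pixel
        "changed_fraction": float(np.mean(delta != 0)),
    }

# Stand-in data (assumption): a flat grey image with every pixel shifted by 0.02.
clean_demo = np.full((1, 32, 32, 3), 0.5)
adv_demo = clean_demo + 0.02
norms = perturbation_norms(clean_demo, adv_demo)
print(norms)
```

In the notebook itself, the same function could be called as `perturbation_norms(image.numpy(), adv.numpy())`, assuming `image` and `adv` from the earlier cells.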

### b) Evaluate model prediction & confidence

In [None]:
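pred_orig = model.predict(image)
pred_adv = model.predict(adv)
# The model returns logits; softmax turns them into class probabilities.
conf_orig = tf.nn.softmax(pred_orig).numpy().max()
conf_adv = tf.nn.softmax(pred_adv).numpy().max()
print("Original:", np.argmax(pred_orig), f"({conf_orig:.2f})", "Adversarial:", np.argmax(pred_adv), f"({conf_adv:.2f})")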

## 4) Part 3: Detection Strategies

### a) Statistical detection (pixel distribution)

In [None]:
def pixel_std(img): return np.std(img)
print("Std original:", pixel_std(image), "Std adversarial:", pixel_std(adv))
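
A global standard deviation often barely moves under a small perturbation, because the image's own contrast dominates it. One alternative statistic, offered here as an illustration and not as the lab's method, is the mean absolute difference between neighbouring pixels, which responds to pixel-level noise. The function name `total_variation` and the synthetic ramp image are assumptions made for this sketch.

```python
import numpy as np

def total_variation(img):
    """Mean absolute difference between neighbouring pixels of an (H, W, C) image."""
    img = np.asarray(img, dtype=np.float64)
    vertical = np.abs(np.diff(img, axis=0)).mean()
    horizontal = np.abs(np.diff(img, axis=1)).mean()
    return float(vertical + horizontal)

# Stand-in data (assumption): a smooth horizontal ramp, then the same ramp with
# a +/- 0.02 sign pattern added, which mimics the shape of an FGSM perturbation.
rng = np.random.default_rng(0)
ramp = np.linspace(0.2, 0.8, 32)
smooth_demo = np.tile(ramp[None, :, None], (32, 1, 3))
noise = 0.02 * rng.choice([-1.0, 1.0], size=smooth_demo.shape)
noisy_demo = np.clip(smooth_demo + noise, 0.0, 1.0)

print("Std:", np.std(smooth_demo), np.std(noisy_demo))
print("Total variation:", total_variation(smooth_demo), total_variation(noisy_demo))
```

On real images the gap depends on how textured the image already is, so any threshold would have to be calibrated on clean data.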


### b) Input preprocessing

Apply JPEG compression or Gaussian blur and re-evaluate prediction:

In [None]:
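import cv2
# A 3x3 Gaussian blur smooths some of the high-frequency FGSM noise.
blurred = cv2.GaussianBlur(adv[0].numpy(), (3,3), 0)
plt.imshow(blurred); plt.title("Blurred adversarial"); plt.show()
pred_blur = model.predict(blurred[np.newaxis, ...])  # restore the batch axis
print("Prediction after blur:", np.argmax(pred_blur))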


### c) Confidence-based detection

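If the model's top-class confidence drops sharply relative to the clean input, flag the input as suspicious.

In [None]:
# pred_orig and pred_adv are logits, so convert them to probabilities before comparing.
conf_diff = tf.nn.softmax(pred_orig).numpy().max() - tf.nn.softmax(pred_adv).numpy().max()
print("Confidence drop:", conf_diff)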
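
At inference time the clean counterpart of an input is usually not available, so a deployed check typically has to rely on the confidence of the input alone. The following is a minimal sketch of such a rule, assuming the model returns logits; `flag_suspicious`, the 0.5 threshold, and the demo logits are hypothetical choices, not values from the notebook.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def flag_suspicious(logits, threshold=0.5):
    """Flag rows whose top-class probability falls below `threshold`."""
    confidence = softmax(np.asarray(logits, dtype=np.float64)).max(axis=-1)
    return confidence < threshold, confidence

# Stand-in logits (assumption): one confident prediction, one nearly uniform.
demo_logits = np.array([[6.0, 0.5, 0.1],
                        [1.0, 0.9, 0.8]])
flags, confidence = flag_suspicious(demo_logits)
print(flags, confidence)
```

A low-confidence rule of this kind also flags ordinary hard examples, so the threshold trades false alarms against missed attacks.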


## 5) Part 4: Defense Implementation

### a) Simple defensive preprocessor

In [None]:
def defend_input(x_3d):
    # x_3d is expected to be a 3D tensor (H, W, C)
    return tf.image.random_jpeg_quality(x_3d, 70, 100)

### b) Adversarial training demonstration

Retrain the model briefly on a mix of clean and FGSM samples, then evaluate whether robustness improves.
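
The notebook leaves this step as an exercise. One way it might be approached is sketched below, under stated assumptions: `fgsm_batch` and `adversarial_finetune` are hypothetical helpers, the tiny dense model and random data are stand-ins so the block runs on its own, and the 50/50 clean-to-adversarial mix is just one reasonable choice.

```python
import numpy as np
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def fgsm_batch(model, x, y, eps):
    """FGSM on a whole batch; x is float32 in [0, 1], y holds integer labels."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x, training=False))
    grad = tape.gradient(loss, x)
    return tf.clip_by_value(x + eps * tf.sign(grad), 0.0, 1.0).numpy()

def adversarial_finetune(model, x, y, eps=0.02, epochs=1, batch_size=64):
    """Fine-tune on clean plus FGSM samples, regenerating the attack each epoch."""
    for _ in range(epochs):
        x_adv = fgsm_batch(model, x, y, eps)   # attack the current weights
        x_mix = np.concatenate([x, x_adv])     # equal mix of clean and adversarial
        y_mix = np.concatenate([y, y])         # adversarial copies keep the true labels
        model.fit(x_mix, y_mix, epochs=1, batch_size=batch_size,
                  shuffle=True, verbose=0)
    return model

# Stand-in data and model (assumptions) so the sketch is self-contained.
rng = np.random.default_rng(0)
x_demo = rng.random((64, 32, 32, 3), dtype=np.float32)
y_demo = rng.integers(0, 10, size=64)
demo_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])
demo_model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])

adversarial_finetune(demo_model, x_demo, y_demo, eps=0.02, epochs=1)
x_demo_adv = fgsm_batch(demo_model, x_demo, y_demo, eps=0.02)
```

With the lab's own objects, the call would look like `adversarial_finetune(model, x, y)` on a scaled slice of the training set. Regenerating the adversarial samples each epoch keeps them matched to the current weights, at the cost of one extra gradient pass per epoch.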

## 6) Evaluation

Compare model accuracy on clean vs. adversarial vs. defended inputs.

Visualize:

In [None]:
# An untrained model scores near chance (about 10% on 10 classes), which makes
# the comparison below meaningless, so check clean accuracy first.
_, acc_clean = model.evaluate(x_test, y_test, verbose=0)

if acc_clean < 0.2:
    print("Model looks untrained; running a quick training pass.")
    # Small subset for speed; scale pixels to [0, 1] to match x_test.
    model.fit(x_train[:5000].astype("float32") / 255.0, y_train[:5000].flatten(),
              epochs=3, verbose=0)
    _, acc_clean = model.evaluate(x_test, y_test, verbose=0)

# Generate adversarial examples for the first 100 test images only, to keep the
# run fast. Note that acc_clean above covers the full test set, so the bars are
# not computed on identical samples.
adv_images = []
adv_labels = []
for i in range(len(x_test[:100])): # Process a small subset for speed
    image_tensor = tf.convert_to_tensor(x_test[i:i+1])
    label_tensor = tf.convert_to_tensor(y_test[i:i+1])
    adv_img = fgsm_attack(image_tensor, label_tensor, model, eps=0.02)
    adv_images.append(adv_img[0].numpy())
    adv_labels.append(y_test[i])

adv_images = np.array(adv_images)
adv_labels = np.array(adv_labels)

_, acc_adv = model.evaluate(adv_images, adv_labels, verbose=0)

# Evaluate on defended adversarial examples
# Pass each 3D image to defend_input and then stack them
defended_adv_images = tf.stack([defend_input(tf.convert_to_tensor(img)) for img in adv_images])
_, acc_defended = model.evaluate(defended_adv_images, adv_labels, verbose=0)

labels = ['Clean', 'Adversarial', 'Defended']
accs = [acc_clean, acc_adv, acc_defended]
plt.bar(labels, accs)
plt.ylabel('Accuracy')
plt.title('Model Accuracy on Different Inputs')
plt.show()
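
Accuracy alone mixes together samples the model already got wrong and samples the attack actually flipped. A complementary view, sketched here with hypothetical names (`robustness_summary`, `y_true_demo`) and made-up label arrays, reports the attack success rate among initially correct samples and how many of those the defense recovers.

```python
import numpy as np

def robustness_summary(y_true, pred_clean, pred_adv, pred_defended):
    """Compare predicted labels under three conditions for the same samples."""
    y_true = np.asarray(y_true)
    pred_clean, pred_adv, pred_defended = map(
        np.asarray, (pred_clean, pred_adv, pred_defended))
    correct_clean = pred_clean == y_true
    # Attack success: samples that were right on clean input and wrong after the attack.
    flipped = correct_clean & (pred_adv != y_true)
    # Recovery: flipped samples that are right again after the defense.
    recovered = flipped & (pred_defended == y_true)
    n_correct = int(correct_clean.sum())
    n_flipped = int(flipped.sum())
    return {
        "acc_clean": float(correct_clean.mean()),
        "acc_adv": float((pred_adv == y_true).mean()),
        "acc_defended": float((pred_defended == y_true).mean()),
        "attack_success_rate": flipped.sum() / n_correct if n_correct else float("nan"),
        "recovery_rate": recovered.sum() / n_flipped if n_flipped else float("nan"),
    }

# Made-up labels (assumption) purely to exercise the function.
y_true_demo = np.array([0, 1, 2, 3, 4])
summary = robustness_summary(
    y_true_demo,
    pred_clean=np.array([0, 1, 2, 3, 9]),
    pred_adv=np.array([0, 7, 7, 3, 9]),
    pred_defended=np.array([0, 1, 7, 3, 9]),
)
print(summary)
```

In the notebook, the three prediction arrays could be obtained with `np.argmax(model.predict(...), axis=1)` on the clean, adversarial, and defended versions of the same subset.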

## 7) Discussion / Wrap-Up

**Key takeaways:**

* Even small perturbations can fool high-accuracy models.
* Detection is difficult; adversarial examples mimic normal inputs.
* Combining preprocessing, ensembles, and adversarial training can improve resilience.
* Continuous monitoring and retraining are essential in production.