In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from skimage.color import gray2rgb
import shap


In [None]:

# Load the FashionMNIST dataset
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

# Preprocess the data
X_train = X_train / 255.0
X_test = X_test / 255.0
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)

# Define the model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=5, batch_size=32, validation_data=(X_test, y_test))

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']



The code starts by importing the libraries needed to load the dataset, build and train the model, run the adversarial attack, and compute the explanations.

The FashionMNIST dataset is loaded and preprocessed. The pixel values are scaled from [0, 255] to [0, 1] by dividing by 255.0. Each array is then reshaped to (N, 28, 28, 1), where -1 lets NumPy infer the number of images N and the trailing 1 is the single grayscale channel that the convolutional network expects.
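The scaling and reshaping step can be checked in isolation. The following is a minimal sketch on a synthetic array; `raw`, `scaled`, and `batched` are illustrative names, and the random pixels stand in for the real dataset.

```python
import numpy as np

# Synthetic stand-in for FashionMNIST: 4 grayscale images with uint8 pixels
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

scaled = raw / 255.0                      # true division promotes uint8 to float64 in [0, 1]
batched = scaled.reshape(-1, 28, 28, 1)   # -1 lets NumPy infer the batch size

print(raw.shape, raw.dtype)               # (4, 28, 28) uint8
print(batched.shape, batched.dtype)       # (4, 28, 28, 1) float64
```

The reshape only adds an axis; no pixel values are moved or changed.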

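A Sequential model is defined with two convolutional layers (32 and 64 filters, 3x3 kernels, ReLU), one 2x2 max-pooling layer between them, a flattening layer, a 64-unit dense layer, and a 10-unit softmax output. The model is compiled with the Adam optimizer and the sparse categorical cross-entropy loss, which accepts integer class labels directly.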
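To see how the layer sizes fit together, the output shapes and parameter counts can be worked out by hand. This is a rough sketch that assumes the Keras defaults used in the model definition (unpadded "valid" convolutions with stride 1, pooling stride equal to the pool size); `conv_out` is a helper introduced here for illustration.

```python
def conv_out(size, kernel, stride=1):
    """Spatial output size of an unpadded ('valid') convolution or pooling window."""
    return (size - kernel) // stride + 1

side, channels = 28, 1
params = {}

# Conv2D(32, 3x3): kernel_h * kernel_w * in_channels * filters weights, plus one bias per filter
params["conv1"] = 3 * 3 * channels * 32 + 32
side, channels = conv_out(side, 3), 32             # 26 x 26 x 32

# MaxPooling2D(2x2) has no trainable parameters
side = conv_out(side, 2, stride=2)                 # 13 x 13 x 32

# Conv2D(64, 3x3)
params["conv2"] = 3 * 3 * channels * 64 + 64
side, channels = conv_out(side, 3), 64             # 11 x 11 x 64

flat = side * side * channels                      # length of the flattened vector
params["dense1"] = flat * 64 + 64
params["dense2"] = 64 * 10 + 10

total_params = sum(params.values())
print(flat, params, total_params)
```

Almost all of the parameters sit in the first dense layer, because the flattened feature vector is large.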

The model is trained on the training data for 5 epochs with a batch size of 32. The test set is passed as validation_data, so the validation loss and accuracy reported after each epoch are measured on the test images.

The class names for the FashionMNIST dataset are defined for later use in displaying the results.

In [None]:

# Select two examples
example1_idx = np.argmax(y_test == 0)
example2_idx = np.argmax(y_test == 1)

example1 = X_test[example1_idx]
example2 = X_test[example2_idx]

# Convert grayscale image to RGB
example2_rgb = gray2rgb(example2.reshape(28, 28))

# Make predictions for the examples
example1_pred = model.predict(example1.reshape(1, 28, 28, 1))
example2_pred = model.predict(example2.reshape(1, 28, 28, 1))

# Get the predicted classes and probabilities
# np.sort is ascending along the last axis: index -1 is the largest value, index 0 the smallest
example1_class = np.argmax(example1_pred)
example1_prob = example1_pred[0, example1_class]
example1_second_prob = np.sort(example1_pred)[0, -2]
example1_least_prob = np.sort(example1_pred)[0, 0]

example2_class = np.argmax(example2_pred)
example2_prob = example2_pred[0, example2_class]
example2_second_prob = np.sort(example2_pred)[0, -2]
example2_least_prob = np.sort(example2_pred)[0, 0]

print(f"Example 1: True Class - {class_names[0]}")
print(f"           Predicted Class - {class_names[example1_class]}")
print(f"           Probability - {example1_prob:.4f}")
print(f"           Second Most Likely Probability - {example1_second_prob:.4f}")
print(f"           Least Likely Probability - {example1_least_prob:.4f}")
print()
print(f"Example 2: True Class - {class_names[1]}")
print(f"           Predicted Class - {class_names[example2_class]}")
print(f"           Probability - {example2_prob:.4f}")
print(f"           Second Most Likely Probability - {example2_second_prob:.4f}")
print(f"           Least Likely Probability - {example2_least_prob:.4f}")
print()

Two examples (one with the true class of T-shirt/top and the other with the true class of Trouser) are selected from the test data.

The grayscale image of the second example is converted to RGB format using the gray2rgb function from the skimage library.

The model predicts the classes and probabilities for the two examples.

The predicted classes, probabilities, and other information for the two examples are printed.
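One way to read the most likely, second most likely, and least likely probabilities from a prediction row is to sort it once in ascending order. The sketch below uses a made-up five-class probability row (`probs` is hypothetical, not model output).

```python
import numpy as np

# Hypothetical softmax output for one example, shape (1, 5)
probs = np.array([[0.05, 0.60, 0.02, 0.25, 0.08]])

ascending = np.sort(probs, axis=-1)   # smallest first, largest last
top_prob = ascending[0, -1]           # most likely
second_prob = ascending[0, -2]        # second most likely
least_prob = ascending[0, 0]          # least likely
top_class = int(np.argmax(probs))     # index of the most likely class

print(top_class, top_prob, second_prob, least_prob)  # 1 0.6 0.25 0.02
```

Because the sort is ascending, index -1 is the maximum and index 0 is the minimum.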

In [None]:
# Adversarial attack using FGSM

# Set the epsilon value for FGSM
epsilon = 0.1

# Convert the examples to tensors
example1_tensor = tf.convert_to_tensor(example1.reshape(1, 28, 28, 1), dtype=tf.float32)
example2_tensor = tf.convert_to_tensor(example2.reshape(1, 28, 28, 1), dtype=tf.float32)

# Use persistent gradient tape to compute gradients
with tf.GradientTape(persistent=True) as tape:
    tape.watch(example1_tensor)
    tape.watch(example2_tensor)

    # Compute the loss for the original examples
    original_loss1 = tf.keras.losses.sparse_categorical_crossentropy(y_test[example1_idx], model(example1_tensor))
    original_loss2 = tf.keras.losses.sparse_categorical_crossentropy(y_test[example2_idx], model(example2_tensor))

# Compute the gradients of the loss with respect to the input examples
gradient1 = tape.gradient(original_loss1, example1_tensor)
gradient2 = tape.gradient(original_loss2, example2_tensor)

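# Generate the adversarial examples, clipping to the valid pixel range [0, 1]
perturbed_example1 = tf.clip_by_value(example1_tensor + epsilon * tf.sign(gradient1), 0.0, 1.0)
perturbed_example2 = tf.clip_by_value(example2_tensor + epsilon * tf.sign(gradient2), 0.0, 1.0)

# Release the resources held by the persistent tape
del tape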

The adversarial attack in this code is implemented using the Fast Gradient Sign Method (FGSM), which is a simple yet effective technique for generating adversarial examples. Adversarial examples are input samples that are intentionally crafted to cause misclassification or unexpected behavior in machine learning models.

Here's a breakdown of the adversarial attack part of the code:

The epsilon value is set to determine the magnitude of the perturbation. It controls the size of the change applied to the original examples. A smaller epsilon value corresponds to a smaller change, while a larger epsilon value allows for a larger change.

The original examples, example1 and example2, are converted to tensors (example1_tensor and example2_tensor) of data type tf.float32 to work with TensorFlow operations.

A persistent gradient tape is created using tf.GradientTape(persistent=True). The persistent=True argument allows tape.gradient() to be called more than once on the same tape; a non-persistent tape releases its resources after the first call.

The example1_tensor and example2_tensor are watched explicitly using tape.watch(). This is required because they are plain tensors rather than tf.Variable objects, and the tape only tracks trainable variables automatically.

The original loss for example1 and example2 is computed using the sparse_categorical_crossentropy loss function and the model's predictions for the respective examples.

The gradients of the original loss with respect to example1_tensor and example2_tensor are computed using tape.gradient(original_loss1, example1_tensor) and tape.gradient(original_loss2, example2_tensor).

The sign of the gradients is taken using tf.sign(). Moving each pixel by the same amount in the direction of its gradient's sign is the step that increases the loss the most, to first order, among all perturbations whose largest per-pixel change is epsilon.

The perturbed examples, perturbed_example1 and perturbed_example2, are generated by adding epsilon times the sign of the gradient to the original examples: x_adv = x + epsilon * sign(gradient of the loss with respect to x).

The perturbed examples are then used to make predictions (perturbed_example1_pred and perturbed_example2_pred) using the trained model.
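The steps above can be reproduced without TensorFlow on a model whose input gradient is known in closed form. This is a minimal sketch, assuming a softmax-linear classifier with random weights in place of the trained CNN; `fgsm_linear`, `cross_entropy`, `W`, and `b` are illustrative names, and the final clip to [0, 1] is an added assumption about the valid pixel range.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(x, label, W, b):
    return -np.log(softmax(W @ x + b)[label])

def fgsm_linear(x, label, W, b, epsilon):
    """FGSM for p = softmax(W @ x + b) with cross-entropy loss."""
    p = softmax(W @ x + b)
    one_hot = np.eye(len(b))[label]
    grad_x = W.T @ (p - one_hot)               # gradient of the loss with respect to x
    x_adv = x + epsilon * np.sign(grad_x)      # one signed step of size epsilon per feature
    return np.clip(x_adv, 0.0, 1.0), grad_x    # assumption: inputs live in [0, 1]

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))            # 3 classes, 8 input features
b = np.zeros(3)
x = rng.uniform(0.2, 0.8, size=8)      # kept away from 0 and 1 so the clip is inactive
label, epsilon = 0, 0.1

x_adv, grad_x = fgsm_linear(x, label, W, b, epsilon)
loss_before = cross_entropy(x, label, W, b)
loss_after = cross_entropy(x_adv, label, W, b)
print(loss_before, loss_after, np.abs(x_adv - x).max())
```

For this toy model the loss is convex in the input, so the signed step is guaranteed not to decrease it. A deep network gives no such guarantee, which is why the attack is checked empirically by re-running the prediction.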



In [None]:
# Get the predicted classes and probabilities for the adversarial examples
perturbed_example1_pred = model.predict(perturbed_example1)
perturbed_example2_pred = model.predict(perturbed_example2)

perturbed_example1_class = np.argmax(perturbed_example1_pred)
perturbed_example1_prob = perturbed_example1_pred[0, perturbed_example1_class]

perturbed_example2_class = np.argmax(perturbed_example2_pred)
perturbed_example2_prob = perturbed_example2_pred[0, perturbed_example2_class]

print("Adversarial Examples (FGSM):")
print(f"Example 1: Predicted Class - {class_names[perturbed_example1_class]}")
print(f"           Probability - {perturbed_example1_prob:.4f}")
print()
print(f"Example 2: Predicted Class - {class_names[perturbed_example2_class]}")
print(f"           Probability - {perturbed_example2_prob:.4f}")
print()

The predicted classes and probabilities for the perturbed examples are obtained (perturbed_example1_class, perturbed_example1_prob, perturbed_example2_class, perturbed_example2_prob).

Finally, the predicted classes and probabilities for the perturbed examples are printed, showing the effect of the adversarial attack on the model's predictions.

The adversarial attack in this code modifies the original examples by introducing small changes based on the gradient information of the model. These changes are carefully crafted to deceive the model into making incorrect predictions. By controlling the epsilon value, you can adjust the level of perturbation and observe the impact on the model's behavior. Adversarial attacks like FGSM highlight the vulnerability of machine learning models and the need for robust defenses against such attacks.

In [None]:
# SHAP Explanation
background = X_train[np.random.choice(X_train.shape[0], 100, replace=False)]
explainer = shap.GradientExplainer(model, background)
shap_values = explainer.shap_values(example1.reshape(1, 28, 28, 1))
shap.image_plot(shap_values, -example1.reshape(1, 28, 28, 1))

SHAP (SHapley Additive exPlanations) values are computed for the first example. The background set is 100 images sampled at random, without replacement, from the training data. shap.GradientExplainer estimates SHAP values with expected gradients, averaging gradients over points interpolated between the input and the background samples. The attributions are plotted with shap.image_plot; the image is passed negated, which inverts its grayscale display behind the attributions.
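The role of the background set is easiest to see on a model where SHAP values have a closed form. The sketch below assumes a linear model with features treated as independent, for which the SHAP value of feature i is w_i * (x_i - mean of feature i over the background). It does not call the shap library, and `w`, `b`, `f`, and `phi` are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)                   # weights of a hypothetical linear model
b = 0.3
background = rng.normal(size=(100, 5))   # plays the role of the 100 background images
x = rng.normal(size=5)                   # the example being explained

def f(inputs):
    return inputs @ w + b

# Closed-form SHAP values for a linear model with independent features
phi = w * (x - background.mean(axis=0))

# Additivity: attributions sum to the prediction minus the average background prediction
expected_value = f(background).mean()
print(phi.sum(), f(x) - expected_value)
```

The same additivity property is what the gradient-based estimate for the CNN aims to satisfy approximately.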


In [None]:
# DeepLIFT Explanation
explainer = shap.DeepExplainer(model, background)
deeplift_values = explainer.shap_values(example2.reshape(1, 28, 28, 1))
shap.image_plot(deeplift_values, -example2.reshape(1, 28, 28, 1))


DeepLIFT-style attributions are computed for the second example with shap.DeepExplainer, which implements Deep SHAP, an extension of DeepLIFT that averages attributions over the background samples. The explainer reuses the background set sampled in the previous cell. The attributions are plotted with shap.image_plot.
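The core idea of DeepLIFT, attributing the change in output relative to a reference input, can be illustrated on a very small network. This is a rough sketch of the rescale rule for a one-hidden-layer ReLU network with a single reference; it is not how shap.DeepExplainer is implemented, and `deeplift_rescale`, `W1`, `b1`, and `w2` are hypothetical names.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def f(x, W1, b1, w2):
    """Tiny network: linear layer, ReLU, linear readout."""
    return w2 @ relu(W1 @ x + b1)

def deeplift_rescale(x, ref, W1, b1, w2):
    """Rescale-rule attributions of f(x) - f(ref) to each input feature."""
    z, z_ref = W1 @ x + b1, W1 @ ref + b1
    dz = z - z_ref
    da = relu(z) - relu(z_ref)
    moved = np.abs(dz) > 1e-12
    # Multiplier of each ReLU: output change divided by input change.
    # Where the pre-activation did not move, fall back to the ordinary gradient.
    m = np.where(moved, da / np.where(moved, dz, 1.0), (z > 0).astype(float))
    # Chain the multipliers through both linear layers, then scale by the input difference
    return (W1.T @ (m * w2)) * (x - ref)

rng = np.random.default_rng(0)
W1, b1, w2 = rng.normal(size=(6, 4)), rng.normal(size=6), rng.normal(size=6)
x, ref = rng.normal(size=4), np.zeros(4)

attributions = deeplift_rescale(x, ref, W1, b1, w2)
print(attributions.sum(), f(x, W1, b1, w2) - f(ref, W1, b1, w2))
```

The attributions sum to the difference between the output at the input and at the reference, the "summation-to-delta" property of DeepLIFT.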


In [None]:
# Integrated Gradients Explanation
baseline = np.zeros_like(example2).reshape(1, 28, 28, 1)
explainer = shap.GradientExplainer(model, baseline)
shap_values = explainer.shap_values(example2.reshape(1, 28, 28, 1))
shap.image_plot(shap_values, -example2.reshape(1, 28, 28, 1))


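The cell above approximates Integrated Gradients for the second example. shap.GradientExplainer is given a single all-zero (black) image as its background, so the expected-gradients estimate averages gradients along the straight path from that baseline to the input, which is the quantity Integrated Gradients integrates. The result is plotted with shap.image_plot.

Grad-CAM (Gradient-weighted Class Activation Mapping) is a further method that produces a heatmap for the second example; no Grad-CAM cell is shown here. The RGB image is preprocessed by scaling its values and converting it to an array. A Grad-CAM model exposes both the output of an intermediate convolutional layer and the model's final output. A gradient tape computes the gradients of the predicted class's score with respect to the intermediate feature maps. The gradients are averaged over the spatial dimensions to give one weight per channel, the feature maps are combined using these weights, and a ReLU keeps only the positive evidence. The heatmap is then normalized and displayed using plt.imshow.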
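The arithmetic behind a Grad-CAM heatmap is short once the feature maps and their gradients are available. The sketch below assumes both have already been extracted as arrays of shape (H, W, K) and uses random stand-ins for them; `grad_cam` is an illustrative helper, and the 11 x 11 x 64 shape is only chosen to match the second convolutional layer of the model above.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Combine feature maps (H, W, K) and class-score gradients (H, W, K) into a heatmap (H, W)."""
    weights = gradients.mean(axis=(0, 1))                 # one weight per channel
    cam = (feature_maps * weights).sum(axis=-1)           # weighted sum over channels
    cam = np.maximum(cam, 0.0)                            # ReLU: keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam                # normalize to [0, 1]

rng = np.random.default_rng(0)
feature_maps = rng.uniform(size=(11, 11, 64))   # stand-in for a conv layer's activations
gradients = rng.normal(size=(11, 11, 64))       # stand-in for d(class score)/d(activations)

heatmap = grad_cam(feature_maps, gradients)
print(heatmap.shape, heatmap.min(), heatmap.max())
```

In practice the heatmap is resized to the input resolution before being overlaid on the image.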

The code ends here, and the results of the adversarial attack and explanations are displayed.