<a href="https://colab.research.google.com/github/Alaass/ML-Jupyter-Development/blob/master/JSMA_attack.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Jacobian-based Saliency Map Attack (JSMA) 

### A simplified adaptation of the work by *Nicolas Papernot et al.*
[The Limitations of Deep Learning in Adversarial Settings](https://arxiv.org/pdf/1511.07528.pdf)



## Import dependent libraries
TensorFlow 2.0 is required.


In [0]:
# Uncomment the next line if TensorFlow 2.0 is not installed; on Google Colab it is not installed by default
#!pip install -q tensorflow==2.0.0b1

In [0]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

print(tf.__version__)
print("GPU Available: ", tf.test.is_gpu_available())

## Train a simple model on the Fashion MNIST dataset
*This code snippet was extracted from [this](https://www.tensorflow.org/tutorials/keras/save_and_restore_models#save_the_entire_model) guide provided by the TensorFlow team.*

In [0]:
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat','Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
train_images = train_images / 255.0
test_images = test_images / 255.0
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10)
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)

#### The following are the labels for this model:

<table>
  <tr>
    <th>Label</th>
    <th>Class</th>
  </tr>
  <tr>
    <td>0</td>
    <td>T-shirt/top</td>
  </tr>
  <tr>
    <td>1</td>
    <td>Trouser</td>
  </tr>
    <tr>
    <td>2</td>
    <td>Pullover</td>
  </tr>
    <tr>
    <td>3</td>
    <td>Dress</td>
  </tr>
    <tr>
    <td>4</td>
    <td>Coat</td>
  </tr>
    <tr>
    <td>5</td>
    <td>Sandal</td>
  </tr>
    <tr>
    <td>6</td>
    <td>Shirt</td>
  </tr>
    <tr>
    <td>7</td>
    <td>Sneaker</td>
  </tr>
    <tr>
    <td>8</td>
    <td>Bag</td>
  </tr>
    <tr>
    <td>9</td>
    <td>Ankle boot</td>
  </tr>
</table>

## Implement the JSMA attack

### We start by choosing the image that we will craft into an adversarial example.


To do this, we pick a random image from the test set.
> *The model's prediction for this image may already be wrong, but for the purposes of this guide that does not matter.*

In [0]:
random_index = np.random.randint(test_images.shape[0])

image = test_images[random_index]
image_tensor = tf.convert_to_tensor(image.reshape((1,28,28))) # The .reshape adds a batch dimension of 1, which is the input shape the model expects
original_pred = np.argmax(model.predict(image_tensor))

We can then look at the image and at the prediction our model gave us.

In [0]:
plt.figure()
plt.grid(False)
plt.imshow(image, cmap=plt.cm.binary)
plt.xlabel(class_names[original_pred])

plt.show()

### Now, we create the attack

As the first step of the attack, the Jacobian Matrix of the model must be calculated.

This function computes the Jacobian matrix of the model's output with respect to the input image, in other words the forward derivative of the model for the given input. Each entry is the gradient of one label's output with respect to one pixel.

It is defined by *N. Papernot et al.* as:
>$ \nabla F(X) = \frac{\partial F(X)}{\partial X} = \left[ \frac{\partial F_{j}(X)}{\partial X_{i}} \right]_{i \in 1..M,\ j \in 1..N}$



In [0]:
def getJacobian(num_labels, image, unrolled_size):
  # Builds a matrix of shape (unrolled_size, num_labels): entry [i, j] is dF_j / dX_i
  # Relies on the global `model` trained above
  jacobian = []
  
  for i in range(num_labels):
    # Use tf.GradientTape to compute the gradient of the output for label i with respect to the input
    with tf.GradientTape(watch_accessed_variables=False) as gt:
      gt.watch(image)  # Track only the input image, not the model weights
      predictions = model(image)
      label_output = predictions[:,i]
    # Ask for the gradient outside the tape context so the gradient computation itself is not recorded
    grads = gt.gradient(label_output, image)
    grads = tf.reshape(grads, [unrolled_size, 1]) # Unroll into a single column with one row per input feature
    jacobian.append(grads)

  jacobian = tf.concat(jacobian,1) # Stitch all separate columns together into one matrix
  jacobian = jacobian.numpy()
    
  return jacobian
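
To make the shape and meaning of this matrix concrete, here is a minimal NumPy-only sketch. It does not use the Keras model trained above. Instead it assumes a hypothetical stand-in model, `softmax(W @ x + b)` with random weights, derives its Jacobian analytically, and checks the result against finite differences. The names `toy_model`, `analytic_jacobian` and `numeric_jacobian` are illustrative only.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def toy_model(x, W, b):
    """Hypothetical stand-in for the network: softmax(W @ x + b)."""
    return softmax(W @ x + b)

def analytic_jacobian(x, W, b):
    """Return an (M, N) matrix whose entry [i, j] is dF_j / dX_i."""
    p = toy_model(x, W, b)
    # Softmax derivative with respect to the logits is diag(p) - p p^T;
    # the chain rule through z = W x + b multiplies by W.
    dF_dz = np.diag(p) - np.outer(p, p)   # shape (N, N)
    return (dF_dz @ W).T                   # shape (M, N)

def numeric_jacobian(x, W, b, eps=1e-6):
    """Central finite differences, one input feature at a time."""
    M, N = x.size, b.size
    J = np.zeros((M, N))
    for i in range(M):
        step = np.zeros(M)
        step[i] = eps
        J[i] = (toy_model(x + step, W, b) - toy_model(x - step, W, b)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
M, N = 4, 3                     # 4 input features, 3 labels (assumed toy sizes)
W = rng.normal(size=(N, M))
b = rng.normal(size=N)
x = rng.uniform(size=M)

J = analytic_jacobian(x, W, b)
print(J.shape)                  # (4, 3): one row per feature, one column per label
print(np.allclose(J, numeric_jacobian(x, W, b), atol=1e-6))
```

Each row of this toy Jacobian sums to roughly zero, because the softmax outputs always sum to 1: raising one label's probability must lower the others by the same total amount.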

The second step is defined in the next function.

It takes the Jacobian matrix and the label you want the image to be classified as, and gives every pixel a score based on how much changing that pixel moves the model's output toward the desired label. This is called a *Saliency Map*.

In [0]:
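def getSaliency(jacobian, target_label):
  # Note: this modifies the jacobian array in place and returns it
  for i in range(jacobian.shape[1]):
    if i != target_label:
      # Other labels: drop positive gradients and keep the magnitude of negative ones,
      # so a high score means increasing the pixel lowers that label's output
      jacobian[:,i] = np.maximum(-jacobian[:,i], 0)
    else:
      # Target label: keep only positive gradients, so a high score means increasing the pixel raises the target output
      jacobian[:,i] = np.maximum(jacobian[:,i], 0)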
  return jacobian
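
For comparison, the saliency map in the paper collapses the Jacobian into a single score per pixel rather than one column per label. The sketch below follows that definition as we understand it, for the case where pixel values are increased: a pixel scores zero if increasing it lowers the target output or raises the combined output of the other labels, and otherwise scores the target gradient times the magnitude of the summed other gradients. The function name `paper_saliency_map` and the numbers in `toy_jacobian` are made up for illustration.

```python
import numpy as np

def paper_saliency_map(jacobian, target_label):
    """Score each input feature for an increasing-feature perturbation.

    jacobian has shape (M, N): entry [i, j] is dF_j / dX_i.
    Returns an array of M non-negative scores.
    """
    target_grad = jacobian[:, target_label]
    other_grad = jacobian.sum(axis=1) - target_grad   # sum over all j != target
    # Reject a feature if increasing it lowers the target output
    # or raises the combined output of the other labels.
    rejected = (target_grad < 0) | (other_grad > 0)
    return np.where(rejected, 0.0, target_grad * np.abs(other_grad))

# Hypothetical 4-feature, 3-label Jacobian (values chosen by hand)
toy_jacobian = np.array([
    [ 0.5, -0.2, -0.3],   # raises label 0, lowers the others
    [-0.1,  0.3, -0.2],   # lowers label 0, so it is rejected
    [ 0.2,  0.1, -0.3],   # raises label 0, others sum to -0.2
    [ 0.4,  0.3,  0.1],   # raises label 0 but also the others, so it is rejected
])

scores = paper_saliency_map(toy_jacobian, target_label=0)
best_feature = int(np.argmax(scores))
print(scores, best_feature)
```

In this toy case the first feature wins: it has the largest positive effect on the target label and the largest negative effect on the rest.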

### Still in progress! Perturbing the image to create a misclassified version of it will be implemented next.


In [0]:
target_label = 7 # This corresponds to a Sneaker
num_labels = len(class_names)
unrolled_size = image.flatten().size
adv_image_tensor = tf.Variable(image_tensor, name = 'adv_image_tensor') # A variable, so that individual pixels can be updated in place

In [0]:
max_iter = 5000
theta = 0.0001

for i in range(max_iter):
  # Stop as soon as the model classifies the adversarial image as the target label
  if np.argmax(model.predict(adv_image_tensor.numpy())) == target_label:
    print("The model predicted the target label")
    break
  
  j_matrix = getJacobian(num_labels, adv_image_tensor, unrolled_size)
  S_map = getSaliency(j_matrix, target_label) 
  
  # Pick the pixel with the highest score in the target label's column of the saliency map
  pixel = int(np.argmax(S_map[:, target_label]))
  
  # Increase that pixel by theta
  adv_image_tensor[0, (pixel // 28), (pixel % 28)].assign(adv_image_tensor[0, (pixel // 28), (pixel % 28)] + theta)
  
  #pixels = np.sort(tf.math.argmax(S_map))[] # Select best 2 pixels that will distort the image towards the target label 
  
  #for i in pixels:
    #adv_image_tensor[0, (i // 28), (i % 28)].assign(adv_image_tensor[0, (i // 28), (i % 28)] + theta)
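
The whole loop (compute the Jacobian, build the saliency map, nudge the best pixel, repeat until the target label is predicted) can also be tried end to end on a tiny example without TensorFlow. The sketch below assumes a hypothetical linear softmax model with hand-picked weights `W` and a 4-feature input, uses the paper-style single-score saliency map rather than `getSaliency`, and additionally clips features to 1.0 and skips saturated ones. All names and numbers are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def predict(x, W):
    """Hypothetical toy model: softmax(W @ x)."""
    return softmax(W @ x)

def jacobian(x, W):
    """(M, N) matrix with entry [i, j] = dF_j / dX_i for the toy model."""
    p = predict(x, W)
    return ((np.diag(p) - np.outer(p, p)) @ W).T

def saliency(J, target):
    """One score per feature: zero unless increasing it helps the target only."""
    t = J[:, target]
    others = J.sum(axis=1) - t
    return np.where((t < 0) | (others > 0), 0.0, t * np.abs(others))

def toy_jsma(x, W, target, theta=0.05, max_iter=100):
    x_adv = x.copy()
    for _ in range(max_iter):
        if np.argmax(predict(x_adv, W)) == target:
            break                              # target label reached
        scores = saliency(jacobian(x_adv, W), target)
        scores[x_adv >= 1.0] = 0.0             # saturated features cannot grow further
        if scores.max() <= 0:
            break                              # no useful feature left to perturb
        i = int(np.argmax(scores))
        x_adv[i] = min(x_adv[i] + theta, 1.0)  # nudge the most salient feature
    return x_adv

# Hand-picked toy weights: 3 labels, 4 features
W = np.array([
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 2.0],
])
x = np.array([0.88, 0.1, 0.1, 0.1])            # initially classified as label 0

x_adv = toy_jsma(x, W, target=1)
print(np.argmax(predict(x, W)), np.argmax(predict(x_adv, W)))
print(np.count_nonzero(x_adv != x), "feature(s) changed")
```

Only the feature wired to the target label gets modified here, which is the point of the saliency map: it concentrates the perturbation on the few inputs that matter most.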
  

In [0]:
plt.figure()
plt.grid(False)
plt.imshow(adv_image_tensor.numpy().reshape((28,28)), cmap=plt.cm.binary)
plt.xlabel(class_names[np.argmax(model.predict(adv_image_tensor.numpy()))])

plt.show()