# Universidad del Valle de Guatemala
## Security Data Science - 10
* Jose Abraham Gutierrez Corado - 19111

## Lab 8: Defense against attacks on Deep Learning models

#### Import libraries

In [79]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    BatchNormalization, Conv2D, SeparableConv2D, MaxPooling2D, Activation, Flatten, Dropout, Dense
)
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
from art.estimators.classification import KerasClassifier  # Does not support TF 2 eager execution (disabled below)
from art.attacks.evasion import FastGradientMethod
from art.utils import load_dataset
from art.defences.preprocessor import SpatialSmoothing
from art.defences.trainer import AdversarialTrainer
from art.attacks.extraction import CopycatCNN
from sklearn.model_selection import train_test_split
from art.defences.postprocessor import ReverseSigmoid


# Disabling eager execution from TF 2
tf.compat.v1.disable_eager_execution()

#### Load model

In [2]:
vulnerable_model = tf.keras.models.load_model("target_model_redone")

#### Load Test and Train data used for the model

In [3]:
X_train = np.load('X_train_redone.npy')
X_test = np.load('X_test_redone.npy')
y_train = np.load('y_train_redone.npy')
y_test = np.load('y_test_redone.npy')

<h1 style="color:rgb(102, 166, 38);">Evasion Attack (this is the adversarial attack, since FastGradientMethod visually changes the images)</h1>

<h4 style="color:rgb(102, 166, 38);">Create two classifiers: one for the attack and the other for defense</h4>

In [36]:
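# Note: both wrappers receive the same Keras model object, so they share weights;
# training through one wrapper also changes what the other evaluates.
vulnerable_classifier = KerasClassifier(vulnerable_model)

robust_classifier = KerasClassifier(vulnerable_model)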

<h4 style="color:rgb(102, 166, 38);">Attack the model data with FastGradientMethod</h4>

In [51]:
attack = FastGradientMethod(
    estimator=vulnerable_classifier, 
    eps=0.01
    )
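The idea behind the attack can be shown on a much smaller model. The block below is a minimal NumPy sketch of a one-step, sign-of-gradient perturbation (the L-infinity form of the Fast Gradient Method) against a linear softmax classifier. The linear model, the helper names `softmax`, `cross_entropy` and `fgsm`, and the toy data are assumptions made for illustration; they stand in for the CNN and are not ART's implementation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class axis
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(x, y_onehot, W, b):
    # Per-sample categorical cross-entropy of a linear softmax model
    probs = softmax(x @ W + b)
    return -np.sum(y_onehot * np.log(probs), axis=1)

def fgsm(x, y_onehot, W, b, eps):
    # Gradient of the cross-entropy with respect to the input
    probs = softmax(x @ W + b)
    grad_x = (probs - y_onehot) @ W.T
    # Move every input feature by eps in the direction that increases the loss
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
x = rng.uniform(0.2, 0.8, size=(4, 6))   # toy "images" with 6 features each
W = rng.normal(size=(6, 3))              # hypothetical weights for 3 classes
b = np.zeros(3)
y = np.eye(3)[[0, 1, 2, 0]]              # one-hot labels

x_adv = fgsm(x, y, W, b, eps=0.01)
loss_clean = cross_entropy(x, y, W, b)
loss_adv = cross_entropy(x_adv, y, W, b)
max_perturbation = np.abs(x_adv - x).max()
```

No feature moves by more than `eps`, yet the loss on each perturbed sample is at least as high as on the clean one, which is the effect the attack relies on.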

<h4 style="color:rgb(102, 166, 38);">Create the adversarial trainer for the defense and train it on the clean data</h4>

In [52]:
adversarial_trainer = AdversarialTrainer(robust_classifier, attack)

In [53]:
adversarial_trainer.fit(X_train, y_train, nb_epochs=10)
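Adversarial training mixes adversarial samples into the training batches. As far as I understand ART's `AdversarialTrainer`, a `ratio` argument (assumed here to default to 0.5) sets the fraction of each batch that is replaced by adversarial counterparts. The sketch below only illustrates that mixing step; the helper `mix_adversarial_batch` and the toy arrays are hypothetical, not part of ART.

```python
import numpy as np

def mix_adversarial_batch(x_batch, x_adv_batch, ratio, rng):
    # Number of samples in the batch to swap for their adversarial version
    n = x_batch.shape[0]
    nb_adv = int(np.ceil(ratio * n))
    # Pick distinct positions at random and replace only those rows
    adv_ids = rng.choice(n, size=nb_adv, replace=False)
    mixed = x_batch.copy()
    mixed[adv_ids] = x_adv_batch[adv_ids]
    return mixed, adv_ids

rng = np.random.default_rng(0)
x_batch = np.zeros((8, 4))       # stand-in for clean samples
x_adv_batch = np.ones((8, 4))    # stand-in for their adversarial versions
mixed, adv_ids = mix_adversarial_batch(x_batch, x_adv_batch, ratio=0.5, rng=rng)
```

The labels are left unchanged, so the model is trained to give the correct class for both the clean and the perturbed inputs.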


<h4 style="color:rgb(102, 166, 38);">Create test from altered images</h4>

In [54]:
# Generate adversarial examples using Fast Gradient Method
x_test_adv = attack.generate(X_test)

<h4 style="color:rgb(102, 166, 38);">Results</h4>

In [57]:
score_clean = vulnerable_classifier._model.evaluate(x=X_test, y=y_test)
score_fgm = vulnerable_classifier._model.evaluate(x=x_test_adv, y=y_test)

# Comparing test losses
print("------ TEST METRICS OF VULNERABLE MODEL ------")
print(f"Clean test loss: {score_clean[0]:.2f} " 
      f"vs FGM test loss: {score_fgm[0]:.2f}")

# Comparing test accuracies
print(f"Clean test accuracy: {score_clean[1]:.2f} " 
      f"vs FGM test accuracy: {score_fgm[1]:.2f}")

------ TEST METRICS OF VULNERABLE MODEL ------
Clean test loss: 0.16 vs FGM test loss: 0.50
Clean test accuracy: 0.96 vs FGM test accuracy: 0.81


In [58]:
# Evaluating the performance of the robust classifier on adversarial images
score_robust_fgm = robust_classifier._model.evaluate(x=x_test_adv, y=y_test)

# Comparing test losses
print("------ TEST METRICS OF ROBUST VS VULNERABLE MODEL ON ADVERSARIAL SAMPLES ------")
print(f"Robust model test loss: {score_robust_fgm[0]:.2f} " 
      f"vs vulnerable model test loss: {score_fgm[0]:.2f}")

# Comparing test accuracies
print(f"Robust model test accuracy: {score_robust_fgm[1]:.2f} " 
      f"vs vulnerable model test accuracy: {score_fgm[1]:.2f}")

------ TEST METRICS OF ROBUST VS VULNERABLE MODEL ON ADVERSARIAL SAMPLES ------
Robust model test loss: 0.50 vs vulnerable model test loss: 0.50
Robust model test accuracy: 0.81 vs vulnerable model test accuracy: 0.81


1. Clean test loss: 0.16 vs FGM test loss: 0.50
    The clean test loss is 0.16. This is the average value of the model's loss function on the non-adversarial test samples, not a percentage. The FGM test loss is 0.50, roughly three times higher. A higher loss means that the model's predictions on the adversarial examples deviate more from the ground-truth labels.

2. Clean test accuracy: 0.96 vs FGM test accuracy: 0.81
    The clean test accuracy is 0.96, so the vulnerable model predicts the correct class label for 96% of the clean test samples. The FGM test accuracy is 0.81, a drop of 15 percentage points. The model's performance is compromised on the adversarial examples, which shows its vulnerability to adversarial attacks.

Overall, the vulnerable model performs well on clean test samples, with high accuracy and low loss. When exposed to adversarial examples generated with the FGM attack, its accuracy decreases and its loss increases, indicating vulnerability to adversarial attacks.
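To read the loss values, it helps to recall how the loss is computed. Assuming the model was compiled with categorical cross-entropy (as the Lab 6 model in this notebook is), the loss is the mean negative log-probability assigned to the true class, so `exp(-loss)` gives the geometric mean of that probability. The toy predictions below are invented for illustration.

```python
import numpy as np

def categorical_cross_entropy(y_onehot, probs):
    # Mean over samples of -log(probability assigned to the true class)
    return float(-np.mean(np.sum(y_onehot * np.log(probs), axis=1)))

# Two toy samples: the true class receives probability 0.9 and 0.8
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_prob = np.array([[0.9, 0.1], [0.2, 0.8]])
toy_loss = categorical_cross_entropy(y_true, y_prob)

# Geometric-mean probability of the true class implied by the reported losses
p_clean = np.exp(-0.16)
p_fgm = np.exp(-0.50)
```

Under this reading, a loss of 0.16 corresponds to a typical true-class probability of about 0.85, and a loss of 0.50 to about 0.61.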

3. Robust model test loss: 0.50 vs vulnerable model test loss: 0.50
    The robust model test loss on the adversarial test samples is 0.50, and the vulnerable model test loss is also 0.50. As before, this is an average loss value, not a percentage. Both models deviate from the ground-truth labels on the adversarial examples by the same amount.

4. Robust model test accuracy: 0.81 vs vulnerable model test accuracy: 0.81
    The robust model predicts the correct class label for 81% of the adversarial samples, and the vulnerable model reaches the same 0.81. Both models perform equally on the adversarial examples.

The robust and the vulnerable model have identical test loss and test accuracy on the adversarial samples. This follows from the setup: both `KerasClassifier` instances wrap the same `vulnerable_model` object, so adversarial training through `robust_classifier` also updates the weights that `vulnerable_classifier` evaluates, and both evaluations score the same network. The comparison therefore cannot show how much the defense helped; for that, the two classifiers would need to wrap separate copies of the model.

An accuracy of 0.81 on adversarial samples against 0.96 on clean ones also means that some vulnerability remains. This could be because of the small image size: after resizing, samples from different families can look similar, which makes it harder for the model to tell them apart, and a perturbation of the same magnitude may affect a larger share of the information in a low-resolution image than in a higher-resolution one.
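One reason the two sets of metrics coincide exactly is that `vulnerable_classifier` and `robust_classifier` were both built from the same `vulnerable_model` object. The sketch below reproduces that situation with hypothetical stand-in classes (`TinyModel` and `Wrapper` are invented for illustration and are not ART classes): a wrapper that stores a reference to its model shares weights with every other wrapper of that model.

```python
import copy
import numpy as np

class TinyModel:
    def __init__(self):
        self.weights = np.zeros(3)

class Wrapper:
    """Hypothetical wrapper that keeps a reference to the model it is given."""
    def __init__(self, model):
        self.model = model

    def train_step(self):
        # In-place update, as a training step would do to the model's weights
        self.model.weights += 1.0

shared = TinyModel()
vulnerable = Wrapper(shared)
robust = Wrapper(shared)
robust.train_step()                      # "training" the robust wrapper...
same_object = vulnerable.model is robust.model
leaked = vulnerable.model.weights.copy() # ...also changes the vulnerable one

# Wrapping an independent copy keeps the two sets of weights apart
independent = Wrapper(copy.deepcopy(shared))
independent.train_step()
```

With real Keras models, an independent copy would have to be created before wrapping (for example by loading the saved model a second time), so that only one of the two receives adversarial training.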

<h1 style="color:rgb(252, 186, 3);">Extraction Attack</h1>

<h4 style="color:rgb(252, 186, 3);"> Model from Lab 6 </h4>

In [60]:
num_classes = 25

def malware_model():
    Malware_model = Sequential()
    Malware_model.add(Conv2D(30, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=(100,100,3)))

    Malware_model.add(MaxPooling2D(pool_size=(2, 2)))
    Malware_model.add(Conv2D(15, (3, 3), activation='relu'))
    Malware_model.add(MaxPooling2D(pool_size=(2, 2)))
    Malware_model.add(Dropout(0.25))
    Malware_model.add(Flatten())
    Malware_model.add(Dense(128, activation='relu'))
    Malware_model.add(Dense(num_classes, activation='softmax'))
    Malware_model.compile(loss='categorical_crossentropy', optimizer = 'adam', metrics=['accuracy'])
    return Malware_model

<h4 style="color:rgb(252, 186, 3);"> Split data 50/50 so it seems that half of the data was stolen </h4>

In [95]:
X_train_1, X_train_2, y_train_1, y_train_2 = train_test_split(X_train, y_train, test_size=0.5)
# X_train_2 and y_train_2 are the "stolen" data

<h4 style="color:rgb(252, 186, 3);"> Create and train the model with clean data. </h4>

In [96]:
vulnerable_model = malware_model()
vulnerable_model.fit(
    x=X_train_1, 
    y=y_train_1, 
    epochs=10
    )

Train on 3268 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1bf1ddfe020>

<h4 style="color:rgb(252, 186, 3);"> Initialize postprocessor for the data defense and create classifiers, models and copycats for each classifier.</h4>

In [103]:
# Initializing the postprocessor
postprocessor = ReverseSigmoid(
    beta=1.0, 
    gamma=0.2
    )

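# Unprotected classifier
# Note: malware_model() builds a new, untrained network; the model trained above is `vulnerable_model`
unprotected_classifier = KerasClassifier(
    model=malware_model())

# Protected classifier (same caveat; ReverseSigmoid perturbs the probabilities it returns)
protected_classifier = KerasClassifier(
    model=malware_model(),
    postprocessing_defences=postprocessor)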
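The postprocessor changes the probabilities that the protected classifier returns. The block below is a rough NumPy sketch of the Reverse Sigmoid perturbation as I understand it: subtract `beta * (sigmoid(gamma * logit(p)) - 0.5)` from each probability, then renormalize. The function name `reverse_sigmoid`, the clipping constant and the toy probabilities are assumptions for illustration, not ART's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reverse_sigmoid(probs, beta=1.0, gamma=0.2, tiny=1e-9):
    # Clip to avoid log(0) when computing the logit
    p = np.clip(probs, tiny, 1.0 - tiny)
    logit = np.log(p / (1.0 - p))
    # Perturbation grows with the confidence of each class probability
    perturbation = beta * (sigmoid(gamma * logit) - 0.5)
    perturbed = np.clip(probs - perturbation, 0.0, 1.0)
    # Renormalize so every row is a probability vector again
    return perturbed / perturbed.sum(axis=-1, keepdims=True)

probs = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25]])
defended = reverse_sigmoid(probs)
```

In this toy example the predicted class stays the same while the confidence values are flattened, so an attacker who trains on the returned probabilities gets less informative targets.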

In [104]:
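# Surrogate ("thieved") classifier that will be trained against the unprotected victim
unprotected_model = KerasClassifier(model=malware_model())

# Surrogate ("thieved") classifier that will be trained against the protected victim
protected_model = KerasClassifier(model=malware_model())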

In [105]:
# Unprotected CopycatCNN
unprotected_copycat = CopycatCNN(
  nb_epochs=10,
  nb_stolen=len(X_train_2),
  classifier=unprotected_classifier
)

# Protected CopycatCNN
protected_copycat = CopycatCNN(
  nb_epochs=10,
  nb_stolen=len(X_train_2),
  classifier=protected_classifier
)
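The extraction attack works by querying the victim with the stolen inputs and training a surrogate on the labels the victim returns. The sketch below shows that loop on a toy problem. The linear victim, the softmax-regression surrogate and the names `victim_predict` and `extract` are assumptions for illustration and much simpler than the CNNs used with `CopycatCNN`.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def victim_predict(x):
    # Hypothetical victim: a fixed linear classifier the attacker can only query
    W = np.array([[1.0, -0.5, -0.5],
                  [0.0, 0.87, -0.87]])
    return softmax(4.0 * x @ W)

def extract(x_query, n_classes, epochs=300, lr=0.5):
    # Step 1: label the stolen inputs with the victim's predicted class
    labels = victim_predict(x_query).argmax(axis=1)
    y = np.eye(n_classes)[labels]
    # Step 2: fit a surrogate (softmax regression) on those labels
    W_s = np.zeros((x_query.shape[1], n_classes))
    for _ in range(epochs):
        probs = softmax(x_query @ W_s)
        W_s -= lr * x_query.T @ (probs - y) / len(x_query)
    return W_s

rng = np.random.default_rng(0)
x_stolen = rng.normal(size=(600, 2))    # inputs available to the attacker
x_heldout = rng.normal(size=(300, 2))   # inputs used to measure the copy
W_s = extract(x_stolen, n_classes=3)

# Fidelity: how often the surrogate agrees with the victim on new inputs
fidelity = np.mean(
    (x_heldout @ W_s).argmax(axis=1) == victim_predict(x_heldout).argmax(axis=1)
)
```

Fidelity, the agreement between the copy and the victim, is a useful companion to test accuracy when judging how well an extraction worked, because it measures the copy against the model it was stolen from and not against the ground truth.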

<h4 style="color:rgb(252, 186, 3);"> Run the extraction against the unprotected classifier </h4>

In [106]:
unprotected_stolen_classifier = unprotected_copycat.extract(
    x=X_train_2, 
    y=y_train_2, 
    thieved_classifier=unprotected_model
    )

Train on 3269 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<h4 style="color:rgb(252, 186, 3);"> Run the extraction against the protected classifier </h4>

In [107]:
protected_stolen_classifier = protected_copycat.extract(
    x=X_train_2, 
    y=y_train_2, 
    thieved_classifier=protected_model
    )

Train on 3269 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<h4 style="color:rgb(252, 186, 3);"> Results </h4>

In [109]:
score_clean = vulnerable_classifier._model.evaluate(x=X_test, y=y_test)
score_stolen = unprotected_stolen_classifier._model.evaluate(x=X_test, y=y_test)
score_protected = protected_stolen_classifier._model.evaluate(x=X_test, y=y_test)

# Comparing test losses
print("------ LOSS AND ACCURACY METRICS FOR CLEAN AND STOLEN MODELS ------")
print(f"Clean test loss: {score_clean[0]:.2f} ","\n", 
      f"Stolen test loss: {score_stolen[0]:.2f}","\n",
      f"Protected test loss: {score_protected[0]:.2f}","\n",)


# Comparing test accuracies
print(f"Clean test accuracy: {score_clean[1]:.2f} ","\n",
      f"Stolen test accuracy: {score_stolen[1]:.2f}","\n",
      f"Protected test accuracy: {score_protected[1]:.2f}")

------ LOSS AND ACCURACY METRICS FOR CLEAN AND STOLEN MODELS ------
Clean test loss: 0.16  
 Stolen test loss: 32.00 
 Protected test loss: 56.06 

Clean test accuracy: 0.96  
 Stolen test accuracy: 0.10 
 Protected test accuracy: 0.02


1. Loss Metrics:

    - Clean test loss: The clean model (the target model loaded at the start of the notebook) achieves a low test loss of 0.16, so its predictions on the test data are close to the true labels.
    - Stolen test loss: The stolen model, extracted from the unprotected classifier, has a far higher test loss of 32.00. The copy fails to predict the test data, which suggests that the extraction transferred little useful knowledge.
    - Protected test loss: The stolen model extracted from the protected classifier has an even higher test loss of 56.06. This is consistent with the ReverseSigmoid postprocessor making the victim's outputs less useful to the attacker.

2. Accuracy Metrics:

    - Clean test accuracy: The clean model achieves a high test accuracy of 0.96, so it predicts the correct label for the large majority of the test samples.
    - Stolen test accuracy: The stolen model extracted from the unprotected classifier reaches only 0.10. With `num_classes = 25`, uniform random guessing on balanced classes would score about 1/25 = 0.04, so the copy is only slightly better than chance and the stolen knowledge was not transferred effectively.
    - Protected test accuracy: The stolen model extracted from the protected classifier reaches 0.02, which is below that chance level. The copy obtained through the defended classifier is even less useful than the unprotected one.

These results come with a caveat. `unprotected_classifier` and `protected_classifier` each wrap a newly built `malware_model()` rather than the `vulnerable_model` trained on `X_train_1`, so the victims queried by CopycatCNN were themselves untrained. The low scores of both stolen models therefore cannot be attributed to the ReverseSigmoid defense alone.