# Using Adversarial Robustness Tests to Evaluate Pre-trained Models

This notebook demonstrates the use of ML-generated samples to test the adversarial robustness of pre-trained models and detect anomalies. We use a technique called the Fast Gradient Sign Method (FGSM), which generates altered samples by applying subtle changes called perturbations. We discuss this technique and perturbations in the next chapter. Here, we want to detect signs of anomalies in how pre-trained models perform on adversarial data, so that we know what to investigate further.
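
To give a feel for what a perturbation is before the full treatment in the next chapter, here is a minimal NumPy sketch of the FGSM (Fast Gradient Sign Method) update. It is not the ART implementation used below: the logistic-regression weights `w` and `b`, the sample `x`, and the helper names `fgsm_perturb`, `predict_proba`, and `loss_gradient` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def fgsm_perturb(x, grad, eps=0.1, clip_min=0.0, clip_max=1.0):
    """Shift every input value by eps in the direction that increases the loss."""
    x_adv = x + eps * np.sign(grad)
    # Keep the perturbed sample inside the valid input range
    return np.clip(x_adv, clip_min, clip_max)

# Toy logistic-regression "model" (hypothetical weights, illustration only)
w = np.array([2.0, -3.0, 1.0])
b = 0.0

def predict_proba(x):
    """Probability of the positive class."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def loss_gradient(x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    return (predict_proba(x) - y) * w

x = np.array([0.8, 0.2, 0.5])  # clean sample, values in [0, 1]
y = 1.0                        # true label

x_adv = fgsm_perturb(x, loss_gradient(x, y), eps=0.1)

print("clean sample:      ", x)
print("adversarial sample:", x_adv)
print("confidence in the true class, clean vs adversarial:",
      predict_proba(x), predict_proba(x_adv))
```

Each input value moves by at most `eps`, yet the model's confidence in the correct class drops. That is the effect the evaluation below measures on real models.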


In [4]:
import tensorflow as tf
import art
tf.compat.v1.disable_eager_execution()
import logging
logging.getLogger("tensorflow").setLevel(logging.DEBUG)
gpus = tf.config.list_physical_devices('GPU')
# Avoid GPU out-of-memory errors by enabling memory growth on every GPU
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)
    
    

1 Physical GPUs, 1 Logical GPUs


In [5]:
from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import KerasClassifier
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import load_model
from tensorflow.keras.utils import to_categorical
import numpy as np

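def evaluate_model_robustness(model_path, eps=0.1):
    """
    Evaluate the robustness of a given model against adversarial attacks using FGSM.

    model_path: path to a saved Keras model file.
    eps: FGSM attack step size (how far each input value may be perturbed).
    """
    # Load the model
    model = load_model(model_path)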
    # Load and preprocess CIFAR-10 data
    (_, _), (x_test, y_test) = cifar10.load_data()
    x_test = x_test.astype('float32') / 255

    # Convert integer labels to one-hot encoded labels
    y_test_one_hot = to_categorical(y_test)

    # Wrap the model with ART's KerasClassifier
    classifier = KerasClassifier(model=model, clip_values=(0, 1))

    # Generate adversarial examples using FGSM
    attack = FastGradientMethod(estimator=classifier, eps=eps)
    x_test_adv = attack.generate(x=x_test)

    # Evaluate the model on clean and adversarial samples
    _, clean_accuracy = model.evaluate(x_test, y_test_one_hot, verbose=0)
    _, adv_accuracy = model.evaluate(x_test_adv, y_test_one_hot, verbose=0)
    # Print evaluation results
    print(f"Results for {model_path}")
    print(f"Accuracy on clean test samples: {clean_accuracy}")
    print(f"Accuracy on adversarial test samples: {adv_accuracy}")

    # Analyze confidence scores on clean and adversarial samples
    clean_predictions = classifier.predict(x_test)
    adv_predictions = classifier.predict(x_test_adv)

    clean_confidence = np.max(clean_predictions, axis=1).mean()
    adv_confidence = np.max(adv_predictions, axis=1).mean()

    # Print confidence analysis
    print(f"Average confidence on clean test samples: {clean_confidence}")
    print(f"Average confidence on adversarial test samples: {adv_confidence}")
    print("\n")

# Evaluate each pre-trained model in turn
evaluate_model_robustness('../models/simple-cifar10-cnn.h5')
evaluate_model_robustness('../models/simple-cifar10-poisoned.h5')
evaluate_model_robustness('../models/backdoor-pattern-cifar10.h5')


Results for ../models/simple-cifar10-cnn.h5
Accuracy on clean test samples: 0.8665000200271606
Accuracy on adversarial test samples: 0.11400000005960464
Average confidence on clean test samples: 0.9276211857795715
Average confidence on adversarial test samples: 0.7819012403488159


Results for ../models/simple-cifar10-poisoned.h5
Accuracy on clean test samples: 0.678600013256073
Accuracy on adversarial test samples: 0.09650000184774399
Average confidence on clean test samples: 0.845690906047821
Average confidence on adversarial test samples: 0.8059046268463135


Results for ../models/backdoor-pattern-cifar10.h5
Accuracy on clean test samples: 0.10019999742507935
Accuracy on adversarial test samples: 0.0997999981045723
Average confidence on clean test samples: 0.4350295960903168
Average confidence on adversarial test samples: 0.4363557994365692
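
One way to read these numbers side by side is to compute, for each model, how much accuracy and average confidence change between clean and adversarial inputs. The sketch below hard-codes the values printed above, rounded to four decimals. The `results` dictionary and the `summarize` helper are illustrative names, not part of ART or Keras.

```python
# Values copied from the output above, rounded to four decimals.
results = {
    "simple-cifar10-cnn": {
        "clean_acc": 0.8665, "adv_acc": 0.1140,
        "clean_conf": 0.9276, "adv_conf": 0.7819,
    },
    "simple-cifar10-poisoned": {
        "clean_acc": 0.6786, "adv_acc": 0.0965,
        "clean_conf": 0.8457, "adv_conf": 0.8059,
    },
    "backdoor-pattern-cifar10": {
        "clean_acc": 0.1002, "adv_acc": 0.0998,
        "clean_conf": 0.4350, "adv_conf": 0.4364,
    },
}

def summarize(metrics):
    """Return the clean-minus-adversarial gap for accuracy and confidence."""
    return {
        "acc_drop": round(metrics["clean_acc"] - metrics["adv_acc"], 4),
        "conf_drop": round(metrics["clean_conf"] - metrics["adv_conf"], 4),
    }

summary = {name: summarize(m) for name, m in results.items()}

for name, s in summary.items():
    print(f"{name}: accuracy drop {s['acc_drop']:.4f}, "
          f"confidence drop {s['conf_drop']:.4f}")
```

Two patterns look worth investigating, though neither is conclusive on its own. The poisoned model loses most of its accuracy under attack while its average confidence barely moves. The backdoor-pattern model scores about 0.10 on clean data, which is what random guessing gives on the ten CIFAR-10 classes, so its near-zero drop more likely reflects a model that already fails on clean inputs than genuine robustness.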


