# Image-Based Model Drift Detection with EMNIST

## Description of Method

This notebook walks through an example of the model drift detection pipeline described [here](https://). For a high-level overview of each step in this example, see the [slides](https://) provided in the git repo.

To detect model drift concretely on the open-source EMNIST dataset, we create a base model and a production model from two different portions of the dataset. Then, using model inversion and membership inference attacks, we determine whether the base model's general notion of a feature or class has drifted, even when the test accuracy scores do not show it.

# Detecting Model Drift Example

Let's Begin!

In [None]:
!pip install adversarial-robustness-toolbox==1.15.1
!pip install pyyaml h5py  # Required to save models in HDF5 format

import os
import sys
import sklearn
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

# dataset management
from tensorflow.keras.datasets import cifar10, mnist, fashion_mnist
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split

# to initialize model
from tensorflow.keras.layers import Input, Dense, Flatten, Conv2D, MaxPooling2D, Activation, Dropout, \
    BatchNormalization, AveragePooling2D, Add, Conv1D, MaxPooling1D, Embedding
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam

# for model training
from tensorflow.keras.losses import categorical_crossentropy

# for model evaluation
from sklearn.metrics import classification_report


### Data Management

As a first data management step, we must establish the proper configuration for the dataset and the model.

*For a description of these variables, see [Configuration Notes](#Configuration-Notes).*

In [None]:
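# DATASET CONFIGURATION
num_classes = 47  # overwritten below from ds_info (the default 'emnist' config, byclass, has 62 classes)
image_size = 28
image_channels = 1
conv_filters = [
    32,
    64
]
dense_units = 1028
mode = "natural"
depth = 1
# TRAINING CONFIGURATION
EPOCHS = 20
batch_size = 128
train_size = 0.2  # fraction of the training split used as base data
test_size = 0.2   # fraction of the training split used as production data
examples_per_class = 100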

Now, we load in the dataset and assign the training and testing data.

In [None]:
label_key = 'label'

def load_emnist_split(split):
    """Load one EMNIST split from TFDS; return (images, integer labels, number of classes)."""
    ds, ds_info = tfds.load('emnist', with_info=True, split=split, data_dir='data')
    images, labels = [], []
    for ex in tfds.as_numpy(ds):
        images.append(ex['image'])
        labels.append(ex[label_key])
    return np.array(images), np.array(labels), ds_info.features[label_key].num_classes

# get EMNIST train and test data
x_train, y_train, num_classes = load_emnist_split('train')
x_test, y_test, num_classes = load_emnist_split('test')

# normalize images to [0, 1]
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# one-hot encode the labels
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)

Downloading and preparing dataset 535.73 MiB (download: 535.73 MiB, generated: Unknown size, total: 535.73 MiB) to data/emnist/byclass/3.0.0...


Dataset emnist downloaded and prepared to data/emnist/byclass/3.0.0. Subsequent calls will reuse this data.


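Here, we use a stratified split to draw two disjoint subsets from the training data: the base training data (**x_base, y_base**) and the production training data (**x_production, y_production**), each containing 20% of the training split (`train_size` and `test_size`).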

In [None]:
# convert one-hot labels back to integer class ids
def flatten_y(y_temp):
    return np.argmax(y_temp, axis=1)

y_flatten = flatten_y(y_train)
ids = np.arange(len(y_flatten))

# draw two disjoint, class-stratified subsets of the training data
# (set random_state to an int for a reproducible split)
ids_base, ids_prod = train_test_split(ids, train_size=train_size, test_size=test_size, random_state=None,
                                      shuffle=True, stratify=y_flatten)

x_base = x_train[ids_base]
y_base = y_train[ids_base]
x_production = x_train[ids_prod]
y_production = y_train[ids_prod]
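The split relies on `stratify` to keep every class at the same proportion in both partitions. A minimal sketch with synthetic labels (a hypothetical stand-in for EMNIST: 5 balanced classes of 200 samples) shows that `np.argmax` recovers the integer labels from the one-hot array, and that the two 20% partitions are disjoint and class-balanced:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels (hypothetical data): 5 classes, 200 samples each, shuffled.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(5), 200)
rng.shuffle(labels)
one_hot = np.eye(5)[labels]

# np.argmax over the one-hot axis recovers the integer labels.
y_int = np.argmax(one_hot, axis=1)

ids = np.arange(len(y_int))
ids_base, ids_prod = train_test_split(
    ids, train_size=0.2, test_size=0.2, shuffle=True, stratify=y_int, random_state=0
)

# Count samples per class in each partition.
base_counts = np.bincount(y_int[ids_base], minlength=5)
prod_counts = np.bincount(y_int[ids_prod], minlength=5)
print(len(ids_base), len(ids_prod))  # 200 200
print(base_counts, prod_counts)      # [40 40 40 40 40] [40 40 40 40 40]
```

With the real data, the same rounding explains why the production partition above has one more sample than the base partition.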

Let's confirm that the shapes of the original, base, production, and test data are what we expect:

In [None]:
print("original training data x and y shapes:")
print(x_train.shape)
print(y_train.shape)

print("base training data x and y shapes:")
print(x_base.shape)
print(y_base.shape)

print("drifted production training data x and y shapes:")
print(x_production.shape)
print(y_production.shape)

print("testing data x and y shapes:")
print(x_test.shape)
print(y_test.shape)

original training data x and y shapes:
(697932, 28, 28, 1)
(697932, 62)
base training data x and y shapes:
(139586, 28, 28, 1)
(139586, 62)
drifted production training data x and y shapes:
(139587, 28, 28, 1)
(139587, 62)
testing data x and y shapes:
(116323, 28, 28, 1)
(116323, 62)


## Model Creation

Our training and test data are now loaded. The next step is to create the base model using **x_base** and **y_base**.

Here, we set the hyperparameters based on the model type and configuration settings and initialize the model with them.

In [None]:
# set hyperparameters for model initialization
kernel_size = [8, 4]  # only kernel_size[0] (8) is used below, for both Conv2D layers
pool_size = (2, 2)
input_shape = (image_size, image_size, image_channels)
image_input_name = 'image'
if isinstance(dense_units, int):
    dense_units = [dense_units]
num_fc_units = dense_units

# settings from the original configuration that the model below does not use
strides = [(2, 2)]
padding = ['same', 'valid']
pool_strides = 1
dropout = [0.25, 0.25, 0.5]
embedding = None
adv_multiplier = 0.2
adv_step_size = 0.2
adv_grad_norm = 'infinity'
transfer_learning = False
trainable = True

# initialize model: two 'valid' 8x8 convolutions, max pooling, dense layer(s), softmax output
inputs = Input(shape=input_shape, dtype=tf.float32, name=image_input_name)
x = inputs
x = Conv2D(conv_filters[0], kernel_size=kernel_size[0], kernel_initializer='he_normal')(x)
x = Activation('relu')(x)
x = BatchNormalization()(x)
x = Conv2D(conv_filters[1], kernel_size=kernel_size[0], kernel_initializer='he_normal')(x)
x = Activation('relu')(x)
x = BatchNormalization()(x)
x = MaxPooling2D(pool_size=pool_size)(x)
x = Flatten()(x)
for num_units in num_fc_units:
    x = Dense(num_units, activation='relu')(x)
x = Activation('relu')(x)  # redundant after the ReLU Dense layer above (ReLU is idempotent)
x = BatchNormalization()(x)
x = Dense(num_classes, activation='linear', name='logits')(x)
pred = Activation('softmax', name="Softmax")(x)
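As a quick check on the architecture, the output shapes and parameter counts of the first layers can be worked out by hand. This sketch assumes stride 1 and the Keras default `'valid'` padding for both `Conv2D` layers, which is consistent with the summary printed below (21x21x32 with 2080 parameters, then 14x14x64 with 131136 parameters):

```python
def conv_valid(size, kernel):
    # 'valid' padding, stride 1: output = input - kernel + 1
    return size - kernel + 1

def conv_params(kernel, in_ch, out_ch):
    # one kernel x kernel x in_ch weight tensor per filter, plus one bias per filter
    return kernel * kernel * in_ch * out_ch + out_ch

image_size, channels = 28, 1
conv_filters = [32, 64]
kernel = 8  # kernel_size[0] is used for both Conv2D layers

s1 = conv_valid(image_size, kernel)                        # first conv output size
p1 = conv_params(kernel, channels, conv_filters[0])        # first conv parameters
bn1 = 4 * conv_filters[0]  # BatchNorm: gamma, beta, moving mean, moving variance
s2 = conv_valid(s1, kernel)                                # second conv output size
p2 = conv_params(kernel, conv_filters[0], conv_filters[1]) # second conv parameters
pooled = s2 // 2  # MaxPooling2D((2, 2)); strides default to the pool size
flat = pooled * pooled * conv_filters[1]                   # Flatten output length
print(s1, p1, bn1, s2, p2, pooled, flat)  # 21 2080 128 14 131136 7 3136
```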

### Create Base Model

In [None]:
model_base = tf.keras.Model(inputs=inputs, outputs=pred, name="keras_generator_base")

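loss = categorical_crossentropy
# note: beta_1=1e-6 (the Adam default is 0.9) effectively disables the first-moment (momentum) average
optimizer = Adam(learning_rate=1e-4, beta_1=1e-6)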

model_base.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])

model_base.summary()

Model: "keras_generator_base"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 image (InputLayer)          [(None, 28, 28, 1)]       0         
                                                                 
 conv2d (Conv2D)             (None, 21, 21, 32)        2080      
                                                                 
 activation (Activation)     (None, 21, 21, 32)        0         
                                                                 
 batch_normalization (BatchN  (None, 21, 21, 32)       128       
 ormalization)                                                   
                                                                 
 conv2d_1 (Conv2D)           (None, 14, 14, 64)        131136    
                                                                 
 activation_1 (Activation)   (None, 14, 14, 64)        0         
                                              

Now we train the base model with Keras' *model.fit* for `EPOCHS` epochs and save it to `model_base.h5`.

In [None]:
model_base.fit(x_base, y_base, batch_size=batch_size, epochs=EPOCHS, validation_data=(x_test, y_test))
model_base.save('model_base.h5', save_format='h5')

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [None]:
# evaluate on every sample (no `steps` limit, which would stop after a fixed number of batches)
print('train loss and accuracy:')
train_loss_base, train_acc_base = model_base.evaluate(x_base, y_base, batch_size=batch_size, verbose=1)

print('test loss and accuracy:')
test_loss_base, test_acc_base = model_base.evaluate(x_test, y_test, batch_size=batch_size, verbose=1)

print('per class reports:')
report_base = classification_report(np.argmax(y_test, axis=1), np.argmax(model_base.predict(x_test), axis=1),
                                    output_dict=True)
for key, value in report_base.items():
    print(key, ':', value)

train loss and accuracy:
test loss and accuracy:
per class reports:
0 : {'precision': 0.6694295465626524, 'recall': 0.7128764278296988, 'f1-score': 0.6904702036711089, 'support': 5778}
1 : {'precision': 0.6882964698989307, 'recall': 0.7423380726698262, 'f1-score': 0.7142965721669072, 'support': 6330}
2 : {'precision': 0.9404983895575522, 'recall': 0.9453058442664849, 'f1-score': 0.9428959891230455, 'support': 5869}
3 : {'precision': 0.9841457244054647, 'recall': 0.9775506785056123, 'f1-score': 0.9808371154815936, 'support': 5969}
4 : {'precision': 0.9410633088492479, 'recall': 0.9576437088449902, 'f1-score': 0.9492811149334038, 'support': 5619}
5 : {'precision': 0.9171259842519685, 'recall': 0.8976878612716763, 'f1-score': 0.9073028237585199, 'support': 5190}
6 : {'precision': 0.9441407584557567, 'recall': 0.9687992988606485, 'f1-score': 0.956311099576088, 'support': 5705}
7 : {'precision': 0.9726027397260274, 'recall': 0.9830591301514905, 'f1-score': 0.977802981205444, 'support': 6139

### Create Production Model

Next, we create the production model by loading the saved base model and continuing to train it on the production partition.

In [None]:
model_production = tf.keras.models.load_model('model_base.h5')

model_production.fit(x_production, y_production, batch_size=batch_size, epochs=EPOCHS, validation_data=(x_test, y_test))
model_production.save('model_production.h5', save_format='h5')

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [None]:
# evaluate on every sample (no `steps` limit, which would stop after a fixed number of batches)
print('production train loss and accuracy:')
train_loss_production, train_acc_production = model_production.evaluate(x_production, y_production, batch_size=batch_size, verbose=1)

print('production test loss and accuracy:')
test_loss_production, test_acc_production = model_production.evaluate(x_test, y_test, batch_size=batch_size, verbose=1)

print('production per class reports:')
report_production = classification_report(np.argmax(y_test, axis=1), np.argmax(model_production.predict(x_test), axis=1),
                                          output_dict=True)
for key, value in report_production.items():
    print(key, ':', value)

production train loss and accuracy:
production test loss and accuracy:
production per class reports:
0 : {'precision': 0.6455496631790167, 'recall': 0.7795084804430599, 'f1-score': 0.7062328498627989, 'support': 5778}
1 : {'precision': 0.6763690922730683, 'recall': 0.7121642969984202, 'f1-score': 0.6938053097345132, 'support': 6330}
2 : {'precision': 0.9364307279436431, 'recall': 0.9512693814959959, 'f1-score': 0.9437917335812696, 'support': 5869}
3 : {'precision': 0.9855728904546217, 'recall': 0.984251968503937, 'f1-score': 0.9849119865884326, 'support': 5969}
4 : {'precision': 0.9532111723892546, 'recall': 0.9535504538174052, 'f1-score': 0.9533807829181494, 'support': 5619}
5 : {'precision': 0.8661319073083779, 'recall': 0.936223506743738, 'f1-score': 0.8998148148148147, 'support': 5190}
6 : {'precision': 0.9559052394950718, 'recall': 0.9689745836985101, 'f1-score': 0.9623955431754874, 'support': 5705}
7 : {'precision': 0.9740990990990991, 'recall': 0.9863169897377423, 'f1-score': 0.
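Per-class reports are easier to compare side by side. The helper below (`f1_deltas`, a hypothetical name, not part of scikit-learn) ranks classes by how much their F1 score changed between two `classification_report(..., output_dict=True)` dictionaries; the toy labels and predictions are made up for illustration:

```python
import numpy as np
from sklearn.metrics import classification_report

def f1_deltas(report_base, report_prod):
    """Per-class F1 change (production minus base), largest absolute change first.

    Skips the aggregate entries that classification_report adds."""
    skip = {"accuracy", "micro avg", "macro avg", "weighted avg"}
    deltas = {
        k: report_prod[k]["f1-score"] - report_base[k]["f1-score"]
        for k in report_base
        if k not in skip and k in report_prod
    }
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Toy labels and predictions (hypothetical) standing in for y_test and each model's output.
y_true = np.array([0, 0, 1, 1, 2, 2])
pred_base = np.array([0, 0, 1, 1, 2, 1])
pred_prod = np.array([0, 1, 1, 1, 2, 2])

rep_base = classification_report(y_true, pred_base, output_dict=True, zero_division=0)
rep_prod = classification_report(y_true, pred_prod, output_dict=True, zero_division=0)
for cls, delta in f1_deltas(rep_base, rep_prod):
    print(cls, round(delta, 3))
```

Applied to this notebook, `f1_deltas(report_base, report_production)[:5]` would list the five classes whose F1 changed most between the two models.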

## Gray Data Generation
The next step in our model drift detection pipeline is to run a model inversion attack on the base and production models and create gray data from both.

First, let's import everything we need to run the model inversion and generate the gray data.

In [None]:
# GrayDataGenerator imports
from typing import Optional, TYPE_CHECKING
from tqdm import trange
from tensorflow import keras
from art.estimators.classification.classifier import ClassifierMixin, ClassGradientsMixin
from art.estimators.estimator import BaseEstimator
from art.attacks.attack import InferenceAttack
from art.utils import get_labels_np_array, check_and_transform_label_format
if TYPE_CHECKING:
    from art.utils import CLASSIFIER_CLASS_LOSS_GRADIENTS_TYPE

# model inversion
from art.estimators.classification import TensorFlowV2Classifier
from tensorflow.keras.losses import CategoricalCrossentropy

### **GrayDataGenerator** Class
To create the gray data, we port the [MIFace implementation from the Adversarial Robustness Toolbox](https://github.com/Trusted-AI/adversarial-robustness-toolbox/blob/main/art/attacks/inference/model_inversion/mi_face.py) into our example. We cannot import the class directly because we changed it for our use case: `infer` also takes a Keras model (`k_model`), and once that model classifies an inverted image as its target class, the image and every later iterate (up to `g_size` per class) are kept as gray data. To distinguish it from the original, we rename the class **GrayDataGenerator**.

Running the following code cell defines the class:

In [None]:
class GrayDataGenerator(InferenceAttack):
    """MIFace-style model inversion (adapted from ART) that collects "gray" samples for each class."""

    attack_params = InferenceAttack.attack_params + [
        "max_iter",
        "window_length",
        "threshold",
        "learning_rate",
        "batch_size",
        "verbose",
    ]

    _estimator_requirements = (BaseEstimator, ClassifierMixin, ClassGradientsMixin)

    def __init__(
            self,
            classifier: "CLASSIFIER_CLASS_LOSS_GRADIENTS_TYPE",
            max_iter: int = 10000,
            window_length: int = 100,
            threshold: float = 0.99,
            learning_rate: float = 0.1,
            batch_size: int = 1,
            verbose: bool = True,
            g_size: int = 100,
    ):

        super().__init__(estimator=classifier)

        self.max_iter = max_iter
        self.window_length = window_length
        self.threshold = threshold
        self.learning_rate = learning_rate
        self.batch_size = batch_size
        self.verbose = verbose
        self.g_size = g_size

    def infer(self, x: Optional[np.ndarray], y: Optional[np.ndarray] = None, k_model=None, **kwargs) -> tuple:
        """Run model inversion and return (gray_x, gray_y).

        Assumes batch_size=1 and that row i of `x` is the starting image for class i,
        so the batch index doubles as the class label of the gray samples.
        """
        if x is None and y is None:
            raise ValueError("Either `x` or `y` should be provided.")
        if k_model is None:
            raise ValueError("`k_model` (the Keras model used to check predictions) must be provided.")

        y = check_and_transform_label_format(y, self.estimator.nb_classes)
        if x is None:
            x = np.zeros((len(y),) + self.estimator.input_shape)
        if y is None:
            y = get_labels_np_array(self.estimator.predict(x, batch_size=self.batch_size))

        x_infer = x.astype(np.float32)
        gray_x = []
        gray_y = []

        # Compute inversions with implicit batching
        num_batches = int(np.ceil(x.shape[0] / float(self.batch_size)))
        for batch_id in trange(num_batches, desc="Model inversion", disable=not self.verbose):
            batch_index_1, batch_index_2 = batch_id * self.batch_size, (batch_id + 1) * self.batch_size
            batch = x_infer[batch_index_1:batch_index_2]
            batch_labels = y[batch_index_1:batch_index_2]

            active = np.array([True] * len(batch))
            window = np.inf * np.ones((len(batch), self.window_length))

            i = 0
            acc = False  # becomes True once k_model classifies the inverted image correctly
            g_num = 0    # number of gray samples collected for this batch

            while i < self.max_iter and sum(active) > 0 and g_num < self.g_size:
                # gradient ascent on the target-class output, as in ART's MIFace
                grads = self.estimator.class_gradient(batch[active], np.argmax(batch_labels[active], axis=1))
                grads = np.reshape(grads, (grads.shape[0],) + grads.shape[2:])
                batch[active] = batch[active] + self.learning_rate * grads

                if self.estimator.clip_values is not None:
                    clip_min, clip_max = self.estimator.clip_values
                    batch[active] = np.clip(batch[active], clip_min, clip_max)

                cost = 1 - self.estimator.predict(batch)[np.arange(len(batch)), np.argmax(batch_labels, axis=1)]
                active = (cost <= self.threshold) + (cost >= np.max(window, axis=1))

                i_window = i % self.window_length
                window[::, i_window] = cost
                i = i + 1

                if not acc:
                    # check whether the model now assigns the target class to the inverted image
                    target = keras.utils.to_categorical(np.array([batch_index_1]), self.estimator.nb_classes)
                    scores = k_model.evaluate(batch, target, verbose=0)
                    acc = scores[1] == 1
                if acc:
                    # keep this iterate (and every later one) as a gray sample for the class
                    gray_x.append(np.reshape(batch, self.estimator.input_shape))
                    gray_y.append(batch_index_1)
                    g_num = g_num + 1
            x_infer[batch_index_1:batch_index_2] = batch

        return np.array(gray_x), np.array(gray_y)
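The core of the update loop can be seen in isolation on a toy problem. The sketch below is not the ART code: it assumes a hypothetical linear softmax classifier p = softmax(Wx + b), so the gradient of the target-class probability can be written in closed form, and then applies the same three ideas as `GrayDataGenerator.infer`: gradient ascent, clipping to [0, 1], and keeping every iterate once the target class has been predicted:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def invert_class(W, b, target, x0, lr=0.1, max_iter=200, g_size=5):
    """Toy MIFace-style inversion for a linear softmax model (hypothetical helper)."""
    x = x0.astype(float).copy()
    gray, hit = [], False
    for _ in range(max_iter):
        p = softmax(W @ x + b)
        # d p[target] / dx = p[target] * (W[target] - p @ W)
        grad = p[target] * (W[target] - p @ W)
        x = np.clip(x + lr * grad, 0.0, 1.0)
        if not hit:
            hit = np.argmax(W @ x + b) == target
        if hit:
            gray.append(x.copy())  # keep this and every later iterate
            if len(gray) >= g_size:
                break
    return np.array(gray)

# Hypothetical 3-class model over 4 "pixels"; each class reads one pixel.
W = np.eye(3, 4)
b = np.zeros(3)
x0 = np.full(4, 0.5)  # analogous to starting from the average image
samples = invert_class(W, b, target=2, x0=x0)
print(samples.shape)  # (5, 4)
```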

### Model Inversion Attack

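Now we use the **GrayDataGenerator** class to run the model inversion against the base model and create the "base" gray data. Every class starts from the average test image. As a sanity check, the cell prints the smallest (over all classes) of each class's largest per-pixel gradient at that starting image; a value at or near 0 suggests that at least one class receives little or no positive gradient signal there, so its inversion may progress slowly.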

In [None]:
restore_model_base = tf.keras.models.load_model('model_base.h5')
loss_object = CategoricalCrossentropy(from_logits=False)  # the model already ends in a softmax layer

krc_base = TensorFlowV2Classifier(
    model=restore_model_base,
    nb_classes=num_classes,
    loss_object=loss_object,
    input_shape=(image_size, image_size, image_channels),
    clip_values=(0, 1),
)

attack_base = GrayDataGenerator(krc_base, max_iter=60, threshold=1., batch_size=1, g_size=100)

x_init_average = np.zeros((num_classes, image_size, image_size, image_channels)) + np.mean(x_test, axis=0)
y_gray_base = np.arange(num_classes)
class_gradient = krc_base.class_gradient(x_init_average, y_gray_base)
class_gradient = np.reshape(class_gradient, (num_classes, image_size * image_size, image_channels))
class_gradient_max = np.max(class_gradient, axis=1)

print("Minimum over all maximum class gradient: %f" % (np.min(class_gradient_max)))



Minimum over all maximum class gradient: 0.000000
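The check above only reports the minimum. To see which classes get no gradient signal, a small helper (hypothetical, not part of ART) can list them. It flattens each class's gradient map, so it works on either the raw output of ART's `class_gradient` for a label array (assumed shape `(n_classes, 1, H, W, C)`) or the reshaped array in the cell above:

```python
import numpy as np

def classes_without_signal(class_gradient, eps=0.0):
    """Return (indices of classes whose largest per-pixel gradient is <= eps, per-class maxima)."""
    per_class_max = class_gradient.reshape(class_gradient.shape[0], -1).max(axis=1)
    return np.flatnonzero(per_class_max <= eps), per_class_max

# Toy gradients for 3 classes on a 2x2 single-channel image (hypothetical values).
g = np.zeros((3, 1, 2, 2, 1))
g[0, 0, 0, 0, 0] = 0.5
g[2, 0, 1, 1, 0] = 0.1  # class 1 is left all-zero
idx, maxes = classes_without_signal(g)
print(idx, maxes.min())  # [1] 0.0
```

In the notebook, `classes_without_signal(krc_base.class_gradient(x_init_average, np.arange(num_classes)))` would list the affected classes for the base model.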


Next, we run the inversion to generate the base gray data and then evaluate the base model on it, to confirm that the base model recognizes the gray data as the intended classes. (This step should take ~5 min if you are running on a GPU.)

In [None]:
x_gray_base, y_gray_base = attack_base.infer(x_init_average, y_gray_base, restore_model_base)
y_gray_base = keras.utils.to_categorical(np.array(y_gray_base), num_classes)

print('\n gray data loss and accuracy based on base model:')
scores_gray_base = restore_model_base.evaluate(x_gray_base, y_gray_base, verbose=1)
print('Gray data loss:', scores_gray_base[0])
print('Gray data accuracy:', scores_gray_base[1])

Model inversion: 100%|██████████| 62/62 [06:02<00:00,  5.85s/it]


 gray data loss and accuracy based on base model:





Let's do the same for the production model.

In [None]:
restore_model_production = tf.keras.models.load_model('model_production.h5')
loss_object = CategoricalCrossentropy(from_logits=False)  # the model already ends in a softmax layer

krc_production = TensorFlowV2Classifier(
    model=restore_model_production,
    nb_classes=num_classes,
    loss_object=loss_object,
    input_shape=(image_size, image_size, image_channels),
    clip_values=(0, 1),
)

attack_production = GrayDataGenerator(krc_production, max_iter=60, threshold=1., batch_size=1, g_size=100)

x_init_average = np.zeros((num_classes, image_size, image_size, image_channels)) + np.mean(x_test, axis=0)
y_gray_production = np.arange(num_classes)
class_gradient = krc_production.class_gradient(x_init_average, y_gray_production)
class_gradient = np.reshape(class_gradient, (num_classes, image_size * image_size, image_channels))
class_gradient_max = np.max(class_gradient, axis=1)

print("Minimum over all maximum class gradient: %f" % (np.min(class_gradient_max)))

Minimum over all maximum class gradient: 0.000000


Next, we run the inversion to generate the production gray data and then evaluate the production model on it, to confirm that the production model recognizes the gray data as the intended classes. (This step should take ~5 min if you are running on a GPU.)

In [None]:
x_gray_production, y_gray_production = attack_production.infer(x_init_average, y_gray_production, restore_model_production)
y_gray_production = keras.utils.to_categorical(np.array(y_gray_production), num_classes)

print('\n gray data loss and accuracy based on production model:')
scores_gray_production = restore_model_production.evaluate(x_gray_production, y_gray_production, verbose=1)
print('Gray data loss:', scores_gray_production[0])
print('Gray data accuracy:', scores_gray_production[1])

Model inversion: 100%|██████████| 62/62 [06:21<00:00,  6.15s/it]


 gray data loss and accuracy based on production model:





Gray data loss: 4.2629257222870365e-05
Gray data accuracy: 1.0


## Membership Inference Attacks

Now that we have gray data from both the base model and the production model, we use it to check whether the base model has drifted between the base and production data.

We use both the rule-based membership inference (MI) attack and the black-box MI attack, following the [Adversarial Robustness Toolbox example](https://github.com/Trusted-AI/adversarial-robustness-toolbox/blob/main/notebooks/attack_membership_inference_shadow_models.ipynb).

In [None]:
from art.attacks.inference.membership_inference import MembershipInferenceBlackBox, MembershipInferenceBlackBoxRuleBased
from art.utils import to_categorical
from art.estimators.classification import TensorFlowV2Classifier

# Reload the saved base model so both attacks target the original (pre-drift) model
restore_model_base2 = tf.keras.models.load_model('model_base.h5')

# Wrap the Keras model as an ART estimator; the attacks below query its predictions
art_classifier = TensorFlowV2Classifier(restore_model_base2, nb_classes=num_classes, input_shape=(image_size, image_size, image_channels))

### Rule-Based Membership Inference Attack

In [None]:
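rb_attack = MembershipInferenceBlackBoxRuleBased(art_classifier)
# infer() returns 1 for samples judged to be members and 0 otherwise
inferred_test = rb_attack.infer(x_test, y_test)
# Test samples are true non-members, so accuracy is the fraction inferred as 0
rb_test_acc = 1 - (np.sum(inferred_test) / len(inferred_test))
print(f"Test Data Accuracy {rb_test_acc:.4f}")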

Test Data Accuracy 0.1763
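For intuition, the rule-based attack can be approximated in a few lines of NumPy: as we understand ART's `MembershipInferenceBlackBoxRuleBased`, a sample is labeled a member exactly when the classifier's top prediction matches its true label. The sketch below is a simplified stand-in that works on a hypothetical matrix of predicted probabilities instead of an ART estimator; `rule_based_membership` is an illustrative name, not an ART function.

```python
import numpy as np


def rule_based_membership(probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Return 1 for samples inferred as members, 0 otherwise.

    probs:  (n_samples, n_classes) predicted class probabilities
    labels: (n_samples,) integer labels or (n_samples, n_classes) one-hot labels
    """
    if labels.ndim == 2:  # accept one-hot labels as well as integer labels
        labels = np.argmax(labels, axis=1)
    predicted = np.argmax(probs, axis=1)
    # A correctly classified sample is treated as a training-set member
    return (predicted == labels).astype(int)


# Hypothetical predictions for four non-member (test) samples
probs = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
labels = np.array([0, 1, 2, 1])
inferred = rule_based_membership(probs, labels)
# Non-member accuracy, computed the same way as rb_test_acc
test_acc = 1 - inferred.sum() / len(inferred)
print(inferred, test_acc)  # [1 1 1 0] 0.25
```

Under this rule every correctly classified test sample is (wrongly) flagged as a member, so the more accurate the base model is on the test set, the lower the rule-based test data accuracy.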


Next, we infer membership for the gray data generated from the base model. Since this data was reconstructed from the base model itself, we expect the attack to flag nearly all of it as members.

In [None]:
inferred_train = rb_attack.infer(x_gray_base, y_gray_base)

rb_base_train_acc = np.sum(inferred_train) / len(inferred_train)
# Overall attack accuracy: members (gray data) and non-members (test data), weighted by set size
rb_base_acc = (rb_base_train_acc * len(inferred_train) + rb_test_acc * len(inferred_test)) / (len(inferred_train) + len(inferred_test))
print(f"Base Gray Data Accuracy: {rb_base_train_acc:.4f}")
print(f"Base Attack Accuracy {rb_base_acc:.4f}")

Base Gray Data Accuracy: 1.0000
Base Attack Accuracy 0.1794
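The overall attack accuracy above is a size-weighted average of the member accuracy on the gray data and the non-member accuracy on the test data, which is the same as the total number of correct membership decisions divided by the total number of samples. A minimal sketch of that computation follows; `combined_attack_accuracy` is a hypothetical helper and the attack outputs are toy values.

```python
import numpy as np


def combined_attack_accuracy(member_inferred: np.ndarray, nonmember_inferred: np.ndarray) -> float:
    """Overall accuracy of a membership attack over members and non-members.

    member_inferred:    0/1 attack outputs for true members (e.g. gray data)
    nonmember_inferred: 0/1 attack outputs for true non-members (e.g. test data)
    """
    n_m, n_n = len(member_inferred), len(nonmember_inferred)
    member_acc = member_inferred.sum() / n_m              # fraction flagged as members
    nonmember_acc = 1 - nonmember_inferred.sum() / n_n    # fraction flagged as non-members
    # Size-weighted average, mirroring the rb_base_acc formula
    return (member_acc * n_m + nonmember_acc * n_n) / (n_m + n_n)


# Hypothetical attack outputs: 2 gray samples, 8 test samples
gray = np.array([1, 1])
test = np.array([1, 1, 1, 1, 1, 1, 0, 0])
print(combined_attack_accuracy(gray, test))  # 0.4
```

Because the weights are the set sizes, a small gray dataset barely moves the combined score. This is why the Base Attack Accuracy (0.1794) stays close to the Test Data Accuracy (0.1763) even though every base gray sample was inferred as a member.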


Now we infer membership for the gray data generated from the production model. If model drift occurred, this member accuracy should be lower than the one obtained with the base gray data.

In [None]:
prod_inferred_train = rb_attack.infer(x_gray_production, y_gray_production)

rb_prod_train_acc = np.sum(prod_inferred_train) / len(prod_inferred_train)
rb_prod_acc = (rb_prod_train_acc * len(prod_inferred_train) + rb_test_acc * len(inferred_test)) / (len(prod_inferred_train) + len(inferred_test))
print(f"Production Gray Data Accuracy: {rb_prod_train_acc:.4f}")
print(f"Production Attack Accuracy {rb_prod_acc:.4f}")

Production Gray Data Accuracy: 0.7222
Production Attack Accuracy 0.1773


In [None]:
print("Rule-Based Model Drift Detection Result:")
# Model drift: compare overall attack accuracy (gray data plus test data)
result_a = (rb_prod_acc < rb_base_acc)
if result_a:
  rb_score = rb_base_acc - rb_prod_acc
  print(f"drift: {rb_score:.4f}")
else:
  print("no drift detected")

print("Rule-Based Concept Drift Detection Result:")
# Concept drift: compare member accuracy on the gray data alone
result_b = (rb_prod_train_acc < rb_base_train_acc)
if result_b:
  rb_score_c = rb_base_train_acc - rb_prod_train_acc
  print(f"drift: {rb_score_c:.4f}")
else:
  print("no drift detected")

Rule-Based Model Drift Detection Result:
drift: 0.0021
Rule-Based Concept Drift Detection Result:
drift: 0.2778
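The two checks in the cell above differ only in which pair of numbers they compare: model drift uses the overall attack accuracy, concept drift uses the member accuracy on the gray data. A small helper makes that explicit. `drift_score` and `drift_report` are hypothetical names, and the inputs below are the rounded values printed earlier, so the scores agree with the printed output only to four decimals.

```python
from typing import Dict, Optional


def drift_score(base_value: float, production_value: float) -> Optional[float]:
    """Base minus production when production is lower; None means no drift detected."""
    if production_value < base_value:
        return base_value - production_value
    return None


def drift_report(base_member_acc: float, prod_member_acc: float,
                 base_attack_acc: float, prod_attack_acc: float) -> Dict[str, Optional[float]]:
    """Model drift compares overall attack accuracy; concept drift compares member accuracy."""
    return {
        "model_drift": drift_score(base_attack_acc, prod_attack_acc),
        "concept_drift": drift_score(base_member_acc, prod_member_acc),
    }


# Rounded rule-based values printed in the cells above
report = drift_report(base_member_acc=1.0, prod_member_acc=0.7222,
                      base_attack_acc=0.1794, prod_attack_acc=0.1773)
for name, score in report.items():
    print(name, "no drift detected" if score is None else f"drift: {score:.4f}")
```

The same helper applies unchanged to the black-box attack results below.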


### Black-Box Membership Inference Attack

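The black-box membership inference attack is run the same way as the rule-based attack, except that its attack model must first be fitted: the base training data (`x_base`, `y_base`), which was originally used to train the base model, serves as the member set, and the test data serves as the non-member set.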

In [None]:
bb_attack_bb = MembershipInferenceBlackBox(art_classifier)
# Fit the attack model: base training data are members, test data are non-members
bb_attack_bb.fit(x_base, y_base, x_test, y_test)

nonmember_infer = bb_attack_bb.infer(x_test, y_test)
bb_nonmember_acc = 1 - np.sum(nonmember_infer) / len(x_test)
print('Test Data Accuracy:', bb_nonmember_acc)

Test Data Accuracy: 0.28442354478478027
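Unlike the rule-based attack, `MembershipInferenceBlackBox` learns its decision rule from the target model's outputs on known members and known non-members. The sketch below illustrates that idea with a deliberately simple attack model: a single threshold on the confidence assigned to the true class, chosen to maximize accuracy on the fitting data. This is a swapped-in technique for illustration only, not ART's implementation (which, as we understand it, by default trains a small neural network on the prediction vectors and labels); the function names and toy confidences are hypothetical.

```python
import numpy as np


def true_class_confidence(probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Probability the target model assigns to each sample's true (integer) label."""
    return probs[np.arange(len(labels)), labels]


def fit_threshold_attack(member_conf: np.ndarray, nonmember_conf: np.ndarray) -> float:
    """Choose the confidence threshold that best separates members from non-members.

    Simplified stand-in for a learned attack model, not ART's algorithm.
    """
    conf = np.concatenate([member_conf, nonmember_conf])
    is_member = np.concatenate([np.ones(len(member_conf)), np.zeros(len(nonmember_conf))])
    best_t, best_acc = float(conf.min()), -1.0
    for t in np.unique(conf):
        # Predict "member" when confidence is at least t; keep the most accurate threshold
        acc = np.mean((conf >= t) == is_member)
        if acc > best_acc:
            best_t, best_acc = float(t), acc
    return best_t


def infer_membership(conf: np.ndarray, threshold: float) -> np.ndarray:
    """1 = inferred member, 0 = inferred non-member."""
    return (conf >= threshold).astype(int)


# Feature extraction on a toy probability matrix
probs = np.array([[0.1, 0.9], [0.6, 0.4]])
print(true_class_confidence(probs, np.array([1, 1])))

# Hypothetical confidences: members are fit more tightly than non-members
member_conf = np.array([0.99, 0.97, 0.95, 0.90])
nonmember_conf = np.array([0.92, 0.60, 0.55, 0.40])
threshold = fit_threshold_attack(member_conf, nonmember_conf)
print(threshold, infer_membership(nonmember_conf, threshold))
```

A single threshold is easy to inspect but ignores which class was predicted; a learned attack model such as ART's can use the full prediction vector together with the label.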


In [None]:
base_member_infer = bb_attack_bb.infer(x_gray_base, y_gray_base)
bb_base_member_acc = np.sum(base_member_infer) / len(x_gray_base)
bb_base_acc = (bb_base_member_acc * len(x_gray_base) + bb_nonmember_acc * len(x_test)) / (len(x_gray_base) + len(x_test))
print('Base Gray Data Accuracy:', bb_base_member_acc)
print('Base Attack Accuracy:', bb_base_acc)

Base Gray Data Accuracy: 1.0
Base Attack Accuracy: 0.28710785279331275


In [None]:
production_member_infer = bb_attack_bb.infer(x_gray_production, y_gray_production)
bb_production_member_acc = np.sum(production_member_infer) / len(x_gray_production)
bb_production_acc = (bb_production_member_acc * len(x_gray_production) + bb_nonmember_acc * len(x_test)) / (len(x_gray_production) + len(x_test))
print('Production Gray Data Accuracy:', bb_production_member_acc)
print('Production Attack Accuracy:', bb_production_acc)

Production Gray Data Accuracy: 0.5138888888888888
Production Attack Accuracy: 0.28484884888320644


In [None]:
print("Black Box Model Drift Detection Result:")
result_c = (bb_production_acc < bb_base_acc)
if result_c:
  bb_score = bb_base_acc - bb_production_acc
  print(f"drift: {bb_score:.4f}")
else:
  print("no drift detected")

print("Black Box Concept Drift Detection Result:")
result_d = (bb_production_member_acc < bb_base_member_acc)
if result_d:
  bb_score_c = bb_base_member_acc - bb_production_member_acc
  print(f"drift: {bb_score_c:.4f}")
else:
  print("no drift detected")

Black Box Model Drift Detection Result:
drift: 0.0023
Black Box Concept Drift Detection Result:
drift: 0.4861


## Drift Detection Results

Finally, we compare the membership inference results on the base gray data with those on the production gray data, for both attacks.

### Concept Drift Detection
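If the member accuracy on the production gray data (**rb_prod_train_acc** or **bb_production_member_acc**) is lower than on the base gray data (**rb_base_train_acc** or **bb_base_member_acc**) for either attack, we flag concept drift. The check is a strict `<` comparison, so any decrease counts.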

In [None]:
print("Concept Drift Detection Result:")
print((result_b | result_d))

Concept Drift Detection Result:
True
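The checks in these cells flag drift on any decrease at all and combine the two attacks with `|`. If only a meaningful drop should count, one option is a minimum margin. The function `drift_detected` and its `margin` parameter are hypothetical, the accuracies are toy values, and the original notebook does not suggest any particular margin.

```python
from typing import Sequence


def drift_detected(base_values: Sequence[float], production_values: Sequence[float],
                   margin: float = 0.0) -> bool:
    """True if, for any attack, production drops below base by more than `margin`.

    margin=0.0 reproduces the strict `<` comparison used in the notebook.
    """
    return any(base - prod > margin for base, prod in zip(base_values, production_values))


# Hypothetical member accuracies for the (rule-based, black-box) attacks
base = [1.0, 1.0]
prod = [0.97, 0.99]
print(drift_detected(base, prod))               # strict comparison: True
print(drift_detected(base, prod, margin=0.05))  # require a drop larger than 0.05: False
```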


### Model Drift Detection
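If the overall attack accuracy with the production gray data (**rb_prod_acc** or **bb_production_acc**) is lower than with the base gray data (**rb_base_acc** or **bb_base_acc**) for either attack, we flag model drift.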

In [None]:
print("Model Drift Detection Result:")
print((result_a | result_c))

Model Drift Detection Result:
True


# Configuration Notes

## Dataset Configuration

* **data_set**: name of the dataset, as mapped in `utils.get_data`

* **num_classes**: number of classes in the dataset

* **image_size**: height and width, in pixels, of the (square) input images

* **image_channels**: number of channels per image (e.g., 1 for grayscale)

* **data_path**: path to saved data *(default "data")*

* **data_augmentation**: set to *True* if data augmentation should be used to train the model

## Model Configuration

* **conv_filters**: list of filter counts for the convolutional layers *(e.g., [4, 8])*

* **dense_units**: number of units in the dense layer

* **mode**: training mode *("natural", "robust", or "mini-robust")*

* **depth**:

## Training Configuration

* **epochs**: number of training epochs

* **batch_size**: number of samples per training batch

* **train_size**:
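A minimal sketch of checking these settings before training, assuming they are held in a plain Python dict. `example_config`, `validate_config`, and every value shown are hypothetical; `depth` and `train_size` are left out because their meaning is not documented here. The function returns the input shape in the same `(image_size, image_size, image_channels)` form used when wrapping the model for ART.

```python
from typing import Any, Dict, Tuple

VALID_MODES = {"natural", "robust", "mini-robust"}
POSITIVE_INT_KEYS = ("num_classes", "image_size", "image_channels",
                     "dense_units", "epochs", "batch_size")
REQUIRED_KEYS = set(POSITIVE_INT_KEYS) | {"data_set", "data_path", "data_augmentation",
                                          "conv_filters", "mode"}


def validate_config(config: Dict[str, Any]) -> Tuple[int, int, int]:
    """Check the documented keys and return the model input shape."""
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise KeyError(f"missing config keys: {sorted(missing)}")
    if config["mode"] not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {config['mode']!r}")
    for key in POSITIVE_INT_KEYS:
        if not isinstance(config[key], int) or config[key] <= 0:
            raise ValueError(f"{key} must be a positive integer")
    if not all(isinstance(f, int) and f > 0 for f in config["conv_filters"]):
        raise ValueError("conv_filters must be a list of positive integers")
    if not isinstance(config["data_augmentation"], bool):
        raise ValueError("data_augmentation must be True or False")
    # Square images, matching input_shape=(image_size, image_size, image_channels)
    return (config["image_size"], config["image_size"], config["image_channels"])


# Illustrative values only; data_set must be a name that utils.get_data recognizes
example_config: Dict[str, Any] = {
    "data_set": "emnist",
    "num_classes": 62,
    "image_size": 28,
    "image_channels": 1,
    "data_path": "data",
    "data_augmentation": False,
    "conv_filters": [4, 8],
    "dense_units": 128,
    "mode": "natural",
    "epochs": 10,
    "batch_size": 64,
}

print(validate_config(example_config))  # (28, 28, 1)
```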