# Example Attribute Inference Attack

In this notebook, we stage an attribute inference attack against a CIFAR-10 classifier to see whether a sensitive attribute can be recovered. The sensitive attribute here is membership in CIFAR-10 class 0, which is airplane in the standard label ordering (automobile is class 1). Note that `art` is imported only to report its version; the attack model below is built directly with Keras.
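
As a quick sanity check on the class index, the standard CIFAR-10 label ordering can be written out explicitly. The names `CIFAR10_LABELS` and `class_index` below are hypothetical helpers for illustration, not something Keras or ART exports.

```python
# Standard CIFAR-10 label ordering (index -> class name).
# CIFAR10_LABELS is an illustrative helper, not part of Keras or ART.
CIFAR10_LABELS = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

def class_index(name):
    """Return the integer label for a CIFAR-10 class name."""
    return CIFAR10_LABELS.index(name)

print(CIFAR10_LABELS[0])          # airplane
print(class_index("automobile"))  # 1
```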

In [13]:
import numpy as np
import tensorflow as tf
import art
tf.compat.v1.disable_eager_execution()
print(tf.__version__)
print(art.__version__)
# Set random seed for reproducibility
np.random.seed(123)

2.15.0
1.13.1


## Load CIFAR-10 Data and Pre-trained Model

First, we load the CIFAR-10 dataset and a pre-trained CNN model. Replace `../models/simple-cifar10.h5` with the path to your own model.

In [14]:
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import load_model

# Load CIFAR-10 data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize pixel values to be between 0 and 1
x_train, x_test = x_train / 255.0, x_test / 255.0

# Load your pre-trained CNN model
model = load_model('../models/simple-cifar10.h5')



## Prepare Data for the Attack

Here, we build the dataset for the attack. The goal is to train a second model (the attack model) that predicts whether a sample belongs to the sensitive class, using the original model's output vector concatenated with the flattened image pixels.

In [16]:
import numpy as np

# Define the sensitive class
sensitive_class = 0  # CIFAR-10 index 0 is airplane (automobile is index 1)

# Prepare labels for binary classification: 1 if the class is sensitive, 0 otherwise
y_train_binary = (y_train == sensitive_class).astype(int)
y_test_binary = (y_test == sensitive_class).astype(int)

# Get predictions from the pre-trained model
pretrained_predictions_train = model.predict(x_train)

# Flatten the input images and append the model's predictions (image + prediction)
x_train_flat = x_train.reshape(x_train.shape[0], -1)
attack_train_data = np.concatenate([x_train_flat, pretrained_predictions_train], axis=1)

# Prepare test data for the attack model
pretrained_predictions_test = model.predict(x_test)

# Flatten the input images and append the model's predictions (image + prediction)
x_test_flat = x_test.reshape(x_test.shape[0], -1)
attack_test_data = np.concatenate([x_test_flat, pretrained_predictions_test], axis=1)
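
The feature construction above can be checked on a small stand-in batch without loading CIFAR-10 or the model. In this sketch the images and prediction vectors are random placeholders (an assumption, standing in for `x_train` and `model.predict(x_train)`), and `build_attack_features` is a hypothetical helper name.

```python
import numpy as np

def build_attack_features(images, predictions):
    """Flatten each image and append the target model's output vector."""
    flat = images.reshape(images.shape[0], -1)
    return np.concatenate([flat, predictions], axis=1)

rng = np.random.default_rng(0)
images = rng.random((4, 32, 32, 3))      # placeholder for CIFAR-10 images
predictions = rng.random((4, 10))        # placeholder for model.predict output
labels = np.array([[0], [3], [0], [7]])  # CIFAR-10 labels have shape (N, 1)

features = build_attack_features(images, predictions)
binary = (labels == 0).astype(int)       # 1 where the label is the sensitive class

print(features.shape)  # (4, 3082): 32*32*3 pixels + 10 prediction values
print(binary.ravel())  # [1 0 1 0]
```

Slicing `features[:, -10:]` recovers only the prediction columns, which is one way to test how much the attack relies on the model's output rather than on the raw pixels.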



## Train the Attack Model

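Now we train the attack model: a small fully connected network with a single sigmoid output. It only queries the target model for predictions and never inspects its weights, which is what makes the attack black-box.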

In [17]:
from tensorflow.keras import layers, models

# Create a simple attack model
attack_model = models.Sequential([
    layers.Dense(128, activation='relu', input_shape=(attack_train_data.shape[1],)),
    layers.Dense(1, activation='sigmoid')  # Binary classification
])

attack_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the attack model
attack_model.fit(attack_train_data, y_train_binary, epochs=10, validation_split=0.2)


Train on 40000 samples, validate on 10000 samples




Epoch 1/10



Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x788df09fb7c0>
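
Only about one training sample in ten belongs to the sensitive class, so the attack model learns from imbalanced labels. One common option, not used in this notebook, is to pass per-class weights through the `class_weight` argument of Keras `fit`. The sketch below computes inverse-frequency weights on toy labels; `balanced_class_weights` is a hypothetical helper.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_per_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Toy labels with the same 1-in-10 ratio as one CIFAR-10 class vs. the rest.
toy_labels = [1] + [0] * 9
weights = balanced_class_weights(toy_labels)
print(weights)  # the rare class 1 receives the larger weight (5.0)
```

With the notebook's data, the labels would be something like `y_train_binary.ravel().tolist()`, and the resulting dictionary could be passed as `class_weight=weights` to `attack_model.fit`; whether this helps the attack is an open question to test, not a result from this notebook.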

## Evaluate the Attack Model

Finally, we evaluate the attack model on the test data to see how well it can infer the sensitive class.

In [19]:
# Prepare test data for the attack model
pretrained_predictions_test = model.predict(x_test)

# Flatten the input images and concatenate with predictions
x_test_flat = x_test.reshape(x_test.shape[0], -1)
attack_test_data = np.concatenate([x_test_flat, pretrained_predictions_test], axis=1)

# Evaluate the attack model
loss, accuracy = attack_model.evaluate(attack_test_data, y_test_binary)
print(loss,accuracy)

0.09656515996083617 0.977


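This notebook outlines a basic black-box attack against a pre-trained CIFAR-10 CNN. The attack model reaches 97.7% test accuracy, but each CIFAR-10 class makes up 10% of the test set, so always predicting "not sensitive" would already score 90%; the result should be read against that baseline. Depending on your requirements, you may need to adjust the attack model's architecture, training parameters, or data preprocessing steps.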
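
To put the accuracy figure in context, the majority-class baseline and per-class metrics can be computed from the attack model's thresholded outputs. The sketch below uses toy label lists rather than the notebook's results, and `binary_report` is a hypothetical helper.

```python
def binary_report(y_true, y_pred):
    """Accuracy, majority-class baseline, precision and recall for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n = len(y_true)
    positives = tp + fn
    return {
        "accuracy": (tp + tn) / n,
        "baseline": max(positives, n - positives) / n,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / positives if positives else 0.0,
    }

# Toy data: 2 positives out of 10, one of them missed, plus one false alarm.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
report = binary_report(y_true, y_pred)
print(report)
```

In this toy case the accuracy equals the majority-class baseline even though precision and recall are both only 0.5, which is why accuracy alone says little on imbalanced labels. In the notebook, `y_pred` would come from thresholding `attack_model.predict(attack_test_data)` at 0.5 and flattening the result.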