<a href="https://colab.research.google.com/github/coletted1/Model-Theft-Project/blob/main/ModelTheftProject.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Model Extraction Attack

## A. Victim Model Description

**Purpose of the Model:**

The victim model is a neural network designed for the classification of handwritten digits. I chose this task because it is a common problem in image processing and computer vision. Real-world uses of digit classification include postal code sorting for mail, bank check processing, and educational tools such as Photomath.

**Description of Training Data:**

The model was trained on the MNIST dataset, which consists of 60,000 training images and 10,000 test images of handwritten digits, each being a 28x28 pixel grayscale image.

**Model Architecture and Hyperparameters:**

*   **Architecture:** Sequential neural network model with two dense layers
*   **Layers:**
 * A Flatten layer to transform the input images (28x28 pixels) into a 1D array
 * A Dense layer with 128 neurons, using the ReLU (Rectified Linear Unit) activation function. This layer is responsible for learning features from the flattened input
 * A Dense output layer with 10 neurons (corresponding to the 10 digit classes of MNIST), using the Softmax activation function for multi-class classification

*   **Hyperparameters:**
 *    Optimizer: Adam, a popular optimizer that adapts the learning rate during training
 *    Loss function: Categorical Crossentropy, suitable for the multi-class classification problem
 *    Training Epochs: The model is trained for 5 epochs
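To make the architecture concrete, the sketch below reproduces the victim's forward pass (Flatten, then Dense(128, ReLU), then Dense(10, softmax)) in plain NumPy. The weights are random placeholders, not the trained Keras parameters. The helper names `init_params` and `forward` are hypothetical. The point of the sketch is the tensor shapes and the parameter count, which for these layer sizes is 784 × 128 + 128 + 128 × 10 + 10 = 101,770.

```python
import numpy as np

def init_params(rng, n_in=28 * 28, n_hidden=128, n_out=10):
    """Random weights with the same shapes as the victim's two Dense layers (placeholders only)."""
    return {
        "W1": rng.normal(0.0, 0.05, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.05, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, images):
    """Flatten -> Dense(128, ReLU) -> Dense(10, softmax) for a batch of 28x28 images."""
    x = images.reshape(len(images), -1)                   # Flatten: (batch, 28, 28) -> (batch, 784)
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # Dense(128) followed by ReLU
    logits = h @ params["W2"] + params["b2"]              # Dense(10) before the activation
    logits -= logits.max(axis=1, keepdims=True)           # subtract row max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)           # softmax: one probability per digit class

rng = np.random.default_rng(0)
params = init_params(rng)
batch = rng.random((4, 28, 28))          # stand-in for 4 normalized MNIST images
probs = forward(params, batch)
n_params = sum(p.size for p in params.values())

print(probs.shape)   # (4, 10)
print(n_params)      # 101770
```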

**Model Metrics:**

* Accuracy: Achieved an accuracy of 98.62% on the training dataset.
* Precision: The fraction of the model's predictions for a class that are correct (true positives divided by all predicted positives)
* Recall: The fraction of the actual instances of a class that the model finds (true positives divided by all actual positives)
* F1 Score: The harmonic mean of precision and recall, balancing the two (macro-averaged over the 10 classes in the code)
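For a single class, all three metrics can be computed from counts of true positives (TP), false positives (FP), and false negatives (FN). The sketch below does this per class and then macro-averages, which is the same averaging as the `average='macro'` setting used for the F1 metric in the training code. It is an illustration only. The Keras `Precision()` and `Recall()` metrics used in the notebook are not macro-averaged, so this function will not reproduce those exact numbers. The function name and toy labels are hypothetical.

```python
import numpy as np

def macro_precision_recall_f1(y_true, y_pred, num_classes=10):
    """Per-class precision, recall and F1, then the unweighted mean over classes."""
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))    # predicted c and truly c
        fp = np.sum((y_pred == c) & (y_true != c))    # predicted c but truly another class
        fn = np.sum((y_pred != c) & (y_true == c))    # truly c but predicted another class
        p = tp / (tp + fp) if tp + fp > 0 else 0.0    # precision = TP / (TP + FP)
        r = tp / (tp + fn) if tp + fn > 0 else 0.0    # recall    = TP / (TP + FN)
        f = 2 * p * r / (p + r) if p + r > 0 else 0.0  # F1 = harmonic mean of p and r
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return float(np.mean(precisions)), float(np.mean(recalls)), float(np.mean(f1s))

# Toy example with 3 classes (hypothetical labels, not MNIST results)
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
precision, recall, f1 = macro_precision_recall_f1(y_true, y_pred, num_classes=3)
print(precision, recall, f1)
```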

## B. Extraction Attack Technique

**Summary of Technique:**

The chosen technique for the attack was substitute model training. The victim model was queried with a subset of the MNIST test dataset (the first 1,000 images), and its predicted labels were used to train a separate substitute model.
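The core loop of this technique can be shown on a toy problem without MNIST or TensorFlow:

1. Query the victim as a black box.
2. Keep only its answers.
3. Fit your own model to those answers.

In this NumPy sketch, the "victim" is an assumed linear classifier over 2-D points, and the substitute is a softmax regression trained by gradient descent. All names (`victim_predict`, `train_substitute`) and numbers are hypothetical. Only the structure mirrors the notebook.

```python
import numpy as np

rng = np.random.default_rng(42)

# Victim (hidden from the attacker): a fixed linear classifier over 3 classes
_W_victim = np.array([[2.0, -1.0, -1.0],
                      [0.0,  2.0, -2.0]])
_b_victim = np.array([0.0, 0.5, -0.5])

def victim_predict(x):
    """Black-box query interface: returns only the predicted class label for each row of x."""
    return np.argmax(x @ _W_victim + _b_victim, axis=1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_substitute(x, labels, num_classes=3, lr=0.5, steps=1000):
    """Multinomial logistic regression fitted with full-batch gradient descent on cross-entropy."""
    w = np.zeros((x.shape[1], num_classes))
    b = np.zeros(num_classes)
    y = np.eye(num_classes)[labels]                      # one-hot targets from the victim's labels
    for _ in range(steps):
        grad_logits = (softmax(x @ w + b) - y) / len(x)  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * x.T @ grad_logits
        b -= lr * grad_logits.sum(axis=0)
    return w, b

# Attacker: choose query inputs, label them by querying the victim, fit the substitute
query_x = rng.normal(size=(2000, 2))
query_y = victim_predict(query_x)
w_sub, b_sub = train_substitute(query_x, query_y)

# How often the substitute agrees with the victim on fresh inputs
fresh_x = rng.normal(size=(5000, 2))
agreement = np.mean(np.argmax(fresh_x @ w_sub + b_sub, axis=1) == victim_predict(fresh_x))
print(f"agreement with victim: {agreement:.3f}")
```

The attacker never reads `_W_victim` or `_b_victim`. This matches the "no knowledge of the internal architecture" property listed below.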

**Pros and Cons:**

*    Pros:
 * Does not require knowledge of the victim model’s internal architecture
 * Can be effective even with limited data

*    Cons:
 * The quality of the substitute model depends on the representativeness of the query dataset
 * Requires multiple queries to the victim model, which could be detected
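One cheap way to probe the first con is to check how the victim's labels are distributed over the query set. A class that is rare or absent among the queries gives the substitute little to learn from. Below is a minimal NumPy sketch. It assumes the labels are integer class indices, like the `query_labels` array produced in the code section. The function name is hypothetical.

```python
import numpy as np

def label_coverage(query_labels, num_classes=10):
    """Count how many queried samples the victim assigned to each class and list empty classes."""
    counts = np.bincount(query_labels, minlength=num_classes)
    missing = [c for c in range(num_classes) if counts[c] == 0]
    return counts, counts / counts.sum(), missing

# Hypothetical labels for illustration; in the notebook this would be `query_labels`
demo_labels = np.array([0, 1, 1, 2, 2, 2, 7, 7, 9, 9])
counts, fractions, missing = label_coverage(demo_labels)
print(counts)    # [1 2 3 0 0 0 0 2 0 2]
print(missing)   # [3, 4, 5, 6, 8]
```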

 **Resources Required:**

* A dataset to query the victim model
* Computational resources for training the substitute model
* Time and understanding of machine learning for implementing and tuning the model


## C. Discussion of Results

**Comparison of Extracted Model to Victim Model:**

* **Victim Model:** The victim model, a sequential network with two dense layers, performed strongly. By the final training epoch it reached an accuracy of 98.55%, precision of 98.74%, recall of 98.39%, and an F1 score of 98.53% on the training set. These figures indicate reliable classification of MNIST with well-balanced precision and recall. However, they are training-set metrics, not held-out test results.
* **Substitute Model:** The substitute model uses a smaller hidden layer (64 neurons instead of 128) and improved steadily over its training epochs. By the final epoch it reached 92.20% accuracy on the 1,000 victim-labeled query images. Evaluated against the true labels of the original MNIST test set, it reached an accuracy of 86.64%.
* **Comparative Analysis:** The substitute model is noticeably less accurate than the victim model (86.64% vs. 98.55%). This suggests the extraction did not fully replicate the victim's performance. The two figures are not measured on the same data, however: 98.55% is the victim's final-epoch training accuracy, while 86.64% is the substitute's accuracy on the MNIST test set. Plausible contributors to the gap include:
  * the smaller substitute architecture
  * the limited amount of victim-labeled training data (1,000 query images, compared with the 60,000 images used to train the victim)
  * the information lost by keeping only the victim's top predicted class as each label
  * limitations inherent to the model extraction process
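The comparison above sets the substitute's test accuracy against the victim's training accuracy. A more direct measure of extraction success is often called fidelity: the fraction of inputs on which the substitute predicts the same class as the victim, ideally measured on inputs that were not used as queries. Below is a minimal sketch. It assumes both models' softmax outputs are available as arrays, for example `victim_model.predict(test_images)` and `substitute_model.predict(test_images)` in the notebook. The function name and toy numbers are hypothetical.

```python
import numpy as np

def fidelity_and_accuracy(victim_probs, substitute_probs, true_labels):
    """Return (fidelity, substitute accuracy, victim accuracy) measured on the same inputs.

    Fidelity is the fraction of inputs where the substitute predicts the victim's class.
    """
    victim_pred = np.argmax(victim_probs, axis=1)
    substitute_pred = np.argmax(substitute_probs, axis=1)
    fidelity = float(np.mean(substitute_pred == victim_pred))
    substitute_acc = float(np.mean(substitute_pred == true_labels))
    victim_acc = float(np.mean(victim_pred == true_labels))
    return fidelity, substitute_acc, victim_acc

# Hypothetical softmax outputs for 4 inputs and 2 classes (not results from the notebook)
victim_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]])
substitute_probs = np.array([[0.6, 0.4], [0.3, 0.7], [0.45, 0.55], [0.2, 0.8]])
true_labels = np.array([0, 1, 0, 0])

result = fidelity_and_accuracy(victim_probs, substitute_probs, true_labels)
print(result)  # (0.75, 0.5, 0.75)
```

Reporting the victim's accuracy on the same inputs puts both models on equal footing, which the training-versus-test comparison does not.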


## D. Code
### Train Victim Model

In [25]:
```python
!pip install tensorflow-addons
```



In [26]:
```python
# Import statements
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.metrics import Precision, Recall
import tensorflow_addons as tfa

# Load and preprocess the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the training/testing images by scaling the pixel values to be between 0 and 1
train_images = train_images / 255.0
test_images = test_images / 255.0

# One-hot encode the labels
train_labels = to_categorical(train_labels, 10)
test_labels = to_categorical(test_labels, 10)

# Build and compile the victim model
victim_model = Sequential([ # Initializes a new sequential model
    Flatten(input_shape=(28, 28)), # Flattens each 28x28 input image into a 1D array of 784 values
    Dense(128, activation='relu'), # Dense layer with 128 neurons and ReLU (Rectified Linear Unit) activation
    Dense(10, activation='softmax') # Dense output layer with 10 neurons (one per digit) and softmax activation
])
victim_model.compile(optimizer='adam',
                     loss='categorical_crossentropy',
                     metrics=['accuracy', Precision(), Recall(), tfa.metrics.F1Score(num_classes=10, average='macro')])

# Train the victim model
victim_model.fit(train_images, train_labels, epochs=5)
```




TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP). 

For more information see: https://github.com/tensorflow/addons/issues/2807 



Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5



### Query the Victim Model to Label a New Dataset

In [28]:
```python
import numpy as np

# Select the query set: the first 1,000 images of the MNIST test set (real images, not newly generated)
query_images = test_images[:1000]

# Query the victim and keep only its predicted class (argmax of the softmax output) as each label
query_labels = np.argmax(victim_model.predict(query_images), axis=1)
```
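The 1,000 query images in this cell are also part of the 10,000-image test set that the substitute is later evaluated on. So some of the evaluation data was seen during substitute training. Below is a minimal sketch of a randomized, disjoint split, assuming the 10,000-image MNIST test set. The helper name and seed are hypothetical.

```python
import numpy as np

def split_query_and_eval(n_samples, n_queries, seed=0):
    """Randomly pick `n_queries` indices to send to the victim; the rest are held out for evaluation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    return perm[:n_queries], perm[n_queries:]

query_idx, eval_idx = split_query_and_eval(10_000, 1_000)
# In the notebook this could become, e.g.:
#   query_images = test_images[query_idx]
#   substitute_model.evaluate(test_images[eval_idx], sparse_test_labels[eval_idx])
print(len(query_idx), len(eval_idx))   # 1000 9000
```

With this split, the substitute's reported accuracy comes only from images it never saw during training.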




### Train the Substitute Model on the Queried Data

In [36]:
# Build and compile the substitute model
```python
substitute_model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(64, activation='relu'),  # Smaller hidden layer than the victim's (64 vs. 128 neurons)
    Dense(10, activation='softmax')
])
# The query labels are integer class indices from np.argmax, so use the sparse loss variant
substitute_model.compile(optimizer='adam',
                         loss='sparse_categorical_crossentropy',
                         metrics=['accuracy'])

# Train the substitute model on the victim-labeled query data
substitute_model.fit(query_images, query_labels, epochs=5)
```

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


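The victim was trained with one-hot labels and `categorical_crossentropy`. The substitute receives integer labels from `np.argmax` and therefore uses `sparse_categorical_crossentropy`. For hard labels, the two losses compute the same quantity. The NumPy sketch below shows this equivalence. It is an illustration with hypothetical probabilities, not Keras' own implementation.

```python
import numpy as np

def categorical_crossentropy(one_hot, probs):
    """Mean over samples of -sum_k y_k * log(p_k), with one-hot targets."""
    return float(np.mean(-np.sum(one_hot * np.log(probs), axis=1)))

def sparse_categorical_crossentropy(labels, probs):
    """Mean over samples of -log(p_label), with integer class targets."""
    return float(np.mean(-np.log(probs[np.arange(len(labels)), labels])))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
labels = np.array([0, 1, 2])
one_hot = np.eye(3)[labels]   # the same targets in one-hot form

dense_loss = categorical_crossentropy(one_hot, probs)
sparse_loss = sparse_categorical_crossentropy(labels, probs)
print(dense_loss, sparse_loss)
```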

### Evaluate the Substitute Model

In [38]:
```python
# Convert test labels from one-hot encoding to sparse format
sparse_test_labels = np.argmax(test_labels, axis=1)

# Evaluate the substitute model on the original test set with sparse labels
substitute_model.evaluate(test_images, sparse_test_labels)
```



[0.47845184803009033, 0.859499990940094]
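The two values returned by `evaluate` are the test loss and the test accuracy, in that order, because `accuracy` is the only compiled metric. A single accuracy figure does not show which digits the substitute gets wrong. A per-class breakdown may help reveal classes the substitute handles poorly. Below is a minimal NumPy sketch with hypothetical labels. In the notebook, the inputs would be `sparse_test_labels` and `np.argmax(substitute_model.predict(test_images), axis=1)`.

```python
import numpy as np

def per_class_accuracy(true_labels, predicted_labels, num_classes=10):
    """Fraction of each true class that was predicted correctly (NaN if the class never occurs)."""
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = true_labels == c
        if mask.any():
            acc[c] = np.mean(predicted_labels[mask] == c)
    return acc

# Hypothetical labels for illustration only
y_true = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2])
class_acc = per_class_accuracy(y_true, y_pred, num_classes=3)
print(class_acc)
```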