<img style="max-width:20em; height:auto;" src="../graphics/A-Little-Book-on-Adversarial-AI-Cover.png"/>

Author: Nik Alleyne   
Author Blog: https://www.securitynik.com   
Author GitHub: github.com/securitynik   

Author Other Books: [   

            "https://www.amazon.ca/Learning-Practicing-Leveraging-Practical-Detection/dp/1731254458/",   
            
            "https://www.amazon.ca/Learning-Practicing-Mastering-Network-Forensics/dp/1775383024/"   
        ]   


This notebook ***(knockoff-nets-black-box.ipynb)*** is part of the series of notebooks From ***A Little Book on Adversarial AI***  A free ebook released by Nik Alleyne

## KnockOff Nets: Black-box    
**Note** For this notebook, you will need the model URI <Model URI: models:/m-55cc524bf54d408b95f024a81836dda3> that you got in the notebook **mal_net_tiny_malware_classification_multi_class.ipynb**

### Lab Objectives:   
- Learn how to use Adversarial Robustness Toolkit (ART) for black-box attacks 
- How to create a copy of a model when you only have access to an API endpoint   
- Understand that it takes a large number of query to be able to achieve this attack   
- Understand what knock-off nets are  
- If you wish build on this concept for copy-cat nets  


### Step 1:  
Obtaining the pre-trained model   


In [1]:
# Import some libraries that we will need
import torch
import torch.nn as nn
import torch.optim as optim

# This is for Adversarial Robustness Toolkit usage
from art.estimators.classification import PyTorchClassifier
from art.attacks.extraction import KnockoffNets

In [2]:
# Setup the device to work with
# This should ensure if there are accelerators in place, such as Apple backend or CUDA, 
# we should be able to take advantage of it.

if torch.cuda.is_available():
    print('Setting the device to cuda')
    device = 'cuda'
elif torch.backends.mps.is_available():
    print('Setting the device to Apple mps')
    device = 'mps'
else:
    print('Setting the device to CPU')
    device = torch.device('cpu')

Setting the device to cuda


Earlier in the notebookbook **mal_net_tiny_malware_classification_multi_class.ipynb**, we created the malware classifier for the Tiny Mal Net Malware dataset. Let us target that classifier. 

Realistically, in the real-world via a black-box attack, you will not have access to the model this way. You would instead interact with the model via an API. We will do that shortly. However, for now, as we have the model, let us use it this way also, to build our understanding.

Let us start off with the assumption that we know that for virus detection, we can use Convolutional Network. From this perspective, we can build a Convolutional Network to try to mimic the real network which we do not have.

Note: Let us be clear, there is nothing that states that malware detection has to be done via Convoluational Neural Networks. In fact, you can use any architecture. Also in the **bodmas_malware_classifier.ipynb** we used linear layers. At the same time, you could use Gradient Boosting as was used in the BODMAS paper, or even Graphh Neural Networks that is used by MalNet also. So keep in mind, we are working with a fair assumption in our case but it is just that, an assumption.

Our scenario here, is that we have an API endpoint that predicts whether a file hash is malicious or not. We could even consider the VirusTotal interface as an example 

<img style="max-width:50em; height:auto;" src="../graphics/virustotal_file_hash.png"/>


We know that the endpoint expects a SHA-256 hash. We could get a set of files generate the hash and feed each one to the end point. In our Linux shell, we could do something such as:  
$ **sha256sum /tmp/tiny_mal_net_X_y_t** 
*c721f56288747a5d7b23a3589112379eed129d0d0d37256b6bbb531c6c7e2348  /tmp/tiny_mal_net_X_y_test.npz*   
*3772630f57c89a15f6b6924ff1ce5ff1d1f15df6d4b348e8e8e4e5363720b961  /tmp/tiny_mal_net_X_y_train.npz*    
  
However, even with these hashes, we still need to get the values as 0 or 1. This is because as we saw earlier when constructing the model, that the hash is preprocessed as a set of 0s and 1s and this is what is provided to the model. We also learnt, that in the paper on Copycat CNNs, they used random natural images, https://arxiv.org/pdf/1806.05476. We can create the same by either generating vectors of random 0s and 1 or random hashes.   

### Step 2:   
Get the data, via random generation of file hashes  

In [3]:
# Get our sample data
# Setup a variable for the values in the hex character set
# Remember, Hex can go from 0-9 in numbers and A-F in letters
possible_hex_values = '0123456789ABCDEF'
possible_hex_values

'0123456789ABCDEF'

In [4]:
# import more libraries
import random
import numpy as np
import matplotlib.pyplot as plt

In [5]:
# Set a random seed to ensure we both have the same results
random.seed(10)

# Generate a random SHA256 hash string using our character set which was defined above
sample_hash = ''.join(random.choice(possible_hex_values) for _ in range(64) )

# here is a sample hash we randomly generate
sample_hash

'1DF06EF851FA27B1D4BCD98E59B4E7EC107469B7AEDF2A57D711F9224CB433E5'

In [6]:
# Let's now update this to generate a batch of 10 samples
# Putting everything in a function, so we can call this anytime we wish

def generate_hashes(batch_size:int=10, hash_length=64):
    '''
    Takes the number of samples to be generated and the length of the hash

    Args:
        batch_size (int):  The number of items in the bash
        hash_length (int): default to SHA-256 but if you wish to use MD5, or SHA256, this can easily be changed
    
    '''
    batches = []

    for _ in range(batch_size):
        batches.append(''.join(random.choice(possible_hex_values) for _ in range(hash_length) ))

    return batches

In [7]:
# We are setting the seed here, so we can ensure consistency across our work
seed = 20

# Generate some hashes using our function
random.seed(seed)
sample_hashes = generate_hashes()
sample_hashes

['483A50DD234AFED66AAAD2FC267163268998530877710984AE48D55C34DB7316',
 'D552390ACE153EA0232A8291067050494D83C36767394C42EF0569C396B94308',
 '1CB9CE32F602A058930166C2B9E14CCC80E2FA769EAC61B42B03F74904BB407D',
 'E7CC992F8B4E7F4BA43816BC69F5448025A87A24883492F11FFC349D5EEF4566',
 '936ECE29366969DF3B75B940AC49425138D26EED1553F9578733EDA751E20908',
 'B89C2B015572ED106CE29E9A3E6431B565790317DC8A4A102F546B449359E623',
 '404B9B080E0A9FE9101209D8E82D20C08020A3FD3A00317E7BB23278EEF11E25',
 '56767914BEAEF093B6945178510363FE871D79A97BFD2666FEA4EC204F61F803',
 'E1730427C0A688A67D1807EADAB22CF584241473062A71F74C3E861AC2DEDD37',
 '7A30601A5DA17489F93A27E7906C362ECFD6424CF1E72CA22B4611740348FD85']

At this point we know how to generate some random hashes. For learning purposes, let's say that the application accepting the hashes then does the preprocessing into bits. Let's create a function to simulate that process.

Now that we know how to do those things, we should create a function to transform the hex values to bits.  This is basically the same steps as done above, now consolidated as a function. We will return the bits as well as the labels encoded

In [8]:
# Function to get the hashes to bits
def create_bits_from_hex_string(hash=None) -> np.array:
    bits_list = []
    
    hex_string_splitted = [[ hex_string[i:i+2] for i in range(0, len(hex_string), 2) ] for  hex_string in hash]

    # Create a for loop to perform the task we just did above
    for item in hex_string_splitted:
        bits_list.append(list(''.join([ np.binary_repr(int(i, base=16), width=8) for i in item])))

    # The data in the bit_list comes in the form of strings as can been seen previously
    # Let's get this as float vlaues by setting the dtype=np.float32
    return np.array(bits_list, dtype=np.float32)
    

In [9]:
# As always setting a seed so that our results can be deterministic
seed = 20
random.seed(seed)
raw_bits = create_bits_from_hex_string(hash=sample_hashes)
raw_bits[:10], raw_bits.shape


(array([[0., 1., 0., ..., 1., 1., 0.],
        [1., 1., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 1., 0., 1.],
        ...,
        [0., 1., 0., ..., 0., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [0., 1., 1., ..., 1., 0., 1.]], dtype=float32),
 (10, 256))

In [10]:
# These bits now needs to be reshaped to reflect the expected input to the network
# The expected input is batch, channels, height and width.
# Hence for our problem this is batch (-1), channels (1) because it is a black and white image, height (16) pixels, width (16) pixels
raw_bits = raw_bits.reshape(-1, 1, 16, 16)
raw_bits[:1], raw_bits.shape

(array([[[[0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 1., 0., 1.,
           0.],
          [0., 1., 0., 1., 0., 0., 0., 0., 1., 1., 0., 1., 1., 1., 0.,
           1.],
          [0., 0., 1., 0., 0., 0., 1., 1., 0., 1., 0., 0., 1., 0., 1.,
           0.],
          [1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 0., 1., 1.,
           0.],
          [0., 1., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
           0.],
          [1., 1., 0., 1., 0., 0., 1., 0., 1., 1., 1., 1., 1., 1., 0.,
           0.],
          [0., 0., 1., 0., 0., 1., 1., 0., 0., 1., 1., 1., 0., 0., 0.,
           1.],
          [0., 1., 1., 0., 0., 0., 1., 1., 0., 0., 1., 0., 0., 1., 1.,
           0.],
          [1., 0., 0., 0., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 0.,
           0.],
          [0., 1., 0., 1., 0., 0., 1., 1., 0., 0., 0., 0., 1., 0., 0.,
           0.],
          [0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 0., 0.,
           1.],
          [0., 0., 0., 0., 1., 0., 0., 1., 

- At this point, I should state, rather than generating those hashes, we could have started with the above as our input.   
- The model ultimately expects an image, hence this last step above was a good place to start.   
- However, it is important that we understand preprocessing our data. You have to ensure the same preprocessing steps used to train the model is followed at time of prediction   
- To get the 1s and 0s as in above, we could have done:    
np.random.randint(low=0, high=2, size=(10, 1, 16, 16))   


array([[[[1, 0, 0, ..., 0, 0, 1],  
        [1, 0, 1, ..., 0, 1, 1],  
        [0, 0, 1, ..., 0, 0, 0],  
        ...,  
        [0, 1, 0, ..., 0, 1, 1],  
        [1, 0, 1, ..., 1, 0, 0],  
        [0, 1, 0, ..., 0, 0, 1]]],  


In [11]:
# Let's consolidate these two steps into one function
def create_model_input(batch_size:int=10, hash_length:int=64):
    sample_hashes = generate_hashes(batch_size=batch_size, hash_length=hash_length)
    raw_bits = create_bits_from_hex_string(hash=sample_hashes)
    return raw_bits.reshape(-1, 1, 16, 16)

In [12]:
# Testing the consolidation function
seed = 20
random.seed(seed)
X_hashes = create_model_input(batch_size=1000)
X_hashes[:1], X_hashes.shape

(array([[[[0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 1., 0., 1.,
           0.],
          [0., 1., 0., 1., 0., 0., 0., 0., 1., 1., 0., 1., 1., 1., 0.,
           1.],
          [0., 0., 1., 0., 0., 0., 1., 1., 0., 1., 0., 0., 1., 0., 1.,
           0.],
          [1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 0., 1., 1.,
           0.],
          [0., 1., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
           0.],
          [1., 1., 0., 1., 0., 0., 1., 0., 1., 1., 1., 1., 1., 1., 0.,
           0.],
          [0., 0., 1., 0., 0., 1., 1., 0., 0., 1., 1., 1., 0., 0., 0.,
           1.],
          [0., 1., 1., 0., 0., 0., 1., 1., 0., 0., 1., 0., 0., 1., 1.,
           0.],
          [1., 0., 0., 0., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 0.,
           0.],
          [0., 1., 0., 1., 0., 0., 1., 1., 0., 0., 0., 0., 1., 0., 0.,
           0.],
          [0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 0., 0.,
           1.],
          [0., 0., 0., 0., 1., 0., 0., 1., 

In [13]:
# Define the number of classes
nb_classes =5

### Step 4:   
Validate the mlflow environment is running   

- Now let's look at this from the perspective of attacking the API endpoint, as we know we will not have the model in most of these attacks. 

- load the inference server   
- $export MLFLOW_TRACKING_URI=http://0.0.0.0:9999      


### Note: Your previous lab from malware_classification_mal_net_tiny_multi_class.ipynb should still be running.    
### If it is not, you may have to change this URI models:/m-8c05baea8f704e41b08afb780c9dd720, depending on when you did the previous lab. 
### You can grab this information by going back to the lab **malware_classification_mal_net_tiny_multi_class.ipynb.   

$ mlflow models serve -m models:/m-8c05baea8f704e41b08afb780c9dd720 -p 5000 --no-conda | ts '[%Y-%m-%d %H:%M:%S]' |  tee /tmp/mlflow_serve.log   
Downloading artifacts: 100%|████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 1300.69it/s]   
2025/05/08 15:03:53 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'    
Downloading artifacts: 100%|█████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 419.91it/s]    
2025/05/08 15:03:54 INFO mlflow.pyfunc.backend: === Running command 'exec uvicorn --host 127.0.0.1 --port 5000 --workers 1 mlflow.pyfunc.scoring_server.app:app'   
INFO:     Started server process [807184]  
INFO:     Waiting for application startup.  
INFO:     Application startup complete.  
INFO:     Uvicorn running on http://127.0.0.1:5000 (Press CTRL+C to quit)    



### Step 5:  
Setup the environment for remote API inference   

In [14]:
# Preparing to target the remote API
from art.estimators.classification import BlackBoxClassifier
import requests
import json

In [15]:
# Verify the version of the server hosting the API
requests.get(url='http://127.0.0.1:5000/version').content

b'3.1.1'

In [16]:
# Setup the API endpoint
inference_url = 'http://localhost:5000/invocations'
inference_url

'http://localhost:5000/invocations'

In [17]:
# Setup the HTTP Headers
headers = {'Content-Type' : 'application/json', 'User-agent': 'ThreatActor', 'X-Forwarded-For': '10.0.0.1'}
headers

{'Content-Type': 'application/json',
 'User-agent': 'ThreatActor',
 'X-Forwarded-For': '10.0.0.1'}

In [18]:
# Get some sample data from the existing training set
payload = json.dumps(
    {
        'inputs' : X_hashes.tolist()     
    }
)

payload

'{"inputs": [[[[0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0], [1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0], [

In [20]:
# Get the predictions
# First send the request
prediction = requests.post(url=inference_url, data=payload, headers=headers).json()

# The predictions are returned in JSON format
# Let's grab the key
prediction = prediction['predictions']
print(prediction)

[[6.666568756103516, 1.7396376132965088, -6.564285755157471, 1.719645380973816, -10.321782112121582], [1.4838752746582031, -3.7930665016174316, 5.244775295257568, -10.026774406433105, 0.8434817790985107], [6.7509379386901855, 9.578500747680664, -5.796075344085693, -7.746459484100342, -11.633014678955078], [7.549092769622803, 2.487220048904419, -2.530186176300049, -5.2780327796936035, -8.98377799987793], [4.359328269958496, -0.515809953212738, 0.36093762516975403, -7.403295516967773, -3.8457655906677246], [1.775102138519287, 1.8083856105804443, 0.659511148929596, -5.590231418609619, -5.489138603210449], [10.834227561950684, -2.819416046142578, 1.4477946758270264, -10.70745849609375, -4.720353603363037], [-0.06977766007184982, 2.8086564540863037, -2.0829660892486572, -8.218449592590332, 0.5062010884284973], [0.8612001538276672, 4.569129943847656, -1.7677454948425293, -2.5181496143341064, -7.889617919921875], [6.319479942321777, 0.008424308151006699, 1.819360375404358, -6.799070835113525,

With the understanding that we can interact with the model, let's go ahead and create a function to handle this task


In [21]:
# Create our prediction function

def get_victim_model_output(input_data, inference_url=inference_url,  x_forwared_for_ip='10.0.0.1'):
    headers = {'Content-Type' : 'application/json', 'User-agent': 'ThreatActor', 'X-Forwarded-For': x_forwared_for_ip}
    payload = json.dumps( {'inputs' : input_data.tolist() } )

    # Make the request to the endpoint and capture the response
    response = requests.post(url=inference_url, data=payload, headers=headers)
    if response.status_code == 200:
        logits = response.json()['predictions']
        return torch.as_tensor(data=logits, dtype=torch.float32)
    else:
        raise Exception(f'[!] Error making inference: {response.status_code}')
    
    return response



In [22]:
# Get the sample predictions from the remote endpoint
y_logits = get_victim_model_output(input_data=X_hashes, inference_url=inference_url)

# Look at the first 10 logits
y_logits[:10]

tensor([[ 6.6666e+00,  1.7396e+00, -6.5643e+00,  1.7196e+00, -1.0322e+01],
        [ 1.4839e+00, -3.7931e+00,  5.2448e+00, -1.0027e+01,  8.4348e-01],
        [ 6.7509e+00,  9.5785e+00, -5.7961e+00, -7.7465e+00, -1.1633e+01],
        [ 7.5491e+00,  2.4872e+00, -2.5302e+00, -5.2780e+00, -8.9838e+00],
        [ 4.3593e+00, -5.1581e-01,  3.6094e-01, -7.4033e+00, -3.8458e+00],
        [ 1.7751e+00,  1.8084e+00,  6.5951e-01, -5.5902e+00, -5.4891e+00],
        [ 1.0834e+01, -2.8194e+00,  1.4478e+00, -1.0707e+01, -4.7204e+00],
        [-6.9778e-02,  2.8087e+00, -2.0830e+00, -8.2184e+00,  5.0620e-01],
        [ 8.6120e-01,  4.5691e+00, -1.7677e+00, -2.5181e+00, -7.8896e+00],
        [ 6.3195e+00,  8.4243e-03,  1.8194e+00, -6.7991e+00, -8.0118e+00]])

### Step 6:   
Setup the BlackBoxClassifier  

In [23]:
# Setup our BlackBox Classifier to target the API
victim_classifier = BlackBoxClassifier(
    predict_fn=get_victim_model_output, 
    input_shape=X_hashes.shape[1:], 
    nb_classes=nb_classes)
victim_classifier

BlackBoxClassifier(model=None, clip_values=None, preprocessing=StandardisationMeanStd(mean=0.0, std=1.0, apply_fit=True, apply_predict=True), preprocessing_defences=None, postprocessing_defences=None, preprocessing_operations=[StandardisationMeanStd(mean=0.0, std=1.0, apply_fit=True, apply_predict=True)], nb_classes=5, predict_fn=<function get_victim_model_output at 0x7f946cf88d60>, input_shape=(1, 16, 16))

In [24]:
# Define the number of classes 
nb_classes = 5

### Step 7:  
Construct the knock-off net

In [25]:
# To ensure we can run this section on its own, let's redefine our model
# Basically, we are just copying the model defined above here
# I only changed the model name, so we have something to differentiate with above. 
# Only brought it here so we can run this section on its own
torch.manual_seed(10)

num_filters = 64
linear_out = 256

# This is the model we would like to learn the parameters from the victim model
# Noticed immediately that the architecture is different from or victim model 

api_knockoff_model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=num_filters, kernel_size=(3, 3), stride=1, padding=1, padding_mode='zeros'),
    # Notice that we do not have a batch normalization layer as in the original model
    nn.ReLU(),

    nn.Conv2d(in_channels=num_filters, out_channels=num_filters*2, kernel_size=(3, 3), stride=1, padding=1, padding_mode='zeros'),
    # Notice that we do not have a batch normalization layer as in the original model
    nn.ReLU(),

    nn.Conv2d(in_channels=num_filters*2, out_channels=num_filters*3, kernel_size=(3, 3), stride=1, padding=1, padding_mode='zeros'),
    # Notice that we do not have a batch normalization layer as in the original model
    nn.ReLU(),

    # Notice now the introduction of the flatten layer followed by linear layers
    nn.Flatten(start_dim=1, end_dim=-1),

    # Remember, the image is 16*16 and the last convolution layer pushed out 32*2 filters. Hence the 64
    nn.Linear(in_features=16*16*num_filters*3, out_features=linear_out * 4, bias=True),
    nn.ReLU(),
    nn.Linear(in_features=linear_out * 4, out_features=nb_classes, bias=True),  # 5 here represents the number of classes that came out of the testing above

)


In [26]:
# Same as above

# Define the loss function
loss_fn = nn.CrossEntropyLoss()

# Define the optimizer
optimizer = torch.optim.Adam(params=api_knockoff_model.parameters(), lr=0.001)
optimizer

Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    decoupled_weight_decay: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.001
    maximize: False
    weight_decay: 0
)

In [27]:
# Setup our 
api_knockoff_classifier = PyTorchClassifier(model=api_knockoff_model, 
                                            loss=loss_fn, 
                                            input_shape=X_hashes[1:], 
                                            nb_classes=nb_classes, 
                                            optimizer=optimizer, 
                                            device_type=device)

#api_knockoff_classifier

In [28]:
# Setup KnofOffNet attack
knockoffnets_api_attack = KnockoffNets(
    classifier=victim_classifier, 
    batch_size_fit=32, 
    batch_size_query=32, 
    nb_epochs=10, 
    nb_stolen=2000,     # Play with this number a bit to see how the accuracy increases. The larger this value, the more likely you will replicate the victim model. 
    sampling_strategy='adaptive', 
    reward='all', 
    use_probability=False, 
    verbose=True
    )

knockoffnets_api_attack

KnockoffNets(batch_size_fit=32, batch_size_query=32, nb_epochs=10, nb_stolen=2000, sampling_strategy=adaptive, reward=all, verbose=True, use_probability=False, )

### Step 8:   
Generate the adversarial Examples  

In [None]:
# Extract the model 
# We are using the data from above
knockoff_net_api_model = knockoffnets_api_attack.extract(x=X_hashes, y=y_logits, thieved_classifier=api_knockoff_classifier)
knockoff_net_api_model

Knock-off nets:   0%|          | 0/2000 [00:00<?, ?it/s]

art.estimators.classification.pytorch.PyTorchClassifier(model=ModelWrapper(
  (_model): Sequential(
    (0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): Conv2d(128, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): ReLU()
    (6): Flatten(start_dim=1, end_dim=-1)
    (7): Linear(in_features=49152, out_features=1024, bias=True)
    (8): ReLU()
    (9): Linear(in_features=1024, out_features=5, bias=True)
  )
), loss=CrossEntropyLoss(), optimizer=Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    decoupled_weight_decay: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.001
    maximize: False
    weight_decay: 0
), input_shape=[[[[1. 1. 0. ... 0. 1. 0.]
   [0. 0. 1. ... 0. 1. 0.]
   [1. 1. 0. ... 1. 0. 1.]
   ...
   [0. 1. 1. ... 0. 1. 1.]
   [1. 0. 0. ...

While above is running, look at the output from your mlflow console and you should be seeing all these resuts come in.

In [30]:
# Make some predictions with the knock-off net
knockoff_net_api_model_preds = knockoff_net_api_model.predict(x=X_hashes)

# Get the first 10
knockoff_net_api_model_preds[:10]

array([[  9.583767  ,  -2.2678874 ,  -1.7550653 , -24.39183   ,
         -9.272753  ],
       [ -4.2454753 ,  -6.3845196 ,   6.750105  , -16.977863  ,
        -24.104809  ],
       [ -0.81226945,   7.9786882 ,  -5.6751947 , -23.393467  ,
        -15.592325  ],
       [  9.900072  ,   0.6800824 ,  -3.5018222 , -27.2748    ,
         -7.491232  ],
       [  7.3848367 ,  -1.0093126 ,  -3.150879  , -16.987698  ,
        -13.587389  ],
       [ -4.3178864 ,   7.0491095 ,  -0.48725307, -19.415298  ,
        -19.55089   ],
       [  3.8964338 ,   3.033593  ,  -4.380377  , -24.602922  ,
        -17.043125  ],
       [ -1.6680342 ,   7.337214  ,  -7.5422945 , -13.928861  ,
        -16.154879  ],
       [  6.458444  ,   0.09275548,   2.1853082 , -36.220787  ,
        -10.303284  ],
       [ 10.871622  ,  -1.0873486 ,  -7.759781  , -19.981533  ,
        -24.852093  ]], dtype=float32)

In [31]:
# Get the Adversarial Success rate of the model
# Experiment with making this number larger and see what accuracy you get
knockoff_net_api_model_accuracy = (knockoff_net_api_model_preds.argmax(axis=-1) == y_logits.argmax(axis=-1)).sum() / y_logits.shape[0]
print(f'Model accuracy is: {knockoff_net_api_model_accuracy}')

Model accuracy is: 0.8450000286102295


### That's it for knockoff nets in the black-box setting! 

### Takeaways   
- We were able to leverage ART for a model we had direct access to and a model we needed to interact with via an API   
- This type of attack requires a large number of queries to be able to reconstruct the victim model  
- In our simple demo, it took 2000 attempts for us to be able to get an accuracy of over 50%. If we run this longer, we should be able to build an even better model

### Keep this lab running while we work on the next lab.
Let's now jump to the next lab where where we use nftables to migitate this via the firewall with nftables   
