<img style="max-width:20em; height:auto;" src="../graphics/A-Little-Book-on-Adversarial-AI-Cover.png"/>

Author: Nik Alleyne   
Author Blog: https://www.securitynik.com   
Author GitHub: github.com/securitynik   

Author Other Books: [   

            "https://www.amazon.ca/Learning-Practicing-Leveraging-Practical-Detection/dp/1731254458/",   
            
            "https://www.amazon.ca/Learning-Practicing-Mastering-Network-Forensics/dp/1775383024/"   
        ]   


This notebook ***(knockoff-nets.ipynb)*** is part of the series of notebooks from ***A Little Book on Adversarial AI***, a free ebook released by Nik Alleyne.

## KNOCK-OFF NETS

### Lab Objectives:   
- Learn how to use the Adversarial Robustness Toolbox (ART) for black-box attacks 
- Learn how to create a copy of a model when you have direct access to the model  
- Learn how to create a copy of a model when you only have access to an API endpoint   
- Understand that this attack needs a large number of queries to succeed   
- Understand what knock-off nets are  
- If you wish, build on this concept to explore copy-cat nets  


### Step 1:  
Obtaining the pre-trained model   


In [1]:
# Import some libraries that we will need
import torch
import torch.nn as nn
import torch.optim as optim

# This is for Adversarial Robustness Toolbox (ART) usage
import art
from art.estimators.classification import PyTorchClassifier
from art.attacks.extraction import KnockoffNets

In [2]:
### Version of key libraries used  
print(f'Torch version used:  {torch.__version__}')
print(f'ART version used:  {art.__version__}')

Torch version used:  2.7.1+cu126
ART version used:  1.19.1


In [44]:
# Setup the device to work with
# This ensures that if an accelerator is available, such as CUDA or the Apple MPS backend,
# we are able to take advantage of it.

if torch.cuda.is_available():
    print('Setting the device to cuda')
    device = 'cuda'
elif torch.backends.mps.is_available():
    print('Setting the device to Apple mps')
    device = 'mps'
else:
    print('Setting the device to CPU')
    # Keep the device as a string, consistent with the other branches
    device = 'cpu'

Setting the device to CPU


Earlier, in the notebook **mal_net_tiny_malware_classification_multi_class.ipynb**, we created the malware classifier for the Tiny Mal Net Malware dataset. Let us target that classifier. 

Realistically, in a real-world black-box attack, you will not have access to the model this way. You would instead interact with the model via an API. We will do that shortly. For now, however, as we have the model, let us use it directly to build our understanding.

In [45]:
# Load the victim model
loaded_victim_model = torch.jit.load(f=r'../data/mal_net_tiny_malware_clf.jit', map_location=device)
loaded_victim_model

RecursiveScriptModule(
  original_name=MalClassifier
  (conv_layers): RecursiveScriptModule(
    original_name=Sequential
    (0): RecursiveScriptModule(original_name=Conv2d)
    (1): RecursiveScriptModule(original_name=BatchNorm2d)
    (2): RecursiveScriptModule(original_name=ReLU)
    (3): RecursiveScriptModule(original_name=Conv2d)
    (4): RecursiveScriptModule(original_name=BatchNorm2d)
    (5): RecursiveScriptModule(original_name=ReLU)
    (6): RecursiveScriptModule(original_name=Conv2d)
    (7): RecursiveScriptModule(original_name=BatchNorm2d)
    (8): RecursiveScriptModule(original_name=ReLU)
  )
  (global_avg_pool): RecursiveScriptModule(original_name=AdaptiveAvgPool2d)
  (classifier): RecursiveScriptModule(original_name=Conv2d)
)

Let us start off with the assumption that a Convolutional Neural Network can be used for malware detection. From this perspective, we can build a Convolutional Network to try to mimic the real network, which we do not have.

Note: Let us be clear, nothing states that malware detection has to be done via Convolutional Neural Networks. In fact, you can use any architecture. In **bodmas_malware_classifier.ipynb** we used linear layers. You could equally use Gradient Boosting, as was used in the BODMAS paper, or even Graph Neural Networks, which are also used by MalNet. So keep in mind that we are working with a fair assumption in our case, but it is just that, an assumption.

Our scenario here is that we have an API endpoint that predicts whether a file hash is malicious or not. We could even consider the VirusTotal interface as an example. 

<img style="max-width:50em; height:auto;" src="../graphics/virustotal_file_hash.png"/>


We know that the endpoint expects a SHA-256 hash. We could take a set of files, generate the hash of each one, and feed each hash to the endpoint. In our Linux shell, we could do something such as:  
$ **sha256sum /tmp/tiny_mal_net_X_y_t** 
*c721f56288747a5d7b23a3589112379eed129d0d0d37256b6bbb531c6c7e2348  /tmp/tiny_mal_net_X_y_test.npz*   
*3772630f57c89a15f6b6924ff1ce5ff1d1f15df6d4b348e8e8e4e5363720b961  /tmp/tiny_mal_net_X_y_train.npz*    
  
However, even with these hashes, we still need to convert them to 0s and 1s. As we saw earlier when constructing the model, each hash is preprocessed into a sequence of bits, and these bits are what the model receives. We also learnt that the Copycat CNN paper (https://arxiv.org/pdf/1806.05476) used random natural images to query the victim. We can do something similar by generating either vectors of random 0s and 1s or random hashes.   
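If you prefer to stay in Python rather than the shell, the `sha256sum` step can be approximated with the standard `hashlib` module. This is a minimal sketch: the helper name `sha256_of_file` and the temporary file holding the bytes `abc` are illustrative assumptions, not part of the original lab.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        # Read fixed-size chunks so large files do not need to fit in memory
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

# Demonstrate on a small temporary file (hypothetical content)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'abc')
    tmp_path = tmp.name

file_hash = sha256_of_file(tmp_path)
os.remove(tmp_path)
print(file_hash, len(file_hash))
```

The digest is always 64 hex characters (256 bits), which is why the model input can later be laid out as a 16 x 16 grid of bits.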

### Step 2:   
Get the data, via random generation of file hashes  

In [46]:
# Get our sample data
# Setup a variable for the values in the hex character set
# Remember, Hex can go from 0-9 in numbers and A-F in letters
possible_hex_values = '0123456789ABCDEF'
possible_hex_values

'0123456789ABCDEF'

In [47]:
# import more libraries
import random
import numpy as np
import matplotlib.pyplot as plt

In [48]:
# Set a random seed to ensure we both have the same results
random.seed(10)

# Generate a random SHA256 hash string using our character set which was defined above
sample_hash = ''.join(random.choice(possible_hex_values) for _ in range(64) )

# here is a sample hash we randomly generate
sample_hash

'1DF06EF851FA27B1D4BCD98E59B4E7EC107469B7AEDF2A57D711F9224CB433E5'

In [49]:
# Let's now update this to generate a batch of 10 samples
# Putting everything in a function, so we can call this anytime we wish

def generate_hashes(batch_size:int=10, hash_length=64):
    '''
    Takes the number of samples to be generated and the length of the hash

    Args:
        batch_size (int):  The number of items in the batch
        hash_length (int): Number of hex characters per hash. Defaults to 64 (SHA-256); use 32 for MD5 or 40 for SHA-1
    
    '''
    batches = []

    for _ in range(batch_size):
        batches.append(''.join(random.choice(possible_hex_values) for _ in range(hash_length) ))

    return batches

In [50]:
# We are setting the seed here, so we can ensure consistency across our work
seed = 20

# Generate some hashes using our function
random.seed(seed)
sample_hashes = generate_hashes()
sample_hashes

['483A50DD234AFED66AAAD2FC267163268998530877710984AE48D55C34DB7316',
 'D552390ACE153EA0232A8291067050494D83C36767394C42EF0569C396B94308',
 '1CB9CE32F602A058930166C2B9E14CCC80E2FA769EAC61B42B03F74904BB407D',
 'E7CC992F8B4E7F4BA43816BC69F5448025A87A24883492F11FFC349D5EEF4566',
 '936ECE29366969DF3B75B940AC49425138D26EED1553F9578733EDA751E20908',
 'B89C2B015572ED106CE29E9A3E6431B565790317DC8A4A102F546B449359E623',
 '404B9B080E0A9FE9101209D8E82D20C08020A3FD3A00317E7BB23278EEF11E25',
 '56767914BEAEF093B6945178510363FE871D79A97BFD2666FEA4EC204F61F803',
 'E1730427C0A688A67D1807EADAB22CF584241473062A71F74C3E861AC2DEDD37',
 '7A30601A5DA17489F93A27E7906C362ECFD6424CF1E72CA22B4611740348FD85']
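As an alternative to picking one hex character at a time, a whole hash can be drawn in a single call by asking for the right number of random bits and formatting them as hex. This is only a sketch; the helper name `random_hex_hash` is hypothetical, and because it consumes the random stream differently, the values it produces will not match the hashes listed above even with the same seed.

```python
import random

def random_hex_hash(hash_length=64):
    """Return a random upper-case hex string with hash_length characters."""
    n_bits = 4 * hash_length  # each hex character encodes 4 bits
    # Zero-pad so that leading zeros are kept and the length stays constant
    return f'{random.getrandbits(n_bits):0{hash_length}X}'

random.seed(20)
alt_hashes = [random_hex_hash() for _ in range(10)]
print(alt_hashes[0])
```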

At this point we know how to generate random hashes. For learning purposes, let's say that the application accepting the hashes then preprocesses them into bits. Let's create a function to simulate that process.

The function below transforms the hex values to bits. These are essentially the same steps used when the victim model was built, now consolidated into one function. It returns the bits as a float32 array.

In [51]:
# Function to convert the hashes to bits
def create_bits_from_hex_string(hash=None) -> np.ndarray:
    bits_list = []
    
    # Split each hex string into two-character chunks, one chunk per byte
    hex_string_splitted = [[ hex_string[i:i+2] for i in range(0, len(hex_string), 2) ] for  hex_string in hash]

    # Convert every byte to its 8-bit binary representation and join the bits for each hash
    for item in hex_string_splitted:
        bits_list.append(list(''.join([ np.binary_repr(int(i, base=16), width=8) for i in item])))

    # The data in bits_list comes in the form of '0' and '1' strings
    # Let's get these as float values by setting dtype=np.float32
    return np.array(bits_list, dtype=np.float32)
    

In [52]:
# As always setting a seed so that our results can be deterministic
seed = 20
random.seed(seed)
raw_bits = create_bits_from_hex_string(hash=sample_hashes)
raw_bits[:10], raw_bits.shape


(array([[0., 1., 0., ..., 1., 1., 0.],
        [1., 1., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 1., 0., 1.],
        ...,
        [0., 1., 0., ..., 0., 1., 1.],
        [1., 1., 1., ..., 1., 1., 1.],
        [0., 1., 1., ..., 1., 0., 1.]], dtype=float32),
 (10, 256))
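To make the hex-to-bits conversion concrete, here is a small worked sketch on a single byte. The helper `hex_to_bits` is a hypothetical, pure-Python equivalent used only for illustration: it converts the whole hex string to an integer and zero-pads the binary form to 4 bits per hex character, which gives the same bit order as converting byte by byte.

```python
def hex_to_bits(hex_string):
    """Convert a hex string to a list of 0.0/1.0 floats, most significant bit first."""
    n_bits = 4 * len(hex_string)  # 4 bits per hex character
    binary = format(int(hex_string, 16), f'0{n_bits}b')
    return [float(bit) for bit in binary]

# 'A5' is 1010 0101 in binary
bits = hex_to_bits('A5')
print(bits)  # [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0]

# A 64-character hash always yields 256 bits
print(len(hex_to_bits('0' * 64)))  # 256
```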

In [53]:
# These bits now need to be reshaped to match the input the network expects
# The expected input is (batch, channels, height, width).
# For our problem this is batch (-1), channels (1) because it is a single-channel (black and white) image, height (16) pixels, width (16) pixels
raw_bits = raw_bits.reshape(-1, 1, 16, 16)
raw_bits[:1], raw_bits.shape

(array([[[[0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 1., 0., 1.,
           0.],
          [0., 1., 0., 1., 0., 0., 0., 0., 1., 1., 0., 1., 1., 1., 0.,
           1.],
          [0., 0., 1., 0., 0., 0., 1., 1., 0., 1., 0., 0., 1., 0., 1.,
           0.],
          [1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 0., 1., 1.,
           0.],
          [0., 1., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
           0.],
          [1., 1., 0., 1., 0., 0., 1., 0., 1., 1., 1., 1., 1., 1., 0.,
           0.],
          [0., 0., 1., 0., 0., 1., 1., 0., 0., 1., 1., 1., 0., 0., 0.,
           1.],
          [0., 1., 1., 0., 0., 0., 1., 1., 0., 0., 1., 0., 0., 1., 1.,
           0.],
          [1., 0., 0., 0., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 0.,
           0.],
          [0., 1., 0., 1., 0., 0., 1., 1., 0., 0., 0., 0., 1., 0., 0.,
           0.],
          [0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 0., 0.,
           1.],
          [0., 0., 0., 0., 1., 0., 0., 1., 

- At this point, I should state that, rather than generating those hashes, we could have started with the above as our input.   
- The model ultimately expects an image, so this last step would also have been a good place to start.   
- However, it is important that we understand how our data is preprocessed. You have to ensure that the same preprocessing steps used to train the model are followed at prediction time.   
- To get the 1s and 0s as above, we could have done:    
np.random.randint(low=0, high=2, size=(10, 1, 16, 16))   


array([[[[1, 0, 0, ..., 0, 0, 1],  
        [1, 0, 1, ..., 0, 1, 1],  
        [0, 0, 1, ..., 0, 0, 0],  
        ...,  
        [0, 1, 0, ..., 0, 1, 1],  
        [1, 0, 1, ..., 1, 0, 0],  
        [0, 1, 0, ..., 0, 0, 1]]],  
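Note that `np.random.randint` returns integers, while the victim model was fed `float32` values earlier. A minimal sketch of generating the random bits directly in the dtype and shape used in this lab follows; the seed value and the variable name `X_random` are assumptions made for illustration.

```python
import numpy as np

# Seeded generator so the example is repeatable (seed chosen arbitrarily)
rng = np.random.default_rng(20)

# high is exclusive, so this draws only 0s and 1s
X_random = rng.integers(low=0, high=2, size=(10, 1, 16, 16)).astype(np.float32)

print(X_random.shape, X_random.dtype, np.unique(X_random))
```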


In [54]:
# Let's consolidate these two steps into one function
def create_model_input(batch_size:int=10, hash_length:int=64):
    sample_hashes = generate_hashes(batch_size=batch_size, hash_length=hash_length)
    raw_bits = create_bits_from_hex_string(hash=sample_hashes)
    return raw_bits.reshape(-1, 1, 16, 16)

In [55]:
# Testing the consolidation function
seed = 20
random.seed(seed)
X_hashes = create_model_input(batch_size=1000)
X_hashes[:1], X_hashes.shape

(array([[[[0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 1., 0., 1.,
           0.],
          [0., 1., 0., 1., 0., 0., 0., 0., 1., 1., 0., 1., 1., 1., 0.,
           1.],
          [0., 0., 1., 0., 0., 0., 1., 1., 0., 1., 0., 0., 1., 0., 1.,
           0.],
          [1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 0., 1., 1.,
           0.],
          [0., 1., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1., 0., 1.,
           0.],
          [1., 1., 0., 1., 0., 0., 1., 0., 1., 1., 1., 1., 1., 1., 0.,
           0.],
          [0., 0., 1., 0., 0., 1., 1., 0., 0., 1., 1., 1., 0., 0., 0.,
           1.],
          [0., 1., 1., 0., 0., 0., 1., 1., 0., 0., 1., 0., 0., 1., 1.,
           0.],
          [1., 0., 0., 0., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 0.,
           0.],
          [0., 1., 0., 1., 0., 0., 1., 1., 0., 0., 0., 0., 1., 0., 0.,
           0.],
          [0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 0., 0.,
           1.],
          [0., 0., 0., 0., 1., 0., 0., 1., 

### Step 3:   
Making predictions via the local model  

In [56]:
# With this data, let us now make predictions on this batch. 
# This is just a sanity check to ensure everything is working as expected so far
y_logits = loaded_victim_model(torch.as_tensor(data=X_hashes, dtype=torch.float32)).detach().numpy()

# With the results returned, we are now able to see the first 5 logits returned
y_logits[:5], y_logits.shape

(array([[ 1.6725825 ,  2.5005682 , -4.713804  , -0.2533295 , -7.6128345 ],
        [-4.434327  , -1.2528223 ,  2.7963047 , -4.5251923 ,  1.1526003 ],
        [ 0.65790313,  1.5753008 , -3.8772802 , -1.4706424 , -3.8419957 ],
        [ 2.9383953 , -2.9466252 , -0.04147143, -1.3982009 , -5.4401417 ],
        [-5.7621503 ,  0.3885757 , -1.2469347 , -1.6378562 ,  0.73373187]],
       dtype=float32),
 (1000, 5))

In [57]:
# Get a glimpse at the diversity of the model's predictions
np.unique(np.argmax(y_logits, axis=-1), return_counts=True)

(array([0, 1, 2, 3, 4]), array([187, 207, 195, 210, 201]))
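The victim returns raw logits, not probabilities. If you want to read them as class probabilities, a softmax can be applied before taking the argmax. The sketch below uses the first two logit rows printed above; the `softmax` helper is written here for illustration and is not part of the original notebook.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# First two rows of y_logits as shown in the output above
sample_logits = np.array(
    [[1.6725825, 2.5005682, -4.713804, -0.2533295, -7.6128345],
     [-4.434327, -1.2528223, 2.7963047, -4.5251923, 1.1526003]],
    dtype=np.float32)

probs = softmax(sample_logits)
labels = probs.argmax(axis=-1)
print(probs.round(3))
print(labels)  # [1 2]
```

Softmax does not change which class has the highest score, so the predicted labels are the same whether you take the argmax of the logits or of the probabilities.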

It looks like our victim model is ready to be attacked.

In [58]:
# Capture the number of classes from this model
nb_classes = y_logits.shape[1]
nb_classes

5

### Step 4:  
Setting up our **knockoff_model**  

In [None]:
torch.manual_seed(10)

# The number of filters is immediately different from what the original network had
num_filters = 64

# The victim model does not have a linear layer at the end
linear_out = 256

# With this model we would like to learn to mimic the behaviour of the victim model
# Notice immediately that the architecture is different from our victim model 

knockoff_model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=num_filters, kernel_size=(3, 3), stride=1, padding=1, padding_mode='zeros'),
    # Notice that we do not have a batch normalization layer as in the original model
    nn.ReLU(),

    nn.Conv2d(in_channels=num_filters, out_channels=num_filters*2, kernel_size=(3, 3), stride=1, padding=1, padding_mode='zeros'),
    # Notice that we do not have a batch normalization layer as in the original model
    nn.ReLU(),

    nn.Conv2d(in_channels=num_filters*2, out_channels=num_filters*3, kernel_size=(3, 3), stride=1, padding=1, padding_mode='zeros'),
    # Notice that we do not have a batch normalization layer as in the original model
    nn.ReLU(),

    # Notice now the introduction of the flatten layer followed by linear layers
    nn.Flatten(start_dim=1, end_dim=-1),

    # Remember, the image is 16*16 and the last convolution layer outputs num_filters*3 = 192 feature maps. Hence 16*16*192 = 49152 input features
    nn.Linear(in_features=16*16*num_filters*3, out_features=linear_out * 4, bias=True),
    nn.ReLU(),

    # This is new when compared to the original model
    nn.Linear(in_features=linear_out * 4, out_features=nb_classes, bias=True),  # nb_classes (5) is the number of classes that came out of the testing above

)
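The `in_features` value of the first linear layer follows from the convolution arithmetic. As a quick check, the sketch below applies the standard output-size formula for a convolution with a 3x3 kernel, stride 1 and padding 1 three times; the helper name `conv2d_output_size` is hypothetical.

```python
def conv2d_output_size(size, kernel_size=3, stride=1, padding=1):
    """Spatial output size of a 2D convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1

height = width = 16
for _ in range(3):  # three convolution layers, none of which shrink the image
    height = conv2d_output_size(height)
    width = conv2d_output_size(width)

num_filters = 64
last_conv_channels = num_filters * 3  # 192 feature maps from the last Conv2d
flattened_features = last_conv_channels * height * width
print(height, width, flattened_features)  # 16 16 49152
```

The result, 49152, is the `in_features` value that `nn.Flatten` hands to the first `nn.Linear` layer.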


In [60]:
# Define the loss function
loss_fn = nn.CrossEntropyLoss()

# Define the optimizer
optimizer = torch.optim.Adam(params=knockoff_model.parameters(), lr=0.001)
optimizer

Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    decoupled_weight_decay: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.001
    maximize: False
    weight_decay: 0
)

In [62]:
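# Just verifying this model can learn from this data
# If you wish to see this in action, just remove the triple quotes
# Note: y_tmp is not defined in this notebook. It is assumed to hold integer class labels, for example y_logits.argmax(axis=-1)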
""" for i in range(100):
    optimizer.zero_grad(set_to_none=True)
    loss = loss_fn(knockoff_model(torch.as_tensor(X_hashes, dtype=torch.float32)), torch.as_tensor(y_tmp, dtype=torch.long))
    loss.backward()
    optimizer.step()

    if (i + 1) % 10 == 0:
        print(f'Epoch: {i+1}/100 \t loss: {loss.item()}')"""


" for i in range(100):\n    optimizer.zero_grad(set_to_none=True)\n    loss = loss_fn(knockoff_model(torch.as_tensor(X_hashes, dtype=torch.float32)), torch.as_tensor(y_tmp, dtype=torch.long))\n    loss.backward()\n    optimizer.step()\n\n    if (i + 1) % 10 == 0:\n        print(f'Epoch: {i+1}/100 \t loss: {loss.item()}')"

In [63]:
# If you wish to see the distribution of the classes predicted from above uncomment below
# Remember to uncomment and run above first
# np.unique(knockoff_model(torch.as_tensor(X_hashes)).detach().numpy().argmax(axis=-1), return_counts=True)

In [64]:
# If you wish to see the model's accuracy, uncomment below
# Remember to uncomment the two previous cells
# (knockoff_model(torch.as_tensor(X_hashes)).detach().numpy().argmax(axis=-1) == y_tmp).sum() / y_tmp.shape[0]

### Step 5:  
Creating the ART attack 

In [65]:
# Let's get to work
# Create the classifier to mimic the victim model
# Remember, this model would like to learn the parameters from the victim model
# PyTorchClassifier is from ART

knockoff_classifier = PyTorchClassifier( 
    model=knockoff_model,  # Our knockoff model
    loss=loss_fn,   # Measures how well the model is performing
    input_shape=X_hashes.shape[1:], 
    nb_classes=nb_classes, 
    optimizer=optimizer, 
    device_type=device)

# knockoff_classifier

In [66]:
# Create the victim classifier using ART
# Remember, for this scenario, we have access to the victim model via the loaded_victim_model variable above
# https://adversarial-robustness-toolbox.readthedocs.io/en/latest/modules/estimators/classification.html
victim_classifier = PyTorchClassifier(
    model=loaded_victim_model, # This is the victim model we loaded at the beginning.
    loss=nn.CrossEntropyLoss(), 
    input_shape=X_hashes.shape[1:], 
    nb_classes=nb_classes,
    optimizer=torch.optim.Adam(params=loaded_victim_model.parameters(), lr=0.001), 
    device_type='cpu',
    )

# Uncomment this line below if you would like to see the ART information for this model
#victim_classifier

In [67]:
# Define our knockoff-net attack
# https://adversarial-robustness-toolbox.readthedocs.io/en/latest/modules/attacks/extraction.html#knockoff-nets

knockoffnets_attack = KnockoffNets(
    classifier=victim_classifier,  # Victim classifier that we are attempting to steal
    batch_size_fit=32, 
    batch_size_query=32, 
    nb_epochs=10,  # Number of epochs used for training
    nb_stolen=2000, # Number of queries to submit to the victim in order to steal it
    sampling_strategy='adaptive',  # The sampling strategy to use. If the classes are severely imbalanced, you should instead use 'random'
    reward='all', 
    verbose=True, 
    use_probability=False)
knockoffnets_attack

KnockoffNets(batch_size_fit=32, batch_size_query=32, nb_epochs=10, nb_stolen=2000, sampling_strategy=adaptive, reward=all, verbose=True, use_probability=False, )
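With `use_probability=False`, the intent is that the knockoff learns from hard labels rather than from the victim's full output vector. The sketch below is an assumption about what that reduction looks like, not a copy of ART's internals: it takes the argmax of each row and one-hot encodes it. The logits are rounded, hypothetical values and `logits_to_one_hot` is an illustrative helper.

```python
import numpy as np

def logits_to_one_hot(logits, nb_classes):
    """Reduce each row of logits to a one-hot vector for its top class."""
    hard_labels = np.argmax(logits, axis=1)
    return np.eye(nb_classes, dtype=np.float32)[hard_labels]

# Hypothetical victim outputs for two queries
example_logits = np.array([[1.0, 3.8, -2.0, -2.4, -8.8],
                           [-5.2, 0.3, 5.1, -6.3, 0.0]], dtype=np.float32)

one_hot = logits_to_one_hot(example_logits, nb_classes=5)
print(one_hot)
```

Hard labels carry less information per query than probabilities, which is one reason this style of attack tends to need many queries.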

### Step 6:  
Extracting (stealing) the victim model   

In [68]:
# Perform the model extraction using Knockoff Nets
# The thieved_classifier below is the knockoff_classifier that we defined above. 
# This is our network, which has to learn to mimic the victim model 
# Depending on your system and the configuration above, this may take about 30 minutes to finish

knockoff_model = knockoffnets_attack.extract(x=X_hashes, y=y_logits, thieved_classifier=knockoff_classifier)
knockoff_model

Knock-off nets:   0%|          | 0/2000 [00:00<?, ?it/s]

art.estimators.classification.pytorch.PyTorchClassifier(model=ModelWrapper(
  (_model): Sequential(
    (0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): Conv2d(128, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): ReLU()
    (6): Flatten(start_dim=1, end_dim=-1)
    (7): Linear(in_features=49152, out_features=1024, bias=True)
    (8): ReLU()
    (9): Linear(in_features=1024, out_features=5, bias=True)
  )
), loss=CrossEntropyLoss(), optimizer=Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    decoupled_weight_decay: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.001
    maximize: False
    weight_decay: 0
), input_shape=(1, 16, 16), nb_classes=5, channels_first=True, clip_values=None, preprocessing_defences=None, postprocessing_defences=None, prepro

### Step 7:   
Make predictions with the knockoff model

In [26]:
# Make some predictions using our original data against the trained knockoff_model
knockoff_model_preds = knockoff_model.predict(x=X_hashes)

# Get the accuracy now, comparing the knockoff model's predictions with the original logits which we obtained earlier
knockoff_accuracy = (y_logits.argmax(axis=-1) == knockoff_model_preds.argmax(axis=-1)).sum() / y_logits.shape[0]

print(f'The knockoff model accuracy is: {knockoff_accuracy}')

The knockoff model accuracy is: 0.544
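The accuracy computed here is really an agreement rate: the fraction of inputs on which the knockoff and the victim pick the same class. To judge it, it helps to compare against a trivial baseline. The sketch below uses small hypothetical logit arrays for the agreement function, and the class counts printed in Step 3 for the baseline; `agreement_rate` is an illustrative helper, not part of ART.

```python
import numpy as np

def agreement_rate(victim_logits, knockoff_logits):
    """Fraction of samples where both models predict the same class."""
    victim_labels = np.argmax(victim_logits, axis=-1)
    knockoff_labels = np.argmax(knockoff_logits, axis=-1)
    return float(np.mean(victim_labels == knockoff_labels))

# Hypothetical outputs for four samples and three classes
victim = np.array([[2.0, 0.1, 0.0], [0.0, 1.5, 0.2], [0.3, 0.2, 4.0], [1.0, 0.9, 0.0]])
knock = np.array([[1.0, 0.2, 0.1], [0.9, 0.1, 0.0], [0.0, 0.1, 2.0], [0.2, 0.8, 0.1]])
rate = agreement_rate(victim, knock)
print(rate)  # 0.5

# Baseline: always guessing the victim's most frequent class (counts from Step 3)
class_counts = [187, 207, 195, 210, 201]
majority_baseline = max(class_counts) / sum(class_counts)
print(majority_baseline)  # 0.21
```

Against a baseline of roughly 0.21, an agreement of 0.544 shows the knockoff has learned something real from the victim, even though it is far from a faithful copy.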


- As we can see, we were able to train a knockoff network by repeatedly querying the model. We could adjust the hyperparameters above, and maybe even train longer, to get better accuracy.  
- The objective at this point, however, is to show how it is done, not to achieve perfect accuracy. With 2000 queries, we got about 54% accuracy.   

- Now let's look at this from the perspective of attacking the API endpoint, since in most of these attacks we will not have the model. 

- Load the inference server   
- $ export MLFLOW_TRACKING_URI=http://0.0.0.0:9999      


### Note: Your previous lab from malware_classification_mal_net_tiny_multi_class.ipynb should still be running.    
### If it is not, you may have to change this URI models:/m-8c05baea8f704e41b08afb780c9dd720, depending on when you did the previous lab. 
### You can grab this information by going back to the lab **malware_classification_mal_net_tiny_multi_class.ipynb**.   

$ mlflow models serve -m models:/m-8c05baea8f704e41b08afb780c9dd720 -p 5000 --no-conda | ts '[%Y-%m-%d %H:%M:%S]' |  tee /tmp/mlflow_serve.log   
Downloading artifacts: 100%|████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 1300.69it/s]   
2025/05/08 15:03:53 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'    
Downloading artifacts: 100%|█████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 419.91it/s]    
2025/05/08 15:03:54 INFO mlflow.pyfunc.backend: === Running command 'exec uvicorn --host 127.0.0.1 --port 5000 --workers 1 mlflow.pyfunc.scoring_server.app:app'   
INFO:     Started server process [807184]  
INFO:     Waiting for application startup.  
INFO:     Application startup complete.  
INFO:     Uvicorn running on http://127.0.0.1:5000 (Press CTRL+C to quit)    



### Step 8:  
Making predictions through the API 

In [27]:
# Preparing to target the remote API
from art.estimators.classification import BlackBoxClassifier
import requests
import json

In [28]:
# Verify the version of the server hosting the API
requests.get(url='http://127.0.0.1:5000/version').content

b'3.0.0rc1'

In [29]:
# Setup the API endpoint
inference_url = 'http://localhost:5000/invocations'
inference_url

'http://localhost:5000/invocations'

In [30]:
# Setup the HTTP Headers
headers = {'Content-Type' : 'application/json', 'User-agent': 'ThreatActor', 'X-Forwarded-For': '10.0.0.1'}
headers

{'Content-Type': 'application/json',
 'User-agent': 'ThreatActor',
 'X-Forwarded-For': '10.0.0.1'}

In [31]:
# Build the JSON payload from the sample data (X_hashes) we generated earlier
payload = json.dumps(
    {
        'inputs' : X_hashes.tolist()     
    }
)

payload

'{"inputs": [[[[0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0], [1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0], [

In [32]:
# Get the predictions
# First send the request
prediction = requests.post(url=inference_url, data=payload, headers=headers).json()

# The predictions are returned in JSON format
# Let's grab the key
prediction = prediction['predictions']
print(prediction)

[[1.0267462730407715, 3.8219075202941895, -1.9810057878494263, -2.3821656703948975, -8.767653465270996], [-5.197916030883789, 0.25996190309524536, 5.066939353942871, -6.348705291748047, -0.019582588225603104], [0.02147100493311882, 2.989788055419922, -1.37204110622406, -3.6161744594573975, -4.90530252456665], [2.274576425552368, -1.5898317098617554, 2.3772432804107666, -3.4615838527679443, -6.377541542053223], [-6.279557228088379, 1.7967536449432373, 1.1799814701080322, -3.649653911590576, -0.492011159658432], [-2.2376530170440674, 0.6865281462669373, 4.518747329711914, -5.679497718811035, -4.200299263000488], [1.6875861883163452, 3.9612843990325928, -1.46283757686615, -4.5688862800598145, -7.005717754364014], [1.7966386079788208, -0.3373558223247528, 2.942932367324829, -9.416394233703613, -1.7478870153427124], [-6.941361904144287, 4.837276935577393, 2.444013833999634, -3.676614761352539, -4.080903053283691], [1.2528899908065796, 0.21207545697689056, -0.1331779807806015, -5.64427280426

With the understanding that we can interact with the model, let's go ahead and create a function to handle this task


In [33]:
# Create our prediction function

def get_victim_model_output(input_data, inference_url=inference_url,  x_forwared_for_ip='10.0.0.1'):
    '''
    Sends a batch of inputs to the inference endpoint and returns the logits

    Args:
        input_data (np.ndarray): The batch of inputs, shaped (batch, 1, 16, 16)
        inference_url (str): The URL of the invocations endpoint
        x_forwared_for_ip (str): The value placed in the X-Forwarded-For header
    '''
    headers = {'Content-Type' : 'application/json', 'User-agent': 'ThreatActor', 'X-Forwarded-For': x_forwared_for_ip}
    payload = json.dumps( {'inputs' : input_data.tolist() } )

    # Make the request to the endpoint and capture the response
    response = requests.post(url=inference_url, data=payload, headers=headers)
    if response.status_code == 200:
        logits = response.json()['predictions']
        return torch.as_tensor(data=logits, dtype=torch.float32)
    else:
        raise Exception(f'[!] Error making inference: {response.status_code}')



In [34]:
sample_preds = get_victim_model_output(input_data=raw_bits, inference_url=inference_url)
sample_preds

tensor([[ 1.0267,  3.8219, -1.9810, -2.3822, -8.7677],
        [-5.1979,  0.2600,  5.0669, -6.3487, -0.0196],
        [ 0.0215,  2.9898, -1.3720, -3.6162, -4.9053],
        [ 2.2746, -1.5898,  2.3772, -3.4616, -6.3775],
        [-6.2796,  1.7968,  1.1800, -3.6497, -0.4920],
        [-2.2377,  0.6865,  4.5187, -5.6795, -4.2003],
        [ 1.6876,  3.9613, -1.4628, -4.5689, -7.0057],
        [ 1.7966, -0.3374,  2.9429, -9.4164, -1.7479],
        [-6.9414,  4.8373,  2.4440, -3.6766, -4.0809],
        [ 1.2529,  0.2121, -0.1332, -5.6443, -2.6662]])
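Because this attack depends on the number of queries, it can be useful to count how many requests and samples are actually sent to the endpoint. One way to do that, sketched below, is to wrap the prediction function in a small counter. `CountingPredictFn` and `fake_victim` are hypothetical names; the stand-in victim avoids needing a running server for the example.

```python
class CountingPredictFn:
    """Wrap a prediction function and count the calls and samples sent to it."""

    def __init__(self, predict_fn):
        self.predict_fn = predict_fn
        self.calls = 0     # number of requests made
        self.samples = 0   # total number of inputs submitted

    def __call__(self, input_data):
        self.calls += 1
        self.samples += len(input_data)
        return self.predict_fn(input_data)

def fake_victim(input_data):
    # Stand-in for the remote API: returns zero "logits" for 5 classes
    return [[0.0] * 5 for _ in input_data]

counted = CountingPredictFn(fake_victim)
counted([[0.0] * 4] * 3)   # a batch of 3 inputs
counted([[0.0] * 4] * 2)   # a batch of 2 inputs
print(counted.calls, counted.samples)  # 2 5
```

In this lab you could, for example, wrap `get_victim_model_output` in the same way and pass the wrapper as `predict_fn`, then inspect the counters after the extraction finishes.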

In [35]:
# Setup our BlackBox Classifier to target the API
victim_classifier = BlackBoxClassifier(
    predict_fn=get_victim_model_output, 
    input_shape=X_hashes.shape[1:], 
    nb_classes=nb_classes)
victim_classifier

BlackBoxClassifier(model=None, clip_values=None, preprocessing=StandardisationMeanStd(mean=0.0, std=1.0, apply_fit=True, apply_predict=True), preprocessing_defences=None, postprocessing_defences=None, preprocessing_operations=[StandardisationMeanStd(mean=0.0, std=1.0, apply_fit=True, apply_predict=True)], nb_classes=5, predict_fn=<function get_victim_model_output at 0x768fc4399940>, input_shape=(1, 16, 16))

In [36]:
# To ensure we can run this section on its own, let's redefine our model
# This is a copy of the model defined above
# Only the model name has changed, so that we can tell the two apart
torch.manual_seed(10)

num_filters = 64
linear_out = 256

# This is the model that we would like to learn to mimic the behaviour of the victim model
# Notice immediately that the architecture is different from our victim model 

api_knockoff_model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=num_filters, kernel_size=(3, 3), stride=1, padding=1, padding_mode='zeros'),
    # Notice that we do not have a batch normalization layer as in the original model
    nn.ReLU(),

    nn.Conv2d(in_channels=num_filters, out_channels=num_filters*2, kernel_size=(3, 3), stride=1, padding=1, padding_mode='zeros'),
    # Notice that we do not have a batch normalization layer as in the original model
    nn.ReLU(),

    nn.Conv2d(in_channels=num_filters*2, out_channels=num_filters*3, kernel_size=(3, 3), stride=1, padding=1, padding_mode='zeros'),
    # Notice that we do not have a batch normalization layer as in the original model
    nn.ReLU(),

    # Notice now the introduction of the flatten layer followed by linear layers
    nn.Flatten(start_dim=1, end_dim=-1),

    # Remember, the image is 16*16 and the last convolution layer outputs num_filters*3 = 192 feature maps. Hence 16*16*192 = 49152 input features
    nn.Linear(in_features=16*16*num_filters*3, out_features=linear_out * 4, bias=True),
    nn.ReLU(),
    nn.Linear(in_features=linear_out * 4, out_features=nb_classes, bias=True),  # nb_classes (5) is the number of classes that came out of the testing above

)


In [37]:
# Same as above

# Define the loss function
loss_fn = nn.CrossEntropyLoss()

# Define the optimizer
optimizer = torch.optim.Adam(params=api_knockoff_model.parameters(), lr=0.001)
optimizer

Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    decoupled_weight_decay: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.001
    maximize: False
    weight_decay: 0
)

In [38]:
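# Setup our knockoff classifier for the API scenario
# input_shape must be the shape tuple of one sample, X_hashes.shape[1:], not a slice of the data
api_knockoff_classifier = PyTorchClassifier(model=api_knockoff_model, 
                                            loss=loss_fn, 
                                            input_shape=X_hashes.shape[1:], 
                                            nb_classes=nb_classes, 
                                            optimizer=optimizer, 
                                            device_type='cpu')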

#api_knockoff_classifier

In [39]:
# Setup the KnockoffNets attack
knockoffnets_api_attack = KnockoffNets(
    classifier=victim_classifier, 
    batch_size_fit=32, 
    batch_size_query=32, 
    nb_epochs=10, 
    nb_stolen=2000,     # Play with this number a bit to see how the accuracy increases. The larger this value, the more likely you will replicate the victim model. 
    sampling_strategy='adaptive', 
    reward='all', 
    use_probability=False, 
    verbose=True
    )

knockoffnets_api_attack

KnockoffNets(batch_size_fit=32, batch_size_query=32, nb_epochs=10, nb_stolen=2000, sampling_strategy=adaptive, reward=all, verbose=True, use_probability=False, )

In [40]:
# Extract the model 
# We are using the data from above
knockoff_net_api_model = knockoffnets_api_attack.extract(x=X_hashes, y=y_logits, thieved_classifier=api_knockoff_classifier)
knockoff_net_api_model

Knock-off nets:   0%|          | 0/2000 [00:00<?, ?it/s]

art.estimators.classification.pytorch.PyTorchClassifier(model=ModelWrapper(
  (_model): Sequential(
    (0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): Conv2d(128, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): ReLU()
    (6): Flatten(start_dim=1, end_dim=-1)
    (7): Linear(in_features=49152, out_features=1024, bias=True)
    (8): ReLU()
    (9): Linear(in_features=1024, out_features=5, bias=True)
  )
), loss=CrossEntropyLoss(), optimizer=Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    decoupled_weight_decay: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.001
    maximize: False
    weight_decay: 0
), input_shape=[[[[1. 1. 0. ... 0. 1. 0.]
   [0. 0. 1. ... 0. 1. 0.]
   [1. 1. 0. ... 1. 0. 1.]
   ...
   [0. 1. 1. ... 0. 1. 1.]
   [1. 0. 0. ...

While the cell above is running, look at the output in your mlflow console and you should see all these requests coming in.

In [41]:
# Make some predictions with the knockoff model
knockoff_net_api_model_preds = knockoff_net_api_model.predict(x=X_hashes)
knockoff_net_api_model_preds

array([[  3.3165731 ,  -5.344152  ,  -3.4586158 ,  -1.0669612 ,
          4.588649  ],
       [ -3.7082884 ,  -2.6437514 ,   5.3601217 , -13.979237  ,
         -5.499132  ],
       [ -2.4773343 ,   6.7350526 ,  -3.5474515 ,  -4.666757  ,
         -9.274193  ],
       ...,
       [ -8.897909  ,  -0.8901205 ,   6.186191  ,  -9.692911  ,
         -6.3337913 ],
       [ -1.7683557 ,   0.9507092 ,  -1.8145837 ,  -3.4042287 ,
         -3.7613587 ],
       [ -6.863377  ,  -0.97998184,   6.384153  ,  -8.179533  ,
         -7.2956085 ]], dtype=float32)

In [42]:
# Get the accuracy of the model
# The accuracy is modest, as we only trained for 10 epochs
# Experiment with making this number larger and see what accuracy you get
knockoff_net_api_model_accuracy = (knockoff_net_api_model_preds.argmax(axis=-1) == y_logits.argmax(axis=-1)).sum() / y_logits.shape[0]
print(f'Model accuracy is: {knockoff_net_api_model_accuracy}')

Model accuracy is: 0.562


### That's it for knockoff nets! 

### Takeaways   
- We were able to leverage ART against a model we had direct access to and against a model we could only interact with via an API   
- This type of attack requires a large number of queries to be able to reconstruct the victim model  
- In our simple demo, it took 2000 queries for us to get an accuracy of over 50%. If we run this longer, we should be able to build an even better model

### Keep this lab running while we work on the next lab.
Let's now jump to the next lab, where we use nftables to mitigate this attack at the firewall.   
