<img style="max-width:20em; height:auto;" src="../graphics/A-Little-Book-on-Adversarial-AI-Cover.png"/>

Author: Nik Alleyne   
Author Blog: https://www.securitynik.com   
Author GitHub: github.com/securitynik   

Author Other Books:   
- https://www.amazon.ca/Learning-Practicing-Leveraging-Practical-Detection/dp/1731254458/   
- https://www.amazon.ca/Learning-Practicing-Mastering-Network-Forensics/dp/1775383024/   


This notebook ***(basics_of_adversarial_examples.ipynb)*** is part of the series of notebooks from ***A Little Book on Adversarial AI***, a free ebook released by Nik Alleyne.

### Basics of Adversarial Examples  

### Lab Objectives:   
- Simplify the problem of adversarial examples   
- Work through a manual example of creating an adversarial sample   
- Create a manual sample **x_input** consisting of 10 features   
- Create a weight vector parameter **weight_param**   
- Create a bias parameter **bias**   
- Compute the linear output: x_input dot weight_param (transposed) + bias   
- Pass the linear output to an activation function (sigmoid)   

Our objective is to classify the sample as 1 (malicious, for example) if the output of the activation function (sigmoid) is greater than 0.5. If the output is 0.5 or lower, we classify it as 0 (benign, for example).
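Written out, the model described above can be stated as follows. The symbols are chosen here for illustration: $x$ is the 1×10 input row vector, $w$ the 1×10 weight row vector, and $b$ the scalar bias.

$$
z = x\,w^{\top} + b, \qquad
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\hat{y} =
\begin{cases}
1 & \text{if } \sigma(z) > 0.5 \\
0 & \text{otherwise}
\end{cases}
$$

Since $\sigma(z) > 0.5$ exactly when $z > 0$, flipping the predicted class amounts to pushing the linear output $z$ across zero.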


### Step 1:   
Building a simple linear classifier and scoring a sample  


In [1]:
# import the library
import numpy as np

In [2]:
### Version of key libraries used  
print(f'Numpy version used:  {np.__version__}')

Numpy version used:  2.1.3


In [3]:
# Create a vector of random values
np.random.seed(20)
x_input = np.random.randint(low=-2, high=2, size=(1,10) ).astype(dtype=np.float32)
x_input

array([[ 1.,  0.,  1.,  1., -2.,  0., -1., -2.,  1.,  0.]], dtype=float32)

In [4]:
# Create some random weights
np.random.seed(10)
weight_param = np.random.randint(low=-2, high=2, size=(1,10)).astype(dtype=np.float32)
weight_param

array([[-1., -1., -2.,  1., -2., -1.,  1., -2., -1., -1.]], dtype=float32)

In [5]:
# Define a bias:
bias = np.array(1, dtype=np.float32)
bias

array(1., dtype=float32)

In [6]:
# Perform the linear operation
z_output = x_input @ weight_param.transpose() + bias
z_output

array([[5.]], dtype=float32)

In [7]:
# At this point, we introduce nonlinearity. 
# This is where our activation function kicks in.
# Define a sigmoid activation function 

def sigmoid(z):
    # Cast the final result (not just the denominator) to float32
    return (1.0 / (1.0 + np.exp(-z))).astype(np.float32)

In [8]:
# Apply the activation function to our z_output to get the final prediction
final_prediction = sigmoid(z_output)
'SUSPICIOUS ' if final_prediction.item() > 0.5 else ' NORMAL'

'SUSPICIOUS '

In [9]:
# Get the actual score
final_prediction

array([[0.9933072]], dtype=float32)

Above we can see that our simple classifier is very confident that this sample is in class 1, with a probability of about 0.99 (99%).   
How can we change x_input so that this sample is no longer classified as 1 but instead as 0?   

Let's try a trick here. We know our weights above look like:   
weight_param = [[-1., -1., -2.,  1., -2., -1.,  1., -2., -1., -1.]]   

and our input x_input looks like:   
x_input = [[ 1.,  0.,  1.,  1., -2.,  0., -1., -2.,  1.,  0.]]   

How about this: wherever the input value is negative, we leave the original input (x_input) untouched. Wherever the input value is positive or 0, we add 1 to the existing value. Most of those positions carry a negative weight, so each added 1 pulls the linear output down.

x_adversarial_input = np.array([[ 2.,  1.,  2.,  2., -2.,  1., -1., -2.,  2.,  1.]])   
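Because the model is linear, adding 1 to a feature changes the linear output by exactly the weight at that position. The sketch below hard-codes the arrays printed in the cell outputs above and tallies that effect. The names `mask`, `delta_z` and `z_original` are illustrative and not part of the notebook.

```python
import numpy as np

# Values copied from the cell outputs above
x_input = np.array([[1., 0., 1., 1., -2., 0., -1., -2., 1., 0.]], dtype=np.float32)
weight_param = np.array([[-1., -1., -2., 1., -2., -1., 1., -2., -1., -1.]], dtype=np.float32)
bias = np.float32(1.0)

# Positions the trick touches: every non-negative input value
mask = x_input >= 0

# Adding 1 at a position shifts z by the weight at that position,
# so the total shift is the sum of the weights at the touched positions
delta_z = float(weight_param[mask].sum())

z_original = (x_input @ weight_param.transpose() + bias).item()
print(f'original z: {z_original}, shift: {delta_z}, new z: {z_original + delta_z}')
```

The shift is negative because six of the seven touched positions carry a negative weight. The one positive weight among them (index 3) works against the change, which hints that choosing positions by the sign of the weight would be more efficient than choosing them by the sign of the input.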

### Step 2:   
Manually crafting our adversarial example  

In [10]:
# Create the new adversarial example
x_adversarial_input = np.where( x_input >= 0, x_input + 1, x_input )
x_adversarial_input

array([[ 2.,  1.,  2.,  2., -2.,  1., -1., -2.,  2.,  1.]], dtype=float32)

In [11]:
# Perform the forward pass to make the prediction
adversarial_output = x_adversarial_input @ weight_param.transpose() + bias
adversarial_output

array([[-1.]], dtype=float32)

In [12]:
# Get the adversarial prediction
'SUSPICIOUS ' if sigmoid(adversarial_output).item() > 0.5 else ' NORMAL'

' NORMAL'

In [13]:
# The probability returned is:
sigmoid(adversarial_output)

array([[0.2689414]], dtype=float32)

As we see above, by simply adding 1 to every non-negative value (the positive values and the 0s), we shifted the prediction from suspicious to normal.  
In this example, we used 10 features and modified only 7 of them. We could instead have made smaller changes to all of them and achieved the same effect. 
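One way to make "smaller changes to all of them" concrete is to nudge every feature by the same small amount in the direction that lowers the linear output, that is, against the sign of its weight. This is a minimal sketch under that assumption, not code from the notebook. The step size `eps = 0.5` and the name `x_small_step` are chosen purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Values copied from the cell outputs above
x_input = np.array([[1., 0., 1., 1., -2., 0., -1., -2., 1., 0.]], dtype=np.float32)
weight_param = np.array([[-1., -1., -2., 1., -2., -1., 1., -2., -1., -1.]], dtype=np.float32)
bias = np.float32(1.0)

eps = np.float32(0.5)  # assumed step size, half of the +1 change used above

# Move every feature by eps against the sign of its weight,
# so each feature lowers z by eps * |weight|
x_small_step = x_input - eps * np.sign(weight_param)

z_small_step = x_small_step @ weight_param.transpose() + bias
prob_small_step = sigmoid(z_small_step)

print(z_small_step, prob_small_step)
print('SUSPICIOUS' if prob_small_step.item() > 0.5 else 'NORMAL')
```

Here no feature moves by more than 0.5, yet the linear output falls from 5 to -1.5, because the absolute values of the weights sum to 13 and every feature contributes to the shift.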

Now imagine a scenario where your input has many more features. In that case, slightly modifying a few of the features, or all of them in a small way, can be enough to produce the same effect. 
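A rough way to see this is to ask how large a per-feature step is needed to push the linear output across zero. If every feature moves by `eps` against the sign of its weight, the output changes by `eps` times the sum of the absolute weights, so with many features a small `eps` can add up to a large shift. The sketch below uses randomly drawn inputs and weights, which is an assumption made purely for illustration. The names `n_features` and `rng`, and the 1% margin, are not from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 1000  # assumed input size, for illustration only

x = rng.normal(size=n_features)
w = rng.normal(size=n_features)
bias = 1.0

z = x @ w + bias

# Uniform step that would bring z exactly to zero, plus a 1% margin to cross it
eps = 1.01 * abs(z) / np.abs(w).sum()

# Step each feature by eps along or against the sign of its weight,
# depending on which side of zero z currently sits
x_adv = x - np.sign(z) * eps * np.sign(w)
z_adv = x_adv @ w + bias

print(f'z before: {z:.3f}, z after: {z_adv:.3f}, per-feature step: {eps:.4f}')
```

The sign of the linear output flips, and with it the predicted class, even though no single feature changes by more than `eps`.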

### Lab Takeaways:   
- We created our adversarial example by modifying the input   
- We kept the parameters (the weights and the bias) fixed   
- We saw the probability (confidence score) of the sample being suspicious drop from about 0.99 to about 0.27   



Reference:   
- https://karpathy.github.io/2015/03/30/breaking-convnets/   