<img style="max-width:20em; height:auto;" src="../graphics/A-Little-Book-on-Adversarial-AI-Cover.png"/>

Author: Nik Alleyne   
Author Blog: https://www.securitynik.com   
Author GitHub: github.com/securitynik   

Author Other Books:   
- https://www.amazon.ca/Learning-Practicing-Leveraging-Practical-Detection/dp/1731254458/   
- https://www.amazon.ca/Learning-Practicing-Mastering-Network-Forensics/dp/1775383024/   


This notebook ***(Beginning Gradient Descent.ipynb)*** is part of the series of notebooks from ***A Little Book on Adversarial AI***, a free ebook released by Nik Alleyne.

### Beginning Gradient Descent

### Step 1:  

### Lab Objectives:   
- Understand Backpropagation         
- Understand Gradient Descent    
- Implement Backpropagation and Gradient Descent   
- Validate our work with PyTorch   
- Implement a simple model for Gradient Descent      



The forward pass performed on a simple network:   
<img style="max-width:50em; height:auto;" src="../graphics/Forward_Pass_Beginning_Gradient_Descent.png"/>

In [1]:
# import the libraries
import torch

In [2]:
### Version of key libraries used  
print(f'Torch version used:  {torch.__version__}')

Torch version used:  2.7.1+cu128


In [3]:
# Set up the device to work with
# If an accelerator is available (an NVIDIA GPU via CUDA, or Apple silicon via MPS),
# use it; otherwise, fall back to the CPU.

if torch.cuda.is_available():
    print('Setting the device to cuda')
    device = torch.device('cuda')
elif torch.backends.mps.is_available():
    print('Setting the device to Apple mps')
    device = torch.device('mps')
else:
    print('Setting the device to CPU')
    device = torch.device('cpu')

Setting the device to cuda


In [4]:
# Define the learning rate
learning_rate = 0.1

# Define my desired initial weight
init_weight = 0.15

# Define my x - feature
x = 5.

# define my y - target/label
y = 0.92

init_weight, x, y, learning_rate

(0.15, 5.0, 0.92, 0.1)

### Step 2:   
Perform manual prediction   
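As a quick worked check of what the cell below computes, writing $\hat{y}$ for the prediction `y_pred` and using the weight and feature defined in Step 1:

$$\hat{y} = w \cdot x = 0.15 \times 5 = 0.75$$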

In [5]:
# Making a prediction manually
def forward_pass(w, x):
    ''' Make a forward pass on the training data '''
    return w*x

In [6]:
# Make a prediction
y_pred = forward_pass(init_weight, x)
y_pred

0.75

### Step 3:   
Define the loss function  
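For $n$ samples, the mean squared error averages the squared differences between predictions and targets. With the single sample used here ($n = 1$), the division drops out, which is why the function below skips it:

$$L = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2 \;\xrightarrow{\;n = 1\;}\; L = \left(\hat{y} - y\right)^2 = (0.75 - 0.92)^2 = (-0.17)^2 = 0.0289$$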

In [7]:
# Calculate the loss on this pass
# Because there is only one sample, dividing by the number of samples (n = 1)
# changes nothing, so the division is skipped here
def my_MSELoss(y_pred, y_true):
    ''' Calculates the Mean Squared Error Loss '''
    return (y_pred - y_true)**2

In [8]:
# Calculate the loss on this single sample,
# much as Stochastic Gradient Descent (SGD) works with one sample at a time
mseloss = my_MSELoss(y_pred, y)
mseloss

0.028900000000000012
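`torch.nn.MSELoss(reduction='mean')`, used in Step 4, averages the squared errors over all elements. A minimal pure-Python sketch of that general form shows why skipping the division is harmless with one sample. The helper `mse_mean` is hypothetical and not part of the notebook:

```python
# A minimal sketch of mean squared error over n samples, assuming plain
# Python lists of predictions and targets (mse_mean is a hypothetical helper).
def mse_mean(y_preds, y_trues):
    ''' Mean of the squared differences between predictions and targets '''
    if not y_preds or len(y_preds) != len(y_trues):
        raise ValueError('need two non-empty lists of equal length')
    squared_errors = [(p - t) ** 2 for p, t in zip(y_preds, y_trues)]
    return sum(squared_errors) / len(squared_errors)

# With a single sample, dividing by n = 1 changes nothing,
# so this matches my_MSELoss(0.75, 0.92)
single = mse_mean([0.75], [0.92])
print(f'{single:.4f}')   # 0.0289

# With several samples, the squared errors are averaged
batch = mse_mean([0.75, 1.0], [0.92, 1.0])
print(f'{batch:.5f}')    # 0.01445
```

Averaging, rather than summing, keeps the loss on the same scale regardless of how many samples are in a batch.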

With the forward pass and loss calculated manually, let's confirm with PyTorch that both are correct.

### Step 4:   
Create the torch model   

In [9]:
# Import the needed libraries
import torch
import torch.nn
import torch.nn.functional as F

In [10]:
# https://medium.com/codex/how-to-build-a-pytorch-model-42ae8473a41e
# https://machinelearningmastery.com/pytorch-tutorial-develop-deep-learning-models/

class SimpleModel(torch.nn.Module):
    ''' A single linear layer with no bias, so the model computes y_hat = w * x '''
    def __init__(self):
        super().__init__()

        # Define a layer with 1 input feature and 1 output target
        # Turn off the bias so the only parameter is the weight
        self.linear = torch.nn.Linear(in_features=1, out_features=1, bias=False)

    # Set up the forward pass
    def forward(self, x):
        return self.linear(x)

In [11]:
# Instantiate the model
# and move it to the device
simple_model = SimpleModel().to(device=device)

# Add the loss function
loss_fn = torch.nn.MSELoss(reduction='mean')

# Define the optimizer, reusing the learning rate defined earlier
optimizer = torch.optim.SGD(simple_model.parameters(), lr=learning_rate)

# Set the weight to the same initial value used in the manual calculation
# https://www.askpython.com/python-modules/initialize-model-weights-pytorch
torch.nn.init.constant_(simple_model.linear.weight.data, init_weight)

tensor([[0.1500]], device='cuda:0')

In [12]:
# Visualize the model
print(simple_model)

SimpleModel(
  (linear): Linear(in_features=1, out_features=1, bias=False)
)


In [13]:
# Define the x and y data
torch_x = torch.tensor(data=[x], dtype=torch.float32, device=device)
torch_y = torch.tensor(data=[y], dtype=torch.float32, device=device)

# Print x and y
torch_x, torch_y

(tensor([5.], device='cuda:0'), tensor([0.9200], device='cuda:0'))

In [14]:
# Make the prediction
y_pred = simple_model(torch_x)

# Calculate the loss
loss  = loss_fn(y_pred, torch_y)

print(f'loss: {loss.item():.4f} | Prediction: {y_pred.item()}')

loss: 0.0289 | Prediction: 0.75


Awesome! PyTorch confirms that our manual forward pass was correct: both the prediction (0.75) and the loss (0.0289) match the manual results.   


### Step 5:   

Time to move on to Gradient Descent, using backpropagation to calculate the gradients.    


<img style="max-width:50em; height:auto;" src="../graphics/Backward_Pass_Beginning_Gradient_Descent.png"/>


First up, time to calculate the gradients by leveraging backpropagation and the chain rule.

For more background, see the backpropagation post on securitynik.com and Andrej Karpathy's post on why you should understand backpropagation.   


To calculate the gradient, we need the partial derivative of the loss with respect to the weight, that is, how much the loss changes as the weight changes.

However, the weight does not affect the loss directly. Changing the weight changes the prediction, and changing the prediction changes the loss.   
Hence, we need the partial derivative of the loss with respect to the prediction,   
as well as the partial derivative of the prediction with respect to the weight.   
The **chain rule** combines the two by multiplying them:

**dl_dw = dl_dy * dy_dw**   
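Written out with this example's model $\hat{y} = w x$ and loss $L = (\hat{y} - y)^2$, the chain rule gives the values the next three cells compute:

$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w} = 2\left(\hat{y} - y\right) \cdot x = 2(0.75 - 0.92) \cdot 5 = -0.34 \cdot 5 = -1.7$$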

In [15]:
# First finding dl_dy - partial derivative of the loss with respect to the prediction
# mse loss = (y_pred-y_true) ** 2
# To find the partial derivative of the loss with respect to the prediction
# it is 2(y_pred - y_true)
dl_dy = 2*(0.75-0.92)
dl_dy

-0.3400000000000001

In [16]:
# Next up, find the partial derivative of 
# the prediction with respect to the weight
# formula for the prediction is y_hat = w*x
# hence the partial derivative of dy_dw = x
dy_dw = x
dy_dw

5.0

In [17]:
# hence dl_dw = dl_dy * dy_dw
dl_dw = dl_dy * dy_dw
dl_dw

-1.7000000000000004
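Another way to validate the chain-rule result, before turning to PyTorch, is a numerical gradient check: nudge the weight slightly in each direction and measure how much the loss changes. The sketch below uses a central difference; the helper `loss_at` and the step size `h` are illustrative choices, not part of the original notebook:

```python
# Central-difference gradient check for the one-weight model y_hat = w * x.
# loss_at and h are illustrative assumptions, not from the notebook.
def loss_at(w, x=5.0, y=0.92):
    ''' Squared-error loss of the prediction w * x against target y '''
    return (w * x - y) ** 2

w0 = 0.15
h = 1e-6

# Slope of the loss between w0 - h and w0 + h
numeric_grad = (loss_at(w0 + h) - loss_at(w0 - h)) / (2 * h)

# Chain rule: dl_dy * dy_dw = 2 * (y_pred - y) * x
analytic_grad = 2 * (w0 * 5.0 - 0.92) * 5.0

print(f'numeric:  {numeric_grad:.6f}')
print(f'analytic: {analytic_grad:.6f}')
```

Both values should come out at about -1.7. Because this loss is quadratic in the weight, the central difference is exact apart from floating-point rounding; for general models it is only an approximation, but it remains a handy sanity check on hand-derived gradients.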

### Step 6:  
Perform gradient descent   

Now that the gradient has been calculated, time to update the weight.   
To update the weight, the formula is:   
**new_weight = old_weight - learning_rate * (dl_dw)**   
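Plugging in the numbers, writing $\eta$ for `learning_rate`:

$$w_{\text{new}} = w_{\text{old}} - \eta \, \frac{\partial L}{\partial w} = 0.15 - 0.1 \times (-1.7) = 0.15 + 0.17 = 0.32$$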

In [18]:
# new_weight_0 = old_weight_0 - learning_rate * dl_dw
new_weight = init_weight - learning_rate*(dl_dw)
new_weight

0.32000000000000006

In [19]:
# Make a new prediction using the updated weight
y_pred = forward_pass(new_weight, x)
y_pred

1.6000000000000003

In [20]:
# Calculate the loss
my_MSELoss(y_pred, y)

0.46240000000000037
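Notice that the loss went up, from 0.0289 to 0.4624, after the update. For this one-weight model the reason can be worked out exactly: if $e = wx - y$ is the error, one update turns it into $e(1 - 2\,\eta\,x^2)$. With $\eta = 0.1$ and $x = 5$ that factor is $1 - 5 = -4$, so the prediction overshoots the target (0.75 becomes 1.6 against a target of 0.92), the error grows 4 times and the loss grows 16 times ($0.0289 \times 16 = 0.4624$). The error only shrinks when $0 < \eta < 1/x^2 = 0.04$. The sketch below, using a hypothetical helper `run_gd`, compares the notebook's learning rate with a smaller one:

```python
# Sketch: repeated gradient-descent steps on the one-weight model
# y_hat = w * x. run_gd is a hypothetical helper for this illustration.
def run_gd(w, x, y, lr, steps):
    ''' Return the loss measured before each of `steps` updates '''
    losses = []
    for _ in range(steps):
        y_pred = w * x
        losses.append((y_pred - y) ** 2)
        dl_dw = 2 * (y_pred - y) * x   # chain rule: dl_dy * dy_dw
        w = w - lr * dl_dw             # gradient-descent update
    return losses

# Same starting point as above: compare the notebook's rate with a smaller one
for lr in (0.1, 0.01):
    losses = run_gd(0.15, 5.0, 0.92, lr=lr, steps=4)
    print(f'lr={lr}: ' + ', '.join(f'{v:.6f}' for v in losses))
```

With `lr=0.1` each loss is 16 times the previous one, so training diverges; with `lr=0.01` the factor becomes $1 - 0.5 = 0.5$ on the error, so each loss is a quarter of the previous one.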

In practice, the process above would run in a loop rather than one step at a time. Before coding that loop, let's verify with PyTorch that our gradient matches.

In [21]:
# First, convert init_weight to a torch tensor.
# requires_grad_(True) tells autograd to track operations on this tensor,
# so that loss.backward() can compute the gradient of the loss with respect to it
init_weight = torch.tensor(data=init_weight, device=device).requires_grad_(True)
print(f'This is the init_weight: {init_weight:.4f}')

# Make the prediction  
y_hat = torch.multiply(init_weight, torch_x)
print(f'This is the prediction: {y_hat.detach().cpu().numpy().item()}')

# Calculate the loss  
loss = F.mse_loss(input=y_hat, target=torch_y)
print(f'This is the loss: {loss:.4f}')

# Perform backpropagation 
loss.backward()

This is the init_weight: 0.1500
This is the prediction: 0.75
This is the loss: 0.0289


In [22]:
# Retrieve the gradient of the loss with respect to the weight
init_weight.grad

tensor(-1.7000, device='cuda:0')

### Step 7:   
Things are looking good so far. Next, build a PyTorch model to automate this process.  

In [23]:
# Confirming with PyTorch

class SimpleModel(torch.nn.Module):
    ''' A single linear layer with no bias, so the model computes y_hat = w * x '''
    def __init__(self):
        super().__init__()

        # Define a layer with 1 input feature and 1 output target
        # Turn off the bias so the only parameter is the weight
        self.linear = torch.nn.Linear(in_features=1, out_features=1, bias=False)

    # Set up the forward pass
    def forward(self, x):
        return self.linear(x)


# Define the model 
simple_model = SimpleModel().to(device=device)
optimizer = torch.optim.SGD(simple_model.parameters(), lr=learning_rate)
# init_weight is now a tensor (see above), so pass its Python value
torch.nn.init.constant_(simple_model.linear.weight, init_weight.item())

torch_loss = []

# Train the model for two epochs
for i in range(0, 2):
    y_pred = simple_model(torch_x)
    loss = my_MSELoss(y_pred, torch_y)

    # Clear the gradients
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    torch_loss.append(loss.item())
    print(f'Epoch {i}: loss: {loss.item()}')


Epoch 0: loss: 0.02890000492334366
Epoch 1: loss: 0.4624001681804657
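The loop above relies on `optimizer.zero_grad()`, `loss.backward()` and `optimizer.step()`. For plain SGD (no momentum, no weight decay), `step()` moves each parameter by `-lr * grad`, and because PyTorch accumulates gradients into `.grad` across `backward()` calls, the gradient is reset every epoch. The pure-Python sketch below mimics those roles with a hypothetical `TinySGD` class; it illustrates the idea and is not PyTorch's implementation:

```python
# A hypothetical, stripped-down stand-in for plain SGD (no momentum,
# no weight decay). It is NOT PyTorch's implementation, just the idea.
class TinySGD:
    def __init__(self, params, lr):
        # params: dict mapping a name to {'value': float, 'grad': float}
        self.params = params
        self.lr = lr

    def zero_grad(self):
        # Reset accumulated gradients before the next backward pass
        for p in self.params.values():
            p['grad'] = 0.0

    def step(self):
        # Move each parameter against its gradient: w = w - lr * grad
        for p in self.params.values():
            p['value'] -= self.lr * p['grad']


params = {'w': {'value': 0.15, 'grad': 0.0}}
opt = TinySGD(params, lr=0.1)
x_val, y_val = 5.0, 0.92
sketch_losses = []

for epoch in range(2):
    y_hat = params['w']['value'] * x_val
    sketch_loss = (y_hat - y_val) ** 2
    opt.zero_grad()
    # Stand-in for loss.backward(): accumulate dL/dw into the stored gradient
    params['w']['grad'] += 2 * (y_hat - y_val) * x_val
    opt.step()
    sketch_losses.append(sketch_loss)
    print(f'Epoch {epoch}: loss: {sketch_loss}')
```

The two losses it prints line up with the PyTorch run above, apart from the small differences caused by PyTorch computing in float32.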


In [24]:
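# This also confirms the manual process and PyTorch's gradient calculation.
# Time to code up our own training loop and close this notebook off.
# Quick and dirty, just for this purpose
def my_gradient(w, x, y, lr=0.1, epochs=2):
    ''' Run gradient descent on the one-weight model y_hat = w * x '''
    for i in range(epochs):
        y_pred = forward_pass(w, x)
        loss = (y_pred - y) ** 2
        dl_dw = 2*(y_pred - y) * x     # chain rule: dl_dy * dy_dw
        w = w - lr * dl_dw             # gradient descent update
        # float() turns a 0-dim tensor (or a plain float) into a Python float
        print(f'Epoch: {i}: Prediction: {float(y_pred)} : Loss:{float(loss)}')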

In [25]:
# Call the function and confirm it works
my_gradient(init_weight, x, y)

Epoch: 0: Prediction: 0.75 : Loss:0.02890000492334366
Epoch: 1: Prediction: 1.6000001430511475 : Loss:0.4624001681804657


In [26]:
# With training finished, release the cached accelerator memory
if torch.cuda.is_available():
    # For CUDA GPU
    print(f'Cleaning {device} cache')
    torch.cuda.empty_cache()
elif torch.backends.mps.is_available():
    # For Apple devices
    print(f'Cleaning {device} cache')
    torch.mps.empty_cache()
else:
    # Default to cpu
    pass

Cleaning cuda cache


That's it for this lab.        
### Lab Takeaways:  
- We learned about Gradient Descent   
- We implemented Gradient Descent   
- We learned about backpropagation    
- We implemented backpropagation    
- We saw how frameworks such as PyTorch make this process much easier.  