<a href="https://github.com/maticvl/dataHacker/tree/master/pyTorch" target="_parent">This example extends the examples at this link</a>

### Computation graphs and Autograd in PyTorch

In [38]:
import numpy as np
import matplotlib.pyplot as plt
import torch

torch.set_printoptions(precision=4, sci_mode=False)

PyTorch's autograd (automatic differentiation) is a powerful tool that computes gradients of tensors automatically. Gradients are essential for optimizing neural networks with techniques such as gradient descent. Autograd records the operations performed on tensors during the forward pass and, during the backward pass, applies the chain rule to compute the gradients of the output with respect to the inputs.
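To make "records the operations during the forward pass" concrete, here is a toy reverse-mode differentiator for scalars in plain Python. It is only a sketch for intuition built around a hypothetical `Value` class. PyTorch's real autograd engine is far more general. Each `Value` remembers its parents and its local derivative with respect to each parent. `backward()` then applies the chain rule in reverse topological order and accumulates the results into `.grad`.

```python
# Toy illustration only: `Value` is a hypothetical class, not part of PyTorch.
class Value:
    """A scalar that remembers the operation that produced it."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    __rmul__ = __mul__

    def backward(self):
        # Topological order: every node appears after all of its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk the graph backward, applying the chain rule and accumulating.
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


x = Value(3.0)
y = x * x           # forward pass records the graph of y = x^2
y.backward()        # backward pass computes dy/dx = 2x
print(y.data, x.grad)  # 9.0 6.0
```

The `+=` in `backward()` matters. `x` is used twice in `x * x`, so its gradient is the sum of both contributions.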

##  Gradients without autograd

In [50]:
# Define function f(x) that returns x^2
def f(x):
    return x ** 2

# Test the function f(x)
print("f(3) = ", f(3))

f(3) =  9


In [51]:
# Derivative without autograd
# Derivative of f(x) = x^2 is df(x) = 2x
def df(x):
    return 2 * x

# Test the function df(x)
print("df(3) = ", df(3))

df(3) =  6
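A hand-written derivative like `df` is easy to get wrong, so it is common to check it against a finite-difference approximation. The sketch below uses the central difference $(f(x+h) - f(x-h)) / 2h$. The helper name `central_difference` and the step size `h=1e-5` are illustrative choices, not anything from PyTorch.

```python
def f(x):
    return x ** 2

def df(x):
    return 2 * x

def central_difference(func, x, h=1e-5):
    """Approximate func'(x) with the symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

for x in [-2.0, 0.0, 3.0]:
    print(f"x={x}: analytic={df(x)}, numeric={central_difference(f, x):.6f}")
```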


In [52]:
# Find the minimum of f(x) using gradient descent
x_min = 10 # Initial guess
learning_rate = 0.1
n_iter = 100
for i in range(n_iter):
    x_min -= learning_rate * df(x_min)

print(f"x_min = {x_min:.4f}, f(x_min) ={f(x_min):.4f}")

x_min = 0.0000, f(x_min) =0.0000
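Why does the loop converge to 0? For $f(x) = x^2$ each update is $x \leftarrow x - \eta \cdot 2x = (1 - 2\eta)x$, so after $n$ steps $x_n = (1 - 2\eta)^n x_0$. With $\eta = 0.1$ this gives $10 \cdot 0.8^{100} \approx 2 \times 10^{-9}$, which prints as `0.0000`. The sketch below checks this closed form. It also shows that a learning rate with $|1 - 2\eta| > 1$ (here $\eta = 1.1$, an illustrative value) makes the iterates grow instead of shrink.

```python
def gradient_descent(x0, learning_rate, n_iter):
    """Plain gradient descent on f(x) = x^2, using df(x) = 2x."""
    x = x0
    for _ in range(n_iter):
        x -= learning_rate * 2 * x
    return x

x0, lr, n = 10.0, 0.1, 100
print(gradient_descent(x0, lr, n), x0 * (1 - 2 * lr) ** n)  # agree up to rounding
print(gradient_descent(x0, 1.1, 10))  # |1 - 2 * 1.1| = 1.2 > 1, so |x| grows
```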


##  Gradients with autograd
Let's now see a simple example of how the derivative is calculated. We will create a scalar tensor `x` and set the `requires_grad` parameter to `True`.

In [53]:
# Derivative with autograd
# Define a tensor with requires_grad=True to enable tracking of operations for gradient computation
x = torch.tensor(3., requires_grad=True)
print("x", x)

x tensor(3., requires_grad=True)
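`requires_grad` propagates. Any tensor computed from a tensor that requires gradients also requires them, and it records the operation that produced it in `grad_fn`. Tensors created directly by the user, like `x`, are *leaf* tensors. Only floating-point (and complex) tensors can require gradients, which is why `x` is written as `3.` rather than `3`. A short sketch:

```python
import torch

x = torch.tensor(3., requires_grad=True)
y = x * 2  # computed from a tracked tensor, so y is tracked too

print(y.requires_grad)        # True
print(y.grad_fn)              # the node that recorded the multiplication
print(x.is_leaf, y.is_leaf)   # True False

# Integer tensors cannot require gradients.
int_tensor_rejected = False
try:
    torch.tensor(3, requires_grad=True)
except RuntimeError as err:
    int_tensor_rejected = True
    print("RuntimeError:", err)
```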


In [55]:
# Define a simple mathematical operation
y = f(x) # y = x^2
print("y = ", y)

# Perform a backward pass to compute the gradient of y with respect to x
# We call the backward() method on the output tensor y
# to initiate the computation of gradients
y.backward()

# Access the gradient computed for x
# The gradient represents the rate of change of y with respect to x
gradient = x.grad
print("Gradient of y with respect to x:", gradient)

y =  tensor(9., grad_fn=<PowBackward0>)
Gradient of y with respect to x: tensor(6.)
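`backward()` stores its result in `.grad`. If you only need the value, `torch.autograd.grad(outputs, inputs)` returns the gradients as a tuple and leaves `.grad` untouched, so nothing accumulates between calls. Here is a sketch for $y = x^2$ at $x = 3$:

```python
import torch

x = torch.tensor(3., requires_grad=True)
y = x ** 2

# Returns one gradient per input; x.grad is not modified.
(dy_dx,) = torch.autograd.grad(y, x)
print(dy_dx)   # tensor(6.)
print(x.grad)  # None
```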


## Another example of computing gradients using autograd
We will calculate `y` the following way:\
$y = 3x^2 + 4x + 2$

Now let's see what we get for `x` = `3`:\
$y = 3(3)^2 + 4(3) + 2$\
$y = 3 \cdot 9 + 12 + 2$\
$y = 27 + 12 + 2$\
$y = 41$

The derivative of `y` with respect to the variable `x` is:\
$\frac{dy}{dx} = 2 \cdot 3x + 4 = 6x + 4$\
For `x` = `3`, we get:\
$\frac{dy}{dx} = 6(3) + 4 = 18 + 4 = 22$\
So the gradient is equal to $22$.\
Let's see how we can do this in code using PyTorch autograd.
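Before turning to autograd, you can apply the power rule $\frac{d}{dx} c_k x^k = k c_k x^{k-1}$ mechanically to a polynomial stored as a list of coefficients. The helpers `derivative` and `evaluate` below are hypothetical. They only double-check the hand calculation, $y(3) = 41$ and $y'(3) = 22$.

```python
def derivative(coeffs):
    """Power rule on a polynomial whose coefficient of x^k is coeffs[k]."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Evaluate sum(coeffs[k] * x^k)."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

y = [2, 4, 3]        # 2 + 4x + 3x^2
dy = derivative(y)   # [4, 6], i.e. 4 + 6x
print(evaluate(y, 3), evaluate(dy, 3))  # 41 22
```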

In [56]:
x = torch.tensor(3., requires_grad=True)  # fresh tensor, so x.grad from the previous cell does not carry over
y = 3*x**2 + 4*x + 2
print("y = 3x^2 + 4x + 2 = ", y.item())

y = 3x^2 + 4x + 2 =  41.0


### Call `y.backward()` to calculate the derivative of that function.

In [57]:
# Compute the derivative of `y` with respect to `x`
y.backward()
print("derivative of `y` with respect to `x` =", x.grad)

derivative of `y` with respect to `x` = tensor(22.)
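By default, `backward()` frees the intermediate values saved in the graph once it has used them. Calling `y.backward()` a second time on the same `y` therefore raises a `RuntimeError`. If you really need a second pass, pass `retain_graph=True` to the first call. The gradient from the second pass is then *added* to `x.grad`, which previews the accumulation behavior discussed below:

```python
import torch

x = torch.tensor(3., requires_grad=True)
y = 3 * x ** 2 + 4 * x + 2

y.backward(retain_graph=True)  # keep the saved tensors for another pass
first = x.grad.item()          # 22.0
y.backward()                   # allowed because the graph was retained
second = x.grad.item()         # 44.0: the second gradient was added to the first
print(first, second)
```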


### To turn off gradient tracking, use the `requires_grad_(False)` method or the `detach()` method.

In [60]:
x = torch.tensor(3., requires_grad=True)
print(x)

x = x.requires_grad_(False)
print(x)

tensor(3., requires_grad=True)
tensor(3.)


In [62]:
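x = torch.tensor(3., requires_grad=True)
# detach() returns a new tensor that shares storage with x but is not tracked by autograd
x_detached = x.detach()
print(x_detached)

tensor(3.)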
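`requires_grad_(False)` and `detach()` act on individual tensors. To stop tracking for a whole block of code, PyTorch provides the `torch.no_grad()` context manager. A common use is a manual parameter update, as in the gradient-descent loop earlier, where autograd should not record the update itself. The sketch below uses a hypothetical parameter `w`:

```python
import torch

w = torch.tensor(3., requires_grad=True)

# Operations inside no_grad() are not recorded in the graph.
with torch.no_grad():
    doubled = w * 2
print(doubled.requires_grad)  # False

# Typical use: one manual gradient-descent step on a parameter.
loss = w ** 2
loss.backward()               # w.grad = 2 * w = 6
with torch.no_grad():
    w -= 0.1 * w.grad         # in-place update, not tracked by autograd
print(w.item(), w.requires_grad)  # w is now about 2.4 and still requires grad
```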


## Gradient accumulation

Autograd does not reset gradients automatically. Every call to `backward()` adds the new gradient to whatever is already stored in `.grad`, so we have to reset the gradients after each optimization step. Otherwise they keep accumulating, as the loop below shows.

In [65]:
x = torch.tensor(3., requires_grad=True)

for epoch in range(3):
  y = 3*x**2 + 4*x + 2
  # auto gradient calculation does not reset the gradients automatically, 
  # gradients end up just accumulating
  y.backward()
  print(x.grad)

tensor(22.)
tensor(44.)
tensor(66.)
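Accumulation is not only a pitfall. It can also be used on purpose. When a batch is too large to process at once, you can split it into micro-batches and call `backward()` on each one before updating. The gradients add up in `.grad`, so scaling each micro-batch loss by `1 / n_micro` reproduces the full-batch mean-squared-error gradient, provided the micro-batches have equal size. The toy data `xs`, `ys` and the helper functions below are illustrative assumptions:

```python
import torch

# Toy data for a one-weight model y = w * x (illustrative assumption).
xs = torch.tensor([1., 2., 3., 4.])
ys = torch.tensor([2., 4., 6., 8.])

def full_batch_grad(w0):
    w = torch.tensor(w0, requires_grad=True)
    loss = torch.mean((w * xs - ys) ** 2)
    loss.backward()
    return w.grad.item()

def accumulated_grad(w0, micro_batch=2):
    w = torch.tensor(w0, requires_grad=True)
    n_micro = len(xs) // micro_batch  # assumes len(xs) is divisible by micro_batch
    for i in range(n_micro):
        xb = xs[i * micro_batch:(i + 1) * micro_batch]
        yb = ys[i * micro_batch:(i + 1) * micro_batch]
        # Scale so that the accumulated sum equals the full-batch mean.
        loss = torch.mean((w * xb - yb) ** 2) / n_micro
        loss.backward()  # adds this micro-batch's gradient to w.grad
    return w.grad.item()

print(full_batch_grad(1.0), accumulated_grad(1.0))  # both -15.0
```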


To reset the gradients for a particular tensor, you can simply call `x.grad.zero_()`

In [66]:
x = torch.tensor(3., requires_grad=True)

for epoch in range(3):
  y = 3*x**2 + 4*x + 2
  y.backward()
  print(x.grad)
  x.grad.zero_()

tensor(22.)
tensor(22.)
tensor(22.)
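Setting `x.grad = None` is another way to clear a gradient. In practice, the reset is usually done through an optimizer. `optimizer.zero_grad()` clears the gradients of all parameters it manages, and recent PyTorch versions set them to `None` by default rather than filling them with zeros. As a sketch, here is the earlier gradient-descent minimization of $f(x) = x^2$ rewritten with `torch.optim.SGD`, using the same initial guess, learning rate, and number of iterations:

```python
import torch

x = torch.tensor(10., requires_grad=True)  # same initial guess as the manual loop
optimizer = torch.optim.SGD([x], lr=0.1)

for _ in range(100):
    optimizer.zero_grad()  # clear the gradient left over from the previous step
    loss = x ** 2
    loss.backward()        # x.grad = 2x
    optimizer.step()       # x <- x - lr * x.grad

print(x.item())
```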


##  Gradients for multiple variables without autograd
Let's extend the example to a model with two input variables. We'll use two input features ($x_1$ and $x_2$), two corresponding weights ($w_1$ and $w_2$), and one bias term ($b$). The prediction is calculated as follows: $\hat{y} = w_1 x_1 + w_2 x_2 + b$


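The loss $J$ is the mean squared error over the $m$ training examples (the `torch.mean((y_pred - y_true) ** 2)` computed in the code below), and its gradients are:
$$
\begin{align}
J(w,b) &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (\hat{y}^{(i)} - y^{(i)})^2 \notag\\
\frac{\partial J(w,b)}{\partial w_j}  &= \frac{2}{m} \sum\limits_{i = 0}^{m-1} ( \hat{y}^{(i)} - y^{(i)})x_j^{(i)} \tag{1}\\
\frac{\partial J(w,b)}{\partial b}  &= \frac{2}{m} \sum\limits_{i = 0}^{m-1} (\hat{y}^{(i)} - y^{(i)}) \tag{2}
\end{align}
$$
Where:
* $m$ is the number of training examples in the data set

* $\hat{y}^{(i)} = f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value

* $x_j^{(i)}$ is the value of feature $j$ (here $j \in \{1, 2\}$) for example $i$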
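As a sketch, the gradients can be evaluated by hand in plain Python for the two training examples used in the next cell. This makes two assumptions. First, $J$ is the plain mean of squared errors (`torch.mean((y_pred - y_true) ** 2)`, with no factor $\frac{1}{2}$), so each gradient carries a factor $\frac{2}{m}$. Second, the weights are scalars `w1 = 3`, `w2 = 1`, `b = 1` shared by both examples, which yields the predictions $[9, 11]$.

```python
# Data from the next cell: two examples, two features.
x1 = [2.0, 3.0]
x2 = [2.0, 1.0]
y_true = [11.0, 15.0]
w1, w2, b = 3.0, 1.0, 1.0  # assumption: scalar weights shared by both examples

m = len(y_true)
y_hat = [w1 * a + w2 * c + b for a, c in zip(x1, x2)]  # [9.0, 11.0]
err = [p - t for p, t in zip(y_hat, y_true)]           # [-2.0, -4.0]

# J = (1/m) * sum(err^2), so dJ/dw_j = (2/m) * sum(err * x_j) and dJ/db = (2/m) * sum(err)
dJ_dw1 = 2 / m * sum(e * a for e, a in zip(err, x1))
dJ_dw2 = 2 / m * sum(e * c for e, c in zip(err, x2))
dJ_db = 2 / m * sum(err)
print(dJ_dw1, dJ_dw2, dJ_db)  # -16.0 -8.0 -6.0
```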

In [85]:
# Define input features (x1 and x2)
x1 = torch.tensor([2, 3], dtype=torch.float32)
x2 = torch.tensor([2, 1], dtype=torch.float32)

print("x1:", x1)
print("x2:", x2)

# Define weights and bias (scalars: each weight is shared by all training examples)
w1 = torch.tensor(3, dtype=torch.float32)
w2 = torch.tensor(1, dtype=torch.float32)
b = torch.tensor(1, dtype=torch.float32)

# Define the forward pass of a simple neural network
y_pred = x1 * w1 + x2 * w2 + b

# Define a target value (ground truth)
y_true = torch.tensor([11, 15], dtype=torch.float32)

# Compute the loss manually
loss = torch.mean((y_pred - y_true) ** 2)

# Compute gradients manually using the chain rule
# Derivative of loss with respect to y_pred
# (.numel() returns the number of elements in the tensor, i.e. m)
dloss_dy = 2 * (y_pred - y_true) / y_pred.numel()  # dloss/dy_pred
# Derivatives of loss with respect to w1, w2, and b;
# .sum() adds up the contributions of all training examples
dloss_dw1 = (dloss_dy * x1).sum()  # dloss/dw1 = dloss/dy_pred * dy_pred/dw1 = dloss/dy_pred * x1
dloss_dw2 = (dloss_dy * x2).sum()  # dloss/dw2 = dloss/dy_pred * dy_pred/dw2 = dloss/dy_pred * x2
dloss_db = dloss_dy.sum()  # dloss/db = dloss/dy_pred * dy_pred/db = dloss/dy_pred * 1

# Print the computed gradients
print("Gradients of loss with respect to y_pred:", dloss_dy)
print("Gradients of loss with respect to w1:", dloss_dw1)
print("Gradients of loss with respect to w2:", dloss_dw2)
print("Gradient of loss with respect to b:", dloss_db.item())


x1: tensor([2., 3.])
x2: tensor([2., 1.])
Gradients of loss with respect to y_pred: tensor([-2., -4.])
Gradients of loss with respect to w1: tensor(-16.)
Gradients of loss with respect to w2: tensor(-8.)
Gradient of loss with respect to b: -6.0
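A quick way to validate hand-derived gradients is to compare them with central finite differences of the loss. The sketch below is plain Python. `loss_fn` and `numeric_grad` are hypothetical helpers, and the scalar weights `w1 = 3`, `w2 = 1`, `b = 1` are an assumption. Its output should agree with the analytical values $-16$, $-8$ and $-6$. Because the loss is quadratic in the parameters, the central difference is exact up to floating-point rounding.

```python
def loss_fn(w1, w2, b):
    """Mean squared error of y = w1*x1 + w2*x2 + b on the two training examples."""
    x1, x2, y_true = [2.0, 3.0], [2.0, 1.0], [11.0, 15.0]
    y_pred = [w1 * a + w2 * c + b for a, c in zip(x1, x2)]
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)

def numeric_grad(fn, params, h=1e-4):
    """Central-difference estimate of the gradient of fn at params."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        up[i] += h
        down = list(params)
        down[i] -= h
        grads.append((fn(*up) - fn(*down)) / (2 * h))
    return grads

grads = numeric_grad(loss_fn, [3.0, 1.0, 1.0])
print(grads)  # close to [-16, -8, -6]
```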


##  Gradients for multiple variables with autograd
We'll use two input features ($x_1$ and $x_2$), two corresponding weights ($w_1$ and $w_2$), and one bias term ($b$) to calculate the output as follows: $\hat{y} = w_1 x_1 + w_2 x_2 + b$\
We'll perform a forward pass, compute the loss, and then compute the gradients of the loss with respect to the weights and bias.

In [86]:
# Define input features (x1 and x2) as tensors 
x1 = torch.tensor([2, 3], dtype=torch.float32)
x2 = torch.tensor([2, 1], dtype=torch.float32)

print("x1:", x1)
print("x2:", x2)

# Define weights and bias
# Set requires_grad=True to compute gradients with respect to these tensors
w1 = torch.tensor(3, dtype=torch.float32, requires_grad=True)
w2 = torch.tensor(1, dtype=torch.float32, requires_grad=True)
b = torch.tensor(1, dtype=torch.float32, requires_grad=True)

print("w1:", w1)
print("w2:", w2)
print("b:", b)

# Define the forward pass of a simple neural network
# to compute the predicted output y_pred using the defined weights, 
# input features, and bias term
y_pred = x1 * w1 + x2 * w2 + b
print("y_pred:", y_pred)

# Define a target value (ground truth)
y_true = torch.tensor([11, 15], dtype=torch.float32)
print("y_true:", y_true)

# Define a loss function (mean squared error)
# between the predicted output y_pred and the target y_true
loss = torch.mean((y_pred - y_true) ** 2)
print("loss:", loss)

# Perform a backward pass to compute the gradients of loss with respect to 
# w1, w2, and b
loss.backward()

# Access the gradients computed for w1, w2, and b
# The gradients represent the rate of change of the loss with respect to w1, w2, and b
gradient_w1 = w1.grad
gradient_w2 = w2.grad
gradient_b = b.grad

# Print the gradients (each weight is a scalar, so each gradient is a scalar too)
print("Gradient of loss with respect to w1:", gradient_w1)
print("Gradient of loss with respect to w2:", gradient_w2)
print("Gradient of loss with respect to b:", gradient_b)

x1: tensor([2., 3.])
x2: tensor([2., 1.])
w1: tensor(3., requires_grad=True)
w2: tensor(1., requires_grad=True)
b: tensor(1., requires_grad=True)
y_pred: tensor([ 9., 11.], grad_fn=<AddBackward0>)
y_true: tensor([11., 15.])
loss: tensor(10., grad_fn=<MeanBackward0>)
Gradient of loss with respect to w1: tensor(-16.)
Gradient of loss with respect to w2: tensor(-8.)
Gradient of loss with respect to b: tensor(-6.)
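With more features, it is more convenient to stack the inputs into a matrix $X$ (one row per example) and the weights into one vector $\mathbf{w} = [w_1, w_2]$. The forward pass then becomes $\hat{\mathbf{y}} = X\mathbf{w} + b$. Below is a sketch of the same computation in that form, assuming the scalar weights $w_1 = 3$, $w_2 = 1$. Autograd returns both weight gradients at once in `w.grad`, one entry per feature.

```python
import torch

# Rows are training examples; columns are the features x1 and x2.
X = torch.tensor([[2., 2.],
                  [3., 1.]])
y_true = torch.tensor([11., 15.])

w = torch.tensor([3., 1.], requires_grad=True)  # one weight per feature: [w1, w2]
b = torch.tensor(1., requires_grad=True)

y_pred = X @ w + b                         # predictions [9., 11.]
loss = torch.mean((y_pred - y_true) ** 2)  # same MSE loss as above
loss.backward()

print(w.grad)  # gradient [-16., -8.]
print(b.grad)  # gradient -6.
```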
