<a href="https://github.com/maticvl/dataHacker/tree/master/pyTorch" target="_parent">This example extends the examples at this link</a>

### Computation graphs and Autograd in PyTorch

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import torch

PyTorch's autograd (automatic differentiation) engine computes gradients for tensors automatically. Gradients are essential for optimizing neural networks with techniques such as gradient descent. Autograd records the operations performed on tensors during the forward pass and, during the backward pass, computes the gradients of the output with respect to every input tensor that has `requires_grad=True`.
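
To make the tracking concrete, here is a minimal sketch (the names `a`, `k`, `c`, and `d` are illustrative and not part of the original notebook). It shows that tensors produced from a tracked tensor record the operation that created them in `grad_fn`, and that the backward pass only fills `.grad` for tensors that require gradients.

```python
import torch

# Leaf tensors created by the user; only `a` is tracked by autograd
a = torch.tensor(2.0, requires_grad=True)
k = torch.tensor(5.0)

# Each operation on a tracked tensor adds a node to the computation graph
c = a * k      # c = 10
d = c + 1.0    # d = 11

print(a.is_leaf, a.grad_fn)        # leaf tensors have no grad_fn
print(c.requires_grad, c.grad_fn)  # c is tracked because a is
print(d.grad_fn)

# Backward pass: d = a*k + 1, so dd/da = k = 5
d.backward()
print(a.grad)  # tensor(5.)
print(k.grad)  # None, because k does not require gradients
```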

## Simple example with gradients

Let's now see a simple example of how the derivative is calculated. We will create a scalar tensor `x` and set the `requires_grad` parameter to `True`.

In [2]:
# Define a tensor with requires_grad=True to enable tracking of operations for gradient computation
x = torch.tensor(3., requires_grad=True)
print("x", x)

x tensor(3., requires_grad=True)


In [3]:
# Define a simple mathematical operation
y = x ** 2  # y = x^2

# Perform a backward pass to compute the gradient of y with respect to x
# We call the backward() method on the output tensor (y or loss) 
# to initiate the computation of gradients.
y.backward()

# Access the gradient computed for x
# The gradient represents the rate of change of y with respect to x
gradient = x.grad
print("Gradient of y with respect to x", gradient)

Gradient of y with respect to x tensor(6.)
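
One way to convince yourself that this value really is the rate of change is to compare it with a numerical estimate. The sketch below is only an illustration: the helper `f`, the tensor `x0`, and the step size `h` are assumptions introduced here, and `float64` is used to keep rounding error small.

```python
import torch

def f(t):
    # Same function as in the cell above: y = x^2
    return t ** 2

x0 = torch.tensor(3.0, dtype=torch.float64, requires_grad=True)

# Analytical gradient via autograd
f(x0).backward()

# Numerical estimate with a central finite difference
h = 1e-4
with torch.no_grad():
    numeric = (f(x0 + h) - f(x0 - h)) / (2 * h)

print("autograd:", x0.grad.item())
print("finite difference:", numeric.item())
```

Both values should agree to several decimal places.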


Next, let's look at a slightly larger function. We will calculate `y` the following way:\
$y = 3x^2 + 4x + 2$

Now let's see what we get when we replace `x` with our value `3`:\
$y = 3(3)^2 + 4(3) + 2$\
$y = 3 \cdot 9 + 12 + 2$\
$y = 27 + 12 + 2$\
$y = 41$

Next, we take the derivative of `y` with respect to the variable `x`:\
$\frac{dy}{dx} = 2 \cdot 3x + 4 = 6x + 4$\
If we replace `x` with our value `3`, we get the following:\
$6x + 4 = 6(3) + 4 = 18 + 4 = 22$\
So the gradient is equal to $22$.\
Let's see how we can do this in code.

In [4]:
y = 3*x**2 + 4*x + 2
print("y = 3x^2 + 4x + 2 = ", y.item())

y = 3x^2 + 4x + 2 =  41.0


### Call `y.backward()` to calculate the derivative for that function.

In [5]:
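# backward() adds to x.grad, which still holds 6 from the y = x**2 example;
# clear it first, otherwise the result would be 6 + 22 = 28
x.grad.zero_()
# Compute the derivative of `y` with respect to `x`
y.backward()
print("derivative of `y` with respect to `x` =", x.grad)

derivative of `y` with respect to `x` = tensor(22.)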
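
As an alternative sketch, `torch.autograd.grad` returns the derivative directly and neither reads nor updates `.grad`, so earlier backward passes cannot affect the result. The names `x_new`, `y_new`, and `dy_dx` are illustrative.

```python
import torch

x_new = torch.tensor(3.0, requires_grad=True)
y_new = 3 * x_new**2 + 4 * x_new + 2

# torch.autograd.grad returns a tuple with one gradient per input
(dy_dx,) = torch.autograd.grad(y_new, x_new)

print(dy_dx)       # tensor(22.)
print(x_new.grad)  # None: nothing was accumulated into .grad
```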


### Is there a way to turn off the gradient calculation?

The answer is yes: you can turn off gradient tracking at any time, either in place with `requires_grad_(False)` or by creating a detached tensor with `detach()`.

In [6]:
x = torch.tensor(3., requires_grad=True)
print(x)

tensor(3., requires_grad=True)


In [7]:
x = x.requires_grad_(False)
print(x)

tensor(3.)


In [8]:
x = x.detach()
print(x)

tensor(3.)
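
Besides changing the tensor itself, gradient tracking can also be suspended temporarily with the `torch.no_grad()` context manager, which is common during evaluation. A minimal sketch, with illustrative names `p`, `q`, and `r`:

```python
import torch

p = torch.tensor(3.0, requires_grad=True)

# Inside the block, operations are not recorded in the computation graph
with torch.no_grad():
    q = p * 2
print(q.requires_grad)  # False

# Outside the block, tracking resumes; p itself was never modified
r = p * 2
print(r.requires_grad)  # True
print(p.requires_grad)  # True
```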


### Gradient accumulation

Autograd does not reset gradients automatically: every call to `backward()` adds the new gradient to the value already stored in `.grad`. We therefore have to reset the gradients after each optimization step; if we forget, they keep accumulating across iterations.\
To reset the gradient of a particular tensor, call `x.grad.zero_()`.

In [9]:
x = torch.tensor(3., requires_grad=True)

for epoch in range(3):
  y = 3*x**2 + 4*x + 2
  y.backward()

  print(x.grad)
  x.grad.zero_()

tensor(22.)
tensor(22.)
tensor(22.)
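
For contrast, here is a sketch of the same loop without the reset, using the illustrative names `x_acc`, `y_acc`, and `history`. Each backward pass adds another 22 to the stored gradient.

```python
import torch

x_acc = torch.tensor(3.0, requires_grad=True)
history = []

for step in range(3):
    y_acc = 3 * x_acc**2 + 4 * x_acc + 2
    y_acc.backward()
    # No x_acc.grad.zero_() here, so the gradient keeps growing
    history.append(x_acc.grad.item())

print(history)  # [22.0, 44.0, 66.0]
```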


In [10]:
import torch

# Define input features (x1 and x2)
x1 = torch.tensor([2, 3], dtype=torch.float32, requires_grad=True)
x2 = torch.tensor([2, 1], dtype=torch.float32, requires_grad=True)

print("x1:", x1)
print("x2:", x2)

# Define weights and bias
w1 = torch.tensor([3, 2], dtype=torch.float32, requires_grad=True)
w2 = torch.tensor([1, 4], dtype=torch.float32, requires_grad=True)
b = torch.tensor(1, dtype=torch.float32, requires_grad=True)

# Define the forward pass of a simple neural network
y_pred = x1 * w1 + x2 * w2 + b

# Define a target value (ground truth)
y_true = torch.tensor([11, 15], dtype=torch.float32)

# Compute the loss manually
loss = torch.mean((y_pred - y_true) ** 2)

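# Compute gradients manually using the chain rule
dloss_dy_pred = 2 * (y_pred - y_true) / y_pred.numel()  # dloss/dy_pred
# The products below are the per-element gradients; .sum() adds them up, which
# gives the gradient for a weight shared by both samples (autograd's w1.grad
# and w2.grad would hold the individual terms instead)
dloss_dw1 = (dloss_dy_pred * x1).sum()  # dloss/dw1 = dloss/dy_pred * dy_pred/dw1 = dloss/dy_pred * x1
dloss_dw2 = (dloss_dy_pred * x2).sum()  # dloss/dw2 = dloss/dy_pred * dy_pred/dw2 = dloss/dy_pred * x2
dloss_db = dloss_dy_pred.sum()  # dloss/db = dloss/dy_pred * dy_pred/db = dloss/dy_pred * 1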

# Print the computed gradients
print("Gradients of loss with respect to w1:", dloss_dw1.item())
print("Gradients of loss with respect to w2:", dloss_dw2.item())
print("Gradient of loss with respect to b:", dloss_db.item())


x1: tensor([2., 3.], requires_grad=True)
x2: tensor([2., 1.], requires_grad=True)
Gradients of loss with respect to w1: -16.0
Gradients of loss with respect to w2: -8.0
Gradient of loss with respect to b: -6.0
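
The manual computation above sums over the elements. As a cross-check for the element-wise forward pass, the unsummed chain-rule products should match what autograd stores in `.grad`. The sketch below uses illustrative names (`f1`, `f2` for the features, `v1`, `v2` for the weights, `c` for the bias) so that it does not overwrite the notebook's variables.

```python
import torch

# Same values as in the cell above, under different (illustrative) names
f1 = torch.tensor([2.0, 3.0])
f2 = torch.tensor([2.0, 1.0])
v1 = torch.tensor([3.0, 2.0], requires_grad=True)
v2 = torch.tensor([1.0, 4.0], requires_grad=True)
c = torch.tensor(1.0, requires_grad=True)
target = torch.tensor([11.0, 15.0])

pred = f1 * v1 + f2 * v2 + c
mse_loss = torch.mean((pred - target) ** 2)
mse_loss.backward()

# Chain rule by hand, without summing over the elements of v1 and v2
dloss_dpred = (2 * (pred - target) / pred.numel()).detach()
manual_v1 = dloss_dpred * f1
manual_v2 = dloss_dpred * f2
manual_c = dloss_dpred.sum()  # the bias is shared, so its terms are summed

print(manual_v1, v1.grad)
print(manual_v2, v2.grad)
print(manual_c, c.grad)
```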


### Differentiation in Autograd
Let's extend the example to a model with two inputs. We'll use two input features (`x1` and `x2`), two corresponding weights (`w1` and `w2`), and one bias term (`b`). We'll perform a forward pass, compute the loss, and then let autograd compute the gradients of the loss with respect to the weights and bias.

For the mean squared error loss $J(w,b) = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (\hat{y}^{(i)} - y^{(i)})^2$ used in the code, the gradient is defined as:
$$
\begin{align}
\frac{\partial J(w,b)}{\partial w}  &= \frac{2}{m} \sum\limits_{i = 0}^{m-1} ( \hat{y}^{(i)} - y^{(i)})x^{(i)} \tag{1}\\
  \frac{\partial J(w,b)}{\partial b}  &= \frac{2}{m} \sum\limits_{i = 0}^{m-1} (\hat{y}^{(i)} - y^{(i)}) \tag{2}
\end{align}
$$
Where:
* m is the number of training examples in the data set

* $\hat{y}^{(i)} = f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value
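
For reference, the chain-rule step behind these expressions can be sketched as follows, assuming a linear model $\hat{y}^{(i)} = w x^{(i)} + b$ and the mean squared error loss that the code in this notebook computes with `torch.mean`:

$$
\begin{align}
J(w,b) &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} \left(\hat{y}^{(i)} - y^{(i)}\right)^2 \\
\frac{\partial J(w,b)}{\partial w} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} 2\left(\hat{y}^{(i)} - y^{(i)}\right) \frac{\partial \hat{y}^{(i)}}{\partial w} = \frac{2}{m} \sum\limits_{i = 0}^{m-1} \left(\hat{y}^{(i)} - y^{(i)}\right) x^{(i)} \\
\frac{\partial J(w,b)}{\partial b} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} 2\left(\hat{y}^{(i)} - y^{(i)}\right) \frac{\partial \hat{y}^{(i)}}{\partial b} = \frac{2}{m} \sum\limits_{i = 0}^{m-1} \left(\hat{y}^{(i)} - y^{(i)}\right)
\end{align}
$$

If the loss is instead defined with a factor $\frac{1}{2m}$, a common alternative convention, the factor 2 cancels and the prefactor becomes $\frac{1}{m}$.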

In [11]:
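def compute_gradient(X, y, y_pred):
    """Return the MSE gradients (dw, db) for a linear model.

    X has shape (m, n) with one row per sample; y and y_pred have shape (m,).
    """
    m = X.shape[0]

    # The factor 2/m comes from differentiating mean((y_pred - y)**2)
    dw = (2/m) * np.dot(X.T, (y_pred - y))
    db = (2/m) * np.sum(y_pred - y)

    return dw, db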
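
A quick way to sanity-check a gradient function like this one is to compare it with central finite differences. The sketch below repeats the function so that it runs on its own; the helper `mse`, the `_demo` arrays, the weights `[3, 1]`, and the step size `h` are assumptions made for illustration, with a linear model `X @ w + b` assumed as well.

```python
import numpy as np

def compute_gradient(X, y, y_pred):
    # Same formula as above: MSE gradient with the factor 2/m
    m = X.shape[0]
    dw = (2 / m) * np.dot(X.T, (y_pred - y))
    db = (2 / m) * np.sum(y_pred - y)
    return dw, db

def mse(X, y, w, b):
    # Mean squared error of an assumed linear model X @ w + b
    return np.mean((X @ w + b - y) ** 2)

# Hypothetical data: 2 samples (rows) with 2 features (columns)
X_demo = np.array([[2.0, 2.0], [3.0, 1.0]])
y_demo = np.array([11.0, 15.0])
w_demo = np.array([3.0, 1.0])  # assumed weights, one per feature
b_demo = 1.0

dw_demo, db_demo = compute_gradient(X_demo, y_demo, X_demo @ w_demo + b_demo)

# Central finite differences as an independent check
h = 1e-6
dw_num = np.array([
    (mse(X_demo, y_demo, w_demo + h * e, b_demo)
     - mse(X_demo, y_demo, w_demo - h * e, b_demo)) / (2 * h)
    for e in np.eye(len(w_demo))
])
db_num = (mse(X_demo, y_demo, w_demo, b_demo + h)
          - mse(X_demo, y_demo, w_demo, b_demo - h)) / (2 * h)

print("analytic:", dw_demo, db_demo)
print("numeric: ", dw_num, db_num)
```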

In [12]:
# Define input features (x1 and x2) as tensors 
# and set requires_grad=True to compute gradients with respect to 
# x1 and x2 during the backward pass
x1 = torch.tensor([2, 3], dtype=torch.float32, requires_grad=True)
x2 = torch.tensor([2, 1], dtype=torch.float32, requires_grad=True)

print("x1:", x1)
print("x2:", x2)

# Define weights and bias
w1 = torch.tensor([3, 2], dtype=torch.float32, requires_grad=True)
w2 = torch.tensor([1, 4], dtype=torch.float32, requires_grad=True)
b = torch.tensor(1, dtype=torch.float32, requires_grad=True)

print("w1:", w1)
print("w2:", w2)
print("b:", b)

# Define the forward pass of a simple neural network
# to compute the predicted output y_pred using the defined weights, 
# input features, and bias term
y_pred = x1 * w1 + x2 * w2 + b
print("y_pred:", y_pred)

# Define a target value (ground truth)
y_true = torch.tensor([11, 15], dtype=torch.float32)
print("y_true:", y_true)

# Define a loss function (mean squared error)
# between the predicted output y_pred and the target y_true
loss = torch.mean((y_pred - y_true) ** 2)
print("loss:", loss)

# Perform a backward pass to compute the gradients of the loss
# with respect to w1, w2, and b
loss.backward()

# Access the gradients computed for w1, w2, and b
# The gradients represent the rate of change of the loss with respect to w1, w2, and b
gradient_w1 = w1.grad
gradient_w2 = w2.grad
gradient_b = b.grad

# Print the gradients
print("Gradients of loss with respect to w1:", gradient_w1)
print("Gradients of loss with respect to w2:", gradient_w2)
print("Gradient of loss with respect to b:", gradient_b)

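# Build the (m, n) design matrix: one row per sample, one column per feature
X = np.array(list(zip(x1.detach().numpy(), x2.detach().numpy())))
#print("X:", X)
# w is not used by compute_gradient; it is kept only for inspection
w = np.array(list(zip(w1.detach().numpy(), w2.detach().numpy())))
#print("w:", w)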
y = y_true.detach().numpy()
#print("y:", y)
y_pred = y_pred.detach().numpy()
#print("y_pred:", y_pred)
b = b.detach().numpy()
#print("b:", b)
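# dw and db follow the (2/m) mean squared error gradient; each dw entry is
# the sum of the corresponding per-element autograd gradients printed above
dw, db = compute_gradient(X, y, y_pred)
print("dw:", dw)
print("db:", db)

# Dividing by m once more matches the (1/m) convention of a loss defined
# with 1/(2m), but only because m = 2 here
print("dw:", dw/len(X))
print("db:", db/len(X))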


x1: tensor([2., 3.], requires_grad=True)
x2: tensor([2., 1.], requires_grad=True)
w1: tensor([3., 2.], requires_grad=True)
w2: tensor([1., 4.], requires_grad=True)
b: tensor(1., requires_grad=True)
y_pred: tensor([ 9., 11.], grad_fn=<AddBackward0>)
y_true: tensor([11., 15.])
loss: tensor(10., grad_fn=<MeanBackward0>)
Gradients of loss with respect to w1: tensor([ -4., -12.])
Gradients of loss with respect to w2: tensor([-4., -4.])
Gradient of loss with respect to b: tensor(-6.)
dw: [-16.  -8.]
db: -6.0
dw: [-8. -4.]
db: -3.0
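
The NumPy result `dw: [-16.  -8.]` treats each weight as shared across both samples, whereas the autograd gradients above are per element. A sketch of the equivalent autograd computation follows, assuming a matrix formulation `X @ w + b` with hypothetical shared weights `[3, 1]`, chosen so that the predictions are again `[9, 11]`. The `_t` names are illustrative.

```python
import torch

# One row per sample, one column per feature (x1, x2)
X_t = torch.tensor([[2.0, 2.0], [3.0, 1.0]])
y_t = torch.tensor([11.0, 15.0])

# Hypothetical shared weights: one scalar per feature, plus a bias
w_t = torch.tensor([3.0, 1.0], requires_grad=True)
b_t = torch.tensor(1.0, requires_grad=True)

pred_t = X_t @ w_t + b_t                 # tensor([ 9., 11.])
loss_t = torch.mean((pred_t - y_t) ** 2)  # (4 + 16) / 2 = 10
loss_t.backward()

print("w_t.grad:", w_t.grad)  # tensor([-16.,  -8.])
print("b_t.grad:", b_t.grad)  # tensor(-6.)
```

With shared weights, autograd sums the contributions of all samples, which is what `np.dot(X.T, (y_pred - y))` does in `compute_gradient`.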
