#### A simple exploration of automatic differentiation with PyTorch

Some references:
* https://marksaroufim.medium.com/automatic-differentiation-step-by-step-24240f97a6e6
* [PyTorch Tutorial](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#gradients)


#### Example 1

$y = f(x_1,x_2) = x_1x_2-\sin(x_2)$ \
The goal is to compute the derivatives of $f$ with respect to $x_1$ and $x_2$ at:\
$f'(x_1=2,x_2=3)$

First, some hand calculations, treating each operation as a node of the computational graph:

In [1]:
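import math

x1 = 2
x2 = 3

# Forward pass: each operation is one node of the computational graph.
node_1 = x1                # input x1
node_2 = x2                # input x2
node_3 = node_1 * node_2   # x1 * x2
node_4 = math.sin(node_2)  # sin(x2)
node_5 = node_3 - node_4   # x1 * x2 - sin(x2)
print(f'Final result is {node_5}')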

Final result is 5.858879991940133


#### Forward-mode differentiation - derivative with respect to x2

Computational graph - forward-mode differentiation

$\frac{dn_1}{dx_2} = \frac{dx_1}{dx_2} = 0$\
$\frac{dn_2}{dx_2} = \frac{dx_2}{dx_2} = 1$\
$\frac{dn_3}{dx_2} = \frac{d(x_1x_2)}{dx_2} = \frac{dx_1}{dx_2}x_2 + \frac{dx_2}{dx_2}x_1 = x_1$\
$\frac{dn_4}{dx_2} = \cos(n_2)\frac{dn_2}{dx_2} = \cos(n_2)$\
$\frac{dn_5}{dx_2} = \frac{dn_3}{dx_2}-\frac{dn_4}{dx_2} = x_1 - \cos(x_2)$


In [2]:
dn1_x2 = 0  # dx1/dx2 = 0
dn2_x2 = 1  # dx2/dx2 = 1
dn3_x2 = dn1_x2*node_2 + dn2_x2*node_1  # product rule: d(x1*x2)/dx2 = dx1/dx2*x2 + dx2/dx2*x1
dn4_x2 = math.cos(node_2)*dn2_x2  # chain rule: d sin(x2)/dx2 = cos(x2)*dx2/dx2
dn5_x2 = dn3_x2 - dn4_x2
print(f'Final derivative relative to x2 is {dn5_x2}')
print(f'node3 derivative is {dn3_x2}')
print(f'node4 derivative is {dn4_x2}')
print(f'node2 derivative is {dn2_x2}')
print(f'node1 derivative is {dn1_x2}')

Final derivative relative to x2 is 2.989992496600445
node3 derivative is 2
node4 derivative is -0.9899924966004454
node2 derivative is 1
node1 derivative is 0
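
The hand calculation above can be automated with dual numbers, where each value carries its derivative along with it. The following is a minimal sketch, not how PyTorch works internally; the `Dual` class and `dual_sin` helper are hypothetical names introduced only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative with respect to one chosen input."""
    val: float
    dot: float

    def __mul__(self, other):
        # Product rule: (u*v)' = u'*v + u*v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    def __sub__(self, other):
        # Difference rule: (u - v)' = u' - v'
        return Dual(self.val - other.val, self.dot - other.dot)


def dual_sin(u):
    # Chain rule: sin(u)' = cos(u) * u'
    return Dual(math.sin(u.val), math.cos(u.val) * u.dot)


def f(x1, x2):
    return x1 * x2 - dual_sin(x2)


# Seed dx1/dx2 = 0 and dx2/dx2 = 1 to get df/dx2 in one forward pass.
df_dx2 = f(Dual(2.0, 0.0), Dual(3.0, 1.0)).dot
# Seeding x1 instead gives df/dx1, at the cost of a second forward pass.
df_dx1 = f(Dual(2.0, 1.0), Dual(3.0, 0.0)).dot
print(df_dx2, df_dx1)
```

Forward mode needs one pass per input, which is why reverse mode is usually preferred when a scalar output depends on many inputs.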


#### Reverse-mode differentiation - chain rule

$y = f(x_1,x_2) = x_1x_2-\sin(x_2)$ \
The goal is to compute:\
$f'(x_1=2,x_2=3)$


Here $dn_i$ denotes the derivative of the output with respect to node $i$, propagated from the output back to the inputs.

$dn_5$ = $\frac{dn_5}{dn_5}$ = $1$

$dn_4$ = $dn_5 * \frac{dn_5}{dn_4}$ = $-1$ (node 4 enters node 5 with a negative sign)\
$dn_3$ = $dn_5 * \frac{dn_5}{dn_3}$ = $1$\
$dn_2$ = $dn_4 * \frac{dn_4}{dn_2} + dn_3 * \frac{dn_3}{dn_2}$ = $dn_4 * \cos(n_2) + dn_3 * \frac{d(n_1 * n_2)}{dn_2}$ = $-1 * \cos(3) + 1 * 2 \approx 2.99$\
$dn_1$ = $dn_3 * \frac{dn_3}{dn_1}$ = $1 * n_2 = 3$


In [3]:
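# output node: derivative of the output with respect to itself
d5 = 1
# local derivatives of node 5 with respect to its inputs
d5_d4 = -1
d5_d3 = 1
# derivatives of the output with respect to nodes 4 and 3
d4 = d5*d5_d4
d3 = d5*d5_d3
# local derivatives with respect to node 2 (x2)
d4_d2 = math.cos(node_2)  # d sin(n2)/dn2
d3_d2 = node_1            # d(n1*n2)/dn2

# x2 feeds both node 3 and node 4, so its derivative sums both paths
d2 = d4*d4_d2 + d3*d3_d2
d3_d1 = node_2  # d(n1*n2)/dn1
d1 = d3 * d3_d1
print(f'Final children x2 derivative is {d2}')
print(f'Final Children x1 derivative is {d1}')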

Final children x2 derivative is 2.989992496600445
Final Children x1 derivative is 3
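
A quick way to sanity-check hand-derived gradients like these is a central finite difference. The sketch below assumes a step size of `1e-6`, and `central_diff` is a hypothetical helper written only for this check.

```python
import math


def f(x1, x2):
    return x1 * x2 - math.sin(x2)


def central_diff(func, args, i, h=1e-6):
    """Approximate d func / d args[i] with a symmetric difference."""
    hi = list(args)
    lo = list(args)
    hi[i] += h
    lo[i] -= h
    return (func(*hi) - func(*lo)) / (2 * h)


point = (2.0, 3.0)
fd_dx1 = central_diff(f, point, 0)  # should be close to x2 = 3
fd_dx2 = central_diff(f, point, 1)  # should be close to x1 - cos(x2)
print(fd_dx1, fd_dx2)
```

The numerical values only agree with the analytic ones up to truncation and rounding error, so compare them with a tolerance rather than for equality.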


#### Test on PyTorch

PyTorch's autograd applies the same chain rule to compute the derivatives with respect to both inputs.

In [3]:
import torch
x1 = torch.FloatTensor([2])
x1.requires_grad = True
x2 = torch.FloatTensor([3])
x2.requires_grad = True
loss = x1*x2 - torch.sin(x2)
# loss_mean = loss.mean()


In [5]:
loss.backward()
print(x2.grad)
print(x1.grad)

tensor([2.9900])
tensor([3.])


#### Retain Graph

$y_1 = f_1(x_1,x_2) = x_1x_2-\sin(x_2)$\
$y_2 = f_2(x_1,x_2) = x_1x_2+\sin(x_2)$



In [6]:
x1 = torch.FloatTensor([2])
x1.requires_grad = True
x2 = torch.FloatTensor([3])
x2.requires_grad = True
loss1 = x1*x2 - torch.sin(x2)
loss2 = x1*x2 + torch.sin(x2)
loss1.backward()
print("The gradient of x2 from loss 1 is {}".format(x2.grad))

The gradient of x2 from loss 1 is tensor([2.9900])


In [7]:
x1 = torch.FloatTensor([2])
x1.requires_grad = True
x2 = torch.FloatTensor([3])
x2.requires_grad = True
loss1 = x1*x2 - torch.sin(x2)
loss2 = x1*x2 + torch.sin(x2)
loss2.backward()
print("The gradient of x2 from loss 2 is {}".format(x2.grad))


The gradient of x2 from loss 2 is tensor([1.0100])


In [8]:
x1 = torch.FloatTensor([2])
x1.requires_grad = True
x2 = torch.FloatTensor([3])
x2.requires_grad = True
loss1 = x1*x2 - torch.sin(x2)
loss2 = x1*x2 + torch.sin(x2)
loss1.backward()
loss2.backward()
print("The gradient of x2 accumulated over multiple backward passes is {}".format(x2.grad))

The gradient of x2 accumulated over multiple backward passes is tensor([4.])


In [9]:
x1 = torch.FloatTensor([2])
x1.requires_grad = True
x2 = torch.FloatTensor([3])
x2.requires_grad = True
loss1 = x1*x2 - torch.sin(x2)
loss2 = x1*x2 + torch.sin(x2)
loss = loss1 + loss2
loss.backward()
print("The gradient of x2 from one backward pass over the summed loss is {}".format(x2.grad))

The gradient of x2 from one backward pass over the summed loss is tensor([4.])


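You can backpropagate through the same graph twice by passing retain_graph=True to the first call. The gradients accumulate across calls, so the result doubles.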

In [10]:
x1 = torch.FloatTensor([2])
x1.requires_grad = True
x2 = torch.FloatTensor([3])
x2.requires_grad = True
loss1 = x1*x2 - torch.sin(x2)
loss2 = x1*x2 + torch.sin(x2)
loss = loss1 + loss2
loss.backward(retain_graph=True)
loss.backward()
print("The gradient of x2 from two backward passes over the summed loss is {}".format(x2.grad))

The gradient of x2 from two backward passes over the summed loss is tensor([8.])
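
The jump from 4 to 8 happens because each backward pass adds to the stored gradient instead of overwriting it. The pure-Python sketch below imitates that behaviour on the same function; the `Var` class and `sin` helper are hypothetical and are not PyTorch's implementation, which also frees saved tensors unless told otherwise.

```python
import math


class Var:
    """Hypothetical scalar graph node, for illustration only."""

    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents  # tuples of (parent_node, local_derivative)
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.val * other.val, ((self, other.val), (other, self.val)))

    def __add__(self, other):
        return Var(self.val + other.val, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        return Var(self.val - other.val, ((self, 1.0), (other, -1.0)))

    def backward(self, upstream=1.0):
        # Accumulate (+=) rather than overwrite, then follow every path
        # to the parents. Path-by-path recursion is simple but slow on
        # large graphs; it is enough for this small example.
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)


def sin(v):
    return Var(math.sin(v.val), ((v, math.cos(v.val)),))


x1, x2 = Var(2.0), Var(3.0)
loss = (x1 * x2 - sin(x2)) + (x1 * x2 + sin(x2))

loss.backward()
first = x2.grad   # contribution of one backward pass
loss.backward()
second = x2.grad  # the second pass is added on top of the first
print(first, second)
```

Resetting the stored gradient to zero between passes would give the single-pass value again, which is the reason training loops in PyTorch clear gradients before each backward call.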


In [11]:
x1 = torch.FloatTensor([2])
x1.requires_grad = True
x2 = torch.FloatTensor([3])
x2.requires_grad = True
loss1 = x1*x2 - torch.sin(x2)
loss2 = x1*x2 + torch.sin(x2)
loss = loss1 + loss2*0.5
loss.backward()
print("The gradient of x2 from one backward pass over the weighted sum is {}".format(x2.grad))

The gradient of x2 from one backward pass over the weighted sum is tensor([3.4950])
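
As a check by hand, the weighted result follows from the two individual gradients, since differentiation is linear:

$\frac{\partial}{\partial x_2}\left(y_1 + 0.5\,y_2\right) = \left(x_1 - \cos(x_2)\right) + 0.5\left(x_1 + \cos(x_2)\right) = 1.5\,x_1 - 0.5\cos(x_2)$\
At $x_1=2, x_2=3$: $1.5 \cdot 2 - 0.5\cos(3) \approx 3 + 0.4950 = 3.4950$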


#### You must pass retain_graph=True to backpropagate twice when the losses share intermediate variables

In [13]:
x1 = torch.FloatTensor([2])
x1.requires_grad = True
x2 = torch.FloatTensor([3])
x2.requires_grad = True

# y is an intermediate variable.
y = x1 * x2  

# Two losses that depend on the intermediate variable y.
loss1 = y - torch.sin(x2)
loss2 = y + torch.sin(x2)

# Backpropagating on loss1 requires us to retain the graph if we want to backpropagate on loss2 afterwards,
# because loss2 depends on the same y that was created before.
loss1.backward(retain_graph=True)  # retain_graph=True is needed if we want to backpropagate through y again.
loss2.backward()  # Now it's fine without retain_graph=True because this is the last time we backpropagate.

print(x1.grad)  # The gradient of x1 has accumulated from both backward passes.
print(x2.grad)  # The gradient of x2 has also accumulated from both backward passes.

tensor([6.])
tensor([4.])


#### Shared intermediate variables without retain_graph=True (raises an error)

In [4]:
x1 = torch.FloatTensor([2])
x1.requires_grad = True
x2 = torch.FloatTensor([3])
x2.requires_grad = True

# y is an intermediate variable.
y = x1 * x2  

# Two losses that depend on the intermediate variable y.
loss1 = y - torch.sin(x2)
loss2 = y + torch.sin(x2)

# Without retain_graph=True, the first backward pass frees the tensors saved for y = x1 * x2.
loss1.backward()
# The second backward pass goes through the same y node, whose saved tensors are gone, so it raises a RuntimeError.
loss2.backward()

# Not reached: the call above raises before the gradients are printed.
print(x1.grad)
print(x2.grad)

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.