# PyTorch Computational Graphs

PyTorch computational graphs are dynamic graphs.

A PyTorch Tensor is essentially an n-dimensional array. The framework provides many functions for operating on these Tensors.

In PyTorch: 
* The autograd package provides automatic differentiation to automate the computation of the backward passes in neural networks. 

* The forward pass of your network defines the computational graph; 

    * nodes in the graph are Tensors

    * edges are functions that produce the output Tensors from input Tensors.
    
    * Back-propagation through this graph then gives the gradients.

Every Tensor in PyTorch has a flag, `requires_grad`, that allows fine-grained exclusion of subgraphs from gradient computation, which can improve efficiency. If x is a Tensor with `x.requires_grad=True`, then after a backward pass `x.grad` is another Tensor holding the gradient of some scalar value with respect to x.
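To make the role of `x.grad` concrete, here is a minimal sketch. The tensor values and the scalar function y = sum(x²) are chosen purely for illustration and are not part of the original notebook.

```python
import torch

# x is a user-created (leaf) Tensor that tracks gradients
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# y is a scalar: y = x_1^2 + x_2^2 + x_3^2
y = (x ** 2).sum()

# Back-propagate from the scalar y; autograd stores dy/dx = 2 * x in x.grad
y.backward()

print(x.grad)  # tensor([2., 4., 6.])
```

Before `y.backward()` is called, `x.grad` is simply `None`.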

In [1]:
import torch

x = torch.randn(3, 3)  # requires_grad=False by default
y = torch.randn(3, 3)  # requires_grad=False by default
z = torch.randn((3, 3), requires_grad=True)
a = x + y  # neither x nor y requires gradients, so a doesn't either
print(a.requires_grad)  # output: False
b = a + z  # z requires gradients, so b does too
print(b.requires_grad)  # output: True

False
True


As the example above shows, if even a single input to an operation requires gradient, its output will also require gradient. Conversely, the output won't require gradient only if none of the inputs require it.
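The same rule is what lets you exclude part of a computation from gradient tracking. The sketch below is illustrative only: the names `w_frozen` and `w_trained` are hypothetical, standing in for a fixed weight matrix and a trainable one.

```python
import torch

w_frozen = torch.randn(3, 3)                       # requires_grad=False: treated as a constant
w_trained = torch.randn(3, 3, requires_grad=True)  # gradients are tracked for this Tensor
x = torch.randn(3)

h = w_frozen @ x             # no input requires gradients, so nothing is recorded for h
out = (w_trained @ h).sum()  # w_trained requires gradients, so out does too
out.backward()

print(h.requires_grad)       # False
print(w_frozen.grad)         # None
print(w_trained.grad.shape)  # torch.Size([3, 3])
```

Only the part of the graph that depends on `w_trained` is differentiated; the subgraph that produced `h` is skipped entirely.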

## Autograd 

Conceptually, autograd keeps a graph recording of all of the operations that created the data as you execute operations, giving you a `directed acyclic graph` whose leaves are the input tensors and roots are the output tensors. By tracing this graph from roots to leaves, you can automatically compute the gradients using the `chain rule (back-propagation)`.

Internally, autograd represents this graph as a graph of Function objects, which can be apply()-ed to compute the result of evaluating the graph. When computing the forward pass, autograd simultaneously performs the requested computations and builds up a graph representing the function that computes the gradient (the .grad_fn attribute of each torch.Tensor is an entry point into this graph). When the forward pass has completed, the graph is evaluated in the backward pass to compute the gradients.

The computational graphs in PyTorch are dynamic: they are recreated from scratch at every iteration, which is exactly what allows arbitrary Python control flow statements to change the overall shape and size of the graph at every iteration. You don’t have to encode all possible paths before you launch the training; what you run is what you differentiate.
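A small sketch of what "dynamic" means in practice: an ordinary Python loop decides how many multiplications end up in the graph, and the gradient follows whatever was actually executed. The helper `repeated_scale` and the values used are hypothetical, chosen only for illustration.

```python
import torch

def repeated_scale(x, w, n_steps):
    """Multiply x by w n_steps times; the loop length sets the depth of the graph."""
    out = x
    for _ in range(n_steps):
        out = out * w
    return out

x = torch.tensor(1.5)
grads = {}
for n_steps in (1, 3):
    w = torch.tensor(2.0, requires_grad=True)
    y = repeated_scale(x, w, n_steps)  # y = x * w**n_steps
    y.backward()                       # dy/dw = n_steps * x * w**(n_steps - 1)
    grads[n_steps] = w.grad.item()

print(grads)  # {1: 1.5, 3: 18.0}
```

Each call builds a fresh graph with one or three multiplication nodes, and no graph has to be declared ahead of time.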

Every primitive autograd operator is really two functions that operate on Tensors. The forward function computes output Tensors from input Tensors. The backward function receives the gradient of some scalar with respect to the output Tensors and computes the gradient of that same scalar with respect to the input Tensors.
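The forward/backward pairing can be made concrete by writing a small operator by hand as a subclass of `torch.autograd.Function`. The `Square` class below is a hypothetical example and not one of PyTorch's built-in operators; it is a minimal sketch of the pattern.

```python
import torch

class Square(torch.autograd.Function):
    """Hypothetical example operator computing y = x * x."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # stash the input for use in the backward pass
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # grad_output is d(scalar)/dy; the chain rule gives d(scalar)/dx = grad_output * 2x
        return grad_output * 2 * x

x = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
loss = Square.apply(x).sum()  # custom Functions are invoked through .apply()
loss.backward()

print(x.grad)  # the gradient is 2 * x, i.e. 2, -4, 6
```

The `backward` method never sees the loss itself, only the incoming gradient `grad_output`, which is what allows operators to be chained freely.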

To summarize, Tensor and Function are interconnected and build up an acyclic graph that encodes a complete history of the computation. Each Tensor has a .grad_fn attribute that references the Function that created it (except for Tensors created by the user, whose grad_fn is None). To compute the derivatives, call .backward() on a Tensor. After the call to backward(), the gradient values are stored as Tensors in the .grad attribute of the leaf Tensors.

These concepts can be represented by the following diagram.

<img src="figs/0_p9_fUhKXCf0LWAxh.png">

For example, suppose you create two Tensors a and b that require gradients and then compute c = a/b. The grad_fn of c is DivBackward (displayed as `DivBackward0` in recent releases), which is the backward function for the / operator. As discussed earlier, the collection of these grad_fn objects makes up the backward graph. The forward and backward functions are methods of torch.autograd.Function, and you can define your own autograd operator by defining a subclass of torch.autograd.Function.
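The division example can be inspected directly. This is a sketch with illustrative values for a and b; the exact class names printed may differ between PyTorch versions.

```python
import torch

a = torch.tensor(6.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = a / b

print(a.grad_fn)                 # None: a was created by the user
print(type(c.grad_fn).__name__)  # e.g. DivBackward0 in recent releases

# next_functions links c's backward node to the nodes of its inputs;
# for leaf Tensors these are AccumulateGrad nodes that write into .grad
parents = [type(fn).__name__ for fn, _ in c.grad_fn.next_functions]
print(parents)

c.backward()
print(a.grad)  # dc/da = 1 / b
print(b.grad)  # dc/db = -a / b**2
```

Following `next_functions` from the root is how the backward pass walks the graph from outputs down to the leaves.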

is_leaf: By convention, all Tensors with requires_grad set to False are leaf Tensors. Tensors with requires_grad set to True are leaf Tensors only if they were created by the user. This means that they are not the result of an operation and so their grad_fn is None. Only leaf Tensors have their grad populated during a call to backward(). To get grad populated for non-leaf Tensors, call retain_grad() on them before the backward pass.

In [1]:
import torch

# Build the graph: a, b, c, d are leaf nodes and e is the root node.
# The graph is constructed line by line as the code runs, since
# computational graphs are dynamic in PyTorch.
a = torch.tensor([2.0], requires_grad=True)
b = torch.tensor([3.0], requires_grad=True)
c = torch.tensor([5.0], requires_grad=True)
d = torch.tensor([10.0], requires_grad=True)
u = a * b
t = torch.log(d)
v = t * c
t.retain_grad()  # t is a non-leaf Tensor; keep its gradient after backward()
e = u + v

In [2]:
print(a.is_leaf)
print(a.grad_fn)
print(a.grad)
print()

print(e.is_leaf)
print(e.grad_fn)
print(e.grad)
print()

print(t.is_leaf)
print(t.grad_fn)
print(t.grad)

True
None
None

False
<AddBackward0 object at 0x7f86305835e0>
None

False
<LogBackward0 object at 0x7f86305835e0>
None




Leaf Tensors that require gradients have no grad_fn but do receive gradients. Non-leaf Tensors have a grad_fn but do not keep gradients unless retain_grad() was called on them. Before backward() is called, every grad attribute is None.

In [3]:
from IPython.display import display, Math

e.backward()
display(Math(fr'\frac{{\partial e}}{{\partial a}} = {a.grad.item()}'))
print()
display(Math(fr'\frac{{\partial e}}{{\partial b}} = {b.grad.item()}'))
print()
display(Math(fr'\frac{{\partial e}}{{\partial c}} = {c.grad.item()}'))
print()
display(Math(fr'\frac{{\partial e}}{{\partial d}} = {d.grad.item()}'))

$\frac{\partial e}{\partial a} = 3.0$

$\frac{\partial e}{\partial b} = 2.0$

$\frac{\partial e}{\partial c} \approx 2.3026$

$\frac{\partial e}{\partial d} = 0.5$

In [4]:
print(a.is_leaf)
print(a.grad_fn)
print(a.grad)
print()

print(e.is_leaf)
print(e.grad_fn)
print(e.grad)
print()

print(t.is_leaf)
print(t.grad_fn)
print(t.grad)

True
None
tensor([3.])

False
<AddBackward0 object at 0x7f86305a3e20>
None

False
<LogBackward0 object at 0x7f86305a3e20>
tensor([5.])


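As a cross-check, the gradients computed by autograd can be compared with derivatives worked out by hand. Since e = a·b + c·log(d), the chain rule gives ∂e/∂a = b, ∂e/∂b = a, ∂e/∂c = log(d), ∂e/∂d = c/d, and ∂e/∂t = c for the intermediate t = log(d). The sketch below rebuilds the same graph and checks these values; the `expected` dictionary and the tolerance are choices made for this illustration.

```python
import math
import torch

a = torch.tensor([2.0], requires_grad=True)
b = torch.tensor([3.0], requires_grad=True)
c = torch.tensor([5.0], requires_grad=True)
d = torch.tensor([10.0], requires_grad=True)

t = torch.log(d)
t.retain_grad()  # keep the gradient of the non-leaf Tensor t
e = a * b + t * c
e.backward()

# Hand-derived values from the chain rule
expected = {
    "a": 3.0,             # de/da = b
    "b": 2.0,             # de/db = a
    "c": math.log(10.0),  # de/dc = log(d)
    "d": 0.5,             # de/dd = c / d
    "t": 5.0,             # de/dt = c
}
computed = {"a": a.grad, "b": b.grad, "c": c.grad, "d": d.grad, "t": t.grad}

for name, grad in computed.items():
    # float32 arithmetic, so compare with a small relative tolerance
    assert math.isclose(grad.item(), expected[name], rel_tol=1e-6), name
print("autograd matches the hand-derived gradients")
```

The values for a and t agree with the `tensor([3.])` and `tensor([5.])` printed in the last cell.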
