Central to all neural networks in PyTorch is the autograd package. It provides automatic differentiation for all operations on tensors. It is a define-by-run framework, which means that backprop is defined by how the code is run, and that every single iteration can be different.
<br><br>
torch.Tensor is the central class of the package.<br><br>
If you set .requires_grad to True on a tensor (for example with .requires_grad_(True)), it starts tracking all operations on it.<br><br>
When the computation is finished, you can call .backward() and have all the gradients computed automatically.<br><br>
The gradient for the tensor is accumulated into its .grad attribute.<br><br>
To stop a tensor from tracking history, call .detach() to detach it from the computation history and to prevent future computation from being tracked.

To prevent tracking history, you can also wrap a code block in with torch.no_grad():

This is particularly helpful when evaluating a model, because the model may have trainable parameters with requires_grad=True for which we don't need the gradients.<br><br>
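
The evaluation case described above can be sketched as follows. This is only a minimal illustration: the tiny torch.nn.Linear model and the random inputs are assumptions made for the example and are not part of the notebook.

```python
import torch

# Hypothetical tiny model and random inputs, used only for illustration.
model = torch.nn.Linear(3, 1)
inputs = torch.randn(4, 3)

# The parameters are trainable, so a normal forward pass is tracked.
tracked = model(inputs)

# Inside no_grad() the same forward pass records no history.
with torch.no_grad():
    untracked = model(inputs)

print(tracked.requires_grad)    # True
print(untracked.requires_grad)  # False
print(untracked.grad_fn)        # None
```

The parameters themselves keep requires_grad=True; only the results computed inside the block are untracked.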



# Tensor and Function

Tensor and Function are interconnected and build up an acyclic graph that encodes the complete history of computation. Each tensor has a grad_fn attribute that references the Function that created it (for tensors created directly by the user, grad_fn is None). <br><br>

If you want to compute the derivatives, call .backward() on a Tensor. If the tensor is a scalar (it holds a single element), backward() needs no arguments.<br><br>

If it has more elements, you need to specify a gradient argument that is a tensor of matching shape.
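
A minimal sketch of the non-scalar case, assuming a small vector example that is not part of the notebook (the names x, y and v are chosen only for illustration). The tensor passed to backward() weights each element of the output, so the result is a vector-Jacobian product.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2                      # y has three elements, so it is not a scalar

# For a non-scalar output, backward() needs a tensor with the same shape as y.
v = torch.tensor([0.1, 1.0, 10.0])
y.backward(v)

# dy_i/dx_i = 2, so each gradient entry is 2 * v_i.
print(x.grad)
```

Calling y.backward() here without an argument would raise an error, because autograd can only create the gradient implicitly for scalar outputs.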

In [13]:
import torch

In [14]:
# Create a tensor and set requires_grad=True to track computation on it
X=torch.ones(2,2,requires_grad=True)
print(X)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


In [15]:
#Tensor operation
y=X+2
print(y)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


In [16]:
#y was created as a result of operation, so it has a grad_fn
print(y.grad_fn)

<AddBackward0 object at 0x7fd2f194da20>


In [17]:
# More operations on y
z=y*y*3
out=z.mean()
print(z, out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)


In [18]:
# .requires_grad_(...) changes an existing Tensor's requires_grad flag in place. The flag defaults to False when a tensor is created without it.


In [19]:
a=torch.randn(2,2)
a=((a*3)/(a-1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b=(a*a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x7fd2f04ea9b0>


# Gradients

Backpropagate now. Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).

In [20]:
out.backward()

In [21]:
print(X.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
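
The value 4.5 can be checked by hand: out is the mean over four elements of 3(x+2)^2, so its derivative with respect to each element is (3/2)(x+2), which is 4.5 at x=1. The sketch below repeats the computation on a fresh tensor (the name w is an assumption, chosen so that X is left untouched) and also shows that a second backward pass adds to .grad rather than replacing it.

```python
import torch

w = torch.ones(2, 2, requires_grad=True)

out = ((w + 2) ** 2 * 3).mean()
out.backward()
expected = 1.5 * (w.detach() + 2)            # analytic gradient, 4.5 everywhere
print(torch.allclose(w.grad, expected))      # True

# A second backward pass accumulates into the existing .grad.
out = ((w + 2) ** 2 * 3).mean()
out.backward()
print(torch.allclose(w.grad, 2 * expected))  # True

# Reset the accumulated gradient in place before the next pass.
w.grad.zero_()
print(w.grad.sum().item())                   # 0.0
```

This accumulation is the reason gradients are usually zeroed between iterations.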


In [22]:
# We can stop autograd from tracking history on a tensor in two ways
# 1st way: wrap the code in torch.no_grad()
print(X.requires_grad)
print((X**2).requires_grad)
with torch.no_grad():
    print((X**2).requires_grad)

True
True
False


In [24]:
# 2nd way, by calling detach
print(X.requires_grad)
y=X.detach()
print(y.requires_grad)
print(X.eq(y).all())

True
False
tensor(True)

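
One point about detach() that is easy to miss: the detached tensor has the same values because it shares memory with the original. A minimal sketch, using fresh tensors whose names (src, det, independent) are assumptions made for the example:

```python
import torch

src = torch.ones(2, 2, requires_grad=True)
det = src.detach()

print(det.requires_grad)                 # False
print(det.data_ptr() == src.data_ptr())  # True: same underlying memory

# clone() makes an independent copy when one is needed.
independent = src.detach().clone()
independent[0, 0] = 5.0
print(src[0, 0].item())                  # 1.0, src is unchanged
```

Because the memory is shared, an in-place change to det would also change the values seen through src; cloning avoids that.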