##  2. Autograd

#### Importing libraries

In [1]:
import torch

#### Table of contents

* [Autograd](#first-bullet)
* [Tensor](#second-bullet)
* [Function](#third-bullet)
* [Examples](#forth-bullet)
* [Gradients](#fifth-bullet)
    * Calling the gradient with a 1-element tensor
    * Calling the gradient with a tensor of more than 1 element
    * Stopping autograd from tracking the history on Tensors

## Autograd <a class="anchor" id="first-bullet"></a>

Central to all neural networks in PyTorch is the `autograd` package
    - It provides automatic differentiation for all operations on Tensors.
    - It is a define-by-run framework
        - Backprop is defined by how your code runs, so every single iteration can be different
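
As a minimal sketch of what define-by-run means in practice, the snippet below uses an ordinary Python loop to decide how many operations get recorded, so the graph differs between two passes. The helper name `repeated_double` is hypothetical and only serves this illustration.

```python
import torch

def repeated_double(x, n_steps):
    # The graph is recorded while this loop runs, so its depth depends on n_steps.
    y = x
    for _ in range(n_steps):
        y = y * 2
    return y.sum()

x = torch.ones(3, requires_grad=True)

# First pass: two doublings, so d(out)/dx = 2 ** 2 = 4 per element.
repeated_double(x, 2).backward()
grad_two_steps = x.grad.clone()

# Reset the accumulated gradient before the second pass.
x.grad.zero_()

# Second pass: three doublings, so d(out)/dx = 2 ** 3 = 8 per element.
repeated_double(x, 3).backward()
grad_three_steps = x.grad.clone()

print(grad_two_steps)
print(grad_three_steps)
```

The same Python function produces a different graph on each call, and the gradient follows whatever was actually executed.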

## Tensor <a class="anchor" id="second-bullet"></a>

`torch.Tensor` is the central class of the package
    - If `.requires_grad` is set to `True`, autograd starts to track all the operations on the tensor.
        - When we are done with our computation, we can call `.backward()` and have all the gradients computed automatically
        - The gradient for this tensor is accumulated into its `.grad` attribute
    
    - To stop a tensor from tracking history, we can call `.detach()`, which returns a new tensor detached from the computation history
    - To prevent tracking history (and using memory for it), we can also wrap the code block in `with torch.no_grad():`
        - Useful during evaluation, when the code block has trainable parameters with `requires_grad=True` but we do not need their gradients
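
The point about accumulation is easy to miss, so here is a small sketch of it. The tensor `w` and the expression `3 * w` are made up for illustration; the idea is only that a second `.backward()` adds to `.grad` instead of overwriting it.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d(sum(3 * w))/dw = 3 for every element.
(3 * w).sum().backward()
first_grad = w.grad.clone()

# Second backward pass without resetting: the new gradient is added to the old one.
(3 * w).sum().backward()
accumulated_grad = w.grad.clone()

# Zero the gradient in place to start fresh.
w.grad.zero_()
reset_grad = w.grad.clone()

print(first_grad, accumulated_grad, reset_grad)
```

This is why gradients are usually zeroed between iterations when the accumulated value is not wanted.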

## Function <a class="anchor" id="third-bullet"></a>

Tensor and Function are interconnected and build up an acyclic graph that encodes a complete history of computation
    - Each tensor has a `.grad_fn` attribute that references the Function that created the Tensor
        - If the tensor was created by the user, `.grad_fn` is `None`
    - If we want to compute the derivatives, we call `.backward()` on a tensor
    - If the Tensor is a scalar (i.e., it holds 1 element), we don't have to specify any arguments
    - Otherwise, we have to specify a `gradient` argument that is a tensor of matching shape
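
The rules above can be checked with a short sketch. The tensors here are illustrative; the snippet only assumes that calling `.backward()` without arguments on a non-scalar tensor raises a `RuntimeError`.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)  # created by the user
y = x * 3                                  # created by an operation

leaf_has_no_grad_fn = x.grad_fn is None
result_has_grad_fn = y.grad_fn is not None

# y is not a scalar, so backward() without arguments is rejected.
try:
    y.backward()
    raised = False
except RuntimeError:
    raised = True

# Passing a tensor of matching shape works; with all ones, x.grad is dy/dx = 3.
y.backward(torch.ones_like(y))

print(leaf_has_no_grad_fn, result_has_grad_fn, raised)
print(x.grad)
```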

## Examples <a class="anchor" id="forth-bullet"></a>

#### Create a tensor that requires gradient to track computation

In [7]:
x = torch.ones(2,2, requires_grad=True)
print(x)
print()

print("grad_fn of x : ")
print(x.grad_fn)

tensor([[ 1.,  1.],
        [ 1.,  1.]])

grad_fn of x : 
None


#### Do an operation on the tensor:

In this case, y was created by an operation and not by the user --> Hence it has a grad_fn.

In [4]:
y = x + 2
print(y)

tensor([[ 3.,  3.],
        [ 3.,  3.]])


In [5]:
print(y.grad_fn)

<AddBackward0 object at 0x00000159912B3B00>


#### Setting requires_grad for a Tensor with requires_grad_()

In [21]:
a = torch.ones(2,2)
a = ((a * 3) / (a  + 1)) 
print(a)
print(a.requires_grad) # a was computed only from tensors that do not require grad, so requires_grad is False and grad_fn is None
print(a.grad_fn)
print()

# We can set requires_grad to True in place with requires_grad_() --> its argument defaults to True if omitted
a.requires_grad_(True)
print("a.requires_grad after calling the function is:")
print(a.requires_grad)
print()

# b has grad_fn because it was created from an operation
b = (a * a).sum()
print(b)
print(b.grad_fn)

tensor([[ 1.5000,  1.5000],
        [ 1.5000,  1.5000]])
False
None

a.requires_grad after calling the function is:
True

tensor(9.)
<SumBackward0 object at 0x0000015994475780>


## Gradients <a class="anchor" id="fifth-bullet"></a>

Let's finally do a backprop now!


#### Creating the tensor

In [50]:
x = torch.ones(2,2, requires_grad = True)
y = x + 2
z = y * y + 3 # Recall y = x + 2 and x = torch.ones
print(z)
print()

print("Creating out tensor")
out = z.mean()
print(out)

tensor([[ 12.,  12.],
        [ 12.,  12.]])

Creating out tensor
tensor(12.)


#### Calling backprop on a 1-element tensor

Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.))

In [41]:
out.backward()

In [42]:
print(x.grad)

tensor([[ 1.5000,  1.5000],
        [ 1.5000,  1.5000]])
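
The value 1.5 can be derived by hand. With out = (1/4) * sum(z_i) and z_i = (x_i + 2)^2 + 3, the derivative is d(out)/dx_i = (x_i + 2) / 2, which is 1.5 at x_i = 1. The sketch below rebuilds the same computation and compares autograd against that formula; the variable names `expected` and `matches` are introduced here for illustration.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = ((x + 2) ** 2 + 3).mean()
out.backward()

# Hand-derived gradient: d(out)/dx_i = (x_i + 2) / 2
expected = (x.detach() + 2) / 2
matches = torch.allclose(x.grad, expected)
print(matches)
```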


#### Calling backprop on a tensor with more than 1 element

In [55]:
x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000: # L2 norm of y: sqrt(sum(element ** 2))
    y = y * 2

print(y)

tensor([-1513.8644,  -498.0468,  1011.6290])


In [68]:
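# y is not a scalar, so backward() needs a gradient tensor of matching shape.
# Caution: with an all-zero vector, a fresh x.grad would be all zeros; the nonzero output
# printed below presumably reflects gradients accumulated over earlier runs with other vectors.
gradients = torch.tensor([0, 0, 0], dtype = torch.float)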
y.backward(gradients)
print(x.grad)

tensor([  512.0000,  5120.0000,  2048.1128])
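
When `y` is not a scalar, `y.backward(v)` computes a vector-Jacobian product: each entry of `v` weights the corresponding component of `y`. The sketch below is one way to see this, assuming a fixed input instead of the random one and an arbitrary weight vector `v` chosen only for illustration. Because `y` is just `x` doubled several times, the Jacobian is diagonal and the gradient is `v` scaled by that power of two.

```python
import torch

x = torch.tensor([1.0, -2.0, 0.5], requires_grad=True)

y = x * 2
n_doublings = 1
while y.detach().norm() < 1000:
    y = y * 2
    n_doublings += 1

# v weights each component of y in the vector-Jacobian product.
v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)

# The Jacobian of y with respect to x is diagonal with entries 2 ** n_doublings.
expected = v * 2 ** n_doublings
print(x.grad)
print(expected)
```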


#### Stopping autograd from tracking history on Tensors

In [82]:
x = torch.ones(2,2, requires_grad= True)
print(x.requires_grad)
print((x ** 2).requires_grad)
print()

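# Way 1 - detach x: detach() returns a new tensor that does not require grad
x_detached = x.detach()
print(x_detached.requires_grad)

# Way 2 - wrap execution
# We wrap the executions that do not require tracking under torch.no_grad()
with torch.no_grad():
    print((x ** 2).requires_grad)  # False, even though x still requires grad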

True
True

False
False
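
To make the difference between the two approaches more concrete, here is a small sketch. The tensor `weight` stands in for a trainable parameter and is an assumption of this example, as are the other variable names.

```python
import torch

weight = torch.ones(2, 2, requires_grad=True)  # stands in for a trainable parameter

# Inside no_grad(), results are not tracked even though weight requires grad.
with torch.no_grad():
    eval_out = weight * 2
tracked_inside = eval_out.requires_grad

# Outside the block, tracking resumes; weight itself is unchanged.
train_out = weight * 2
tracked_outside = train_out.requires_grad

# detach() returns a new tensor that shares data with weight but is cut off from the graph.
frozen = weight.detach()

print(tracked_inside, tracked_outside, frozen.requires_grad, weight.requires_grad)
```

`torch.no_grad()` switches tracking off for a whole block without touching the tensors, while `.detach()` gives an untracked view of one specific tensor.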
