# 2. Autograd

The autograd package provides automatic differentiation for operations on Tensors. More precisely, `torch.autograd` is an engine for computing vector-Jacobian products: it records the operations applied to a tensor and applies the chain rule backward through them to obtain partial derivatives.
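
To make the vector-Jacobian product concrete, here is a minimal sketch with a non-scalar output. The input values and the vector `v` are arbitrary choices for illustration and do not come from the notebook. For a non-scalar tensor, `backward()` needs a `gradient` argument of the same shape, which plays the role of the vector in the product.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * x  # non-scalar output; the Jacobian dy/dx is diag(2 * x)

# Arbitrary vector chosen for illustration (an assumption, not from the notebook).
v = torch.tensor([1.0, 0.1, 0.01])

# Computes v^T J and accumulates the result into x.grad.
y.backward(gradient=v)

print(x.grad)  # element-wise v * 2 * x
```

When the output is a scalar, as with a typical loss value, the `gradient` argument can be omitted.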

To track operations on a tensor, set `requires_grad=True`:

In [23]:
import torch

# requires_grad = True -> tracks all operations on the tensor in a computational graph. 
x = torch.randn(3, requires_grad=True)   
y = x + 2

# y was created as a result of an operation, so it has a grad_fn attribute.
# grad_fn: references a Function that has created the Tensor

print("x:",x) # created by the user -> grad_fn is None
print("x.grad_fn:",x.grad_fn)

print("\ny:",y)
print("y.grad_fn:",y.grad_fn)

x: tensor([-0.2688,  0.8487, -0.0750], requires_grad=True)
x.grad_fn: None

y: tensor([1.7312, 2.8487, 1.9250], grad_fn=<AddBackward0>)
y.grad_fn: <AddBackward0 object at 0x00000188B5EAFAC0>


#### Why does `grad_fn` matter?
Later, when you call `.backward()` on a tensor computed from `x`, PyTorch starts at that tensor's `grad_fn` and walks back through the graph of recorded operations (e.g., `x + 2`, then whatever came after it), computing gradients for every tensor involved that has `requires_grad=True`.
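
The backward walk described above can be sketched by following `grad_fn.next_functions` by hand. The tensor names `t`, `u` and `out` are hypothetical and chosen so they do not clash with the notebook's variables. The loop only follows the first parent of each node, which is enough for this simple chain but not for a general graph.

```python
import torch

t = torch.ones(3, requires_grad=True)  # leaf tensor created by the user
u = t + 2
out = (u * u * 3).mean()

node = out.grad_fn
chain = []
while node is not None:
    chain.append(type(node).__name__)
    # next_functions holds (node, input_index) pairs; entries are None for
    # inputs that do not require gradients (such as the constants 2 and 3).
    parents = [fn for fn, _ in node.next_functions if fn is not None]
    node = parents[0] if parents else None

print(" -> ".join(chain))
```

The chain ends at an `AccumulateGrad` node, which is where the gradient for the leaf tensor gets written into `.grad`.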

In [24]:
# Do more operations on y
z = y * y * 3
print("z:",z)
print("z.grad_fn:",z.grad_fn)

z = z.mean()
print("\nz:",z)
print("z.grad_fn:",z.grad_fn)

z: tensor([ 8.9912, 24.3458, 11.1170], grad_fn=<MulBackward0>)
z.grad_fn: <MulBackward0 object at 0x00000188B5EAE3E0>

z: tensor(14.8180, grad_fn=<MeanBackward0>)
z.grad_fn: <MeanBackward0 object at 0x00000188CCB1DD20>


In [25]:
# Compute the gradients with backpropagation.
# Once the computation is finished, call .backward() to have all gradients computed automatically.
# The gradient for each leaf tensor is accumulated into its .grad attribute.
# It is the partial derivative of the output w.r.t. that tensor.

print(x.grad)
z.backward()   # here it computes dz/dx
print(x.grad) # dz/dx

# Careful: backward() accumulates gradients into .grad; it does not overwrite them.
# During training, reset the gradients before every backward pass (each iteration), e.g. with optimizer.zero_grad(). This is very important.

None
tensor([3.4624, 5.6975, 3.8500])
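
The accumulation behaviour is easiest to see by calling `backward()` twice without resetting. This is a minimal sketch with a made-up tensor `w` and a toy loss, not the notebook's `x` and `z`.

```python
import torch

w = torch.ones(3, requires_grad=True)

for step in range(2):
    loss = (w * 2).sum()  # d(loss)/dw = 2 for every element
    loss.backward()
    print(step, w.grad)   # the second pass adds to the first

accumulated = w.grad.clone()

w.grad.zero_()            # reset in place before the next backward pass
loss = (w * 2).sum()
loss.backward()
print(w.grad)             # back to the gradient of a single pass
```

With an optimizer, `optimizer.zero_grad()` does the same reset for all the parameters it manages.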


#### Stop a tensor from tracking history:
Some operations should not be part of the gradient computation, for example updating the weights during the training loop or evaluating the model after training. There are three ways to stop a tensor from tracking history:
- `x.requires_grad_(False)`
- `x.detach()`
- wrap in `with torch.no_grad():`

In [None]:
# .requires_grad_(...) changes an existing flag in-place.
a = torch.randn(2, 2)  # by default requires_grad = False
b = (a * a).sum()
print(a.requires_grad)
print(b.grad_fn)

# Equivalent alternative: a = torch.randn(2, 2, requires_grad=True)
a.requires_grad_(True)
b = (a * a).sum()
print(a.requires_grad)
print(b.grad_fn)

False
None
True
<SumBackward0 object at 0x00000188B5EAE5C0>
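
The cell above turns tracking on. The opposite direction, `requires_grad_(False)`, is a common way to freeze a parameter. The following is a small sketch that assumes a `torch.nn.Linear` layer purely for illustration; the layer and its sizes are not part of the notebook.

```python
import torch

layer = torch.nn.Linear(2, 1)
layer.weight.requires_grad_(False)  # freeze the weight; the bias stays trainable

out = layer(torch.ones(1, 2)).sum()
out.backward()

print(layer.weight.grad)  # None: nothing was tracked for the frozen weight
print(layer.bias.grad)    # the bias still receives a gradient
```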


In [27]:
# .detach(): get a new Tensor with the same content but no gradient computation:
a = torch.randn(2, 2, requires_grad=True)
b = a.detach()
print(a.requires_grad)
print(b.requires_grad)

True
False
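
One detail worth knowing about `.detach()`: the new tensor shares its underlying storage with the original, so in-place edits are visible through both. A short sketch with illustrative values:

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a.detach()

print(a.data_ptr() == b.data_ptr())  # same underlying memory

b[0] = 5.0  # in-place edit of the detached tensor
print(a)    # the change shows up in a as well

c = a.detach().clone()  # clone() gives an independent copy
c[1] = 7.0
print(a)    # unaffected by the edit to c
```

If an independent copy is needed, chaining `.detach().clone()` as in the last lines avoids the shared storage.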


In [34]:
# This pattern shows up often during model evaluation.

# wrap in 'with torch.no_grad():'
a = torch.randn(2, 2, requires_grad=True)
print(a.requires_grad)
with torch.no_grad():
    b = a ** 2
    print(b.requires_grad)

True
False
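
The other use case mentioned earlier, updating the weights during training, can be sketched as a single manual gradient-descent step. The tensor `w`, the toy loss and the learning rate `lr` are hypothetical values chosen for illustration.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
lr = 0.1  # hypothetical learning rate

loss = (w ** 2).sum()  # d(loss)/dw = 2 * w
loss.backward()

with torch.no_grad():
    w -= lr * w.grad   # in-place update that is not recorded in the graph
w.grad.zero_()         # clear the gradient before the next iteration

print(w)
print(w.requires_grad)  # still True: w keeps tracking outside the block
```

Without the `torch.no_grad()` block, the in-place update on a leaf tensor that requires gradients would raise an error.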
