<h2>Tensor</h2>

The autograd package provides **automatic differentiation** for **all operations on Tensors**. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.

`torch.Tensor` is the central class of the package. If you set its attribute `.requires_grad` to `True`, it starts to track **all operations** on it. When you finish your computation, you can call `.backward()` and have **all** the gradients computed automatically. <font color=red>**The gradient for this tensor will be accumulated into its `.grad` attribute.**</font>
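
As a minimal sketch of what "accumulated" means here, the snippet below runs two backward passes on the same leaf tensor. The tensor `w` and the function `2 * w` are illustrative choices, not part of the original notebook.

```python
import torch

w = torch.ones(3, requires_grad=True)

# First backward pass: d(sum(2 * w))/dw is 2 for every element.
(2 * w).sum().backward()
first = w.grad.clone()

# Second backward pass on a freshly built graph: the new gradient
# is added to the value already stored in w.grad.
(2 * w).sum().backward()
second = w.grad.clone()

print(first)   # tensor([2., 2., 2.])
print(second)  # tensor([4., 4., 4.])
```

The second result is the sum of both passes, which is why training loops usually reset gradients between iterations.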

To stop a tensor from tracking history, you can call `.detach()`, which detaches it from the computation history and prevents future computation on it from being tracked.

To prevent tracking history (and using memory), you can also wrap the code block in `with torch.no_grad():`. This can be particularly helpful when evaluating a model, because the model may have trainable parameters with `requires_grad=True` for which we don’t need the gradients.
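
The evaluation case can be sketched as follows. The `nn.Linear` layer is a hypothetical stand-in for a trained model, and the input shape is arbitrary.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)      # hypothetical stand-in for a trained model
inputs = torch.randn(8, 4)   # arbitrary batch of 8 samples

# Inside no_grad(), operations are not recorded in the autograd graph.
with torch.no_grad():
    preds = model(inputs)

print(model.weight.requires_grad)  # True: the parameter itself is unchanged
print(preds.requires_grad)         # False
print(preds.grad_fn)               # None
```

The parameters keep `requires_grad=True`; only the operations executed inside the block go untracked.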

There’s one more class that is very important for the autograd implementation: `Function`.

`Tensor` and `Function` are interconnected and build up an acyclic graph that encodes a complete history of computation. Each tensor has a `.grad_fn` attribute that references the `Function` that created the `Tensor` (except for **Tensors created by the user - their `grad_fn` is `None`**).
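
One way to look at this graph is to follow `grad_fn.next_functions` from a result back toward its inputs. The sketch below only follows the first input edge of each node, which is enough for this small example; the exact node class names may differ between PyTorch versions.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (x + 2).mean()

names = []
node = out.grad_fn
while node is not None:
    names.append(type(node).__name__)
    # Each entry of next_functions is a (node, input_index) pair.
    # Follow only the first edge; stop when a node has no inputs.
    node = node.next_functions[0][0] if node.next_functions else None

print(x.grad_fn)  # None, because x was created by the user
print(names)      # node names from the output back to the leaf
```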

If you want to compute the derivatives, you can call `.backward()` on a `Tensor`. If the `Tensor` is a scalar (i.e. it holds a single element), you don’t need to specify any arguments to `backward()`; <font color=red>however, if it has more elements, you need to specify a `gradient` argument that is a tensor of matching shape.</font>
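
A short sketch of the non-scalar case: calling `backward()` without arguments raises a `RuntimeError`, while passing a `gradient` of matching shape works. The tensor `y = x * 3` is an illustrative choice.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x * 3  # non-scalar result

try:
    y.backward()  # no gradient argument for a 2x2 tensor
    raised = False
except RuntimeError as err:
    raised = True
    print(err)

# Supplying a gradient tensor of the same shape as y succeeds.
y.backward(gradient=torch.ones_like(y))
print(x.grad)  # every element is 3
```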

In [1]:
import torch

In [2]:
# requires_grad

In [3]:
x = torch.ones(2, 2, requires_grad=True)
print(x)
print(x.grad_fn)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
None


In [4]:
y = x + 2
print(y)
# y was created as a result of an operation, so it has a grad_fn.
print(y.grad_fn)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x7f79d4589410>


In [5]:
z = y * y * 3
print(z)
print(z.grad_fn)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)
<MulBackward0 object at 0x7f795837a490>


In [6]:
out = z.mean()
print(out)
print(out.grad_fn)

tensor(27., grad_fn=<MeanBackward0>)
<MeanBackward0 object at 0x7f79d456fe90>


In [7]:
# .requires_grad_( ... ) changes an existing Tensor’s requires_grad
# flag in-place. Its argument defaults to True; a tensor created
# without requires_grad=True starts with the flag set to False.

In [8]:
a = torch.randn(2, 2)

print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

print()

a.requires_grad_(True)

print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

False
None

True
<SumBackward0 object at 0x7f7958140990>


<h2>Gradients</h2>

In [9]:
# back prop

In [10]:
# Because out contains a single scalar, out.backward() is equivalent
# to out.backward(torch.tensor(1.)).
# retain_graph=True keeps the graph so later cells can call backward() again.
out.backward(retain_graph=True)

$$
\begin{aligned}
  \frac{\partial out}{\partial x} |_{x=1}
  &= \frac{\partial out}{\partial z} 
    * \frac{\partial z}{\partial y} 
    * \frac{\partial y}{\partial x} |_{x=1} \\
  &= \frac{\partial \frac{1}{4} (z_{0, 0} + z_{0, 1} + z_{1, 0} + z_{1, 1})}{\partial z} 
    * \frac{\partial 3y^2}{\partial y} 
    * \frac{\partial (x + 2)}{\partial x} |_{x=1} \\
  &= \begin{bmatrix}
    \frac{1}{4} & \frac{1}{4} \\
    \frac{1}{4} & \frac{1}{4} \\
    \end{bmatrix}
    * \frac{\partial 3y^2}{\partial y} 
    * \frac{\partial (x + 2)}{\partial x} |_{x=1} \\
  &= \frac{1}{4} * 6y * 1 |_{x=1} \\
  &= \frac{3}{2} (x + 2) |_{x=1} \\
  &= \begin{bmatrix}
    4.5 & 4.5 \\
    4.5 & 4.5 \\
    \end{bmatrix}
\end{aligned}
$$
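
The closed form above, `3/2 * (x + 2)`, can be checked numerically. This sketch rebuilds the same computation in a self-contained way and uses `torch.autograd.grad`, which returns the gradient instead of accumulating it into `.grad`.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()

# autograd.grad returns a tuple with one gradient per input.
(grad,) = torch.autograd.grad(out, x)

# Closed-form result from the derivation: 3/2 * (x + 2).
expected = 1.5 * (x.detach() + 2)

print(grad)
print(torch.allclose(grad, expected))  # True
```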

In [11]:
# prints gradients d(out) / dx
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


In [12]:
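# non-leaf tensors do not retain their gradients by default, so .grad is None
# (call .retain_grad() on a tensor before backward() to keep its gradient)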
print(y.grad)
print(z.grad)
print(out.grad)

None
None
None
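
If the gradient of an intermediate tensor is needed, one option is `retain_grad()`. The sketch below repeats the computation from the earlier cells in a self-contained form; it is an illustration, not part of the original notebook.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
y.retain_grad()  # ask autograd to keep the gradient of this non-leaf tensor

out = (y * y * 3).mean()
out.backward()

print(x.is_leaf, y.is_leaf)  # True False
print(y.grad)                # d(out)/dy = 6y / 4 = 4.5 for every element
```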


In [13]:
# gradients accumulate across backward() calls, so reset x.grad to get a fresh one
x.grad = torch.zeros(x.grad.size())
# z is not a scalar, so backward() needs a gradient argument of matching shape
z.backward(torch.ones(2, 2))

In [14]:
print(x.grad)

tensor([[18., 18.],
        [18., 18.]])
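
An in-place reset with `x.grad.zero_()` is a common alternative to assigning a new zero tensor. A minimal self-contained sketch, assuming the same `z = 3 * (x + 2)^2` as in the cells above:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
z = 3 * (x + 2) ** 2

# First pass: dz/dx = 6 * (x + 2) = 18 for every element.
z.backward(torch.ones(2, 2), retain_graph=True)
first = x.grad.clone()

# Reset the stored gradient in place, then run a second pass.
x.grad.zero_()
z.backward(torch.ones(2, 2))
second = x.grad.clone()

print(first)
print(second)  # same as first, because the old gradient was cleared
```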


In [15]:
x.grad = torch.zeros(x.grad.size())
y.backward(torch.ones(2, 2))

In [16]:
print(x.grad)

tensor([[1., 1.],
        [1., 1.]])


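<font color=red>For a non-scalar tensor, `backward(gradient)` computes a vector-Jacobian product: the `gradient` argument weights each output element. With `torch.ones(2, 2)` the result equals the gradient of the sum of all elements, so `z.backward(torch.ones(2, 2))` gives `6y = 18` and `y.backward(torch.ones(2, 2))` gives `1`.</font>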

In [17]:
# an example of vector-Jacobian product

In [18]:
x = torch.randn(3, requires_grad=True)

y = x * 2
while y.detach().norm() < 1000:
    y = y * 2
print(y)

tensor([-1234.2517,  -409.7135,  -501.1111], grad_fn=<MulBackward0>)


In [19]:
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)

print(x.grad)

tensor([1.0240e+02, 1.0240e+03, 1.0240e-01])


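<font color=red>In this run `x` was doubled ten times in total, so `y = 1024 * x`. The Jacobian of `y` with respect to `x` is therefore `1024 * I`, and `y.backward(v)` stores the vector-Jacobian product `v * 1024` in `x.grad`.</font>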
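
The result of `y.backward(v)` can be compared with an explicitly computed Jacobian. In this sketch the function `f` uses a fixed scale of 1024 as a hypothetical stand-in for the doubling loop (whose iteration count depends on the random input), and `torch.autograd.functional.jacobian` builds the full Jacobian matrix.

```python
import torch
from torch.autograd.functional import jacobian


def f(x):
    # Hypothetical fixed scale standing in for the doubling loop.
    return x * 1024


x = torch.randn(3, requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)

# backward(v) accumulates the vector-Jacobian product v^T J into x.grad.
y = f(x)
y.backward(v)

# Build the full 3x3 Jacobian and form the same product explicitly.
J = jacobian(f, x.detach())
vjp = v @ J

print(x.grad)
print(torch.allclose(x.grad, vjp))  # True
```

Autograd never materializes `J` during `backward()`; the explicit matrix is built here only for comparison.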

In [20]:
# two ways to stop autograd from tracking history: torch.no_grad() and .detach()

In [21]:
print(x.requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

True
False


In [22]:
print(x.requires_grad)

y = x.detach()

print(y.requires_grad)

# y holds the same values as x, so the element-wise comparison is all True
print(x.eq(y))

True
False
tensor([True, True, True])
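
Note that `detach()` returns a tensor that shares storage with the original, so in-place changes to one are visible in the other. A small sketch, using `clone()` as one way to get an independent copy:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x.detach()

# Both tensors point at the same underlying memory.
print(y.data_ptr() == x.data_ptr())  # True

# An in-place write through y also changes the values seen through x.
y[0] = 5.0
print(x)  # first element is now 5

# detach().clone() gives a copy with its own storage.
c = x.detach().clone()
c[1] = -1.0
print(x[1].item())  # still 1.0
```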
