# Variables and Gradients

## 2.1 Variables

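* A `Variable` wraps a tensor
* It lets autograd accumulate gradients for that tensor in its `.grad` attribute
* Since PyTorch 0.4, `Variable` is deprecated: it returns an ordinary tensor, and `requires_grad=True` can be set on a tensor directly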

In [61]:
import torch
from torch.autograd import Variable

In [62]:
# create our first variable (float fill value: integer tensors cannot require gradients)
a = Variable(torch.full((2, 3), 3.0), requires_grad=True); a

tensor([[3., 3., 3.],
        [3., 3., 3.]], requires_grad=True)

In [63]:
# a plain tensor: requires_grad defaults to False
torch.full((2, 3), 3.0)

tensor([[3., 3., 3.],
        [3., 3., 3.]])
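
As a quick check of what separates the two cells above, the sketch below compares the `requires_grad` flag of a plain tensor with that of a wrapped one. It assumes PyTorch 0.4 or later, where `Variable(...)` returns an ordinary tensor. The names `plain`, `wrapped`, and `direct` are illustrative and do not appear in the original notebook.

```python
import torch
from torch.autograd import Variable

plain = torch.full((2, 3), 3.0)                                   # no gradient tracking
wrapped = Variable(torch.full((2, 3), 3.0), requires_grad=True)   # legacy wrapper
direct = torch.full((2, 3), 3.0, requires_grad=True)              # same effect, no wrapper

print(plain.requires_grad)                 # False
print(wrapped.requires_grad)               # True
print(isinstance(wrapped, torch.Tensor))   # True: the wrapper returns a tensor
print(torch.equal(wrapped, direct))        # True: same values
```

In other words, the only practical difference here is the flag, which tells autograd to record operations on the tensor.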

In [64]:
# create another variable
b = Variable(torch.full((2, 3), 3.0), requires_grad=True)

In [65]:
# arithmetic works as with plain tensors; the result records a grad_fn
a + b

tensor([[6., 6., 6.],
        [6., 6., 6.]], grad_fn=<AddBackward0>)

In [66]:
a * b

tensor([[9., 9., 9.],
        [9., 9., 9.]], grad_fn=<MulBackward0>)

In [67]:
# in-place ops on a leaf that requires grad raise a RuntimeError
# a.mul_(b)
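
To make the commented-out cell concrete, here is a minimal sketch of what happens when the in-place multiply is attempted. It assumes a recent PyTorch version and uses tensors created directly with `requires_grad=True`. The `raised` flag and the name `c` are illustrative.

```python
import torch

a = torch.full((2, 3), 3.0, requires_grad=True)
b = torch.full((2, 3), 3.0, requires_grad=True)

try:
    a.mul_(b)  # in-place multiply on a leaf tensor that requires grad
    raised = False
except RuntimeError as err:
    raised = True
    print("in-place op rejected:", err)

# The out-of-place multiply creates a new tensor and is tracked by autograd
c = a * b
print(c.grad_fn)
```

Autograd rejects the in-place version because overwriting a leaf would destroy the value it needs to compute gradients later.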

## 2.2 Gradients

#### What is requires_grad?

* Allows calculation of gradients with respect to the variable

$$y_{i} = 5(x_{i} + 1)^{2}$$

In [68]:
x = Variable(torch.ones(2), requires_grad=True)
# uncomment to show random gradients
# x = Variable(torch.rand((2, 3)), requires_grad=True)
x

tensor([1., 1.], requires_grad=True)

$$y_{i}\mid_{x_{i}=1}=5(1+1)^{2}=5(2)^2 =5(4)=20$$

In [69]:
# define the function in PyTorch; the scalars 5 and 1 broadcast over x
# both elements are 20 because x is a vector of ones
y = 5 * (x + 1) ** 2; y

tensor([20., 20.], grad_fn=<MulBackward0>)

**`backward()` should only be called on a scalar (i.e. a 1-element tensor), or be given a `gradient` argument of the same shape as the tensor it is called on**

* So let's reduce to a scalar

$$o=\frac{1}{n}\sum y_{i}=\frac{1}{2}\sum y_{i}$$

where n = 2 because y has two elements, [20, 20]

In [70]:
o = (1/2) * torch.sum(y); o
# uncomment to show random differentiation
# o = (1/6) * torch.sum(y); o

tensor(20., grad_fn=<MulBackward0>)
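
The other option named above, calling `backward()` on the non-scalar `y` with an explicit gradient, can be sketched as follows. This is an illustration and not part of the original notebook: it rebuilds `x` and `y` from scratch, and the weights of 1/2 are chosen so that the result matches the scalar reduction `o`. The `implicit_ok` flag is illustrative.

```python
import torch

x = torch.ones(2, requires_grad=True)
y = 5 * (x + 1) ** 2  # non-scalar output, shape (2,)

try:
    y.backward()  # no gradient argument on a non-scalar output
    implicit_ok = True
except RuntimeError as err:
    implicit_ok = False
    print("backward() rejected:", err)

# Weighting each y_i by 1/2 mirrors o = (1/2) * sum(y)
y.backward(gradient=torch.full_like(y, 0.5))
print(x.grad)
```

The `gradient` tensor plays the role of the derivative of some downstream scalar with respect to each `y_i`, which is why weights of 1/2 give the same gradients as reducing to `o` first.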

**Recap y equation:** $$y_{i}=5(x_{i}+1)^{2}$$

**Recap o equation:** $$o=\frac{1}{2}\sum y_{i}$$

**Substitute y into o equation:** $$o=\frac{1}{2}\sum 5(x_{i}+1)^{2}$$

**Differentiate:** $$\frac{\partial o}{\partial x_{i}} = \frac{1}{2}[10(x_{i}+1)]$$

$$\frac{\partial o}{\partial x_{i}}\mid_{x_{i}=1} = \frac{1}{2}[10(1+1)]=\frac{10}{2}(2)=10$$

In [71]:
# backpropagate from o: this fills x.grad and leaves the value of o unchanged
o.backward(); o

tensor(20., grad_fn=<MulBackward0>)

In [72]:
x.grad

tensor([10., 10.])
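
Gradients accumulate in `x.grad` across calls to `backward()`, not replace it. The sketch below illustrates this by running the same computation twice on a fresh `x`. The names `first` and `second` are illustrative, and the graph is rebuilt before the second call because the first call frees it by default.

```python
import torch

x = torch.ones(2, requires_grad=True)

o = 0.5 * torch.sum(5 * (x + 1) ** 2)
o.backward()
first = x.grad.clone()   # gradient after one backward pass

# Rebuild the graph and backpropagate again: the new gradient is added to x.grad
o = 0.5 * torch.sum(5 * (x + 1) ** 2)
o.backward()
second = x.grad.clone()  # twice the single-pass gradient

x.grad.zero_()           # reset before the next independent computation
print(first, second, x.grad)
```

This is why training loops typically zero the gradients before each backward pass.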

## Summary

1. Variable
    * Wraps a tensor so that gradients can accumulate in its `.grad` attribute
2. Gradients
    * Define the original equation
    * Evaluate the equation at the x values
    * Reduce to a scalar output o by taking the mean
    * Calculate gradients with o.backward()
    * Then access the gradients with respect to x through x.grad
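
The steps above can be sketched end to end with random inputs, following the variant hinted at in the commented-out lines of the notebook (a 2x3 input and a factor of 1/6). This is a minimal sketch, not the notebook's own cell: the seed, the use of `y.mean()`, and the name `expected` are assumptions made for illustration.

```python
import torch

torch.manual_seed(0)                        # arbitrary seed, for repeatability
x = torch.rand((2, 3), requires_grad=True)

y = 5 * (x + 1) ** 2                        # original equation, element-wise
o = y.mean()                                # same as (1/6) * torch.sum(y) for six elements
o.backward()                                # fills x.grad

# Analytic gradient: d o / d x_i = (1/n) * 10 * (x_i + 1), with n = 6
expected = 10 * (x.detach() + 1) / x.numel()
print(torch.allclose(x.grad, expected))
```

With random inputs each element gets a different gradient, which makes it easier to see that autograd agrees with the hand-derived formula element by element.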