## With the basics done (all of which felt very familiar), I'm moving on to the next segment of the class, which dives a little deeper into the fundamentals of PyTorch.

## 2. Variables and Gradients

### 2.1 Variables
* A variable wraps a Tensor
* Allows accumulation of gradients

In [2]:
import torch
from torch.autograd import Variable

In [3]:
a = Variable(torch.ones(2,2), requires_grad=True)
a

Variable containing:
 1  1
 1  1
[torch.FloatTensor of size 2x2]

In [4]:
torch.ones(2,2) #Not a variable


 1  1
 1  1
[torch.FloatTensor of size 2x2]

In [5]:
#Behaves very much like a Tensor
b = Variable(torch.ones(2,2), requires_grad=True)
print(a + b)
print(torch.add(a,b))

Variable containing:
 2  2
 2  2
[torch.FloatTensor of size 2x2]

Variable containing:
 2  2
 2  2
[torch.FloatTensor of size 2x2]



In [6]:
print(a*b)
print(torch.mul(a,b))

Variable containing:
 1  1
 1  1
[torch.FloatTensor of size 2x2]

Variable containing:
 1  1
 1  1
[torch.FloatTensor of size 2x2]



### 2.2 Gradients

#### What exactly is requires_grad?

* Allows calculation of gradients w.r.t. the variable

In [7]:
x = Variable(torch.ones(2), requires_grad=True)
x

Variable containing:
 1
 1
[torch.FloatTensor of size 2]

In [8]:
y = 5 * (x+1)**2
y

Variable containing:
 20
 20
[torch.FloatTensor of size 2]

In [9]:
#Whoa, cool, you can do algebra with torch Variables

#### Backward should be called only on a scalar (i.e. 1-element tensor) or with gradient w.r.t. the variable

I'm not sure what this means, but maybe it won't be too critical going forward...
Let's reduce y to a scalar then...

In [10]:
o = (1/2) * torch.sum(y)  #Sum the elements of y and divide by 2 (the length of y), i.e. take the average
o

Variable containing:
 20
[torch.FloatTensor of size 1]

#### Recap y equation: $y_i=5(x_i+1)^2$
#### Recap o equation: $o=\frac12\sum_iy_i$
#### Substitute y into o equation: $o=\frac12\sum_i5(x_i+1)^2$

$$ \frac{\partial o}{\partial x_i} = \frac12[10(x_i+1)]$$
$$ \frac{\partial o}{\partial x_i}|_{x_i=1}=\frac12[10(1+1)]=\frac{10}2(2)=10 $$
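
A quick sanity check on the derivation above, sketched in plain Python so it does not depend on any particular PyTorch version. It compares the analytic gradient $\frac12[10(x_i+1)]$ against a central finite-difference estimate at $x_i=1$. The helper names `o_fn` and `numerical_grad` and the step size `h` are my own choices for illustration, not anything from the class.

```python
def o_fn(xs):
    """o = 1/2 * sum_i 5 * (x_i + 1)^2, the same formula as in the recap."""
    return 0.5 * sum(5 * (x + 1) ** 2 for x in xs)


def numerical_grad(f, xs, h=1e-6):
    """Estimate df/dx_i with a central difference, one coordinate at a time."""
    grads = []
    for i in range(len(xs)):
        up = list(xs)
        down = list(xs)
        up[i] += h
        down[i] -= h
        grads.append((f(up) - f(down)) / (2 * h))
    return grads


point = [1.0, 1.0]
approx = numerical_grad(o_fn, point)              # finite-difference estimate
analytic = [0.5 * 10 * (x + 1) for x in point]    # 1/2 * 10 * (x_i + 1)

print(approx)
print(analytic)
```

Both lists should agree closely, with each entry near 10, which is the value the hand derivation predicts for every component of the gradient.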

In [11]:
o.backward() #backward calculates the gradients

In [12]:
x.grad

Variable containing:
 10
 10
[torch.FloatTensor of size 2]

In [13]:
#backward in detail
o.backward(torch.FloatTensor([1.0,1.0]))
x.grad

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

The cell above didn't work the way it did in the video, but I can go back and search on the error message (LOL, O RLY). Gradients make enough sense, but backward doesn't really click yet. I should dig into the docs to find out what the heck is going on with backward.
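
As far as I can tell, two separate things are going on with backward here, and the sketch below tries to isolate them. It assumes a newer PyTorch (0.4 or later), where a plain tensor created with `requires_grad=True` plays the role of `Variable`; that is an assumption on my part, since the cells above use the older `Variable` API. First, the error message says the graph's buffers are freed after one backward pass unless `retain_graph=True` is passed, and a second pass adds to `x.grad` rather than replacing it. Second, the "scalar or with gradient" rule seems to mean that backward on a non-scalar tensor needs a `gradient` argument of the same shape, which acts as a weight on each output element. The variable names `first`, `second`, `third`, and `y2` are only for illustration.

```python
import torch

# Assumption: PyTorch >= 0.4, where tensors track gradients without Variable
x = torch.ones(2, requires_grad=True)
y = 5 * (x + 1) ** 2
o = 0.5 * torch.sum(y)

# 1) Keep the graph alive so backward can be called a second time
o.backward(retain_graph=True)
first = x.grad.clone()

# The second pass works now, and its gradients are ADDED to x.grad
o.backward()
second = x.grad.clone()

# 2) Backward on a non-scalar: reset the accumulated gradient first,
#    then pass one weight per element of y2 (same shape as y2)
x.grad.zero_()
y2 = 5 * (x + 1) ** 2
y2.backward(gradient=torch.tensor([0.5, 0.5]))
third = x.grad.clone()

print(first)   # gradient of o after one pass
print(second)  # accumulated gradient after two passes
print(third)   # weighted gradient of y2; weights of 0.5 mirror the 1/2 in o
```

Under these assumptions, `first` and `third` should match the hand-derived value of 10 per element, and `second` should be double that because of accumulation. That would also explain the failing cell: `o` holds a single element, so a two-element `gradient` does not fit it, and the graph had already been freed by the earlier `o.backward()`.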