# 1. Variables and Gradients

## 1.1 Variables

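- A `Variable` wraps a Tensor
- It allows gradients to be accumulated (in `.grad`)
- Since PyTorch 0.4, `Variable` and `Tensor` are merged: `Variable(...)` returns a plain `torch.Tensor`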

In [1]:
import torch
from torch.autograd import Variable

In [2]:
a = Variable(torch.ones((3, 2)), requires_grad=True)
a

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]], requires_grad=True)

In [3]:
type(a)

torch.Tensor
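Because `type(a)` is `torch.Tensor`, the `Variable` wrapper is no longer needed on PyTorch 0.4 or later. A minimal sketch of the equivalent direct construction follows; the name `a_modern` is made up for this illustration.

```python
import torch

# Create the tensor directly with requires_grad=True, without Variable.
a_modern = torch.ones((3, 2), requires_grad=True)

print(type(a_modern))          # same class as the Variable-wrapped tensor
print(a_modern.requires_grad)  # gradient tracking is switched on
print(a_modern.grad)           # None until backward() has been called
```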

In [4]:
b = Variable(torch.ones((3, 2)), requires_grad=True)
b

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]], requires_grad=True)

In [6]:
print(a+b)
print(torch.add(a, b))

tensor([[2., 2.],
        [2., 2.],
        [2., 2.]], grad_fn=<AddBackward0>)
tensor([[2., 2.],
        [2., 2.],
        [2., 2.]], grad_fn=<AddBackward0>)


In [7]:
print(a*b)
print(torch.mul(a, b))

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]], grad_fn=<MulBackward0>)
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]], grad_fn=<MulBackward0>)
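The `grad_fn` shown in the outputs above is how autograd records which operation produced a tensor. As a small sketch (reusing the same values as `a` and `b`, with `c` as a name chosen for this illustration), tensors created by the user are leaves with no `grad_fn`, while results of operations carry one:

```python
import torch

a = torch.ones((3, 2), requires_grad=True)
b = torch.ones((3, 2), requires_grad=True)
c = a * b

print(a.is_leaf, a.grad_fn)   # user-created leaf: grad_fn is None
print(c.is_leaf, c.grad_fn)   # result of an op: records the multiplication
```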


#### Useful links

- https://discuss.pytorch.org/t/what-is-the-difference-between-tensors-and-variables-in-pytorch/4914/4
- https://medium.com/@layog/a-comprehensive-overview-of-pytorch-7f70b061963f
- https://discuss.pytorch.org/t/tensor-and-variable-are-the-same-now/19749

## 1.2 Gradients 

----------------------------------------------------------------
### (Watch the PyTorch Gradient Tutorial if needed)

**What exactly is `requires_grad`?**
- Setting `requires_grad=True` tells autograd to track operations on the tensor, so that gradients w.r.t. it can be calculated
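To make the effect of `requires_grad` concrete, here is a minimal sketch comparing a tracked and an untracked tensor. The names `w`, `c`, and `loss` are made up for this illustration.

```python
import torch

w = torch.ones(2, requires_grad=True)   # tracked by autograd
c = torch.ones(2)                       # not tracked (requires_grad defaults to False)

loss = torch.sum(w * c * 3)
loss.backward()

print(w.grad)   # gradient of loss w.r.t. w
print(c.grad)   # None: no gradient is computed for an untracked tensor
```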


**Gradient basics:**
- A gradient is a vector.
- Its components are the partial derivatives of a function, and it points in the direction of the function's greatest rate of increase.
- For example, for the function 𝑓(𝑥₁, ..., 𝑥ₙ), the gradient consists of the n partial derivatives ∂𝑓/∂𝑥ᵢ; taken over all points, it defines a vector field.
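As a small worked example of the points above, take the hypothetical function f(x1, x2) = x1² + 3·x2, whose gradient is (2·x1, 3). The sketch below compares that hand-derived result with what autograd computes at the point (2, 5); the function and the evaluation point are chosen only for illustration.

```python
import torch

def f(x):
    # Hypothetical example function: f(x1, x2) = x1^2 + 3*x2
    return x[0] ** 2 + 3 * x[1]

x = torch.tensor([2.0, 5.0], requires_grad=True)
f(x).backward()

# Hand-derived gradient (df/dx1, df/dx2) = (2*x1, 3) at x1 = 2
analytic = torch.tensor([2 * 2.0, 3.0])

print(x.grad)
print(torch.allclose(x.grad, analytic))
```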

In [34]:
x = Variable(torch.ones((2)), requires_grad=True)
x

tensor([1., 1.], requires_grad=True)

In [35]:
y = 5 * (x + 1) ** 2
y

tensor([20., 20.], grad_fn=<MulBackward0>)

#### `backward()` should be called only
- On a scalar (i.e. a 1-element tensor)
- Or, on a non-scalar tensor, with a `gradient` argument of the same shape as that tensor
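The second case can be sketched as follows. For a non-scalar tensor, the `gradient` argument acts as a weight vector `v`, and `backward` computes the gradient of `sum(v * z)` w.r.t. the input. The names `u`, `z`, and `v`, and the weights themselves, are assumptions made for this illustration.

```python
import torch

u = torch.ones(2, requires_grad=True)
z = 5 * (u + 1) ** 2          # non-scalar, shape (2,)

# z.backward() with no argument would raise an error because z is not a scalar.
# Passing v computes the gradient of sum(v * z) w.r.t. u.
v = torch.tensor([1.0, 0.1])
z.backward(gradient=v)

print(u.grad)   # elementwise v * dz/du, where dz/du = 10 * (u + 1)
```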



Let's reduce `y` to a scalar, then: its mean value (with two elements, that is half the sum).

In [36]:
o = (1/2) * torch.sum(y)
o

tensor(20., grad_fn=<MulBackward0>)

In [37]:
o.backward()
x.grad

tensor([10., 10.])
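The value `[10., 10.]` can be checked by hand: o = ½ · Σ 5(xᵢ + 1)², so ∂o/∂xᵢ = 5(xᵢ + 1), which is 10 at xᵢ = 1. As a rough sanity check, the sketch below compares autograd with central finite differences. The helper `o_of`, the step size `eps`, the name `x64`, and the use of float64 are all assumptions made for this illustration.

```python
import torch

def o_of(x):
    # Same reduction as above: half the sum of 5 * (x + 1)^2
    return 0.5 * torch.sum(5 * (x + 1) ** 2)

x64 = torch.ones(2, dtype=torch.float64, requires_grad=True)
o_of(x64).backward()

# Central finite differences, one coordinate at a time (eps is an arbitrary choice).
eps = 1e-6
numeric = torch.zeros(2, dtype=torch.float64)
with torch.no_grad():
    for i in range(2):
        step = torch.zeros(2, dtype=torch.float64)
        step[i] = eps
        numeric[i] = (o_of(x64 + step) - o_of(x64 - step)) / (2 * eps)

print(x64.grad)                                      # autograd result
print(torch.allclose(x64.grad, numeric, atol=1e-6))  # agreement with the numeric estimate
```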

#### Backward in detail: calling `backward()` a second time

In [38]:
o.backward(torch.FloatTensor([1.0, 1.0]))
x.grad

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
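The error message itself suggests the fix: pass `retain_graph=True` on the first call. A minimal sketch under that assumption follows; it rebuilds the graph from scratch, and the names `first` and `second` are made up for this illustration. The `gradient` argument, when given, should have the same shape as the tensor `backward` is called on, so it is omitted for the scalar `o` here. A second pass adds to `x.grad` rather than overwriting it.

```python
import torch

x = torch.ones(2, requires_grad=True)
o = 0.5 * torch.sum(5 * (x + 1) ** 2)

o.backward(retain_graph=True)   # keep the graph so backward can run again
first = x.grad.clone()
print(first)                    # gradient after the first pass

o.backward()                    # second pass: gradients are added to x.grad
second = x.grad.clone()
print(second)                   # accumulated over both passes

x.grad.zero_()                  # reset before the next independent backward pass
print(x.grad)
```

Because gradients accumulate across calls, training loops usually zero them between steps.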

#### Useful links

- https://www.youtube.com/watch?v=IHZwWFHWa-w
- https://towardsdatascience.com/machine-learning-101-an-intuitive-introduction-to-gradient-descent-366b77b52645
- https://stackoverflow.com/questions/43451125/pytorch-what-are-the-gradient-arguments