In [None]:
%matplotlib inline


Autograd: automatic differentiation
===================================

Central to all neural networks in PyTorch is the ``autograd`` package.

The ``autograd`` package provides automatic differentiation for all operations
on Tensors. It is a define-by-run framework, which means that your backprop is
defined by how your code is run, and that every single iteration can be
different.
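As a rough illustration of define-by-run, the sketch below builds a different graph on each call because the loop length is an ordinary Python value. The function name ``forward`` and the parameter ``n_steps`` are hypothetical and used only for this example.

```python
import torch

def forward(x, n_steps):
    # The number of multiplications depends on a runtime value,
    # so the graph recorded by autograd differs from call to call.
    y = x
    for _ in range(n_steps):
        y = y * 2
    return y.sum()

x = torch.ones(3, requires_grad=True)

out_a = forward(x, n_steps=1)   # graph with one multiplication
out_a.backward()
grad_a = x.grad.clone()         # every element is 2

x.grad.zero_()                  # clear the accumulated gradient before reuse

out_b = forward(x, n_steps=3)   # graph with three multiplications
out_b.backward()
grad_b = x.grad.clone()         # every element is 2**3 = 8

print(grad_a, grad_b)
```

The same Python function yields two different gradients because the backward pass follows whatever operations actually ran.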


Back to Tensor
------------------

If you set a ``torch.Tensor``'s ``.requires_grad`` attribute to ``True``,
autograd starts to track all operations on it. When you finish your computation
you can call ``.backward()`` and have all the gradients computed automatically.
The gradient for this tensor is accumulated into its ``.grad`` attribute.
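Because gradients are accumulated rather than overwritten, calling ``.backward()`` twice adds the second result to the first. A minimal sketch of that behaviour, using the illustrative names ``w`` and ``loss``:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

loss = (w * w).sum()        # d(loss)/dw = 2 * w
loss.backward()
first = w.grad.clone()      # tensor([2., 4.])

loss = (w * w).sum()        # rebuild the graph before a second backward pass
loss.backward()
second = w.grad.clone()     # tensor([4., 8.]): added to the previous value

w.grad.zero_()              # reset in-place when accumulation is not wanted
print(first, second, w.grad)
```

This is why training loops usually zero the gradients before each backward pass.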

To stop a tensor from tracking history, you can call ``.detach()`` to detach
it from the computation history, and to prevent future computation from being
tracked.
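A small sketch of ``.detach()`` in use; the variable names are chosen only for illustration:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x * 3                   # tracked: y has a grad_fn
y_detached = y.detach()     # same values, but cut off from the history

z = y_detached * 2          # later computation on the detached tensor is not tracked

print(y.requires_grad, y.grad_fn is not None)
print(y_detached.requires_grad, z.grad_fn)
```

The original tensor ``y`` keeps its history; only the detached result stops tracking.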

To prevent tracking history (and using memory), you can also wrap the code block
in ``with torch.no_grad():``. This can be particularly helpful when evaluating a
model, because the model may have trainable parameters with ``requires_grad=True``
for which we don't need the gradients.
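The effect of ``torch.no_grad()`` can be sketched as follows. Here ``weight`` is a hypothetical stand-in for a trainable model parameter, not part of any real model:

```python
import torch

weight = torch.ones(2, 2, requires_grad=True)   # stand-in for a model parameter
inputs = torch.ones(2, 2)

tracked = (inputs * weight).sum()               # recorded by autograd

with torch.no_grad():
    untracked = (inputs * weight).sum()         # computed without recording history

print(tracked.requires_grad)     # True
print(untracked.requires_grad)   # False
print(weight.requires_grad)      # True: the parameter itself is unchanged
```

Only the results computed inside the block lose their history; the parameter's own flag is unchanged.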


To compute the derivatives, you can call ``.backward()`` on
a ``Tensor``. If the ``Tensor`` is a scalar (i.e. it holds a single
element), you don’t need to specify any arguments to ``backward()``.
If it has more elements, you need to specify a ``gradient``
argument that is a tensor of matching shape.
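For the non-scalar case, a minimal sketch might look like this. The ``weights`` values are arbitrary and chosen only to make the effect of the ``gradient`` argument visible:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * x                          # non-scalar output, d(y_i)/d(x_i) = 2 * x_i

# The gradient argument has the same shape as y and weights each
# output element before the contributions are summed into x.grad.
weights = torch.tensor([1.0, 0.1, 0.01])
y.backward(gradient=weights)

print(x.grad)                      # equals 2 * x * weights
```

Passing a tensor of ones as ``gradient`` would give the same result as calling ``y.sum().backward()``.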

In [None]:
import torch

Create a tensor and set ``requires_grad=True`` to track computation with it:



In [None]:
x = torch.ones(2, 2, requires_grad=True); x

Do an operation on the tensor:



In [None]:
y = x + 2; y

``y`` was created as a result of an operation, so it has a ``grad_fn``.



In [None]:
y.grad_fn

Do more operations on ``y``:



In [None]:
z = y * y * 3
out = z.mean()

print(z, out)

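``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad``
flag in-place. A Tensor's ``requires_grad`` flag is ``False`` unless it was set
when the Tensor was created; calling ``.requires_grad_()`` with no argument
sets it to ``True``.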



In [None]:
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)

In [None]:
a.requires_grad_(True)
print(a.requires_grad)

In [None]:
b = (a * a).sum()
print(b.grad_fn)

Gradients
---------
Let's backprop now.

Since the ``out`` tensor defined above contains a single scalar, we do not need to pass any argument to ``out.backward()``:

In [None]:
out

In [None]:
out.backward()

Print the gradients d(out)/dx:




In [None]:
print(x.grad)

You should get a matrix in which every entry is ``4.5``. Let’s call the ``out``
*Tensor* “$o$”.
We have that $o = \frac{1}{4}\sum_i z_i$,
$z_i = 3(x_i+2)^2$ and $z_i\bigr\rvert_{x_i=1} = 27$.
Therefore,
$\frac{\partial o}{\partial x_i} = \frac{1}{4}\cdot 6(x_i+2) = \frac{3}{2}(x_i+2)$, hence
$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$.
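As a sanity check, the closed-form derivative can be compared with what autograd computes. This sketch rebuilds the same computation from scratch; ``expected`` is an illustrative name for the hand-derived value:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()      # o = (1/4) * sum_i 3 * (x_i + 2)^2
out.backward()

# Hand-derived gradient: d(o)/d(x_i) = (3/2) * (x_i + 2)
expected = 1.5 * (x.detach() + 2)

print(out.item())                    # 27.0
print(torch.allclose(x.grad, expected))
```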



**Read Later:**

Documentation of ``autograd`` and ``Function`` is at
http://pytorch.org/docs/autograd

