In [73]:
%matplotlib inline


Autograd: automatic differentiation
===================================

Central to all neural networks in PyTorch is the ``autograd`` package.
Let’s first briefly visit this, and we will then go to training our
first neural network.


The ``autograd`` package provides automatic differentiation for all operations
on Tensors. It is a define-by-run framework: backpropagation is defined by how
your forward computation actually runs, so every single iteration can be
different.
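
To make "define-by-run" concrete, here is a minimal pure-Python sketch. It is not PyTorch's actual machinery: the ``forward`` function and its ``tape`` list are hypothetical names used only for illustration. The tape records which operations really ran, so two calls that take different branches leave different histories to differentiate through.

```python
def forward(x, tape):
    """Run a tiny computation and record each executed operation on `tape`."""
    tape.append("square")
    y = x * x
    # Ordinary Python control flow decides how many more ops get recorded.
    while y < 100:
        tape.append("double")
        y = y * 2
    return y


tape_a, tape_b = [], []
out_a = forward(3.0, tape_a)   # 9 -> 18 -> 36 -> 72 -> 144
out_b = forward(20.0, tape_b)  # 400 is already above 100, so no doubling

print(out_a, tape_a)
print(out_b, tape_b)
```

The two tapes have different lengths even though the same function was called, which is the sense in which the graph is defined by the run rather than declared ahead of time.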

Let us see this in simpler terms with some examples.

Variable
--------

``autograd.Variable`` is the central class of the package. It wraps a
Tensor and supports nearly all of the operations defined on it. Once you
finish your computation, you can call ``.backward()`` and have all the
gradients computed automatically.

You can access the raw tensor through the ``.data`` attribute, while the
gradient w.r.t. this variable is accumulated into ``.grad``.

.. figure:: /_static/img/Variable.png
   :alt: Variable

   Variable

There’s one more class that is very important for the autograd
implementation: ``Function``.

``Variable`` and ``Function`` are interconnected and build up an acyclic
graph that encodes a complete history of the computation. Each variable has
a ``.grad_fn`` attribute that references the ``Function`` that created
the ``Variable`` (except for Variables created by the user, whose
``grad_fn`` is ``None``).
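
The relationship between values and the operations that created them can be sketched with a toy class. This is an illustrative stand-in written in plain Python, not the real ``Variable`` or ``Function``: the ``Node`` class and the tuple layout of its ``grad_fn`` are assumptions made for the example. The recursive ``backward`` walks every path through the graph, which is simple to read but not how an efficient implementation would do it.

```python
class Node:
    """Toy stand-in for a Variable: a value plus a link to what created it."""

    def __init__(self, value, grad_fn=None):
        self.value = value
        self.grad = 0.0
        # grad_fn is None for user-created leaves; otherwise a tuple of
        # (operation name, [(parent node, local derivative), ...]).
        self.grad_fn = grad_fn

    def __add__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        return Node(self.value + other.value,
                    ("add", [(self, 1.0), (other, 1.0)]))

    def __mul__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        return Node(self.value * other.value,
                    ("mul", [(self, other.value), (other, self.value)]))

    def backward(self, upstream=1.0):
        """Push `upstream` back along every recorded edge (chain rule)."""
        self.grad += upstream
        if self.grad_fn is not None:
            for parent, local in self.grad_fn[1]:
                parent.backward(upstream * local)


x = Node(1.0)          # created by the user: no grad_fn
y = x + 2              # created by an operation: has a grad_fn
z = y * y * 3
z.backward()

print(x.grad_fn, y.grad_fn[0], z.grad_fn[0])
print(x.grad)          # d(3 * (x + 2)**2)/dx evaluated at x = 1
```

Here ``x.grad_fn`` is ``None`` because ``x`` was created directly, while ``y`` and ``z`` each remember the operation that produced them.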

If you want to compute the derivatives, you can call ``.backward()`` on
a ``Variable``. If the ``Variable`` is a scalar (i.e. it holds a single
element), you don’t need to pass any arguments to ``backward()``.
If it has more elements, however, you need to specify a ``grad_output``
argument that is a tensor of matching shape.
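
One way to read the extra argument: a non-scalar output has no single derivative, so the supplied tensor weights each output element before differentiating. The sketch below shows this in plain Python for the elementwise function $y_i = x_i^3$. The helper ``vjp_cube`` and the name ``weights`` are hypothetical and chosen only for this example.

```python
def vjp_cube(x, weights):
    """Gradient of sum_i weights[i] * x[i]**3 with respect to each x[i]."""
    return [w * 3.0 * xi ** 2 for xi, w in zip(x, weights)]


x = [1.0, 2.0, 3.0]

# Weights of 1.0 everywhere give the plain elementwise derivative 3 * x**2.
print(vjp_cube(x, [1.0, 1.0, 1.0]))

# Other weights rescale each element's contribution.
print(vjp_cube(x, [0.5, 0.0, 2.0]))
```

With a scalar output there is only one element to weight, and a weight of 1 is the natural default, which is why no argument is needed in that case.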



In [74]:
import torch
from torch.autograd import Variable

Create a variable:


In [75]:
x = Variable(torch.ones(2, 2), requires_grad=True)
print(x)

Variable containing:
 1  1
 1  1
[torch.FloatTensor of size 2x2]



Perform an operation on the variable:


In [76]:
y = x + 2
print(y)

Variable containing:
 3  3
 3  3
[torch.FloatTensor of size 2x2]



``y`` was created as a result of an operation, so it has a ``grad_fn``.

In [77]:
print(y.grad_fn)

<torch.autograd.function.AddConstantBackward object at 0x7f8abe413a98>


Do more operations on ``y``:



In [78]:
z = y * y * 3
out = z.mean()

print(z, out)

Variable containing:
 27  27
 27  27
[torch.FloatTensor of size 2x2]
 Variable containing:
 27
[torch.FloatTensor of size 1]



Gradients
---------
Let's backprop now.
``out.backward()`` is equivalent to ``out.backward(torch.Tensor([1.0]))``.

Note: ``out.backward()`` computes the gradients of ``out`` with respect to the
Variables that came before it and stores them in their ``.grad`` attributes.


In [79]:
out.backward()

Print the gradients d(out)/dx.

After ``out.backward()``, ``x.grad`` holds the value of
$\frac{\partial out}{\partial x}$, whereas an intermediate variable such as
``y`` does not keep $\frac{\partial out}{\partial y}$ in ``y.grad``.


In [80]:
print(x.grad)

Variable containing:
 4.5000  4.5000
 4.5000  4.5000
[torch.FloatTensor of size 2x2]



You should have got a matrix of ``4.5``. Let’s call the ``out``
*Variable* “$o$”.
We have that $o = \frac{1}{4}\sum_i z_i$,
$z_i = 3(x_i+2)^2$ and $z_i\bigr\rvert_{x_i=1} = 27$.
Therefore,
$\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)$, hence
$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$.
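
The value 4.5 can also be checked numerically without autograd. The following is a small sketch in plain Python that compares a central finite difference against the formula derived above. The function name ``out`` and the step size ``eps`` are choices made for this example.

```python
def out(xs):
    """o = mean of 3 * (x_i + 2)**2 over the four entries."""
    return sum(3.0 * (x + 2.0) ** 2 for x in xs) / len(xs)


xs = [1.0, 1.0, 1.0, 1.0]
eps = 1e-6

# Central difference with respect to the first entry only.
plus = [xs[0] + eps] + xs[1:]
minus = [xs[0] - eps] + xs[1:]
numeric = (out(plus) - out(minus)) / (2 * eps)

analytic = 1.5 * (xs[0] + 2.0)  # (3/2) * (x_i + 2)

print(out(xs), numeric, analytic)
```

The numeric estimate agrees with the analytic value up to floating-point error, and by symmetry the same holds for the other three entries.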



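You can do many crazy things with autograd!

In the program below, ``x.grad`` already holds a value before ``backward`` is
called, so the newly computed gradient is added on top of it. By default, the
gradient of a newly created Variable starts from 0.
The argument passed to ``backward(...)`` is a tensor with the same shape as
``x.grad``. It acts as a set of coefficients on the final derivative: each entry
of $\frac{\partial y}{\partial x}$ is multiplied by the corresponding entry of
that tensor before being stored into ``x.grad``.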

In [105]:
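x = torch.ones(3)
print(x)
x = Variable(x, requires_grad=True)
# Pre-fill x.grad so that the accumulation done by backward() is visible.
x.grad = Variable(torch.rand(3))
print(x.grad)

y = x ** 3
# Keep doubling y until the norm of its data reaches 1000.
while y.data.norm() < 1000:
    y = y * 2

print(y)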


 1
 1
 1
[torch.FloatTensor of size 3]

Variable containing:
 0.4631
 0.1372
 0.7319
[torch.FloatTensor of size 3]

Variable containing:
 1
 1
 1
[torch.FloatTensor of size 3]



In [106]:
gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)

print(x.grad)

Variable containing:
 0.7631
 3.1372
 0.7322
[torch.FloatTensor of size 3]
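
The printed result can be reproduced by hand, which makes the accumulation rule visible. The sketch below is plain Python rather than PyTorch. It takes the printed values as given, namely the random initial ``x.grad`` and a ``y`` of all ones (that is, ``y = x ** 3`` with no doubling applied), so treat those as assumptions read off the output above.

```python
x = [1.0, 1.0, 1.0]
initial_grad = [0.4631, 0.1372, 0.7319]   # x.grad printed before backward
weights = [0.1, 1.0, 0.0001]              # tensor passed to backward()

# Contribution of this backward pass: weights[i] * d(x_i**3)/dx_i.
contribution = [w * 3.0 * xi ** 2 for w, xi in zip(weights, x)]

# backward() adds to whatever is already stored in the gradient.
accumulated = [g + c for g, c in zip(initial_grad, contribution)]

print([round(v, 4) for v in accumulated])
```

Each entry is the old gradient plus the weighted derivative, which matches the three values printed for ``x.grad`` after the call.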



**Read Later:**

Documentation of ``Variable`` and ``Function`` is at
http://pytorch.org/docs/autograd

