In [1]:
%matplotlib inline


Autograd: automatic differentiation
===================================

Central to all neural networks in PyTorch is the ``autograd`` package. Let's first briefly visit this, and then we will move on to training our first neural network.

The ``autograd`` package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.

Let us see this in simpler terms with some examples.
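
As a rough illustration of what "define-by-run" means, the sketch below builds a different graph on each call, depending on an ordinary Python loop count. The helper ``forward`` and its ``n_steps`` parameter are hypothetical names chosen for this example only.

```python
import torch

def forward(x, n_steps):
    # Hypothetical helper: the number of multiplications is decided at
    # run time, so each call can record a different graph.
    y = x
    for _ in range(n_steps):
        y = y * 2
    return y.sum()

x = torch.ones(3, requires_grad=True)

out = forward(x, n_steps=1)        # out = sum(2 * x)
out.backward()
grad_one_step = x.grad.clone()     # gradient is 2 for every element

x.grad.zero_()                     # clear the stored gradient
out = forward(x, n_steps=3)        # out = sum(8 * x)
out.backward()
grad_three_steps = x.grad.clone()  # gradient is 8 for every element

print(grad_one_step, grad_three_steps)
```

The same Python function yields two different backward passes because the graph is recorded while the code runs.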

Tensor
--------

``torch.Tensor`` is the central class of the package. If you set its attribute ``.requires_grad`` to ``True``, it starts to track all operations on it. When you finish your computation, you can call ``.backward()`` and have all the gradients computed automatically. The gradient for this tensor will be accumulated into its ``.grad`` attribute.
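
The word "accumulated" matters: calling ``.backward()`` more than once adds to ``.grad`` rather than overwriting it. A minimal sketch, using an illustrative tensor ``w`` that is not part of the tutorial:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()     # gradient of sum(3 * w) is 3 per element

(w * 3).sum().backward()   # a second backward pass adds to w.grad
second = w.grad.clone()    # 3 + 3 = 6 per element

w.grad.zero_()             # reset before an unrelated backward pass
print(first, second, w.grad)
```

This is why training loops typically zero the gradients before each backward pass.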

To stop a tensor from tracking history, you can call ``.detach()``, which returns a new tensor that shares the same data but is detached from the computation history, so future computation on it is not tracked.

To prevent tracking history (and using memory), you can also wrap the code block in ``with torch.no_grad():``. This can be particularly helpful when evaluating a model, because the model may have trainable parameters with ``requires_grad=True``, but we don't need the gradients.
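
A small sketch of ``.detach()`` in practice. The names ``t``, ``u`` and ``u_detached`` are illustrative only.

```python
import torch

t = torch.ones(2, 2, requires_grad=True)
u = t * 2                          # tracked: u is part of the graph

u_detached = u.detach()            # same values, no link to the graph
print(u.requires_grad)             # True
print(u_detached.requires_grad)    # False
print(u_detached.grad_fn)          # None

v = u_detached * 3                 # computed from a detached tensor
print(v.requires_grad)             # False
```

Note that ``u`` itself is unchanged; only the tensor returned by ``.detach()`` is cut off from the history.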

There's one more class which is very important for the autograd implementation: ``Function``.

``Tensor`` and ``Function`` are interconnected and build up an acyclic graph that encodes a complete history of computation. Each tensor has a ``.grad_fn`` attribute that references the ``Function`` that created it (except for Tensors created by the user, whose ``grad_fn`` is ``None``).
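
The difference between user-created tensors and tensors produced by operations can be inspected directly. This is a minimal sketch with illustrative names ``leaf`` and ``result``:

```python
import torch

leaf = torch.ones(2, requires_grad=True)   # created by the user
result = leaf * 2 + 1                      # created by operations

print(leaf.grad_fn)      # None: nothing produced this tensor
print(leaf.is_leaf)      # True
print(result.grad_fn)    # a backward node recorded for the last operation
print(result.is_leaf)    # False
```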

If you want to compute the derivatives, you can call ``.backward()`` on a ``Tensor``. If the ``Tensor`` is a scalar (i.e. it holds a single element), you don't need to specify any arguments to ``backward()``. However, if it has more elements, you need to specify a ``gradient`` argument that is a tensor of matching shape.
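
To make the scalar versus non-scalar rule concrete, the sketch below first calls ``backward()`` without arguments on a non-scalar tensor, which raises a ``RuntimeError``, and then passes a ``gradient`` of matching shape. The names ``p`` and ``q`` are illustrative.

```python
import torch

p = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
q = p ** 2                         # non-scalar result with three elements

try:
    q.backward()                   # no gradient argument for a non-scalar
except RuntimeError as err:
    print("backward() without arguments failed:", err)

# Supplying a tensor of matching shape works; with all ones this is
# the same as differentiating q.sum().
q.backward(gradient=torch.ones_like(q))
print(p.grad)                      # 2 * p
```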

In [2]:
import torch

Create a tensor and set ``requires_grad=True`` to track computation with it:

In [4]:
x = torch.ones(2, 2, requires_grad=True)
print(x)

tensor([[ 1.,  1.],
        [ 1.,  1.]])


Perform an operation on the tensor:

In [5]:
y = x + 2
print(y)

tensor([[ 3.,  3.],
        [ 3.,  3.]])


``y`` was created as a result of an operation, so it has a ``grad_fn``.

In [6]:
print(y.grad_fn)

<AddBackward0 object at 0x0000017BFE246A58>


In [8]:
y.grad_fn

<AddBackward0 at 0x17baec4eac8>

Do more operations on ``y``:

In [9]:
z = y * y * 3
out = z.mean()

print(z, out)

tensor([[ 27.,  27.],
        [ 27.,  27.]]) tensor(27.)


In [10]:
z

tensor([[ 27.,  27.],
        [ 27.,  27.]])

In [11]:
out

tensor(27.)

``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad`` flag in-place. The input flag defaults to ``True`` if not given.

In [12]:
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x0000017BAEC63588>


In [13]:
a

tensor([[ 0.0248,  1.4551],
        [ 0.3484, -4.4953]])

In [14]:
b

tensor(22.4472)

Gradients
---------
Let's backprop now. Because ``out`` contains a single scalar, ``out.backward()`` is equivalent to ``out.backward(torch.tensor(1.))``.

In [15]:
out.backward()

Print the gradients d(out)/dx:

In [16]:
print(x.grad)

tensor([[ 4.5000,  4.5000],
        [ 4.5000,  4.5000]])


You should get a matrix filled with ``4.5``.

Let's call the ``out`` *Tensor* “$o$”.

We have that 

$o = \frac{1}{4}\sum_i z_i$,

$z_i = 3(x_i+2)^2$ and $z_i\bigr\rvert_{x_i=1} = 27$.

Therefore,

$\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)$, 

hence

$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$.
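
The derivation can be checked against autograd. This is a minimal sketch that recomputes the same expression on fresh tensors; the names ``x_check``, ``o_check`` and ``analytic`` are illustrative and chosen so the notebook's own ``x`` and ``out`` are left untouched.

```python
import torch

x_check = torch.ones(2, 2, requires_grad=True)
o_check = (3 * (x_check + 2) ** 2).mean()   # o = (1/4) * sum_i 3 (x_i + 2)^2
o_check.backward()

# Closed-form result from the derivation: (3/2) * (x_i + 2)
analytic = 1.5 * (x_check.detach() + 2)

print(x_check.grad)
print(torch.allclose(x_check.grad, analytic))
```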



You can do many crazy things with autograd!

In [18]:
x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)

tensor([ 346.7196, -715.7305, -697.0422])


In [19]:
gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(gradients)

print(x.grad)

tensor([  102.4000,  1024.0000,     0.1024])
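
One way to read the result above: passing ``gradients`` to ``backward`` computes a vector-Jacobian product. In the run shown, ``y`` ended up as ``1024 * x``, so ``x.grad`` is ``1024 * gradients``. The sketch below reproduces this with a fixed number of doublings instead of the data-dependent ``while`` loop, so that the outcome is deterministic; ``x_fixed``, ``y_fixed`` and ``vec`` are illustrative names.

```python
import torch

x_fixed = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y_fixed = x_fixed * 2
for _ in range(9):                 # ten doublings in total: y = 1024 * x
    y_fixed = y_fixed * 2

vec = torch.tensor([0.1, 1.0, 0.0001])
y_fixed.backward(vec)              # vec^T J, where J = 1024 * I here

print(x_fixed.grad)
print(torch.allclose(x_fixed.grad, vec * 1024))
```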


You can also stop autograd from tracking history on Tensors with ``requires_grad=True`` by wrapping the code block in ``with torch.no_grad():``.

In [20]:
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

True
True
False


**Read Later:**

Documentation of ``autograd`` and ``Function`` is at http://pytorch.org/docs/autograd