http://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html

In [2]:
%matplotlib inline


Neural Networks
===============

Neural networks can be constructed using the ``torch.nn`` package.

Now that you have had a glimpse of ``autograd``, note that ``nn`` depends on ``autograd`` to define models and differentiate them.

An ``nn.Module`` contains layers, and a method ``forward(input)`` that returns the ``output``.

For example, look at this network that classifies digit images:

.. figure:: /_static/img/mnist.png
   :alt: convnet

   convnet

It is a simple feed-forward network. It takes the input, feeds it through several layers one after the other, and then finally gives the output.

A typical training procedure for a neural network is as follows:

- Define the neural network that has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far the output is from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule:

  ``weight = weight - learning_rate * gradient``
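The steps above can be sketched on a toy problem. This is only a minimal illustration, assuming a hypothetical one-weight model that fits ``y = 2 * x``; the data, the starting value and the learning rate are made up for the example and are not part of the tutorial's network.

```python
import torch

# Hypothetical toy problem: fit y = 2 * x with a single learnable weight.
x = torch.tensor([1.0, 2.0, 3.0])
y = 2.0 * x

weight = torch.zeros(1, requires_grad=True)  # the learnable parameter
learning_rate = 0.05

for step in range(100):
    prediction = weight * x                    # process input through the "network"
    loss = ((prediction - y) ** 2).mean()      # compute the loss
    loss.backward()                            # propagate gradients back to the weight
    with torch.no_grad():
        # weight = weight - learning_rate * gradient
        weight -= learning_rate * weight.grad
    weight.grad.zero_()                        # clear the gradient before the next step

print(weight.item())
```

After enough iterations the weight should settle close to 2, the value that generated the data.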

Define the network
------------------

Let’s define this network:

![](http://pytorch.org/tutorials/_images/mnist.png)

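"Subsampling" in the figure means downsampling; in deep neural networks it plays the same role as pooling.

The layer sequence is: convolution → subsampling → convolution → subsampling → full connection → full connection → Gaussian connections.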

https://blog.csdn.net/songyimin1208/article/details/68952210

In [3]:
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):  # constructor
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        # two 2D convolution layers
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        # three fully connected layers
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Note: a 2D convolution layer expects input of shape
        # batch_size x channels x height x width
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
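The ``in_features=400`` of ``fc1`` comes from the ``16 * 5 * 5`` activations left after the second pooling step. The following sketch traces the tensor shape through stand-alone copies of the two convolution layers, assuming the 32x32 single-channel input this network expects; the ``shapes`` list is just a helper for the illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-alone copies of the two convolution layers, same sizes as in Net
conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.randn(1, 1, 32, 32)      # batch of one 1-channel 32x32 image
shapes = [tuple(x.shape)]

x = F.relu(conv1(x))               # a 5x5 kernel without padding shrinks 32 -> 28
shapes.append(tuple(x.shape))
x = F.max_pool2d(x, 2)             # 2x2 pooling halves 28 -> 14
shapes.append(tuple(x.shape))
x = F.relu(conv2(x))               # 14 -> 10
shapes.append(tuple(x.shape))
x = F.max_pool2d(x, 2)             # 10 -> 5
shapes.append(tuple(x.shape))

flat = x.view(x.size(0), -1)       # flatten everything except the batch dimension
shapes.append(tuple(flat.shape))

for shape in shapes:
    print(shape)

# torch.flatten(x, 1) is an equivalent way to flatten all non-batch dimensions
same = torch.equal(flat, torch.flatten(x, 1))
```

Each unpadded 5x5 convolution removes 4 pixels per spatial dimension and each 2x2 pooling halves it, which gives 32 → 28 → 14 → 10 → 5.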


You just have to define the ``forward`` function, and the ``backward`` function (where gradients are computed) is automatically defined for you using ``autograd``.

You can use any of the Tensor operations in the ``forward`` function.

The learnable parameters of a model are returned by ``net.parameters()``.

In [4]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

10
torch.Size([6, 1, 5, 5])
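The count of 10 corresponds to one weight tensor and one bias tensor for each of the five layers. As a rough sketch, the layers can be rebuilt in an ``nn.ModuleDict`` (used here only to keep the example self-contained; it is not how the tutorial defines ``Net``) and listed with ``named_parameters()``:

```python
import torch.nn as nn

# Same layer sizes as Net, collected in a ModuleDict for this illustration only
layers = nn.ModuleDict({
    "conv1": nn.Conv2d(1, 6, 5),
    "conv2": nn.Conv2d(6, 16, 5),
    "fc1": nn.Linear(16 * 5 * 5, 120),
    "fc2": nn.Linear(120, 84),
    "fc3": nn.Linear(84, 10),
})

names = []
for name, param in layers.named_parameters():
    names.append(name)
    print(name, tuple(param.shape), param.numel())

# Total number of learnable scalars across all parameter tensors
total = sum(p.numel() for p in layers.parameters())
print(total)
```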


Let's try a random 32x32 input.

Note: The expected input size for this net (LeNet) is 32x32.

To use this net on the MNIST dataset, please resize the images from the dataset to 32x32.

In [5]:
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

tensor([[-0.0412,  0.0261, -0.0126,  0.1076, -0.0326, -0.0807, -0.0452,
         -0.0570, -0.0551, -0.1073]])


Zero the gradient buffers of all parameters and backpropagate with random gradients:

In [6]:
net.zero_grad()
out.backward(torch.randn(1, 10))

<div class="alert alert-info"><h4>Note</h4><p>``torch.nn`` only supports mini-batches. The entire ``torch.nn``
    package only supports inputs that are a mini-batch of samples, and not a single sample.

    For example, ``nn.Conv2d`` will take in a 4D Tensor of
    ``nSamples x nChannels x Height x Width``.

    If you have a single sample, just use ``input.unsqueeze(0)`` to add
    a fake batch dimension.</p></div>
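As a small sketch of the note above, a single image stored as ``nChannels x Height x Width`` can be given a batch dimension with ``unsqueeze(0)`` before it is passed to a convolution layer. The tensor names are made up for the illustration.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 5)

single_image = torch.randn(1, 32, 32)   # nChannels x Height x Width, no batch dimension
batch = single_image.unsqueeze(0)       # nSamples x nChannels x Height x Width, nSamples = 1

out = conv(batch)
print(batch.shape)
print(out.shape)
```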

Before proceeding further, let's recap all the classes you’ve seen so far.

**Recap:**
  -  ``torch.Tensor`` - A *multi-dimensional array* with support for autograd
     operations like ``backward()``. Also *holds the gradient* w.r.t. the
     tensor.
  -  ``nn.Module`` - Neural network module. *Convenient way of
     encapsulating parameters*, with helpers for moving them to GPU,
     exporting, loading, etc.
  -  ``nn.Parameter`` - A kind of Tensor that is *automatically
     registered as a parameter when assigned as an attribute to a*
     ``Module``.
  -  ``autograd.Function`` - Implements *forward and backward definitions
     of an autograd operation*. Every ``Tensor`` operation creates at
     least a single ``Function`` node that connects to the functions that
     created a ``Tensor`` and *encodes its history*.

**At this point, we covered:**
  -  Defining a neural network
  -  Processing inputs and calling backward

**Still Left:**
  -  Computing the loss
  -  Updating the weights of the network

Loss Function
-------------
A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.

There are several different `loss functions <http://pytorch.org/docs/nn.html#loss-functions>`_ under the nn package.

A simple loss is ``nn.MSELoss``, which computes the mean-squared error between the input and the target.

For example:



In [8]:
output = net(input)
target = torch.arange(1, 11, dtype=torch.float)  # a dummy target, for example (float, to match the dtype of output)
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

tensor(39.0210)
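To make the definition concrete, the value returned by ``nn.MSELoss`` (with its default mean reduction) can be compared against the mean of the squared differences computed by hand. The numbers below are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn as nn

# Arbitrary example values
output = torch.tensor([[0.5, -1.0, 2.0]])
target = torch.tensor([[1.0, 0.0, 2.0]])

builtin = nn.MSELoss()(output, target)     # default reduction averages over all elements
manual = ((output - target) ** 2).mean()   # mean of the squared element-wise differences

print(builtin.item(), manual.item())
```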


Now, if you follow ``loss`` in the backward direction, using its ``.grad_fn`` attribute, you will see a graph of computations that looks like this:

::

    input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
          -> view -> linear -> relu -> linear -> relu -> linear
          -> MSELoss
          -> loss

So, when we call ``loss.backward()``, the whole graph is differentiated w.r.t. the loss, and all Tensors in the graph that have ``requires_grad=True`` will have their ``.grad`` Tensor accumulated with the gradient.

For illustration, let us follow a few steps backward:



In [9]:
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU

<MseLossBackward object at 0x00000214D9C90A90>
<AddmmBackward object at 0x00000214DA5F9748>
<ExpandBackward object at 0x00000214D9C90A90>
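The same walk can be written as a loop. The sketch below uses a small hypothetical graph (one linear layer, a ReLU and an MSE loss) rather than the full network, and always follows the first entry of ``next_functions``. The exact node class names depend on the PyTorch version, so treat the printed names as indicative only.

```python
import torch
import torch.nn as nn

# A small hypothetical graph: linear -> relu -> MSELoss
layer = nn.Linear(4, 2)
x = torch.randn(1, 4)
target = torch.zeros(1, 2)
loss = nn.MSELoss()(torch.relu(layer(x)), target)

names = []
node = loss.grad_fn
while node is not None:
    names.append(type(node).__name__)
    # next_functions is a tuple of (node, index) pairs; follow the first edge only
    node = node.next_functions[0][0] if node.next_functions else None

print(" -> ".join(names))
```

Following only the first edge shows a single path through the graph; the other entries of ``next_functions`` lead to the remaining inputs of each operation.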


Backprop
--------
To backpropagate the error, all we have to do is call ``loss.backward()``.
You need to clear the existing gradients first, though; otherwise the new
gradients will be accumulated into the existing ones.


Now we shall call ``loss.backward()``, and have a look at conv1's bias
gradients before and after the backward.



In [7]:
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
tensor([ 0.,  0.,  0.,  0.,  0.,  0.])
conv1.bias.grad after backward
tensor([-0.0153,  0.0579, -0.1072, -0.0156,  0.0183,  0.0125])
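The accumulation behaviour is easy to see on a tiny example. This sketch uses a made-up two-element tensor instead of the network: calling ``backward()`` twice without zeroing doubles the stored gradient, while zeroing in between gives a fresh one.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w ** 2).sum().backward()
first = w.grad.clone()          # d/dw sum(w^2) = 2 * w

(w ** 2).sum().backward()       # no zeroing: the new gradient is added to the old one
accumulated = w.grad.clone()

w.grad.zero_()                  # clear the buffer, as net.zero_grad() does for a module
(w ** 2).sum().backward()
fresh = w.grad.clone()

print(first, accumulated, fresh)
```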


Now, we have seen how to use loss functions.

**Read Later:**

  The neural network package contains various modules and loss functions
  that form the building blocks of deep neural networks. A full list with
  documentation is `here <http://pytorch.org/docs/nn>`_.

**The only thing left to learn is:**

  - Updating the weights of the network

Update the weights
------------------
The simplest update rule used in practice is Stochastic Gradient Descent (SGD):

     ``weight = weight - learning_rate * gradient``

We can implement this using simple Python code:

.. code:: python

    learning_rate = 0.01
    for f in net.parameters():
        f.data.sub_(f.grad.data * learning_rate)

However, as you use neural networks, you will want to use various different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc.

To enable this, we built a small package, ``torch.optim``, that implements all these methods. Using it is very simple:

In [10]:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update

.. Note::

      Observe how gradient buffers had to be manually set to zero using
      ``optimizer.zero_grad()``. This is because gradients are accumulated
      as explained in `Backprop`_ section.
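Put together, a complete loop might look like the following sketch. It assumes a hypothetical toy regression problem with a single ``nn.Linear`` layer instead of the network above; the data, seed, learning rate and number of steps are arbitrary choices for the illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # fixed seed so the run is repeatable

# Hypothetical toy data: 16 samples, 3 features, targets from a known linear map
inputs = torch.randn(16, 3)
targets = inputs @ torch.tensor([[1.0], [-2.0], [0.5]]) + 1.0

model = nn.Linear(3, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.05)

losses = []
for step in range(50):
    optimizer.zero_grad()                 # zero the gradient buffers
    output = model(inputs)                # forward pass
    loss = criterion(output, targets)     # compute the loss
    loss.backward()                       # backpropagate
    optimizer.step()                      # update the weights
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The recorded loss should shrink over the iterations, since each step moves the weights a small distance against the gradient.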

