In [1]:
# This mounts your Google Drive to the Colab VM.
from google.colab import drive
drive.mount('/content/drive')

# TODO: Enter the foldername in your Drive where you have saved the unzipped
# assignment folder, e.g. 'cs231n/assignments/assignment2/'
FOLDERNAME = 'cs231n_assignment2/assignment2/'
assert FOLDERNAME is not None, "[!] Enter the foldername."

# Now that we've mounted your Drive, this ensures that
# the Python interpreter of the Colab VM can load
# python files from within it.
import sys
sys.path.append('/content/drive/My Drive/{}'.format(FOLDERNAME))

# This downloads the CIFAR-10 dataset to your Drive
# if it doesn't already exist.
%cd /content/drive/My\ Drive/$FOLDERNAME/cs231n/datasets/
!bash get_datasets.sh
%cd /content/drive/My\ Drive/$FOLDERNAME

Mounted at /content/drive
/content/drive/My Drive/cs231n_assignment2/assignment2/cs231n/datasets
/content/drive/My Drive/cs231n_assignment2/assignment2


# Introduction to PyTorch

You've written a lot of code in this assignment to provide a whole host of neural network functionality. Dropout, Batch Norm, and 2D convolutions are some of the workhorses of deep learning in computer vision. You've also worked hard to make your code efficient and vectorized.

For the last part of this assignment, though, we're going to leave behind your beautiful codebase and instead migrate to one of two popular deep learning frameworks: in this instance, PyTorch.

## Why do we use deep learning frameworks?

* Our code will now run on GPUs! This will allow our models to train much faster. When using a framework like PyTorch you can harness the power of the GPU for your own custom neural network architectures without having to write CUDA code directly (which is beyond the scope of this class).
* In this class, we want you to be ready to use one of these frameworks for your project so you can experiment more efficiently than if you were writing every feature you want to use by hand.
* We want you to stand on the shoulders of giants! PyTorch is an excellent framework that will make your life a lot easier, and now that you understand its guts, you are free to use it :)
* Finally, we want you to be exposed to the sort of deep learning code you might run into in academia or industry.

## What is PyTorch?

PyTorch is a system for executing dynamic computational graphs over Tensor objects that behave similarly to NumPy ndarrays. It comes with a powerful automatic differentiation engine that removes the need for manual back-propagation.
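To make the NumPy comparison concrete, here is a minimal sketch (the variable names are ours, chosen for illustration) showing that a Tensor can be created from an ndarray, reduced along an axis, and converted back:

```python
import numpy as np
import torch

# A small 2 x 3 array; the values are arbitrary illustration data.
a_np = np.arange(6, dtype=np.float32).reshape(2, 3)
a_t = torch.from_numpy(a_np)      # Tensor that shares memory with a_np

# Reductions look almost the same; PyTorch says `dim` where NumPy says `axis`.
col_sums_np = a_np.sum(axis=0)
col_sums_t = a_t.sum(dim=0)

# Elementwise arithmetic produces a new Tensor, which converts back to NumPy.
doubled_np = (a_t * 2).numpy()

print(col_sums_np)
print(col_sums_t)
```

The main differences you will notice later are that Tensors can live on a GPU and can record the operations applied to them for automatic differentiation.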

## How do I learn PyTorch?

One of our former instructors, Justin Johnson, made an excellent [tutorial](https://github.com/jcjohnson/pytorch-examples) for PyTorch.

You can also find the detailed [API doc](http://pytorch.org/docs/stable/index.html) here. If you have other questions that are not addressed by the API docs, the [PyTorch forum](https://discuss.pytorch.org/) is a much better place to ask than StackOverflow.

# Table of Contents

This assignment has 5 parts. You will learn PyTorch on **three different levels of abstraction**, which will help you understand it better and prepare you for the final project.

1. Part I, Preparation: we will use the CIFAR-10 dataset.
2. Part II, Barebones PyTorch: **Abstraction level 1**, we will work directly with the lowest-level PyTorch Tensors.
3. Part III, PyTorch Module API: **Abstraction level 2**, we will use `nn.Module` to define arbitrary neural network architectures.
4. Part IV, PyTorch Sequential API: **Abstraction level 3**, we will use `nn.Sequential` to define a linear feed-forward network very conveniently.
5. Part V, CIFAR-10 open-ended challenge: please implement your own network to get as high an accuracy as possible on CIFAR-10. You can experiment with any layers, optimizers, hyperparameters, or other advanced features.

Here is a table of comparison:

| API           | Flexibility | Convenience |
|---------------|-------------|-------------|
| Barebone      | High        | Low         |
| `nn.Module`     | High        | Medium      |
| `nn.Sequential` | Low         | High        |


# GPU

You can manually switch to a GPU device on Colab by clicking `Runtime -> Change runtime type` and selecting `GPU` under `Hardware Accelerator`. You should do this before running the following cells to import packages, since the kernel gets restarted upon switching runtimes.

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.data import sampler

import torchvision.datasets as dset
import torchvision.transforms as T

import numpy as np

USE_GPU = True
dtype = torch.float32 # We will be using float throughout this tutorial.

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

# Constant to control how frequently we print train loss.
print_every = 100
print('using device:', device)

using device: cuda


# Part I. Preparation

Now, let's load the CIFAR-10 dataset. This might take a couple minutes the first time you do it, but the files should stay cached after that.

In previous parts of the assignment we had to write our own code to download the CIFAR-10 dataset, preprocess it, and iterate through it in minibatches; PyTorch provides convenient tools to automate this process for us.
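As a rough sketch of how the train/val split works, the snippet below uses a tiny synthetic dataset in place of CIFAR-10. The names `toy_data` and `NUM_TOY_TRAIN` are hypothetical stand-ins. Two `DataLoader` objects wrap the same underlying `Dataset`, and each is given a `SubsetRandomSampler` restricted to a different index range:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data import sampler

# Hypothetical stand-in for CIFAR-10: 10 examples with 4 features each.
# The label of example i is simply i, so we can see which indices were drawn.
toy_data = TensorDataset(torch.randn(10, 4), torch.arange(10))

NUM_TOY_TRAIN = 8  # first 8 examples for training, last 2 for validation
toy_train = DataLoader(toy_data, batch_size=4,
                       sampler=sampler.SubsetRandomSampler(range(NUM_TOY_TRAIN)))
toy_val = DataLoader(toy_data, batch_size=4,
                     sampler=sampler.SubsetRandomSampler(range(NUM_TOY_TRAIN, 10)))

# Collect the labels each loader yields over one pass.
train_labels = sorted(int(y) for _, yb in toy_train for y in yb)
val_labels = sorted(int(y) for _, yb in toy_val for y in yb)
print(train_labels)
print(val_labels)
```

Each loader visits only its own indices, in a random order that changes on every pass, so the two splits never overlap.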

In [3]:
NUM_TRAIN = 49000

# The torchvision.transforms package provides tools for preprocessing data
# and for performing data augmentation; here we set up a transform to
# preprocess the data by subtracting the mean RGB value and dividing by the
# standard deviation of each RGB value; we've hardcoded the mean and std.
transform = T.Compose([
                T.ToTensor(),
                T.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
            ])

# We set up a Dataset object for each split (train / val / test); Datasets load
# training examples one at a time, so we wrap each Dataset in a DataLoader which
# iterates through the Dataset and forms minibatches. We divide the CIFAR-10
# training set into train and val sets by passing a Sampler object to the
# DataLoader telling how it should sample from the underlying Dataset.
cifar10_train = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                             transform=transform)
loader_train = DataLoader(cifar10_train, batch_size=64,
                          sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))

cifar10_val = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                           transform=transform)
loader_val = DataLoader(cifar10_val, batch_size=64,
                        sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, 50000)))

cifar10_test = dset.CIFAR10('./cs231n/datasets', train=False, download=True,
                            transform=transform)
loader_test = DataLoader(cifar10_test, batch_size=64)

Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified


# Part II. Barebones PyTorch

PyTorch ships with high-level APIs to help us define model architectures conveniently, which we will cover in Part III of this tutorial. In this section, we will start with the barebones PyTorch elements to understand the autograd engine better. After this exercise, you will come to appreciate the high-level model API more.

We will start with a simple two-layer fully-connected ReLU network (one hidden layer, no biases) for CIFAR classification.
This implementation computes the forward pass using operations on PyTorch Tensors, and uses PyTorch autograd to compute gradients. It is important that you understand every line, because you will write a harder version after the example.

When we create a PyTorch Tensor with `requires_grad=True`, operations involving that Tensor will not just compute values; they will also build up a computational graph in the background, allowing us to easily backpropagate through the graph to compute gradients of a downstream loss with respect to those Tensors. Concretely, if `x` is a Tensor with `x.requires_grad == True`, then after backpropagation `x.grad` will be another Tensor holding the gradient of the scalar loss at the end with respect to `x`.
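Here is a minimal sketch of that mechanism on a toy problem (the loss is made up for illustration): with loss = sum(x^2), the gradient of the loss with respect to `x` is 2x, and that is what ends up in `x.grad`.

```python
import torch

# A leaf Tensor that autograd should track.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Building the loss also builds the computational graph in the background.
loss = (x ** 2).sum()

# Backpropagate from the scalar loss; gradients land in x.grad.
loss.backward()

print(x.grad)  # tensor([2., 4., 6.])
```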

### PyTorch Tensors: Flatten Function
A PyTorch Tensor is conceptually similar to a numpy array: it is an n-dimensional grid of numbers, and like numpy, PyTorch provides many functions to efficiently operate on Tensors. As a simple example, we provide a `flatten` function below which reshapes image data for use in a fully-connected neural network.

Recall that image data is typically stored in a Tensor of shape N x C x H x W, where:

* N is the number of datapoints
* C is the number of channels
* H is the height of the intermediate feature map in pixels
* W is the width of the intermediate feature map in pixels

This is the right way to represent the data when we are doing something like a 2D convolution, which needs a spatial understanding of where the intermediate features are relative to each other. When we use fully connected affine layers to process the image, however, we want each datapoint to be represented by a single vector -- it's no longer useful to segregate the different channels, rows, and columns of the data. So, we use a "flatten" operation to collapse the `C x H x W` values per datapoint into a single long vector. The flatten function below first reads the batch size N from a given batch of data, and then returns a "view" of that data. "View" is analogous to numpy's "reshape" method: it reshapes x's dimensions to be N x ??, where ?? is allowed to be anything (in this case, it will be C x H x W, but we don't need to specify that explicitly).
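One detail worth seeing in code is that a view does not copy data. In this small sketch (toy shapes chosen for illustration), writing through the flattened view changes the original Tensor as well:

```python
import torch

# Toy batch: N=2 datapoints, C=3 channels, H=2, W=2.
x = torch.arange(24).view(2, 3, 2, 2)

# Passing -1 lets PyTorch infer the remaining size: 3 * 2 * 2 = 12.
flat = x.view(x.shape[0], -1)
print(flat.shape)  # torch.Size([2, 12])

# The view shares storage with x, so this write is visible through x too.
flat[0, 0] = 100
print(x[0, 0, 0, 0])  # tensor(100)
```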

In [4]:
def flatten(x):
    N = x.shape[0]  # read the batch size N; view(N, -1) infers C * H * W
    return x.view(N, -1)  # "flatten" the C * H * W values into a single vector per image

def test_flatten():
    x = torch.arange(12).view(2, 1, 3, 2)
    print('Before flattening: ', x)
    print('After flattening: ', flatten(x))

test_flatten()

Before flattening:  tensor([[[[ 0,  1],
          [ 2,  3],
          [ 4,  5]]],


        [[[ 6,  7],
          [ 8,  9],
          [10, 11]]]])
After flattening:  tensor([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11]])


### Barebones PyTorch: Two-Layer Network

Here we define a function `two_layer_fc` which performs the forward pass of a two-layer fully-connected ReLU network on a batch of image data. After defining the forward pass we check that it doesn't crash and that it produces outputs of the right shape by running zeros through the network.

You don't have to write any code here, but it's important that you read and understand the implementation.

In [5]:
import torch.nn.functional as F  # useful stateless functions

def two_layer_fc(x, params):
    """
    A fully-connected neural network; the architecture is:
    fully connected layer -> ReLU -> fully connected layer.
    Note that this function only defines the forward pass;
    PyTorch will take care of the backward pass for us.

    The input to the network will be a minibatch of data, of shape
    (N, d1, ..., dM) where d1 * ... * dM = D. The hidden layer will have H units,
    and the output layer will produce scores for C classes.

    Inputs:
    - x: A PyTorch Tensor of shape (N, d1, ..., dM) giving a minibatch of
      input data.
    - params: A list [w1, w2] of PyTorch Tensors giving weights for the network;
      w1 has shape (D, H) and w2 has shape (H, C).

    Returns:
    - scores: A PyTorch Tensor of shape (N, C) giving classification scores for
      the input data x.
    """
    # first we flatten the image
    x = flatten(x)  # shape: [batch_size, C x H x W]

    w1, w2 = params

    # Forward pass: compute predicted y using operations on Tensors. Since w1 and
    # w2 have requires_grad=True, operations involving these Tensors will cause
    # PyTorch to build a computational graph, allowing automatic computation of
    # gradients. Since we are no longer implementing the backward pass by hand we
    # don't need to keep references to intermediate values.
    # you can also use `.clamp(min=0)`, equivalent to F.relu()
    x = F.relu(x.mm(w1))
    x = x.mm(w2)
    return x


def two_layer_fc_test():
    hidden_layer_size = 42
    x = torch.zeros((64, 50), dtype=dtype)  # minibatch size 64, feature dimension 50
    w1 = torch.zeros((50, hidden_layer_size), dtype=dtype)
    w2 = torch.zeros((hidden_layer_size, 10), dtype=dtype)
    scores = two_layer_fc(x, [w1, w2])
    print(scores.size())  # you should see [64, 10]

two_layer_fc_test()

torch.Size([64, 10])


### Barebones PyTorch: Three-Layer ConvNet

Here you will complete the implementation of the function `three_layer_convnet`, which will perform the forward pass of a three-layer convolutional network. Like above, we can immediately test our implementation by passing zeros through the network. The network should have the following architecture:

1. A convolutional layer (with bias) with `channel_1` filters, each with shape `KW1 x KH1`, and zero-padding of two
2. ReLU nonlinearity
3. A convolutional layer (with bias) with `channel_2` filters, each with shape `KW2 x KH2`, and zero-padding of one
4. ReLU nonlinearity
5. Fully-connected layer with bias, producing scores for C classes.

Note that we have **no softmax activation** here after our fully-connected layer: this is because PyTorch's cross entropy loss performs a softmax activation for you, and bundling that step into the loss makes the computation more efficient.
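To see that the softmax really is folded into the loss, the following sketch (random scores and labels, made up for illustration) compares `F.cross_entropy` on raw scores with an explicit log-softmax followed by the negative log-likelihood loss:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(4, 10)        # raw scores for 4 examples, 10 classes
y = torch.tensor([1, 0, 9, 3])     # arbitrary ground-truth labels

# cross_entropy expects unnormalized scores and applies log-softmax internally.
loss_bundled = F.cross_entropy(scores, y)

# The same computation written out as two steps.
loss_manual = F.nll_loss(F.log_softmax(scores, dim=1), y)

print(loss_bundled.item(), loss_manual.item())
```

Both values agree up to floating-point error, which is why the network itself should output raw scores.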

**HINT**: For convolutions: http://pytorch.org/docs/stable/nn.html#torch.nn.functional.conv2d; pay attention to the shapes of convolutional filters!
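As a sketch of the shape bookkeeping, the helper `conv_out_size` below is our own illustration, not a PyTorch function. It applies the usual formula (size + 2 * padding - kernel) / stride + 1, and the call to `F.conv2d` checks the result. Note that the filter Tensor is laid out as `(out_channels, in_channels, KH, KW)`:

```python
import torch
import torch.nn.functional as F

def conv_out_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

x = torch.zeros(2, 3, 32, 32)   # N=2 images, 3 channels, 32 x 32 pixels
w = torch.zeros(6, 3, 5, 5)     # 6 filters, each spanning 3 input channels, 5 x 5
b = torch.zeros(6)              # one bias per filter

out = F.conv2d(x, w, b, padding=2)

print(conv_out_size(32, kernel=5, padding=2))  # 32
print(out.shape)                               # torch.Size([2, 6, 32, 32])
```

With a 5x5 filter and padding 2, or a 3x3 filter and padding 1, the spatial size is preserved, which helps when working out the input size of the fully-connected layer.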

In [6]:
def three_layer_convnet(x, params):
    """
    Performs the forward pass of a three-layer convolutional network with the
    architecture defined above.

    Inputs:
    - x: A PyTorch Tensor of shape (N, 3, H, W) giving a minibatch of images
    - params: A list of PyTorch Tensors giving the weights and biases for the
      network; should contain the following:
      - conv_w1: PyTorch Tensor of shape (channel_1, 3, KH1, KW1) giving weights
        for the first convolutional layer
      - conv_b1: PyTorch Tensor of shape (channel_1,) giving biases for the first
        convolutional layer
      - conv_w2: PyTorch Tensor of shape (channel_2, channel_1, KH2, KW2) giving
        weights for the second convolutional layer
      - conv_b2: PyTorch Tensor of shape (channel_2,) giving biases for the second
        convolutional layer
      - fc_w: PyTorch Tensor giving weights for the fully-connected layer. Can you
        figure out what the shape should be?
      - fc_b: PyTorch Tensor giving biases for the fully-connected layer. Can you
        figure out what the shape should be?

    Returns:
    - scores: PyTorch Tensor of shape (N, C) giving classification scores for x

    """
    conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
    scores = None
    ################################################################################
    # TODO: Implement the forward pass for the three-layer ConvNet.                #
    ################################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    x = F.conv2d(x, conv_w1, conv_b1, padding=2)
    x = F.relu(x)
    x = F.conv2d(x, conv_w2, conv_b2, padding=1)
    x = F.relu(x)
    x = flatten(x)
    scores = x.mm(fc_w) + fc_b

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ################################################################################
    #                                 END OF YOUR CODE                             #
    ################################################################################
    return scores

After defining the forward pass of the ConvNet above, run the following cell to test your implementation.

When you run this function, scores should have shape (64, 10).

In [7]:
def three_layer_convnet_test():
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch size 64, image size [3, 32, 32]

    conv_w1 = torch.zeros((6, 3, 5, 5), dtype=dtype)  # [out_channel, in_channel, kernel_H, kernel_W]
    conv_b1 = torch.zeros((6,))  # out_channel
    conv_w2 = torch.zeros((9, 6, 3, 3), dtype=dtype)  # [out_channel, in_channel, kernel_H, kernel_W]
    conv_b2 = torch.zeros((9,))  # out_channel

    # you must calculate the shape of the tensor after two conv layers, before the fully-connected layer
    fc_w = torch.zeros((9 * 32 * 32, 10))
    fc_b = torch.zeros(10)

    scores = three_layer_convnet(x, [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b])
    print(scores.size())  # you should see [64, 10]
three_layer_convnet_test()

torch.Size([64, 10])


### Barebones PyTorch: Initialization
Let's write a couple of utility functions to initialize the weight matrices for our models.

- `random_weight(shape)` initializes a weight tensor with the Kaiming normal initialization method.
- `zero_weight(shape)` initializes a weight tensor with all zeros. Useful for instantiating bias parameters.

The `random_weight` function uses the Kaiming normal initialization method, described in:

He et al, *Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification*, ICCV 2015, https://arxiv.org/abs/1502.01852
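As a quick sanity check of what this initialization does, the sketch below (layer sizes are arbitrary illustration values) draws standard normal samples, scales them by sqrt(2 / fan_in), and compares the empirical standard deviation to the target:

```python
import math
import torch

torch.manual_seed(0)
fan_in, fan_out = 1000, 500   # arbitrary sizes for a fully-connected weight

# Standard normal samples scaled by sqrt(2 / fan_in).
expected_std = math.sqrt(2.0 / fan_in)
w = torch.randn(fan_in, fan_out) * expected_std

print(expected_std, w.std().item())
```

The two printed numbers should be close. The scale shrinks as `fan_in` grows, which keeps the variance of ReLU activations roughly constant from layer to layer.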

In [8]:
def random_weight(shape):
    """
    Create random Tensors for weights; setting requires_grad=True means that we
    want to compute gradients for these Tensors during the backward pass.
    We use Kaiming normal initialization: standard normal samples scaled by
    sqrt(2 / fan_in).
    """
    if len(shape) == 2:  # FC weight
        fan_in = shape[0]
    else:
        fan_in = np.prod(shape[1:]) # conv weight [out_channel, in_channel, kH, kW]
    # randn is standard normal distribution generator.
    w = torch.randn(shape, device=device, dtype=dtype) * np.sqrt(2. / fan_in)
    w.requires_grad = True
    return w

def zero_weight(shape):
    return torch.zeros(shape, device=device, dtype=dtype, requires_grad=True)

# create a weight of shape [3 x 5]
# you should see the type `torch.cuda.FloatTensor` if you use GPU.
# Otherwise it should be `torch.FloatTensor`
random_weight((3, 5))

tensor([[ 0.1619, -0.5494, -0.3715,  0.5747, -0.7230],
        [-1.0305, -1.3565, -1.1575,  0.3873, -0.2423],
        [-0.8059,  0.0448, -0.2051, -0.6405, -0.4370]], device='cuda:0',
       requires_grad=True)

### Barebones PyTorch: Check Accuracy
When training the model we will use the following function to check the accuracy of our model on the validation or test sets.

When checking accuracy we don't need to compute any gradients; as a result we don't need PyTorch to build a computational graph for us when we compute scores. To prevent a graph from being built we scope our computation under a `torch.no_grad()` context manager.
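A small sketch of the effect (toy Tensor, made up for illustration): the same computation produces a result that is tracked by autograd outside the context manager and untracked inside it.

```python
import torch

w = torch.ones(3, requires_grad=True)

# Outside no_grad: the result is part of a computational graph.
y_tracked = (w * 2).sum()

# Inside no_grad: no graph is built, so the result cannot be backpropagated.
with torch.no_grad():
    y_untracked = (w * 2).sum()

print(y_tracked.requires_grad, y_untracked.requires_grad)  # True False
```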

In [9]:
def check_accuracy_part2(loader, model_fn, params):
    """
    Check the accuracy of a classification model.

    Inputs:
    - loader: A DataLoader for the data split we want to check
    - model_fn: A function that performs the forward pass of the model,
      with the signature scores = model_fn(x, params)
    - params: List of PyTorch Tensors giving parameters of the model

    Returns: Nothing, but prints the accuracy of the model
    """
    split = 'val' if loader.dataset.train else 'test'
    print('Checking accuracy on the %s set' % split)
    num_correct, num_samples = 0, 0
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.int64)
            scores = model_fn(x, params)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f%%)' % (num_correct, num_samples, 100 * acc))

### Barebones PyTorch: Training Loop
We can now set up a basic training loop to train our network. We will train the model using stochastic gradient descent without momentum. We will use `torch.nn.functional.cross_entropy` (imported above as `F`) to compute the loss; you can [read about it here](http://pytorch.org/docs/stable/nn.html#cross-entropy).

The training loop takes as input the neural network function, a list of initialized parameters (`[w1, w2]` in our example), and learning rate.
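Before reading the loop, it may help to see why the gradients are zeroed after every update. In this sketch, a one-parameter toy problem of our own, calling `backward()` twice adds the gradients together in `.grad`. A manual SGD step under `torch.no_grad()` then updates the weight and resets the gradient:

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

# d(3w)/dw = 3; two backward passes accumulate to 6 in w.grad.
(w * 3).sum().backward()
(w * 3).sum().backward()
accumulated = w.grad.clone()

# Manual SGD step: update in place without recording it in a graph,
# then clear the gradient so the next iteration starts from zero.
learning_rate = 0.1
with torch.no_grad():
    w -= learning_rate * w.grad
    w.grad.zero_()

print(accumulated, w, w.grad)
```

Without the call to `zero_()`, every iteration would step along the sum of all gradients seen so far rather than the current one.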

In [10]:
def train_part2(model_fn, params, learning_rate):
    """
    Train a model on CIFAR-10.

    Inputs:
    - model_fn: A Python function that performs the forward pass of the model.
      It should have the signature scores = model_fn(x, params) where x is a
      PyTorch Tensor of image data, params is a list of PyTorch Tensors giving
      model weights, and scores is a PyTorch Tensor of shape (N, C) giving
      scores for the elements in x.
    - params: List of PyTorch Tensors giving weights for the model
    - learning_rate: Python scalar giving the learning rate to use for SGD

    Returns: Nothing
    """
    for t, (x, y) in enumerate(loader_train):
        # Move the data to the proper device (GPU or CPU)
        x = x.to(device=device, dtype=dtype)
        y = y.to(device=device, dtype=torch.long)

        # Forward pass: compute scores and loss
        scores = model_fn(x, params)
        loss = F.cross_entropy(scores, y)

        # Backward pass: PyTorch figures out which Tensors in the computational
        # graph have requires_grad=True and uses backpropagation to compute the
        # gradient of the loss with respect to these Tensors, and stores the
        # gradients in the .grad attribute of each Tensor.
        loss.backward()

        # Update parameters. We don't want to backpropagate through the
        # parameter updates, so we scope the updates under a torch.no_grad()
        # context manager to prevent a computational graph from being built.
        with torch.no_grad():
            for w in params:
                w -= learning_rate * w.grad

                # Manually zero the gradients after running the backward pass
                w.grad.zero_()

        if t % print_every == 0:
            print('Iteration %d, loss = %.4f' % (t, loss.item()))
            check_accuracy_part2(loader_val, model_fn, params)
            print()

### Barebones PyTorch: Train a Two-Layer Network
Now we are ready to run the training loop. We need to explicitly allocate tensors for the fully connected weights, `w1` and `w2`.

Each minibatch of CIFAR has 64 examples, so the tensor shape is `[64, 3, 32, 32]`.

After flattening, `x` shape should be `[64, 3 * 32 * 32]`. This will be the size of the first dimension of `w1`.
The second dimension of `w1` is the hidden layer size, which will also be the first dimension of `w2`.

Finally, the output of the network is a 10-dimensional vector of unnormalized scores for the 10 classes; the cross-entropy loss applies the softmax that turns these scores into a probability distribution.

You don't need to tune any hyperparameters but you should see accuracies above 40% after training for one epoch.
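The shape bookkeeping can be traced on random data. This is only a sketch: the hidden size of 16 and the 0.01 weight scale are arbitrary illustration values, not the ones used for training. It also applies a softmax at the end to show how the 10 raw outputs per image relate to a probability distribution:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, D, H, C = 64, 3 * 32 * 32, 16, 10   # H = 16 is an arbitrary small hidden size

x = torch.randn(N, 3, 32, 32)          # one minibatch of fake images
w1 = torch.randn(D, H) * 0.01
w2 = torch.randn(H, C) * 0.01

x_flat = x.view(N, -1)                 # [64, 3072]
hidden = F.relu(x_flat.mm(w1))         # [64, 16]
scores = hidden.mm(w2)                 # [64, 10], raw scores per class
probs = F.softmax(scores, dim=1)       # each row now sums to 1

print(x_flat.shape, hidden.shape, scores.shape)
print(probs.sum(dim=1)[:3])
```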

In [11]:
hidden_layer_size = 4000
learning_rate = 1e-2

w1 = random_weight((3 * 32 * 32, hidden_layer_size))
w2 = random_weight((hidden_layer_size, 10))

train_part2(two_layer_fc, [w1, w2], learning_rate)

Iteration 0, loss = 3.9242
Checking accuracy on the val set
Got 149 / 1000 correct (14.90%)

Iteration 100, loss = 2.6173
Checking accuracy on the val set
Got 336 / 1000 correct (33.60%)

Iteration 200, loss = 2.0207
Checking accuracy on the val set
Got 337 / 1000 correct (33.70%)

Iteration 300, loss = 2.1644
Checking accuracy on the val set
Got 403 / 1000 correct (40.30%)

Iteration 400, loss = 1.5509
Checking accuracy on the val set
Got 380 / 1000 correct (38.00%)

Iteration 500, loss = 1.5550
Checking accuracy on the val set
Got 395 / 1000 correct (39.50%)

Iteration 600, loss = 2.0234
Checking accuracy on the val set
Got 426 / 1000 correct (42.60%)

Iteration 700, loss = 1.8338
Checking accuracy on the val set
Got 401 / 1000 correct (40.10%)



### Barebones PyTorch: Training a ConvNet

Below, you should use the functions defined above to train a three-layer convolutional network on CIFAR. The network should have the following architecture:

1. Convolutional layer (with bias) with 32 5x5 filters, with zero-padding of 2
2. ReLU
3. Convolutional layer (with bias) with 16 3x3 filters, with zero-padding of 1
4. ReLU
5. Fully-connected layer (with bias) to compute scores for 10 classes

You should initialize your weight matrices using the `random_weight` function defined above, and you should initialize your bias vectors using the `zero_weight` function above.

You don't need to tune any hyperparameters, but if everything works correctly you should achieve an accuracy above 42% after one epoch.

In [12]:
learning_rate = 3e-3

channel_1 = 32
channel_2 = 16

conv_w1 = None
conv_b1 = None
conv_w2 = None
conv_b2 = None
fc_w = None
fc_b = None

################################################################################
# TODO: Initialize the parameters of a three-layer ConvNet.                    #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

conv_w1 = random_weight((channel_1, 3, 5, 5))
conv_b1 = zero_weight(channel_1)
conv_w2 = random_weight((channel_2, channel_1, 3, 3))
conv_b2 = zero_weight(channel_2)
fc_w = random_weight((channel_2 * 32 * 32, 10))
fc_b = zero_weight(10)

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                                 END OF YOUR CODE                             #
################################################################################

params = [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b]
train_part2(three_layer_convnet, params, learning_rate)

Iteration 0, loss = 2.6943
Checking accuracy on the val set
Got 122 / 1000 correct (12.20%)

Iteration 100, loss = 1.8906
Checking accuracy on the val set
Got 351 / 1000 correct (35.10%)

Iteration 200, loss = 1.6444
Checking accuracy on the val set
Got 400 / 1000 correct (40.00%)

Iteration 300, loss = 1.5037
Checking accuracy on the val set
Got 418 / 1000 correct (41.80%)

Iteration 400, loss = 1.6900
Checking accuracy on the val set
Got 434 / 1000 correct (43.40%)

Iteration 500, loss = 1.5775
Checking accuracy on the val set
Got 457 / 1000 correct (45.70%)

Iteration 600, loss = 1.6780
Checking accuracy on the val set
Got 456 / 1000 correct (45.60%)

Iteration 700, loss = 1.4224
Checking accuracy on the val set
Got 468 / 1000 correct (46.80%)



# Part III. PyTorch Module API

Barebone PyTorch requires that we track all the parameter tensors by hand. This is fine for small networks with a few tensors, but it would be extremely inconvenient and error-prone to track tens or hundreds of tensors in larger networks.

PyTorch provides the `nn.Module` API for you to define arbitrary network architectures, while tracking every learnable parameter for you. In Part II, we implemented SGD ourselves. PyTorch also provides the `torch.optim` package that implements all the common optimizers, such as RMSProp, Adagrad, and Adam. It even supports approximate second-order methods like L-BFGS! You can refer to the [doc](http://pytorch.org/docs/master/optim.html) for the exact specifications of each optimizer.

To use the Module API, follow the steps below:

1. Subclass `nn.Module`. Give your network class an intuitive name like `TwoLayerFC`.

2. In the constructor `__init__()`, define all the layers you need as class attributes. Layer objects like `nn.Linear` and `nn.Conv2d` are themselves `nn.Module` subclasses and contain learnable parameters, so you don't have to instantiate the raw tensors yourself. `nn.Module` will track these internal parameters for you. Refer to the [doc](http://pytorch.org/docs/master/nn.html) to learn more about the dozens of built-in layers. **Warning**: don't forget to call `super().__init__()` first!

3. In the `forward()` method, define the *connectivity* of your network. You should use the attributes defined in `__init__` as function calls that take a tensor as input and return the "transformed" tensor. Do *not* create any new layers with learnable parameters in `forward()`! All of them must be declared upfront in `__init__`.

After you define your Module subclass, you can instantiate it as an object and call it just like the NN forward function in part II.

### Module API: Two-Layer Network
Here is a concrete example of a 2-layer fully connected network:

In [13]:
class TwoLayerFC(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        # assign layer objects to class attributes
        self.fc1 = nn.Linear(input_size, hidden_size)
        # nn.init package contains convenient initialization methods
        # http://pytorch.org/docs/master/nn.html#torch-nn-init
        nn.init.kaiming_normal_(self.fc1.weight)
        self.fc2 = nn.Linear(hidden_size, num_classes)
        nn.init.kaiming_normal_(self.fc2.weight)

    def forward(self, x):
        # forward always defines connectivity
        x = flatten(x)
        scores = self.fc2(F.relu(self.fc1(x)))
        return scores

def test_TwoLayerFC():
    input_size = 50
    x = torch.zeros((64, input_size), dtype=dtype)  # minibatch size 64, feature dimension 50
    model = TwoLayerFC(input_size, 42, 10)
    scores = model(x)
    print(scores.size())  # you should see [64, 10]
test_TwoLayerFC()

torch.Size([64, 10])


### Module API: Three-Layer ConvNet
It's your turn to implement a three-layer ConvNet: two convolutional layers followed by a fully connected layer. The network architecture should be the same as in Part II:

1. Convolutional layer with `channel_1` 5x5 filters with zero-padding of 2
2. ReLU
3. Convolutional layer with `channel_2` 3x3 filters with zero-padding of 1
4. ReLU
5. Fully-connected layer to `num_classes` classes

You should initialize the weight matrices of the model using the Kaiming normal initialization method.
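
For reference, Kaiming normal initialization draws weights from a zero-mean Gaussian whose standard deviation depends on the layer's fan-in. The sketch below computes that standard deviation by hand, assuming the defaults of `nn.init.kaiming_normal_` (fan-in mode and a gain of sqrt(2), which suits ReLU). `kaiming_std` is a helper name introduced here for illustration only.

```python
import math


def kaiming_std(fan_in):
    """Standard deviation sqrt(2 / fan_in) for ReLU layers (illustrative helper)."""
    return math.sqrt(2.0 / fan_in)


# fan_in of a conv layer = in_channels * kernel_height * kernel_width
conv1_fan_in = 3 * 5 * 5
# fan_in of a linear layer = number of input features
fc_fan_in = 8 * 32 * 32  # e.g. channel_2 = 8 feature maps of size 32x32

print(kaiming_std(conv1_fan_in))
print(kaiming_std(fc_fan_in))
```

Layers with a larger fan-in get proportionally smaller initial weights, which keeps the activation scale roughly constant from layer to layer.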

**HINT**: http://pytorch.org/docs/stable/nn.html#conv2d

After you implement the three-layer ConvNet, the `test_ThreeLayerConvNet` function will run your implementation; it should print `torch.Size([64, 10])`, the shape of the output scores.
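
The input size of the fully connected layer follows from the convolution arithmetic. The sketch below uses the standard output-size formula for a convolution; `conv_output_size` is a hypothetical helper written for this note, not a PyTorch function.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution: (size + 2*padding - kernel) // stride + 1."""
    return (size + 2 * padding - kernel) // stride + 1


h = 32                                         # CIFAR-10 images are 32x32
h = conv_output_size(h, kernel=5, padding=2)   # first conv layer
h = conv_output_size(h, kernel=3, padding=1)   # second conv layer

channel_2 = 8                                  # value used by test_ThreeLayerConvNet
fc_in_features = channel_2 * h * h
print(h, fc_in_features)
```

Because both paddings preserve the 32x32 spatial size, the flattened feature vector has `channel_2 * 32 * 32` entries.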

In [17]:
class ThreeLayerConvNet(nn.Module):
    def __init__(self, in_channel, channel_1, channel_2, num_classes):
        super().__init__()
        ########################################################################
        # TODO: Set up the layers you need for a three-layer ConvNet with the  #
        # architecture defined above.                                          #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        self.conv1 = nn.Conv2d(in_channel, channel_1, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(channel_1, channel_2, kernel_size=3, padding=1)
        # Both paddings preserve the 32x32 spatial size of CIFAR-10 images,
        # so the flattened feature vector has channel_2 * 32 * 32 entries.
        self.fc3 = nn.Linear(channel_2 * 32 * 32, num_classes)
        nn.init.kaiming_normal_(self.conv1.weight)
        nn.init.kaiming_normal_(self.conv2.weight)
        nn.init.kaiming_normal_(self.fc3.weight)

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                          END OF YOUR CODE                            #
        ########################################################################

    def forward(self, x):
        scores = None
        ########################################################################
        # TODO: Implement the forward function for a 3-layer ConvNet. you      #
        # should use the layers you defined in __init__ and specify the        #
        # connectivity of those layers in forward()                            #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = flatten(x)
        scores = self.fc3(x)


        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                             END OF YOUR CODE                         #
        ########################################################################
        return scores


def test_ThreeLayerConvNet():
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch size 64, image size [3, 32, 32]
    model = ThreeLayerConvNet(in_channel=3, channel_1=12, channel_2=8, num_classes=10)
    scores = model(x)
    print(scores.size())  # you should see [64, 10]
test_ThreeLayerConvNet()

torch.Size([64, 10])


### Module API: Check Accuracy
Given the validation or test set, we can check the classification accuracy of a neural network.

This version is slightly different from the one in Part II: you no longer pass in the parameters manually.

In [18]:
def check_accuracy_part34(loader, model):
    if loader.dataset.train:
        print('Checking accuracy on validation set')
    else:
        print('Checking accuracy on test set')
    num_correct = 0
    num_samples = 0
    model.eval()  # set model to evaluation mode
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)
            scores = model(x)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))
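
The prediction step used above can be tried on a toy tensor. This is a minimal sketch with hand-picked scores (not data from the assignment) showing that `scores.max(1)` returns the row-wise maximum together with its index, and that the index serves as the predicted class.

```python
import torch

# Toy scores for 3 examples and 4 classes (values chosen by hand).
scores = torch.tensor([[0.1, 2.0, 0.3, 0.0],
                       [1.5, 0.2, 0.1, 0.4],
                       [0.0, 0.1, 0.2, 3.0]])
y = torch.tensor([1, 2, 3])

# max over dim=1 returns (largest value, index of largest value) per row;
# the index is the predicted class.
_, preds = scores.max(1)
num_correct = (preds == y).sum().item()
acc = num_correct / preds.size(0)
print(preds.tolist(), num_correct, acc)
```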

### Module API: Training Loop
We also use a slightly different training loop. Rather than updating the values of the weights ourselves, we use an Optimizer object from the `torch.optim` package, which abstracts the notion of an optimization algorithm and provides implementations of most of the algorithms commonly used to optimize neural networks.

In [19]:
def train_part34(model, optimizer, epochs=1):
    """
    Train a model on CIFAR-10 using the PyTorch Module API.

    Inputs:
    - model: A PyTorch Module giving the model to train.
    - optimizer: An Optimizer object we will use to train the model
    - epochs: (Optional) A Python integer giving the number of epochs to train for

    Returns: Nothing, but prints model accuracies during training.
    """
    model = model.to(device=device)  # move the model parameters to CPU/GPU
    for e in range(epochs):
        for t, (x, y) in enumerate(loader_train):
            model.train()  # put model to training mode
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)

            scores = model(x)
            loss = F.cross_entropy(scores, y)

            # Zero out all of the gradients for the variables which the optimizer
            # will update.
            optimizer.zero_grad()

            # This is the backwards pass: compute the gradient of the loss with
            # respect to each  parameter of the model.
            loss.backward()

            # Actually update the parameters of the model using the gradients
            # computed by the backwards pass.
            optimizer.step()

            if t % print_every == 0:
                print('Iteration %d, loss = %.4f' % (t, loss.item()))
                check_accuracy_part34(loader_val, model)
                print()
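
To connect this loop with the hand-written update from Part II, the sketch below checks on a toy problem that one `optimizer.step()` of plain `optim.SGD` (no momentum, no weight decay) matches `w - lr * grad`. The model `toy_model` and the random data are made up for the illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)
lr = 0.1
toy_model = nn.Linear(4, 3)
toy_optimizer = optim.SGD(toy_model.parameters(), lr=lr)

x = torch.randn(5, 4)
y = torch.tensor([0, 2, 1, 1, 0])

loss = F.cross_entropy(toy_model(x), y)
toy_optimizer.zero_grad()   # clear gradients left over from earlier steps
loss.backward()             # populate .grad on every parameter

# What plain SGD should produce: w <- w - lr * grad
expected_weight = toy_model.weight.detach() - lr * toy_model.weight.grad

toy_optimizer.step()        # in-place update of all tracked parameters
matches = torch.allclose(toy_model.weight.detach(), expected_weight)
print(matches)
```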

### Module API: Train a Two-Layer Network
Now we are ready to run the training loop. In contrast to Part II, we no longer allocate parameter tensors explicitly.

Simply pass the input size, hidden layer size, and number of classes (i.e. output size) to the constructor of `TwoLayerFC`.

You also need to define an optimizer that tracks all the learnable parameters inside `TwoLayerFC`.

You don't need to tune any hyperparameters, but you should see model accuracies above 40% after training for one epoch.

In [20]:
hidden_layer_size = 4000
learning_rate = 1e-2
model = TwoLayerFC(3 * 32 * 32, hidden_layer_size, 10)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

train_part34(model, optimizer)

Iteration 0, loss = 3.1534
Checking accuracy on validation set
Got 121 / 1000 correct (12.10)

Iteration 100, loss = 2.3903
Checking accuracy on validation set
Got 356 / 1000 correct (35.60)

Iteration 200, loss = 2.0580
Checking accuracy on validation set
Got 379 / 1000 correct (37.90)

Iteration 300, loss = 1.8311
Checking accuracy on validation set
Got 404 / 1000 correct (40.40)

Iteration 400, loss = 1.9012
Checking accuracy on validation set
Got 368 / 1000 correct (36.80)

Iteration 500, loss = 1.8962
Checking accuracy on validation set
Got 419 / 1000 correct (41.90)

Iteration 600, loss = 1.9353
Checking accuracy on validation set
Got 375 / 1000 correct (37.50)

Iteration 700, loss = 1.8701
Checking accuracy on validation set
Got 421 / 1000 correct (42.10)



### Module API: Train a Three-Layer ConvNet
You should now use the Module API to train a three-layer ConvNet on CIFAR. This should look very similar to training the two-layer network! You don't need to tune any hyperparameters, but you should achieve above 45% accuracy after training for one epoch.

You should train the model using stochastic gradient descent without momentum.

In [22]:
learning_rate = 3e-3
channel_1 = 32
channel_2 = 16

model = None
optimizer = None
################################################################################
# TODO: Instantiate your ThreeLayerConvNet model and a corresponding optimizer #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

model = ThreeLayerConvNet(3, channel_1, channel_2, num_classes=10)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)


# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                                 END OF YOUR CODE                             #
################################################################################

train_part34(model, optimizer)

Iteration 0, loss = 3.3212
Checking accuracy on validation set
Got 119 / 1000 correct (11.90)

Iteration 100, loss = 1.9465
Checking accuracy on validation set
Got 353 / 1000 correct (35.30)

Iteration 200, loss = 1.6664
Checking accuracy on validation set
Got 374 / 1000 correct (37.40)

Iteration 300, loss = 1.7595
Checking accuracy on validation set
Got 420 / 1000 correct (42.00)

Iteration 400, loss = 1.3971
Checking accuracy on validation set
Got 431 / 1000 correct (43.10)

Iteration 500, loss = 1.5881
Checking accuracy on validation set
Got 415 / 1000 correct (41.50)

Iteration 600, loss = 1.6327
Checking accuracy on validation set
Got 470 / 1000 correct (47.00)

Iteration 700, loss = 1.7299
Checking accuracy on validation set
Got 480 / 1000 correct (48.00)



# Part IV. PyTorch Sequential API

Part III introduced the PyTorch Module API, which allows you to define arbitrary learnable layers and their connectivity.

For simple models like a stack of feed-forward layers, you still need to go through three steps: subclass `nn.Module`, assign layers to class attributes in `__init__`, and call each layer one by one in `forward()`. Is there a more convenient way?

Fortunately, PyTorch provides a container Module called `nn.Sequential`, which merges the above steps into one. It is not as flexible as `nn.Module`, because you cannot specify more complex topology than a feed-forward stack, but it's good enough for many use cases.
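
A quick way to see what `nn.Sequential` does is to compare it with the hand-written chain it replaces. The sketch below uses two small, arbitrarily sized linear layers (hypothetical sizes, not the assignment's) and checks that both paths give the same output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc1 = nn.Linear(6, 4)
fc2 = nn.Linear(4, 2)

# nn.Sequential calls its children in the order they were passed in.
stack = nn.Sequential(fc1, nn.ReLU(), fc2)

x = torch.randn(3, 6)
manual = fc2(torch.relu(fc1(x)))   # what forward() would spell out by hand
same = torch.allclose(stack(x), manual)
print(same, tuple(stack(x).shape))
```

Anything that needs branching or skip connections does not fit this pattern and still calls for an `nn.Module` subclass.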

### Sequential API: Two-Layer Network
Let's see how to rewrite our two-layer fully connected network example with `nn.Sequential`, and train it using the training loop defined above.

Again, you don't need to tune any hyperparameters here, but you should achieve above 40% accuracy after one epoch of training.

In [23]:
# We need to wrap the `flatten` function in a module in order to stack it
# in nn.Sequential
class Flatten(nn.Module):
    def forward(self, x):
        return flatten(x)

hidden_layer_size = 4000
learning_rate = 1e-2

model = nn.Sequential(
    Flatten(),
    nn.Linear(3 * 32 * 32, hidden_layer_size),
    nn.ReLU(),
    nn.Linear(hidden_layer_size, 10),
)

# you can use Nesterov momentum in optim.SGD
optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                     momentum=0.9, nesterov=True)

train_part34(model, optimizer)

Iteration 0, loss = 2.3247
Checking accuracy on validation set
Got 145 / 1000 correct (14.50)

Iteration 100, loss = 1.7268
Checking accuracy on validation set
Got 410 / 1000 correct (41.00)

Iteration 200, loss = 1.7785
Checking accuracy on validation set
Got 414 / 1000 correct (41.40)

Iteration 300, loss = 1.6615
Checking accuracy on validation set
Got 400 / 1000 correct (40.00)

Iteration 400, loss = 1.5865
Checking accuracy on validation set
Got 429 / 1000 correct (42.90)

Iteration 500, loss = 1.8317
Checking accuracy on validation set
Got 419 / 1000 correct (41.90)

Iteration 600, loss = 1.7307
Checking accuracy on validation set
Got 439 / 1000 correct (43.90)

Iteration 700, loss = 1.9852
Checking accuracy on validation set
Got 462 / 1000 correct (46.20)



### Sequential API: Three-Layer ConvNet
Here you should use `nn.Sequential` to define and train a three-layer ConvNet with the same architecture we used in Part III:

1. Convolutional layer (with bias) with 32 5x5 filters, with zero-padding of 2
2. ReLU
3. Convolutional layer (with bias) with 16 3x3 filters, with zero-padding of 1
4. ReLU
5. Fully-connected layer (with bias) to compute scores for 10 classes

You can use the default PyTorch weight initialization.

You should optimize your model using stochastic gradient descent with Nesterov momentum 0.9.
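
For intuition, here is a scalar sketch of a Nesterov momentum step. It is written in the form we understand `optim.SGD(..., momentum=mu, nesterov=True)` to use with zero dampening (velocity `v = mu * v + g`, then `w -= lr * (g + mu * v)`); treat that correspondence as an assumption and check the `torch.optim.SGD` documentation for the authoritative definition. `nesterov_sgd_step` is a helper name made up for this example.

```python
def nesterov_sgd_step(w, v, grad, lr=1e-2, mu=0.9):
    """One Nesterov-momentum step on a scalar parameter (illustrative helper)."""
    v = mu * v + grad              # update the velocity with the new gradient
    w = w - lr * (grad + mu * v)   # look ahead along the updated velocity
    return w, v


# Minimize f(w) = w**2, whose gradient is 2 * w, starting from w = 1.
w, v = 1.0, 0.0
for _ in range(200):
    w, v = nesterov_sgd_step(w, v, grad=2.0 * w)
print(abs(w))
```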

Again, you don't need to tune any hyperparameters but you should see accuracy above 55% after one epoch of training.

In [30]:
channel_1 = 32
channel_2 = 16
learning_rate = 1e-2

model = None
optimizer = None

################################################################################
# TODO: Rewrite the 3-layer ConvNet with bias from Part III with the           #
# Sequential API.                                                              #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

model = nn.Sequential(
    nn.Conv2d(3, channel_1, 5, padding=2),
    nn.ReLU(),
    nn.Conv2d(channel_1, channel_2, 3, padding=1),
    nn.ReLU(),
    Flatten(),
    nn.Linear(channel_2 * 32 * 32, 10),
)
optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                      momentum=0.9, nesterov=True)

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                                 END OF YOUR CODE                             #
################################################################################

train_part34(model, optimizer)

Iteration 0, loss = 2.3195
Checking accuracy on validation set
Got 146 / 1000 correct (14.60)

Iteration 100, loss = 1.4846
Checking accuracy on validation set
Got 476 / 1000 correct (47.60)

Iteration 200, loss = 1.3525
Checking accuracy on validation set
Got 494 / 1000 correct (49.40)

Iteration 300, loss = 1.3173
Checking accuracy on validation set
Got 540 / 1000 correct (54.00)

Iteration 400, loss = 1.2827
Checking accuracy on validation set
Got 538 / 1000 correct (53.80)

Iteration 500, loss = 1.1070
Checking accuracy on validation set
Got 567 / 1000 correct (56.70)

Iteration 600, loss = 1.3815
Checking accuracy on validation set
Got 561 / 1000 correct (56.10)

Iteration 700, loss = 1.3044
Checking accuracy on validation set
Got 588 / 1000 correct (58.80)



# Part V. CIFAR-10 open-ended challenge

In this section, you can experiment with whatever ConvNet architecture you'd like on CIFAR-10.

Now it's your job to experiment with architectures, hyperparameters, loss functions, and optimizers to train a model that achieves **at least 70%** accuracy on the CIFAR-10 **validation** set within 10 epochs. You can use the `check_accuracy_part34` and `train_part34` functions from above, and either the `nn.Module` or the `nn.Sequential` API.

Describe what you did at the end of this notebook.

Here is the official API documentation for each component. One note: what we call "spatial batch norm" in class is called `BatchNorm2d` in PyTorch.

* Layers in torch.nn package: http://pytorch.org/docs/stable/nn.html
* Activations: http://pytorch.org/docs/stable/nn.html#non-linear-activations
* Loss functions: http://pytorch.org/docs/stable/nn.html#loss-functions
* Optimizers: http://pytorch.org/docs/stable/optim.html


### Things you might try:
- **Filter size**: Above we used 5x5; would smaller filters be more efficient?
- **Number of filters**: Above we used 32 filters. Do more or fewer do better?
- **Pooling vs Strided Convolution**: Do you use max pooling or just strided convolutions?
- **Batch normalization**: Try adding spatial batch normalization after convolution layers and vanilla batch normalization after affine layers. Do your networks train faster?
- **Network architecture**: The network above has three layers of trainable parameters. Can you do better with a deeper network? Good architectures to try include:
    - [conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    - [conv-relu-conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    - [batchnorm-relu-conv]xN -> [affine]xM -> [softmax or SVM]
- **Global Average Pooling**: Instead of flattening and then having multiple affine layers, perform convolutions until your image gets small (7x7 or so) and then perform an average pooling operation to get a 1x1 feature map of shape (1, 1, Filter#), which is then reshaped into a (Filter#) vector. This is used in [Google's Inception Network](https://arxiv.org/abs/1512.00567) (see Table 1 for their architecture).
- **Regularization**: Add l2 weight regularization, or perhaps use Dropout.
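
Several of the ideas in the list above can be combined in a few lines. The following is a sketch only, with arbitrary, untuned channel widths (16 and 32 are assumptions for the illustration): two conv-batchnorm-relu-pool blocks followed by global average pooling instead of a large affine layer. It uses the built-in `nn.Flatten`, which assumes a reasonably recent PyTorch release.

```python
import torch
import torch.nn as nn

num_classes = 10
sketch = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),           # spatial batch norm over the 16 channels
    nn.ReLU(),
    nn.MaxPool2d(2),              # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.MaxPool2d(2),              # 16x16 -> 8x8
    nn.AdaptiveAvgPool2d(1),      # global average pooling: 8x8 -> 1x1
    nn.Flatten(),                 # (N, 32, 1, 1) -> (N, 32)
    nn.Linear(32, num_classes),
)

x = torch.zeros(4, 3, 32, 32)     # dummy minibatch of 4 CIFAR-sized images
out_shape = tuple(sketch(x).shape)
print(out_shape)
```

With global average pooling the final linear layer needs only 32 inputs, compared with `channel_2 * 32 * 32` in the flatten-based networks earlier in this notebook.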

### Tips for training
For each network architecture that you try, you should tune the learning rate and other hyperparameters. When doing this there are a couple of important things to keep in mind:

- If the parameters are working well, you should see improvement within a few hundred iterations
- Remember the coarse-to-fine approach for hyperparameter tuning: start by testing a large range of hyperparameters for just a few training iterations to find the combinations of parameters that are working at all.
- Once you have found some sets of parameters that seem to work, search more finely around these parameters. You may need to train for more epochs.
- You should use the validation set for hyperparameter search, and save your test set for evaluating your architecture on the best parameters as selected by the validation set.
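
The coarse-to-fine idea above can be sketched as two rounds of log-uniform sampling. The ranges, the seed, and the helper name `sample_learning_rates` are assumptions chosen for the illustration; in practice each sampled value would be passed to a short training run and scored on the validation set.

```python
import random


def sample_learning_rates(n, low_exp=-5, high_exp=-1, seed=0):
    """Draw n learning rates log-uniformly from [10**low_exp, 10**high_exp]."""
    rng = random.Random(seed)
    return [10 ** rng.uniform(low_exp, high_exp) for _ in range(n)]


coarse = sample_learning_rates(8)   # wide range, few iterations each
# Suppose 1e-3 looked best in the coarse pass (a made-up outcome for illustration);
# the fine pass then searches a narrower band around it.
fine = sample_learning_rates(8, low_exp=-3.5, high_exp=-2.5, seed=1)
print(sorted(coarse))
print(sorted(fine))
```

Sampling the exponent rather than the value itself spreads trials evenly across orders of magnitude, which suits parameters like the learning rate.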

### Going above and beyond
If you are feeling adventurous there are many other features you can implement to try and improve your performance. You are **not required** to implement any of these, but don't miss the fun if you have time!

- Alternative optimizers: you can try Adam, Adagrad, RMSprop, etc.
- Alternative activation functions such as leaky ReLU, parametric ReLU, ELU, or MaxOut.
- Model ensembles
- Data augmentation
- New Architectures
  - [ResNets](https://arxiv.org/abs/1512.03385) where the input from the previous layer is added to the output.
  - [DenseNets](https://arxiv.org/abs/1608.06993) where inputs into previous layers are concatenated together.
  - [This blog has an in-depth overview](https://chatbotslife.com/resnets-highwaynets-and-densenets-oh-my-9bb15918ee32)
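
As an example of the residual idea mentioned above, here is a deliberately simplified block. It is a sketch under stated assumptions: it keeps the channel count fixed and omits the batch normalization used in the ResNet paper, and `ResidualBlock` is a name chosen for this note.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """Minimal residual block: output = relu(x + conv(relu(conv(x))))."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        return F.relu(out + x)    # skip connection adds the block's input back


block = ResidualBlock(16)
x = torch.randn(2, 16, 8, 8)
out_shape = tuple(block(x).shape)
print(out_shape)
```

Because the addition requires matching shapes, the block preserves both the channel count and the spatial size. Since it branches, it has to be written with the Module API rather than as a plain `nn.Sequential` stack.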

### Have fun and happy training!

In [36]:
################################################################################
# TODO:                                                                        #
# Experiment with any architectures, optimizers, and hyperparameters.          #
# Achieve AT LEAST 70% accuracy on the *validation set* within 10 epochs.      #
#                                                                              #
# Note that you can use the check_accuracy_part34 function to evaluate on      #
# either the test set or the validation set, by passing either loader_test or  #
# loader_val as its first argument. You should not touch the test set until    #
# you have finished your architecture and hyperparameter tuning, and only run  #
# the test set once at the end to report a final value.                        #
################################################################################
model = None
optimizer = None

# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
channel_1 = 32
channel_2 = 128
channel_3 = 256
learning_rate = 1e-3
model = nn.Sequential(
    # Block 1: 32x32 -> 16x16
    nn.Conv2d(3, channel_1, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Dropout2d(p=0.25),
    nn.MaxPool2d(kernel_size=2),
    nn.BatchNorm2d(channel_1),
    # Block 2: 16x16 -> 8x8
    nn.Conv2d(channel_1, channel_2, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Dropout2d(p=0.25),
    nn.MaxPool2d(kernel_size=2),
    nn.BatchNorm2d(channel_2),
    # Block 3: 8x8 -> 4x4
    nn.Conv2d(channel_2, channel_3, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Dropout2d(p=0.25),
    nn.MaxPool2d(kernel_size=2),
    Flatten(),
    # NOTE: this layer emits 1024 scores rather than 10. Training still runs
    # because CIFAR-10 labels lie in [0, 9], but it explains the initial loss
    # of roughly ln(1024) = 6.93 in the log below.
    nn.Linear(channel_3 * 4 * 4, 1024),
)
optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                      momentum=0.9, nesterov=True)


# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                                 END OF YOUR CODE                             #
################################################################################

# You should get at least 70% accuracy.
# You may modify the number of epochs to any number below 15.
train_part34(model, optimizer, epochs=10)

Iteration 0, loss = 6.9827
Checking accuracy on validation set
Got 0 / 1000 correct (0.00)

Iteration 100, loss = 1.8385
Checking accuracy on validation set
Got 351 / 1000 correct (35.10)

Iteration 200, loss = 1.6544
Checking accuracy on validation set
Got 432 / 1000 correct (43.20)

Iteration 300, loss = 1.6194
Checking accuracy on validation set
Got 459 / 1000 correct (45.90)

Iteration 400, loss = 1.5315
Checking accuracy on validation set
Got 479 / 1000 correct (47.90)

Iteration 500, loss = 1.4984
Checking accuracy on validation set
Got 495 / 1000 correct (49.50)

Iteration 600, loss = 1.6153
Checking accuracy on validation set
Got 502 / 1000 correct (50.20)

Iteration 700, loss = 1.3286
Checking accuracy on validation set
Got 520 / 1000 correct (52.00)

Iteration 0, loss = 1.4006
Checking accuracy on validation set
Got 567 / 1000 correct (56.70)

Iteration 100, loss = 1.6409
Checking accuracy on validation set
Got 545 / 1000 correct (54.50)

Iteration 200, loss = 1.5149
Checking

## Describe what you did

In the cell below you should write an explanation of what you did, any additional features that you implemented, and/or any graphs that you made in the process of training and evaluating your network.

**Answer:**

In this training run I used a repeated bn-conv-dropout-maxpool block. In my experiments, a learning rate of around 1e-3 worked best.

## Test set -- run this only once

Now that we've gotten a result we're happy with, we test our final model (which you should store in `best_model`) on the test set. Think about how this compares to your validation set accuracy.

In [37]:
best_model = model
check_accuracy_part34(loader_test, best_model)

Checking accuracy on test set
Got 7275 / 10000 correct (72.75)
