In [1]:
# This mounts your Google Drive to the Colab VM.
from google.colab import drive
drive.mount('/content/drive')

# TODO: Enter the foldername in your Drive where you have saved the unzipped
# assignment folder, e.g. 'cs231n/assignments/assignment2/'
FOLDERNAME = 'cs231n/assignments/assignment2/'
assert FOLDERNAME is not None, "[!] Enter the foldername."

# Now that we've mounted your Drive, this ensures that
# the Python interpreter of the Colab VM can load
# python files from within it.
import sys
sys.path.append('/content/drive/My Drive/{}'.format(FOLDERNAME))

# This downloads the CIFAR-10 dataset to your Drive
# if it doesn't already exist.
%cd /content/drive/My\ Drive/$FOLDERNAME/cs231n/datasets/
!bash get_datasets.sh
%cd /content/drive/My\ Drive/$FOLDERNAME

Mounted at /content/drive
/content/drive/My Drive/cs231n/assignments/assignment2/cs231n/datasets
/content/drive/My Drive/cs231n/assignments/assignment2


# Introduction to PyTorch

You've written a lot of code in this assignment to provide a whole host of neural network functionality. Dropout, Batch Norm, and 2D convolutions are some of the workhorses of deep learning in computer vision. You've also worked hard to make your code efficient and vectorized.

For the last part of this assignment, though, we're going to leave behind your beautiful codebase and instead migrate to one of two popular deep learning frameworks: in this instance, PyTorch.

## Why do we use deep learning frameworks?

* Our code will now run on GPUs! This will allow our models to train much faster. When using a framework like PyTorch you can harness the power of the GPU for your own custom neural network architectures without having to write CUDA code directly (which is beyond the scope of this class).
* In this class, we want you to be ready to use one of these frameworks for your project so you can experiment more efficiently than if you were writing every feature you want to use by hand.
* We want you to stand on the shoulders of giants! PyTorch is an excellent framework that will make your life a lot easier, and now that you understand its guts, you are free to use it :)
* Finally, we want you to be exposed to the sort of deep learning code you might run into in academia or industry.

## What is PyTorch?

PyTorch is a system for executing dynamic computational graphs over Tensor objects that behave similarly to NumPy ndarrays. It comes with a powerful automatic differentiation engine that removes the need for manual back-propagation.

## How do I learn PyTorch?

One of our former instructors, Justin Johnson, made an excellent [tutorial](https://github.com/jcjohnson/pytorch-examples) for PyTorch.

You can also find the detailed [API doc](http://pytorch.org/docs/stable/index.html) here. If you have other questions that are not addressed by the API docs, the [PyTorch forum](https://discuss.pytorch.org/) is a much better place to ask than StackOverflow.

# Table of Contents

This assignment has 5 parts. You will learn PyTorch on **three different levels of abstraction**, which will help you understand it better and prepare you for the final project.

1. Part I, Preparation: we will use the CIFAR-10 dataset.
2. Part II, Barebones PyTorch: **Abstraction level 1**, we will work directly with the lowest-level PyTorch Tensors.
3. Part III, PyTorch Module API: **Abstraction level 2**, we will use `nn.Module` to define arbitrary neural network architectures.
4. Part IV, PyTorch Sequential API: **Abstraction level 3**, we will use `nn.Sequential` to define a linear feed-forward network very conveniently.
5. Part V, CIFAR-10 open-ended challenge: implement your own network to get as high an accuracy as possible on CIFAR-10. You can experiment with any layer, optimizer, hyperparameters, or other advanced features.

Here is a table of comparison:

| API           | Flexibility | Convenience |
|---------------|-------------|-------------|
| Barebone      | High        | Low         |
| `nn.Module`     | High        | Medium      |
| `nn.Sequential` | Low         | High        |


# GPU

You can manually switch to a GPU device on Colab by clicking `Runtime -> Change runtime type` and selecting `GPU` under `Hardware Accelerator`. You should do this before running the following cells to import packages, since the kernel gets restarted upon switching runtimes.

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.data import sampler

import torchvision.datasets as dset
import torchvision.transforms as T

import numpy as np

USE_GPU = True
dtype = torch.float32 # We will be using float throughout this tutorial.

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

# Constant to control how frequently we print train loss.
print_every = 100
print('using device:', device)

using device: cuda


# Part I. Preparation

Now, let's load the CIFAR-10 dataset. This might take a couple minutes the first time you do it, but the files should stay cached after that.

In previous parts of the assignment we had to write our own code to download the CIFAR-10 dataset, preprocess it, and iterate through it in minibatches; PyTorch provides convenient tools to automate this process for us.
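
To make the Dataset / DataLoader / Sampler relationship concrete, here is a minimal sketch on synthetic data. The tensors `fake_images` and `fake_labels`, and the 80/20 split, are made up for illustration and are not part of the assignment; only the `DataLoader` and `SubsetRandomSampler` pattern mirrors what the next cell does with CIFAR-10.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, sampler

# Synthetic stand-in for an image dataset: 100 fake 3x32x32 "images" with labels.
fake_images = torch.randn(100, 3, 32, 32)
fake_labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(fake_images, fake_labels)

# Use the first 80 examples for training and the last 20 for validation.
train_loader = DataLoader(dataset, batch_size=16,
                          sampler=sampler.SubsetRandomSampler(range(80)))
val_loader = DataLoader(dataset, batch_size=16,
                        sampler=sampler.SubsetRandomSampler(range(80, 100)))

x, y = next(iter(train_loader))
print(x.shape, y.shape)  # one minibatch of images and labels

num_val = sum(len(yb) for _, yb in val_loader)
print(num_val)           # number of validation examples seen in one pass
```

Both loaders wrap the same underlying dataset; the samplers alone decide which indices each loader can draw from.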

In [3]:
NUM_TRAIN = 49000

# The torchvision.transforms package provides tools for preprocessing data
# and for performing data augmentation; here we set up a transform to
# preprocess the data by subtracting the mean RGB value and dividing by the
# standard deviation of each RGB value; we've hardcoded the mean and std.
transform = T.Compose([
                T.ToTensor(),
                T.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
            ])

# We set up a Dataset object for each split (train / val / test); Datasets load
# training examples one at a time, so we wrap each Dataset in a DataLoader which
# iterates through the Dataset and forms minibatches. We divide the CIFAR-10
# training set into train and val sets by passing a Sampler object to the
# DataLoader telling how it should sample from the underlying Dataset.
cifar10_train = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                             transform=transform)
loader_train = DataLoader(cifar10_train, batch_size=64,
                          sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))

cifar10_val = dset.CIFAR10('./cs231n/datasets', train=True, download=True,
                           transform=transform)
loader_val = DataLoader(cifar10_val, batch_size=64,
                        sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, 50000)))

cifar10_test = dset.CIFAR10('./cs231n/datasets', train=False, download=True,
                            transform=transform)
loader_test = DataLoader(cifar10_test, batch_size=64)

# Part II. Barebones PyTorch

PyTorch ships with high-level APIs to help us define model architectures conveniently, which we will cover in Part III of this tutorial. In this section, we will start with the barebones PyTorch elements to understand the autograd engine better. After this exercise, you will come to appreciate the high-level model API more.

We will start with a simple two-layer fully-connected ReLU network (one hidden layer) with no biases for CIFAR classification.
This implementation computes the forward pass using operations on PyTorch Tensors, and uses PyTorch autograd to compute gradients. It is important that you understand every line, because you will write a harder version after the example.

When we create a PyTorch Tensor with `requires_grad=True`, operations involving that Tensor will not just compute values; they will also build up a computational graph in the background, allowing us to easily backpropagate through the graph to compute gradients of a downstream loss with respect to those Tensors. Concretely, if `x` is a Tensor with `x.requires_grad == True`, then after backpropagation `x.grad` will be another Tensor holding the gradient of the scalar loss with respect to `x`.
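
As a small illustration of this behavior, the sketch below uses a toy loss (a sum of squares, chosen here only because its gradient is easy to verify by hand) and checks what ends up in `x.grad`.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = (x ** 2).sum()   # scalar loss: x1^2 + x2^2 + x3^2
loss.backward()         # populates x.grad with d(loss)/dx
print(x.grad)           # analytically, d(loss)/dx = 2 * x
```

Note that `x.grad` has the same shape as `x`, with one partial derivative of the loss per entry.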

### PyTorch Tensors: Flatten Function
A PyTorch Tensor is conceptually similar to a numpy array: it is an n-dimensional grid of numbers, and like numpy, PyTorch provides many functions to efficiently operate on Tensors. As a simple example, we provide a `flatten` function below which reshapes image data for use in a fully-connected neural network.

Recall that image data is typically stored in a Tensor of shape N x C x H x W, where:

* N is the number of datapoints
* C is the number of channels
* H is the height of the intermediate feature map in pixels
* W is the width of the intermediate feature map in pixels

This is the right way to represent the data when we are doing something like a 2D convolution, which needs spatial understanding of where the intermediate features are relative to each other. When we use fully connected affine layers to process the image, however, we want each datapoint to be represented by a single vector -- it's no longer useful to segregate the different channels, rows, and columns of the data. So, we use a "flatten" operation to collapse the `C x H x W` values of each datapoint into a single long vector. The flatten function below first reads the batch size N from a given batch of data, and then returns a "view" of that data. "View" is analogous to numpy's "reshape" method: it reshapes x's dimensions to be N x ??, where ?? is inferred from the remaining dimensions (in this case it will be C x H x W, but we don't need to specify that explicitly).
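
The word "view" is meant literally: the result shares storage with the original tensor rather than copying it. The following sketch, with arbitrary small dimensions picked for illustration, shows both the inferred second dimension and the shared storage.

```python
import torch

x = torch.zeros(2, 3, 4, 4)        # N=2, C=3, H=4, W=4
flat = x.view(x.shape[0], -1)      # -1 lets PyTorch infer C * H * W = 48
print(flat.shape)

# A view shares storage with the original tensor, so no data is copied:
# writing through the view is visible in the original.
flat[0, 0] = 7.0
print(x[0, 0, 0, 0].item())
```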

In [11]:
def flatten(x):
    N = x.shape[0]  # read in N; C, H, and W are inferred by view() below
    return x.view(N, -1)  # "flatten" the C * H * W values into a single vector per image

def test_flatten():
    x = torch.arange(12).view(2, 1, 3, 2)
    print('Before flattening: ', x)
    print('After flattening: ', flatten(x))

def my_test_flatten1():
    x = torch.arange(60).view(10, 1, 3, 2)
    print('Before flattening: ', x)
    print('After flattening: ', flatten(x))

test_flatten()

# In short, flatten keeps the first dimension (N) and collapses the rest, giving a 2D tensor.

Before flattening:  tensor([[[[ 0,  1],
          [ 2,  3],
          [ 4,  5]]],


        [[[ 6,  7],
          [ 8,  9],
          [10, 11]]]])
After flattening:  tensor([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11]])


### Barebones PyTorch: Two-Layer Network

Here we define a function `two_layer_fc` which performs the forward pass of a two-layer fully-connected ReLU network on a batch of image data. After defining the forward pass we check that it doesn't crash and that it produces outputs of the right shape by running zeros through the network.

You don't have to write any code here, but it's important that you read and understand the implementation.

In [13]:
import torch.nn.functional as F  # useful stateless functions

def two_layer_fc(x, params):
    """
    A fully-connected neural network; the architecture is:
    fully-connected layer -> ReLU -> fully-connected layer.
    Note that this function only defines the forward pass;
    PyTorch will take care of the backward pass for us.

    The input to the network will be a minibatch of data, of shape
    (N, d1, ..., dM) where d1 * ... * dM = D. The hidden layer will have H units,
    and the output layer will produce scores for C classes.

    Inputs:
    - x: A PyTorch Tensor of shape (N, d1, ..., dM) giving a minibatch of
      input data.
    - params: A list [w1, w2] of PyTorch Tensors giving weights for the network;
      w1 has shape (D, H) and w2 has shape (H, C).

    Returns:
    - scores: A PyTorch Tensor of shape (N, C) giving classification scores for
      the input data x.
    """
    # first we flatten the image
    x = flatten(x)  # shape: [batch_size, C x H x W]

    w1, w2 = params

    # Forward pass: compute predicted y using operations on Tensors. Since w1 and
    # w2 have requires_grad=True, operations involving these Tensors will cause
    # PyTorch to build a computational graph, allowing automatic computation of
    # gradients. Since we are no longer implementing the backward pass by hand we
    # don't need to keep references to intermediate values.
    # you can also use `.clamp(min=0)`, equivalent to F.relu()

    x = F.relu(x.mm(w1))
    x = x.mm(w2)
    return x


def two_layer_fc_test():
    hidden_layer_size = 42
    x = torch.zeros((64, 50), dtype=dtype)  # minibatch size 64, feature dimension 50
    w1 = torch.zeros((50, hidden_layer_size), dtype=dtype)
    w2 = torch.zeros((hidden_layer_size, 10), dtype=dtype)
    scores = two_layer_fc(x, [w1, w2])
    print(scores.size())  # you should see [64, 10]

two_layer_fc_test()

torch.Size([64, 10])


### Barebones PyTorch: Three-Layer ConvNet

Here you will complete the implementation of the function `three_layer_convnet`, which will perform the forward pass of a three-layer convolutional network. Like above, we can immediately test our implementation by passing zeros through the network. The network should have the following architecture:

1. A convolutional layer (with bias) with `channel_1` filters, each with shape `KW1 x KH1`, and zero-padding of two
2. ReLU nonlinearity
3. A convolutional layer (with bias) with `channel_2` filters, each with shape `KW2 x KH2`, and zero-padding of one
4. ReLU nonlinearity
5. Fully-connected layer with bias, producing scores for C classes.

Note that we have **no softmax activation** here after our fully-connected layer: this is because PyTorch's cross entropy loss performs a softmax activation for you, and bundling that step in makes computation more efficient.
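
To see that the softmax really is folded into the loss, the sketch below compares `F.cross_entropy` on raw scores against an explicit log-softmax followed by the negative log-likelihood loss. The scores and labels are random placeholders, not outputs of the network in this assignment.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(4, 10)            # raw class scores (logits) for 4 examples
labels = torch.tensor([1, 0, 9, 3])    # made-up target classes

loss_fused = F.cross_entropy(scores, labels)
loss_manual = F.nll_loss(F.log_softmax(scores, dim=1), labels)
print(torch.allclose(loss_fused, loss_manual))
```

Because the two agree, applying your own softmax before `F.cross_entropy` would normalize the scores twice and give the wrong loss.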

**HINT**: For convolutions: http://pytorch.org/docs/stable/nn.html#torch.nn.functional.conv2d; pay attention to the shapes of convolutional filters!

In [16]:
def three_layer_convnet(x, params):
    """
    Performs the forward pass of a three-layer convolutional network with the
    architecture defined above.

    Inputs:
    - x: A PyTorch Tensor of shape (N, 3, H, W) giving a minibatch of images
    - params: A list of PyTorch Tensors giving the weights and biases for the
      network; should contain the following:
      - conv_w1: PyTorch Tensor of shape (channel_1, 3, KH1, KW1) giving weights
        for the first convolutional layer
      - conv_b1: PyTorch Tensor of shape (channel_1,) giving biases for the first
        convolutional layer
      - conv_w2: PyTorch Tensor of shape (channel_2, channel_1, KH2, KW2) giving
        weights for the second convolutional layer
      - conv_b2: PyTorch Tensor of shape (channel_2,) giving biases for the second
        convolutional layer
      - fc_w: PyTorch Tensor giving weights for the fully-connected layer. Can you
        figure out what the shape should be?
      - fc_b: PyTorch Tensor giving biases for the fully-connected layer. Can you
        figure out what the shape should be?

    Returns:
    - scores: PyTorch Tensor of shape (N, C) giving classification scores for x
    """
    conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
    scores = None
    ################################################################################
    # TODO: Implement the forward pass for the three-layer ConvNet.                #
    ################################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # conv (padding 2) -> ReLU -> conv (padding 1) -> ReLU -> flatten -> affine
    x = F.relu(F.conv2d(x, conv_w1, conv_b1, padding=2))
    x = F.relu(F.conv2d(x, conv_w2, conv_b2, padding=1))
    x = flatten(x)
    scores = x.mm(fc_w) + fc_b

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ################################################################################
    #                                 END OF YOUR CODE                             #
    ################################################################################
    return scores

After defining the forward pass of the ConvNet above, run the following cell to test your implementation.

When you run this function, scores should have shape (64, 10).
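
The shape of the fully-connected weights depends on the spatial size left after the two convolutions. A rough way to work it out, assuming stride 1 and the standard output-size formula `(size + 2 * padding - kernel) // stride + 1`, is sketched below; the helper `conv_output_size` is a hypothetical name introduced for illustration, not part of the assignment code.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# First conv: 5x5 filters with zero-padding 2 keep a 32x32 input at 32x32.
h1 = conv_output_size(32, kernel=5, padding=2)
# Second conv: 3x3 filters with zero-padding 1 also preserve the size.
h2 = conv_output_size(h1, kernel=3, padding=1)

channel_2 = 9                  # filters in the second conv layer of the test cell
fc_in = channel_2 * h2 * h2    # length of each flattened feature vector
print(h1, h2, fc_in)
```

With these settings both convolutions preserve the 32 x 32 spatial size, which is why the test cell can use `9 * 32 * 32` rows for `fc_w`.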

In [17]:
def three_layer_convnet_test():
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch size 64, image size [3, 32, 32]

    conv_w1 = torch.zeros((6, 3, 5, 5), dtype=dtype)  # [out_channel, in_channel, kernel_H, kernel_W]
    conv_b1 = torch.zeros((6,))  # out_channel
    conv_w2 = torch.zeros((9, 6, 3, 3), dtype=dtype)  # [out_channel, in_channel, kernel_H, kernel_W]
    conv_b2 = torch.zeros((9,))  # out_channel

    # you must calculate the shape of the tensor after two conv layers, before the fully-connected layer
    fc_w = torch.zeros((9 * 32 * 32, 10))
    fc_b = torch.zeros(10)

    scores = three_layer_convnet(x, [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b])
    print(scores.size())  # you should see [64, 10]
three_layer_convnet_test()

torch.Size([64, 10])


### Barebones PyTorch: Initialization
Let's write a couple of utility functions to initialize the weight matrices for our models.

- `random_weight(shape)` initializes a weight tensor with the Kaiming normal initialization method.
- `zero_weight(shape)` initializes a weight tensor with all zeros. Useful for instantiating bias parameters.

The `random_weight` function uses the Kaiming normal initialization method, described in:

He et al, *Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification*, ICCV 2015, https://arxiv.org/abs/1502.01852
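
As a quick sanity check of the scaling rule, the sketch below draws a weight matrix the same way `random_weight` does for a fully-connected layer and compares its empirical standard deviation with `sqrt(2 / fan_in)`. The sizes 1024 and 512 are arbitrary choices for illustration, and the tensor stays on the CPU.

```python
import math
import torch

torch.manual_seed(0)
fan_in, fan_out = 1024, 512

# Standard normal samples scaled so that each entry has std sqrt(2 / fan_in).
w = torch.randn(fan_in, fan_out) * math.sqrt(2.0 / fan_in)

target_std = math.sqrt(2.0 / fan_in)
print(target_std, w.std().item())  # the empirical std should be close to the target
```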

In [21]:
def random_weight(shape):
    """
    Create random Tensors for weights; setting requires_grad=True means that we
    want to compute gradients for these Tensors during the backward pass.
    We use Kaiming normal initialization: standard deviation sqrt(2 / fan_in).
    """
    if len(shape) == 2:  # FC weight
        fan_in = shape[0]
    else:
        fan_in = np.prod(shape[1:]) # conv weight [out_channel, in_channel, kH, kW]
    # randn is standard normal distribution generator.
    w = torch.randn(shape, device=device, dtype=dtype) * np.sqrt(2. / fan_in)
    w.requires_grad = True
    return w

def zero_weight(shape):
    return torch.zeros(shape, device=device, dtype=dtype, requires_grad=True)

# create a weight of shape [3 x 5]
# you should see the type `torch.cuda.FloatTensor` if you use GPU.
# Otherwise it should be `torch.FloatTensor`
random_weight((3, 5))

tensor([[-0.7439, -1.3634,  0.5936,  0.8660, -0.5562],
        [ 0.7128, -1.3202,  1.6780,  0.1482,  1.1876],
        [-0.5198,  0.4042, -0.9884,  0.7555,  0.7727]], device='cuda:0',
       requires_grad=True)

### Barebones PyTorch: Check Accuracy
When training the model we will use the following function to check the accuracy of our model on the training or validation sets.

When checking accuracy we don't need to compute any gradients; as a result we don't need PyTorch to build a computational graph for us when we compute scores. To prevent a graph from being built we scope our computation under a `torch.no_grad()` context manager.
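
The effect of the context manager can be observed directly on a result's `requires_grad` flag. The tensors below are small random placeholders used only to illustrate the difference.

```python
import torch

w = torch.randn(3, 2, requires_grad=True)
x = torch.randn(4, 3)

tracked = x.mm(w)            # part of a graph, because w requires grad
with torch.no_grad():
    untracked = x.mm(w)      # same values, but no graph is recorded

print(tracked.requires_grad, untracked.requires_grad)
```

Skipping graph construction during evaluation also avoids keeping intermediate values alive, which saves memory.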

In [22]:
def check_accuracy_part2(loader, model_fn, params):
    """
    Check the accuracy of a classification model.

    Inputs:
    - loader: A DataLoader for the data split we want to check
    - model_fn: A function that performs the forward pass of the model,
      with the signature scores = model_fn(x, params)
    - params: List of PyTorch Tensors giving parameters of the model

    Returns: Nothing, but prints the accuracy of the model
    """
    split = 'val' if loader.dataset.train else 'test'
    print('Checking accuracy on the %s set' % split)
    num_correct, num_samples = 0, 0
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.int64)
            scores = model_fn(x, params)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f%%)' % (num_correct, num_samples, 100 * acc))

### Barebones PyTorch: Training Loop
We can now set up a basic training loop to train our network. We will train the model using stochastic gradient descent without momentum. We will use `torch.nn.functional.cross_entropy` (imported above as `F.cross_entropy`) to compute the loss; you can [read about it here](http://pytorch.org/docs/stable/nn.html#cross-entropy).

The training loop takes as input the neural network function, a list of initialized parameters (`[w1, w2]` in our example), and learning rate.
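
One detail of such a loop that is easy to miss is that `backward()` adds to any gradient already stored in `.grad`, so the gradients have to be reset between iterations. The toy example below (a made-up loss whose gradient is 3 for every entry) sketches that accumulation.

```python
import torch

w = torch.ones(2, requires_grad=True)

(3 * w).sum().backward()
first = w.grad.clone()         # gradient after one backward pass

(3 * w).sum().backward()       # without zeroing, the new gradient is added
accumulated = w.grad.clone()

w.grad.zero_()                 # reset, as a training loop does after each update
(3 * w).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)
```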

In [25]:
def train_part2(model_fn, params, learning_rate):
    """
    Train a model on CIFAR-10.

    Inputs:
    - model_fn: A Python function that performs the forward pass of the model.
      It should have the signature scores = model_fn(x, params) where x is a
      PyTorch Tensor of image data, params is a list of PyTorch Tensors giving
      model weights, and scores is a PyTorch Tensor of shape (N, C) giving
      scores for the elements in x.
    - params: List of PyTorch Tensors giving weights for the model
    - learning_rate: Python scalar giving the learning rate to use for SGD

    Returns: Nothing
    """
    for t, (x, y) in enumerate(loader_train):
        # Move the data to the proper device (GPU or CPU)
        x = x.to(device=device, dtype=dtype)
        y = y.to(device=device, dtype=torch.long)

        # Forward pass: compute scores and loss
        scores = model_fn(x, params)
        loss = F.cross_entropy(scores, y)

        # Backward pass: PyTorch figures out which Tensors in the computational
        # graph have requires_grad=True and uses backpropagation to compute the
        # gradient of the loss with respect to these Tensors, and stores the
        # gradients in the .grad attribute of each Tensor.
        loss.backward()

        # Update parameters. We don't want to backpropagate through the
        # parameter updates, so we scope the updates under a torch.no_grad()
        # context manager to prevent a computational graph from being built.
        with torch.no_grad():
            for w in params:
                w -= learning_rate * w.grad

                # Manually zero the gradients after running the backward pass
                w.grad.zero_()

        if t % print_every == 0:
            print('Iteration %d, loss = %.4f' % (t, loss.item()))
            check_accuracy_part2(loader_val, model_fn, params)
            print()

### Barebones PyTorch: Train a Two-Layer Network
Now we are ready to run the training loop. We need to explicitly allocate tensors for the fully connected weights, `w1` and `w2`.

Each minibatch of CIFAR has 64 examples, so the tensor shape is `[64, 3, 32, 32]`.

After flattening, `x` shape should be `[64, 3 * 32 * 32]`. This will be the size of the first dimension of `w1`.
The second dimension of `w1` is the hidden layer size, which will also be the first dimension of `w2`.

Finally, the output of the network is a 10-dimensional vector of unnormalized class scores; applying a softmax to it would give a probability distribution over the 10 classes.
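
Since the network has no softmax layer, what it returns are raw scores. If actual probabilities are wanted, a softmax over the class dimension can be applied afterwards, as in this sketch; the `scores` tensor here is random data standing in for a real network output.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(64, 10)          # stand-in for the network output on one minibatch
probs = F.softmax(scores, dim=1)      # normalize each row into a distribution

print(probs.sum(dim=1)[:3])           # each row sums to 1 (up to rounding)
print(probs.min().item() >= 0.0)      # all entries are non-negative
```

Softmax is monotonic, so the predicted class can be read off the raw scores directly, which is what the accuracy check does with `scores.max(1)`.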

You don't need to tune any hyperparameters but you should see accuracies above 40% after training for one epoch.

In [24]:
hidden_layer_size = 4000
learning_rate = 1e-2

w1 = random_weight((3 * 32 * 32, hidden_layer_size))
w2 = random_weight((hidden_layer_size, 10))

train_part2(two_layer_fc, [w1, w2], learning_rate)

Iteration 0, loss = 3.4306
Checking accuracy on the val set
Got 133 / 1000 correct (13.30%)

Iteration 100, loss = 2.5513
Checking accuracy on the val set
Got 320 / 1000 correct (32.00%)

Iteration 200, loss = 2.2975
Checking accuracy on the val set
Got 411 / 1000 correct (41.10%)

Iteration 300, loss = 1.9108
Checking accuracy on the val set
Got 407 / 1000 correct (40.70%)

Iteration 400, loss = 1.8835
Checking accuracy on the val set
Got 417 / 1000 correct (41.70%)

Iteration 500, loss = 1.6343
Checking accuracy on the val set
Got 384 / 1000 correct (38.40%)

Iteration 600, loss = 1.6669
Checking accuracy on the val set
Got 430 / 1000 correct (43.00%)

Iteration 700, loss = 1.6792
Checking accuracy on the val set
Got 443 / 1000 correct (44.30%)



### Barebones PyTorch: Training a ConvNet

Below, you should use the functions defined above to train a three-layer convolutional network on CIFAR. The network should have the following architecture:

1. Convolutional layer (with bias) with 32 5x5 filters, with zero-padding of 2
2. ReLU
3. Convolutional layer (with bias) with 16 3x3 filters, with zero-padding of 1
4. ReLU
5. Fully-connected layer (with bias) to compute scores for 10 classes

You should initialize your weight matrices using the `random_weight` function defined above, and you should initialize your bias vectors using the `zero_weight` function above.

You don't need to tune any hyperparameters, but if everything works correctly you should achieve an accuracy above 42% after one epoch.

In [27]:
learning_rate = 3e-3

channel_1 = 32
channel_2 = 16

conv_w1 = None
conv_b1 = None
conv_w2 = None
conv_b2 = None
fc_w = None
fc_b = None

################################################################################
# TODO: Initialize the parameters of a three-layer ConvNet.                    #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

fc_input_dim = channel_2 * 32 * 32
num_classes = 10

conv_w1 = random_weight((channel_1, 3, 5, 5))
conv_b1 = zero_weight((channel_1,))

conv_w2 = random_weight((channel_2, channel_1, 3, 3))
conv_b2 = zero_weight((channel_2,))

fc_w = random_weight((fc_input_dim, num_classes))
fc_b = zero_weight((num_classes,))

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                                 END OF YOUR CODE                             #
################################################################################

params = [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b]
train_part2(three_layer_convnet, params, learning_rate)

Iteration 0, loss = 4.5546
Checking accuracy on the val set
Got 115 / 1000 correct (11.50%)

Iteration 100, loss = 1.7051
Checking accuracy on the val set
Got 327 / 1000 correct (32.70%)

Iteration 200, loss = 1.8469
Checking accuracy on the val set
Got 403 / 1000 correct (40.30%)

Iteration 300, loss = 1.5927
Checking accuracy on the val set
Got 408 / 1000 correct (40.80%)

Iteration 400, loss = 1.5964
Checking accuracy on the val set
Got 451 / 1000 correct (45.10%)

Iteration 500, loss = 1.6107
Checking accuracy on the val set
Got 454 / 1000 correct (45.40%)

Iteration 600, loss = 1.2837
Checking accuracy on the val set
Got 475 / 1000 correct (47.50%)

Iteration 700, loss = 1.4809
Checking accuracy on the val set
Got 460 / 1000 correct (46.00%)



# Part III. PyTorch Module API

Barebones PyTorch requires that we track all the parameter tensors by hand. This is fine for small networks with a few tensors, but it would be extremely inconvenient and error-prone to track tens or hundreds of tensors in larger networks.

PyTorch provides the `nn.Module` API for you to define arbitrary network architectures, while tracking all learnable parameters for you. In Part II, we implemented SGD ourselves. PyTorch also provides the `torch.optim` package that implements all the common optimizers, such as RMSProp, Adagrad, and Adam. It even supports approximate second-order methods like L-BFGS! You can refer to the [doc](http://pytorch.org/docs/master/optim.html) for the exact specifications of each optimizer.

To use the Module API, follow the steps below:

1. Subclass `nn.Module`. Give your network class an intuitive name like `TwoLayerFC`.

2. In the constructor `__init__()`, define all the layers you need as class attributes. Layer objects like `nn.Linear` and `nn.Conv2d` are themselves `nn.Module` subclasses and contain learnable parameters, so you don't have to instantiate the raw tensors yourself. `nn.Module` will track these internal parameters for you. Refer to the [doc](http://pytorch.org/docs/master/nn.html) to learn more about the dozens of built-in layers. **Warning**: don't forget to call `super().__init__()` first!

3. In the `forward()` method, define the *connectivity* of your network. You should use the attributes defined in `__init__` as function calls that take a tensor as input and output the "transformed" tensor. Do *not* create any new layers with learnable parameters in `forward()`! All of them must be declared upfront in `__init__`.

After you define your Module subclass, you can instantiate it as an object and call it just like the NN forward function in Part II.

### Module API: Two-Layer Network
Here is a concrete example of a 2-layer fully connected network:



In [28]:
class TwoLayerFC(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        # assign layer objects to class attributes
        self.fc1 = nn.Linear(input_size, hidden_size)
        # nn.init package contains convenient initialization methods
        # http://pytorch.org/docs/master/nn.html#torch-nn-init
        nn.init.kaiming_normal_(self.fc1.weight)
        self.fc2 = nn.Linear(hidden_size, num_classes)
        nn.init.kaiming_normal_(self.fc2.weight)

    def forward(self, x):
        # forward always defines connectivity
        x = flatten(x)
        scores = self.fc2(F.relu(self.fc1(x)))
        return scores

def test_TwoLayerFC():
    input_size = 50
    x = torch.zeros((64, input_size), dtype=dtype)  # minibatch size 64, feature dimension 50
    model = TwoLayerFC(input_size, 42, 10)
    scores = model(x)
    print(scores.size())  # you should see [64, 10]
test_TwoLayerFC()

torch.Size([64, 10])
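
To see the parameter tracking described above in action, the following sketch lists what `nn.Module` registers automatically. `TinyFC` is a hypothetical stand-in with the same layer sizes as the test above; it is not part of the assignment code.

```python
import torch
import torch.nn as nn


class TinyFC(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(50, 42)
        self.fc2 = nn.Linear(42, 10)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))


tiny = TinyFC()

# named_parameters() walks every sub-module assigned as an attribute.
param_shapes = {name: tuple(p.shape) for name, p in tiny.named_parameters()}
for name, shape in param_shapes.items():
    print(name, shape)
# fc1.weight (42, 50)
# fc1.bias (42,)
# fc2.weight (10, 42)
# fc2.bias (10,)
```

Note that `nn.Linear` stores its weight as `(out_features, in_features)`, the transpose of the layout used for the hand-made weights in Part II.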


### Module API: Three-Layer ConvNet
It's your turn to implement a three-layer ConvNet: two convolutional layers followed by a fully connected layer. The network architecture should be the same as in Part II:

1. Convolutional layer with `channel_1` 5x5 filters with zero-padding of 2
2. ReLU
3. Convolutional layer with `channel_2` 3x3 filters with zero-padding of 1
4. ReLU
5. Fully-connected layer to `num_classes` classes

You should initialize the weight matrices of the model using the Kaiming normal initialization method.

**HINT**: http://pytorch.org/docs/stable/nn.html#conv2d

After you implement the three-layer ConvNet, the `test_ThreeLayerConvNet` function will run your implementation; it should print `torch.Size([64, 10])` for the shape of the output scores.


In [29]:
class ThreeLayerConvNet(nn.Module):
    def __init__(self, in_channel, channel_1, channel_2, num_classes):
        super().__init__()
        ########################################################################
        # TODO: Set up the layers you need for a three-layer ConvNet with the  #
        # architecture defined above.                                          #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

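        # Layers use PyTorch's default initialization here; the Kaiming normal
        # initialization requested above is not applied.
        self.conv1 = nn.Conv2d(in_channel, channel_1, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(channel_1, channel_2, kernel_size=3, padding=1)
        # Both convolutions preserve the 32x32 spatial size of CIFAR images.
        self.fc = nn.Linear(channel_2 * 32 * 32, num_classes)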

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                          END OF YOUR CODE                            #
        ########################################################################

    def forward(self, x):
        scores = None
        ########################################################################
        # TODO: Implement the forward function for a 3-layer ConvNet. you      #
        # should use the layers you defined in __init__ and specify the        #
        # connectivity of those layers in forward()                            #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = x.view(x.size(0), -1)  # flatten to (N, channel_2 * 32 * 32)

        scores = self.fc(x)

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                             END OF YOUR CODE                         #
        ########################################################################
        return scores


def test_ThreeLayerConvNet():
    x = torch.zeros((64, 3, 32, 32), dtype=dtype)  # minibatch size 64, image size [3, 32, 32]
    model = ThreeLayerConvNet(in_channel=3, channel_1=12, channel_2=8, num_classes=10)
    scores = model(x)
    print(scores.size())  # you should see [64, 10]
test_ThreeLayerConvNet()

torch.Size([64, 10])
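
The instructions for this exercise ask for Kaiming normal initialization. One possible way to apply it is sketched below, mirroring the `nn.init.kaiming_normal_` calls in `TwoLayerFC`. The class name `KaimingConvNet` is hypothetical, and zeroing the biases is an assumption carried over from Part II rather than something the exercise states.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KaimingConvNet(nn.Module):  # hypothetical name, not part of the assignment
    def __init__(self, in_channel, channel_1, channel_2, num_classes):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channel, channel_1, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(channel_1, channel_2, kernel_size=3, padding=1)
        # Assumes 32x32 inputs; both convolutions preserve spatial size.
        self.fc = nn.Linear(channel_2 * 32 * 32, num_classes)
        for layer in (self.conv1, self.conv2, self.fc):
            nn.init.kaiming_normal_(layer.weight)
            nn.init.zeros_(layer.bias)  # assumption: zero biases, as in Part II

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        return self.fc(x.flatten(start_dim=1))


net = KaimingConvNet(in_channel=3, channel_1=12, channel_2=8, num_classes=10)
scores = net(torch.zeros(64, 3, 32, 32))
print(scores.size())  # torch.Size([64, 10])
```

Training behaviour with this initialization may differ from runs that use the default initialization, so any recorded losses are not directly comparable.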


### Module API: Check Accuracy
Given the validation or test set, we can check the classification accuracy of a neural network.

This version is slightly different from the one in Part II: you no longer pass in the parameters manually.

In [30]:
def check_accuracy_part34(loader, model):
    if loader.dataset.train:
        print('Checking accuracy on validation set')
    else:
        print('Checking accuracy on test set')
    num_correct = 0
    num_samples = 0
    model.eval()  # set model to evaluation mode
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)
            scores = model(x)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))
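
The core of the accuracy check is the `scores.max(1)` call, which returns the largest score and its index along the class dimension. A toy example with made-up scores (the numbers below are illustrative only) shows the computation on four samples:

```python
import torch

# Rows are samples, columns are class scores.
scores = torch.tensor([[0.1, 2.0, -1.0],
                       [1.5, 0.3, 0.2],
                       [0.0, 0.1, 0.9],
                       [0.4, 0.3, 0.2]])
y = torch.tensor([1, 0, 1, 0])  # ground-truth labels

# max(1) returns (values, indices); the indices are the predicted classes.
_, preds = scores.max(1)
num_correct = (preds == y).sum().item()
acc = num_correct / y.size(0)

print(preds.tolist())  # [1, 0, 2, 0]
print(acc)             # 0.75
```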

### Module API: Training Loop
We also use a slightly different training loop. Rather than updating the values of the weights ourselves, we use an Optimizer object from the `torch.optim` package, which abstracts the notion of an optimization algorithm and provides implementations of most of the algorithms commonly used to optimize neural networks.


In [31]:
def train_part34(model, optimizer, epochs=1):
    """
    Train a model on CIFAR-10 using the PyTorch Module API.

    Inputs:
    - model: A PyTorch Module giving the model to train.
    - optimizer: An Optimizer object we will use to train the model
    - epochs: (Optional) A Python integer giving the number of epochs to train for

    Returns: Nothing, but prints model accuracies during training.
    """
    model = model.to(device=device)  # move the model parameters to CPU/GPU
    for e in range(epochs):
        for t, (x, y) in enumerate(loader_train):
            model.train()  # put model to training mode
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)

            scores = model(x)
            loss = F.cross_entropy(scores, y)

            # Zero out all of the gradients for the variables which the optimizer
            # will update.
            optimizer.zero_grad()

            # This is the backwards pass: compute the gradient of the loss with
            # respect to each  parameter of the model.
            loss.backward()

            # Actually update the parameters of the model using the gradients
            # computed by the backwards pass.
            optimizer.step()

            if t % print_every == 0:
                print('Iteration %d, loss = %.4f' % (t, loss.item()))
                check_accuracy_part34(loader_val, model)
                print()
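
The loop zeroes the gradients on every iteration because PyTorch accumulates gradients into `.grad` across `backward()` calls. The sketch below demonstrates this on a single tensor, calling `w.grad.zero_()` directly in place of an optimizer; the tensor `w` and the toy loss are assumptions made for illustration.

```python
import torch

w = torch.ones(3, requires_grad=True)

# First backward pass: d(sum(2 * w)) / dw = 2 for every element.
(2 * w).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added on top.
(2 * w).sum().backward()
accumulated = w.grad.clone()

# Zeroing first gives the gradient of the current loss only.
w.grad.zero_()
(2 * w).sum().backward()
fresh = w.grad.clone()

print(first.tolist())        # [2.0, 2.0, 2.0]
print(accumulated.tolist())  # [4.0, 4.0, 4.0]
print(fresh.tolist())        # [2.0, 2.0, 2.0]
```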

### Module API: Train a Two-Layer Network
Now we are ready to run the training loop. In contrast to part II, we don't explicitly allocate parameter tensors anymore.

Simply pass the input size, hidden layer size, and number of classes (i.e. output size) to the constructor of `TwoLayerFC`.

You also need to define an optimizer that tracks all the learnable parameters inside `TwoLayerFC`.

You don't need to tune any hyperparameters, but you should see model accuracies above 40% after training for one epoch.


In [32]:
hidden_layer_size = 4000
learning_rate = 1e-2
model = TwoLayerFC(3 * 32 * 32, hidden_layer_size, 10)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

train_part34(model, optimizer)

Iteration 0, loss = 3.3105
Checking accuracy on validation set
Got 170 / 1000 correct (17.00)

Iteration 100, loss = 2.1243
Checking accuracy on validation set
Got 304 / 1000 correct (30.40)

Iteration 200, loss = 2.1553
Checking accuracy on validation set
Got 344 / 1000 correct (34.40)

Iteration 300, loss = 1.9221
Checking accuracy on validation set
Got 397 / 1000 correct (39.70)

Iteration 400, loss = 1.9671
Checking accuracy on validation set
Got 434 / 1000 correct (43.40)

Iteration 500, loss = 1.8783
Checking accuracy on validation set
Got 404 / 1000 correct (40.40)

Iteration 600, loss = 1.8568
Checking accuracy on validation set
Got 413 / 1000 correct (41.30)

Iteration 700, loss = 1.8013
Checking accuracy on validation set
Got 446 / 1000 correct (44.60)



### Module API: Train a Three-Layer ConvNet
You should now use the Module API to train a three-layer ConvNet on CIFAR. This should look very similar to training the two-layer network! You don't need to tune any hyperparameters, but you should achieve above 45% accuracy after training for one epoch.

You should train the model using stochastic gradient descent without momentum.


In [35]:
learning_rate = 3e-3
channel_1 = 32
channel_2 = 16

model = None
optimizer = None
################################################################################
# TODO: Instantiate your ThreeLayerConvNet model and a corresponding optimizer #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

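model = ThreeLayerConvNet(3, channel_1, channel_2, 10)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

# Note: train_part34 is called again after this block, so the model is
# trained for two epochs in total and the log below shows both passes.
train_part34(model, optimizer)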

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                                 END OF YOUR CODE                             #
################################################################################

train_part34(model, optimizer)

Iteration 0, loss = 2.3149
Checking accuracy on validation set
Got 82 / 1000 correct (8.20)

Iteration 100, loss = 1.9524
Checking accuracy on validation set
Got 301 / 1000 correct (30.10)

Iteration 200, loss = 1.8812
Checking accuracy on validation set
Got 375 / 1000 correct (37.50)

Iteration 300, loss = 2.0543
Checking accuracy on validation set
Got 375 / 1000 correct (37.50)

Iteration 400, loss = 1.7353
Checking accuracy on validation set
Got 416 / 1000 correct (41.60)

Iteration 500, loss = 1.7509
Checking accuracy on validation set
Got 394 / 1000 correct (39.40)

Iteration 600, loss = 1.6186
Checking accuracy on validation set
Got 436 / 1000 correct (43.60)

Iteration 700, loss = 1.8156
Checking accuracy on validation set
Got 444 / 1000 correct (44.40)

Iteration 0, loss = 1.6422
Checking accuracy on validation set
Got 448 / 1000 correct (44.80)

Iteration 100, loss = 1.7141
Checking accuracy on validation set
Got 467 / 1000 correct (46.70)

Iteration 200, loss = 1.6477
Checkin

# Part IV. PyTorch Sequential API

Part III introduced the PyTorch Module API, which allows you to define arbitrary learnable layers and their connectivity.

For simple models like a stack of feed-forward layers, you still need to go through three steps: subclass `nn.Module`, assign layers to class attributes in `__init__`, and call each layer one by one in `forward()`. Is there a more convenient way?

Fortunately, PyTorch provides a container Module called `nn.Sequential`, which merges the above steps into one. It is not as flexible as `nn.Module`, because you cannot specify more complex topology than a feed-forward stack, but it's good enough for many use cases.

### Sequential API: Two-Layer Network
Let's see how to rewrite our two-layer fully connected network example with `nn.Sequential`, and train it using the training loop defined above.

Again, you don't need to tune any hyperparameters here, but you should achieve above 40% accuracy after one epoch of training.


In [39]:
# We need to wrap `flatten` function in a module in order to stack it
# in nn.Sequential
class Flatten(nn.Module):
    def forward(self, x):
        return flatten(x)

hidden_layer_size = 4000
learning_rate = 1e-2

model = nn.Sequential(
    Flatten(),
    nn.Linear(3 * 32 * 32, hidden_layer_size),
    nn.ReLU(),
    nn.Linear(hidden_layer_size, 10),
)

# you can use Nesterov momentum in optim.SGD
optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                     momentum=0.9, nesterov=True)

train_part34(model, optimizer)

Iteration 0, loss = 2.3308
Checking accuracy on validation set
Got 199 / 1000 correct (19.90)

Iteration 100, loss = 1.7371
Checking accuracy on validation set
Got 372 / 1000 correct (37.20)

Iteration 200, loss = 1.6567
Checking accuracy on validation set
Got 416 / 1000 correct (41.60)

Iteration 300, loss = 1.8723
Checking accuracy on validation set
Got 424 / 1000 correct (42.40)

Iteration 400, loss = 1.4627
Checking accuracy on validation set
Got 434 / 1000 correct (43.40)

Iteration 500, loss = 1.9463
Checking accuracy on validation set
Got 414 / 1000 correct (41.40)

Iteration 600, loss = 2.0202
Checking accuracy on validation set
Got 445 / 1000 correct (44.50)

Iteration 700, loss = 2.0814
Checking accuracy on validation set
Got 432 / 1000 correct (43.20)
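
Recent PyTorch versions also ship a built-in `nn.Flatten` module that can take the place of a hand-written wrapper in `nn.Sequential`. The sketch below compares it with a manual reshape, assuming the notebook's `flatten` helper reshapes a batch to `(N, -1)`; the input tensor is a made-up example.

```python
import torch
import torch.nn as nn

# A small dummy batch: 2 images with 3 channels of size 4x4.
x = torch.arange(2 * 3 * 4 * 4, dtype=torch.float32).reshape(2, 3, 4, 4)

builtin = nn.Flatten()           # flattens every dimension after the batch one
manual = x.view(x.shape[0], -1)  # what a hand-written flatten typically does

flat = builtin(x)
print(flat.shape)  # torch.Size([2, 48])
```

Either form works inside `nn.Sequential`; the custom wrapper above has the advantage of not depending on the installed PyTorch version.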



### Sequential API: Three-Layer ConvNet
Here you should use `nn.Sequential` to define and train a three-layer ConvNet with the same architecture we used in Part III:

1. Convolutional layer (with bias) with 32 5x5 filters, with zero-padding of 2
2. ReLU
3. Convolutional layer (with bias) with 16 3x3 filters, with zero-padding of 1
4. ReLU
5. Fully-connected layer (with bias) to compute scores for 10 classes

You can use the default PyTorch weight initialization.

You should optimize your model using stochastic gradient descent with Nesterov momentum 0.9.

Again, you don't need to tune any hyperparameters, but you should see accuracy above 55% after one epoch of training.


In [41]:
channel_1 = 32
channel_2 = 16
learning_rate = 1e-2

model = None
optimizer = None

################################################################################
# TODO: Rewrite the 3-layer ConvNet with bias from Part III with the           #
# Sequential API.                                                              #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

model = nn.Sequential(
    nn.Conv2d(3, channel_1, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.Conv2d(channel_1, channel_2, kernel_size=3, padding=1),
    nn.ReLU(),
    Flatten(),
    nn.Linear(channel_2 * 32 * 32, 10),
)

optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                     momentum=0.9, nesterov=True)


# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                                 END OF YOUR CODE                             #
################################################################################

train_part34(model, optimizer)

Iteration 0, loss = 2.3067
Checking accuracy on validation set
Got 142 / 1000 correct (14.20)

Iteration 100, loss = 1.4723
Checking accuracy on validation set
Got 425 / 1000 correct (42.50)

Iteration 200, loss = 1.5183
Checking accuracy on validation set
Got 483 / 1000 correct (48.30)

Iteration 300, loss = 1.5964
Checking accuracy on validation set
Got 518 / 1000 correct (51.80)

Iteration 400, loss = 1.0263
Checking accuracy on validation set
Got 524 / 1000 correct (52.40)

Iteration 500, loss = 1.1190
Checking accuracy on validation set
Got 561 / 1000 correct (56.10)

Iteration 600, loss = 1.2294
Checking accuracy on validation set
Got 583 / 1000 correct (58.30)

Iteration 700, loss = 1.0154
Checking accuracy on validation set
Got 595 / 1000 correct (59.50)



# Part V. CIFAR-10 open-ended challenge

In this section, you can experiment with whatever ConvNet architecture you'd like on CIFAR-10.

Now it's your job to experiment with architectures, hyperparameters, loss functions, and optimizers to train a model that achieves **at least 70%** accuracy on the CIFAR-10 **validation** set within 10 epochs. You can use the `check_accuracy_part34` and `train_part34` functions from above. You can use either the `nn.Module` or the `nn.Sequential` API.

Describe what you did at the end of this notebook.

Here is the official API documentation for each component. One note: what we call "spatial batch norm" in class is called `BatchNorm2d` in PyTorch.

* Layers in torch.nn package: http://pytorch.org/docs/stable/nn.html
* Activations: http://pytorch.org/docs/stable/nn.html#non-linear-activations
* Loss functions: http://pytorch.org/docs/stable/nn.html#loss-functions
* Optimizers: http://pytorch.org/docs/stable/optim.html


### Things you might try:
- **Filter size**: Above we used 5x5; would smaller filters be more efficient?
- **Number of filters**: Above we used 32 filters. Do more or fewer do better?
- **Pooling vs. strided convolution**: Do you use max pooling or just strided convolutions?
- **Batch normalization**: Try adding spatial batch normalization after convolution layers and vanilla batch normalization after affine layers. Do your networks train faster?
- **Network architecture**: The network above has three layers of trainable parameters. Can you do better with a deeper network? Good architectures to try include:
    - [conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    - [conv-relu-conv-relu-pool]xN -> [affine]xM -> [softmax or SVM]
    - [batchnorm-relu-conv]xN -> [affine]xM -> [softmax or SVM]
- **Global Average Pooling**: Instead of flattening and then having multiple affine layers, perform convolutions until your image gets small (7x7 or so) and then perform an average pooling operation to get a 1x1 image of shape (1, 1, Filter#), which is then reshaped into a (Filter#) vector. This is used in [Google's Inception Network](https://arxiv.org/abs/1512.00567) (see Table 1 for their architecture).
- **Regularization**: Add l2 weight regularization, or perhaps use Dropout.
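
As one illustration of how several of these ideas combine, the sketch below stacks `[conv-batchnorm-relu-pool]` blocks, ends with global average pooling, and adds Dropout plus L2 regularization through the optimizer's `weight_decay` argument. The names `conv_block`, `sketch_model`, and `sketch_optimizer` are hypothetical, and the channel counts and hyperparameters are untuned placeholders; no accuracy is claimed for this configuration.

```python
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """[conv - batchnorm - relu - pool]: halves the spatial size."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )


sketch_model = nn.Sequential(
    conv_block(3, 32),        # 32x32 -> 16x16
    conv_block(32, 64),       # 16x16 -> 8x8
    nn.AdaptiveAvgPool2d(1),  # global average pooling: 8x8 -> 1x1
    nn.Flatten(),             # (N, 64, 1, 1) -> (N, 64)
    nn.Dropout(p=0.5),
    nn.Linear(64, 10),
)

# weight_decay adds L2 regularization; all values here are placeholders.
sketch_optimizer = torch.optim.SGD(
    sketch_model.parameters(), lr=1e-2, momentum=0.9,
    nesterov=True, weight_decay=5e-4,
)

# Shape check on a dummy batch, in eval mode so Dropout is disabled.
sketch_model.eval()
with torch.no_grad():
    out = sketch_model(torch.zeros(8, 3, 32, 32))
print(out.shape)  # torch.Size([8, 10])
```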

### Tips for training
For each network architecture that you try, you should tune the learning rate and other hyperparameters. When doing this there are a few important things to keep in mind:

- If the parameters are working well, you should see improvement within a few hundred iterations.
- Remember the coarse-to-fine approach for hyperparameter tuning: start by testing a large range of hyperparameters for just a few training iterations to find the combinations of parameters that are working at all.
- Once you have found some sets of parameters that seem to work, search more finely around these parameters. You may need to train for more epochs.
- You should use the validation set for hyperparameter search, and save your test set for evaluating your architecture on the best parameters as selected by the validation set.
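
The coarse-to-fine idea can be sketched for a single hyperparameter, the learning rate, by sampling candidates on a log scale. The helper `sample_log_uniform`, the ranges, and the assumed "best" exponent below are all hypothetical; in practice each candidate would be scored with a short training run on the validation set.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def sample_log_uniform(low_exp, high_exp, n):
    """Draw n values of 10**u with u uniform in [low_exp, high_exp]."""
    return [10 ** random.uniform(low_exp, high_exp) for _ in range(n)]


# Coarse stage: span several orders of magnitude.
coarse = sample_log_uniform(-5, -1, n=8)

# Suppose (hypothetically) short runs showed the best candidate was near 3e-3;
# the fine stage then samples a narrower band around it.
best_exp = -2.5
fine = sample_log_uniform(best_exp - 0.5, best_exp + 0.5, n=8)

print(sorted(coarse))
print(sorted(fine))
```

Sampling the exponent uniformly, rather than the learning rate itself, spreads candidates evenly across orders of magnitude.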

### Going above and beyond
If you are feeling adventurous there are many other features you can implement to try and improve your performance. You are **not required** to implement any of these, but don't miss the fun if you have time!

- Alternative optimizers: you can try Adam, Adagrad, RMSprop, etc.
- Alternative activation functions such as leaky ReLU, parametric ReLU, ELU, or MaxOut.
- Model ensembles
- Data augmentation
- New Architectures
  - [ResNets](https://arxiv.org/abs/1512.03385) where the input from the previous layer is added to the output.
  - [DenseNets](https://arxiv.org/abs/1608.06993) where inputs into previous layers are concatenated together.
  - [This blog has an in-depth overview](https://chatbotslife.com/resnets-highwaynets-and-densenets-oh-my-9bb15918ee32)
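
For the ResNet idea, a minimal residual block might look like the sketch below. It assumes the number of channels and the spatial size stay the same so the input can be added directly to the output; the class name `ResidualBlock` and the layer choices are illustrative, not the exact design from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):  # minimal sketch, same channels in and out
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection: add the block's input


block = ResidualBlock(16)
block.eval()
with torch.no_grad():
    y = block(torch.randn(4, 16, 32, 32))
print(y.shape)  # torch.Size([4, 16, 32, 32])
```

Because the block preserves its input shape, several of them can be stacked inside `nn.Sequential` between downsampling layers.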

### Have fun and happy training!


In [None]:
################################################################################
# TODO:                                                                        #
# Experiment with any architectures, optimizers, and hyperparameters.          #
# Achieve AT LEAST 70% accuracy on the *validation set* within 10 epochs.      #
#                                                                              #
# Note that you can use check_accuracy_part34 to evaluate on either the test   #
# set or the validation set, by passing either loader_test or loader_val as    #
# the first argument. You should not touch the test set until you have         #
# finished your architecture and hyperparameter tuning, and only run the       #
# test set once at the end to report a final value.                            #
################################################################################
model = None
optimizer = None

# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

pass

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                                 END OF YOUR CODE                             #
################################################################################

# You should get at least 70% accuracy.
# You may modify the number of epochs to any number below 15.
train_part34(model, optimizer, epochs=10)

## Describe what you did

In the cell below you should write an explanation of what you did, any additional features that you implemented, and/or any graphs that you made in the process of training and evaluating your network.

**Answer:**



## Test set -- run this only once

Now that we've gotten a result we're happy with, we evaluate our final model (which you should store in `best_model`) on the test set. Think about how this compares to your validation set accuracy.

In [None]:
best_model = model
check_accuracy_part34(loader_test, best_model)