<a href="https://colab.research.google.com/github/douzujun/Pytorch_Course/blob/master/1.two_layer_neural_net.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tensors

In [None]:
from __future__ import print_function
import torch

Construct an uninitialized 5x3 matrix:

In [None]:
x = torch.empty(5, 3)
print(x)

tensor([[1.3400e-35, 0.0000e+00, 3.3631e-44],
        [0.0000e+00,        nan, 4.4842e-44],
        [1.1578e+27, 1.1362e+30, 7.1547e+22],
        [4.5828e+30, 1.2121e+04, 7.1846e+22],
        [9.2198e-39, 7.0374e+22, 0.0000e+00]])


Construct a randomly initialized matrix:

In [None]:
x = torch.rand(5, 3)
print(x)

tensor([[0.0102, 0.3306, 0.1210],
        [0.6529, 0.7342, 0.5604],
        [0.7412, 0.4814, 0.9931],
        [0.8142, 0.9489, 0.5458],
        [0.7159, 0.1941, 0.2336]])


Construct a matrix filled with zeros and of dtype long:

In [None]:
x = torch.zeros(5, 3, dtype=torch.long)
print(x)

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])


Construct a tensor directly from data:

In [None]:
x = torch.tensor([5.5, 3])
print(x)

tensor([5.5000, 3.0000])


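You can also create a tensor based on an existing tensor.
These methods reuse properties of the input tensor, such as its dtype, unless new values are explicitly provided.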

In [None]:
x = x.new_ones(5, 3, dtype=torch.double)   # new_* methods take in sizes
print(x)

x = torch.randn_like(x, dtype=torch.float) # override dtype!
print(x, x.dtype)

print(torch.ones_like(x))

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[ 1.2668,  0.1598,  0.1749],
        [-0.0179, -0.7583,  0.1586],
        [ 0.8475, -0.3535,  0.3140],
        [ 1.8149,  0.6612, -2.3439],
        [-1.5723,  0.4698, -0.9355]]) torch.float32
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
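A short sketch of the dtype inheritance described above, using ``new_zeros`` (a sibling of ``new_ones``); the tensor names are illustrative only:

```python
import torch

base = torch.ones(2, 2, dtype=torch.float64)
same = base.new_zeros(3)                          # inherits float64 from base
overridden = base.new_zeros(3, dtype=torch.int32)  # an explicit dtype wins
print(same.dtype, overridden.dtype)                # torch.float64 torch.int32
```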


Get the tensor's shape:

In [None]:
print(x.size())

torch.Size([5, 3])


<div class="alert alert-info"><h4>Note</h4><p>``torch.Size`` is in fact a tuple, so it supports all tuple operations.</p></div>
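Because ``torch.Size`` is a tuple subclass, it can be unpacked and measured like any tuple. A minimal illustration:

```python
import torch

size = torch.zeros(5, 3).size()
rows, cols = size                                      # unpacks like any tuple
print(isinstance(size, tuple), rows, cols, len(size))  # True 5 3 2
```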

# Operations


There are many tensor operations. Let's start with addition.



- Addition

In [None]:
y = torch.rand(5, 3)
print(x + y)

tensor([[ 1.3116,  1.0175,  0.4215],
        [ 0.3051, -0.2167,  0.2725],
        [ 1.0745, -0.2689,  0.7790],
        [ 2.7261,  1.1268, -2.2307],
        [-0.8309,  0.6785, -0.6331]])


In [None]:
print(torch.add(x, y))

tensor([[ 1.3116,  1.0175,  0.4215],
        [ 0.3051, -0.2167,  0.2725],
        [ 1.0745, -0.2689,  0.7790],
        [ 2.7261,  1.1268, -2.2307],
        [-0.8309,  0.6785, -0.6331]])


Addition: providing an output tensor as an argument

In [None]:
result = torch.empty(5, 3)
torch.add(x, y, out=result)
print(result)

tensor([[ 1.3116,  1.0175,  0.4215],
        [ 0.3051, -0.2167,  0.2725],
        [ 1.0745, -0.2689,  0.7790],
        [ 2.7261,  1.1268, -2.2307],
        [-0.8309,  0.6785, -0.6331]])


In-place addition (adds ``x`` to ``y``):

In [None]:
y.add_(x)
print(y)

tensor([[ 1.3116,  1.0175,  0.4215],
        [ 0.3051, -0.2167,  0.2725],
        [ 1.0745, -0.2689,  0.7790],
        [ 2.7261,  1.1268, -2.2307],
        [-0.8309,  0.6785, -0.6331]])


<div class="alert alert-info"><h4>Note</h4><p>Any operation that mutates a tensor in place is post-fixed with ``_``.
    For example, ``x.copy_(y)`` and ``x.t_()`` will change ``x``.</p></div>
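To make the naming rule concrete, here is a small sketch contrasting the out-of-place ``add`` with the in-place ``add_``:

```python
import torch

x = torch.zeros(3)
y = x.add(1)     # out-of-place: returns a new tensor, x stays all zeros
print(x, y)      # tensor([0., 0., 0.]) tensor([1., 1., 1.])
x.add_(1)        # in-place: writes the result into x itself
print(x)         # tensor([1., 1., 1.])
```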

All the usual NumPy-style indexing works on PyTorch tensors.


In [None]:
print(x[:, 1])

tensor([ 0.1598, -0.7583, -0.3535,  0.6612,  0.4698])
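A few more NumPy-style indexing patterns, sketched on a small integer tensor so the results are easy to check by eye:

```python
import torch

x = torch.arange(12).reshape(3, 4)  # rows: [0..3], [4..7], [8..11]
row = x[1]          # second row: tensor([4, 5, 6, 7])
col = x[:, -1]      # last column: tensor([ 3,  7, 11])
big = x[x > 8]      # boolean mask, flattened: tensor([ 9, 10, 11])
x[0, :2] = -1       # slice assignment writes into x in place
print(x[0])         # tensor([-1, -1,  2,  3])
```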


Resizing: to resize or reshape a tensor, use the tensor's ``view`` method:

In [None]:
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
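Two properties of ``view`` worth knowing, shown in a short sketch: a view shares storage with the original tensor, and it requires a compatible (for example contiguous) memory layout. ``reshape`` is the more forgiving alternative, copying only when it must.

```python
import torch

x = torch.randn(4, 4)
y = x.view(16)
y[0] = 100.0                 # a view shares storage, so this also changes x
print(x[0, 0].item())        # 100.0

t = x.t()                    # transposing gives a non-contiguous view
print(t.is_contiguous())     # False
# t.view(16) would raise a RuntimeError here; reshape() copies when it has to
r = t.reshape(16)
v = t.contiguous().view(16)
print(torch.equal(r, v))     # True
```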


If you have a one-element tensor, use ``.item()`` to get its value as a Python number.

In [None]:
x = torch.randn(1)
print(x)
print(x.item())

tensor([-0.3741])
-0.3740804195404053
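``.item()`` only works for tensors holding exactly one element; for larger tensors, ``.tolist()`` converts to (nested) Python lists instead. A small sketch:

```python
import torch

x = torch.tensor([1.5, 2.5])
print(x.tolist())          # [1.5, 2.5], works for any number of elements
try:
    x.item()               # .item() needs exactly one element
except (RuntimeError, ValueError) as err:
    print("item() failed:", err)
print(x[0].item())         # 1.5
```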


# Converting between NumPy and Tensors

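Converting between a Torch Tensor and a NumPy array is easy.

A CPU Tensor and the NumPy array converted from it share the same underlying memory, so changing one also changes the other.

Converting a Torch Tensor to a NumPy array: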


In [None]:
a = torch.ones(5)
print(a)

tensor([1., 1., 1., 1., 1.])


In [None]:
b = a.numpy()
print(b, type(b))

b[1] = 999
print(a, b)

[1. 1. 1. 1. 1.] <class 'numpy.ndarray'>
tensor([  1., 999.,   1.,   1.,   1.]) [  1. 999.   1.   1.   1.]


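Changing the tensor in place also changes the NumPy array. By contrast, ``a = a + 1`` creates a new tensor and rebinds ``a`` to it, so ``b`` keeps its old values: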

In [None]:
a.add_(1)
print(a)
print(b)

a = a + 1
print(a)
print(b)

tensor([   2., 1000.,    2.,    2.,    2.])
[   2. 1000.    2.    2.    2.]
tensor([   3., 1001.,    3.,    3.,    3.])
[   2. 1000.    2.    2.    2.]


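Converting a NumPy ndarray to a Torch Tensor. Again, in-place updates such as ``np.add(a, 1, out=a)`` and ``a += 1`` are visible through ``b``, while ``a = a + 1`` is not: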

In [None]:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

a += 1   
print(a)
print(b)

a = a + 1
print(a, '\n', b)

[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
[3. 3. 3. 3. 3.]
tensor([3., 3., 3., 3., 3.], dtype=torch.float64)
[4. 4. 4. 4. 4.] 
 tensor([3., 3., 3., 3., 3.], dtype=torch.float64)
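Memory sharing is specific to ``torch.from_numpy``; ``torch.tensor`` copies the data instead. A minimal sketch of the difference:

```python
import numpy as np
import torch

a = np.zeros(3)
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # makes an independent copy of the data
a[0] = 7.0
print(shared[0].item(), copied[0].item())   # 7.0 0.0
```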


All tensors on the CPU support conversion to NumPy and back; a tensor on the GPU must first be moved to the CPU.

# 3. CUDA Tensors

Tensors can be moved onto any device with the ``.to`` method.


In [None]:
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
print(torch.cuda.is_available())
if torch.cuda.is_available():
    device = torch.device('cuda')      # a CUDA device object
    y = torch.ones_like(x, device=device) # directly create a tensor on GPU
    x = x.to(device)             # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to('cpu', torch.double))    # ``.to`` can also change dtype together!

True
tensor([0.6259], device='cuda:0')
tensor([0.6259], dtype=torch.float64)


In [None]:
# y.numpy() would fail here: a CUDA tensor must be moved to the CPU before converting to NumPy
y.to('cpu').numpy()

array([1.], dtype=float32)
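A device-agnostic sketch of the same conversion, which also runs on machines without a GPU. It additionally shows that a tensor with ``requires_grad=True`` has to be detached before ``.numpy()`` will accept it:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t = torch.ones(3, device=device)
arr = t.cpu().numpy()                   # .numpy() only works on CPU tensors

w = torch.ones(3, device=device, requires_grad=True)
arr_w = w.detach().cpu().numpy()        # detach first: .numpy() refuses tensors that require grad
print(arr, arr_w)                       # [1. 1. 1.] [1. 1. 1.]
```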


# 4. Warm-up: a two-layer neural network in NumPy

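A fully connected ReLU network with one hidden layer and no biases, trained to predict y from x with an L2 (sum of squared errors) loss.

This implementation uses NumPy alone to compute the forward pass, the loss, and the backward pass.

A NumPy ndarray is a generic n-dimensional array. It knows nothing about deep learning, gradients, or computation graphs; it is simply a data structure for numerical computation.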
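The forward pass and the hand-computed gradients in the NumPy code can be written out as follows, with $x \in \mathbb{R}^{N \times D_{in}}$, $W_1 \in \mathbb{R}^{D_{in} \times H}$ (`w1`) and $W_2 \in \mathbb{R}^{H \times D_{out}}$ (`w2`). Each gradient line corresponds to one of `grad_y_pred`, `grad_w2`, `grad_h_relu`, `grad_h`, `grad_w1`; $\odot$ is element-wise multiplication, and the mask $\mathbb{1}[h \ge 0]$ mirrors the code's `grad_h[h < 0] = 0`.

$$
\begin{aligned}
h &= x W_1, \qquad h_{\text{relu}} = \max(h, 0), \qquad \hat{y} = h_{\text{relu}} W_2, \\
L &= \sum_{i,j} (\hat{y}_{ij} - y_{ij})^2, \\
\frac{\partial L}{\partial \hat{y}} &= 2(\hat{y} - y), \\
\frac{\partial L}{\partial W_2} &= h_{\text{relu}}^{\top} \frac{\partial L}{\partial \hat{y}}, \\
\frac{\partial L}{\partial h_{\text{relu}}} &= \frac{\partial L}{\partial \hat{y}} W_2^{\top}, \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial h_{\text{relu}}} \odot \mathbb{1}[h \ge 0], \\
\frac{\partial L}{\partial W_1} &= x^{\top} \frac{\partial L}{\partial h}.
\end{aligned}
$$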




In [None]:
import numpy as np

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss.
    # Since loss = sum((y_pred - y) ** 2), d(loss)/d(y_pred) = 2 * (y_pred - y)
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0            # ReLU blocks the gradient where its input was negative
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2    

0 31405759.25660486
1 28521942.114509955
2 31384182.897050723
3 34554886.164889365
4 32845865.650790207
5 24955497.597929917
6 14856300.555476323
7 7524348.902469195
8 3703306.6575033413
9 2015395.3054101765
10 1279786.051596574
11 929361.8443736541
12 735267.8564837433
13 609759.4114652772
14 518510.40573984326
15 447214.4496680094
16 389226.5034819355
17 341009.6777167224
18 300294.7381825887
19 265642.96707174287
20 235961.04965357675
21 210414.3851924779
22 188231.5977394986
23 168869.494425024
24 151935.23626429026
25 137042.73909903647
26 123897.82972305088
27 112277.03659919125
28 101960.9604297372
29 92758.29408899757
30 84528.72699886268
31 77154.379703862
32 70526.02190172988
33 64553.87708251132
34 59164.01221665183
35 54287.361962150055
36 49870.971038498734
37 45846.045312432805
38 42199.05846084689
39 38884.060154483
40 35861.10474765071
41 33100.92179304345
42 30577.59477209024
43 28268.67112259056
44 26152.367817959715
45 24210.243839771432
46 22427.388616584576
47 2078
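One way to gain confidence in the hand-derived gradients above is a finite-difference check. The sketch below is not part of the original notebook; it reuses the same formulas on tiny, arbitrarily chosen sizes and compares the analytic gradient of `w1` with central differences:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 4, 5, 3, 2          # tiny sizes chosen only for a quick check
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def loss_of(w1, w2):
    h_relu = np.maximum(x.dot(w1), 0)
    return np.square(h_relu.dot(w2) - y).sum()

# Analytic gradient of w1, using the same formulas as the training loop
h = x.dot(w1)
h_relu = np.maximum(h, 0)
grad_y_pred = 2.0 * (h_relu.dot(w2) - y)
grad_h = grad_y_pred.dot(w2.T)
grad_h[h < 0] = 0
grad_w1 = x.T.dot(grad_h)

# Central finite differences, perturbing one entry of w1 at a time
eps = 1e-6
num_grad_w1 = np.zeros_like(w1)
for idx in np.ndindex(*w1.shape):
    w_plus, w_minus = w1.copy(), w1.copy()
    w_plus[idx] += eps
    w_minus[idx] -= eps
    num_grad_w1[idx] = (loss_of(w_plus, w2) - loss_of(w_minus, w2)) / (2 * eps)

max_err = np.max(np.abs(grad_w1 - num_grad_w1))
print(max_err)   # expected to be tiny
```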

# 5. PyTorch: Tensors

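This time we use PyTorch tensors to implement the forward pass, the loss computation, and backpropagation.

A PyTorch Tensor is conceptually very similar to a NumPy ndarray. The biggest difference is that a PyTorch Tensor can run on either the CPU or the GPU. To compute on the GPU, create the Tensor on (or move it to) a CUDA device, for example with ``device=torch.device('cuda:0')`` or ``.to('cuda')``.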



In [None]:
import torch

dtype = torch.float
# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.matmul(w1)             # matmul plays the role of NumPy's dot
    h_relu = h.clamp(min=0)      # ReLU: set every negative entry to 0
    y_pred = h_relu.matmul(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().matmul(grad_y_pred)
    grad_h_relu = grad_y_pred.matmul(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().matmul(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 29533522.0
1 24030532.0
2 21642856.0
3 19319628.0
4 16096911.0
5 12222376.0
6 8550794.0
7 5673648.0
8 3694441.5
9 2434778.5
10 1658281.125
11 1180189.125
12 878865.75
13 682067.9375
14 547497.125
15 451178.375
16 379199.125
17 323381.875
18 278817.625
19 242378.0
20 212065.28125
21 186515.40625
22 164759.5
23 146125.90625
24 130034.21875
25 116055.1875
26 103868.96875
27 93195.8046875
28 83805.0625
29 75521.828125
30 68191.8671875
31 61688.2421875
32 55904.51171875
33 50751.5546875
34 46152.671875
35 42038.36328125
36 38349.65625
37 35046.34375
38 32072.8828125
39 29391.439453125
40 26970.91796875
41 24779.01953125
42 22792.193359375
43 20989.1953125
44 19350.44921875
45 17858.74609375
46 16498.70703125
47 15257.5966796875
48 14123.2890625
49 13085.330078125
50 12134.712890625
51 11263.25390625
52 10463.1015625
53 9727.9189453125
54 9051.7294921875
55 8429.21484375
56 7855.28173828125
57 7326.01318359375
58 6837.1669921875
59 6385.51171875
60 5967.73291015625
61 5580.9306640625
62 52

A simple autograd example:

In [None]:
# Create tensors.
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

# Build a computational graph.
y = w * x + b    # y = 2 * x + 3

# Compute gradients.
y.backward(retain_graph=True)

# Print out the gradients.
print(x.grad)    # x.grad = 2 
print(w.grad)    # w.grad = 1 
print(b.grad)    # b.grad = 1 


tensor(2.)
tensor(1.)
tensor(1.)
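Gradients in ``.grad`` accumulate across ``backward()`` calls rather than being overwritten, which is why the training loops that follow zero them after every update. A minimal sketch with the same tensors as above:

```python
import torch

x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

y = w * x + b
y.backward(retain_graph=True)   # keep the graph alive for a second backward()
first = w.grad.item()           # 1.0, since dy/dw = x = 1
y.backward()                    # the new gradient is ADDED to w.grad
second = w.grad.item()          # 2.0
w.grad.zero_()                  # reset in place
print(first, second, w.grad.item())   # 1.0 2.0 0.0
```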



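# 6. PyTorch: Tensors and autograd

A key feature of PyTorch is autograd: once you define the forward pass and compute the loss, PyTorch automatically computes the gradients of the loss with respect to all model parameters.

A PyTorch Tensor represents a node in a computational graph. If ``x`` is a Tensor with ``x.requires_grad=True``, then ``x.grad`` is another Tensor holding the gradient of some scalar value (usually the loss) with respect to ``x``.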


In [None]:
import torch

dtype = torch.float
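# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs.
# requires_grad=False (the default) means we do not need gradients
# with respect to these Tensors during the backward pass.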
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)


# Create random Tensors for the weights.
# requires_grad=True means we want gradients with respect to these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
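    # Forward pass: compute predicted y using operations on Tensors. This is the same
    # as an ordinary forward pass, but we no longer need to keep intermediate values,
    # because we are not implementing the backward pass by hand.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute the loss from the forward pass.
    # loss is a 0-dimensional Tensor (shape ()), and
    # loss.item() returns the scalar value it holds as a Python number.
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Use autograd for the backward pass. This call computes the gradient of loss
    # with respect to every Tensor with requires_grad=True. Afterwards, w1.grad and
    # w2.grad hold the gradients of the loss with respect to w1 and w2.
    loss.backward()

    # Manually update the weights with gradient descent (an automatic way comes later).
    # Wrap the update in torch.no_grad(): w1 and w2 have requires_grad=True, but the
    # update itself should not be tracked by autograd.
    # An alternative is to operate on weight.data and weight.grad.data: tensor.data
    # returns a tensor that shares memory with the original but does not record
    # computation-graph history.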
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()


0 35159312.0
1 31582474.0
2 29480328.0
3 25024532.0
4 18116026.0
5 11371694.0
6 6576342.5
7 3832941.0
8 2396170.0
9 1646586.875
10 1229429.75
11 974135.25
12 801709.5
13 675335.5
14 577424.8125
15 498606.1875
16 433632.625
17 379279.125
18 333300.5625
19 294115.1875
20 260506.90625
21 231568.75
22 206497.84375
23 184690.703125
24 165632.796875
25 148936.75
26 134247.03125
27 121274.53125
28 109795.046875
29 99593.4375
30 90503.1171875
31 82386.578125
32 75123.140625
33 68604.515625
34 62742.44140625
35 57466.5078125
36 52709.6953125
37 48410.921875
38 44513.23046875
39 40976.94921875
40 37763.36328125
41 34839.3359375
42 32175.39453125
43 29741.8984375
44 27517.701171875
45 25482.40234375
46 23618.46875
47 21908.14453125
48 20337.5859375
49 18894.279296875
50 17566.41015625
51 16342.896484375
52 15213.904296875
53 14172.056640625
54 13209.435546875
55 12319.3134765625
56 11495.169921875
57 10731.673828125
58 10024.1962890625
59 9367.9775390625
60 8759.1083984375
61 8193.73046875
62 766
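To see that autograd produces the same gradients as the hand-written backward pass of section 5, both can be run on one small problem. This is a sketch with tiny, arbitrarily chosen sizes; float64 is used only to keep the comparison tight:

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 4, 5, 3, 2
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Manual backward pass, same formulas as in section 5
with torch.no_grad():
    h = x.matmul(w1)
    h_relu = h.clamp(min=0)
    grad_y_pred = 2.0 * (h_relu.matmul(w2) - y)
    grad_w2 = h_relu.t().matmul(grad_y_pred)
    grad_h = grad_y_pred.matmul(w2.t())
    grad_h[h < 0] = 0
    grad_w1 = x.t().matmul(grad_h)

# Autograd backward pass on the same loss
loss = (x.matmul(w1).clamp(min=0).matmul(w2) - y).pow(2).sum()
loss.backward()

print(torch.allclose(grad_w1, w1.grad), torch.allclose(grad_w2, w2.grad))  # True True
```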


# 7. PyTorch: nn


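This time we build the network with PyTorch's nn package.
Autograd still builds the computational graph and computes the gradients for us;
nn adds ready-made layers (Modules) and loss functions on top of it.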



In [None]:
import torch.nn as nn

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
model = nn.Sequential(
    nn.Linear(D_in, H),
    nn.ReLU(),
    nn.Linear(H, D_out),
)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = nn.MSELoss(reduction='sum')

learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()
    
    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

0 738.046142578125
1 689.208740234375
2 646.4689331054688
3 608.5616455078125
4 574.579833984375
5 543.9375
6 515.7476196289062
7 489.66094970703125
8 465.3070983886719
9 442.39666748046875
10 420.83538818359375
11 400.4173583984375
12 380.75238037109375
13 361.9873046875
14 344.0550537109375
15 326.9266662597656
16 310.5025939941406
17 294.69061279296875
18 279.4354248046875
19 264.79608154296875
20 250.70297241210938
21 237.1044921875
22 224.08944702148438
23 211.6127166748047
24 199.66087341308594
25 188.24807739257812
26 177.3172149658203
27 166.91445922851562
28 157.04397583007812
29 147.66860961914062
30 138.76376342773438
31 130.30026245117188
32 122.28533172607422
33 114.69891357421875
34 107.52420043945312
35 100.7588882446289
36 94.38922119140625
37 88.40320587158203
38 82.77439880371094
39 77.48088073730469
40 72.51531982421875
41 67.8624038696289
42 63.517555236816406
43 59.45792007446289
44 55.656410217285156
45 52.09965896606445
46 48.78329849243164
47 45.681610107421875
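The parameters that the update loop above iterates over can be listed with ``named_parameters()``. A short sketch with the same layer sizes: each ``nn.Linear`` stores its weight as (out_features, in_features) plus a bias, the ReLU has no parameters, and ``reduction='sum'`` makes ``MSELoss`` equal to the plain sum of squared errors used earlier.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1000, 100), nn.ReLU(), nn.Linear(100, 10))
shapes = [(name, tuple(p.shape)) for name, p in model.named_parameters()]
for name, shape in shapes:
    print(name, shape)
# 0.weight (100, 1000)
# 0.bias (100,)
# 2.weight (10, 100)
# 2.bias (10,)

# reduction='sum' adds up the squared errors instead of averaging them
y_pred, y = torch.randn(4, 10), torch.randn(4, 10)
same = torch.allclose(nn.MSELoss(reduction='sum')(y_pred, y), (y_pred - y).pow(2).sum())
print(same)   # True
```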



# 8. PyTorch: optim

This time, instead of updating the model weights by hand, we let the optim package update the parameters.
optim implements a range of optimization methods, including SGD with momentum, RMSProp, and Adam.


In [42]:
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Assuming that we are on a CUDA machine, this should print a CUDA device:
print(device)

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs on the chosen device
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Use the nn package to define our model and loss function.
# The model is moved to the same device as its inputs.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
).to(device)

loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())
    
    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because by default, gradients are
    # accumulated in buffers (i.e., not overwritten) whenever .backward()
    # is called. Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()
    
    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()

cuda:0
0 693.2886962890625
1 675.4269409179688
2 657.999267578125
3 641.0489501953125
4 624.6734619140625
5 608.700439453125
6 593.128662109375
7 577.971923828125
8 563.2371826171875
9 548.9273071289062
10 535.0888671875
11 521.7005615234375
12 508.67327880859375
13 495.9757080078125
14 483.6680908203125
15 471.6831359863281
16 460.02008056640625
17 448.66644287109375
18 437.7806396484375
19 427.1993103027344
20 416.86822509765625
21 406.7424621582031
22 396.85296630859375
23 387.1991271972656
24 377.8179931640625
25 368.7890625
26 359.98748779296875
27 351.4288330078125
28 343.0444030761719
29 334.8562927246094
30 326.8724670410156
31 319.0821533203125
32 311.50177001953125
33 304.09002685546875
34 296.81890869140625
35 289.6988525390625
36 282.77392578125
37 275.9880676269531
38 269.3233947753906
39 262.78778076171875
40 256.40264892578125
41 250.14442443847656
42 244.04185485839844
43 238.08856201171875
44 232.25778198242188
45 226.52987670898438
46 220.90704345703125
47 215.3986206
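``optimizer.step()`` replaces the manual ``param -= learning_rate * param.grad`` loop of section 7. For plain ``torch.optim.SGD`` (no momentum, no weight decay) the two are equivalent, which this sketch checks on a toy loss $\sum w^2$ chosen purely for illustration:

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
lr = 0.1
optimizer = torch.optim.SGD([w], lr=lr)

loss = (w ** 2).sum()                    # toy loss, d(loss)/dw = 2w
optimizer.zero_grad()
loss.backward()
expected = (w - lr * w.grad).detach()    # the manual gradient-descent update
optimizer.step()                         # plain SGD: w <- w - lr * w.grad
print(torch.allclose(w.detach(), expected))   # True
```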


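# 9. PyTorch: Custom nn Modules

We can define our own model by subclassing nn.Module. Whenever a model needs to be more complex than a plain sequence of existing Modules (nn.Sequential), define it as an nn.Module subclass with its own ``forward`` method.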



In [2]:
import torch


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Assuming that we are on a CUDA machine, this should print a CUDA device:
print(device)

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

model.to(device)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the Adam constructor returns the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(reduction='sum').to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)    

for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

cuda:0
0 679.3773193359375
1 661.974853515625
2 645.0308837890625
3 628.5623779296875
4 612.565185546875
5 596.9783935546875
6 581.80078125
7 567.0170288085938
8 552.7081298828125
9 538.79736328125
10 525.3388671875
11 512.3109741210938
12 499.68463134765625
13 487.39447021484375
14 475.55877685546875
15 464.08782958984375
16 452.9164733886719
17 442.10528564453125
18 431.6226501464844
19 421.4292297363281
20 411.5294189453125
21 401.88104248046875
22 392.5015869140625
23 383.3862609863281
24 374.4903259277344
25 365.81512451171875
26 357.363525390625
27 349.08465576171875
28 340.97943115234375
29 333.07916259765625
30 325.37042236328125
31 317.821533203125
32 310.4154357910156
33 303.1563415527344
34 296.05010986328125
35 289.1358642578125
36 282.36419677734375
37 275.7529296875
38 269.303466796875
39 262.984619140625
40 256.7935485839844
41 250.72503662109375
42 244.79364013671875
43 239.01327514648438
44 233.3563690185547
45 227.82699584960938
46 222.4119110107422
47 217.11134338378
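``model.parameters()`` finds the weights because nn.Module registers submodules and ``nn.Parameter`` attributes automatically when they are assigned in ``__init__``. The hypothetical ``TinyNet`` below sketches this, together with ``register_buffer`` for non-trainable state that should still follow ``.to(device)``:

```python
import torch

class TinyNet(torch.nn.Module):          # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)              # submodule: weight and bias are registered
        self.scale = torch.nn.Parameter(torch.ones(1))   # explicit trainable parameter
        self.register_buffer("offset", torch.zeros(1))   # state that is saved and moved, but not trained

    def forward(self, x):
        return self.linear(x) * self.scale + self.offset

net = TinyNet()
param_names = sorted(name for name, _ in net.named_parameters())
buffer_names = [name for name, _ in net.named_buffers()]
out = net(torch.randn(3, 4))
print(param_names)    # ['linear.bias', 'linear.weight', 'scale']
print(buffer_names)   # ['offset']
print(out.shape)      # torch.Size([3, 2])
```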

# 10. FizzBuzz

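FizzBuzz is a simple game. The rules: count up from 1; for multiples of 3 say "fizz", for multiples of 5 say "buzz", for multiples of 15 say "fizzbuzz" (this takes priority over the two previous rules), and otherwise just say the number.

We can write a small program that decides whether to output the number itself or "fizz", "buzz", or "fizzbuzz".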

In [11]:
# Encode the desired output as a class index: 0 = the number itself, 1 = "fizz", 2 = "buzz", 3 = "fizzbuzz"
def fizz_buzz_encode(i):
    if   i % 15 == 0: return 3
    elif i % 5  == 0: return 2
    elif i % 3  == 0: return 1
    else:             return 0
    
def fizz_buzz_decode(i, prediction):
    return [str(i), "fizz", "buzz", "fizzbuzz"][prediction]

print(fizz_buzz_decode(1, fizz_buzz_encode(1)))
print(fizz_buzz_decode(2, fizz_buzz_encode(2)))
print(fizz_buzz_decode(5, fizz_buzz_encode(5)))
print(fizz_buzz_decode(12, fizz_buzz_encode(12)))
print(fizz_buzz_decode(15, fizz_buzz_encode(15)))

1
2
buzz
fizz
fizzbuzz


First, we define the model's inputs and outputs (the training data):

In [45]:
import numpy as np
import torch

NUM_DIGITS = 10

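# Encode integer i as an array of its num_digits lowest binary digits (least significant bit first)
def binary_encode(i, num_digits):
    return np.array([i >> d & 1 for d in range(num_digits)])

# print([binary_encode(i, NUM_DIGITS) for i in range(101, 2 ** NUM_DIGITS)])
# Train on 101..1023 and keep 1..100 aside as the test set
trX = torch.tensor(np.array([binary_encode(i, NUM_DIGITS) for i in range(101, 2 ** NUM_DIGITS)]), dtype=torch.float)
trY = torch.tensor([fizz_buzz_encode(i) for i in range(101, 2 ** NUM_DIGITS)], dtype=torch.long)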
print(trX.size(), '\n', trY.size())

torch.Size([923, 10]) 
 torch.Size([923])
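A quick sketch of what ``binary_encode`` produces, repeating the helper so the snippet runs on its own. Bits are listed least significant first, and with 10 digits the encoding can represent 0..1023, which is why training stops at ``2 ** NUM_DIGITS``:

```python
import numpy as np

def binary_encode(i, num_digits):   # same helper as above
    return np.array([i >> d & 1 for d in range(num_digits)])

bits = binary_encode(6, 10)
print(bits)       # [0 1 1 0 0 0 0 0 0 0]  (6 = 0b110, least significant bit first)
decoded = sum(int(b) << d for d, b in enumerate(bits))
print(decoded)    # 6
```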


In [46]:
# Define the model
NUM_HIDDEN = 100

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Assuming that we are on a CUDA machine, this should print a CUDA device:
print(device)

model = torch.nn.Sequential(
    torch.nn.Linear(NUM_DIGITS, NUM_HIDDEN),
    torch.nn.ReLU(),
    torch.nn.Linear(NUM_HIDDEN, 4)
)

cuda:0


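- To teach our model to play FizzBuzz, we need to define a loss function and an optimization algorithm.
- The optimizer repeatedly adjusts the model's parameters to lower the loss, so that the model reaches as low a loss as possible on this task.
- A low loss usually indicates that the model performs well; a high loss indicates that it performs poorly.
- Because FizzBuzz is essentially a four-class classification problem, we use the Cross Entropy Loss.
- As the optimizer we use Stochastic Gradient Descent (SGD).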
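``torch.nn.CrossEntropyLoss`` expects raw, unnormalized scores (logits) of shape (N, C) and integer class indices of shape (N,). That is why the model's last layer has no softmax and ``trY`` holds the outputs of ``fizz_buzz_encode``. As a sketch on random data, it matches ``log_softmax`` followed by ``nll_loss``:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 4)                 # raw scores for 4 classes, no softmax applied
targets = torch.tensor([0, 3, 1, 2, 0])    # class indices, like fizz_buzz_encode outputs
ce = torch.nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, manual))          # True
```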


In [47]:
loss_fn = torch.nn.CrossEntropyLoss().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr = 0.05)

In [48]:
# Start training it
BATCH_SIZE = 128
for epoch in range(10000):
    for start in range(0, len(trX), BATCH_SIZE):
        end = start + BATCH_SIZE
        batchX = trX[start:end]
        batchY = trY[start:end]

        y_pred = model(batchX)
        loss = loss_fn(y_pred, batchY)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    loss = loss_fn(model(trX), trY).item()
    print('Epoch:', epoch, 'Loss:', loss)

(Streaming output truncated: only the last 5000 lines are shown.)
Epoch: 5001 Loss: 0.024533208459615707
Epoch: 5002 Loss: 0.024522418156266212
Epoch: 5003 Loss: 0.02451656013727188
Epoch: 5004 Loss: 0.0245111845433712
Epoch: 5005 Loss: 0.02448888309299946
Epoch: 5006 Loss: 0.02448340319097042
Epoch: 5007 Loss: 0.02446788363158703
Epoch: 5008 Loss: 0.024462362751364708
Epoch: 5009 Loss: 0.024452120065689087
Epoch: 5010 Loss: 0.024437688291072845
Epoch: 5011 Loss: 0.0244279857724905
Epoch: 5012 Loss: 0.024427618831396103
Epoch: 5013 Loss: 0.02440229058265686
Epoch: 5014 Loss: 0.02438896708190441
Epoch: 5015 Loss: 0.02438862808048725
Epoch: 5016 Loss: 0.02437213808298111
Epoch: 5017 Loss: 0.024360066279768944
Epoch: 5018 Loss: 0.024366235360503197
Epoch: 5019 Loss: 0.024343881756067276
Epoch: 5020 Loss: 0.024330424144864082
Epoch: 5021 Loss: 0.024326741695404053
Epoch: 5022 Loss: 0.024314161390066147
Epoch: 5023 Loss: 0.02430294267833233
Epoch: 5024 Loss: 0.024296926334500313
Epoch: 5025 Loss: 0.0242840368300676

In [49]:
# Predict on the held-out numbers 1..100
testX = torch.tensor(np.array([binary_encode(i, NUM_DIGITS) for i in range(1, 101)]), dtype=torch.float)
model.eval()
with torch.no_grad():
    testY = model(testX)
    print(testY.size(), testY.max(1)[1].size())
# testY.max(1)[1] is the index of the highest-scoring class for each number
predictions = zip(range(1, 101), testY.max(1)[1].tolist())

print([fizz_buzz_decode(i, x) for (i, x) in predictions])

torch.Size([100, 4]) torch.Size([100])
['1', '2', 'fizz', '4', 'buzz', 'fizz', '7', '8', 'fizz', 'buzz', '11', 'fizz', '13', '14', 'fizzbuzz', '16', '17', 'fizz', '19', 'buzz', 'fizz', '22', '23', 'fizz', '25', '26', 'fizz', '28', '29', 'fizzbuzz', '31', '32', 'fizz', 'fizz', 'buzz', 'fizz', '37', '38', 'fizz', 'buzz', '41', 'fizz', '43', '44', 'fizzbuzz', '46', '47', 'fizz', '49', 'buzz', 'fizz', '52', '53', 'fizz', 'buzz', '56', 'fizz', '58', '59', 'fizzbuzz', '61', '62', 'fizz', '64', 'buzz', 'fizz', '67', '68', 'fizz', 'buzz', '71', 'fizz', '73', '74', 'fizzbuzz', '76', '77', 'fizz', '79', 'buzz', '81', '82', '83', 'fizz', 'buzz', '86', '87', '88', '89', 'fizzbuzz', '91', '92', 'fizz', '94', 'buzz', 'fizz', '97', '98', 'fizz', 'buzz']


In [25]:
print(np.sum(testY.max(1)[1].numpy() == np.array([fizz_buzz_encode(i) for i in range(1,101)])))

testY.max(1)[1].numpy() == np.array([fizz_buzz_encode(i) for i in range(1,101)])

97


array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True, False,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True, False,  True,  True,  True,  True,  True,  True,
        True,  True, False,  True,  True,  True,  True,  True,  True,
        True])
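The count of correct predictions can also be computed directly in PyTorch and turned into an accuracy. A sketch on hypothetical predicted and true class indices (in the notebook these would be ``testY.max(1)[1]`` and the ``fizz_buzz_encode`` labels):

```python
import torch

# Hypothetical predicted and true class indices, for illustration only
pred = torch.tensor([0, 1, 2, 3, 1])
true = torch.tensor([0, 1, 2, 0, 1])
correct = (pred == true).sum().item()   # number of matching positions
accuracy = correct / len(true)
print(correct, accuracy)                # 4 0.8
```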