# PyTorch: First Steps
## Two core features of PyTorch

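- An n-dimensional tensor, similar to a NumPy array, that can also run on GPUs
- Automatic differentiation for building and training neural networks

This section uses a fully connected ReLU network as the running example. The network has a single hidden layer and is trained with gradient descent to fit randomly generated data by minimizing the squared Euclidean distance between the network output and the true output.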

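## Tensors
### Warm-up: NumPy
Before introducing PyTorch, this section first implements the network with NumPy. NumPy provides an n-dimensional array object and many functions for manipulating these arrays. NumPy is a general-purpose framework for scientific computing; it knows nothing about computational graphs, deep learning, or gradients. Even so, we can easily use NumPy to implement the forward and backward passes of the network by hand and fit it to random data.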
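The hand-written forward and backward passes in this section can be summarized by the equations below. This is a sketch in notation that does not appear in the original text: $x$ is the input batch, $y$ the targets, $W_1$ and $W_2$ the weight matrices (`w1` and `w2` in the code), $\odot$ the element-wise product, and $\mathbb{1}[\cdot]$ an indicator that is 1 where the condition holds and 0 elsewhere.

$$
h = x W_1, \qquad h_{\mathrm{relu}} = \max(h, 0), \qquad \hat{y} = h_{\mathrm{relu}} W_2, \qquad L = \sum_{i,j} \left(\hat{y}_{ij} - y_{ij}\right)^2
$$

$$
\frac{\partial L}{\partial \hat{y}} = 2(\hat{y} - y), \qquad
\frac{\partial L}{\partial W_2} = h_{\mathrm{relu}}^{\top} \frac{\partial L}{\partial \hat{y}}, \qquad
\frac{\partial L}{\partial h_{\mathrm{relu}}} = \frac{\partial L}{\partial \hat{y}} W_2^{\top}
$$

$$
\frac{\partial L}{\partial h} = \frac{\partial L}{\partial h_{\mathrm{relu}}} \odot \mathbb{1}[h \ge 0], \qquad
\frac{\partial L}{\partial W_1} = x^{\top} \frac{\partial L}{\partial h}
$$

Each term maps onto one line of the backward pass: `grad_y_pred`, `grad_w2`, `grad_h_relu`, `grad_h`, and `grad_w1`.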

In [3]:
import numpy as np

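# Initialize hyperparameters
# N: batch size; Demonstration_in: input dimension;
# H: hidden dimension; Demonstration_out: output dimension
N, Demonstration_in, H, Demonstration_out = 64, 1000, 100, 10
epoch = 500

# Create random data
x = np.random.randn(N, Demonstration_in)   # inputs
y = np.random.randn(N, Demonstration_out)  # targets

# Randomly initialize the weights
w1 = np.random.randn(Demonstration_in, H)
w2 = np.random.randn(H, Demonstration_out)

# Learning rate
learning_rate = 1e-6

# Training loop
for t in range(epoch):
    # Forward pass: compute the prediction y_pred
    h = x.dot(w1)              # matrix product of x and w1
    h_relu = np.maximum(h, 0)  # manual ReLU: keep positive values, set the rest to 0
    y_pred = h_relu.dot(w2)    # matrix product of the ReLU output and w2

    # Compute and print the loss
    loss = np.square(y_pred - y).sum()  # sum of squared differences between prediction and target
    print('\tEpoch - {epoch},  Loss - {loss}'.format(epoch=t,loss=loss))

    # Backward pass
    grad_y_pred = 2.0 * (y_pred - y)     # gradient of the loss w.r.t. y_pred
    grad_w2 = h_relu.T.dot(grad_y_pred)  # gradient w.r.t. w2, from the gradient of y_pred
    grad_h_relu = grad_y_pred.dot(w2.T)  # gradient w.r.t. the ReLU output
    # The gradient w.r.t. h is the ReLU-output gradient, zeroed where h < 0
    grad_h = grad_h_relu.copy()  # real copy; plain assignment would only alias grad_h_relu
    grad_h[h<0] = 0
    grad_w1 = x.T.dot(grad_h)    # gradient w.r.t. w1

    # Update the weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2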
    

	Epoch - 0,  Loss - 37247962.3752283
	Epoch - 1,  Loss - 34443984.4475831
	Epoch - 2,  Loss - 35716108.277857
	Epoch - 3,  Loss - 34080519.69265053
	Epoch - 4,  Loss - 26669500.072707333
	Epoch - 5,  Loss - 16668477.668102676
	Epoch - 6,  Loss - 8832603.82292416
	Epoch - 7,  Loss - 4529345.252222549
	Epoch - 8,  Loss - 2541331.6420669453
	Epoch - 9,  Loss - 1646803.6430601692
	Epoch - 10,  Loss - 1207298.5776943879
	Epoch - 11,  Loss - 957271.1974327056
	Epoch - 12,  Loss - 792152.2380502683
	Epoch - 13,  Loss - 670645.2997182957
	Epoch - 14,  Loss - 575266.1532108173
	Epoch - 15,  Loss - 497671.4237324648
	Epoch - 16,  Loss - 433183.32875663566
	Epoch - 17,  Loss - 378885.1420677737
	Epoch - 18,  Loss - 332904.82569283224
	Epoch - 19,  Loss - 293668.3929269826
	Epoch - 20,  Loss - 259960.3469199749
	Epoch - 21,  Loss - 230920.20818232663
	Epoch - 22,  Loss - 205724.66541521472
	Epoch - 23,  Loss - 183777.01476899337
	Epoch - 24,  Loss - 164580.37905118053
	Epoch - 25,  Loss - 147743.8

	Epoch - 209,  Loss - 2.1185277400903804
	Epoch - 210,  Loss - 2.0192526487572144
	Epoch - 211,  Loss - 1.9247260874693677
	Epoch - 212,  Loss - 1.8347518144181953
	Epoch - 213,  Loss - 1.749073533414876
	Epoch - 214,  Loss - 1.6674855022177386
	Epoch - 215,  Loss - 1.5897994665099724
	Epoch - 216,  Loss - 1.5158203473679364
	Epoch - 217,  Loss - 1.4453664555612131
	Epoch - 218,  Loss - 1.3782444666864404
	Epoch - 219,  Loss - 1.314332790700948
	Epoch - 220,  Loss - 1.2534224170324655
	Epoch - 221,  Loss - 1.1954006961035837
	Epoch - 222,  Loss - 1.1401441597180035
	Epoch - 223,  Loss - 1.0874873748019194
	Epoch - 224,  Loss - 1.0373120753076852
	Epoch - 225,  Loss - 0.989519798226548
	Epoch - 226,  Loss - 0.9439774578138593
	Epoch - 227,  Loss - 0.9005639785990838
	Epoch - 228,  Loss - 0.8592034096683363
	Epoch - 229,  Loss - 0.8197913236003551
	Epoch - 230,  Loss - 0.7822158232054243
	Epoch - 231,  Loss - 0.7463986011285435
	Epoch - 232,  Loss - 0.7122716639761804
	Epoch - 233,  Loss

	Epoch - 432,  Loss - 0.00014660370195027076
	Epoch - 433,  Loss - 0.00014096084345379767
	Epoch - 434,  Loss - 0.0001355406408474025
	Epoch - 435,  Loss - 0.0001303282220873481
	Epoch - 436,  Loss - 0.00012531807453886195
	Epoch - 437,  Loss - 0.00012050372631519896
	Epoch - 438,  Loss - 0.00011587590625934219
	Epoch - 439,  Loss - 0.0001114282062232618
	Epoch - 440,  Loss - 0.00010715267478818387
	Epoch - 441,  Loss - 0.00010304428745622169
	Epoch - 442,  Loss - 9.909510248626038e-05
	Epoch - 443,  Loss - 9.529769555464083e-05
	Epoch - 444,  Loss - 9.16482509957259e-05
	Epoch - 445,  Loss - 8.81398893574833e-05
	Epoch - 446,  Loss - 8.476661132285804e-05
	Epoch - 447,  Loss - 8.152378227453181e-05
	Epoch - 448,  Loss - 7.840701974737554e-05
	Epoch - 449,  Loss - 7.541168132131955e-05
	Epoch - 450,  Loss - 7.253078745337693e-05
	Epoch - 451,  Loss - 6.976067146370264e-05
	Epoch - 452,  Loss - 6.709806621036572e-05
	Epoch - 453,  Loss - 6.453760807292029e-05
	Epoch - 454,  Loss - 6.207
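A hand-written backward pass like the one above is easy to get subtly wrong, so it can help to compare it against finite differences. The following is a minimal sketch of such a check on a much smaller network. The sizes, the seed, and the helper names `loss_fn`, `manual_grads`, and `numeric_grad` are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 4, 5, 3, 2  # tiny sizes so the check runs quickly
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def loss_fn():
    """Sum-of-squares loss of the two-layer ReLU network."""
    h_relu = np.maximum(x.dot(w1), 0)
    return np.square(h_relu.dot(w2) - y).sum()

def manual_grads():
    """Same backward pass as in the training loop above."""
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    grad_y_pred = 2.0 * (h_relu.dot(w2) - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)
    return grad_w1, grad_w2

def numeric_grad(w, eps=1e-6):
    """Central finite differences of loss_fn w.r.t. each entry of w."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        plus = loss_fn()
        w[idx] = old - eps
        minus = loss_fn()
        w[idx] = old  # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

grad_w1, grad_w2 = manual_grads()
max_err = max(
    np.abs(grad_w1 - numeric_grad(w1)).max(),
    np.abs(grad_w2 - numeric_grad(w2)).max(),
)
print('max abs difference:', max_err)
```

If the analytic and numeric gradients agree to within a small tolerance, the manual derivation is very likely correct. The check can be unreliable when a hidden activation lies extremely close to zero, because ReLU is not differentiable there.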

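### PyTorch: Tensors
NumPy is a great framework, but it cannot use GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or more, so NumPy alone is not enough for modern deep learning.  

We start with the most fundamental PyTorch concept:  
The tensor. A PyTorch tensor is conceptually identical to a NumPy array: it is an n-dimensional array, and PyTorch provides many functions for operating on tensors. Any computation you might perform with NumPy can also be done with PyTorch tensors, so they can be seen as a general-purpose tool for scientific computing.  
Unlike NumPy, PyTorch can use GPUs to accelerate its numerical computations. To run a tensor on a GPU, pass the device argument when constructing it so that it is created on the GPU.  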
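As a small sketch of the `device` argument and of NumPy interoperability, the snippet below picks a GPU when one is available and falls back to the CPU otherwise. The variable names are illustrative assumptions; only standard calls such as `torch.cuda.is_available()` and `torch.from_numpy()` are used.

```python
import numpy as np
import torch

# Use the GPU if PyTorch can see one, otherwise stay on the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Tensors created with device=... live on that device from the start.
a = torch.randn(3, 4, device=device)
b = torch.ones(4, 2, device=device)
c = a.mm(b)  # matrix multiply runs on the chosen device

# Round trip between NumPy and PyTorch.
arr = np.arange(6, dtype=np.float32).reshape(2, 3)
t = torch.from_numpy(arr)                    # CPU tensor sharing memory with arr
back = t.to(device).mul(2).cpu().numpy()     # compute on device, copy back to NumPy

print(c.shape, c.device)
print(back)
```

Moving data between the CPU and the GPU has a cost, so in practice tensors are usually created on the target device once and kept there.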

Here we use tensors to train a two-layer network on random data. As in the NumPy example above, we use PyTorch tensors and implement the forward and backward passes by hand.

In [25]:
import torch

# Initial parameters
dtype = torch.float
device = torch.device('cpu')
# N: batch size; Demonstration_in: input dimension;
# H: hidden dimension; Demonstration_out: output dimension
N, Demonstration_in, H, Demonstration_out = 64, 1000, 100, 10
epoch = 500

# Create the inputs and targets
inputs = torch.randn(N,Demonstration_in,device=device,dtype=dtype)
references = torch.randn(N,Demonstration_out,device=device,dtype=dtype)

# Randomly initialize the weights
w1 = torch.randn(Demonstration_in, H, device=device,dtype=dtype)
w2 = torch.randn(H, Demonstration_out, device=device, dtype=dtype)

# Learning rate
learning_rate = 1e-6

for i in range(epoch):
    # Forward pass
    h = inputs.mm(w1)        # hidden layer pre-activation
    h_relu = h.clamp(min=0)  # ReLU
    y_pred = h_relu.mm(w2)   # prediction

    # Compute the loss as a Python float
    loss = (y_pred - references).pow(2).sum().item()
    print('\tEpoch - {epoch},  Loss - {loss}'.format(epoch=i,loss=loss))

    # Backward pass: compute the gradients by hand
    grad_y_pred = 2.0 * (y_pred - references)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h<0] = 0
    grad_w1 = inputs.t().mm(grad_h)

    # Update the weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
    

	Epoch - 0,  Loss - 36955752.0
	Epoch - 1,  Loss - 35229976.0
	Epoch - 2,  Loss - 35853756.0
	Epoch - 3,  Loss - 32742208.0
	Epoch - 4,  Loss - 24270582.0
	Epoch - 5,  Loss - 14573081.0
	Epoch - 6,  Loss - 7632836.0
	Epoch - 7,  Loss - 3993082.0
	Epoch - 8,  Loss - 2308762.75
	Epoch - 9,  Loss - 1527953.75
	Epoch - 10,  Loss - 1128372.0
	Epoch - 11,  Loss - 893340.9375
	Epoch - 12,  Loss - 735547.375
	Epoch - 13,  Loss - 619209.375
	Epoch - 14,  Loss - 528266.125
	Epoch - 15,  Loss - 454800.09375
	Epoch - 16,  Loss - 394198.90625
	Epoch - 17,  Loss - 343599.625
	Epoch - 18,  Loss - 300933.71875
	Epoch - 19,  Loss - 264745.65625
	Epoch - 20,  Loss - 233895.609375
	Epoch - 21,  Loss - 207426.40625
	Epoch - 22,  Loss - 184593.0625
	Epoch - 23,  Loss - 164772.28125
	Epoch - 24,  Loss - 147522.59375
	Epoch - 25,  Loss - 132429.96875
	Epoch - 26,  Loss - 119199.390625
	Epoch - 27,  Loss - 107556.0
	Epoch - 28,  Loss - 97262.6796875
	Epoch - 29,  Loss - 88144.7421875
	Epoch - 30,  Loss - 8004

	Epoch - 392,  Loss - 0.008561021648347378
	Epoch - 393,  Loss - 0.008264865726232529
	Epoch - 394,  Loss - 0.007982701994478703
	Epoch - 395,  Loss - 0.007706702221184969
	Epoch - 396,  Loss - 0.007431029807776213
	Epoch - 397,  Loss - 0.007183205336332321
	Epoch - 398,  Loss - 0.006938591133803129
	Epoch - 399,  Loss - 0.006702335551381111
	Epoch - 400,  Loss - 0.00647936575114727
	Epoch - 401,  Loss - 0.0062673017382621765
	Epoch - 402,  Loss - 0.006053084507584572
	Epoch - 403,  Loss - 0.005847836844623089
	Epoch - 404,  Loss - 0.005652866326272488
	Epoch - 405,  Loss - 0.005460713990032673
	Epoch - 406,  Loss - 0.0052825286984443665
	Epoch - 407,  Loss - 0.005106652621179819
	Epoch - 408,  Loss - 0.004939109552651644
	Epoch - 409,  Loss - 0.004776395857334137
	Epoch - 410,  Loss - 0.004614084959030151
	Epoch - 411,  Loss - 0.004464263562113047
	Epoch - 412,  Loss - 0.004317492246627808
	Epoch - 413,  Loss - 0.0041741859167814255
	Epoch - 414,  Loss - 0.004036311991512775
	Epoch - 

The figure below shows the structure of the model above:

![figure.1](https://gitee.com/zyp521/upload_image/raw/master/HRijHA.png)

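## Autograd
### Tensors and autograd
In the examples above we had to implement both the forward and backward passes of the network by hand. This is not a big deal for a small two-layer network, but it quickly becomes tedious for large, complex networks.  
Fortunately, automatic differentiation can compute the backward pass for us. The **autograd** package in PyTorch provides this functionality. When using autograd, the forward pass of the network defines a computational graph: nodes in the graph are tensors, and edges are functions that produce output tensors from input tensors. Backpropagating through this graph then makes it easy to compute gradients.  

This sounds complicated, but it is simple to use in practice. If we want gradients with respect to some tensor, we only need to pass **requires_grad=True** when creating it. Any PyTorch operation on that tensor then builds a computational graph, which lets us backpropagate through the graph later. If a tensor x has requires_grad=True, then after backpropagation x.grad is another tensor holding the gradient of some scalar value with respect to x.  
Sometimes we want to prevent PyTorch from building a graph when operating on tensors with requires_grad=True; for example, when training a neural network we usually do not want to backpropagate through the weight update step. In that case, the **torch.no_grad()** context manager prevents the graph from being built.  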
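The behaviour of `requires_grad`, `.grad`, and `torch.no_grad()` described above can be seen on a single scalar. This is a minimal sketch with made-up values; it also shows that gradients accumulate across calls to `backward()`, which is why training loops reset them.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# Forward: builds a graph because x requires gradients.
y = x ** 2 + 2 * x
y.backward()                 # dy/dx = 2x + 2 = 8 at x = 3
first_grad = x.grad.item()

# A second backward pass ADDS to x.grad instead of replacing it.
y2 = x ** 2
y2.backward()                # d(y2)/dx = 2x = 6, so x.grad becomes 8 + 6
accumulated_grad = x.grad.item()

# Reset the gradient in place, as the training loops in this section do.
x.grad.zero_()

# Inside no_grad, operations are not recorded in the graph.
with torch.no_grad():
    z = x * 2

print(first_grad, accumulated_grad, x.grad.item())
print(y.requires_grad, z.requires_grad)
```

Because of this accumulation, forgetting to zero the gradients between iterations silently mixes gradients from different steps.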

Below we use PyTorch tensors and autograd to implement our two-layer network; we no longer need to write the backward pass by hand.

In [27]:
import torch

# Hyperparameters
dtype = torch.float
device = torch.device('cpu')
# N: batch size; Demonstration_in: input dimension;
# H: hidden dimension; Demonstration_out: output dimension
N, Demonstration_in, H, Demonstration_out = 64, 1000, 100, 10
epoch = 500

# Inputs and targets: we do not need gradients with respect to the data,
# so requires_grad is left at its default (False)
inputs = torch.randn(N,Demonstration_in,device=device,dtype=dtype)
references = torch.randn(N,Demonstration_out,device=device,dtype=dtype)

# Weights: requires_grad=True tells autograd to track gradients for them
w1 = torch.randn(Demonstration_in,H,device=device,dtype=dtype,requires_grad=True)
w2 = torch.randn(H,Demonstration_out,device=device,dtype=dtype,requires_grad=True)

# Learning rate
learning_rate = 1e-6

for i in range(epoch):
    # Forward pass
    predictions = inputs.mm(w1).clamp(min=0).mm(w2)

    # Compute the loss
    loss = (predictions - references).pow(2).sum()
    print('\tEpoch - {epoch},  Loss - {loss}'.format(epoch=i,loss=loss.item()))

    # Use autograd to run the backward pass and compute the gradients
    loss.backward()

    # Update the weights from the gradients without building a graph for the update
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients of w1 and w2
        w1.grad.zero_()
        w2.grad.zero_()
    

	Epoch - 0,  Loss - 25504136.0
	Epoch - 1,  Loss - 18848860.0
	Epoch - 2,  Loss - 17229986.0
	Epoch - 3,  Loss - 17688800.0
	Epoch - 4,  Loss - 18449136.0
	Epoch - 5,  Loss - 18271824.0
	Epoch - 6,  Loss - 16336267.0
	Epoch - 7,  Loss - 12994130.0
	Epoch - 8,  Loss - 9213174.0
	Epoch - 9,  Loss - 6016709.0
	Epoch - 10,  Loss - 3759897.5
	Epoch - 11,  Loss - 2344065.25
	Epoch - 12,  Loss - 1505023.75
	Epoch - 13,  Loss - 1016381.25
	Epoch - 14,  Loss - 727866.375
	Epoch - 15,  Loss - 551400.75
	Epoch - 16,  Loss - 437852.90625
	Epoch - 17,  Loss - 360316.125
	Epoch - 18,  Loss - 304226.5
	Epoch - 19,  Loss - 261553.703125
	Epoch - 20,  Loss - 227690.8125
	Epoch - 21,  Loss - 199974.4375
	Epoch - 22,  Loss - 176756.78125
	Epoch - 23,  Loss - 157001.53125
	Epoch - 24,  Loss - 139991.09375
	Epoch - 25,  Loss - 125224.0859375
	Epoch - 26,  Loss - 112331.078125
	Epoch - 27,  Loss - 101001.0390625
	Epoch - 28,  Loss - 91010.9375
	Epoch - 29,  Loss - 82167.859375
	Epoch - 30,  Loss - 74313.132

	Epoch - 288,  Loss - 0.026987584307789803
	Epoch - 289,  Loss - 0.025817953050136566
	Epoch - 290,  Loss - 0.024702517315745354
	Epoch - 291,  Loss - 0.023635845631361008
	Epoch - 292,  Loss - 0.022628694772720337
	Epoch - 293,  Loss - 0.02164769172668457
	Epoch - 294,  Loss - 0.02073054388165474
	Epoch - 295,  Loss - 0.019856970757246017
	Epoch - 296,  Loss - 0.019020583480596542
	Epoch - 297,  Loss - 0.018207013607025146
	Epoch - 298,  Loss - 0.017437532544136047
	Epoch - 299,  Loss - 0.01668686419725418
	Epoch - 300,  Loss - 0.01598960906267166
	Epoch - 301,  Loss - 0.01530672051012516
	Epoch - 302,  Loss - 0.014655634760856628
	Epoch - 303,  Loss - 0.014035172760486603
	Epoch - 304,  Loss - 0.013446938246488571
	Epoch - 305,  Loss - 0.012879228219389915
	Epoch - 306,  Loss - 0.012348595075309277
	Epoch - 307,  Loss - 0.011828778311610222
	Epoch - 308,  Loss - 0.011333156377077103
	Epoch - 309,  Loss - 0.010856034234166145
	Epoch - 310,  Loss - 0.010408751666545868
	Epoch - 311,  L

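### Defining new autograd functions
Under the hood, each primitive autograd operator is really two functions that operate on tensors. The forward function computes output tensors from input tensors. The backward function receives the gradient of some scalar value with respect to the output tensors and computes the gradient of that same scalar value with respect to the input tensors.  

In PyTorch we can easily define our own autograd operator by subclassing torch.autograd.Function and implementing the forward and backward functions as static methods. We then use the new operator by calling its apply method and passing in the tensors that contain the input data.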
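One way to gain confidence in a custom backward function is to compare it with numerical gradients using `torch.autograd.gradcheck`. The sketch below does this for a hypothetical `Cube` operator (not part of the original notebook) that computes `x ** 3`. `gradcheck` expects double-precision inputs, which is why the tensor uses `torch.double`.

```python
import torch

class Cube(torch.autograd.Function):
    """Illustrative custom operator computing x ** 3."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # cache the input for the backward pass
        return x ** 3

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        # Chain rule: d(x^3)/dx = 3 * x^2, scaled by the upstream gradient.
        return grad_output * 3 * x ** 2

torch.manual_seed(0)
x = torch.randn(5, dtype=torch.double, requires_grad=True)

# Compare the analytic backward against finite differences.
ok = torch.autograd.gradcheck(Cube.apply, (x,), eps=1e-6, atol=1e-4)

# The custom operator should also match PyTorch's built-in gradient.
Cube.apply(x).sum().backward()
custom_grad = x.grad.clone()
x.grad.zero_()
(x ** 3).sum().backward()
builtin_grad = x.grad.clone()

print(ok, torch.allclose(custom_grad, builtin_grad))
```

For piecewise functions such as ReLU the same check works, but it can fail spuriously if an input lands very close to the kink at zero.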

In this example we define a custom autograd function that implements the ReLU nonlinearity and use it in our two-layer network.

In [30]:
import torch

class MyReLU(torch.autograd.Function):
    """
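    Custom autograd Function that implements ReLU by defining
    its own forward and backward static methods.
    """
    @staticmethod
    def forward(ctx,x):
        """
        In the forward pass we receive a context object and a tensor containing the input;
        we must return a tensor containing the output,
        and we can use the context object to cache tensors for use in the backward pass.
        """
        ctx.save_for_backward(x)
        x = x.clamp(min=0) # apply ReLU
        return x
    
    @staticmethod
    def backward(ctx,grad_output):
        """
        In the backward pass we receive the context object and a tensor
        containing the gradient of the loss with respect to the output of the forward pass.
        We retrieve the cached data from the context object
        and must compute and return the gradient of the loss with respect to the input of the forward pass.
        """
        x, = ctx.saved_tensors
        grad_x = grad_output.clone()
        grad_x[x < 0] = 0
        return grad_x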

# Hyperparameters
dtype = torch.float
device = torch.device('cpu')
# N: batch size; Demonstration_in: input dimension;
# H: hidden dimension; Demonstration_out: output dimension
N, Demonstration_in, H, Demonstration_out = 64, 1000, 100, 10
epoch = 500

# Inputs and targets: we do not need gradients with respect to the data,
# so requires_grad is left at its default (False)
inputs = torch.randn(N,Demonstration_in,device=device,dtype=dtype)
references = torch.randn(N,Demonstration_out,device=device,dtype=dtype)

# Weights: requires_grad=True tells autograd to track gradients for them
w1 = torch.randn(Demonstration_in,H,device=device,dtype=dtype,requires_grad=True)
w2 = torch.randn(H,Demonstration_out,device=device,dtype=dtype,requires_grad=True)

# Learning rate
learning_rate = 1e-6

for i in range(epoch):
    # Forward pass, using the custom ReLU through Function.apply
    predictions = MyReLU.apply(inputs.mm(w1)).mm(w2)

    # Compute the loss
    loss = (predictions - references).pow(2).sum()
    print('\tEpoch - {epoch},  Loss - {loss}'.format(epoch=i,loss=loss.item()))

    # Use autograd to run the backward pass and compute the gradients
    loss.backward()

    # Update the weights from the gradients without building a graph for the update
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients of w1 and w2
        w1.grad.zero_()
        w2.grad.zero_()


	Epoch - 0,  Loss - 30183940.0
	Epoch - 1,  Loss - 23392636.0
	Epoch - 2,  Loss - 19433172.0
	Epoch - 3,  Loss - 16006332.0
	Epoch - 4,  Loss - 12473204.0
	Epoch - 5,  Loss - 9204530.0
	Epoch - 6,  Loss - 6466317.0
	Epoch - 7,  Loss - 4456253.0
	Epoch - 8,  Loss - 3066147.0
	Epoch - 9,  Loss - 2154341.25
	Epoch - 10,  Loss - 1557703.0
	Epoch - 11,  Loss - 1165617.0
	Epoch - 12,  Loss - 900680.8125
	Epoch - 13,  Loss - 716183.75
	Epoch - 14,  Loss - 582835.1875
	Epoch - 15,  Loss - 483109.09375
	Epoch - 16,  Loss - 406184.6875
	Epoch - 17,  Loss - 345326.875
	Epoch - 18,  Loss - 296161.59375
	Epoch - 19,  Loss - 255765.53125
	Epoch - 20,  Loss - 222153.6875
	Epoch - 21,  Loss - 193886.796875
	Epoch - 22,  Loss - 169903.984375
	Epoch - 23,  Loss - 149418.890625
	Epoch - 24,  Loss - 131835.015625
	Epoch - 25,  Loss - 116654.4453125
	Epoch - 26,  Loss - 103493.6953125
	Epoch - 27,  Loss - 92035.5625
	Epoch - 28,  Loss - 82029.1015625
	Epoch - 29,  Loss - 73256.28125
	Epoch - 30,  Loss - 65

	Epoch - 326,  Loss - 0.00031083280919119716
	Epoch - 327,  Loss - 0.0003002349694725126
	Epoch - 328,  Loss - 0.0002902340784203261
	Epoch - 329,  Loss - 0.00028095737798139453
	Epoch - 330,  Loss - 0.0002708912652451545
	Epoch - 331,  Loss - 0.0002626426285132766
	Epoch - 332,  Loss - 0.0002545126772020012
	Epoch - 333,  Loss - 0.00024656805908307433
	Epoch - 334,  Loss - 0.00023876488558016717
	Epoch - 335,  Loss - 0.0002315578458365053
	Epoch - 336,  Loss - 0.00022439402528107166
	Epoch - 337,  Loss - 0.00021769040904473513
	Epoch - 338,  Loss - 0.0002106442698277533
	Epoch - 339,  Loss - 0.00020363775547593832
	Epoch - 340,  Loss - 0.00019772339146584272
	Epoch - 341,  Loss - 0.00019182436517439783
	Epoch - 342,  Loss - 0.00018627347890287638
	Epoch - 343,  Loss - 0.00018060249567497522
	Epoch - 344,  Loss - 0.00017578696133568883
	Epoch - 345,  Loss - 0.00017076786025427282
	Epoch - 346,  Loss - 0.00016563999815844
	Epoch - 347,  Loss - 0.00016096371109597385
	Epoch - 348,  Loss 

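### Static graphs
PyTorch autograd looks a lot like TensorFlow: in both frameworks we define a computational graph and use automatic differentiation to compute gradients. The biggest difference is that TensorFlow's computational graphs (in the TensorFlow 1.x graph mode this comparison refers to) are static, while PyTorch uses dynamic computational graphs.  

Static graphs are attractive because the graph can be optimized up front: for example, a framework might fuse some graph operations for efficiency, or come up with a strategy for distributing the graph across many GPUs or machines. If the same graph is reused over and over, the potentially expensive up-front optimization is amortized across the repeated runs.  
One aspect in which static and dynamic graphs differ is control flow. For some models we may want to perform a different computation for each data point. For example, a recurrent network might be unrolled for a different number of time steps for each data point, and this unrolling can be implemented as a loop.  
With a static graph, the loop construct has to be part of the graph; for this reason TensorFlow provides operators such as tf.scan for embedding loops into the graph. With dynamic graphs the situation is simpler: because the graph is built on the fly for each example, we can use ordinary imperative control flow to perform a different computation for each input.  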
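To make the point about imperative control flow concrete, here is a minimal sketch in which the number of loop iterations depends on the data, and autograd still differentiates through exactly the operations that ran. The starting value and the threshold are arbitrary assumptions chosen so the arithmetic is easy to follow.

```python
import torch

x = torch.tensor(1.5, requires_grad=True)

# Keep multiplying by x until the value reaches the threshold.
# The number of iterations is decided at run time by an ordinary while loop.
y = x
steps = 0
while y.item() < 10:
    y = y * x
    steps += 1

# After `steps` multiplications, y equals x ** (steps + 1).
y.backward()

# Analytic gradient of x ** n is n * x ** (n - 1).
n = steps + 1
expected_grad = n * 1.5 ** (n - 1)

print(steps, y.item(), x.grad.item(), expected_grad)
```

A different starting value would take a different number of iterations and produce a different graph, without any change to the code.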

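## The nn module
### nn
Computational graphs and autograd are a very powerful paradigm for defining complex operators and taking derivatives automatically; for large neural networks, however, raw autograd is too low-level. When building neural networks we usually think of arranging the computation into layers, some of which have learnable parameters that are optimized during training.  
In TensorFlow, packages such as Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over the raw computational graph that make building networks convenient.  

In PyTorch, the nn package serves the same purpose. It defines a set of modules that are roughly equivalent to neural network layers. A module receives input tensors and computes output tensors, and it may also hold internal state such as tensors containing learnable parameters. The nn package also defines a set of loss functions that are commonly used when training neural networks.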
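To see the internal state that modules hold, the sketch below builds a small `nn.Sequential` and lists its learnable parameters. The layer sizes (1000, 100, 10) are taken from the dimensions used throughout this section; the variable names are illustrative.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1000, 100),
    nn.ReLU(),
    nn.Linear(100, 10),
)

# named_parameters() yields (name, tensor) pairs for every learnable parameter.
# In a Sequential, names are prefixed with the position of the submodule.
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)

# nn.ReLU has no parameters, so only the two Linear layers contribute.
total = sum(p.numel() for p in model.parameters())
print('total parameters:', total)
```

Note that `nn.Linear` stores its weight with shape `(out_features, in_features)`, the transpose of the `w1` and `w2` matrices used in the hand-written examples.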

In [48]:
import torch
import torch.nn as nn
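# Hyperparameters
dtype = torch.float
device = torch.device('cpu')
# N: batch size; Demonstration_in: input dimension;
# H: hidden dimension; Demonstration_out: output dimension
N, Demonstration_in, H, Demonstration_out = 64, 1000, 100, 10
epoch = 500

# Create the input and target tensors
inputs = torch.randn(N, Demonstration_in, device=device, dtype=dtype)
references = torch.randn(N, Demonstration_out, device=device, dtype=dtype)


# Use the nn package to define the model as a sequence of layers.
# nn.Sequential is a module that contains other modules and applies them in order to produce its output.
# Each Linear module computes its output from the input with a linear function and holds internal tensors for its weight and bias.
# After constructing the model, we move it to the desired device with .to().
model = nn.Sequential(
    nn.Linear(Demonstration_in,H),
    nn.ReLU(),
    nn.Linear(H,Demonstration_out)
).to(device)

# Use a loss function from the nn package; here mean squared error (MSE).
# reduction='sum' computes the sum of the squared errors rather than their mean,
# which keeps this consistent with the hand-written losses in the earlier examples.
# In practice, reduction='mean' (true mean squared error) is more common.
loss_fn = nn.MSELoss(reduction='sum')

# Learning rate
learning_rate = 1e-6
for i in range(epoch):
    # Forward pass
    predictions = model(inputs)

    # Compute the loss
    loss = loss_fn(predictions,references)
    print('\tEpoch - {epoch},  Loss - {loss}'.format(epoch=i,loss=loss.item()))

    # Zero the model gradients before the backward pass
    model.zero_grad()

    # Backward pass: compute the gradients
    loss.backward()

    # Update the weights
    with torch.no_grad():
        for parameter in model.parameters(): # iterate over all learnable parameters
            parameter -= learning_rate * parameter.grad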
    

	Epoch - 0,  Loss - 713.0663452148438
	Epoch - 1,  Loss - 712.5707397460938
	Epoch - 2,  Loss - 712.0755615234375
	Epoch - 3,  Loss - 711.5809326171875
	Epoch - 4,  Loss - 711.0867919921875
	Epoch - 5,  Loss - 710.593017578125
	Epoch - 6,  Loss - 710.100341796875
	Epoch - 7,  Loss - 709.60888671875
	Epoch - 8,  Loss - 709.1178588867188
	Epoch - 9,  Loss - 708.6273193359375
	Epoch - 10,  Loss - 708.137451171875
	Epoch - 11,  Loss - 707.64794921875
	Epoch - 12,  Loss - 707.1589965820312
	Epoch - 13,  Loss - 706.6707153320312
	Epoch - 14,  Loss - 706.1834106445312
	Epoch - 15,  Loss - 705.6967163085938
	Epoch - 16,  Loss - 705.2103881835938
	Epoch - 17,  Loss - 704.724609375
	Epoch - 18,  Loss - 704.2394409179688
	Epoch - 19,  Loss - 703.7547607421875
	Epoch - 20,  Loss - 703.2705078125
	Epoch - 21,  Loss - 702.7869262695312
	Epoch - 22,  Loss - 702.3038330078125
	Epoch - 23,  Loss - 701.8214721679688
	Epoch - 24,  Loss - 701.3397216796875
	Epoch - 25,  Loss - 700.8584594726562
	Epoch - 2

	Epoch - 299,  Loss - 588.7393798828125
	Epoch - 300,  Loss - 588.3899536132812
	Epoch - 301,  Loss - 588.0407104492188
	Epoch - 302,  Loss - 587.6917114257812
	Epoch - 303,  Loss - 587.34326171875
	Epoch - 304,  Loss - 586.994873046875
	Epoch - 305,  Loss - 586.6466064453125
	Epoch - 306,  Loss - 586.2991333007812
	Epoch - 307,  Loss - 585.9521484375
	Epoch - 308,  Loss - 585.6051635742188
	Epoch - 309,  Loss - 585.2584228515625
	Epoch - 310,  Loss - 584.9119873046875
	Epoch - 311,  Loss - 584.56591796875
	Epoch - 312,  Loss - 584.2201538085938
	Epoch - 313,  Loss - 583.87451171875
	Epoch - 314,  Loss - 583.529296875
	Epoch - 315,  Loss - 583.1842651367188
	Epoch - 316,  Loss - 582.8394775390625
	Epoch - 317,  Loss - 582.4950561523438
	Epoch - 318,  Loss - 582.1510620117188
	Epoch - 319,  Loss - 581.80712890625
	Epoch - 320,  Loss - 581.4635009765625
	Epoch - 321,  Loss - 581.1202392578125
	Epoch - 322,  Loss - 580.777099609375
	Epoch - 323,  Loss - 580.4341430664062
	Epoch - 324,  Lo

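### optim
So far we have updated the model weights by manually mutating the tensors that hold the learnable parameters. This is not a big burden for simple optimization algorithms such as stochastic gradient descent (SGD), but in practice we often train neural networks with more sophisticated optimizers such as AdaGrad, RMSProp, and Adam. The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used ones; the example below uses it to train the model with Adam.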
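To connect the manual updates with the optim package, the sketch below applies one step of plain SGD in two ways: by hand, as in the earlier examples, and through `torch.optim.SGD`. The toy loss, the learning rate, and the names `w_manual` and `w_optim` are assumptions for illustration only.

```python
import torch

torch.manual_seed(0)
lr = 0.1

# Two copies of the same starting parameter.
w_manual = torch.randn(3, requires_grad=True)
w_optim = w_manual.detach().clone().requires_grad_(True)

def toy_loss(w):
    """A simple quadratic loss whose gradient is 2 * w."""
    return (w ** 2).sum()

# 1) Manual update, as in the autograd examples.
toy_loss(w_manual).backward()
with torch.no_grad():
    w_manual -= lr * w_manual.grad
w_manual.grad.zero_()

# 2) The same update through an optimizer object.
optimizer = torch.optim.SGD([w_optim], lr=lr)
optimizer.zero_grad()          # clear any stale gradients
toy_loss(w_optim).backward()   # populate w_optim.grad
optimizer.step()               # apply w <- w - lr * grad

print(w_manual)
print(w_optim)
```

With plain SGD the two results coincide. Optimizers such as Adam keep extra per-parameter state, which is where an optimizer object saves the most hand-written code.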

In [56]:
import torch
import torch.nn as nn

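# Hyperparameters
dtype = torch.float
device = torch.device('cpu')
# N: batch size; Demonstration_in: input dimension;
# H: hidden dimension; Demonstration_out: output dimension
N, Demonstration_in, H, Demonstration_out = 64, 1000, 100, 10
epoch = 500

# Create the input and target tensors
inputs = torch.randn(N, Demonstration_in, device=device, dtype=dtype)
references = torch.randn(N, Demonstration_out, device=device, dtype=dtype)


# Use the nn package to define the model as a sequence of layers.
# nn.Sequential is a module that contains other modules and applies them in order to produce its output.
# Each Linear module computes its output from the input with a linear function and holds internal tensors for its weight and bias.
# After constructing the model, we move it to the desired device with .to().
model = nn.Sequential(
    nn.Linear(Demonstration_in,H),
    nn.ReLU(),
    nn.Linear(H,Demonstration_out)
).to(device)

# Use a loss function from the nn package; here mean squared error (MSE).
# reduction='sum' computes the sum of the squared errors rather than their mean,
# which keeps this consistent with the hand-written losses in the earlier examples.
# In practice, reduction='mean' (true mean squared error) is more common.
loss_fn = nn.MSELoss(reduction='sum')

# Learning rate
learning_rate = 1e-6

# Adam optimizer over all model parameters
optimizer = torch.optim.Adam(model.parameters(),lr=learning_rate)

for i in range(epoch):
    # Forward pass
    predictions = model(inputs)

    # Compute the loss
    loss1 = loss_fn(predictions,references)
    print('\tEpoch - {epoch},  Loss - {loss}'.format(epoch=i,loss=loss1.item()))

    # Zero the model gradients before the backward pass
    model.zero_grad()

    # Backward pass: compute the gradients
    loss1.backward()

    # Update the weights
    optimizer.step()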

	Epoch - 0,  Loss - 690.0938110351562
	Epoch - 1,  Loss - 689.9185180664062
	Epoch - 2,  Loss - 689.7432250976562
	Epoch - 3,  Loss - 689.5679321289062
	Epoch - 4,  Loss - 689.3927001953125
	Epoch - 5,  Loss - 689.2174682617188
	Epoch - 6,  Loss - 689.0422973632812
	Epoch - 7,  Loss - 688.8671264648438
	Epoch - 8,  Loss - 688.6919555664062
	Epoch - 9,  Loss - 688.516845703125
	Epoch - 10,  Loss - 688.341796875
	Epoch - 11,  Loss - 688.1669311523438
	Epoch - 12,  Loss - 687.9920654296875
	Epoch - 13,  Loss - 687.8173217773438
	Epoch - 14,  Loss - 687.6427001953125
	Epoch - 15,  Loss - 687.4681396484375
	Epoch - 16,  Loss - 687.2935791015625
	Epoch - 17,  Loss - 687.1190185546875
	Epoch - 18,  Loss - 686.944580078125
	Epoch - 19,  Loss - 686.7700805664062
	Epoch - 20,  Loss - 686.595703125
	Epoch - 21,  Loss - 686.4213256835938
	Epoch - 22,  Loss - 686.2471313476562
	Epoch - 23,  Loss - 686.0728759765625
	Epoch - 24,  Loss - 685.8987426757812
	Epoch - 25,  Loss - 685.724609375
	Epoch - 2

	Epoch - 211,  Loss - 654.2451171875
	Epoch - 212,  Loss - 654.0809326171875
	Epoch - 213,  Loss - 653.916748046875
	Epoch - 214,  Loss - 653.7526245117188
	Epoch - 215,  Loss - 653.588623046875
	Epoch - 216,  Loss - 653.4246215820312
	Epoch - 217,  Loss - 653.2605590820312
	Epoch - 218,  Loss - 653.0964965820312
	Epoch - 219,  Loss - 652.9326171875
	Epoch - 220,  Loss - 652.768798828125
	Epoch - 221,  Loss - 652.6050415039062
	Epoch - 222,  Loss - 652.441162109375
	Epoch - 223,  Loss - 652.2774047851562
	Epoch - 224,  Loss - 652.11376953125
	Epoch - 225,  Loss - 651.9500732421875
	Epoch - 226,  Loss - 651.7864990234375
	Epoch - 227,  Loss - 651.6231079101562
	Epoch - 228,  Loss - 651.459716796875
	Epoch - 229,  Loss - 651.2962646484375
	Epoch - 230,  Loss - 651.1329345703125
	Epoch - 231,  Loss - 650.9696655273438
	Epoch - 232,  Loss - 650.806640625
	Epoch - 233,  Loss - 650.6435546875
	Epoch - 234,  Loss - 650.4805908203125
	Epoch - 235,  Loss - 650.317626953125
	Epoch - 236,  Loss -

	Epoch - 479,  Loss - 612.0188598632812
	Epoch - 480,  Loss - 611.8672485351562
	Epoch - 481,  Loss - 611.7156372070312
	Epoch - 482,  Loss - 611.5640869140625
	Epoch - 483,  Loss - 611.41259765625
	Epoch - 484,  Loss - 611.2611083984375
	Epoch - 485,  Loss - 611.109619140625
	Epoch - 486,  Loss - 610.958251953125
	Epoch - 487,  Loss - 610.8069458007812
	Epoch - 488,  Loss - 610.65576171875
	Epoch - 489,  Loss - 610.5045166015625
	Epoch - 490,  Loss - 610.3533325195312
	Epoch - 491,  Loss - 610.2022705078125
	Epoch - 492,  Loss - 610.0512084960938
	Epoch - 493,  Loss - 609.900146484375
	Epoch - 494,  Loss - 609.7491455078125
	Epoch - 495,  Loss - 609.5982055664062
	Epoch - 496,  Loss - 609.4471435546875
	Epoch - 497,  Loss - 609.2962036132812
	Epoch - 498,  Loss - 609.1454467773438
	Epoch - 499,  Loss - 608.9946899414062


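### Custom nn modules
Sometimes we need to specify models that are more complex than a sequence of existing modules. In these cases we can define our own module by subclassing nn.Module and defining a forward() function that receives input tensors and produces output tensors using other modules or other autograd operations.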
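One detail worth knowing when subclassing `nn.Module` is which attributes become learnable parameters. The sketch below uses a hypothetical `TinyNet` class (not from the original notebook): submodules and `nn.Parameter` attributes are registered automatically, while a plain tensor attribute is not.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Illustrative module showing what gets registered as a parameter."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)                 # submodule: its parameters are registered
        self.scale = nn.Parameter(torch.ones(1))  # explicit learnable parameter
        self.not_tracked = torch.zeros(3)         # plain tensor: NOT a parameter

    def forward(self, x):
        return self.fc(x) * self.scale

net = TinyNet()
names = sorted(name for name, _ in net.named_parameters())
out = net(torch.randn(5, 4))

print(names)
print(out.shape)
```

Anything returned by `net.parameters()` is what an optimizer will update, so a tensor that should be trained must be wrapped in `nn.Parameter` or live inside a submodule.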

In [67]:
import torch
import torch.nn as nn

class TwoLayerNetwork(nn.Module):
    def __init__(self,Demonstration_in,H,Demonstration_out):
        super(TwoLayerNetwork,self).__init__()
        
        self.linear1 = nn.Linear(Demonstration_in,H)
        self.linear2 = nn.Linear(H,Demonstration_out)
        self.relu = nn.ReLU()
        
    def forward(self,inputs):
        output = self.linear1(inputs)
        output = self.relu(output)
        output = self.linear2(output)
        
        return output


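# Hyperparameters
dtype = torch.float
device = torch.device('cpu')
# N: batch size; Demonstration_in: input dimension;
# H: hidden dimension; Demonstration_out: output dimension
N, Demonstration_in, H, Demonstration_out = 64, 1000, 100, 10
epoch = 500

# Build the model once the dimensions above are defined
model = TwoLayerNetwork(Demonstration_in,H,Demonstration_out)

# Create the input and target tensors
inputs = torch.randn(N,Demonstration_in)
references = torch.randn(N,Demonstration_out)

# Use a loss function from the nn package; here mean squared error (MSE).
# reduction='sum' computes the sum of the squared errors rather than their mean,
# which keeps this consistent with the hand-written losses in the earlier examples.
# In practice, reduction='mean' (true mean squared error) is more common.
loss_fn = nn.MSELoss(reduction='sum')

# Learning rate
learning_rate = 1e-4

# SGD optimizer
optimizer = torch.optim.SGD(model.parameters(),lr=learning_rate)

for i in range(epoch):
    # Forward pass
    predictions = model(inputs)

    # Compute the loss
    loss = loss_fn(predictions,references)
    print('\tEpoch - {epoch},  Loss - {loss}'.format(epoch=i,loss=loss.item()))

    # Zero the model gradients before the backward pass
    model.zero_grad()

    # Backward pass: compute the gradients
    loss.backward()

    # Update the weights
    optimizer.step()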

	Epoch - 0,  Loss - 717.4566040039062
	Epoch - 1,  Loss - 663.7501220703125
	Epoch - 2,  Loss - 617.1986694335938
	Epoch - 3,  Loss - 576.443115234375
	Epoch - 4,  Loss - 539.9593505859375
	Epoch - 5,  Loss - 506.78289794921875
	Epoch - 6,  Loss - 476.5166015625
	Epoch - 7,  Loss - 448.79095458984375
	Epoch - 8,  Loss - 423.1124572753906
	Epoch - 9,  Loss - 399.320556640625
	Epoch - 10,  Loss - 377.1463928222656
	Epoch - 11,  Loss - 356.3822326660156
	Epoch - 12,  Loss - 336.8816833496094
	Epoch - 13,  Loss - 318.3978576660156
	Epoch - 14,  Loss - 300.8748779296875
	Epoch - 15,  Loss - 284.1162414550781
	Epoch - 16,  Loss - 268.07269287109375
	Epoch - 17,  Loss - 252.83917236328125
	Epoch - 18,  Loss - 238.3040771484375
	Epoch - 19,  Loss - 224.459716796875
	Epoch - 20,  Loss - 211.33319091796875
	Epoch - 21,  Loss - 198.8175506591797
	Epoch - 22,  Loss - 186.9100799560547
	Epoch - 23,  Loss - 175.61279296875
	Epoch - 24,  Loss - 164.87991333007812
	Epoch - 25,  Loss - 154.678237915039

	Epoch - 306,  Loss - 0.0017488718731328845
	Epoch - 307,  Loss - 0.0017044387059286237
	Epoch - 308,  Loss - 0.0016611891333013773
	Epoch - 309,  Loss - 0.00161907693836838
	Epoch - 310,  Loss - 0.001578051713295281
	Epoch - 311,  Loss - 0.0015381431439891458
	Epoch - 312,  Loss - 0.0014992646174505353
	Epoch - 313,  Loss - 0.0014614106621593237
	Epoch - 314,  Loss - 0.001424551010131836
	Epoch - 315,  Loss - 0.001388642005622387
	Epoch - 316,  Loss - 0.0013536781771108508
	Epoch - 317,  Loss - 0.001319642411544919
	Epoch - 318,  Loss - 0.0012864894233644009
	Epoch - 319,  Loss - 0.0012541607720777392
	Epoch - 320,  Loss - 0.0012227148981764913
	Epoch - 321,  Loss - 0.0011920877732336521
	Epoch - 322,  Loss - 0.0011622603051364422
	Epoch - 323,  Loss - 0.0011331753339618444
	Epoch - 324,  Loss - 0.001104863709770143
	Epoch - 325,  Loss - 0.0010772838722914457
	Epoch - 326,  Loss - 0.0010503873927518725
	Epoch - 327,  Loss - 0.0010242071002721786
	Epoch - 328,  Loss - 0.000998698640614

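### Control flow and weight sharing
As an example of dynamic graphs and weight sharing, we implement a very strange model: a fully connected ReLU network that, on each forward pass, uses a random number of hidden layers between 1 and 4, reusing the same weights multiple times to compute the innermost hidden layers.  

For this model we can use ordinary Python control flow to implement the loop, and we can share weights among the innermost layers simply by reusing the same module multiple times when defining the forward pass.  
We implement the model as a subclass of Module.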
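What weight sharing means for gradients can be checked on a one-weight example. In this sketch (the values and names are illustrative assumptions), the same `nn.Linear` module is applied twice, so its single weight appears twice in the graph and its gradient is the sum of both contributions.

```python
import torch
import torch.nn as nn

# A single 1x1 linear layer without bias, i.e. y = w * x.
shared = nn.Linear(1, 1, bias=False)
with torch.no_grad():
    shared.weight.fill_(2.0)   # set w = 2 for easy arithmetic

x = torch.tensor([[3.0]])

# Reuse the same module twice: y = w * (w * x) = w^2 * x = 12.
y = shared(shared(x))
y.sum().backward()

# d(w^2 * x)/dw = 2 * w * x = 12: both uses of w contribute to the gradient.
grad_w = shared.weight.grad.item()

# The module is counted once, no matter how often it is applied.
n_params = sum(p.numel() for p in shared.parameters())

print(y.item(), grad_w, n_params)
```

The same mechanism applies to the reused middle layer in the model of this section: every pass through it adds to the gradient of one shared set of weights.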

In [66]:
import random
import torch
import torch.nn as nn

class DynameicNetwork(nn.Module):
    def __init__(self,Demonstration_in,H,Demonstration_out):
        super(DynameicNetwork,self).__init__()
        
        self.linear1 = nn.Linear(Demonstration_in,H)
        self.relu = nn.ReLU()
        self.middle_linear = nn.Linear(H, H)
        self.linear2 = nn.Linear(H, Demonstration_out)
        
    def forward(self,x):
        """
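        For the forward pass we randomly choose 0, 1, 2, or 3
        and reuse the middle_linear module that many times to compute the hidden layers.
        Since each forward pass builds a dynamic computational graph,
        we can use ordinary Python control flow, such as loops or conditionals, when defining the forward pass.
        Here we also see that it is perfectly safe to reuse the same module many times when defining a computational graph.
        """
        output = self.linear1(x)
        output = self.relu(output)
        for _ in range(random.randint(0,3)): # draw k in [0, 3] and apply the middle layer followed by ReLU k times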
            output = self.middle_linear(output)
            output = self.relu(output)
        output = self.linear2(output)
        
        return output

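# Hyperparameters
dtype = torch.float
device = torch.device('cpu')
# N: batch size; Demonstration_in: input dimension;
# H: hidden dimension; Demonstration_out: output dimension
N, Demonstration_in, H, Demonstration_out = 64, 1000, 100, 10
epoch = 500

# Build the model once the dimensions above are defined
model = DynameicNetwork(Demonstration_in,H,Demonstration_out)

# Create the input and target tensors
inputs = torch.randn(N,Demonstration_in)
references = torch.randn(N,Demonstration_out)

# Use a loss function from the nn package; here mean squared error (MSE).
# reduction='sum' computes the sum of the squared errors rather than their mean,
# which keeps this consistent with the hand-written losses in the earlier examples.
# In practice, reduction='mean' (true mean squared error) is more common.
loss_fn = nn.MSELoss(reduction='sum')

# Learning rate
learning_rate = 1e-4

# SGD optimizer
# momentum speeds up convergence
optimizer = torch.optim.SGD(model.parameters(),lr=learning_rate,momentum=0.9)

for i in range(epoch):
    # Forward pass
    predictions = model(inputs)

    # Compute the loss
    loss = loss_fn(predictions,references)
    print('\tEpoch - {epoch},  Loss - {loss}'.format(epoch=i,loss=loss.item()))

    # Zero the model gradients before the backward pass
    model.zero_grad()

    # Backward pass: compute the gradients
    loss.backward()

    # Update the weights
    optimizer.step()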

	Epoch - 0,  Loss - 717.1761474609375
	Epoch - 1,  Loss - 688.29296875
	Epoch - 2,  Loss - 612.2767944335938
	Epoch - 3,  Loss - 534.7396850585938
	Epoch - 4,  Loss - 663.9226684570312
	Epoch - 5,  Loss - 651.6809692382812
	Epoch - 6,  Loss - 679.5014038085938
	Epoch - 7,  Loss - 677.7264404296875
	Epoch - 8,  Loss - 676.4962768554688
	Epoch - 9,  Loss - 595.934326171875
	Epoch - 10,  Loss - 669.9837646484375
	Epoch - 11,  Loss - 665.420654296875
	Epoch - 12,  Loss - 248.46400451660156
	Epoch - 13,  Loss - 533.2970581054688
	Epoch - 14,  Loss - 643.0326538085938
	Epoch - 15,  Loss - 661.3712768554688
	Epoch - 16,  Loss - 655.7705688476562
	Epoch - 17,  Loss - 648.2195434570312
	Epoch - 18,  Loss - 638.5397338867188
	Epoch - 19,  Loss - 573.7095336914062
	Epoch - 20,  Loss - 611.03515625
	Epoch - 21,  Loss - 591.40966796875
	Epoch - 22,  Loss - 129.89315795898438
	Epoch - 23,  Loss - 542.3623046875
	Epoch - 24,  Loss - 511.2645263671875
	Epoch - 25,  Loss - 474.20501708984375
	Epoch - 2

	Epoch - 248,  Loss - 0.6334233283996582
	Epoch - 249,  Loss - 1.9966981410980225
	Epoch - 250,  Loss - 3.320133686065674
	Epoch - 251,  Loss - 0.6764392256736755
	Epoch - 252,  Loss - 2.1573052406311035
	Epoch - 253,  Loss - 0.6545682549476624
	Epoch - 254,  Loss - 0.518869161605835
	Epoch - 255,  Loss - 2.5767667293548584
	Epoch - 256,  Loss - 10.160286903381348
	Epoch - 257,  Loss - 1.116133451461792
	Epoch - 258,  Loss - 2.917104959487915
	Epoch - 259,  Loss - 8.28857135772705
	Epoch - 260,  Loss - 2.554196834564209
	Epoch - 261,  Loss - 11.873201370239258
	Epoch - 262,  Loss - 0.42885738611221313
	Epoch - 263,  Loss - 2.1813244819641113
	Epoch - 264,  Loss - 4.86180305480957
	Epoch - 265,  Loss - 5.1490983963012695
	Epoch - 266,  Loss - 3.0699892044067383
	Epoch - 267,  Loss - 3.853442668914795
	Epoch - 268,  Loss - 4.1419148445129395
	Epoch - 269,  Loss - 3.3701374530792236
	Epoch - 270,  Loss - 2.6948697566986084
	Epoch - 271,  Loss - 1.5511295795440674
	Epoch - 272,  Loss - 2.4