# Recurrent Neural Network Modules in PyTorch
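We have already covered the basics of recurrent neural networks and their structure. Now we show how to build recurrent neural networks in PyTorch. PyTorch's dynamic computation graph makes recurrent networks particularly convenient to work with.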

## 一般的 RNN

![](https://ws1.sinaimg.cn/large/006tKfTcly1fmt9xz889xj30kb07nglo.jpg)

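For the simplest RNN there are two modules we can use: `torch.nn.RNNCell()` and `torch.nn.RNN()`. The difference is that `RNNCell()` processes only a single time step of a sequence, so we loop over the sequence ourselves and pass the hidden state in explicitly at every step (recent PyTorch versions fall back to zeros if it is omitted). `RNN()`, by contrast, processes an entire sequence in one call. By default it starts from an all-zero hidden state, but we can also declare our own initial hidden state and pass it in.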

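The parameters of `RNN()` are:

- input_size: the number of features of the input $x_t$
- hidden_size: the number of features of the hidden state, which is also the feature size of the output
- num_layers: the number of stacked recurrent layers, default 1
- nonlinearity: the nonlinear activation function, either 'tanh' (the default) or 'relu'
- bias: whether to use bias terms, default True
- batch_first: the layout of the input and output tensors. The default is False, which means (seq, batch, feature), i.e. sequence length first and batch second; with True the layout is (batch, seq, feature)
- dropout: if nonzero, dropout with this probability is applied to the outputs of every layer except the last one, default 0
- bidirectional: whether to use a bidirectional RNN, default False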
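To see how batch_first, num_layers, dropout and bidirectional change the tensors an `nn.RNN` produces, here is a small sketch. The sizes (sequence length 7, batch 4, 10 input features, hidden size 16) are arbitrary illustrative choices, not values from this tutorial.

```python
import torch
from torch import nn

# Illustrative sizes: sequence length 7, batch 4, 10 input features
x = torch.randn(7, 4, 10)

# Default layout: (seq, batch, feature)
rnn = nn.RNN(input_size=10, hidden_size=16)
out, h_n = rnn(x)
print(out.shape, h_n.shape)  # torch.Size([7, 4, 16]) torch.Size([1, 4, 16])

# batch_first=True: input and output become (batch, seq, feature),
# but the hidden state keeps its (num_layers * num_directions, batch, hidden_size) layout
rnn_bf = nn.RNN(input_size=10, hidden_size=16, batch_first=True)
out_bf, h_bf = rnn_bf(x.transpose(0, 1))
print(out_bf.shape, h_bf.shape)  # torch.Size([4, 7, 16]) torch.Size([1, 4, 16])

# Two stacked layers, dropout between them, and both directions:
# the output concatenates the two directions, the hidden state stacks them
rnn_bi = nn.RNN(10, 16, num_layers=2, dropout=0.5, bidirectional=True)
out_bi, h_bi = rnn_bi(x)
print(out_bi.shape, h_bi.shape)  # torch.Size([7, 4, 32]) torch.Size([4, 4, 16])
```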

`RNNCell()` has far fewer parameters: only input_size, hidden_size, bias and nonlinearity.
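What a single `RNNCell` step computes can be checked by hand. The sketch below assumes the update rule given in the PyTorch documentation for the default tanh cell, $h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh})$, and compares it with the module's output; the small sizes are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)
cell = nn.RNNCell(input_size=3, hidden_size=4)  # nonlinearity defaults to 'tanh'

x_t = torch.randn(2, 3)     # one time step: (batch, input_size)
h_prev = torch.randn(2, 4)  # previous hidden state: (batch, hidden_size)

# weight_ih is (hidden_size, input_size) and weight_hh is (hidden_size, hidden_size),
# so with row-vector inputs we multiply by their transposes
manual = torch.tanh(
    x_t @ cell.weight_ih.T + cell.bias_ih
    + h_prev @ cell.weight_hh.T + cell.bias_hh
)
h_t = cell(x_t, h_prev)
print(torch.allclose(manual, h_t, atol=1e-6))  # True
```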

```python
import torch
from torch import nn
```

```python
# Define a single-step RNN
rnn_single = nn.RNNCell(input_size=100, hidden_size=200)
```

```python
# Access one of its parameters
rnn_single.weight_hh
```

```
Parameter containing:
tensor(1.00000e-02 *
       [[ 1.9724, -0.2166, -1.8052,  ..., -5.9964,  6.1924,  3.1329],
        [ 0.0213,  5.0959, -4.3981,  ...,  3.0245,  0.5581,  5.8345],
        [-2.0580,  2.3759, -6.0162,  ..., -1.8694, -1.7910,  4.4702],
        ...,
        [ 0.5579, -4.0788, -5.8365,  ...,  1.9048,  1.4578,  2.0752],
        [ 1.6126,  1.7574, -0.6912,  ..., -2.8778,  0.8002,  4.0901],
        [ 2.4278, -3.8924, -7.0415,  ...,  1.6344, -5.3127, -2.2535]])
```

```python
# Build a sequence of length 6, batch size 5, 100 features
x = torch.randn(6, 5, 100)  # the RNN input layout: (seq, batch, feature)
```

```python
# Define the initial hidden state: (batch, hidden_size)
h_t = torch.zeros(5, 200)
```

```python
# Feed the sequence into the RNN cell
out = []
for i in range(6):  # loop 6 times to cover the whole sequence
    h_t = rnn_single(x[i], h_t)
    out.append(h_t)
```

```python
h_t
```

```
tensor([[-0.2785,  0.3361, -0.2707,  0.5384, -0.3453, -0.0621,  0.2997,
         -0.7268, -0.3606, -0.6820,  0.1715,  0.2967,  0.1803,  0.2656,
         -0.6619,  0.4360,  0.0999, -0.2199, -0.2005, -0.7297,  0.1703,
          0.4489, -0.5971,  0.4689, -0.0564, -0.5973, -0.6523,  0.1015,
         -0.5570,  0.6684, -0.1105,  0.5065, -0.3855,  0.5090,  0.0880,
         -0.5351, -0.3988, -0.3031,  0.1531, -0.4519,  0.0874, -0.0777,
         -0.1511, -0.5309,  0.4204, -0.0980, -0.0797,  0.7792, -0.5863,
         -0.2134, -0.1773,  0.1328, -0.5280,  0.7914, -0.0401,  0.2257,
          0.4226, -0.4460, -0.0362,  0.4347,  0.2317, -0.3605, -0.3718,
          0.6234, -0.0588,  0.2359,  0.9595,  0.2583, -0.0635,  0.8439,
          0.1412, -0.0096,  0.3995,  0.2461,  0.1129,  0.1811,  0.1364,
          0.0664, -0.2751,  0.1570,  0.0902, -0.2808, -0.1588,  0.4894,
          0.6023, -0.5421,  0.7022, -0.3036, -0.4107, -0.0656,  0.3289,
          0.4624,  0.1985,  0.5246,  0.2584,  0.2726, -0.4914,  
```

```python
len(out)
```

```
6
```

```python
out[0].shape  # shape of each output
```

```
torch.Size([5, 200])
```

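After running the RNN, the value of the hidden state has changed, because the network has absorbed information from the sequence. We also get 6 outputs, one per time step.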
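The loop above leaves `out` as a Python list of 6 tensors. To get a single (seq, batch, hidden_size) tensor, like the one `nn.RNN` returns, one option is `torch.stack`; this sketch repeats the setup so it runs on its own.

```python
import torch
from torch import nn

rnn_single = nn.RNNCell(input_size=100, hidden_size=200)
x = torch.randn(6, 5, 100)   # (seq, batch, feature), as above
h_t = torch.zeros(5, 200)

out = []
for i in range(x.size(0)):   # step through the sequence one time step at a time
    h_t = rnn_single(x[i], h_t)
    out.append(h_t)

# Stack the per-step outputs along a new leading time dimension
out_seq = torch.stack(out, dim=0)
print(out_seq.shape)                  # torch.Size([6, 5, 200])
print(torch.equal(out_seq[-1], h_t))  # True: the last output is the final hidden state
```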

Next, let's look at using `RNN` directly.

```python
rnn_seq = nn.RNN(100, 200)
```

```python
# Access one of its parameters
rnn_seq.weight_hh_l0
```

```
Parameter containing:
tensor(1.00000e-02 *
       [[ 6.4858, -1.3243, -4.7152,  ..., -2.7202,  0.4114,  1.0040],
        [-4.9094, -4.0587, -5.5163,  ..., -0.1241,  0.2478, -3.8715],
        [ 2.6127,  0.3696,  6.7411,  ...,  0.8146,  3.5174, -0.3044],
        ...,
        [ 7.0482, -0.2230, -6.8762,  ..., -5.8123, -3.6224,  3.9161],
        [-2.3369,  3.3696,  1.0000,  ..., -6.9897, -2.5819, -2.8631],
        [-2.0895, -1.8998,  4.1469,  ...,  2.1032, -2.3867,  5.0983]])
```

```python
out, h_t = rnn_seq(x)  # use the default all-zero hidden state
```

```python
h_t
```

```
tensor([[[-0.2864, -0.6870, -0.1868,  0.4762,  0.2230,  0.0990, -0.1803,
          -0.0705,  0.5446, -0.6403, -0.7496, -0.0373, -0.2892, -0.5572,
          -0.5748, -0.2722, -0.4566, -0.4044,  0.3902, -0.8183, -0.4640,
           0.3947, -0.2176,  0.7006,  0.3167, -0.1482, -0.8141,  0.1342,
           0.3783, -0.2587,  0.2346, -0.3896, -0.0987,  0.1096,  0.0189,
           0.1900,  0.1986,  0.3706, -0.3598, -0.3933, -0.1188, -0.3492,
           0.2103, -0.1661,  0.4169,  0.4494,  0.0311,  0.1188, -0.5825,
           0.7674, -0.3285, -0.5660, -0.3732, -0.0900, -0.2383, -0.0395,
          -0.0123, -0.0553,  0.1047,  0.3057,  0.4683, -0.7950,  0.4732,
           0.4115, -0.1588,  0.8503, -0.1951, -0.4973, -0.1665,  0.2522,
           0.0939,  0.2821,  0.2651,  0.4255,  0.4237, -0.4186, -0.1340,
           0.5045,  0.4684,  0.0182, -0.0710, -0.2301, -0.3311,  0.1191,
          -0.1738, -0.2243, -0.3827, -0.2523, -0.5368,  0.0339,  0.4295,
           0.0685, -0.1498, -0.0979, -0.0117,  0.50
```

```python
len(out)
```

```
6
```

Here h_t is the final hidden state of the network, and the network again produces 6 outputs.
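The two return values overlap: `out` holds the hidden state at every time step, while `h_t` holds only the last one, per layer and direction. A quick check, assuming the same single-layer, unidirectional setup as above:

```python
import torch
from torch import nn

rnn_seq = nn.RNN(100, 200)
x = torch.randn(6, 5, 100)
out, h_t = rnn_seq(x)

print(out.shape, h_t.shape)  # torch.Size([6, 5, 200]) torch.Size([1, 5, 200])
# For a single-layer, unidirectional RNN the last time step of `out`
# is the final hidden state
print(torch.allclose(out[-1], h_t[0]))  # True
```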

```python
# Define our own initial hidden state
h_0 = torch.randn(1, 5, 200)
```

The hidden state has three dimensions: (num_layers * num_directions, batch, hidden_size).
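The shape rules can be summarized in a few lines of plain Python. `rnn_shapes` below is a hypothetical helper written for illustration (it is not part of PyTorch); it returns the expected shapes of the output and of the final hidden state, which is also the shape a user-supplied h_0 must have. It assumes an LSTM without `proj_size`, whose h and c follow the same hidden-state rule.

```python
def rnn_shapes(seq_len, batch, hidden_size, num_layers=1,
               bidirectional=False, batch_first=False):
    """Hypothetical helper: expected (output, hidden state) shapes for
    nn.RNN / nn.GRU, and for each of h, c of nn.LSTM without proj_size."""
    num_directions = 2 if bidirectional else 1
    # The output concatenates both directions along the feature axis
    if batch_first:
        out = (batch, seq_len, num_directions * hidden_size)
    else:
        out = (seq_len, batch, num_directions * hidden_size)
    # batch_first does not affect the hidden-state layout
    hidden = (num_layers * num_directions, batch, hidden_size)
    return out, hidden


print(rnn_shapes(6, 5, 200))                                     # ((6, 5, 200), (1, 5, 200))
print(rnn_shapes(10, 3, 100, num_layers=2))                      # ((10, 3, 100), (2, 3, 100))
print(rnn_shapes(6, 5, 200, num_layers=2, bidirectional=True))   # ((6, 5, 400), (4, 5, 200))
```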

```python
out, h_t = rnn_seq(x, h_0)
```

```python
h_t
```

```
tensor([[[-0.2885, -0.7039, -0.1910,  0.4702,  0.2323,  0.0909, -0.2137,
          -0.0849,  0.5395, -0.6335, -0.7496, -0.0194, -0.2965, -0.5571,
          -0.5873, -0.2746, -0.4262, -0.3955,  0.3886, -0.8105, -0.4750,
           0.3896, -0.1857,  0.6998,  0.3398, -0.1367, -0.8179,  0.1236,
           0.4025, -0.2687,  0.2366, -0.3826, -0.1129,  0.0888,  0.0135,
           0.1977,  0.1818,  0.3387, -0.3661, -0.3672, -0.1350, -0.3537,
           0.2255, -0.1349,  0.4113,  0.4617,  0.0163,  0.1199, -0.5904,
           0.7724, -0.3196, -0.5542, -0.3657, -0.0850, -0.2565, -0.0152,
          -0.0048, -0.0533,  0.0984,  0.3099,  0.4729, -0.7894,  0.4734,
           0.4098, -0.1525,  0.8522, -0.1623, -0.4877, -0.1711,  0.2370,
           0.1058,  0.2485,  0.2701,  0.4263,  0.4372, -0.4218, -0.1425,
           0.4924,  0.4624,  0.0394, -0.0839, -0.2434, -0.3139,  0.1428,
          -0.1897, -0.2664, -0.3608, -0.2453, -0.5423,  0.0364,  0.4215,
           0.0899, -0.1717, -0.0838,  0.0020,  0.53
```

```python
out.shape
```

```
torch.Size([6, 5, 200])
```

The output is likewise laid out as (seq, batch, feature), where feature here equals hidden_size.

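In general we use `nn.RNN()` rather than `nn.RNNCell()`, because `nn.RNN()` saves us from writing the loop by hand, which is very convenient. Unless stated otherwise, we also use the default all-zero initial hidden state.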
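The two modules compute the same function; `nn.RNN` just runs the loop for us. This sketch copies the parameters of a single-layer `nn.RNN` into an `nn.RNNCell`, loops the cell manually from a zero state, and compares the results. The sizes are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)
rnn_seq = nn.RNN(input_size=8, hidden_size=12)
cell = nn.RNNCell(input_size=8, hidden_size=12)

# Copy the single layer's parameters into the cell so both compute the same function
with torch.no_grad():
    cell.weight_ih.copy_(rnn_seq.weight_ih_l0)
    cell.weight_hh.copy_(rnn_seq.weight_hh_l0)
    cell.bias_ih.copy_(rnn_seq.bias_ih_l0)
    cell.bias_hh.copy_(rnn_seq.bias_hh_l0)

x = torch.randn(5, 3, 8)   # (seq, batch, feature)
out_seq, h_n = rnn_seq(x)  # all-zero initial state by default

h = torch.zeros(3, 12)     # the same zero initial state for the manual loop
steps = []
for x_t in x:              # iterating a tensor walks its first (time) dimension
    h = cell(x_t, h)
    steps.append(h)
out_loop = torch.stack(steps)

print(torch.allclose(out_seq, out_loop, atol=1e-5))  # True
print(torch.allclose(h_n[0], h, atol=1e-5))          # True
```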

## LSTM

![](https://ws1.sinaimg.cn/large/006tKfTcly1fmt9qj3uhmj30iz07ct90.jpg)

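LSTM is used in the same way as the basic RNN, and its constructor takes the same parameters apart from nonlinearity (the LSTM gates use fixed activations). It likewise comes in two forms, `nn.LSTMCell()` and `nn.LSTM()`, which relate to each other exactly as described above, so we won't repeat ourselves and will go straight to a small example.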
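The step-by-step form works like `nn.RNNCell`, except that an LSTM cell carries two states, h and c, which are passed in and returned as a tuple. A minimal sketch using the same sizes as the example that follows (input 50, hidden 100, sequence 10, batch 3):

```python
import torch
from torch import nn

lstm_cell = nn.LSTMCell(input_size=50, hidden_size=100)
x = torch.randn(10, 3, 50)  # (seq, batch, feature)

# Two states flow between steps: h (hidden) and c (cell), each (batch, hidden_size)
h = torch.zeros(3, 100)
c = torch.zeros(3, 100)
outputs = []
for x_t in x:
    h, c = lstm_cell(x_t, (h, c))  # states go in and come out as a tuple
    outputs.append(h)

out = torch.stack(outputs)
print(out.shape, h.shape, c.shape)  # torch.Size([10, 3, 100]) torch.Size([3, 100]) torch.Size([3, 100])
```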

```python
lstm_seq = nn.LSTM(50, 100, num_layers=2)  # input size 50, hidden size 100, two layers
```

```python
lstm_seq.weight_hh_l0  # hidden-to-hidden weights of the first layer
```

```
Parameter containing:
tensor([[ 8.2637e-02, -1.6290e-02,  7.2261e-02,  ..., -1.2704e-02,
         -3.7244e-03,  5.8466e-02],
        [-7.7157e-02,  4.7159e-02,  9.5986e-02,  ..., -9.5587e-02,
          4.4277e-02, -8.7769e-02],
        [ 1.2518e-02,  8.6955e-02, -2.3107e-02,  ..., -9.5205e-02,
          4.9019e-02, -3.2687e-02],
        ...,
        [-2.3459e-02,  9.9478e-02, -8.4542e-02,  ..., -7.8115e-02,
         -5.9986e-02, -1.0029e-02],
        [ 9.2741e-02,  7.0935e-02,  4.9473e-02,  ..., -1.8433e-03,
          9.2872e-02, -6.9518e-02],
        [-7.0386e-02, -6.4270e-02,  4.4438e-02,  ..., -3.9259e-02,
         -4.5516e-02, -2.3544e-02]])
```

**Exercise: think about why this weight matrix has shape (400, 100).**

```python
lstm_input = torch.randn(10, 3, 50)  # sequence length 10, batch size 3, input size 50
```

```python
out, (h, c) = lstm_seq(lstm_input)  # use the default all-zero hidden states
```

Note that the LSTM returns two hidden states, h and c, which correspond to the two arrows between cells in the figure above. Both have the same shape, (num_layers * num_directions, batch, hidden_size).
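How `out` relates to h and c in a stacked LSTM: `out` contains the top layer's h at every time step, while h contains every layer's h at the last time step, so the two coincide at `out[-1]` and `h[-1]`. The cell state c never appears in `out`. A quick check with the same configuration as above:

```python
import torch
from torch import nn

lstm_seq = nn.LSTM(50, 100, num_layers=2)
lstm_input = torch.randn(10, 3, 50)
out, (h, c) = lstm_seq(lstm_input)

# Top layer at the last time step, seen from both return values
print(torch.allclose(out[-1], h[-1]))  # True
# c is only returned as a final state, one slice per layer
print(c.shape)  # torch.Size([2, 3, 100])
```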

```python
h.shape  # two layers, batch size 3, 100 features
```

```
torch.Size([2, 3, 100])
```

```python
c.shape
```

```
torch.Size([2, 3, 100])
```

```python
out.shape
```

```
torch.Size([10, 3, 100])
```

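We can also supply our own initial states instead of the default ones. In this case we pass in two tensors, as a tuple (h_0, c_0).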

```python
h_init = torch.randn(2, 3, 100)
c_init = torch.randn(2, 3, 100)
```

```python
out, (h, c) = lstm_seq(lstm_input, (h_init, c_init))
```

```python
h.shape
```

```
torch.Size([2, 3, 100])
```

```python
c.shape
```

```
torch.Size([2, 3, 100])
```

```python
out.shape
```

```
torch.Size([10, 3, 100])
```

## GRU
![](https://ws3.sinaimg.cn/large/006tKfTcly1fmtaj38y9sj30io06bmxc.jpg)

GRU works on the same principle as the two modules above, so we won't go into detail; here is an example.
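For readers who want to see inside a GRU step, this sketch recomputes one `nn.GRUCell` update by hand. It assumes the gate equations and the stacked weight order (reset, update, new) documented for PyTorch's GRU: $r = \sigma(W_{ir}x + b_{ir} + W_{hr}h + b_{hr})$, $z = \sigma(W_{iz}x + b_{iz} + W_{hz}h + b_{hz})$, $n = \tanh(W_{in}x + b_{in} + r \odot (W_{hn}h + b_{hn}))$, $h' = (1 - z) \odot n + z \odot h$. The sizes match the example below.

```python
import torch
from torch import nn

torch.manual_seed(0)
cell = nn.GRUCell(input_size=10, hidden_size=20)
x_t = torch.randn(32, 10)     # one time step: (batch, input_size)
h_prev = torch.randn(32, 20)  # previous hidden state: (batch, hidden_size)

# weight_ih / weight_hh stack the reset, update and new gates: (3 * hidden_size, ...)
gi = x_t @ cell.weight_ih.T + cell.bias_ih
gh = h_prev @ cell.weight_hh.T + cell.bias_hh
i_r, i_z, i_n = gi.chunk(3, dim=1)
h_r, h_z, h_n = gh.chunk(3, dim=1)

r = torch.sigmoid(i_r + h_r)         # reset gate
z = torch.sigmoid(i_z + h_z)         # update gate
n = torch.tanh(i_n + r * h_n)        # candidate state; r scales W_hn h + b_hn
h_manual = (1 - z) * n + z * h_prev  # new hidden state

print(torch.allclose(h_manual, cell(x_t, h_prev), atol=1e-5))  # True
```

Unlike the LSTM, there is no separate cell state: the single tensor h is all that flows between steps.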

```python
gru_seq = nn.GRU(10, 20)
gru_input = torch.randn(3, 32, 10)

out, h = gru_seq(gru_input)
```

```python
gru_seq.weight_hh_l0
```

```
Parameter containing:
tensor([[-1.5756e-01,  1.9829e-01,  3.0460e-02,  ..., -2.8688e-02,
          1.7674e-01, -5.7531e-02],
        [-1.4722e-02,  8.9410e-02,  1.9423e-01,  ..., -1.6158e-01,
         -9.8019e-02, -3.6601e-02],
        [-5.7251e-02, -1.4695e-01,  7.4977e-03,  ..., -1.5865e-01,
         -9.0608e-02,  1.8087e-01],
        ...,
        [ 2.0028e-01, -1.4748e-01,  1.7329e-01,  ..., -3.0522e-02,
         -5.8583e-02,  4.7067e-02],
        [-1.6744e-02, -2.9272e-02, -1.1360e-01,  ..., -1.1566e-01,
         -2.1596e-01, -6.1540e-02],
        [-1.3397e-01, -1.8592e-01, -1.2110e-04,  ..., -1.0225e-01,
          7.2537e-02,  8.1911e-02]])
```

```python
h.shape
```

```
torch.Size([1, 32, 20])
```

```python
out.shape
```

```
torch.Size([3, 32, 20])
```