[toc]

# PyTorch RNN

- input_size – The number of expected features in the input x

- hidden_size – The number of features in the hidden state h

- num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1

- nonlinearity – The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh'

- bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True

- batch_first – If True, then the input and output tensors are provided as (batch, seq, feature) instead of (seq, batch, feature). This does not apply to the hidden state. Default: False

- dropout – If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to dropout. Default: 0

- bidirectional – If True, becomes a bidirectional RNN. Default: False
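
As a quick illustration of the arguments above, here is a minimal sketch that passes each one by keyword. The concrete sizes and the choices nonlinearity="relu", num_layers=2 and batch_first=True are arbitrary assumptions for the demo. Note that h_n keeps the (num_layers, batch, hidden_size) layout even when batch_first=True.

```python
import torch
from torch import nn

batch_size, seq_len, input_size, hidden_size = 3, 5, 10, 4

# Every constructor argument from the list above, spelled out by keyword
rnn = nn.RNN(
    input_size=input_size,
    hidden_size=hidden_size,
    num_layers=2,
    nonlinearity="relu",
    bias=True,
    batch_first=True,    # input/output are (batch, seq, feature)
    dropout=0.0,         # if non-zero, would be applied between the 2 layers
    bidirectional=False,
)

x = torch.randn(batch_size, seq_len, input_size)  # batch-major because batch_first=True
output, h_n = rnn(x)  # h0 omitted: PyTorch initialises it with zeros

print(output.shape)  # (batch, seq, hidden_size)
print(h_n.shape)     # (num_layers, batch, hidden_size): batch_first does not change this
```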
    

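PyTorch's RNN differs considerably from TensorFlow's tf.keras.layers.SimpleRNN. The main differences are:

1. By default, PyTorch expects data in time-major (time-first) format, while TensorFlow expects batch-major format.
2. TensorFlow's SimpleRNN has return_state and return_sequences to control whether the full sequence and the hidden state are returned. PyTorch's RNN has no such parameters and always returns both the output sequence and the hidden state, which is equivalent to supporting only return_state=True and return_sequences=True.
3. To stack several RNNs in depth in TensorFlow, you need multiple SimpleRNN layers, with return_sequences=True on every layer that feeds another one; in PyTorch you simply set num_layers.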
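
To make points 1 and 2 concrete, here is a minimal PyTorch-only sketch (no TensorFlow needed) of how the Keras-style options map onto what nn.RNN returns. It assumes batch-major input via batch_first=True and a single-layer, unidirectional RNN; the variable names all_steps, last_step and final_state are just illustrative labels.

```python
import torch
from torch import nn

batch_size, seq_len, feature_size, n_hidden = 3, 5, 10, 2

# Point 1: batch_first=True lets PyTorch accept batch-major input, like Keras
rnn = nn.RNN(feature_size, n_hidden, batch_first=True)
x = torch.randn(batch_size, seq_len, feature_size)  # (batch, seq, feature)

# Point 2: PyTorch always returns both the full sequence and the final hidden state
output, h_n = rnn(x)

# Keras return_sequences=True  -> the whole output sequence
all_steps = output            # (batch, seq, n_hidden)
# Keras return_sequences=False -> only the last time step
last_step = output[:, -1, :]  # (batch, n_hidden)
# Keras return_state=True      -> the final hidden state (h_n is NOT batch-first)
final_state = h_n[-1]         # (batch, n_hidden)

print(all_steps.shape, last_step.shape, final_state.shape)
print(torch.allclose(last_step, final_state))  # the two coincide for a unidirectional RNN
```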

## Example

```python
from torch import nn
import torch

batch_size = 3
seq_len = 5
feature_size = 10

n_hidden = 2

rnn = nn.RNN(feature_size, n_hidden)
h0 = torch.randn(1, batch_size, n_hidden)  # (num_layers, batch, hidden_size)
x = torch.randn(seq_len, batch_size, feature_size)  # time-major: (seq, batch, feature)
output, hidden_state = rnn(x, h0)

print(output.shape)
print(hidden_state.shape)
print(output[-1] == hidden_state)  # hidden_state is the output at the last time step
```

```
torch.Size([5, 3, 2])
torch.Size([1, 3, 2])
tensor([[[True, True],
         [True, True],
         [True, True]]])
```
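
The PyTorch documentation gives the update rule as h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh). The sketch below unrolls that rule in a plain loop with the module's own weights and checks it against nn.RNN, which also shows why hidden_state equals output[-1]. The helper manual_rnn is hypothetical, written only for this comparison, and assumes a single-layer tanh RNN with bias.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch_size, seq_len, feature_size, n_hidden = 3, 5, 10, 2

rnn = nn.RNN(feature_size, n_hidden)  # single layer, tanh, time-major
x = torch.randn(seq_len, batch_size, feature_size)
h0 = torch.randn(1, batch_size, n_hidden)

def manual_rnn(x, h0, rnn):
    """Hypothetical helper: unrolls h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh)."""
    h = h0[0]  # (batch, hidden) for the only layer
    outputs = []
    for x_t in x:  # iterate over the time dimension (dim 0)
        h = torch.tanh(x_t @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                       + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)
        outputs.append(h)
    # outputs: (seq, batch, hidden); final state re-wrapped as (1, batch, hidden)
    return torch.stack(outputs), h.unsqueeze(0)

with torch.no_grad():
    ref_out, ref_h = rnn(x, h0)
    my_out, my_h = manual_rnn(x, h0, rnn)

print(torch.allclose(ref_out, my_out, atol=1e-5))
print(torch.allclose(ref_h, my_h, atol=1e-5))
```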


## num_layers

In PyTorch, RNN layers are stacked by setting num_layers.

```python
from torch import nn
import torch

batch_size = 3
seq_len = 5
feature_size = 10
num_layers = 3

n_hidden = 2

rnn = nn.RNN(feature_size, n_hidden, num_layers)
h0 = torch.randn(num_layers, batch_size, n_hidden)  # one initial state per layer
x = torch.randn(seq_len, batch_size, feature_size)
output, hidden_state = rnn(x, h0)

print(output.shape)
print(hidden_state.shape)
print(output[-1] == hidden_state[-1])  # the last layer's final hidden state is the output at the last time step
```

```
torch.Size([5, 3, 2])
torch.Size([3, 3, 2])
tensor([[True, True],
        [True, True],
        [True, True]])
```
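
One way to see what num_layers does is to rebuild the stack by hand. The sketch below copies the weights of each level of a 3-layer nn.RNN into a separate single-layer nn.RNN and feeds the output sequence of one into the next; it assumes dropout=0 (the default), since dropout between layers would make the two paths differ during training.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch_size, seq_len, feature_size, n_hidden, num_layers = 3, 5, 10, 2, 3

stacked = nn.RNN(feature_size, n_hidden, num_layers)
x = torch.randn(seq_len, batch_size, feature_size)
h0 = torch.randn(num_layers, batch_size, n_hidden)

# Build one single-layer RNN per level and copy the matching weights into it
layers = []
for k in range(num_layers):
    in_size = feature_size if k == 0 else n_hidden  # later layers read the previous layer's output
    layer = nn.RNN(in_size, n_hidden)
    with torch.no_grad():
        layer.weight_ih_l0.copy_(getattr(stacked, f"weight_ih_l{k}"))
        layer.weight_hh_l0.copy_(getattr(stacked, f"weight_hh_l{k}"))
        layer.bias_ih_l0.copy_(getattr(stacked, f"bias_ih_l{k}"))
        layer.bias_hh_l0.copy_(getattr(stacked, f"bias_hh_l{k}"))
    layers.append(layer)

with torch.no_grad():
    out_stacked, h_stacked = stacked(x, h0)

    seq, finals = x, []
    for k, layer in enumerate(layers):
        seq, h_k = layer(seq, h0[k:k + 1])  # layer k starts from its own slice of h0
        finals.append(h_k)
    h_chained = torch.cat(finals)  # (num_layers, batch, hidden), same layout as h_stacked

print(torch.allclose(out_stacked, seq, atol=1e-5))
print(torch.allclose(h_stacked, h_chained, atol=1e-5))
```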


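Note that we only specified input_size (the input dimension of the first layer) and hidden_size; there is no separate setting for the intermediate layers. Every layer has hidden_size units, so layers 2 and 3 receive inputs of size hidden_size.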

```python
print(rnn.weight_ih_l0.shape)  # layer 1: 2 x 10 (hidden_size x input_size)
print(rnn.weight_ih_l1.shape)  # layer 2: 2 x 2 (hidden_size x hidden_size)
print(rnn.weight_ih_l2.shape)  # layer 3: 2 x 2 (hidden_size x hidden_size)
```

```
torch.Size([2, 10])
torch.Size([2, 2])
torch.Size([2, 2])
```


The hidden-to-hidden weights (weight_hh_l0, weight_hh_l1, weight_hh_l2) all have shape hidden_size x hidden_size:

```python
print(rnn.weight_hh_l0.shape)
print(rnn.weight_hh_l1.shape)
print(rnn.weight_hh_l2.shape)
```

```
torch.Size([2, 2])
torch.Size([2, 2])
torch.Size([2, 2])
```
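
Putting the two sets of shapes together, layer k holds weight_ih_l{k} (hidden_size x in_k), weight_hh_l{k} (hidden_size x hidden_size) and two bias vectors of length hidden_size, where in_0 = input_size and in_k = hidden_size for k > 0. The sketch lists the parameters with named_parameters and compares the total with that closed form; expected_param_count is a hypothetical helper that assumes a unidirectional RNN with bias=True.

```python
import torch
from torch import nn

input_size, hidden_size, num_layers = 10, 2, 3
rnn = nn.RNN(input_size, hidden_size, num_layers)

# List every parameter tensor with its shape
for name, p in rnn.named_parameters():
    print(name, tuple(p.shape))

def expected_param_count(input_size, hidden_size, num_layers):
    """Hypothetical helper: closed-form count for a unidirectional nn.RNN with bias=True."""
    total = 0
    for k in range(num_layers):
        in_k = input_size if k == 0 else hidden_size
        total += hidden_size * in_k         # weight_ih_l{k}
        total += hidden_size * hidden_size  # weight_hh_l{k}
        total += 2 * hidden_size            # bias_ih_l{k} + bias_hh_l{k}
    return total

actual = sum(p.numel() for p in rnn.parameters())
print(actual, expected_param_count(input_size, hidden_size, num_layers))  # 52 52
```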


The network above therefore consists of three RNN layers stacked on top of each other, as shown below.

![](https://gitee.com/EdwardElric_1683260718/picture_bed/raw/master/img/20200818150047.png)

## bidirectional

![](https://gitee.com/EdwardElric_1683260718/picture_bed/raw/master/img/20200820165559.png)

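With bidirectional=True, a second RNN reads the sequence from the last time step to the first. output concatenates the hidden states of the two directions along the last dimension (forward first, then backward), and hidden_state stacks the final state of each direction. The forward direction finishes at the last time step, whereas the backward direction finishes at time step 0.

```python
from torch import nn
import torch

batch_size = 3
seq_len = 5
feature_size = 10
n_hidden = 2

rnn = nn.RNN(feature_size, n_hidden, bidirectional=True)
h0 = torch.zeros(2, batch_size, n_hidden)  # one initial state per direction
x = torch.randn(seq_len, batch_size, feature_size)
output, hidden_state = rnn(x, h0)

print(output.shape)  # [seq_len, batch_size, n_hidden * 2]
print(hidden_state.shape)  # [2, batch_size, n_hidden]

h_forward = output[-1, :, :n_hidden]  # forward direction: last time step
h_backward = output[0, :, n_hidden:]  # backward direction: time step 0
print(hidden_state == torch.stack((h_forward, h_backward)))
```

```
torch.Size([5, 3, 4])
torch.Size([2, 3, 2])
tensor([[[True, True],
         [True, True],
         [True, True]],

        [[True, True],
         [True, True],
         [True, True]]])
```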
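
The backward half can be reproduced with an ordinary RNN: copy the *_reverse weights of layer 0 into a unidirectional nn.RNN, run it on the time-reversed input, and flip its output back. This is a minimal sketch for a single-layer bidirectional RNN with zero initial states; the name backward is just an illustrative variable.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch_size, seq_len, feature_size, n_hidden = 3, 5, 10, 2

birnn = nn.RNN(feature_size, n_hidden, bidirectional=True)
x = torch.randn(seq_len, batch_size, feature_size)

# A plain RNN that receives the weights of the reverse direction
backward = nn.RNN(feature_size, n_hidden)
with torch.no_grad():
    backward.weight_ih_l0.copy_(birnn.weight_ih_l0_reverse)
    backward.weight_hh_l0.copy_(birnn.weight_hh_l0_reverse)
    backward.bias_ih_l0.copy_(birnn.bias_ih_l0_reverse)
    backward.bias_hh_l0.copy_(birnn.bias_hh_l0_reverse)

    out_bi, h_bi = birnn(x)               # h0 defaults to zeros
    out_rev, h_rev = backward(x.flip(0))  # process time steps seq_len-1, ..., 0
    out_back = out_rev.flip(0)            # re-align with the original time order

# Backward half of output, and the backward final state stored in h_n[1]
print(torch.allclose(out_bi[:, :, n_hidden:], out_back, atol=1e-5))
print(torch.allclose(h_bi[1], h_rev[0], atol=1e-5))
```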


## References
[RNN — PyTorch 1.6.0 documentation](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html?highlight=rnn#torch.nn.RNN)