### RNN in PyTorch

----
```
class torch.nn.RNN(*args, **kwargs)
```

----

Applies a multi-layer Elman RNN (recurrent neural network) with $\tanh$ or $\text{ReLU}$ non-linearity to an input sequence.

For each element in the input sequence, each layer computes the following function:

$$ h_t = \tanh(W_{ih}x_t + b_{ih} + W_{hh}h_{t-1} + b_{hh}) $$
 
where $h_t$ is the hidden state at time $t$, $x_t$ is the input at time $t$, and $h_{t-1}$ is the hidden state of the same layer at time $t-1$, or the initial hidden state at time $0$ (all zeros unless `h_0` is provided). For layers above the first, $x_t$ is the output of the layer below at time $t$. If `nonlinearity` is `'relu'`, then $\text{ReLU}$ is used instead of $\tanh$.
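To make the recurrence concrete, the sketch below unrolls it by hand for a single-layer `nn.RNN` and compares the result with the module's own output. The sizes and the names `manual_steps`, `manual_output` and `matches` are illustrative choices, not part of the PyTorch API.

```python
import torch
from torch import nn

torch.manual_seed(0)

seq_len, batch, input_size, hidden_size = 5, 3, 10, 20
rnn = nn.RNN(input_size, hidden_size)          # one layer, tanh by default
x = torch.randn(seq_len, batch, input_size)
h0 = torch.zeros(1, batch, hidden_size)

# Layer-0 parameters of the module.
W_ih, W_hh = rnn.weight_ih_l0, rnn.weight_hh_l0
b_ih, b_hh = rnn.bias_ih_l0, rnn.bias_hh_l0

# Unroll h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh) by hand.
h = h0[0]
manual_steps = []
for t in range(seq_len):
    h = torch.tanh(x[t] @ W_ih.T + b_ih + h @ W_hh.T + b_hh)
    manual_steps.append(h)
manual_output = torch.stack(manual_steps)      # (seq_len, batch, hidden_size)

output, h_n = rnn(x, h0)
matches = torch.allclose(output, manual_output, atol=1e-5)
print(matches)
```

The transposes appear because the weights are stored as *(hidden_size, input_size)* and *(hidden_size, hidden_size)*, while each `x[t]` is a batch of row vectors.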

**Parameters**:

* **input_size** – The number of expected features in the input $x$, i.e. the dimensionality of each input vector $x_t$
* **hidden_size** – The number of features in the hidden state $h$, i.e. the number of hidden units in each layer
* **num_layers** – Number of recurrent layers. E.g., setting `num_layers=2` would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
* **nonlinearity** – The non-linearity (activation function) to use. Can be either `'tanh'` or `'relu'`. Default: `'tanh'`
* **bias** – If `False`, then the layer does not use bias weights *b_ih* and *b_hh*. Default: `True`
* **batch_first** – If `True`, then the input and output tensors are provided as $(batch, seq, feature)$. Default: `False`
* **dropout** – If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to `dropout`. Default: 0
* **bidirectional** – If `True`, becomes a bidirectional RNN. Default: `False`
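As a small illustration of two of these options, the sketch below builds an RNN with `batch_first=True` and `bias=False` and inspects what changes. The tensor sizes and the variable name `param_names` are arbitrary choices for the example.

```python
import torch
from torch import nn

rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True, bias=False)

# With bias=False only the two weight matrices are registered.
param_names = [name for name, _ in rnn.named_parameters()]

x = torch.randn(3, 5, 10)      # (batch, seq, feature) because batch_first=True
output, h_n = rnn(x)           # h_0 omitted, so it defaults to zeros

print(param_names)
print(output.shape)            # (batch, seq, hidden_size)
print(h_n.shape)               # (num_layers, batch, hidden_size)
```

Note that `batch_first` only changes the layout of the input and output tensors; the hidden state keeps the batch in its second dimension.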

**Inputs**: input, h_0

* **input** of shape *(seq_len, batch, input_size)*: tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See [torch.nn.utils.rnn.pack_padded_sequence()](https://pytorch.org/docs/stable/generated/torch.nn.utils.rnn.pack_padded_sequence.html#torch-nn-utils-rnn-pack-padded-sequence) or [torch.nn.utils.rnn.pack_sequence()](https://pytorch.org/docs/stable/generated/torch.nn.utils.rnn.pack_sequence.html#torch-nn-utils-rnn-pack-sequence) for details.

* **h_0** of shape *(num_layers * num_directions, batch, hidden_size)*: tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.
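The packed-sequence input mentioned above can be sketched as follows, assuming a padded batch of three sequences with true lengths 5, 3 and 2. The names `padded`, `lengths` and `packed` are illustrative.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

rnn = nn.RNN(input_size=10, hidden_size=20)

padded = torch.randn(5, 3, 10)           # (seq_len, batch, input_size)
lengths = torch.tensor([5, 3, 2])        # true lengths, sorted in decreasing order

packed = pack_padded_sequence(padded, lengths)   # enforce_sorted=True by default
packed_output, h_n = rnn(packed)                 # output is packed as well

# Convert back to a padded tensor; steps past each sequence's length are zero.
output, output_lengths = pad_packed_sequence(packed_output)
print(output.shape, output_lengths)
```

With a packed input, `h_n` holds the hidden state at the last valid step of each sequence rather than at the padded end.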

**Outputs**: output, h_n

* **output** of shape *(seq_len, batch, num_directions * hidden_size)*: tensor containing the output features (h_t) from the last layer of the RNN, for each $t$. If a [torch.nn.utils.rnn.PackedSequence](https://pytorch.org/docs/stable/generated/torch.nn.utils.rnn.PackedSequence.html#torch.nn.utils.rnn.PackedSequence) has been given as the input, the output will also be a packed sequence.  
For the unpacked case, the directions can be separated using `output.view(seq_len, batch, num_directions, hidden_size)`, with forward and backward being direction $0$ and $1$ respectively. Similarly, the directions can be separated in the packed case.

* **h_n** of shape *(num_layers * num_directions, batch, hidden_size)*: tensor containing the hidden state for t = seq_len.  
Like output, the layers can be separated using `h_n.view(num_layers, num_directions, batch, hidden_size)`.
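The two `view` calls can be tried on a two-layer bidirectional RNN, as in the sketch below. The sizes and the names `out_dirs`, `h_layers`, `fwd_ok` and `bwd_ok` are choices made for this example.

```python
import torch
from torch import nn

seq_len, batch, hidden_size, num_layers, num_directions = 5, 3, 20, 2, 2
rnn = nn.RNN(input_size=10, hidden_size=hidden_size,
             num_layers=num_layers, bidirectional=True)

x = torch.randn(seq_len, batch, 10)
output, h_n = rnn(x)

# Split the last dimension of output into (direction, hidden_size).
out_dirs = output.view(seq_len, batch, num_directions, hidden_size)
# Split the first dimension of h_n into (layer, direction).
h_layers = h_n.view(num_layers, num_directions, batch, hidden_size)

# The forward direction finishes at the last time step,
# the backward direction finishes at the first time step.
fwd_ok = torch.allclose(h_layers[-1, 0], out_dirs[-1, :, 0])
bwd_ok = torch.allclose(h_layers[-1, 1], out_dirs[0, :, 1])
print(fwd_ok, bwd_ok)
```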

**Shape**:

* **Input1**: $(L, N, H_{in})$ tensor containing input features where $H_{in}=\text{input_size}$ and $L$ represents a sequence length.

* **Input2**: $(S, N, H_{out})$ tensor containing the initial hidden state for each element in the batch, where $H_{out}=\text{hidden_size}$ and $S=\text{num_layers} * \text{num_directions}$. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1.

* **Output1**: $(L, N, H_{all})$ where $H_{all}=\text{num_directions} * \text{hidden_size}$

* **Output2**: $(S, N, H_{out})$ tensor containing the next hidden state for each element in the batch

**Variables**

* **~RNN.weight_ih_l[k]** – the learnable input-hidden weights of the k-th layer, of shape *(hidden_size, input_size)* for $k = 0$. Otherwise, the shape is *(hidden_size, num_directions * hidden_size)*

* **~RNN.weight_hh_l[k]** – the learnable hidden-hidden weights of the k-th layer, of shape *(hidden_size, hidden_size)*

* **~RNN.bias_ih_l[k]** – the learnable input-hidden bias of the k-th layer, of shape *(hidden_size)*

* **~RNN.bias_hh_l[k]** – the learnable hidden-hidden bias of the k-th layer, of shape *(hidden_size)*
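The parameter shapes listed above can be inspected directly. The sketch below uses a unidirectional two-layer RNN, so `num_directions` is 1; the dictionary name `shapes` is an illustrative choice.

```python
from torch import nn

rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2)

# Map each parameter name to its shape, e.g. weight_ih_l0 -> (20, 10).
shapes = {name: tuple(p.shape) for name, p in rnn.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
```

Only `weight_ih_l0` depends on `input_size`; the input-hidden weights of the second layer take the first layer's hidden state as input.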

**NOTE**

All the weights and biases are initialized from $\mathcal{U}(-\sqrt{k}, \sqrt{k})$ where $k = \frac{1}{\text{hidden_size}}$.
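As a quick sanity check of this initialization range, the sketch below verifies that every parameter of a freshly constructed RNN lies within $\pm\sqrt{k}$. The small tolerance and the names `bound` and `within_bound` are choices made for this example.

```python
import math
from torch import nn

hidden_size = 20
rnn = nn.RNN(input_size=10, hidden_size=hidden_size, num_layers=2)

bound = math.sqrt(1.0 / hidden_size)       # sqrt(k) with k = 1 / hidden_size
# Small tolerance because the parameters are stored in float32.
within_bound = all(p.abs().max().item() <= bound + 1e-6 for p in rnn.parameters())
print(round(bound, 4), within_bound)
```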

In [2]:
from torch import nn, optim

In [3]:
rnn = nn.RNN(10, 20, 2)

In [4]:
rnn

RNN(10, 20, num_layers=2)

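The positional form above does not make the meaning of each argument clear. A more explicit way to write it uses keyword arguments: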

In [25]:
rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2, nonlinearity='relu')

In [26]:
rnn

RNN(10, 20, num_layers=2)

In [27]:
rnn.input_size

10

In [28]:
rnn.hidden_size

20

In [29]:
rnn.num_layers

2

In [30]:
rnn.nonlinearity

'relu'

In [33]:
import torch

Randomly generate an input with `seq_len=5`, `batch=3`, `input_size=10`:

In [38]:
input = torch.randn(5, 3, 10)

In [35]:
input

tensor([[[-0.2136,  0.6550, -0.5866,  0.7904, -0.4175,  0.2324,  0.4426,
          -0.1730,  0.2303,  1.3951],
         [-0.0633, -0.0234,  1.4220,  0.7754,  0.4210, -0.2156, -0.0238,
           0.8117, -1.1339,  0.3355],
         [ 0.3425,  0.3564,  0.9073, -0.0297,  1.1188, -0.2755, -0.5922,
          -1.9047,  1.8543, -0.7428]],

        [[-0.5415,  0.8176, -0.7917,  2.6023,  0.9393, -0.2480,  0.5148,
           0.0495,  0.5244, -0.8879],
         [-0.6655, -0.7050,  0.0285, -1.6590, -0.8466,  1.1517, -0.4227,
          -0.6335,  1.0067, -1.0237],
         [-2.4608,  0.9272, -0.8395,  0.2601, -0.0750, -0.0074,  0.5788,
          -0.5884,  0.5253, -0.4161]],

        [[-0.3016,  0.3559,  0.3049,  1.0602, -1.6101, -1.2519,  1.3368,
           1.3935, -0.6285,  2.2198],
         [ 1.4819, -1.1124, -0.8175,  0.0967, -1.6808, -0.5923,  2.0534,
           0.8784, -0.8851, -0.3817],
         [ 0.9969,  1.4029, -0.3292, -0.6317, -1.6539, -1.1570, -0.1709,
          -0.7214, -0.5851,  1.0806

In [36]:
input.shape

torch.Size([5, 3, 10])

In [6]:
h0 = torch.randn(2, 3, 20)

In [10]:
h0

tensor([[[ 1.7005,  2.0622,  0.2596,  1.1244, -0.0030, -0.8713,  1.3854,
          -0.3751, -0.4079, -0.8769, -0.5297, -1.0469,  0.7056, -0.5951,
          -1.5194, -0.6982,  1.7719,  0.8174, -0.0060, -1.8900],
         [-0.7312,  0.7778, -0.5811, -0.4257, -0.8034, -0.8660,  1.5157,
          -0.7403,  1.1677,  0.5809,  0.4522, -0.4620, -0.9990,  0.0904,
           0.8183, -0.2081,  0.7247, -0.7790,  0.9893, -0.8280],
         [-1.0413, -0.3022, -1.2148, -1.7313,  0.2203, -0.5146,  0.6175,
          -0.6839, -0.0307,  1.6719,  0.4437, -0.2275,  1.3126, -0.7540,
          -0.0356, -0.8927,  2.0019,  0.2936,  0.8306, -0.5191]],

        [[ 0.8618,  2.8280, -1.4003,  0.4045, -0.8874,  1.1740,  2.6002,
           0.8753,  1.3151, -1.4312,  2.0962,  0.8373,  0.3204, -0.2359,
           0.7882, -1.2130,  1.5117, -2.1493, -2.0501,  1.1786],
         [ 0.6501, -1.2137, -0.4433,  0.6859, -0.3681, -1.8112,  1.0084,
           0.2967,  1.4405, -0.7480, -0.3950, -1.4791,  0.0060,  0.4411,
        

In [11]:
output, hn = rnn(input, h0)
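For a unidirectional RNN like this one, the two return values are related: the last time step of `output` is the top layer's entry in `hn`. The self-contained sketch below repeats the call with fresh random tensors (named `x` here, an illustrative choice) and checks that relation.

```python
import torch
from torch import nn

rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2)
x = torch.randn(5, 3, 10)          # (seq_len, batch, input_size)
h0 = torch.randn(2, 3, 20)         # (num_layers, batch, hidden_size)

output, hn = rnn(x, h0)

# output holds the top layer's hidden state at every time step,
# hn holds every layer's hidden state at the last time step.
last_step_matches = torch.allclose(output[-1], hn[-1])
print(output.shape, hn.shape, last_step_matches)
```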

In [13]:
hn

tensor([[[ 1.0671e-01,  6.3203e-01, -1.4205e-01, -4.5679e-01,  4.9430e-01,
           4.1664e-04, -2.2096e-01,  1.6742e-01, -1.5266e-01, -2.8774e-01,
           5.0801e-01, -3.2709e-01,  1.5137e-01, -1.9688e-01,  6.6191e-01,
           3.7101e-01, -1.6490e-01, -1.5713e-01,  3.3369e-02,  5.4381e-03],
         [-4.0392e-01,  7.4687e-01, -3.6658e-01, -6.0945e-01,  1.7619e-01,
          -5.1696e-01, -1.3466e-01,  4.9498e-01, -1.8346e-01, -2.3167e-01,
          -1.2358e-02,  1.6410e-01, -2.3573e-02, -1.1894e-01,  4.8816e-01,
           7.9178e-01,  3.0626e-01,  1.4387e-02,  8.6952e-02, -6.5329e-02],
         [-4.9836e-01,  4.7920e-02, -4.2825e-01, -2.1469e-01,  2.6080e-01,
          -1.2441e-01, -9.6232e-03,  4.6136e-01, -5.9921e-01, -3.0484e-01,
          -2.2815e-01, -3.4020e-02, -9.3360e-01,  4.2591e-01,  2.8626e-01,
           5.1743e-01,  1.1075e-01,  8.6932e-01, -1.5866e-02,  2.9688e-01]],

        [[ 4.9944e-01, -3.9954e-02, -1.2648e-01,  9.3358e-03,  3.0234e-02,
          -2.1879e-0

In [14]:
output

tensor([[[-0.5825,  0.7571, -0.4633, -0.8861, -0.4963,  0.6599,  0.8740,
           0.5617, -0.6960, -0.0864,  0.3847,  0.0574,  0.0355,  0.5923,
           0.9324, -0.6560, -0.9243, -0.6949, -0.6292,  0.4132],
         [-0.4413,  0.1195, -0.6355, -0.0180, -0.6298,  0.0630,  0.7187,
           0.1938, -0.7252,  0.5678, -0.8940, -0.2355,  0.5314,  0.4671,
          -0.3629, -0.9565, -0.8289,  0.6665, -0.4162,  0.6056],
         [ 0.4981, -0.0386,  0.4446,  0.6393,  0.2247, -0.3710,  0.7947,
           0.8055,  0.2164,  0.8039,  0.1271, -0.2050, -0.6548, -0.8605,
          -0.4928,  0.4536,  0.6726,  0.0681,  0.2140, -0.4608]],

        [[ 0.6615,  0.5527, -0.5354, -0.6154,  0.1578, -0.1347,  0.1343,
           0.5307, -0.1963, -0.1883,  0.6901, -0.1787, -0.0157, -0.6794,
           0.3095,  0.4369,  0.5052,  0.1298, -0.5970, -0.0251],
         [ 0.7091,  0.5332, -0.1371, -0.0753,  0.3985, -0.5981, -0.0085,
           0.7076,  0.0055, -0.3812,  0.2159, -0.1415, -0.5214, -0.2694,
        