# Recurrent Neural Network Modules in PyTorch
Earlier we covered the basics and the network structure of recurrent neural networks (RNNs). Below we show how to build recurrent networks in PyTorch. PyTorch's dynamic computation graph makes recurrent networks very convenient to implement.


## General RNN

![](https://ws1.sinaimg.cn/large/006tKfTcly1fmt9xz889xj30kb07nglo.jpg)

For the simplest RNN, PyTorch offers two modules: `torch.nn.RNNCell()` and `torch.nn.RNN()`. The difference is that `RNNCell()` processes only a single time step of the sequence, and you pass in the hidden state yourself, whereas `RNN()` consumes a whole sequence at once. By default `RNN()` starts from an all-zero hidden state, but you can also supply your own.

The parameters of `RNN()` are:

- `input_size`: the feature dimension of the input $x_t$
- `hidden_size`: the feature dimension of the hidden state, and therefore of the output
- `num_layers`: the number of stacked recurrent layers
- `nonlinearity`: the nonlinear activation function, `'tanh'` (the default) or `'relu'`
- `bias`: whether to use bias terms; the default is `True`
- `batch_first`: the layout of the input and output data. The default is `False`, which means the shape is (seq, batch, feature): sequence length first, batch second.
- `dropout`: if non-zero, the probability of dropout applied to the outputs of every layer except the last; the default is 0
- `bidirectional`: whether to use a bidirectional RNN; the default is `False`

`RNNCell()` has fewer parameters: only `input_size`, `hidden_size`, `bias`, and `nonlinearity`.
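To see how `batch_first` changes the data layout, here is a minimal sketch using plain tensors (the `Variable` wrapper is unnecessary in PyTorch 0.4 and later). The sizes are arbitrary choices that mirror the example below. One detail worth noticing: per the PyTorch documentation, `batch_first` affects only the input and output, not the hidden state.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Same sizes as the example below; batch_first=True means (batch, seq, feature)
rnn = nn.RNN(input_size=100, hidden_size=200, num_layers=1,
             nonlinearity='tanh', bias=True, batch_first=True)

x = torch.randn(5, 6, 100)  # batch 5, sequence length 6, feature 100
out, h_n = rnn(x)           # no hidden state passed, so it starts at all zeros

print(out.shape)  # torch.Size([5, 6, 200]): batch dimension first
print(h_n.shape)  # torch.Size([1, 5, 200]): h_n keeps (layers, batch, hidden)

# For a single unidirectional layer, the last output step equals the final hidden state
assert torch.allclose(out[:, -1], h_n[0])
```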


In [46]:
import torch
from torch.autograd import Variable
from torch import nn

In [47]:
# Define a single-step RNN cell
rnn_single = nn.RNNCell(input_size=100, hidden_size=200)

In [48]:
# access the parameters
rnn_single.weight_hh

Parameter containing:
1.00000e-02 *
 6.2260 -5.3805  3.5870  ...  -2.2162  6.2760  1.6760
-5.1878 -4.6751 -5.5926  ...  -1.8942  0.1589  1.0725
 3.3236 -3.2726  5.5399  ...   3.3193  0.2117  1.1730
          ...             ⋱             ...          
 2.4032 -3.4415  5.1036  ...  -2.2035 -0.1900 -6.4016
 5.2031 -1.5793 -0.0623  ...   0.3424  6.9412  6.3707
-5.4495  4.5280  2.1774  ...   1.8767  2.4968  5.3403
[torch.FloatTensor of size 200x200]

In [49]:
# Construct a sequence of length 6, batch size 5, feature dimension 100
x = Variable(torch.randn(6, 5, 100))  # the RNN input format: (seq, batch, feature)


In [50]:
# Define the initial hidden state: (batch, hidden_size)
h_t = Variable(torch.zeros(5, 200))

In [51]:
# Feed the sequence into the RNN cell one time step at a time
out = []
for i in range(6):  # loop over all 6 time steps of the sequence
    h_t = rnn_single(x[i], h_t)
    out.append(h_t)

In [52]:
h_t

Variable containing:
 0.0136  0.3723  0.1704  ...   0.4306 -0.7909 -0.5306
-0.2681 -0.6261 -0.3926  ...   0.1752  0.5739 -0.2061
-0.4918 -0.7611  0.2787  ...   0.0854 -0.3899  0.0092
 0.6050  0.1852 -0.4261  ...  -0.7220  0.6809  0.1825
-0.6851  0.7273  0.5396  ...  -0.7969  0.6133 -0.0852
[torch.FloatTensor of size 5x200]

In [54]:
len(out)

6

In [55]:
out[0].shape  # shape of each output


torch.Size([5, 200])

After running the RNN, the hidden state has changed: the network has absorbed information from the sequence into its memory, and it has produced 6 outputs, one per time step.
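The cell's update rule can be checked directly. The sketch below assumes the default `tanh` nonlinearity and recomputes each step with the formula given in the PyTorch `nn.RNNCell` documentation, $h' = \tanh(W_{ih} x + b_{ih} + W_{hh} h + b_{hh})$, comparing it with what the cell returns. Sizes match the example above; plain tensors replace `Variable`.

```python
import torch
from torch import nn

torch.manual_seed(0)
cell = nn.RNNCell(input_size=100, hidden_size=200)  # tanh by default
x = torch.randn(6, 5, 100)                          # (seq, batch, feature)
h = torch.zeros(5, 200)                             # (batch, hidden_size)

with torch.no_grad():
    for t in range(x.size(0)):
        h_cell = cell(x[t], h)
        # The same step written out by hand
        h_manual = torch.tanh(x[t] @ cell.weight_ih.T + cell.bias_ih
                              + h @ cell.weight_hh.T + cell.bias_hh)
        assert torch.allclose(h_cell, h_manual, atol=1e-6)
        h = h_cell  # carry the hidden state to the next step
```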


Let's take a look at the case of using `RNN` directly.


In [32]:
rnn_seq = nn.RNN(100, 200)

In [33]:
# access the parameters
rnn_seq.weight_hh_l0

Parameter containing:
1.00000e-02 *
 1.0998 -1.5018 -1.4337  ...   3.8385 -0.8958 -1.6781
 5.3302 -5.4654  5.5568  ...   4.7399  5.4110  3.6170
 1.0788 -0.6620  5.7689  ...  -5.0747 -2.9066  0.6152
          ...             ⋱             ...          
-5.6921  0.1843 -0.0803  ...  -4.5852  5.6194 -1.4734
 4.4306  6.9795 -1.5736  ...   3.4236 -0.3441  3.1397
 7.0349 -1.6120 -4.2840  ...  -5.5676  6.8897  6.1968
[torch.FloatTensor of size 200x200]

In [34]:
out, h_t = rnn_seq(x)  # use the default all-zero hidden state


In [36]:
h_t

Variable containing:
( 0 ,.,.) = 
  0.2012  0.0517  0.0570  ...   0.2316  0.3615 -0.1247
  0.5307  0.4147  0.7881  ...  -0.4138 -0.1444  0.3602
  0.0882  0.4307  0.3939  ...   0.3244 -0.4629 -0.2315
  0.2868  0.7400  0.6534  ...   0.6631  0.2624 -0.0162
  0.0841  0.6274  0.1840  ...   0.5800  0.8780  0.4301
[torch.FloatTensor of size 1x5x200]

In [35]:
len(out)

6

Here `h_t` is the network's final hidden state, and `out` contains the outputs for all 6 time steps.


In [40]:
# Define the initial hidden state
h_0 = Variable(torch.randn(1, 5, 200))

The hidden state here has three dimensions: (num_layers * num_directions, batch, hidden_size).
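For deeper or bidirectional networks the first dimension grows accordingly. The sketch below uses a hypothetical two-layer bidirectional RNN (sizes chosen to match the example above) to show the shapes and how the final hidden states line up with `out`. It separates layers and directions of `h_n` the way the PyTorch documentation describes, `(num_layers, num_directions, batch, hidden_size)`.

```python
import torch
from torch import nn

torch.manual_seed(0)
num_layers, num_directions, batch, hidden = 2, 2, 5, 200
rnn = nn.RNN(100, hidden, num_layers=num_layers, bidirectional=True)

x = torch.randn(6, batch, 100)                                 # (seq, batch, feature)
h_0 = torch.randn(num_layers * num_directions, batch, hidden)  # custom initial state
out, h_n = rnn(x, h_0)

print(out.shape)  # torch.Size([6, 5, 400]): forward and backward outputs concatenated
print(h_n.shape)  # torch.Size([4, 5, 200])

# Split the first dimension into (layer, direction)
h_split = h_n.reshape(num_layers, num_directions, batch, hidden)
# Last layer, forward direction: it finishes at the last time step
assert torch.allclose(out[-1, :, :hidden], h_split[-1, 0])
# Last layer, backward direction: it reads the sequence in reverse and finishes at step 0
assert torch.allclose(out[0, :, hidden:], h_split[-1, 1])
```

Because the backward direction processes the sequence from the end, its final state appears at time step 0 of `out`, not at the last step.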


In [41]:
out, h_t = rnn_seq(x, h_0)

In [42]:
h_t

Variable containing:
( 0 ,.,.) = 
  0.2091  0.0353  0.0625  ...   0.2340  0.3734 -0.1307
  0.5498  0.4221  0.7877  ...  -0.4143 -0.1209  0.3335
  0.0757  0.4204  0.3826  ...   0.3187 -0.4626 -0.2336
  0.3106  0.7355  0.6436  ...   0.6611  0.2587 -0.0338
  0.1025  0.6350  0.1943  ...   0.5720  0.8749  0.4525
[torch.FloatTensor of size 1x5x200]

In [45]:
out.shape

torch.Size([6, 5, 200])

The output `out` also has the shape (seq, batch, feature), where feature = num_directions * hidden_size.


In general we use `nn.RNN()` rather than `nn.RNNCell()`, because `nn.RNN()` spares us from writing the loop by hand, which is much more convenient. Unless we need a particular initial state, we also let it initialize the hidden state to all zeros by default.
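To make the relationship concrete, the following sketch copies the parameters of a single-layer `nn.RNN` into an `nn.RNNCell` and checks that a hand-written loop over the cell reproduces a single `nn.RNN` call. The names `weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0`, and `bias_hh_l0` are the ones PyTorch uses for the first layer; the sizes are those of the example above.

```python
import torch
from torch import nn

torch.manual_seed(0)
rnn_seq = nn.RNN(100, 200)
rnn_cell = nn.RNNCell(100, 200)

# Give the cell exactly the same parameters as the first (only) layer of the RNN
with torch.no_grad():
    rnn_cell.weight_ih.copy_(rnn_seq.weight_ih_l0)
    rnn_cell.weight_hh.copy_(rnn_seq.weight_hh_l0)
    rnn_cell.bias_ih.copy_(rnn_seq.bias_ih_l0)
    rnn_cell.bias_hh.copy_(rnn_seq.bias_hh_l0)

x = torch.randn(6, 5, 100)
out_seq, h_n = rnn_seq(x)        # one call over the whole sequence, zero initial state

h = torch.zeros(5, 200)
steps = []
for t in range(x.size(0)):       # the loop that nn.RNN writes for us
    h = rnn_cell(x[t], h)
    steps.append(h)
out_loop = torch.stack(steps)    # (6, 5, 200)

assert torch.allclose(out_seq, out_loop, atol=1e-5)
assert torch.allclose(h_n[0], h, atol=1e-5)
```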


## LSTM

![](https://ws1.sinaimg.cn/large/006tKfTcly1fmt9qj3uhmj30iz07ct90.jpg)

The LSTM works the same way as the basic RNN and takes the same parameters. It also comes in two forms, `nn.LSTMCell()` and `nn.LSTM()`, which relate to each other just like their RNN counterparts, so we won't repeat the details. Here is a small example.
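A minimal sketch of the single-step form, assuming the same sizes as the example below: `nn.LSTMCell()` takes and returns the state as a pair `(h, c)`, so the loop has to carry both tensors forward.

```python
import torch
from torch import nn

torch.manual_seed(0)
lstm_cell = nn.LSTMCell(input_size=50, hidden_size=100)

x = torch.randn(10, 3, 50)   # (seq, batch, feature)
h = torch.zeros(3, 100)      # hidden state (batch, hidden_size)
c = torch.zeros(3, 100)      # cell state (batch, hidden_size)

outputs = []
for t in range(x.size(0)):
    h, c = lstm_cell(x[t], (h, c))  # state goes in and comes out as a pair
    outputs.append(h)

out = torch.stack(outputs)
print(out.shape)  # torch.Size([10, 3, 100])
```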


In [58]:
lstm_seq = nn.LSTM(50, 100, num_layers=2)  # input dimension 50, hidden dimension 100, two layers


In [80]:
lstm_seq.weight_hh_l0  # weight applied to h_t in the first layer


Parameter containing:
1.00000e-02 *
 3.8420  5.7387  6.1351  ...   1.2680  0.9890  1.3037
-4.2301  6.8294 -4.8627  ...  -6.4147  4.3015  8.4103
 9.4411  5.0195  9.8620  ...  -1.6096  9.2516 -0.6941
          ...             ⋱             ...          
 1.2930 -1.3300 -0.9311  ...  -6.0891 -0.7164  3.9578
 9.0435  2.4674  9.4107  ...  -3.3822 -3.9773 -3.0685
-4.2039 -8.2992 -3.3605  ...   2.2875  8.2163 -9.3277
[torch.FloatTensor of size 400x100]

**Small exercise: think about why this weight has size (400, 100).**
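One way to check your answer: the PyTorch `nn.LSTM` documentation states that `weight_hh_l0` stacks the recurrent weights of the four gates (input, forget, cell, output, in that order) along the first dimension, giving 4 * 100 = 400 rows. The sketch below splits the matrix and recomputes one time step of the first layer by hand. The random initial state is an arbitrary choice so that the `weight_hh_l0` term actually contributes.

```python
import torch
from torch import nn

torch.manual_seed(0)
hidden = 100
lstm = nn.LSTM(50, hidden, num_layers=2)

# weight_hh_l0 is four (100, 100) gate matrices stacked along dim 0
W_hi, W_hf, W_hg, W_ho = lstm.weight_hh_l0.chunk(4, dim=0)
print(lstm.weight_hh_l0.shape, W_hi.shape)  # torch.Size([400, 100]) torch.Size([100, 100])

x = torch.randn(1, 3, 50)          # a single time step, batch 3
h_0 = torch.randn(2, 3, hidden)    # (num_layers, batch, hidden)
c_0 = torch.randn(2, 3, hidden)

with torch.no_grad():
    out, (h_n, c_n) = lstm(x, (h_0, c_0))
    # Pre-activations of all four gates of layer 0 at once: (3, 400)
    gates = (x[0] @ lstm.weight_ih_l0.T + lstm.bias_ih_l0
             + h_0[0] @ lstm.weight_hh_l0.T + lstm.bias_hh_l0)
    i, f, g, o = gates.chunk(4, dim=1)
    c1 = torch.sigmoid(f) * c_0[0] + torch.sigmoid(i) * torch.tanh(g)
    h1 = torch.sigmoid(o) * torch.tanh(c1)

assert torch.allclose(h1, h_n[0], atol=1e-5)  # layer 0 hidden state
assert torch.allclose(c1, c_n[0], atol=1e-5)  # layer 0 cell state
```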


In [59]:
lstm_input = Variable(torch.randn(10, 3, 50))  # sequence length 10, batch 3, input dimension 50


In [64]:
out, (h, c) = lstm_seq(lstm_input)  # use the default all-zero hidden state


Note that the LSTM outputs two hidden states, h and c, which correspond to the two arrows between consecutive cells in the figure above. Both have the same size: (num_layers * num_directions, batch, hidden_size).
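Since `out` holds the last layer's hidden state at every time step, while `h` holds every layer's hidden state at the final time step, the two overlap in exactly one place. A quick sketch to confirm this for a unidirectional LSTM with the sizes used here:

```python
import torch
from torch import nn

torch.manual_seed(0)
lstm_seq = nn.LSTM(50, 100, num_layers=2)
lstm_input = torch.randn(10, 3, 50)
out, (h, c) = lstm_seq(lstm_input)

# Last time step of out == final hidden state of the last layer
assert torch.allclose(out[-1], h[-1])

# c is the internal cell state and never appears in out
print(h.shape, c.shape, out.shape)
# torch.Size([2, 3, 100]) torch.Size([2, 3, 100]) torch.Size([10, 3, 100])
```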


In [66]:
h.shape  # two layers, batch 3, feature 100


torch.Size([2, 3, 100])

In [67]:
c.shape

torch.Size([2, 3, 100])

In [61]:
out.shape

torch.Size([10, 3, 100])

Instead of the default hidden state, we can supply our own; for an LSTM this means passing in two tensors, one for h and one for c.


In [68]:
h_init = Variable(torch.randn(2, 3, 100))
c_init = Variable(torch.randn(2, 3, 100))

In [69]:
out, (h, c) = lstm_seq(lstm_input, (h_init, c_init))

In [70]:
h.shape

torch.Size([2, 3, 100])

In [71]:
c.shape

torch.Size([2, 3, 100])

In [72]:
out.shape

torch.Size([10, 3, 100])
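A common reason to pass in the state explicitly is to process a long sequence in pieces. The sketch below, using the same sizes as above, splits the 10-step sequence into two halves and feeds the state returned by the first call into the second; the result should match a single call over the full sequence.

```python
import torch
from torch import nn

torch.manual_seed(0)
lstm_seq = nn.LSTM(50, 100, num_layers=2)
lstm_input = torch.randn(10, 3, 50)

with torch.no_grad():
    out_full, (h_full, c_full) = lstm_seq(lstm_input)  # whole sequence at once

    # Same sequence in two halves: the (h, c) pair returned by the first call
    # becomes the initial state of the second call
    out_a, state = lstm_seq(lstm_input[:5])
    out_b, (h_b, c_b) = lstm_seq(lstm_input[5:], state)

assert torch.allclose(torch.cat([out_a, out_b]), out_full, atol=1e-5)
assert torch.allclose(h_b, h_full, atol=1e-5)
assert torch.allclose(c_b, c_full, atol=1e-5)
```

When training this way, the carried state is often detached (`h.detach()`, `c.detach()`) so that gradients do not flow across chunk boundaries; that is a common practice (truncated backpropagation through time), not something the check above requires.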

## GRU
![](https://ws3.sinaimg.cn/large/006tKfTcly1fmtaj38y9sj30io06bmxc.jpg)

The GRU is used in the same way as the two modules above, so we won't go into detail; here is just a brief example.


In [73]:
gru_seq = nn.GRU(10, 20)
gru_input = Variable(torch.randn(3, 32, 10))

out, h = gru_seq(gru_input)

In [76]:
gru_seq.weight_hh_l0

Parameter containing:
 0.0766 -0.0548 -0.2008  ...  -0.0250 -0.1819  0.1453
-0.1676  0.1622  0.0417  ...   0.1905 -0.0071 -0.1038
 0.0444 -0.1516  0.2194  ...  -0.0009  0.0771  0.0476
          ...             ⋱             ...          
 0.1698 -0.1707  0.0340  ...  -0.1315  0.1278  0.0946
 0.1936  0.1369 -0.0694  ...  -0.0667  0.0429  0.1322
 0.0870 -0.1884  0.1732  ...  -0.1423 -0.1723  0.2147
[torch.FloatTensor of size 60x20]

In [75]:
h.shape

torch.Size([1, 32, 20])

In [74]:
out.shape

torch.Size([3, 32, 20])
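The earlier exercise carries over: `weight_hh_l0` has size (60, 20) because the GRU has three gates (reset, update, and the candidate "new" gate), each with a 20 x 20 recurrent matrix stacked along the first dimension. As a sketch, the following recomputes one step of an `nn.GRUCell` with the update equations from the PyTorch documentation. Note that the reset gate multiplies the entire hidden-side term of the candidate, bias included.

```python
import torch
from torch import nn

torch.manual_seed(0)
cell = nn.GRUCell(10, 20)
x = torch.randn(32, 10)   # one time step, batch 32
h = torch.randn(32, 20)   # arbitrary previous hidden state

with torch.no_grad():
    h_next = cell(x, h)
    gi = x @ cell.weight_ih.T + cell.bias_ih   # (32, 60): input-side terms
    gh = h @ cell.weight_hh.T + cell.bias_hh   # (32, 60): hidden-side terms
    ih_r, ih_z, ih_n = gi.chunk(3, dim=1)
    hh_r, hh_z, hh_n = gh.chunk(3, dim=1)
    r = torch.sigmoid(ih_r + hh_r)             # reset gate
    z = torch.sigmoid(ih_z + hh_z)             # update gate
    n = torch.tanh(ih_n + r * hh_n)            # candidate state
    h_manual = (1 - z) * n + z * h             # interpolate old and candidate state

print(cell.weight_hh.shape)  # torch.Size([60, 20])
assert torch.allclose(h_next, h_manual, atol=1e-5)
```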