### **This section introduces commonly used modern RNN modules**

In [1]:
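# Import libraries
from net_frame import *  # local helper module; supplies try_gpu and RNNModel used below
import torch
from d2l import torch as d2l  # alias as d2l so the name `torch` is not shadowed
from torch import nn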

**1. Gated Recurrent Unit (GRU)**
![GRU](images/gru.png)

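The components are computed as follows, where $\mathbf{R}_t$ is the reset gate, $\mathbf{Z}_t$ is the update gate, $\tilde{\mathbf{H}}_t$ is the candidate hidden state, and $\odot$ denotes the elementwise product: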
$$
\begin{aligned}
\mathbf{R}_t &= \sigma(\mathbf{X}_t \mathbf{W}_{xr} + \mathbf{H}_{t-1} \mathbf{W}_{hr} + \mathbf{b}_r),\\
\mathbf{Z}_t &= \sigma(\mathbf{X}_t \mathbf{W}_{xz} + \mathbf{H}_{t-1} \mathbf{W}_{hz} + \mathbf{b}_z),
\end{aligned}
$$

$$\tilde{\mathbf{H}}_t = \tanh(\mathbf{X}_t \mathbf{W}_{xh} + \left(\mathbf{R}_t \odot \mathbf{H}_{t-1}\right) \mathbf{W}_{hh} + \mathbf{b}_h),$$

$$\mathbf{H}_t = \mathbf{Z}_t \odot \mathbf{H}_{t-1}  + (1 - \mathbf{Z}_t) \odot \tilde{\mathbf{H}}_t.$$
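
To make the GRU equations concrete, here is a minimal from-scratch sketch of a single GRU time step in PyTorch. The helper names `init_gru_params` and `gru_step`, the `0.01` initialization scale, and the batch size are illustrative assumptions and not part of the original notebook. The code follows the equations in this section term by term and makes no claim to reproduce the internals of `nn.GRU`.

```python
import torch

def init_gru_params(num_inputs, num_hiddens):
    """Hypothetical helper: create (W_x*, W_h*, b_*) for R, Z and the candidate state."""
    def three():
        return (torch.randn(num_inputs, num_hiddens) * 0.01,   # W_x*
                torch.randn(num_hiddens, num_hiddens) * 0.01,  # W_h*
                torch.zeros(num_hiddens))                      # b_*
    return (*three(), *three(), *three())

def gru_step(X, H_prev, params):
    """One GRU time step, written to mirror the equations in this section."""
    W_xr, W_hr, b_r, W_xz, W_hz, b_z, W_xh, W_hh, b_h = params
    R = torch.sigmoid(X @ W_xr + H_prev @ W_hr + b_r)           # reset gate
    Z = torch.sigmoid(X @ W_xz + H_prev @ W_hz + b_z)           # update gate
    H_tilde = torch.tanh(X @ W_xh + (R * H_prev) @ W_hh + b_h)  # candidate state
    return Z * H_prev + (1 - Z) * H_tilde                       # new hidden state

batch_size, num_inputs, num_hiddens = 2, 100, 10  # assumed sizes for the demo
params = init_gru_params(num_inputs, num_hiddens)
X = torch.randn(batch_size, num_inputs)
H = torch.zeros(batch_size, num_hiddens)
H = gru_step(X, H, params)
print(H.shape)  # torch.Size([2, 10])
```

When $\mathbf{Z}_t$ is close to 1 the old state is carried over almost unchanged, and when it is close to 0 the state is replaced by the candidate, which is what the last line of `gru_step` expresses.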

In [4]:
# Concise implementation using the built-in nn.GRU layer
device = try_gpu()
vocab_size = 100
num_inputs = vocab_size
num_hiddens = 10
gru_layer = nn.GRU(num_inputs, num_hiddens)
model = RNNModel(gru_layer, vocab_size)
model = model.to(device)
# The model is now ready for training

**2. Long Short-Term Memory (LSTM)**

![LSTM](images/LSTM.png)

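The components are computed as follows, where $\mathbf{I}_t$, $\mathbf{F}_t$, and $\mathbf{O}_t$ are the input, forget, and output gates, $\tilde{\mathbf{C}}_t$ is the candidate memory cell, and $\mathbf{C}_t$ is the memory cell: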

$$
\begin{aligned}
\mathbf{I}_t &= \sigma(\mathbf{X}_t \mathbf{W}_{xi} + \mathbf{H}_{t-1} \mathbf{W}_{hi} + \mathbf{b}_i),\\
\mathbf{F}_t &= \sigma(\mathbf{X}_t \mathbf{W}_{xf} + \mathbf{H}_{t-1} \mathbf{W}_{hf} + \mathbf{b}_f),\\
\mathbf{O}_t &= \sigma(\mathbf{X}_t \mathbf{W}_{xo} + \mathbf{H}_{t-1} \mathbf{W}_{ho} + \mathbf{b}_o),
\end{aligned}
$$

$$\tilde{\mathbf{C}}_t = \tanh(\mathbf{X}_t \mathbf{W}_{xc} + \mathbf{H}_{t-1} \mathbf{W}_{hc} + \mathbf{b}_c),$$

$$\mathbf{C}_t = \mathbf{F}_t \odot \mathbf{C}_{t-1} + \mathbf{I}_t \odot \tilde{\mathbf{C}}_t,$$

$$\mathbf{H}_t = \mathbf{O}_t \odot \tanh(\mathbf{C}_t).$$
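
The LSTM equations can likewise be sketched as a single time step written from scratch. The helper names `init_lstm_params` and `lstm_step`, the `0.01` initialization scale, and the batch size are assumptions made for illustration only. The code mirrors the formulas in this section and is not meant to describe how `nn.LSTM` is implemented internally.

```python
import torch

def init_lstm_params(num_inputs, num_hiddens):
    """Hypothetical helper: create (W_x*, W_h*, b_*) for I, F, O and the candidate cell."""
    def three():
        return (torch.randn(num_inputs, num_hiddens) * 0.01,   # W_x*
                torch.randn(num_hiddens, num_hiddens) * 0.01,  # W_h*
                torch.zeros(num_hiddens))                      # b_*
    return (*three(), *three(), *three(), *three())

def lstm_step(X, H_prev, C_prev, params):
    """One LSTM time step, written to mirror the equations in this section."""
    (W_xi, W_hi, b_i, W_xf, W_hf, b_f,
     W_xo, W_ho, b_o, W_xc, W_hc, b_c) = params
    I = torch.sigmoid(X @ W_xi + H_prev @ W_hi + b_i)      # input gate
    F = torch.sigmoid(X @ W_xf + H_prev @ W_hf + b_f)      # forget gate
    O = torch.sigmoid(X @ W_xo + H_prev @ W_ho + b_o)      # output gate
    C_tilde = torch.tanh(X @ W_xc + H_prev @ W_hc + b_c)   # candidate memory cell
    C = F * C_prev + I * C_tilde                           # new memory cell
    H = O * torch.tanh(C)                                  # new hidden state
    return H, C

batch_size, num_inputs, num_hiddens = 2, 100, 10  # assumed sizes for the demo
params = init_lstm_params(num_inputs, num_hiddens)
X = torch.randn(batch_size, num_inputs)
H = torch.zeros(batch_size, num_hiddens)
C = torch.zeros(batch_size, num_hiddens)
H, C = lstm_step(X, H, C, params)
print(H.shape, C.shape)  # torch.Size([2, 10]) torch.Size([2, 10])
```

Unlike the GRU, the LSTM carries two states between time steps: the memory cell `C`, which is updated additively, and the hidden state `H`, which is a gated view of `C`.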

In [8]:
# Concise implementation using the built-in nn.LSTM layer
device = try_gpu()
vocab_size = 100
num_hiddens = 10
num_inputs = vocab_size
lstm_layer = nn.LSTM(num_inputs, num_hiddens)
model = RNNModel(lstm_layer, vocab_size)
model = model.to(device)
# The model is now ready for training


**3. Deep Recurrent Neural Networks**

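Recall that the RNN models used so far compute the hidden state with a single layer. A deep RNN simply stacks several such hidden layers, with the hidden-state sequence of each layer serving as the input sequence of the layer above it.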

![deep_rnn](images/deep_rnn.png)
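
As a quick illustration of what stacking layers changes, the sketch below passes a random batch through a multi-layer `nn.LSTM` and inspects the tensor shapes. The sequence length and batch size are assumed values chosen only for this demo. With the default `batch_first=False`, the input has shape `(num_steps, batch_size, num_inputs)`.

```python
import torch
from torch import nn

num_inputs, num_hiddens, num_layers = 100, 256, 5
num_steps, batch_size = 7, 4  # assumed values for the demo

lstm_layer = nn.LSTM(num_inputs, num_hiddens, num_layers)
X = torch.randn(num_steps, batch_size, num_inputs)
Y, (H, C) = lstm_layer(X)

# Y holds the top layer's hidden state at every time step
print(Y.shape)  # torch.Size([7, 4, 256])
# H and C hold the final hidden state and memory cell of every layer
print(H.shape)  # torch.Size([5, 4, 256])
print(C.shape)  # torch.Size([5, 4, 256])
```

Only the top layer's hidden states appear in `Y`, so the output layer that follows sees the same feature size as in the single-layer case. The extra layers show up only in the first dimension of the returned state.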

In [None]:
# Implementation
vocab_size, num_hiddens, num_layers = 100, 256, 5
num_inputs = vocab_size
device = try_gpu()
lstm_layer = nn.LSTM(num_inputs, num_hiddens, num_layers)  # the only change: pass num_layers to nn.LSTM
model = RNNModel(lstm_layer, vocab_size)
model = model.to(device)


**4. Bidirectional Recurrent Neural Networks**

![double_direct_rnn](images/double_direct_rnn.png)

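Because a bidirectional RNN uses both past and future data, this language model cannot be applied blindly to arbitrary prediction tasks. During training the backward direction sees the tokens that follow each position, but those tokens are not available when predicting the next token. As a result, although the perplexity the model produces may look reasonable, its ability to predict future tokens can be severely flawed.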
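
The effect of the second direction on tensor shapes can be checked with a short sketch. The layer count, sequence length, and batch size below are assumed values for the demo. The forward and backward hidden states are concatenated along the feature dimension, so the output has `2 * num_hiddens` features.

```python
import torch
from torch import nn

num_inputs, num_hiddens, num_layers = 100, 256, 2
num_steps, batch_size = 7, 4  # assumed values for the demo

net = nn.LSTM(num_inputs, num_hiddens, num_layers, bidirectional=True)
X = torch.randn(num_steps, batch_size, num_inputs)
Y, (H, C) = net(X)

# Forward and backward hidden states are concatenated in the last dimension
print(Y.shape)  # torch.Size([7, 4, 512])
# One final state per layer and per direction
print(H.shape)  # torch.Size([4, 4, 256])
```

Any layer placed on top of a bidirectional RNN therefore needs to accept `2 * num_hiddens` input features instead of `num_hiddens`.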

In [None]:
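# Concise implementation (reuses num_inputs, num_hiddens and num_layers from the previous cell)
net = nn.LSTM(num_inputs, num_hiddens, num_layers, bidirectional=True)  # setting bidirectional=True is all that is needed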