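# Recurrent Neural Network Modules in `TensorFlow`
The previous sections covered the basics and the structure of recurrent neural networks. Here we show how to build them in `TensorFlow`.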

## Vanilla RNN

![](https://ws1.sinaimg.cn/large/006tKfTcly1fmt9xz889xj30kb07nglo.jpg)

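For the simplest RNN there are two APIs to work with: `tf.nn.rnn_cell.BasicRNNCell()` and `tf.nn.dynamic_rnn()`. `BasicRNNCell()` processes a single time step of the sequence and must be given the hidden state explicitly. `dynamic_rnn()` takes a cell plus a whole sequence; if `initial_state` is omitted it starts from an all-zero state (in that case `dtype` must be supplied), or you can pass in an initial state of your own.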

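`BasicRNNCell()` takes the following parameters:

num_units: the number of units in the cell, which is also the feature dimension of the output

activation: the activation function, `tanh` by default

reuse: whether to reuse variables that already exist in the current scope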
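
To make the role of `num_units` and `activation` concrete, here is a minimal NumPy sketch of what one step of a vanilla RNN cell computes. It is not the `TensorFlow` implementation: the function `basic_rnn_step` and the weight names `W_x`, `W_h`, `b` are hypothetical names chosen for this illustration.

```python
import numpy as np

def basic_rnn_step(x_t, h_prev, W_x, W_h, b, activation=np.tanh):
    """One vanilla RNN step: h_t = activation(x_t @ W_x + h_prev @ W_h + b)."""
    h_t = activation(x_t @ W_x + h_prev @ W_h + b)
    # For a vanilla RNN the step output and the new hidden state are the same array
    return h_t, h_t

rng = np.random.default_rng(0)
batch_size, input_dim, num_units = 5, 100, 200

# Hypothetical parameters; a real cell creates and trains these itself
W_x = rng.normal(scale=0.1, size=(input_dim, num_units))
W_h = rng.normal(scale=0.1, size=(num_units, num_units))
b = np.zeros(num_units)

x_t = rng.normal(size=(batch_size, input_dim))  # one time step for the whole batch
h0 = np.zeros((batch_size, num_units))          # all-zero initial state

out, h1 = basic_rnn_step(x_t, h0, W_x, W_h, b)
print(out.shape)  # (5, 200)
```

The last dimension of the output is `num_units`, regardless of the input feature size.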

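`dynamic_rnn` has a richer set of parameters. The most important ones are:

cell: the `RNNCell` instance to run

inputs: the input sequence tensor

initial_state: the initial state

time_major: whether the time axis is the first dimension of the input. If it is, the input has shape `[max_time, batch_size, depth]`; otherwise it has shape `[batch_size, max_time, depth]`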
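
The two layouts selected by `time_major` differ only in the order of the first two axes. The NumPy sketch below, using made-up data, converts a batch-major array into the time-major layout.

```python
import numpy as np

batch_size, max_time, depth = 5, 6, 100

# Hypothetical batch-major data: [batch_size, max_time, depth]
batch_major = np.arange(batch_size * max_time * depth, dtype=np.float32)
batch_major = batch_major.reshape(batch_size, max_time, depth)

# Swap the first two axes to get the time-major layout: [max_time, batch_size, depth]
time_major = np.transpose(batch_major, (1, 0, 2))

print(batch_major.shape)  # (5, 6, 100)
print(time_major.shape)   # (6, 5, 100)

# In the time-major layout, indexing the first axis yields one time step for the whole batch
step_0 = time_major[0]
print(step_0.shape)       # (5, 100)
```

The examples in this section build `x` with shape `[6, 5, 100]`, that is, time first, which is why they pass `time_major=True`.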

In [1]:
from __future__ import print_function
from __future__ import division
from __future__ import absolute_import

import tensorflow as tf

- Define a basic `rnn` cell

In [2]:
rnn_single = tf.nn.rnn_cell.BasicRNNCell(200)

Instructions for updating:
This class is equivalent as tf.keras.layers.SimpleRNNCell, and will be replaced by that in Tensorflow 2.0.


In [3]:
# Build a sequence of length 6 with batch size 5 and 100 features
x = tf.random_normal([6, 5, 100])

- Get an all-zero initial state

In [10]:
init_state = rnn_single.zero_state(5, tf.float32)

- Loop over the time steps to produce the `rnn` outputs

In [13]:
# Start from the initial state
state = init_state

In [14]:
outputs = []
for i in range(6):
    # After the first call to `rnn_single`, reuse its variables instead of creating new ones
    if i > 0: 
        tf.get_variable_scope().reuse_variables()
    out, state = rnn_single(x[i], state)
    outputs.append(out)

Instructions for updating:
Colocations handled automatically by placer.


Let's look at the result

In [15]:
sess = tf.InteractiveSession()

sess.run(tf.global_variables_initializer())

In [16]:
print(state.shape)
print(sess.run(state)[:, :6])

(5, 200)
[[ 0.23740672 -0.63808864 -0.44034815  0.23360763  0.2753587  -0.23616292]
 [ 0.16947421 -0.20845452  0.01050325  0.00670506  0.14466554 -0.74739504]
 [ 0.593006   -0.35832733  0.63541234  0.9081754   0.92346406 -0.8201202 ]
 [ 0.78909206 -0.60555834 -0.28283122  0.31207624  0.22151065  0.0539156 ]
 [-0.7323768  -0.58282286 -0.58387053  0.40701184  0.81881887  0.36537918]]


In [17]:
print(len(outputs))

6


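As shown, the state of `BasicRNNCell` has shape `[batch, num_units]`, and we have to call the cell by hand at every time step, passing the state along, to collect the outputs. That makes the code verbose. `dynamic_rnn` hides the loop over time and runs it for us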

- `dynamic_rnn`

In [18]:
out, final_state = tf.nn.dynamic_rnn(rnn_single, x, initial_state=init_state, time_major=True)

Instructions for updating:
Please use `keras.layers.RNN(cell)`, which is equivalent to this API


In [19]:
print(final_state.shape)
print(sess.run(final_state)[:, :6])

(5, 200)
[[ 0.86282545 -0.1959296   0.8673103  -0.8565273  -0.02591902 -0.04155239]
 [-0.36076847  0.49511367 -0.7887822   0.24777052  0.84505177 -0.2214899 ]
 [ 0.796027    0.23960643  0.6556977  -0.54825115  0.31795624 -0.6841595 ]
 [-0.21353531  0.6054556   0.94461817  0.33738232  0.24672262  0.721434  ]
 [ 0.64385796 -0.8875018   0.8214519   0.45600793  0.7053377  -0.66932195]]


In [20]:
print(out.shape)

(6, 5, 200)


As we can see, `dynamic_rnn` returns the outputs of all time steps together with the state after the last step
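
The relationship between the two return values can be sketched in NumPy. The function `unroll_rnn` below is a hypothetical stand-in written for this illustration, not the `dynamic_rnn` implementation; it unrolls a vanilla RNN over a time-major sequence of full-length inputs.

```python
import numpy as np

def unroll_rnn(x, h0, W_x, W_h, b):
    """Run a vanilla RNN over a time-major sequence x of shape [max_time, batch, depth]."""
    outputs = []
    h = h0
    for x_t in x:  # iterating over the first axis walks through the time steps
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        outputs.append(h)
    # Stack the per-step outputs into [max_time, batch, num_units]; h is the last state
    return np.stack(outputs), h

rng = np.random.default_rng(0)
max_time, batch_size, depth, num_units = 6, 5, 100, 200

# Hypothetical inputs and parameters
x_np = rng.normal(size=(max_time, batch_size, depth))
W_x = rng.normal(scale=0.1, size=(depth, num_units))
W_h = rng.normal(scale=0.1, size=(num_units, num_units))
b = np.zeros(num_units)
h0 = np.zeros((batch_size, num_units))

out_np, final_np = unroll_rnn(x_np, h0, W_x, W_h, b)
print(out_np.shape)                        # (6, 5, 200)
print(final_np.shape)                      # (5, 200)
print(np.allclose(out_np[-1], final_np))   # True
```

For a vanilla RNN the last slice of the outputs coincides with the final state, because the cell's output is its hidden state.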

We can also supply a custom initial state

In [21]:
# Draw the initial state from a random normal distribution
init_state = tf.random_normal([5, 200], dtype=tf.float32)

In [22]:
out, final_state = tf.nn.dynamic_rnn(rnn_single, x, initial_state=init_state, time_major=True)

In [23]:
print(sess.run(final_state[:, :6]))

[[-0.22710875  0.05476503 -0.42959926 -0.5678      0.44084245 -0.23703308]
 [ 0.34647667 -0.70813704 -0.8357367   0.65132725  0.23925468 -0.62708473]
 [ 0.6513476  -0.2738228   0.40910715  0.13683943 -0.88606876  0.07258287]
 [ 0.00497269  0.86790895 -0.33660704 -0.2338637  -0.48237967 -0.5952984 ]
 [ 0.05422073 -0.65117145  0.20116867  0.9408332  -0.14599302 -0.84643024]]


`RNN`s overfit fairly easily during training, so we add `dropout`

- Adding `dropout` to an `RNNCell`

In [24]:
# Wrap the cell with `dropout` regularization
def build_rnn(num_units, batch_size, keep_prob=1):
    cell = tf.nn.rnn_cell.BasicRNNCell(num_units)
    cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=keep_prob)
        
    return cell

In [25]:
dropout_cell = build_rnn(100, 3, 0.5)
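
To show what `output_keep_prob` means, here is a minimal NumPy sketch of dropout applied to a cell's output. The helper `dropout` is a hypothetical function written for this illustration: each element is kept with probability `keep_prob` and the kept elements are scaled by `1 / keep_prob`, so the expected value of the output is unchanged.

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Keep each element with probability keep_prob and rescale the survivors."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
cell_output = np.ones((1000, 200))  # hypothetical cell output, all ones for readability

dropped = dropout(cell_output, keep_prob=0.5, rng=rng)
print(np.unique(dropped))  # only 0.0 (dropped) and 2.0 (kept and rescaled) occur
print(dropped.mean())      # close to 1.0

# With keep_prob=1 nothing is dropped, which is the setting to use at evaluation time
unchanged = dropout(cell_output, keep_prob=1.0, rng=rng)
```

This is also why `build_rnn` defaults to `keep_prob=1`: the wrapper then passes the outputs through untouched.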

## LSTM

![](https://ws1.sinaimg.cn/large/006tKfTcly1fmt9qj3uhmj30iz07ct90.jpg)

LSTM is used in the same way as the basic RNN and takes largely the same parameters, so we will not repeat them
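
Although the API is the same, an LSTM cell carries two state arrays instead of one. The NumPy sketch below shows one step of a standard LSTM under assumptions made for this illustration: the function `lstm_step`, the fused weight matrix `W`, and the gate order (input, candidate, forget, output) are hypothetical choices and are not claimed to match the internal layout of `LSTMCell`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, c_prev, h_prev, W, b):
    """One LSTM step. W: [input_dim + num_units, 4 * num_units], b: [4 * num_units]."""
    z = np.concatenate([x_t, h_prev], axis=1) @ W + b
    i, g, f, o = np.split(z, 4, axis=1)  # assumed gate order for this sketch
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # new cell state
    h_t = sigmoid(o) * np.tanh(c_t)                      # new hidden state, also the output
    return h_t, (c_t, h_t)

rng = np.random.default_rng(0)
batch_size, input_dim, num_units = 5, 100, 100

# Hypothetical parameters and inputs
W = rng.normal(scale=0.1, size=(input_dim + num_units, 4 * num_units))
b = np.zeros(4 * num_units)
x_t = rng.normal(size=(batch_size, input_dim))
c0 = np.zeros((batch_size, num_units))
h0 = np.zeros((batch_size, num_units))

lstm_out, (c1, h1) = lstm_step(x_t, c0, h0, W, b)
print(c1.shape, h1.shape)  # (5, 100) (5, 100)
```

Both parts of the state have shape `[batch, num_units]`, and the step output is the `h` part.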

RNNs can also be stacked into deep networks with `tf.nn.rnn_cell.MultiRNNCell`. Using it is simple: build a `list` of cells and pass it in as the argument

- `LSTM` and `MultiRNNCell`

In [26]:
def build_lstm(num_units, num_layers, batch_size, keep_prob=1):
    def build_cell(num_units):
        cell = tf.nn.rnn_cell.LSTMCell(num_units, reuse=tf.AUTO_REUSE)
        cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=keep_prob)
        
        return cell
    
    cell = tf.nn.rnn_cell.MultiRNNCell([build_cell(num_units) for _ in range(num_layers)])
    init_state = cell.zero_state(batch_size, tf.float32)
    
    return cell, init_state

Build a 2-layer `lstm` model

In [27]:
lstm_cell, lstm_init_state = build_lstm(100, 2, 5)

Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
Instructions for updating:
This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.


In [28]:
lstm, final_state =  tf.nn.dynamic_rnn(lstm_cell, x, initial_state=lstm_init_state, time_major=True)

We now have a 2-layer model, and the state of each layer consists of `c` and `h`

In [29]:
for i, layer_state in enumerate(final_state):
    print('layer {}'.format(i))
    print('c.shape: {}'.format(layer_state.c.shape))
    print('h.shape: {}'.format(layer_state.h.shape))

layer 0
c.shape: (5, 100)
h.shape: (5, 100)
layer 1
c.shape: (5, 100)
h.shape: (5, 100)


The output has the same form as before

In [30]:
print(lstm.shape)

(6, 5, 100)
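
The idea behind stacking can be sketched in NumPy: at each time step the output of one layer becomes the input of the next, every layer keeps its own state, and the output of the stack is the output of the top layer. To keep the sketch short it uses vanilla RNN layers rather than LSTM layers; `make_layer` and `stacked_step` are hypothetical helpers written for this illustration.

```python
import numpy as np

def make_layer(input_dim, num_units, rng):
    """Create hypothetical parameters for one vanilla RNN layer."""
    return {
        "W_x": rng.normal(scale=0.1, size=(input_dim, num_units)),
        "W_h": rng.normal(scale=0.1, size=(num_units, num_units)),
        "b": np.zeros(num_units),
    }

def stacked_step(x_t, states, layers):
    """Run one time step through all layers, feeding each output to the next layer."""
    new_states = []
    layer_input = x_t
    for params, h_prev in zip(layers, states):
        h = np.tanh(layer_input @ params["W_x"] + h_prev @ params["W_h"] + params["b"])
        new_states.append(h)
        layer_input = h  # the next layer consumes this layer's output
    return layer_input, tuple(new_states)

rng = np.random.default_rng(0)
batch_size, input_dim, num_units, num_layers = 5, 100, 100, 2

# The first layer sees the input features; later layers see num_units features
layers = [make_layer(input_dim if k == 0 else num_units, num_units, rng)
          for k in range(num_layers)]
states = tuple(np.zeros((batch_size, num_units)) for _ in range(num_layers))

x_t = rng.normal(size=(batch_size, input_dim))
top_out, new_states = stacked_step(x_t, states, layers)
print(top_out.shape)    # (5, 100)
print(len(new_states))  # 2
```

This mirrors the structure seen above: one state entry per layer, and an output whose last dimension is the `num_units` of the top layer.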


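- Custom state initialization

Sometimes we do not want to initialize with zeros. Because an LSTM state has two parts, `tensorflow` represents the state of one `LSTMCell` with an `LSTMStateTuple`, which has the following fields

`c`: the cell state

`h`: the hidden state

We use `tuple` and a `for` comprehension to define the initial state of this 2-layer model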

In [31]:
init_state = tuple([tf.nn.rnn_cell.LSTMStateTuple(tf.random_normal([5, 100]), tf.random_normal([5, 100])) for _ in range(2)])

In [32]:
for i, layer_state in enumerate(init_state):
    print('layer {}'.format(i))
    print('c.shape: {}'.format(layer_state.c.shape))
    print('h.shape: {}'.format(layer_state.h.shape))

layer 0
c.shape: (5, 100)
h.shape: (5, 100)
layer 1
c.shape: (5, 100)
h.shape: (5, 100)


In [33]:
lstm, final_state =  tf.nn.dynamic_rnn(lstm_cell, x, initial_state=init_state, time_major=True)

## GRU
![](https://ws3.sinaimg.cn/large/006tKfTcly1fmtaj38y9sj30io06bmxc.jpg)

GRU works the same way as the two cells above, so we skip the details and just walk through an example

In [None]:
def build_gru(num_units, num_layers, batch_size, keep_prob=1):
    def build_cell(num_units):
        cell = tf.nn.rnn_cell.GRUCell(num_units, reuse=tf.AUTO_REUSE)
        cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=keep_prob)
            
        return cell
    
    cell = tf.nn.rnn_cell.MultiRNNCell([build_cell(num_units) for _ in range(num_layers)])
    init_state = cell.zero_state(batch_size, tf.float32)
    
    return cell, init_state

In [None]:
gru_cell, gru_init_state = build_gru(100, 2, 5)

In [None]:
print(gru_init_state)

In [None]:
gru, final_state =  tf.nn.dynamic_rnn(gru_cell, x, initial_state=gru_init_state, time_major=True)
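
Unlike an LSTM, a GRU keeps a single state array per layer, so the state of the 2-layer model above is a tuple of two tensors rather than `(c, h)` pairs. The NumPy sketch below shows one GRU step under assumptions made for this illustration: the function `gru_step` and the weight names are hypothetical, and the update rule `h_t = u * h_prev + (1 - u) * candidate` is one common convention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_gates, b_gates, W_cand, b_cand):
    """One GRU step with reset gate r and update gate u."""
    xh = np.concatenate([x_t, h_prev], axis=1)
    r, u = np.split(sigmoid(xh @ W_gates + b_gates), 2, axis=1)
    # The candidate state sees the previous state only through the reset gate
    cand = np.tanh(np.concatenate([x_t, r * h_prev], axis=1) @ W_cand + b_cand)
    h_t = u * h_prev + (1.0 - u) * cand
    return h_t, h_t  # output and state are the same array

rng = np.random.default_rng(0)
batch_size, input_dim, num_units = 5, 100, 100

# Hypothetical parameters and inputs
W_gates = rng.normal(scale=0.1, size=(input_dim + num_units, 2 * num_units))
b_gates = np.zeros(2 * num_units)
W_cand = rng.normal(scale=0.1, size=(input_dim + num_units, num_units))
b_cand = np.zeros(num_units)

x_t = rng.normal(size=(batch_size, input_dim))
h0 = np.zeros((batch_size, num_units))

gru_out, gru_h1 = gru_step(x_t, h0, W_gates, b_gates, W_cand, b_cand)
print(gru_out.shape)  # (5, 100)
```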