## 循环神经网络 — 使用tensorflow
 本节介绍如何使用tensorflow训练循环神经网络。

### Penn Tree Bank (PTB) 数据集
我们以单词为基本元素来训练语言模型。Penn Tree Bank（PTB）是一个标准的文本序列数据集。它包括训练集、验证集和测试集。

下面我们载入数据集。



In [1]:
import math
import os
import time
import numpy as np

import zipfile
with zipfile.ZipFile('../../data/ptb.zip', 'r') as zin:
    zin.extractall('../../data/')

### 建立词语索引
下面定义了Dictionary类来映射词语和索引。



In [2]:
class Dictionary(object):
    def __init__(self):
        self.word_to_idx = {}
        self.idx_to_word = []

    def add_word(self, word):
        if word not in self.word_to_idx:
            self.idx_to_word.append(word)
            self.word_to_idx[word] = len(self.idx_to_word) - 1
        return self.word_to_idx[word]

    def __len__(self):
        return len(self.idx_to_word)

以下的`Corpus`类按照读取的文本数据集建立映射词语和索引的词典，并将文本转换成词语索引的序列。这样，每个文本数据集就变成了`NDArray`格式的整数序列。



In [3]:
class Corpus(object):
    def __init__(self, path):
        self.dictionary = Dictionary()
        self.train = self.tokenize(path + 'train.txt')
        self.valid = self.tokenize(path + 'valid.txt')
        self.test = self.tokenize(path + 'test.txt')

    def tokenize(self, path):
        assert os.path.exists(path)
        # 将词语添加至词典。

        with open(path, 'r') as f:
            tokens = 0
            for line in f:
                words = line.split() + ['<eos>']
                tokens += len(words)
                for word in words:
                    self.dictionary.add_word(word)
        # 将文本转换成词语索引的序列（NDArray格式）。
        with open(path, 'r') as f:
            indices = np.zeros((tokens,), dtype='int32')
            idx = 0
            for line in f:
                words = line.split() + ['<eos>']
                for word in words:
                    indices[idx] = self.dictionary.word_to_idx[word]
                    idx += 1
        return np.array(indices, dtype='int32')

看一下词典的大小。



In [4]:
data = '../../data/ptb/ptb.'
corpus = Corpus(data)
vocab_size = len(corpus.dictionary)
vocab_size

10000

### 循环神经网络模型库
我们可以定义一个循环神经网络模型库。这样就可以支持各种不同的循环神经网络模型了。

In [5]:
import tensorflow as tf
class RNNModel():
    """循环神经网络模型库"""
    def __init__(self, mode, vocab_size, embed_dim, hidden_dim,
                 num_layers, dropout=0.5):
        with tf.variable_scope('rnn', reuse=tf.AUTO_REUSE):
            self.drop = dropout
            self.embedding_weights = tf.get_variable("embedding", shape=[vocab_size, embed_dim], dtype=tf.float32)
            self.mode = mode
            self.num_layers = num_layers
            self.vocab_size = vocab_size
            self.hidden_dim = hidden_dim
            
            if mode == 'rnn_relu':
                self.cells = tf.contrib.rnn.BasicRNNCell(num_units=hidden_dim, activation=tf.nn.relu)
            elif mode == 'rnn_tanh':
                self.cells = tf.contrib.rnn.BasicRNNCell(num_units=hidden_dim, activation=tf.nn.tanh)
            elif mode == 'gru_':
                self.cells = tf.contrib.rnn.GRUCell(num_units=hidden_dim, activation=tf.nn.tanh, state_is_tuple=True)
            elif mode == 'lstm':
                self.cells = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_dim, activation=tf.nn.tanh, state_is_tuple=True)
            else:
                raise ValueError("Invalid mode %s. Options are rnn_relu, "
                                 "rnn_tanh, lstm, and gru"%mode)        

    def forward(self, inputs, state, is_training):
        with tf.variable_scope('rnn', reuse=tf.AUTO_REUSE):
            emb = tf.nn.embedding_lookup(self.embedding_weights, inputs)
            '''
            if is_training:
                emb =  tf.nn.dropout(emb, self.drop)
                cells = tf.contrib.rnn.DropoutWrapper(self.cells, output_keep_prob=self.drop)
            '''
            num_steps = emb.get_shape().as_list()[0]
            emb_list = []
            for i in range(num_steps):
                emb_list.append(emb[i])
            cells = self.cells
            cells = tf.contrib.rnn.MultiRNNCell([cells] * self.num_layers, state_is_tuple=True)
            initial_state = []
            for i in range(self.num_layers):
                initial_state.append(state[i])
            print initial_state

            outputs, last_state = tf.nn.static_rnn(cells, emb_list, initial_state=initial_state)
            decoded = tf.contrib.layers.fully_connected(outputs, self.vocab_size, activation_fn=None)
        return decoded, last_state


### 定义参数
我们接着定义模型参数。我们选择使用`ReLU`为激活函数的循环神经网络为例。这里我们把`epochs`设为$1$是为了演示方便。

### 多层循环神经网络
我们通过`num_layers`设置循环神经网络隐含层的层数，例如$2$。

对于一个多层循环神经网络，当前时刻隐含层的输入来自同一时刻输入层（如果有）或上一隐含层的输出。每一层的隐含状态只沿着同一层传递。

把单层循环神经网络中隐含层的每个单元当做一个函数$f$，这个函数在$t$时刻的输入是$\mathbf{X}_t, \mathbf{H}_{t-1}$，输出是$\mathbf{H}_t$：

$f(\mathbf{X}_t, \mathbf{H}_{t-1}) = \mathbf{H}_t$

假设输入为第$0$层，输出为第$L+1$层，在一共$L$个隐含层的循环神经网络中，上式中可以拓展成以下的函数:

$f(\mathbf{H}_t^{(l-1)}, \mathbf{H}_{t-1}^{(l)}) = \mathbf{H}_t^{(l)}$

如下图所示。

![image.png](http://zh.gluon.ai/_images/multi-layer-rnn.svg)

In [6]:
model_name = 'rnn_tanh'

embed_dim = 100
hidden_dim = 100
num_layers = 2
clipping_norm = 0.2
batch_size = 32
num_steps = 5
dropout_rate = 1.0


### 批量采样
我们将数据进一步处理为便于相邻批量采样的格式。



In [7]:
def batchify(data, batch_size):
    """数据形状 (num_batches, batch_size)"""
    num_batches = data.shape[0] // batch_size
    data = data[:num_batches * batch_size]
    data = data.reshape((batch_size, num_batches)).T
    return data


def get_batch(source, i):
    seq_len = min(num_steps, source.shape[0] - 1 - i)
    data = source[i : i + seq_len]
    target = source[i + 1 : i + 1 + seq_len]
    return data, target

train_data = batchify(corpus.train, batch_size)
val_data = batchify(corpus.valid, batch_size)
test_data = batchify(corpus.test, batch_size)

model = RNNModel(model_name, vocab_size, embed_dim, hidden_dim,
                       num_layers, dropout_rate)



In [8]:
slim = tf.contrib.slim

max_steps = 10000
batch_size = 32
num_epochs= 40
learning_rate = 1e0
eval_period = 100

#训练
print hidden_dim
num_inputs = num_outputs = vocab_size
input_placeholder = tf.placeholder(tf.int64, [num_steps, batch_size])
state_placeholder = tf.placeholder(tf.float32, [num_layers, batch_size, hidden_dim])
gt_placeholder = tf.placeholder(tf.int64, [num_steps, batch_size, 1])

state_
outputs, state = model.forward(input_placeholder, state_placeholder, is_training=True)

outputs = tf.concat(outputs, axis=0)
loss = tf.losses.sparse_softmax_cross_entropy(logits=tf.reshape(outputs, [batch_size * num_steps, vocab_size]),  labels=tf.reshape(gt_placeholder, (num_steps*batch_size, 1)))

print loss

params = []
var_list = tf.trainable_variables()
for var in var_list:
    print var.op.name
params.append(var)


op = tf.train.GradientDescentOptimizer(learning_rate)

gradients = tf.gradients(loss, params)

#process gradients
clipped_gradients, norm = tf.clip_by_global_norm(gradients, 5)

train_op = op.apply_gradients(zip(clipped_gradients, params))

init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)

for epoch in range(num_epochs):
    state_init = np.zeros(shape=(num_layers, batch_size, hidden_dim))
    for ibatch, i in enumerate(range(0, train_data.shape[0] - 1 - num_steps, num_steps)):

        data, target = get_batch(train_data, i)
        feed_dict = {input_placeholder: data, state_placeholder: state_init, gt_placeholder: np.expand_dims(target, axis=-1)}
        loss_, state_, _ = sess.run([loss, state, train_op], feed_dict=feed_dict)
        state_init = state_
        if ibatch % eval_period == 0 and ibatch > 0:
            print('[Epoch %d Batch %d] loss %.2f, perplexity %.2f' % (
                epoch + 1, ibatch, loss_, math.exp(loss_)))
    
    '''
    total_L = 0.0
    ntotal = 0
    state_init = np.zeros(shape=(num_layers, batch_size, hidden_dim))
    for i in range(0, val_data.shape[0] - 1 - num_steps, num_steps):
        data, target = get_batch(val_data, i)
        feed_dict = {input_placeholder: data, state_placeholder: state_init, gt_placeholder: np.expand_dims(target, axis=-1)}
        loss_, state_ = sess.run([loss, state], feed_dict=feed_dict)
        state_init = state_
        total_L += loss_
        ntotal += 1
    
    total_L /= ntotal
    print('[Epoch %d] , validation loss %.2f, validation perplexity %.2f' % (epoch + 1, total_L, math.exp(total_L)))
    '''

100
> <ipython-input-5-1d9b6898ce6c>(30)forward()
-> emb = tf.nn.embedding_lookup(self.embedding_weights, inputs)
(Pdb) c
[<tf.Tensor 'rnn_1/strided_slice_5:0' shape=(32, 100) dtype=float32>, <tf.Tensor 'rnn_1/strided_slice_6:0' shape=(32, 100) dtype=float32>]
Tensor("sparse_softmax_cross_entropy_loss/value:0", shape=(), dtype=float32)
rnn/embedding
rnn/rnn/multi_rnn_cell/cell_0/basic_rnn_cell/kernel
rnn/rnn/multi_rnn_cell/cell_0/basic_rnn_cell/bias
rnn/fully_connected/weights
rnn/fully_connected/biases
[Epoch 1 Batch 100] loss 8.21, perplexity 3678.74
[Epoch 1 Batch 200] loss 7.51, perplexity 1834.23
[Epoch 1 Batch 300] loss 7.28, perplexity 1444.55
[Epoch 1 Batch 400] loss 7.28, perplexity 1453.88
[Epoch 1 Batch 500] loss 7.20, perplexity 1340.72
[Epoch 1 Batch 600] loss 6.92, perplexity 1008.93
[Epoch 1 Batch 700] loss 7.08, perplexity 1182.86
[Epoch 1 Batch 800] loss 7.48, perplexity 1768.37
[Epoch 1 Batch 900] loss 6.91, perplexity 1004.55
[Epoch 1 Batch 1000] loss 6.75, perplexit

[Epoch 3 Batch 3900] loss 6.38, perplexity 587.49
[Epoch 3 Batch 4000] loss 6.58, perplexity 717.86
[Epoch 3 Batch 4100] loss 6.24, perplexity 511.64
[Epoch 3 Batch 4200] loss 6.40, perplexity 603.93
[Epoch 3 Batch 4300] loss 6.28, perplexity 536.39
[Epoch 3 Batch 4400] loss 6.59, perplexity 725.36
[Epoch 3 Batch 4500] loss 6.32, perplexity 554.26
[Epoch 3 Batch 4600] loss 6.64, perplexity 764.63
[Epoch 3 Batch 4700] loss 6.80, perplexity 894.76
[Epoch 3 Batch 4800] loss 6.49, perplexity 661.02
[Epoch 3 Batch 4900] loss 6.53, perplexity 684.22
[Epoch 3 Batch 5000] loss 6.32, perplexity 558.12
[Epoch 3 Batch 5100] loss 6.88, perplexity 973.03
[Epoch 3 Batch 5200] loss 6.72, perplexity 826.55
[Epoch 3 Batch 5300] loss 6.60, perplexity 738.31
[Epoch 3 Batch 5400] loss 6.91, perplexity 1000.91
[Epoch 3 Batch 5500] loss 6.69, perplexity 806.26
[Epoch 3 Batch 5600] loss 6.62, perplexity 750.38
[Epoch 3 Batch 5700] loss 6.28, perplexity 535.38
[Epoch 3 Batch 5800] loss 6.46, perplexity 637.35

[Epoch 6 Batch 3000] loss 6.79, perplexity 885.22
[Epoch 6 Batch 3100] loss 6.41, perplexity 605.07
[Epoch 6 Batch 3200] loss 6.49, perplexity 656.03
[Epoch 6 Batch 3300] loss 6.80, perplexity 899.10
[Epoch 6 Batch 3400] loss 6.41, perplexity 605.85
[Epoch 6 Batch 3500] loss 6.31, perplexity 548.54
[Epoch 6 Batch 3600] loss 6.61, perplexity 741.70
[Epoch 6 Batch 3700] loss 6.52, perplexity 681.83
[Epoch 6 Batch 3800] loss 6.39, perplexity 594.61
[Epoch 6 Batch 3900] loss 6.35, perplexity 572.09
[Epoch 6 Batch 4000] loss 6.54, perplexity 694.68
[Epoch 6 Batch 4100] loss 6.21, perplexity 498.04
[Epoch 6 Batch 4200] loss 6.39, perplexity 597.73
[Epoch 6 Batch 4300] loss 6.25, perplexity 516.74
[Epoch 6 Batch 4400] loss 6.54, perplexity 693.07
[Epoch 6 Batch 4500] loss 6.29, perplexity 540.74
[Epoch 6 Batch 4600] loss 6.60, perplexity 738.01
[Epoch 6 Batch 4700] loss 6.78, perplexity 880.47
[Epoch 6 Batch 4800] loss 6.47, perplexity 645.08
[Epoch 6 Batch 4900] loss 6.51, perplexity 670.42


[Epoch 9 Batch 2100] loss 6.47, perplexity 646.58
[Epoch 9 Batch 2200] loss 6.53, perplexity 682.09
[Epoch 9 Batch 2300] loss 6.78, perplexity 878.30
[Epoch 9 Batch 2400] loss 6.55, perplexity 698.17
[Epoch 9 Batch 2500] loss 6.40, perplexity 598.94
[Epoch 9 Batch 2600] loss 6.31, perplexity 547.49
[Epoch 9 Batch 2700] loss 5.97, perplexity 390.22
[Epoch 9 Batch 2800] loss 6.77, perplexity 871.12
[Epoch 9 Batch 2900] loss 6.60, perplexity 733.69
[Epoch 9 Batch 3000] loss 6.78, perplexity 883.26
[Epoch 9 Batch 3100] loss 6.39, perplexity 596.44
[Epoch 9 Batch 3200] loss 6.47, perplexity 648.34
[Epoch 9 Batch 3300] loss 6.80, perplexity 895.35
[Epoch 9 Batch 3400] loss 6.40, perplexity 601.15
[Epoch 9 Batch 3500] loss 6.30, perplexity 542.88
[Epoch 9 Batch 3600] loss 6.60, perplexity 738.24
[Epoch 9 Batch 3700] loss 6.52, perplexity 675.79
[Epoch 9 Batch 3800] loss 6.38, perplexity 590.22
[Epoch 9 Batch 3900] loss 6.34, perplexity 569.02
[Epoch 9 Batch 4000] loss 6.53, perplexity 686.43


KeyboardInterrupt: 