In [1]:
%matplotlib inline

Recurrent Neural Network (RNN)
====

In [2]:
# -*- coding: utf-8 -*-

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from six.moves import xrange

In [3]:
sentence = """
Deep learning (also known as deep structured learning or hierarchical learning)
is part of a broader family of machine learning methods based on learning data
representations, as opposed to task-specific algorithms. Learning can be supervised,
semi-supervised or unsupervised. Deep learning models are loosely related to information
processing and communication patterns in a biological nervous system, such as neural
coding that attempts to define a relationship between various stimuli and associated
neuronal responses in the brain. Deep learning architectures such as deep neural
networks, deep belief networks and recurrent neural networks have been applied to
fields including computer vision, speech recognition, natural language processing,
audio recognition, social network filtering, machine translation, bioinformatics
and drug design,[5] where they have produced results comparable to and in some
cases superior[6] to human experts.
"""
# from wikipedia https://en.wikipedia.org/wiki/Deep_learning

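# Character-level vocabulary ("word" in the names below means a single character)
chars = sorted(set(sentence))  # sorted so the index mapping is the same on every run
word2ind = {word: i for i, word in enumerate(chars)}
ind2word = {i: word for word, i in word2ind.items()}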

# hyper-parameter
input_timesteps = 2
output_timesteps = 1
num_classes = len(chars)
hidden_size = 60
layers_num = 2
training_epochs = 10000


In [4]:
data_num = len(sentence) - input_timesteps
x = [[word2ind[ch] for ch in sentence[i:i + input_timesteps]]
     for i in xrange(len(sentence) - input_timesteps)]
y = [[word2ind[sentence[i]]] for i in xrange(input_timesteps, len(sentence))]
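
The cell above builds the training pairs with a sliding window: every `input_timesteps` consecutive characters form one input, and the character that follows them is the target. A minimal sketch on a short string, assuming a toy text `"hello"` and a helper name `make_windows` that are not part of the notebook:

```python
def make_windows(text, input_timesteps):
    """Return (inputs, targets) built the same way as x and y above."""
    inputs = [text[i:i + input_timesteps]
              for i in range(len(text) - input_timesteps)]
    targets = [text[i] for i in range(input_timesteps, len(text))]
    return inputs, targets


inputs, targets = make_windows("hello", 2)
for window, target in zip(inputs, targets):
    print(repr(window), "->", repr(target))
# 'he' -> 'l'
# 'el' -> 'l'
# 'll' -> 'o'
```

The notebook stores character indices instead of the characters themselves, but the window arithmetic is the same.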


In [5]:
import tensorflow as tf

X = tf.placeholder(dtype=tf.int32, shape=[None, input_timesteps])
Y = tf.placeholder(dtype=tf.int32, shape=[None, output_timesteps])

onehot_encoding = lambda tensor: tf.one_hot(tensor, depth=num_classes, axis=-1)
input_tensor = onehot_encoding(X)
output_tensor = onehot_encoding(Y)
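
`tf.one_hot` turns each integer index into a vector of length `depth`, so an index tensor of shape `[batch, timesteps]` becomes `[batch, timesteps, num_classes]`. A pure-Python sketch of that shape change, assuming a toy vocabulary of 3 classes (the helper `one_hot` here is an illustration, not the TensorFlow op):

```python
def one_hot(index, depth):
    """Return a list of length `depth` with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


batch = [[0, 2]]  # one window of two character indices, shape [1, 2]
encoded = [[one_hot(idx, depth=3) for idx in window] for window in batch]
print(encoded)  # [[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]], shape [1, 2, 3]
```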




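## Dropout in RNNs
link: https://stackoverflow.com/questions/45917464/tensorflow-whats-the-difference-between-tf-nn-dropout-and-tf-contrib-rnn-dropo<br>
TensorFlow offers two ways to apply dropout:<br>
1. `tf.nn.dropout`: randomly zeroes part of one layer's output before it is fed to the next layer. It works with any kind of network.<br>
2. `tf.contrib.rnn.DropoutWrapper`: applies dropout inside an RNN cell and lets you control dropout on the cell's inputs and outputs. It only applies to RNN cells.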
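
Both mechanisms rest on the same operation: keep each activation with probability `keep_prob` and scale the survivors by `1 / keep_prob`, so the expected value stays unchanged. A minimal pure-Python sketch of that operation (the function name `dropout` and the fixed seed are assumptions made for illustration, not TensorFlow code):

```python
import random


def dropout(values, keep_prob, rng):
    """Inverted dropout: keep each value with probability keep_prob and
    scale kept values by 1 / keep_prob; dropped values become 0.0."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 8
dropped = dropout(activations, keep_prob=0.8, rng=rng)
print(dropped)  # every entry is either 0.0 or 1.0 / 0.8
```

With `keep_prob=1.0` nothing is dropped, which is why the `RNN` helper later in this notebook defaults its `dropout_prob` argument to `1.0`.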

## Creating multi-layer RNNs in TensorFlow
There are two ways to build a multi-layer RNN in `tensorflow`:<br>
### 1. Combine `rnn.MultiRNNCell` with `rnn.static_rnn`/`tf.nn.static_rnn`/`tf.nn.dynamic_rnn`<br>
#### i).`tf.nn.dynamic_rnn`<br>
The input does not need to be unstacked.<br>
``
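# cell is a cell class (e.g. rnn.BasicLSTMCell); dropout_prob is used as the keep probability
stacked_cell = rnn.MultiRNNCell([
    rnn.DropoutWrapper(cell(units, activation=activation), output_keep_prob=dropout_prob)
    for _ in xrange(num_layers)])
outputs, state = tf.nn.dynamic_rnn(stacked_cell, x, dtype=tf.float32)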
``
<br>
#### ii).`rnn.static_rnn` <=> `tf.nn.static_rnn`<br>
The input must first be unstacked into a list with one tensor per time step.<br>
``
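x = tf.unstack(x, timesteps, 1)  # list of timesteps tensors, each [batch, length]
stacked_cell = rnn.MultiRNNCell([
    rnn.DropoutWrapper(cell(units, activation=activation), output_keep_prob=dropout_prob)
    for _ in xrange(num_layers)])
# rnn.static_rnn and tf.nn.static_rnn are interchangeable here
outputs, state = tf.nn.static_rnn(stacked_cell, x, dtype=tf.float32)
# joins the per-step outputs along the feature axis: [batch, timesteps * units]
outputs = tf.concat(outputs, axis=-1)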
``<br>
### 2. Emulate a multi-layer RNN by looping inside `tf.variable_scope`<br>
#### i).`tf.nn.dynamic_rnn`<br>
The input does not need to be unstacked.<br>
``
for _ in xrange(num_layers):
    with tf.variable_scope(None, default_name="rnn"):
        x, state = tf.nn.dynamic_rnn(
            rnn.DropoutWrapper(cell(units, activation=activation), output_keep_prob=dropout_prob),
            inputs=x, dtype=tf.float32)
outputs = x
``<br>
#### ii).`rnn.static_rnn` <=> `tf.nn.static_rnn`<br>
``
x = tf.unstack(x, timesteps, 1)
for _ in xrange(num_layers):
    with tf.variable_scope(None, default_name="rnn"):
        x, state = tf.nn.static_rnn(
            rnn.DropoutWrapper(cell(units, activation=activation), output_keep_prob=dropout_prob),
            inputs=x, dtype=tf.float32)
outputs = tf.concat(x, axis=-1)
``
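
Both approaches above come down to the same idea: the output sequence of one recurrent layer becomes the input sequence of the next. A minimal pure-Python sketch of that stacking with a plain tanh RNN, assuming illustrative helpers (`rnn_layer`, `random_matrix`), toy sizes and random weights, with no dropout and no TensorFlow:

```python
import math
import random


def rnn_layer(inputs, w_in, w_rec):
    """Run one tanh RNN layer over a list of input vectors (one per time step)."""
    units = len(w_rec)
    state = [0.0] * units
    outputs = []
    for x_t in inputs:
        # new state = tanh(W_in . x_t + W_rec . previous state)
        state = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[j], x_t))
                + sum(w * h for w, h in zip(w_rec[j], state))
            )
            for j in range(units)
        ]
        outputs.append(state)
    return outputs


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
timesteps, input_size, units, num_layers = 4, 3, 5, 2
sequence = [[rng.random() for _ in range(input_size)] for _ in range(timesteps)]

x = sequence
for _ in range(num_layers):
    in_size = len(x[0])  # the first layer sees input_size, later layers see units
    x = rnn_layer(x, random_matrix(units, in_size, rng), random_matrix(units, units, rng))
outputs = x
print(len(outputs), len(outputs[0]))  # 4 5, i.e. timesteps and units
```

Each layer gets its own weights, which is what the fresh `tf.variable_scope` per loop iteration provides in the TensorFlow version.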

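## Creating multi-layer RNNs and multi-layer BiRNNs in TensorFlow

A BiRNN is less flexible than a unidirectional RNN: you have to manage how inputs and outputs flow between layers, so the code gets verbose. You cannot simply call a BiRNN in a bare `for` loop; a multi-layer BiRNN has to be wrapped in a function.<br>
TensorFlow provides five BiRNN interfaces:<br>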
### 1.tensorflow.contrib.rnn.stack_bidirectional_rnn(cells_fw, cells_bw, ...)
Requires `tf.unstack` to split the `timestep` axis of a `[batch, timestep, length]` tensor into a `list`.<br>
`cells_fw` and `cells_bw` must each be a `list` whose length equals the number of RNN layers.<br>
``
x = tf.unstack(x, timesteps, 1)  # unstack only once, before building the cells
cell_fw = [rnn.DropoutWrapper(cell(units, activation=activation), output_keep_prob=dropout_prob) \
           for _ in xrange(num_layers)]
cell_bw = [rnn.DropoutWrapper(cell(units, activation=activation), output_keep_prob=dropout_prob) \
           for _ in xrange(num_layers)]
outputs, state_fw, state_bw = rnn.stack_bidirectional_rnn(cell_fw, cell_bw, inputs=x, dtype=tf.float32)
outputs = tf.stack(outputs, axis=1)
``
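
The static APIs need the time axis split into a Python list before the call and stacked back afterwards. A nested-list sketch of that round trip, assuming helper names `unstack_time` and `stack_time` that mimic `tf.unstack(x, timesteps, 1)` and `tf.stack(outputs, axis=1)` for illustration only:

```python
def unstack_time(batch):
    """[batch, timestep, length] -> list of `timestep` items, each [batch, length]."""
    timesteps = len(batch[0])
    return [[sample[t] for sample in batch] for t in range(timesteps)]


def stack_time(steps):
    """Inverse of unstack_time: list of [batch, length] -> [batch, timestep, length]."""
    batch_size = len(steps[0])
    return [[step[b] for step in steps] for b in range(batch_size)]


batch = [
    [[1, 2], [3, 4], [5, 6]],     # sample 0: 3 time steps of length 2
    [[7, 8], [9, 10], [11, 12]],  # sample 1
]
steps = unstack_time(batch)
print(steps[0])                    # [[1, 2], [7, 8]], time step 0 of every sample
print(stack_time(steps) == batch)  # True
```
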
### 2.tensorflow.contrib.rnn.stack_bidirectional_dynamic_rnn
The input does not need to be unstacked.<br>
``
cell_fw = [rnn.DropoutWrapper(cell(units, activation=activation), output_keep_prob=dropout_prob) \
           for _ in xrange(num_layers)]
cell_bw = [rnn.DropoutWrapper(cell(units, activation=activation), output_keep_prob=dropout_prob) \
           for _ in xrange(num_layers)]
outputs, state_fw, state_bw = rnn.stack_bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs=x, dtype=tf.float32)
``
### 3.tf.nn.bidirectional_dynamic_rnn
The input does not need to be unstacked; multiple layers are emulated by looping inside `tf.variable_scope`.<br>
``
for _ in xrange(num_layers):
    with tf.variable_scope(None, default_name="bidirectional-rnn"):
        cell_fw = rnn.DropoutWrapper(cell(units, activation=activation), output_keep_prob=dropout_prob)
        cell_bw = rnn.DropoutWrapper(cell(units, activation=activation), output_keep_prob=dropout_prob)
        x, state = tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs=x, dtype=tf.float32)
    x = tf.concat(x, axis=-1)
outputs = x
``
### 4.tensorflow.contrib.rnn.static_bidirectional_rnn (equivalent to 5.tf.nn.static_bidirectional_rnn)
The input must be unstacked; multiple layers are emulated by looping inside `tf.variable_scope`.<br>
``
x = tf.unstack(x, timesteps, 1)
for _ in xrange(num_layers):
    with tf.variable_scope(None, default_name="bidirectional-rnn"):
        cell_fw = rnn.DropoutWrapper(cell(units, activation=activation), output_keep_prob=dropout_prob)
        cell_bw = rnn.DropoutWrapper(cell(units, activation=activation), output_keep_prob=dropout_prob)
        x, state_fw, state_bw = rnn.static_bidirectional_rnn(cell_fw, cell_bw, inputs=x, dtype=tf.float32)
outputs = tf.stack(x, axis=1)
``
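
What all of these interfaces share is the bidirectional pattern itself: one pass reads the sequence left to right, a second pass reads it right to left, and the two outputs are paired per time step. A toy sketch of that pattern, assuming a running sum as a stand-in for the recurrent cell (`running_sum` and `bidirectional` are illustrative names, not TensorFlow functions):

```python
def running_sum(sequence):
    """Stand-in for an RNN: the output at each step is the sum of inputs so far."""
    total, outputs = 0, []
    for value in sequence:
        total += value
        outputs.append(total)
    return outputs


def bidirectional(sequence, step_fn):
    forward = step_fn(sequence)               # sees the past of each position
    backward = step_fn(sequence[::-1])[::-1]  # sees the future of each position
    return list(zip(forward, backward))       # pair both views per time step


print(bidirectional([1, 2, 3, 4], running_sum))
# [(1, 10), (3, 9), (6, 7), (10, 4)]
```

Pairing the two directions doubles the feature size per time step, which is why the reshape later in the notebook uses `hidden_size * 2`.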

In [6]:
from tensorflow.contrib import rnn

def RNN(x, num_hidden,
        cell_type=rnn.BasicLSTMCell,
        activation=tf.nn.relu,
        dropout_prob=1.0,
        num_layers=1):
    # dropout_prob is passed to DropoutWrapper as output_keep_prob,
    # so it is the probability of keeping a unit (1.0 disables dropout).
    assert cell_type in [rnn.BasicLSTMCell, rnn.BasicRNNCell, rnn.GRUCell], \
        'cell_type must be one of rnn.BasicLSTMCell, rnn.BasicRNNCell, rnn.GRUCell, but it is %s.' % (cell_type)
    assert isinstance(num_layers, int) and num_layers >= 1
    assert 0.0 < dropout_prob <= 1.0

    # Multi-layer unidirectional RNN (not used below, kept as an alternative)
    def mRNN(x, units, cell=cell_type, activation=activation, num_layers=num_layers, dropout_prob=dropout_prob):
        stacked_cell = rnn.MultiRNNCell([
            rnn.DropoutWrapper(cell(units, activation=activation), output_keep_prob=dropout_prob)
            for _ in xrange(num_layers)])
        outputs, _ = tf.nn.dynamic_rnn(stacked_cell, x, dtype=tf.float32)
        return outputs

    # Multi-layer bidirectional RNN
    def mBiRNN(x, units, cell=cell_type, activation=activation, num_layers=num_layers, dropout_prob=dropout_prob):
        cell_fw = [rnn.DropoutWrapper(cell(units, activation=activation), output_keep_prob=dropout_prob)
                   for _ in xrange(num_layers)]
        cell_bw = [rnn.DropoutWrapper(cell(units, activation=activation), output_keep_prob=dropout_prob)
                   for _ in xrange(num_layers)]
        outputs, _, _ = rnn.stack_bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs=x, dtype=tf.float32)
        return outputs

    return mBiRNN(x, num_hidden)


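# The keep probability 0.8 is fixed in the graph, so dropout stays active at prediction time too.
mLSTM = RNN(input_tensor, hidden_size, dropout_prob=0.8, num_layers=layers_num)  # [batch, input_timesteps, 2 * hidden_size]
mLSTM = tf.reshape(mLSTM, [-1, output_timesteps, input_timesteps * hidden_size * 2])
fc1 = tf.layers.dense(inputs=mLSTM, units=num_classes)
y_ = fc1  # logits, [batch, output_timesteps, num_classes]
y_max = tf.argmax(y_, axis=-1)  # predicted character index per output step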

loss_op = tf.losses.softmax_cross_entropy(output_tensor, y_)
optimizer = tf.train.AdamOptimizer(learning_rate=1e-2).minimize(loss_op)
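
The loss compares the one-hot target with the softmax of the logits, and `y_max` simply picks the largest logit. A pure-Python sketch for a single example, assuming made-up logits for a 3-class vocabulary (the function below illustrates the formula and is not the TensorFlow implementation):

```python
import math


def softmax_cross_entropy(onehot_label, logits):
    """Cross-entropy between a one-hot label and softmax(logits) for one example."""
    shift = max(logits)  # subtract the max for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(t * lp for t, lp in zip(onehot_label, log_probs))


logits = [2.0, 0.5, -1.0]  # hypothetical network output
label = [1.0, 0.0, 0.0]    # one-hot target: class 0
loss = softmax_cross_entropy(label, logits)
predicted_class = max(range(len(logits)), key=lambda i: logits[i])  # argmax
print(loss, predicted_class)
```

A model that assigns equal logits to all `k` classes scores `log(k)` per example, which gives a rough baseline for reading the training cost printed below.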


In [7]:
config = tf.ConfigProto()
# Limit this process to half of the GPU memory
config.gpu_options.per_process_gpu_memory_fraction = 0.5

session = tf.Session(config=config)
session.run(tf.global_variables_initializer())

for i in xrange(1, 1 + training_epochs):
    _, cost = session.run([optimizer, loss_op],
                          feed_dict={X: x, Y: y})
    if i % 1000 == 0:
        print('Epoch %s / %s, training cost: %s' % (i, training_epochs, cost))


Epoch 1000 / 10000, training cost: 0.9628274
Epoch 2000 / 10000, training cost: 0.9563312
Epoch 3000 / 10000, training cost: 0.9443648
Epoch 4000 / 10000, training cost: 0.93909156
Epoch 5000 / 10000, training cost: 0.9439826
Epoch 6000 / 10000, training cost: 0.939328
Epoch 7000 / 10000, training cost: 0.93188685
Epoch 8000 / 10000, training cost: 0.9366058
Epoch 9000 / 10000, training cost: 0.9412674
Epoch 10000 / 10000, training cost: 0.94335544


In [8]:
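context_idxs = [word2ind['D'], word2ind['e']]  # seed the generator with "De"
logue = list(context_idxs)
for i in xrange(data_num):
    # y_max depends only on X, so Y does not need to be fed
    next_idx = y_max.eval({X: [context_idxs]}, session)[0, 0]
    logue.append(next_idx)
    # slide the window: the last input_timesteps predictions become the next input
    context_idxs = logue[-input_timesteps:]

pred_sentence = ''.join([ind2word[i] for i in logue])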

import editdistance

print('Distance between these two sentences is %s' % (editdistance.eval(sentence, pred_sentence)))
print("\033[1;31;40m%s \033[0m" % (sentence))
print(pred_sentence)

Distance between these two sentences is 696
Deep learning (also known as deep structured learning or hierarchical learning)
is part of a broader family of machine learning methods based on learning data
representations, as opposed to task-specific algorithms. Learning can be supervised,
semi-supervised or unsupervised. Deep learning models are loosely related to information
processing and communication patterns in a biological nervous system, such as neural
coding that attempts to define a relationship between various stimuli and associated
neuronal responses in the brain. Deep learning architectures such as deep neural
networks, deep belief networks and recurrent neural networks have been applied to
fields including computer vision, speech recognition, natural language processing,
audio recognition, social network filtering, machine translation, bioinformatics
and drug design,[5] where they have produced results comparable to and in some
cases superior[6] to human experts.
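
The score printed above is a Levenshtein edit distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. A minimal dynamic-programming sketch of that measure (the function `levenshtein` is an illustration, not the code of the `editdistance` package):

```python
def levenshtein(a, b):
    """Edit distance between strings a and b, computed row by row."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]                   # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A distance of 0 would mean the generated text reproduces the training sentence exactly.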


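Advantages of RNNs:<br>
1. RNNs are well suited to sequences whose elements depend on one another, and are commonly applied to text.<br>
2. To strengthen sequence modelling, more advanced cells such as LSTM and GRU are usually used. LSTM and GRU perform similarly and GRU has fewer parameters, yet LSTM still appears more often in papers. It is therefore worth keeping a vector diagram of the LSTM/GRU cell and the original English papers at hand for later use.<br>
* [Understanding LSTM and GRU](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)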
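
The parameter gap between the two cells can be checked with simple arithmetic. The sketch below assumes the textbook formulations, in which an LSTM has four weight blocks and a GRU three, each acting on the concatenated input and state plus a bias; the sizes are hypothetical and some library variants add extra bias terms:

```python
def lstm_params(input_size, units):
    # 4 blocks: input, forget and output gates plus the candidate cell state
    return 4 * (units * (input_size + units) + units)


def gru_params(input_size, units):
    # 3 blocks: update and reset gates plus the candidate state
    return 3 * (units * (input_size + units) + units)


input_size, units = 50, 60  # hypothetical sizes
print(lstm_params(input_size, units))  # 26640
print(gru_params(input_size, units))   # 19980
```

Under these assumptions a GRU layer always has three quarters of the parameters of an LSTM layer of the same size.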

Disadvantages:<br>
1. Many parameters and long running times.<br>

Recommended:<br>
1. Speech/image recognition based on RNN + CTC:
https://github.com/watsonyanghx/CNN_LSTM_CTC_Tensorflow<br>
2. A quick introduction to CTC:
<br>
3. Translation model (attention mechanism): http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html<br>

Recommended reading:<br>
[TensorFlow, RNN](https://www.tensorflow.org/tutorials/recurrent)<br>
[TensorFlow, machine translation](https://www.tensorflow.org/tutorials/seq2seq)<br>
[TensorFlow, speech recognition](https://www.tensorflow.org/tutorials/audio_recognition)<br>
[Stanford, NLP course](http://cs224d.stanford.edu/syllabus.html)<br>