In [1]:
%matplotlib inline

Recurrent Neural Network (RNN)
====

In [2]:
# -*- coding: utf-8 -*-

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from six.moves import xrange

In [3]:
sentence = """
Deep learning (also known as deep structured learning or hierarchical learning)
is part of a broader family of machine learning methods based on learning data
representations, as opposed to task-specific algorithms. Learning can be supervised,
semi-supervised or unsupervised. Deep learning models are loosely related to information
processing and communication patterns in a biological nervous system, such as neural
coding that attempts to define a relationship between various stimuli and associated
neuronal responses in the brain. Deep learning architectures such as deep neural
networks, deep belief networks and recurrent neural networks have been applied to
fields including computer vision, speech recognition, natural language processing,
audio recognition, social network filtering, machine translation, bioinformatics
and drug design,[5] where they have produced results comparable to and in some
cases superior[6] to human experts.
""".split()
# from wikipedia https://en.wikipedia.org/wiki/Deep_learning

vocab = set(sentence)
word2ind = {word: i for i, word in enumerate(vocab)}
ind2word = dict(zip(word2ind.values(), word2ind.keys()))

# hyper-parameter
input_timesteps = 2
output_timesteps = 1
vocab_size = len(vocab)
embedding_size = 100

hidden_size = 60
layers_num = 2
training_epochs = 10000

In [4]:
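# Build (context, target) pairs with a sliding window over the word list:
# each sample maps `input_timesteps` consecutive words to the word that follows.
data_num = len(sentence) - input_timesteps
x = [[word2ind[word] for word in sentence[i:i + input_timesteps]]
     for i in xrange(data_num)]
y = [[word2ind[sentence[i]]] for i in xrange(input_timesteps, len(sentence))]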


In [5]:
import tensorflow as tf

X = tf.placeholder(dtype=tf.int32, shape=[None, input_timesteps])
Y = tf.placeholder(dtype=tf.int32, shape=[None, output_timesteps])

onehot_encoding = lambda tensor: tf.one_hot(tensor, depth=vocab_size, axis=-1)
output_tensor = onehot_encoding(Y)





In [6]:
embedding_layer = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0))
embed = tf.nn.embedding_lookup(embedding_layer, X)


In [7]:
from tensorflow.contrib import rnn

def RNN(x, num_hidden,
        cell_type=rnn.BasicLSTMCell,
        activation=tf.nn.relu,
        dropout_prob=1.0,
        num_layers=1):
    """Stacked bidirectional RNN over `x` of shape [batch, time, features].

    Note: `dropout_prob` is passed on as `output_keep_prob`, so it is the
    probability of KEEPING an output unit, not of dropping it.
    """
    assert cell_type in [rnn.BasicLSTMCell, rnn.BasicRNNCell, rnn.GRUCell], \
        'cell_type must be one of rnn.BasicLSTMCell, rnn.BasicRNNCell, rnn.GRUCell, but got %s.' % (cell_type,)
    assert isinstance(num_layers, int) and num_layers >= 1
    assert 0.0 < dropout_prob <= 1.0

    # RNN (unidirectional variant; placeholder, not implemented)
    def mRNN(x, units, cell=cell_type, activation=activation, num_layers=num_layers, dropout_prob=dropout_prob):
        pass

    # BiRNN (placeholder, not implemented; the bidirectional network is built inline below)
    def mBiRNN(x, units, cell=cell_type, activation=activation, num_layers=num_layers, dropout_prob=dropout_prob):
        pass

    # One forward and one backward cell per layer, each wrapped with output dropout
    cell_fw = [rnn.DropoutWrapper(cell_type(num_hidden, activation=activation), output_keep_prob=dropout_prob)
               for _ in xrange(num_layers)]
    cell_bw = [rnn.DropoutWrapper(cell_type(num_hidden, activation=activation), output_keep_prob=dropout_prob)
               for _ in xrange(num_layers)]
    # outputs: [batch, time, 2 * num_hidden] (forward and backward outputs concatenated)
    outputs, _, _ = rnn.stack_bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs=x, dtype=tf.float32)

    return outputs

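# Keep probability 0.8. The wrapper has no training switch, so dropout is also active at prediction time.
mLSTM = RNN(embed, hidden_size, dropout_prob=0.8, num_layers=layers_num)
# Flatten all time steps into one feature vector per sample: [batch, 1, time * 2 * hidden]
mLSTM = tf.reshape(mLSTM, [-1, output_timesteps, input_timesteps * hidden_size * 2])
fc1 = tf.layers.dense(inputs=mLSTM, units=vocab_size)
y_ = fc1  # logits over the vocabulary
y_max = tf.argmax(y_, axis=-1)  # index of the most likely next word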

loss_op = tf.losses.softmax_cross_entropy(output_tensor, y_)
optimizer = tf.train.AdamOptimizer(learning_rate=1e-2).minimize(loss_op)


In [8]:
config = tf.ConfigProto()
# Limit this process to 50% of the GPU memory
config.gpu_options.per_process_gpu_memory_fraction = 0.5

session = tf.Session(config=config)
session.run(tf.global_variables_initializer())

keyword = 'learning'
print('Epoch %s / %s' % (0, training_epochs))
print('Embedding Vector(10 dims) of %s:' % (keyword), \
      session.run(embedding_layer[word2ind[keyword]])[:10])

for i in xrange(1, 1 + training_epochs):
    _, cost = session.run([optimizer, loss_op],
                          feed_dict={X: x, Y: y})
    if i % 1000 == 0:
        print('Epoch %s / %s, training cost: %s' % (i, training_epochs, cost))
        print('Embedding Vector(10 dims) of %s:' % (keyword), \
              session.run(embedding_layer[word2ind[keyword]])[:10])


Epoch 0 / 10000
Embedding Vector(10 dims) of learning: [ 0.00799441 -0.5405512   0.09135056  0.8176873  -0.12731647  0.8504865
 -0.45705009  0.66576886  0.06272578 -0.2356267 ]
Epoch 1000 / 10000, training cost: 0.05490133
Embedding Vector(10 dims) of learning: [-0.28414813 -0.5548082   0.01459709  0.69892734 -0.3096602   0.47404203
 -0.6738451   0.4596958   0.07354294 -0.18089756]
Epoch 2000 / 10000, training cost: 0.9808509
Embedding Vector(10 dims) of learning: [-0.44491723 -0.78584325 -0.43986446  0.67091227  0.28678086  0.44343904
 -0.25964335  0.49541366  0.02252071 -0.66784793]
Epoch 3000 / 10000, training cost: 0.18120779
Embedding Vector(10 dims) of learning: [-0.5225975  -1.0272251  -0.561625    1.1386011   0.22322932  0.33065888
 -0.6619925   0.5235973  -0.32603624 -0.69371986]
Epoch 4000 / 10000, training cost: 0.6563275
Embedding Vector(10 dims) of learning: [-0.41274878 -0.7972945  -0.5073091   1.0862406   0.00845267  0.0736497
 -1.0075927   0.30090532 -0.53805286 -0.56945497]

In [9]:
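# Seed with the first two words, then greedily predict one word at a time,
# feeding each prediction back in as part of the next context.
context_idxs = [word2ind['Deep'], word2ind['learning']]
logue = list(context_idxs)  # copy, so the seed list is not aliased
for i in xrange(data_num):
    # y_max depends only on X, so Y does not need to be fed here
    next_idx = int(y_max.eval({X: [context_idxs]}, session)[0, 0])
    logue.append(next_idx)
    context_idxs = logue[-input_timesteps:]

sentence = ' '.join(sentence)
pred_sentence = ' '.join([ind2word[i] for i in logue])

import editdistance

# editdistance.eval on two strings compares them character by character
print('Distance between these two sentences is %s' % (editdistance.eval(sentence, pred_sentence)))
print("\033[1;31;40m%s \033[0m" % (sentence))  # original text, highlighted with ANSI colors
print(pred_sentence)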

Distance between these two sentences is 490
Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, semi-supervised or unsupervised. Deep learning models are loosely related to information processing and communication patterns in a biological nervous system, such as neural coding that attempts to define a relationship between various stimuli and associated neuronal responses in the brain. Deep learning architectures such as deep neural networks, deep belief networks and recurrent neural networks have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics and drug design,[5] where they have produced results comparable to and in some cases superior[6] to human experts.

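The distance reported above is an edit (Levenshtein) distance computed over the two strings, so it counts single-character insertions, deletions and substitutions. The sketch below is a plain-Python illustration of that metric, not the implementation used by the `editdistance` package; the helper name `levenshtein` is made up for this example. Because it accepts any sequence, it also shows how the same idea gives a word-level distance when the inputs are lists of words.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # delete item_a
                               current[j - 1] + 1,       # insert item_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


# Character level: strings are compared one character at a time
char_distance = levenshtein("kitten", "sitting")
# Word level: lists of words are compared one word at a time
word_distance = levenshtein("deep learning is fun".split(),
                            "deep learning was fun".split())
print(char_distance, word_distance)
```

For a generated sentence, a word-level distance is often easier to read than a character-level one, since a single wrong word then counts as one error regardless of its length.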
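The language model used in this section is an N-gram model: it uses the preceding few words of a sequence to predict the next one. In general, a longer context gives the model more information to work with and tends to improve its predictions. Another model is CBOW (continuous bag-of-words), which predicts the middle word from the words before and after it.<br>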
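As a minimal sketch of the difference between the two setups, the snippet below builds (context, target) pairs both ways from a short word list. The helper names `ngram_pairs` and `cbow_pairs` and the sample sentence are made up for illustration; the N-gram variant mirrors how `x` and `y` are built in this notebook with `input_timesteps = 2`.

```python
def ngram_pairs(words, context_size=2):
    """N-gram style: the previous `context_size` words predict the next word."""
    return [(words[i:i + context_size], words[i + context_size])
            for i in range(len(words) - context_size)]


def cbow_pairs(words, window=1):
    """CBOW style: `window` words on each side predict the middle word."""
    pairs = []
    for i in range(window, len(words) - window):
        context = words[i - window:i] + words[i + 1:i + window + 1]
        pairs.append((context, words[i]))
    return pairs


words = "deep learning is part of machine learning".split()
print(ngram_pairs(words)[0])  # (['deep', 'learning'], 'is')
print(cbow_pairs(words)[0])   # (['deep', 'is'], 'learning')
```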
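Turning text into vectors is called word vectorization. One-hot encoding and word embedding are both ways of representing a word as a vector; word2vec, strictly speaking, is a family of models (CBOW and skip-gram) for learning word embeddings.<br>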
[word embedding](http://pytorch.org/tutorials/beginner/nlp/word_embeddings_tutorial.html#sphx-glr-beginner-nlp-word-embeddings-tutorial-py)
<br>
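To make the link between the two representations concrete, the sketch below shows that multiplying a one-hot vector by an embedding table selects exactly one row of that table, which is what an embedding lookup returns directly. It uses plain Python lists; the table values, the 4-word vocabulary size and the helper names `one_hot` and `matvec` are assumptions chosen for illustration.

```python
def one_hot(index, size):
    """Vector of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def matvec(vector, matrix):
    """Row vector times matrix, written out with plain lists."""
    return [sum(vector[r] * matrix[r][c] for r in range(len(matrix)))
            for c in range(len(matrix[0]))]


# Made-up embedding table: 4 words, 3 dimensions each (illustrative values only)
embedding_table = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]

word_index = 2
via_one_hot = matvec(one_hot(word_index, 4), embedding_table)
via_lookup = embedding_table[word_index]
print(via_one_hot == via_lookup)  # True
```

The lookup form avoids building the sparse one-hot vector at all, which matters once the vocabulary is large.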
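Word embeddings can either be loaded from a model trained offline in advance or trained online together with the task. This example trains them online, because its vocabulary is far too small to need a large pretrained table.<br>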
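A rough sketch of the "load offline results" option, assuming the pretrained vectors are available as a dictionary from word to vector: words found in the dictionary reuse their pretrained vector, and the rest fall back to random initialization in [-1, 1], the same range this notebook uses. The `pretrained` dictionary, its values and the helper `build_embedding_table` are hypothetical.

```python
import random


def build_embedding_table(vocab, pretrained, dim, seed=0):
    """One row per word: the pretrained vector if available, else random values."""
    rng = random.Random(seed)
    table = []
    for word in vocab:
        if word in pretrained:
            table.append(list(pretrained[word]))
        else:
            # Out-of-vocabulary word: random initialization, to be trained online
            table.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return table


# Made-up 2-dimensional "pretrained" vectors, for illustration only
pretrained = {"deep": [0.5, -0.5], "learning": [0.25, 0.75]}
vocab = ["deep", "learning", "bioinformatics"]
table = build_embedding_table(vocab, pretrained, dim=2)
print(table[0], table[1])
```

In a TensorFlow 1.x graph like the one above, a table built this way could presumably be passed as the initial value of the embedding variable in place of `tf.random_uniform`.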

Recommended resources:<br>
[TensorFlow, RNN](https://www.tensorflow.org/tutorials/recurrent)<br>
[TensorFlow, machine translation](https://www.tensorflow.org/tutorials/seq2seq)<br>
[TensorFlow, speech recognition](https://www.tensorflow.org/tutorials/audio_recognition)<br>
[Stanford, NLP course](http://cs224d.stanford.edu/syllabus.html)<br>
[Word embedding, Wikipedia](https://en.wikipedia.org/wiki/Word_embedding)