
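A bidirectional recurrent neural network (Bi-RNN) aims to increase the amount of information an RNN can use. A plain MLP, for example, expects inputs of a fixed length. An RNN can process sequences of varying length, but at any given input it cannot use information from inputs that come later in the sequence. A Bi-RNN removes this limitation: for every input in a sequence it can draw on both the preceding and the following data. The mechanism is simple. Two recurrent networks that run over the sequence in opposite time directions are connected to the same output layer, so the output layer receives both past and future context.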

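Bi-RNNs are most useful when a task depends on context. In handwriting recognition, for instance, knowing the words immediately before and after the current word makes that word much easier to recognize. Similarly, when we read an article, we sometimes need the text that follows to work out the precise meaning of a sentence. **A Bi-RNN may not suit problems such as language modeling, however. The goal there is to predict the next word from the preceding text, so the model must not be given information from the text that follows.**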

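For many sequence tasks, such as handwriting recognition, machine translation and protein structure prediction, a Bi-RNN can substantially improve model performance.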

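**Baidu also used a Bi-RNN in its speech recognition system to take context from both directions into account, which substantially improved the model's accuracy.**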

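The core of the Bi-RNN architecture is to split an ordinary unidirectional RNN into two directions: one runs forward in time and the other runs backward. The output at the current time step can therefore use information from both directions. An ordinary RNN, by contrast, only obtains information about the future at later time steps.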
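The following NumPy sketch makes the two-direction structure concrete. It is written under stated assumptions. It uses a plain tanh RNN cell instead of the LSTM cells used later. The helper names `rnn_pass` and `bidirectional_rnn` are hypothetical, and the sizes and random weights are arbitrary choices for illustration.

The backward pass runs over the reversed sequence. Its states are then reversed again so that both halves line up by time step before they are concatenated.

```python
import numpy as np

def rnn_pass(xs, W_x, W_h, b):
    """Run a simple tanh RNN over xs (shape [T, d_in]); return all hidden states, shape [T, d_h]."""
    h = np.zeros(W_h.shape[0])
    hs = []
    for x_t in xs:
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        hs.append(h)
    return np.stack(hs)

def bidirectional_rnn(xs, fw_params, bw_params):
    """Concatenate forward and backward hidden states at every time step."""
    h_fw = rnn_pass(xs, *fw_params)              # reads x_0, ..., x_{T-1}
    h_bw = rnn_pass(xs[::-1], *bw_params)[::-1]  # reads x_{T-1}, ..., x_0, then re-aligned to time order
    return np.concatenate([h_fw, h_bw], axis=1)  # shape [T, 2 * d_h]

rng = np.random.default_rng(0)
T, d_in, d_h = 5, 3, 4

def init_params():
    # Hypothetical small random weights (W_x, W_h, b) for one direction
    return (rng.normal(size=(d_in, d_h)) * 0.5, rng.normal(size=(d_h, d_h)) * 0.5, np.zeros(d_h))

fw, bw = init_params(), init_params()
xs = rng.normal(size=(T, d_in))
out = bidirectional_rnn(xs, fw, bw)
print(out.shape)  # (5, 8)

# Change only the last input and compare the states at the first time step
xs2 = xs.copy()
xs2[-1] += 1.0
out2 = bidirectional_rnn(xs2, fw, bw)
print(np.allclose(out[0, :d_h], out2[0, :d_h]))  # forward half at t=0 never sees x_{T-1}
print(np.allclose(out[0, d_h:], out2[0, d_h:]))  # backward half at t=0 has already read x_{T-1}
```

The last comparison also shows why a Bi-RNN does not fit language modeling. The backward half of the state at the first time step already depends on the final input, so future tokens would leak into every prediction.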

## The code in this section comes from the open-source TensorFlow-Examples implementation

In [8]:
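# Requires TensorFlow 1.x: tf.contrib and tensorflow.examples.tutorials no longer exist in TensorFlow 2.x
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
# Download (if needed) and load MNIST into /tmp/data/, with labels encoded as one-hot vectors
mnist = input_data.read_data_sets('/tmp/data/', one_hot=True)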

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


In [0]:
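learning_rate = 0.01   # Adam learning rate
max_samples = 400000   # stop once roughly this many training samples have been used
batch_size = 128       # images per training step
display_step = 10      # print minibatch loss/accuracy every 10 steps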

In [0]:
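n_input = 28     # features per time step: one 28-pixel row of an MNIST image
n_steps = 28     # time steps: the 28 rows of each image
n_hidden = 256   # LSTM hidden units in each direction
n_classes = 10   # digit classes 0-9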
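MNIST images arrive as flat vectors of 784 pixels. With the settings above, the model reads each image as a sequence of 28 rows, with 28 pixels per row. The NumPy sketch below uses a made-up stand-in batch rather than real MNIST data. It shows that the reshape applied during training maps time step `t` to pixel row `t`.

```python
import numpy as np

n_input, n_steps = 28, 28
# Stand-in batch of 2 flattened "images" (784 = 28 * 28 values each), shaped like mnist.train.next_batch output
batch_x = np.arange(2 * n_steps * n_input, dtype=np.float32).reshape(2, n_steps * n_input)

seq = batch_x.reshape((-1, n_steps, n_input))  # [batch, time step (row), features (pixels in that row)]
print(seq.shape)  # (2, 28, 28)

# Time step 3 of image 0 is exactly pixel row 3 of that image
assert np.array_equal(seq[0, 3], batch_x[0, 3 * 28:4 * 28])
```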

In [0]:
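x = tf.placeholder(tf.float32, [None, n_steps, n_input])  # [batch, time steps, features]
y = tf.placeholder(tf.float32, [None, n_classes])         # one-hot labels

# Output layer: forward and backward outputs are concatenated, hence 2*n_hidden inputs
weights = tf.Variable(tf.random_normal([2*n_hidden, n_classes]))
biases = tf.Variable(tf.random_normal([n_classes]))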

In [0]:
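def BiRNN(x, weights, biases):
  # static_bidirectional_rnn expects a Python list of n_steps tensors, each [batch, n_input]
  x = tf.transpose(x, [1, 0, 2])     # [n_steps, batch, n_input]
  x = tf.reshape(x, [-1, n_input])   # [n_steps*batch, n_input]
  x = tf.split(x, n_steps)           # list of n_steps tensors, each [batch, n_input]

  # One LSTM cell reads the sequence forward in time, the other reads it backward
  lstm_fw_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
  lstm_bw_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)

  # outputs[t] concatenates the forward and backward outputs at step t: [batch, 2*n_hidden]
  outputs, _, _ = tf.contrib.rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x, dtype=tf.float32)

  # Classify from the last time step. The forward half of outputs[-1] has seen all rows,
  # but the backward half has only seen the last row of the image.
  return tf.matmul(outputs[-1], weights) + biases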
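The first three lines of `BiRNN` only rearrange the input. Below is a NumPy version of the same transpose, reshape and split, applied to a random stand-in batch of 4 images. It shows that the result is a list of `n_steps` arrays, one per image row. The last lines check the output layer's shapes. They use zero-filled stand-ins for the LSTM outputs, an assumption made purely for illustration.

```python
import numpy as np

batch, n_steps, n_input, n_hidden, n_classes = 4, 28, 28, 256, 10
x = np.random.rand(batch, n_steps, n_input).astype(np.float32)

# Same rearrangement as BiRNN(): [batch, steps, input] -> list of n_steps arrays [batch, input]
xt = np.transpose(x, (1, 0, 2))  # [n_steps, batch, n_input]
xt = xt.reshape(-1, n_input)     # [n_steps * batch, n_input]
steps = np.split(xt, n_steps)    # n_steps arrays, each [batch, n_input]

print(len(steps), steps[0].shape)  # 28 (4, 28)
# Element t holds row t of every image in the batch
assert all(np.array_equal(steps[t], x[:, t, :]) for t in range(n_steps))

# Output-layer shapes: forward and backward outputs are concatenated per time step
fw_out = np.zeros((batch, n_hidden))  # stand-in for the forward LSTM output at the last step
bw_out = np.zeros((batch, n_hidden))  # stand-in for the backward LSTM output at the last step
weights = np.zeros((2 * n_hidden, n_classes))
logits = np.concatenate([fw_out, bw_out], axis=1) @ weights
print(logits.shape)  # (4, 10)
```

In TensorFlow 1.x, `tf.unstack(x, n_steps, 1)` should produce the same list in a single call.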

In [6]:
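pred = BiRNN(x, weights, biases)

# Mean softmax cross-entropy over the batch. The _v2 op replaces the deprecated
# softmax_cross_entropy_with_logits; y is a placeholder, so no gradient reaches the labels.
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=pred, labels=y))

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Accuracy: fraction of examples whose predicted class matches the label
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

init = tf.global_variables_initializer()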

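`cost` and `accuracy` are built from standard TensorFlow ops. As a reference for what they compute, here is a minimal NumPy sketch of batch-mean softmax cross-entropy and of argmax accuracy on one-hot labels. The function names are hypothetical and not part of TensorFlow.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Batch mean of -sum(labels * log_softmax(logits)), computed in a numerically stable way."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract the row max to avoid overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1).mean()

def accuracy(logits, labels):
    """Fraction of rows whose argmax prediction matches the one-hot label."""
    return (logits.argmax(axis=1) == labels.argmax(axis=1)).mean()

logits = np.array([[2.0, 0.5, 0.1], [0.2, 0.3, 3.0]])
labels = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(accuracy(logits, labels))  # 0.5: the first row is right, the second predicts class 2

# With all-zero logits the softmax is uniform, so the loss is log(3)
uniform_loss = softmax_cross_entropy(np.zeros((1, 3)), np.array([[0.0, 1.0, 0.0]]))
print(uniform_loss)  # 1.0986... (= log 3)
```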



In [7]:
with tf.Session() as sess:
  sess.run(init)

  step = 1
  while step*batch_size < max_samples:
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    # Turn each flat 784-pixel image into a sequence of 28 rows of 28 pixels
    batch_x = batch_x.reshape((batch_size, n_steps, n_input))

    sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})

    if step % display_step == 0:
      # Fetch minibatch loss and accuracy in a single run
      loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x, y: batch_y})
      print('Iter {}, Minibatch Loss= {:.6f}, Training Accuracy= {:.5f}'.format(
          step*batch_size, loss, acc))

    step += 1
  print('Optimization Finished!')

  # Predict on all 10,000 images in mnist.test and report the accuracy.
  # Feeding equal-size chunks keeps memory use bounded; the mean of the chunk accuracies
  # equals the overall accuracy because every chunk has the same size.
  test_len = 10000
  chunk_size = 1000
  test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
  test_label = mnist.test.labels[:test_len]
  chunk_accs = [sess.run(accuracy, feed_dict={x: test_data[i:i+chunk_size],
                                               y: test_label[i:i+chunk_size]})
                for i in range(0, test_len, chunk_size)]
  print('Testing Accuracy:', np.mean(chunk_accs))
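The loop condition `step*batch_size < max_samples` fixes how long training runs. The short Python check below works through that arithmetic and confirms it by simulating the loop. It assumes the default 55,000-image `mnist.train` split: `read_data_sets` holds back 5,000 of the 60,000 training images for validation by default.

```python
max_samples, batch_size, display_step = 400000, 128, 10
train_size = 55000  # assumption: default size of mnist.train with validation_size=5000

train_steps = (max_samples - 1) // batch_size  # largest step with step * batch_size < max_samples
samples_seen = train_steps * batch_size
log_lines = train_steps // display_step        # steps where step % display_step == 0
epochs = samples_seen / train_size

# Cross-check by running the same loop condition used in the training cell
step, count = 1, 0
while step * batch_size < max_samples:
    count += 1
    step += 1

print(train_steps, samples_seen, log_lines, round(epochs, 2))  # 3124 399872 312 7.27
```

So the model makes a little over seven passes over the training set, and the training cell prints 312 progress lines.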