
https://github.com/fchollet/keras/blob/master/examples/mnist_hierarchical_rnn.py
Example of using Hierarchical RNN (HRNN) to classify MNIST digits.

HRNNs can learn across multiple levels
of temporal hierarchy over a complex sequence.
Usually, the first recurrent layer of an HRNN
encodes a sentence (e.g. of word vectors)
into a sentence vector.
The second recurrent layer then encodes a sequence of
such vectors (encoded by the first layer) into a document vector.
This document vector is considered to preserve both
the word-level and sentence-level structure of the context.
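
As a rough illustration of the two-level encoding described above, the sketch below uses a plain tanh RNN in NumPy instead of an LSTM, with random untrained weights. The helper name `rnn_encode` and all sizes are assumptions made for this example only; the point is how the shapes flow from word vectors to sentence vectors to one document vector.

```python
import numpy as np

def rnn_encode(seq, w_in, w_rec):
    """Run a plain tanh RNN over seq (timesteps, features); return the last hidden state."""
    h = np.zeros(w_rec.shape[0])
    for x_t in seq:
        h = np.tanh(np.dot(x_t, w_in) + np.dot(h, w_rec))
    return h

rng = np.random.RandomState(0)
word_dim, sent_dim, doc_dim = 8, 16, 32  # hypothetical sizes

# Random, untrained weights for the word-level and sentence-level encoders.
w_in1, w_rec1 = rng.randn(word_dim, sent_dim) * 0.1, rng.randn(sent_dim, sent_dim) * 0.1
w_in2, w_rec2 = rng.randn(sent_dim, doc_dim) * 0.1, rng.randn(doc_dim, doc_dim) * 0.1

# A toy document: 3 sentences, each made of 5 word vectors.
document = rng.randn(3, 5, word_dim)

# Level 1: every sentence is encoded independently with the same weights.
sentence_vectors = np.stack([rnn_encode(s, w_in1, w_rec1) for s in document])
# Level 2: the sequence of sentence vectors is encoded into one document vector.
document_vector = rnn_encode(sentence_vectors, w_in2, w_rec2)

print(sentence_vectors.shape)  # (3, 16)
print(document_vector.shape)   # (32,)
```

Sharing one set of weights across all sentences at the first level is the same idea that a `TimeDistributed` wrapper expresses in Keras.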

# References

- [A Hierarchical Neural Autoencoder for Paragraphs and Documents](https://arxiv.org/abs/1506.01057)
    Encodes paragraphs and documents with HRNN.
    Results have shown that HRNN outperforms standard
    RNNs and may play some role in more sophisticated generation tasks like
    summarization or question answering.
- [Hierarchical recurrent neural network for skeleton based action recognition](http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7298714)
    Achieved state-of-the-art results on
    skeleton based action recognition with 3 levels
    of bidirectional HRNN combined with fully connected layers.

In the MNIST example below, the first LSTM layer encodes every
row of pixels, of shape (28, 1), into a row vector of shape (128,).
The second LSTM layer then encodes these 28 row vectors, of shape (28, 128),
into an image vector representing the whole image.
A final Dense layer is added for prediction.
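
The shape bookkeeping in the paragraph above can be checked without Keras. The sketch below only tracks shapes: the zero array `encoded` is a placeholder for what the first LSTM would produce, not a real encoding, and `hidden` stands in for the 128 units used in this example.

```python
import numpy as np

image = np.zeros((28, 28, 1), dtype="float32")  # one MNIST-shaped sample

# TimeDistributed treats the first axis of each sample as "time" and passes
# every slice along it to the wrapped layer as an independent sequence.
slices = [image[i] for i in range(image.shape[0])]
print(len(slices), slices[0].shape)  # 28 sequences, each of shape (28, 1)

hidden = 128  # number of units in the first LSTM
# Placeholder for the per-slice encodings; the real values come from the LSTM.
encoded = np.zeros((len(slices), hidden), dtype="float32")
print(encoded.shape)  # (28, 128): the input sequence for the second LSTM
```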

After 5 epochs: train acc: 0.9858, val acc: 0.9864


In [36]:
from __future__ import print_function

import keras
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Input, Dense, TimeDistributed
from keras.layers import LSTM

from datetime import datetime
import numpy as np

In [2]:
# Training parameters.
batch_size = 32
num_classes = 10
epochs = 5

# Embedding dimensions.
row_hidden = 128
col_hidden = 128



In [41]:
# The data, shuffled and split between train and test sets.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Keep the integer test labels; y_test is one-hot encoded further down.
y_test_orig = y_test

In [42]:
print ("x_train:", x_train.shape)
print ("y_train:", y_train.shape)
print ("x_test:", x_test.shape)
print ("y_test:", y_test.shape)
print ("y_test_orig:", y_test_orig.shape)

x_train: (60000, 28, 28)
y_train: (60000,)
x_test: (10000, 28, 28)
y_test: (10000,)
y_test_orig: (10000,)


In [43]:
# Reshapes data to 4D for Hierarchical RNN.
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
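
To see what the reshape and scaling steps do, here is a minimal sketch on a tiny made-up array (2 images of 3x3 pixels, not MNIST). The cast to `float32` comes before the division so that the scaled values keep their fractional part.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 2 images of 3x3 uint8 pixels.
images = np.array([[[0, 128, 255]] * 3] * 2, dtype="uint8")

# Add a trailing axis of size 1 (one feature per pixel), then scale to [0, 1].
scaled = images.reshape(images.shape[0], 3, 3, 1).astype("float32")
scaled /= 255

print(images.shape, "->", scaled.shape)  # (2, 3, 3) -> (2, 3, 3, 1)
print(scaled.min(), scaled.max())        # 0.0 1.0
```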


In [44]:
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')


x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples


In [45]:
# Converts class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
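
For intuition, the conversion to "binary class matrices" is a one-hot encoding. The sketch below reproduces the idea in plain NumPy on a few made-up labels; it illustrates the result and is not the Keras implementation.

```python
import numpy as np

labels = np.array([7, 2, 1, 0, 4])  # toy integer class labels
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(one_hot.shape)  # (5, 10)
print(one_hot[0])     # a single 1.0 at index 7, zeros elsewhere

# argmax along the last axis recovers the original integer labels.
recovered = np.argmax(one_hot, axis=-1)
print(recovered)
```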


In [46]:
print ("y_train", y_train.shape)
print ("x_train", x_train.shape)

y_train (60000, 10)
x_train (60000, 28, 28, 1)


In [9]:
row, col, pixel = x_train.shape[1:]

In [11]:
print('row :', type(row), row)
print('col :', type(col), col)
print('pixel :', type(pixel), pixel)


row : <type 'int'> 28
col : <type 'int'> 28
pixel : <type 'int'> 1


In [14]:
# 4D input.
x = Input(shape=(row, col, pixel))
print (type(x))
print (x)

<class 'tensorflow.python.framework.ops.Tensor'>
Tensor("input_2:0", shape=(?, 28, 28, 1), dtype=float32)


In [15]:
# Encodes each row of pixels with the same LSTM, using the TimeDistributed
# wrapper (see https://keras.io/layers/wrappers/).
encoded_rows = TimeDistributed(LSTM(row_hidden))(x)

In [16]:
print (type(encoded_rows))

<class 'tensorflow.python.framework.ops.Tensor'>


In [17]:
# Encodes columns of encoded rows.
encoded_columns = LSTM(col_hidden)(encoded_rows)


In [18]:
print (type(encoded_columns))

<class 'tensorflow.python.framework.ops.Tensor'>


In [20]:
# Final predictions and model.
prediction = Dense(num_classes, activation='softmax')(encoded_columns)
print (type(prediction))
model = Model(x, prediction)
print (type(model))
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])


<class 'tensorflow.python.framework.ops.Tensor'>
<class 'keras.engine.training.Model'>


In [22]:
print (len(model.layers))
print (model.layers)


4
[<keras.engine.topology.InputLayer object at 0x7fd843a50350>, <keras.layers.wrappers.TimeDistributed object at 0x7fd843a502d0>, <keras.layers.recurrent.LSTM object at 0x7fd8982924d0>, <keras.layers.core.Dense object at 0x7fd841765d50>]
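
As a sanity check on the model size, the trainable parameters of the three weighted layers can be estimated by hand. This sketch assumes the standard LSTM layout (four gate blocks, each with input weights, recurrent weights and a bias) and a Dense layer with bias; the helper names are made up for this example. The total can be compared against `model.summary()`.

```python
def lstm_params(input_dim, units):
    # Four gate blocks, each with input weights, recurrent weights and a bias.
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    # One weight per input/output pair plus one bias per output.
    return input_dim * units + units

row_hidden, col_hidden, num_classes, pixel = 128, 128, 10, 1

row_lstm = lstm_params(pixel, row_hidden)        # wrapped by TimeDistributed
col_lstm = lstm_params(row_hidden, col_hidden)   # second-level LSTM
classifier = dense_params(col_hidden, num_classes)

print(row_lstm, col_lstm, classifier)        # 66560 131584 1290
print(row_lstm + col_lstm + classifier)      # 199434
```

The `TimeDistributed` wrapper adds no parameters of its own, because the same LSTM weights are reused for every slice.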


In [24]:
startTime= datetime.now()
print ("start")
# Training.
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
timeElapsed=datetime.now()-startTime
print('Time elapsed (hh:mm:ss.ms) {}'.format(timeElapsed))


start
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Time elapsed (hh:mm:ss.ms) 0:14:44.503536
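
The start/stop timing pattern is repeated in several cells of this notebook. One possible way to factor it out is a small context manager; `timed` is a hypothetical helper name, and the summation below is just a toy workload standing in for `model.fit`.

```python
from __future__ import print_function

import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Time the enclosed block and print the elapsed wall-clock seconds."""
    record = {"label": label, "seconds": None}
    start = time.time()
    try:
        yield record
    finally:
        record["seconds"] = time.time() - start
        print("{} took {:.3f} s".format(label, record["seconds"]))

# Toy workload in place of training.
with timed("toy workload") as t:
    total = sum(i * i for i in range(100000))
```

After the block finishes, `t["seconds"]` holds the elapsed time, so it can be logged or compared across runs.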


In [27]:

startTime= datetime.now()
# Evaluation.
print ("start")
scores = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

timeElapsed=datetime.now()-startTime
print('Time elapsed (hh:mm:ss.ms) {}'.format(timeElapsed))


start
Test loss: 0.0560403121914
Test accuracy: 0.9829
Time elapsed (hh:mm:ss.ms) 0:00:09.550869


In [31]:
type(model)

keras.engine.training.Model

In [32]:
startTime= datetime.now()
print ("start")
pred = model.predict(x_test, batch_size=batch_size, verbose=0, steps=None)
print (type(pred))
timeElapsed=datetime.now()-startTime
print('Time elapsed (hh:mm:ss.ms) {}'.format(timeElapsed))


<type 'numpy.ndarray'>
Time elapsed (hh:mm:ss.ms) 0:00:09.799174


In [33]:
print (type(pred), pred.shape)

<type 'numpy.ndarray'> (10000, 10)


In [34]:
pred[0]

array([  4.73742557e-06,   6.86850944e-06,   6.52822928e-05,
         8.68870993e-05,   1.49033094e-05,   1.81596097e-05,
         2.39001406e-06,   9.99712646e-01,   8.41233577e-06,
         7.96956665e-05], dtype=float32)
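
Each row of `pred` is a softmax output, so it should sum to roughly 1 and its largest entry marks the predicted class. A quick check in plain Python, using the values printed above:

```python
# Values copied from pred[0] above.
probs = [4.73742557e-06, 6.86850944e-06, 6.52822928e-05,
         8.68870993e-05, 1.49033094e-05, 1.81596097e-05,
         2.39001406e-06, 9.99712646e-01, 8.41233577e-06,
         7.96956665e-05]

total = sum(probs)
# Index of the largest probability, i.e. the predicted digit.
best = max(range(len(probs)), key=probs.__getitem__)

print(total)  # close to 1, up to float32 rounding in the printed values
print(best)   # 7
```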

In [49]:
pred_class = np.argmax(pred,axis=-1)
print (" pred_class:", pred_class[0:10])
print ("y_test_orig:", y_test_orig[0:10])

 pred_class: [7 2 1 0 4 1 4 9 5 9]
y_test_orig: [7 2 1 0 4 1 4 9 5 9]


In [52]:
match = 0.
for i in range(0, len(y_test_orig)):
    if y_test_orig[i] == pred_class[i]:
        match += 1
print ("calc'd accuracy:", match/len(y_test_orig))

# NB: this equals the test accuracy reported by model.evaluate, because the
# Keras accuracy metric also compares the argmax of the predicted probabilities
# with the position of the 1 in the one-hot label.

calc'd accuracy: 0.9829
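
The counting loop above can also be written as a vectorized comparison. A minimal sketch on made-up arrays standing in for `pred_class` and `y_test_orig` (one deliberate mismatch at index 8):

```python
import numpy as np

# Toy stand-ins for pred_class and y_test_orig.
predicted = np.array([7, 2, 1, 0, 4, 1, 4, 9, 6, 9])
actual = np.array([7, 2, 1, 0, 4, 1, 4, 9, 5, 9])

# The element-wise comparison gives booleans; their mean is the fraction of matches.
accuracy = np.mean(predicted == actual)
print("calc'd accuracy:", accuracy)  # 0.9
```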
