# Recurrent neural networks (RNNs) in `keras`

Examples: https://github.com/fchollet/keras/tree/master/examples

To consider/compare:

- https://github.com/fchollet/keras/blob/master/examples/imdb_fasttext.py
- https://github.com/fchollet/keras/blob/master/examples/lstm_benchmark.py
- https://github.com/fchollet/keras/blob/master/examples/imdb_bidirectional_lstm.py

In [6]:
import numpy as np
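# `tensorflow.contrib` exists only in TensorFlow 1.x; on TensorFlow 2.x use `from tensorflow import keras`
import tensorflow.contrib.keras as keras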

## Loading the IMDB sentiment classification dataset

The dataset used here is the [IMDB movie review sentiment classification dataset](https://keras.io/datasets/#imdb-movie-reviews-sentiment-classification), available through `keras`:

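- 25,000 movie reviews from IMDB for training, plus another 25,000 for testing
- binary labels by sentiment (positive/negative)
- each review is encoded as a sequence of integers representing word indices
- words are indexed by overall frequency, so lower indices correspond to more frequent words; with the default `load_data()` settings, indices 0 to 2 are reserved (padding, start of sequence, unknown word), so the most frequent word is encoded as 4
- `imdb.load_data()` accepts a `num_words` argument that keeps only the most frequent words; rarer words are replaced by the unknown-word index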
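
To make the encoding concrete, here is a minimal sketch of frequency-based indexing on a toy corpus. It only imitates the scheme described above and is not the `keras` implementation; the corpus, the `encode` helper, and the constant names are all made up for illustration, and the reserved indices are assumed to follow the `load_data()` defaults.

```python
from collections import Counter

# Toy corpus standing in for the IMDB reviews (hypothetical data).
reviews = [
    "the movie was great",
    "the movie was bad",
    "the plot was great fun",
]

PAD, START, OOV = 0, 1, 2   # reserved markers (assumed to mirror the defaults)
INDEX_FROM = 3              # word ranks are shifted by this offset

# Rank words by overall frequency: rank 1 is the most frequent word.
counts = Counter(word for review in reviews for word in review.split())
ranked = sorted(counts, key=lambda w: (-counts[w], w))  # ties broken alphabetically
word_rank = {word: rank for rank, word in enumerate(ranked, start=1)}

def encode(review, num_words=None):
    """Encode a review as integers, replacing rare words with OOV."""
    encoded = [START]
    for word in review.split():
        index = word_rank[word] + INDEX_FROM
        if num_words is not None and index >= num_words:
            index = OOV  # word falls outside the kept vocabulary
        encoded.append(index)
    return encoded

full = encode(reviews[2])
truncated = encode(reviews[2], num_words=6)
print(full)       # every word keeps its own index
print(truncated)  # rare words collapse to the OOV index
```

Lowering `num_words` shrinks the vocabulary the model has to embed, at the cost of mapping more words to the same unknown-word index.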

In [3]:
max_features = 20000
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=max_features)

In [10]:
x_train[0][:5]

[1, 14, 22, 16, 43]
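
A sequence like this can be turned back into words by inverting the word index. The sketch below assumes the default offset of 3 and uses a small hand-made dictionary in place of the one returned by `keras.datasets.imdb.get_word_index()`; the word-to-rank pairs are illustrative and were not read from the real index.

```python
# Hand-made stand-in for the word-to-rank dictionary (illustrative values only).
toy_word_index = {"this": 11, "film": 19, "was": 13, "just": 40}

INDEX_FROM = 3  # offset assumed to match the default loader settings
SPECIAL = {0: "<pad>", 1: "<start>", 2: "<unk>"}  # reserved indices

# Invert the mapping and shift each rank by the offset.
index_to_word = {rank + INDEX_FROM: word for word, rank in toy_word_index.items()}
index_to_word.update(SPECIAL)

def decode(sequence):
    """Turn a list of integer indices back into a readable string."""
    return " ".join(index_to_word.get(i, "<unk>") for i in sequence)

decoded = decode([1, 14, 22, 16, 43])
print(decoded)
```

With the real dictionary in place of `toy_word_index`, the same `decode` helper should work on any entry of `x_train`.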

In [11]:
y_train[:5]

array([1, 0, 0, 1, 0])

In [7]:
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')
print('Average train sequence length: {}'.format(int(np.mean([len(x) for x in x_train]))))
print('Average test sequence length: {}'.format(int(np.mean([len(x) for x in x_test]))))

25000 train sequences
25000 test sequences
Average train sequence length: 238
Average test sequence length: 230


In [None]:
# TODO: compute the sequence-length statistics above separately for each class (positive/negative)
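
One possible way to approach this TODO is sketched below. It is not part of the original notebook: `mean_length_by_class` is a hypothetical helper, and the sequences and labels are toy stand-ins for `x_train` and `y_train`.

```python
from statistics import mean

# Toy stand-ins for x_train / y_train (hypothetical data, not the IMDB reviews).
sequences = [[1, 14, 22], [1, 5, 6, 7, 8], [1, 9, 3], [1, 4, 4, 4, 2, 7, 8]]
labels = [1, 0, 0, 1]

def mean_length_by_class(sequences, labels):
    """Return {label: average sequence length} for each distinct label."""
    lengths = {}
    for seq, label in zip(sequences, labels):
        lengths.setdefault(label, []).append(len(seq))
    return {label: mean(values) for label, values in sorted(lengths.items())}

stats = mean_length_by_class(sequences, labels)
for label, avg in stats.items():
    print('Average length for class {}: {}'.format(label, avg))
```

On the real data, the call would presumably be `mean_length_by_class(x_train, y_train)`, repeated for the test split.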