**LSTM, GRU, BRNN**

In [0]:
from time import time
import numpy as np
import matplotlib.pyplot as plt

from tensorflow.keras.utils import to_categorical

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

We use Keras to load the IMDB movie review dataset, keeping only the 10,000 most frequent words.

In [0]:
from tensorflow.keras.datasets import imdb 

num_words = 10000

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=num_words)

print("Number of examples train set: %d" % len(X_train))
print("Number of examples test set: %d" % len(X_test))
print(X_train[0])
print(y_train[0])

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
Number of examples train set: 25000
Number of examples test set: 25000
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 

The reviews in the corpus have different lengths, so we use the Keras pad_sequences function to bring every sequence to exactly 500 elements (in our case, 500 words per review). With the default arguments, a sequence shorter than 500 words is padded with zeros at the beginning, and a longer one is truncated from the beginning so that its last 500 words are kept.

In [0]:
from tensorflow.keras.preprocessing.sequence import pad_sequences

maxlen = 500

# Defaults: padding='pre' and truncating='pre', so zeros go in front
X_train = pad_sequences(X_train, maxlen=maxlen)
X_test = pad_sequences(X_test, maxlen=maxlen)
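To make the padding behaviour concrete, here is a minimal sketch that mimics what `pad_sequences` does with its default arguments (`padding='pre'`, `truncating='pre'`, `value=0`). The helper `pad_pre` and the toy sequences are hypothetical and only for illustration; the notebook itself relies on the Keras function.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Hypothetical helper: pad and truncate at the front, like the Keras defaults."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        trimmed = seq[-maxlen:]               # keep only the last maxlen tokens
        if trimmed:                           # skip empty sequences
            out[i, -len(trimmed):] = trimmed  # right-align, zeros stay in front
    return out

toy = [[1, 14, 22], [1, 4, 7, 9, 12, 30]]     # made-up word indices
padded = pad_pre(toy, maxlen=5)
print(padded)
```

The short sequence ends up with zeros in front of it, while the long one loses its first token.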

**LSTM model**

The first layer is an embedding layer that learns a 100-dimensional vector for each of the 10,000 words in our dictionary.

Next come two stacked recurrent Long Short-Term Memory (LSTM) layers, followed by a dropout layer.

The last layer computes the network output. Since this is a binary classification problem (positive/negative review), its activation function is the sigmoid.
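As a rough illustration of the output layer and of the `binary_crossentropy` loss used when the model is compiled, the sketch below computes both by hand with NumPy. The logits and labels are made-up toy values, not outputs of the network, and the 0.5 decision threshold is an assumption (it is the usual convention for binary accuracy).

```python
import numpy as np

def sigmoid(z):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels; eps avoids log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

logits = np.array([2.0, -1.0, 0.0])   # toy pre-activation values of the Dense(1) unit
labels = np.array([1.0, 0.0, 1.0])    # toy labels: 1 = positive, 0 = negative review

probs = sigmoid(logits)               # predicted probability of a positive review
preds = (probs > 0.5).astype(int)     # assumed decision threshold of 0.5
loss = binary_crossentropy(labels, probs)
print(probs, preds, loss)
```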

In [0]:
from tensorflow.keras.layers import Embedding, LSTM, Dropout

model = Sequential()

model.add(Embedding(num_words, 100))
model.add(LSTM(32, dropout=0.5, recurrent_dropout=0.2, return_sequences=True))
model.add(LSTM(32, dropout=0.5, recurrent_dropout=0.2))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
start_at = time()
model.summary()
model.fit(X_train, y_train, batch_size=512, validation_split=0.2, epochs=5)

model.evaluate(X_test, y_test)

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, None, 100)         1000000   
_________________________________________________________________
lstm (LSTM)                  (None, None, 32)          17024     
_________________________________________________________________
lstm_1 (LSTM)                (None, 32)                8320      
_________________________________________________________________
dropout_1 (Dropout)          (None, 32)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 33        
Total params: 1,025,377
Trainable params: 1,025,377
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


[0.3354117274284363, 0.8557999730110168]

**GRU**

Gated Recurrent Units (GRUs) use two gates instead of the three used by LSTMs and keep no separate cell state. They therefore require fewer tensor calculations per time step and usually lead to similar results in less time.

In [0]:
from tensorflow.keras.layers import Embedding, GRU, Dropout

model = Sequential()

model.add(Embedding(num_words, 100))
model.add(GRU(32, dropout=0.5, recurrent_dropout=0.2, return_sequences=True))
model.add(GRU(32, dropout=0.5, recurrent_dropout=0.2))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
start_at = time()
model.summary()
model.fit(X_train, y_train, batch_size=512, validation_split=0.2, epochs=5)

model.evaluate(X_test, y_test)

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, None, 100)         1000000   
_________________________________________________________________
gru (GRU)                    (None, None, 32)          12864     
_________________________________________________________________
gru_1 (GRU)                  (None, 32)                6336      
_________________________________________________________________
dropout (Dropout)            (None, 32)                0         
_________________________________________________________________
dense (Dense)                (None, 1)                 33        
Total params: 1,019,233
Trainable params: 1,019,233
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


[0.37949562072753906, 0.8409600257873535]

**BRNN** (Bidirectional Recurrent Neural Networks)
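A bidirectional layer runs one recurrent layer over the sequence from left to right and a second, independent one from right to left, then merges the two outputs (by concatenation with the Keras default). The sketch below illustrates the idea with a plain tanh RNN in NumPy instead of an LSTM; the function names, sizes, and random weights are assumptions chosen for illustration only. It shows why wrapping a layer with 32 units yields an output of size 64.

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, features, units = 6, 4, 3   # toy sizes, not the notebook's

def simple_rnn(x, W, U, b):
    """Run a plain tanh RNN over x (timesteps, features); return every hidden state."""
    h = np.zeros(U.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(x_t @ W + h @ U + b)
        states.append(h)
    return np.stack(states)

def make_weights():
    """Random input kernel, recurrent kernel, and zero bias for one direction."""
    return (rng.normal(size=(features, units)),
            rng.normal(size=(units, units)),
            np.zeros(units))

x = rng.normal(size=(timesteps, features))
fwd = simple_rnn(x, *make_weights())               # reads the sequence left to right
bwd = simple_rnn(x[::-1], *make_weights())[::-1]   # reads right to left, re-aligned in time
both = np.concatenate([fwd, bwd], axis=-1)         # concatenation doubles the feature size
print(fwd.shape, bwd.shape, both.shape)
```

Each direction has its own weights, so the parameter count and the recurrent computation both roughly double.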

In [0]:
from tensorflow.keras.layers import Embedding, Dense, LSTM, Bidirectional, Dropout

model = Sequential()

model.add(Embedding(num_words, 100))
model.add(Bidirectional(LSTM(32, dropout=0.5, recurrent_dropout=0.2, return_sequences=True)))
model.add(Bidirectional(LSTM(32, dropout=0.5, recurrent_dropout=0.2)))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
start_at = time()
model.summary()
model.fit(X_train, y_train, batch_size=512, validation_split=0.2, epochs=5)

model.evaluate(X_test, y_test)


Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, None, 100)         1000000   
_________________________________________________________________
bidirectional_2 (Bidirection (None, None, 64)          34048     
_________________________________________________________________
bidirectional_3 (Bidirection (None, 64)                24832     
_________________________________________________________________
dropout_3 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 65        
Total params: 1,058,945
Trainable params: 1,058,945
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


[0.32699453830718994, 0.8671200275421143]



**Comparison of the networks**

* From my experience, GRUs train faster than LSTMs and perform better when training data is limited, at least for language modeling (I am not sure about other tasks).
* GRUs are simpler and thus easier to modify, for example by adding new gates when the network receives additional inputs. They also take less code in general.
* LSTMs should, in theory, remember longer sequences than GRUs and outperform them on tasks that require modeling long-distance relations.
* GRUs also have fewer parameters than LSTMs, as the model summaries above show: 12,864 + 6,336 = 19,200 recurrent parameters for the two GRU layers versus 17,024 + 8,320 = 25,344 for the two LSTM layers.
* Simple RNNs have only a plain recurrent operation, with no gates to control the flow of information between time steps.
* BRNNs run the recurrence in both directions, which roughly doubles the recurrent computation, so they take more time to train.
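The parameter counts reported in the model summaries can be reproduced by hand, which makes the size difference between the architectures explicit. The sketch below is a back-of-the-envelope calculation, not a Keras API. It assumes the GRU variant with two bias vectors per block (`reset_after=True`, the TensorFlow 2 default), an assumption that is consistent with the counts printed in the GRU summary.

```python
def lstm_params(input_dim, units):
    # 4 blocks (input, forget, output gates and the candidate cell),
    # each with an input kernel, a recurrent kernel, and one bias vector
    return 4 * (input_dim * units + units * units + units)

def gru_params(input_dim, units):
    # 3 blocks (update gate, reset gate, candidate state),
    # each with an input kernel, a recurrent kernel, and two bias vectors (assumed)
    return 3 * (input_dim * units + units * units + 2 * units)

embedding_dim, units = 100, 32

lstm_counts = [lstm_params(embedding_dim, units), lstm_params(units, units)]
gru_counts = [gru_params(embedding_dim, units), gru_params(units, units)]
# A bidirectional layer holds two copies of the wrapped layer; the second
# one receives the concatenated 2 * units features from the first.
bi_counts = [2 * lstm_params(embedding_dim, units), 2 * lstm_params(2 * units, units)]

print("LSTM:", lstm_counts)            # matches lstm and lstm_1 in the summary
print("GRU:", gru_counts)              # matches gru and gru_1 in the summary
print("Bidirectional LSTM:", bi_counts)
```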

