# **RNN**
A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior.

IMDB sentiment classification task

This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. IMDB provided a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Raw text and already processed bag of words formats are provided.

You can download the dataset from http://ai.stanford.edu/~amaas/data/sentiment/  or you can directly use 
" from keras.datasets import imdb " to import the dataset.

Few points to be noted:
Modules like SimpleRNN, LSTM, Activation layers, Dense layers, Dropout can be directly used from keras
For preprocessing, you can use required 

In [44]:
#load the imdb dataset 
from keras.datasets import imdb

vocabulary_size = 5000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words = vocabulary_size)
print('Loaded dataset with {} training samples, {} test samples'.format(len(X_train), len(X_test)))

Loaded dataset with 25000 training samples, 25000 test samples


In [45]:
#the review is stored as a sequence of integers. 
# These are word IDs that have been pre-assigned to individual words, and the label is an integer

print('---review---')
print(X_train[0])
print('---label---')
print(y_train[0])

# to get the actual review
word2id = imdb.get_word_index()
id2word = {i: word for word, i in word2id.items()}
print('---review with words---')
print([id2word.get(i, ' ') for i in X_train[0]])
print('---label---')
print(y_train[0])

---review---
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 2, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 2, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 2, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 2, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 2, 19, 178, 32]
---label---
1
---review with words---
['the', 'as', 'you', 'wi

In [46]:
#pad sequences (write your code here)
from keras.preprocessing import sequence
X_train = sequence.pad_sequences(X_train, maxlen = 500)
X_test = sequence.pad_sequences(X_test, maxlen = 500)

In [62]:
#design a RNN model (write your code)
from keras import Sequential
from keras.layers import Embedding, LSTM, Dense, Dropout, SimpleRNN

rnn = Sequential()
rnn.add(Embedding(vocabulary_size, 32, input_length = 500))
rnn.add(SimpleRNN(100, dropout = 0.2))
rnn.add(Dense(1, activation='sigmoid'))
print(rnn.summary())

Model: "sequential_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_7 (Embedding)     (None, 500, 32)           160000    
                                                                 
 simple_rnn_2 (SimpleRNN)    (None, 100)               13300     
                                                                 
 dense_7 (Dense)             (None, 1)                 101       
                                                                 
Total params: 173,401
Trainable params: 173,401
Non-trainable params: 0
_________________________________________________________________
None


In [63]:
#train and evaluate your model
#choose your loss function and optimizer and mention the reason to choose that particular loss function and optimizer
# use accuracy as the evaluation metric

rnn.compile(
    loss = 'binary_crossentropy',
    optimizer = 'adam',
    metrics = ['accuracy']
)

Adam seemed to work better when compared to SGD.

In [64]:
batch_size = 64
num_epochs = 3

x1 = X_train[:batch_size]
y1 = y_train[:batch_size]
x_valid = X_train[batch_size:]
y_valid = y_train[batch_size:]
rnn.fit(
    x1, y1,
    validation_data = (x_valid, y_valid),
    batch_size = batch_size,
    epochs = num_epochs
)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7f40fa0c96d0>

In [65]:
#evaluate the model using model.evaluate()
scores = rnn.evaluate(X_test, y_test, verbose = 0)
print('Accuracy (SimpleRNN): ', scores[1])

Accuracy (SimpleRNN):  0.5072799921035767


# **LSTM**

Instead of using a RNN, now try using a LSTM model and compare both of them. Which of those performed better and why ?


In [56]:
lstm = Sequential()
lstm.add(Embedding(vocabulary_size, 32, input_length = 500))
lstm.add(LSTM(100))
lstm.add(Dense(1, activation = 'sigmoid'))
print(lstm.summary())

Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_6 (Embedding)     (None, 500, 32)           160000    
                                                                 
 lstm_4 (LSTM)               (None, 100)               53200     
                                                                 
 dense_6 (Dense)             (None, 1)                 101       
                                                                 
Total params: 213,301
Trainable params: 213,301
Non-trainable params: 0
_________________________________________________________________
None


In [57]:
lstm.compile(
    loss = 'binary_crossentropy',
    optimizer = 'adam',
    metrics = ['accuracy']
)

In [58]:
batch_size = 64
num_epochs = 3

x1 = X_train[:batch_size]
y1 = y_train[:batch_size]
x_valid = X_train[batch_size:]
y_valid = y_train[batch_size:]

lstm.fit(
    x1, y1,
    validation_data = (x_valid, y_valid),
    batch_size = batch_size,
    epochs = num_epochs
)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7f40f6ff9190>

Perform Error analysis and explain using few examples.

In [61]:
scores = model.evaluate(X_test, y_test, verbose=0)
print('Accuracy (LSTM): ', scores[0])

Accuracy (LSTM):  0.7238239049911499


Running this for more epochs, might yield better results.

LSTMS are form of RNN that control the relative influence of the input and prior value of the hidden state. Compared to traditional RNN, LSTM units are able to learn the temporal dependency of the input sequence.

There are two components in LSTM unit:
* Cell state: cell state is the output of the LSTM and is a function of the current input, the previously hidden state and the previous output
* Forget gate: forget gate decides how much the cell state will be influenced by the previous state.

How to achieve better results?

1. Mapping using a simple ID won't be sufficient.
1. Consider constructing better embeddings for the words before running the model.
2. Consider using more layers.
3. Consider stacking multiple layers to capture longer time-series dimensions and also to reduce some of the vanishing gradient problem.
4. Consider using an ensemble of smaller neural networks in a stacked architecture to capture the temporal aspect.