
# <font color='#6629b2'>Predicting sentiment ratings with an RNN using Keras</font>


## <font color='#6629b2'>Study Material</font>
- https://web.stanford.edu/~jurafsky/slp3/9.pdf
- https://www.youtube.com/watch?v=UNmqTiOnRfg
- https://www.youtube.com/watch?v=WCUNPb-5EYI
- https://www.youtube.com/watch?v=OuYtk9Ymut4

## <font color='#6629b2'>Dataset</font>

The Large Movie Review Dataset consists of 50,000 movie reviews from [IMDB](http://www.imdb.com/). The ratings are on a 1-10 scale, but the dataset specifically contains "polarized" reviews: positive reviews with a rating of 7 or higher, and negative reviews with a rating of 4 or lower. There is an equal number of positive and negative reviews. The Kaggle CSV used here keeps only the review text and a binary `sentiment` label (`positive` or `negative`), not the original ratings.

You can download the dataset from: https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews

In [None]:

from google.colab import drive
drive.mount("/content/gdrive")

Mounted at /content/gdrive


In [None]:
'''Load the dataset into the variable "reviews". You can truncate the dataset to a few hundred records if
    processing/training takes too long. Keep in mind that a bigger dataset usually gives a higher accuracy score.
'''
import pandas as pd
import numpy as np

reviews = pd.read_csv('/content/gdrive/MyDrive/synapseNLPw1/imdbreview.csv')


reviews

Unnamed: 0,review,sentiment
0,One of the other reviewers has mentioned that ...,positive
1,A wonderful little production. <br /><br />The...,positive
2,I thought this was a wonderful way to spend ti...,positive
3,Basically there's a family where a little boy ...,negative
4,"Petter Mattei's ""Love in the Time of Money"" is...",positive
...,...,...
49995,I thought this movie did a down right good job...,positive
49996,"Bad plot, bad dialogue, bad acting, idiotic di...",negative
49997,I am a Catholic taught in parochial elementary...,negative
49998,I'm going to have to disagree with the previou...,negative


In [None]:
review = reviews.head(150)
review

Unnamed: 0,review,sentiment
0,One of the other reviewers has mentioned that ...,positive
1,A wonderful little production. <br /><br />The...,positive
2,I thought this was a wonderful way to spend ti...,positive
3,Basically there's a family where a little boy ...,negative
4,"Petter Mattei's ""Love in the Time of Money"" is...",positive
...,...,...
145,I remember seeing this film in the theater in ...,positive
146,A family is traveling through the mid West. Th...,positive
147,Francis Ford Coppola wrote and directed this s...,positive
148,This movie was not very well directed. they al...,negative


In [None]:
# Split the data into train and test sets (70% / 30%)
from sklearn.model_selection import train_test_split
train_reviews,test_reviews,train_sentiment,test_sentiment = train_test_split(review['review'],review['sentiment'],test_size=0.3)
train_reviews=pd.DataFrame(train_reviews,columns=['review'])
test_reviews=pd.DataFrame(test_reviews,columns=['review'])
train_sentiment=pd.DataFrame(train_sentiment,columns=['sentiment'])
test_sentiment=pd.DataFrame(test_sentiment,columns=['sentiment'])
train_reviews

Unnamed: 0,review
129,I remember seeing this film in the mid 80's th...
49,Average (and surprisingly tame) Fulci giallo w...
23,"First of all, let's get a few things straight ..."
138,I just watched this movie on it's premier nigh...
96,My guess would be this was originally going to...
...,...
55,As someone has already mentioned on this board...
2,I thought this was a wonderful way to spend ti...
79,This film took me by surprise. I make it a hab...
95,Daniel Day-Lewis is the most versatile actor a...


In [None]:
# The split above already produced four DataFrames: train_reviews, test_reviews,
# train_sentiment and test_sentiment.
# In train_sentiment and test_sentiment, convert "positive" to 1 and "negative" to 0
train_sentiment['sentiment'] = train_sentiment['sentiment'].map({'positive':1,'negative':0})
test_sentiment['sentiment'] = test_sentiment['sentiment'].map({'positive':1,'negative':0})

test_sentiment

Unnamed: 0,sentiment
75,1
124,1
145,1
108,1
139,0
59,1
14,1
120,1
89,0
72,1


In [None]:
train_reviews.head()

Unnamed: 0,review
129,I remember seeing this film in the mid 80's th...
49,Average (and surprisingly tame) Fulci giallo w...
23,"First of all, let's get a few things straight ..."
138,I just watched this movie on it's premier nigh...
96,My guess would be this was originally going to...


## <font color='#6629b2'>Preparing the data</font>

###  <font color='#6629b2'>Tokenization</font>

The first preprocessing step is to tokenize each of the reviews into (lowercased) individual words, since the model will encode the reviews at the word level (rather than at the level of subword units such as characters). For this we'll use [spaCy](https://spacy.io/), a fast and user-friendly library that performs various language processing tasks. Once you load a spaCy model for a particular language, you can pass any text to it, e.g. `encoder(text)`, and access its linguistic features. The call returns a `Doc` object; iterating over it yields `Token` objects, and each token's string is available as `token.text`.

In [None]:
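'''Lowercase and tokenize all the reviews in train_reviews using spaCy'''

import spacy
encoder = spacy.load('en_core_web_sm')

def text_to_tokens(text_seqs):
    # Lowercase each review, run it through spaCy and keep each token's string.
    # Keeping token.text (a plain str) rather than the spaCy Token object matters:
    # two Token objects are only equal if they sit at the same position in the same Doc,
    # so using them as dictionary keys would give every occurrence of "the" its own
    # lexicon entry.
    return [[token.text for token in encoder(text.lower())] for text in text_seqs]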

train_reviews['tokenized review'] = text_to_tokens(train_reviews['review'])
train_reviews



Unnamed: 0,review,tokenized review
129,I remember seeing this film in the mid 80's th...,"(i, remember, seeing, this, film, in, the, mid..."
49,Average (and surprisingly tame) Fulci giallo w...,"(average, (, and, surprisingly, tame, ), fulci..."
23,"First of all, let's get a few things straight ...","(first, of, all, ,, let, 's, get, a, few, thin..."
138,I just watched this movie on it's premier nigh...,"(i, just, watched, this, movie, on, it, 's, pr..."
96,My guess would be this was originally going to...,"(my, guess, would, be, this, was, originally, ..."
...,...,...
55,As someone has already mentioned on this board...,"(as, someone, has, already, mentioned, on, thi..."
2,I thought this was a wonderful way to spend ti...,"(i, thought, this, was, a, wonderful, way, to,..."
79,This film took me by surprise. I make it a hab...,"(this, film, took, me, by, surprise, ., i, mak..."
95,Daniel Day-Lewis is the most versatile actor a...,"(daniel, day, -, lewis, is, the, most, versati..."


###  <font color='#6629b2'>Lexicon</font>

Then we need to assemble a lexicon (aka vocabulary) of words that the model needs to know. Each tokenized word in the reviews is added to the lexicon, and then each word is mapped to a numerical index that can be read by the model. Since large datasets may contain a huge number of unique words, it's common to filter out words occurring fewer than a certain number of times and replace them with a generic &lt;UNK&gt; token. The min_freq parameter in the function below defines this threshold. When assigning the indices, the number 1 will represent unknown words. The number 0 will represent "empty" word slots, which is explained below. Therefore "real" words will have indices of 2 or higher.
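The lexicon rules above can be sketched in a few lines with `collections.Counter`. This is a minimal illustration on a made-up two-review corpus; the names `build_lexicon`, `toy_reviews` and `toy_lexicon` are hypothetical and not part of this notebook. Words seen fewer than `min_freq` times are left out (so they will later map to the `<UNK>` index 1), and real words start at index 2.

```python
from collections import Counter

def build_lexicon(token_seqs, min_freq=1):
    # Count every token string across all sequences (Counter keeps first-seen order)
    counts = Counter(token for tokens in token_seqs for token in tokens)
    # Keep only the words seen at least min_freq times
    kept = [word for word, count in counts.items() if count >= min_freq]
    # Index 0 = padding, index 1 = <UNK>, real words start at 2
    lexicon = {word: idx + 2 for idx, word in enumerate(kept)}
    lexicon["<UNK>"] = 1
    return lexicon

toy_reviews = [["a", "great", "film"], ["a", "dull", "film"]]
toy_lexicon = build_lexicon(toy_reviews, min_freq=2)
print(toy_lexicon)  # {'a': 2, 'film': 3, '<UNK>': 1}
```

With `min_freq=2`, "great" and "dull" (each seen once) are dropped; with the default `min_freq=1` every word would get its own index.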

In [None]:
'''Count tokens (words) in texts and add them to the lexicon'''

def make_lexicon(token_seqs, min_freq=1):
    # First, count how often each word appears across all the reviews
    token_counts = {}
    for tokens in token_seqs:
        for word in tokens:
            token_counts[word] = token_counts.get(word, 0) + 1

    # Keep only the words that occur at least min_freq times
    kept_words = [word for word, count in token_counts.items() if count >= min_freq]

    # Map each kept word to an index starting at 2:
    # index 0 is reserved for padding and index 1 for unknown words ('<UNK>')
    lexicon = {word: idx + 2 for idx, word in enumerate(kept_words)}
    lexicon['<UNK>'] = 1  # unknown words are those that occur fewer than min_freq times
    lexicon_size = len(lexicon)

    print("LEXICON SAMPLE ({} total items):".format(lexicon_size))
    print(dict(list(lexicon.items())[:20]))

    return lexicon

lexicon = make_lexicon(token_seqs=train_reviews['tokenized review'], min_freq=1)

###  <font color='#6629b2'>From strings to numbers</font>

Once the lexicon is built, we can use it to transform each review from a list of string tokens into a list of numerical indices.
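As a small illustration (the toy lexicon and the function name `toy_tokens_to_idxs` are hypothetical), the mapping is a dictionary lookup with `<UNK>` as the fallback. The same fallback is what handles words that appear only in the test reviews: they are looked up in the training lexicon and receive index 1.

```python
toy_lexicon = {"a": 2, "film": 3, "<UNK>": 1}

def toy_tokens_to_idxs(token_seqs, lexicon):
    # dict.get returns the <UNK> index for any word the lexicon has never seen
    return [[lexicon.get(token, lexicon["<UNK>"]) for token in tokens]
            for tokens in token_seqs]

toy_idxs = toy_tokens_to_idxs([["a", "great", "film"], ["film"]], toy_lexicon)
print(toy_idxs)  # [[2, 1, 3], [3]]
```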

In [None]:
'''Convert each review from a list of tokens to a list of numbers (indices)'''

def tokens_to_idxs(token_seqs, lexicon):
    # Replace each token with its lexicon index; tokens missing from the lexicon
    # (rare words, or words seen only in the test set) get the <UNK> index, 1
    idx_seqs = [[lexicon.get(token, lexicon['<UNK>']) for token in tokens] for tokens in token_seqs]
    return idx_seqs

train_reviews['Review_Idxs'] = tokens_to_idxs(token_seqs=train_reviews['tokenized review'], lexicon=lexicon)
                                   
train_reviews[['tokenized review', 'Review_Idxs']][:10]

Unnamed: 0,tokenized review,Review_Idxs
129,"(i, remember, seeing, this, film, in, the, mid...","[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1..."
49,"(average, (, and, surprisingly, tame, ), fulci...","[144, 145, 146, 147, 148, 149, 150, 151, 152, ..."
23,"(first, of, all, ,, let, 's, get, a, few, thin...","[261, 262, 263, 264, 265, 266, 267, 268, 269, ..."
138,"(i, just, watched, this, movie, on, it, 's, pr...","[654, 655, 656, 657, 658, 659, 660, 661, 662, ..."
96,"(my, guess, would, be, this, was, originally, ...","[999, 1000, 1001, 1002, 1003, 1004, 1005, 1006..."
8,"(encouraged, by, the, positive, comments, abou...","[1211, 1212, 1213, 1214, 1215, 1216, 1217, 121..."
73,"(i, am, not, a, golf, fan, by, any, means, ., ...","[1364, 1365, 1366, 1367, 1368, 1369, 1370, 137..."
143,"(this, cute, animated, short, features, two, c...","[1571, 1572, 1573, 1574, 1575, 1576, 1577, 157..."
115,"(this, is, one, of, the, finest, movies, i, ha...","[1660, 1661, 1662, 1663, 1664, 1665, 1666, 166..."
110,"(apparently, ,, the, people, that, wrote, the,...","[1808, 1809, 1810, 1811, 1812, 1813, 1814, 181..."


In [None]:
'''Encode reviews (train_reviews['Review_Idxs']) as bag-of-words vectors'''

import numpy as np

def idx_seqs_to_bows(idx_seqs, matrix_length):
    # Count how often each index occurs in each review (np.bincount); minlength gives
    # every count vector the same length so they stack into a 2D array
    bow_seqs = np.array([np.bincount(np.array(idxs), minlength=matrix_length) for idxs in idx_seqs])
    return bow_seqs

bow_train_reviews = idx_seqs_to_bows(train_reviews['Review_Idxs'],
                                     matrix_length=len(lexicon) + 1)  # add one to the length for padding (index 0)

print("TRAIN INPUT:\n", bow_train_reviews)
print("SHAPE:", bow_train_reviews.shape, "\n")

#Showing an example mapping string words to counts
lexicon_lookup = {idx: lexicon_item for lexicon_item, idx in lexicon.items()}
lexicon_lookup[0] = ""
pd.DataFrame([(lexicon_lookup[idx], count) for idx, count in enumerate(bow_train_reviews[0])], 
                 columns=['Word', 'Count'])

TRAIN INPUT:
 [[0 0 1 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 1 1 1]]
SHAPE: (105, 28507) 



Unnamed: 0,Word,Count
0,,0
1,<UNK>,0
2,i,1
3,remember,1
4,seeing,1
...,...,...
28502,.,0
28503,save,0
28504,your,0
28505,money,0


##  <font color='#6629b2'>Building a Recurrent Neural Network </font>



###  <font color='#6629b2'>Numerical lists to matrices</font>

The input representation for the RNN explicitly encodes the order of words in the review. We'll return to the lists of word indices contained in train_reviews['Review_Idxs']; the input to the model will be these number sequences themselves. We need to put all the reviews in the training set into a single matrix, where each row is a review and each column is a word index in that sequence. This enables the model to process multiple sequences in parallel (batches) as opposed to one at a time, which significantly speeds up training. However, each review has a different number of words, so we create a padded matrix whose width equals the length of the longest review in the training set. For all reviews with fewer words, we fill the start of the row with zeros representing empty word positions. This is why the number 0 was not assigned as a word index in the lexicon. We can tell Keras to ignore these zeros during training.
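The padding scheme described above can be sketched in plain NumPy. The function name `left_pad` is hypothetical; the notebook itself uses Keras's `pad_sequences`, whose default `padding='pre'` places the zeros at the front in the same way when no sequence is longer than `maxlen`. The boolean mask at the end is the information `mask_zero=True` passes on to the later layers: `True` marks a real word, `False` a padding slot.

```python
import numpy as np

def left_pad(idx_seqs):
    # Width of the matrix = length of the longest sequence
    max_len = max(len(seq) for seq in idx_seqs)
    padded = np.zeros((len(idx_seqs), max_len), dtype=np.int32)
    for row, seq in enumerate(idx_seqs):
        # An empty seq would make -len(seq) == 0 and select the whole row, so skip it
        if seq:
            padded[row, -len(seq):] = seq  # zeros stay at the front ("pre" padding)
    return padded

toy_padded = left_pad([[5, 8, 2], [7], [4, 4]])
print(toy_padded)
# [[5 8 2]
#  [0 0 7]
#  [0 4 4]]
toy_mask = toy_padded != 0  # True = real word, False = padding
print(toy_mask)
```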

In [None]:
from tensorflow.keras.preprocessing.sequence import pad_sequences

def pad_idx_seqs(idx_seqs):
    # Find the length of the longest review
    max_seq_len = max(len(idxs) for idxs in idx_seqs)

    # Pad every shorter review with zeros up to max_seq_len; pad_sequences pads at
    # the start of each sequence by default (padding='pre')
    padded_idxs = pad_sequences(sequences=idx_seqs, maxlen=max_seq_len)
    return padded_idxs
    


train_padded_idxs = pad_idx_seqs(train_reviews['Review_Idxs'])

print("TRAIN INPUT:\n", train_padded_idxs)
print("SHAPE:", train_padded_idxs.shape, "\n")

TRAIN INPUT:
 [[    0     0     0 ...   141   142   143]
 [    0     0     0 ...   258   259   260]
 [    0     0     0 ...   651   652   653]
 ...
 [    0     0     0 ... 27892 27893 27894]
 [    0     0     0 ... 28223 28224 28225]
 [    0     0     0 ... 28504 28505 28506]]
SHAPE: (105, 876) 



###  <font color='#6629b2'>Model Layers</font>
The RNN will have four layers:

**1. Input**: The input layer takes in the matrix of word indices.

**2. Embedding**: A [layer](https://keras.io/layers/embeddings/) that converts integer word indices into distributed vector representations (embeddings). Rather than plugging in embeddings from a pretrained model, the word embeddings are learned inside the model itself. Thus, the input to the model will be the word indices rather than their embeddings, and the embedding values will change as the model is trained. The mask_zero=True parameter in this layer indicates that values of 0 in the matrix (the padding) will be ignored by the model.

**3. GRU**: A [recurrent (GRU) hidden layer](https://keras.io/layers/recurrent/), the central component of the model. As it observes each word in the review, it integrates the word embedding representation with what it has observed so far to compute a representation (hidden state) of the review at that timepoint. There are a few architectures for this layer: we use the GRU variant, but Keras also provides LSTM and the simple vanilla recurrent layer, SimpleRNN (see the materials at the bottom for an explanation of the difference). This layer outputs the last hidden state of the sequence (i.e. the hidden representation of the review after its last word is observed).

**4. Dense**: An output [layer](https://keras.io/layers/core/#dense) that predicts the sentiment of the review based on its GRU representation given by the previous layer. It has a single unit with a sigmoid activation, so its output is a continuous value between 0 and 1 that can be read as the predicted probability that the review is positive.
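To make the four layers concrete, here is a minimal NumPy sketch of the forward pass for a single padded review, under several assumptions: the weights are random rather than trained, the GRU update follows the original formulation (equivalent to Keras's `reset_after=False` option; the Keras default `reset_after=True` applies the reset gate after the recurrent matrix multiplication), and masked positions (index 0) simply leave the hidden state unchanged, which is how masking behaves for a Keras RNN. All function and variable names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_last_state(idx_seq, emb, params):
    """Run a GRU over one sequence of word indices and return the last hidden state.

    idx_seq: 1D array of word indices, 0 = padding
    emb:     embedding matrix, shape (vocab_size, emb_dim)
    params:  W_* (emb_dim, hidden), U_* (hidden, hidden), b_* (hidden,) for gates z, r, h
    """
    h = np.zeros(params["U_z"].shape[0])
    for idx in idx_seq:
        if idx == 0:
            continue  # masked padding step: keep the previous hidden state
        x = emb[idx]  # Embedding layer: look up the word's vector
        z = sigmoid(x @ params["W_z"] + h @ params["U_z"] + params["b_z"])  # update gate
        r = sigmoid(x @ params["W_r"] + h @ params["U_r"] + params["b_r"])  # reset gate
        h_cand = np.tanh(x @ params["W_h"] + (r * h) @ params["U_h"] + params["b_h"])
        h = z * h + (1.0 - z) * h_cand  # blend the old state with the candidate
    return h

rng = np.random.default_rng(0)
vocab_size, emb_dim, hidden = 10, 4, 3
toy_emb = rng.normal(scale=0.1, size=(vocab_size, emb_dim))
toy_params = {}
for gate in ("z", "r", "h"):
    toy_params["W_" + gate] = rng.normal(scale=0.1, size=(emb_dim, hidden))
    toy_params["U_" + gate] = rng.normal(scale=0.1, size=(hidden, hidden))
    toy_params["b_" + gate] = np.zeros(hidden)

h_padded = gru_last_state(np.array([0, 0, 5, 7]), toy_emb, toy_params)
h_unpadded = gru_last_state(np.array([5, 7]), toy_emb, toy_params)
print(np.allclose(h_padded, h_unpadded))  # True: the padding does not change the result

# Dense layer: one sigmoid unit on top of the last hidden state
w_out = rng.normal(scale=0.1, size=hidden)
p_positive = sigmoid(h_padded @ w_out)
print(p_positive)  # some value strictly between 0 and 1
```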

###  <font color='#6629b2'>Parameters</font>

Our function for creating the RNN takes the following parameters:

**n_input_nodes**: the number of entries in the lexicon (including the unknown-word token), plus one to account for the padding represented by 0 values. This indicates the number of rows in the embedding layer, where each row corresponds to a word.

**n_embedding_nodes**: the number of dimensions (units) in the embedding layer, which can be freely defined. Here, it is set to 300.

**n_hidden_nodes**: the number of dimensions in the GRU hidden layer. Like the embedding layer, this can be freely chosen. Here, it is set to 500.
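For a sense of scale, these three parameters fix the number of trainable weights. The helper `count_rnn_params` below is a hypothetical sketch of that arithmetic, assuming Keras's default GRU setting `reset_after=True`, which keeps two bias vectors per gate.

```python
def count_rnn_params(n_input_nodes, n_embedding_nodes, n_hidden_nodes):
    # Embedding: one learned vector per row (every word index, including padding row 0)
    embedding = n_input_nodes * n_embedding_nodes
    # GRU: 3 gates, each with input weights, recurrent weights and
    # (with reset_after=True) two bias vectors
    gru = 3 * (n_embedding_nodes * n_hidden_nodes
               + n_hidden_nodes * n_hidden_nodes
               + 2 * n_hidden_nodes)
    # Dense: one weight per GRU unit plus one bias
    dense = n_hidden_nodes + 1
    return {"embedding": embedding, "gru": gru, "dense": dense,
            "total": embedding + gru + dense}

# 28507 = len(lexicon) + 1, the vector length printed in the bag-of-words output above
print(count_rnn_params(28507, 300, 500))
# {'embedding': 8552100, 'gru': 1203000, 'dense': 501, 'total': 9755601}
```

Most of the weights sit in the embedding matrix, so the lexicon size (and therefore `min_freq`) largely determines the model's size.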

In [None]:

'''Create the model'''

import tensorflow as tf
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Embedding, GRU, Dense

def create_rnn_model(n_input_nodes, n_embedding_nodes, n_hidden_nodes):
    # Layer 1 (Input): the full input shape is (batch_size, sequence_length).
    # Keras adds the batch dimension implicitly, and None as the sequence length
    # lets the model accept padded matrices of any width (train and test can differ).
    input_layer = Input(shape=(None,))
    # Layer 2 (Embedding): one learned vector per word index; mask_zero=True makes
    # the following layers skip the 0 (padding) positions
    embedding = Embedding(input_dim=n_input_nodes, output_dim=n_embedding_nodes, mask_zero=True)(input_layer)
    # Layer 3 (GRU): returns only the hidden state after the last real word
    gru = GRU(units=n_hidden_nodes)(embedding)
    # Layer 4 (Dense): a single sigmoid unit gives the probability that the review is positive
    output_layer = Dense(units=1, activation='sigmoid')(gru)
    # Specify the input and output layers, then compile. Binary cross-entropy is the
    # standard loss for 0/1 labels with a sigmoid output
    model = Model(inputs=input_layer, outputs=output_layer)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model


In [None]:
rnn_model = create_rnn_model(n_input_nodes=len(lexicon) + 1, n_embedding_nodes=300, n_hidden_nodes=500)

###  <font color='#6629b2'>Training</font>

In [None]:
'''
Train rnn_model on the padded sequences with y = train_sentiment['sentiment'] (the 0/1 labels).
Convert the labels to a tensor before passing them to fit().
Hint: tf.convert_to_tensor
batch_size=20, epochs=5
'''

y = tf.convert_to_tensor(train_sentiment['sentiment'])
history = rnn_model.fit(train_padded_idxs, y, batch_size=20, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


###  <font color='#6629b2'>Prediction</font>

In [None]:
'''Put test reviews in a padded matrix, just as we did for train_reviews.
Reuse the lexicon built from the training reviews: the model's embedding rows are indexed
by that lexicon, and test words it has never seen should map to <UNK>.'''
test_reviews['tokenized review'] = text_to_tokens(test_reviews['review'])
test_reviews['Review_Idxs'] = tokens_to_idxs(token_seqs=test_reviews['tokenized review'], lexicon=lexicon)
test_padded_idxs = pad_idx_seqs(test_reviews['Review_Idxs'])

print("TEST INPUT:\n", test_padded_idxs)
print("SHAPE:", test_padded_idxs.shape, "\n")

In [None]:
'''Predict the sentiment labels'''

# The model outputs the probability that a review is positive;
# rounding it thresholds at 0.5 to give a 0/1 label
test_reviews['RNN_Pred_Rating'] = np.round(rnn_model.predict(test_padded_idxs)[:, 0]).astype(int)
test_reviews



Unnamed: 0,review,tokenized review,Review_Idxs,RNN_Pred_Rating
75,It tries to be the epic adventure of the centu...,"(it, tries, to, be, the, epic, adventure, of, ...","[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1...",1
124,This tale based on two Edgar Allen Poe pieces ...,"(this, tale, based, on, two, edgar, allen, poe...","[182, 183, 184, 185, 186, 187, 188, 189, 190, ...",1
145,I remember seeing this film in the theater in ...,"(i, remember, seeing, this, film, in, the, the...","[357, 358, 359, 360, 361, 362, 363, 364, 365, ...",1
108,"Despite later claims, this early-talkie melodr...","(despite, later, claims, ,, this, early, -, ta...","[708, 709, 710, 711, 712, 713, 714, 715, 716, ...",1
139,I caught this film on AZN on cable. It sounded...,"(i, caught, this, film, on, azn, on, cable, .,...","[924, 925, 926, 927, 928, 929, 930, 931, 932, ...",1
59,"I just watched The Dresser this evening, havin...","(i, just, watched, the, dresser, this, evening...","[1237, 1238, 1239, 1240, 1241, 1242, 1243, 124...",1
14,This a fantastic movie of three prisoners who ...,"(this, a, fantastic, movie, of, three, prisone...","[1757, 1758, 1759, 1760, 1761, 1762, 1763, 176...",1
120,this movie gets a 10 because there is a lot of...,"(this, movie, gets, a, 10, because, there, is,...","[1814, 1815, 1816, 1817, 1818, 1819, 1820, 182...",1
89,Hollywood movie industry is the laziest one in...,"(hollywood, movie, industry, is, the, laziest,...","[1932, 1933, 1934, 1935, 1936, 1937, 1938, 193...",1
72,I thought that Mukhsin has been wonderfully wr...,"(i, thought, that, mukhsin, has, been, wonderf...","[2363, 2364, 2365, 2366, 2367, 2368, 2369, 237...",1


###  <font color='#6629b2'>Evaluation</font>

In [None]:
'''Evaluate the model with R^2'''

# print the r2 score
from sklearn.metrics import r2_score 
y_true = test_sentiment['sentiment']
y_pred = test_reviews['RNN_Pred_Rating']
r2_score(y_true,y_pred)

-0.8749999999999998

On the full test set of 25,000 reviews, the $R^2$ for this model is 0.622525, so the RNN outperforms the continuous bag-of-words MLP and the standard bag-of-words approach (neither of which is built in this notebook).
Your score will likely be much lower, since here the model is trained on only 105 reviews and evaluated on 45.
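$R^2$ is designed for continuous targets, so it can be misleading for 0/1 labels. A model that predicts "positive" for every review scores $R^2 = 1 - 1/p$, where $p$ is the fraction of positive labels (the residual sum of squares is $n(1-p)$ and the total sum of squares is $np(1-p)$). The score of -0.875 above is consistent with such a constant predictor when 24 of the 45 test labels are positive; that 24/45 split is inferred from the score, not read from the data. The sketch below uses scikit-learn's `r2_score` and `accuracy_score` on hypothetical labels with that split.

```python
import numpy as np
from sklearn.metrics import accuracy_score, r2_score

# Hypothetical labels: 24 positive (1) and 21 negative (0) reviews
toy_true = np.array([1] * 24 + [0] * 21)
# A constant predictor that labels every review as positive
toy_pred = np.ones_like(toy_true)

toy_r2 = r2_score(toy_true, toy_pred)         # 1 - 1/p with p = 24/45, about -0.875
toy_acc = accuracy_score(toy_true, toy_pred)  # 24/45, about 0.533
print(toy_r2, toy_acc)
```

For binary sentiment labels, accuracy is usually the more informative metric: an accuracy close to the share of the majority class suggests the model has learned little beyond predicting that class.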

### <font color='#6629b2'>Visualizing data inside the model</font>

To help visualize the data representation inside the model, we can look at the output of each layer individually. Keras' Functional API lets you derive a new model from the layers of an existing model, so you can define its output to be a layer below the output layer of the original model. Calling predict() on this new model will produce the output of that layer for a given input. Of course, glancing at the numbers by themselves doesn't provide any interpretation of what the model has learned (although there are opportunities to [interpret these values](https://medium.com/civis-analytics/interpreting-and-visualizing-neural-networks-for-text-processing-e9dff0da9c22)), but seeing them verifies that the model is just a series of transformations from one matrix to another. The model stores its layers in the list model.layers, and you can retrieve a specific layer by its position index in the model.

In [None]:
'''Showing the output of the RNN embedding layer (second layer) for the test reviews'''

embedding_layer = Model(inputs=rnn_model.layers[0].input, 
                        outputs=rnn_model.layers[1].output) #embedding layer is 2nd layer (index 1)
embedding_output = embedding_layer.predict(test_padded_idxs)
print("EMBEDDING LAYER OUTPUT SHAPE:", embedding_output.shape)
print(embedding_output[0])

EMBEDDING LAYER OUTPUT SHAPE: (45, 733, 300)
[[ 0.01497246 -0.00994288  0.04098307 ... -0.01046517  0.04444816
  -0.02171963]
 [ 0.01497246 -0.00994288  0.04098307 ... -0.01046517  0.04444816
  -0.02171963]
 [ 0.01497246 -0.00994288  0.04098307 ... -0.01046517  0.04444816
  -0.02171963]
 ...
 [-0.02047932 -0.03564006 -0.00668221 ... -0.01123598 -0.01489099
   0.01239465]
 [ 0.02762319 -0.00451954  0.04755603 ... -0.02774454 -0.01601022
  -0.03016751]
 [-0.01145597 -0.04320019  0.03428584 ... -0.04236094 -0.02177334
   0.04770444]]


## <font color='#6629b2'>Conclusion</font>

The model shown here could be applied to any task where the goal is to predict a score or label for a particular sequence. Here the output is a binary sentiment label; for ratings prediction it would be ordinal, and it could also be categorical with a few simple changes to the output layer of the model.

## <font color='#6629b2'>More resources</font>

Yoav Goldberg's book [Neural Network Methods for Natural Language Processing](http://www.morganclaypool.com/doi/abs/10.2200/S00762ED1V01Y201703HLT037) is a thorough introduction to neural networks for NLP tasks in general.

If you'd like to learn more about what Keras is doing under the hood, there is a [Theano tutorial](http://deeplearning.net/tutorial/lstm.html) that also applies an RNN to sentiment prediction, using the same dataset as here.

Andrej Karpathy's blog post [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) is very helpful for understanding the mathematical details of an RNN, applied to the task of language modeling. It also provides raw Python code with an implementation of the backpropagation algorithm.

TensorFlow also has an RNN language model [tutorial](https://www.tensorflow.org/versions/r0.12/tutorials/recurrent/index.html) using the Penn Treebank dataset.

Chris Olah provides a good [explanation](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) of how LSTM RNNs work (this explanation also applies to the GRU model used here).

Denny Britz's [tutorial](http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/) clearly documents both the technical details of RNNs and their implementation in Python.