#Natural Language Processing (Sentiment Analysis) with Recurrent Neural Networks

##Word Embeddings
**Word embeddings** keep the order of words intact and encode similar words with similar vectors. The aim is to capture the frequency and order of words as well as their meaning in the sentence. Each word is encoded as a dense vector that represents the contexts in which it appears.

Word embeddings are learned from many training examples. An *embedding layer* can be added at the beginning of the model and trained along with it to learn suitable embeddings for the words. We can also use pre-trained embeddings.
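
As a rough illustration of what an embedding layer does, the sketch below treats it as a matrix with one row per word index and looks rows up by index. The sizes, the random values, and the helper `cosine_similarity` are assumptions made for this example; a real embedding layer would learn the matrix during training.

```python
import numpy as np

# Toy setup: all sizes and values here are illustrative assumptions.
vocab_size, embedding_dim = 10, 4
rng = np.random.default_rng(0)

# The embedding "layer" is a matrix with one row (vector) per word index.
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

# A sentence encoded as word indices, in order.
sentence = np.array([3, 7, 3, 1])

# Looking up the rows turns the index sequence into a sequence of vectors.
vectors = embedding_matrix[sentence]
print(vectors.shape)  # (4, 4): one 4-dimensional vector per word


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; close to 1 means similar."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# The same word (index 3) always maps to the same vector.
print(cosine_similarity(vectors[0], vectors[2]))
```

After training on enough data, one would expect words used in similar contexts to end up with a cosine similarity closer to 1 than unrelated words.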

##Recurrent Neural Networks
An RNN processes one word at a time while maintaining an internal memory of what it has already seen. This allows it to treat words differently based on their order in a sentence and to gradually build an understanding of the entire input. The text is treated as a sequence and passed to the RNN one word at a time.

![alt text](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png)
*Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/*

where **h<sub>t</sub>** is the output at time t, **x<sub>t</sub>** is the input at time t, and **A** is the recurrent layer (the loop).

This is a **simple RNN layer**.
The recurrent layer processes the input one word at a time, combining each word with the output from the previous step. As we progress further through the input sequence, we build a better understanding of the text as a whole.
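
The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the Keras implementation: the function name `simple_rnn_forward`, the weight names, the `tanh` activation, and the sizes are assumptions chosen for the example.

```python
import numpy as np


def simple_rnn_forward(inputs, W_x, W_h, b):
    """Run a simple RNN over `inputs` of shape (timesteps, input_dim).

    Returns the hidden state after every timestep, shape (timesteps, units).
    """
    units = W_h.shape[0]
    h = np.zeros(units)  # internal memory, empty before the first word
    outputs = []
    for x_t in inputs:  # one word vector at a time, in order
        # The new state combines the current input with the previous output.
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        outputs.append(h)
    return np.stack(outputs)


rng = np.random.default_rng(0)
timesteps, input_dim, units = 5, 3, 4  # illustrative sizes
inputs = rng.normal(size=(timesteps, input_dim))
W_x = rng.normal(size=(input_dim, units))
W_h = rng.normal(size=(units, units))
b = np.zeros(units)

states = simple_rnn_forward(inputs, W_x, W_h, b)
print(states.shape)  # (5, 4)
```

Because each state depends on the previous one, feeding the same words in a different order generally gives a different final state.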

##LSTM
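A Long Short-Term Memory (LSTM) layer works similarly but adds a way to carry information forward from any earlier timestep.
In addition to its output, an LSTM maintains a long-term memory (the cell state) that is passed from step to step. Learned gates decide what to add to this memory, what to forget, and what to expose as output, so information seen early in the sequence can still influence the result much later. This adds to the complexity of our network and allows it to discover more useful relationships between inputs and when they appear.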
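
To make the idea of a gated long-term memory concrete, here is a minimal sketch of one LSTM timestep in NumPy, following the standard LSTM equations. The function name `lstm_step`, the weight layout (four blocks in the order input gate, forget gate, candidate, output gate), and the sizes are assumptions made for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM timestep.

    W: (input_dim, 4 * units), U: (units, 4 * units), b: (4 * units,)
    """
    units = h_prev.shape[0]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[0 * units:1 * units])  # input gate: how much new info to write
    f = sigmoid(z[1 * units:2 * units])  # forget gate: how much old memory to keep
    g = np.tanh(z[2 * units:3 * units])  # candidate values to write
    o = sigmoid(z[3 * units:4 * units])  # output gate: how much memory to expose
    c = f * c_prev + i * g               # updated long-term memory (cell state)
    h = o * np.tanh(c)                   # output for this timestep
    return h, c


rng = np.random.default_rng(0)
input_dim, units, timesteps = 3, 4, 6  # illustrative sizes
W = rng.normal(size=(input_dim, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

h = np.zeros(units)
c = np.zeros(units)
for x_t in rng.normal(size=(timesteps, input_dim)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```

When the forget gate is close to 1 and the input gate close to 0, the cell state passes through a step almost unchanged, which is how information from early timesteps can survive.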

##Sentiment Analysis
Sentiment Analysis (from Wikipedia):

*the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive or negative.*

In this example, we will classify movie reviews as positive or negative.

*This guide is based on the following TensorFlow tutorial: https://www.tensorflow.org/tutorials/text/text_classification_rnn*

###Movie Review Dataset
Load in the IMDB movie review dataset from Keras. 

This dataset contains 50,000 reviews from IMDB, split into 25,000 for training and 25,000 for testing. Each review is already preprocessed and labeled as either positive (1) or negative (0). Each review is encoded as a list of integers, where each integer represents how common a word is in the entire dataset. For example, in the dataset's word index, the integer 3 corresponds to the 3rd most common word.
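
The frequency-rank encoding described above can be sketched on a toy corpus. The three example sentences, the alphabetical tie-breaking rule, and the names `toy_word_index` and `toy_encoded` are assumptions for illustration; they are not how the IMDB dataset itself was built.

```python
from collections import Counter

# Toy corpus standing in for the reviews (hypothetical data).
reviews = [
    "the movie was great",
    "the movie was bad",
    "the acting was great",
]

# Count how often each word occurs across the whole corpus.
counts = Counter(word for review in reviews for word in review.split())

# Rank words by frequency: rank 1 is the most common word.
# Ties are broken alphabetically so the result is deterministic.
ranked = sorted(counts, key=lambda w: (-counts[w], w))
toy_word_index = {word: rank for rank, word in enumerate(ranked, start=1)}

# Encode every review as a list of ranks.
toy_encoded = [[toy_word_index[w] for w in review.split()] for review in reviews]

print(toy_word_index)
# {'the': 1, 'was': 2, 'great': 3, 'movie': 4, 'acting': 5, 'bad': 6}
print(toy_encoded)
# [[1, 4, 2, 3], [1, 4, 2, 6], [1, 5, 2, 3]]
```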

In [1]:
# %tensorflow_version 2.x
from keras.datasets import imdb
from keras.preprocessing import sequence
import keras
import tensorflow as tf
import os
import numpy as np

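# Vocabulary size: also the number of rows in the embedding layer below
VOCAB_SIZE = 88584

# Every review is padded or trimmed to this many words
MAXLEN = 250
# Defined for reference; not passed to model.fit below, which uses the Keras default
BATCH_SIZE = 64

# Keep the VOCAB_SIZE most common words; reviews come back as lists of integers
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words = VOCAB_SIZE)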

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz


In [2]:
display(len(train_data))
display(len(train_data[0]))
display(len(test_data))

25000

218

25000

In [3]:
# Let's look at one review
display(len(train_data[0]))
display(train_data[0])

218

[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4,
 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 22665, 9, 35, 480,
 284, 5, 150, 4, 172, 112, 167, 21631, 336, 385, 39, 4, 172, 4536, 1111,
 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22,
 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15,
 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386,
 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4,
 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33,
 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4,
 107, 117, 5952, 15, 256, 4, 31050, 7, 3766, 5, 723, 36, 71, 43, 530,
 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15,
 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21,
 134, 476, 26, 480, 5

###More Preprocessing
The reviews have different lengths, but the network expects every input to have the same length. We therefore make each review exactly 250 words long, as follows:
- if the review is longer than 250 words, trim off the extra words (with its default settings, `pad_sequences` trims from the start of the review)
- if the review is shorter than 250 words, add as many 0's as needed to bring it to 250 (by default, at the start of the review).
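
The two rules above can be sketched in plain NumPy. The helper `pad_or_trim` is hypothetical and only imitates what `pad_sequences` does when called with its default settings, which (as far as the Keras documentation describes) pad and truncate at the start of each sequence.

```python
import numpy as np


def pad_or_trim(sequence, maxlen, pad_value=0):
    """Return `sequence` as an array of exactly `maxlen` integers (maxlen >= 1).

    Trimming drops items from the start; padding adds zeros at the start.
    """
    trimmed = list(sequence)[-maxlen:]               # keep the last maxlen items
    padding = [pad_value] * (maxlen - len(trimmed))  # empty if already long enough
    return np.array(padding + trimmed, dtype="int32")


print(pad_or_trim([5, 8, 2], 5))           # [0 0 5 8 2]
print(pad_or_trim([1, 2, 3, 4, 5, 6], 4))  # [3 4 5 6]
```

Padding at the start keeps the real words next to the end of the sequence, which is the last thing the recurrent layer sees before it produces its output.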

In [4]:
from tensorflow.keras.preprocessing.sequence import pad_sequences

train_data = pad_sequences(train_data, MAXLEN)
test_data = pad_sequences(test_data, MAXLEN)

# train_data = sequence.pad_sequences(train_data, MAXLEN)
# test_data = sequence.pad_sequences(test_data, MAXLEN)

We load the word index from the dataset (a mapping from each word to its integer) and use it to encode our own review text.

In [5]:
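# Build an encode function

# Mapping from word to frequency rank (1 = most common word)
word_index = imdb.get_word_index()

def encode_text(text):
  # Lowercase the text, strip punctuation, and split it into words
  tokens = keras.preprocessing.text.text_to_word_sequence(text)
  # Replace each word with its integer; unknown words become 0.
  # Note: these are raw ranks. imdb.load_data shifts ranks by index_from
  # (3 by default), so these integers are not aligned with those in train_data.
  tokens = [word_index.get(word, 0) for word in tokens]
  # Pad or trim to MAXLEN, as was done for the training data
  return pad_sequences([tokens], MAXLEN)[0]

text = "that movie was just amazing, so amazing"
encoded = encode_text(text)
print(encoded)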

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0 

In [6]:
# Build a decode function

# Invert the word index: integer -> word
reverse_word_index = {value: key for (key, value) in word_index.items()}

def decode_integers(integers):
    PAD = 0  # padding index, skipped when decoding
    # Look up each non-padding integer and join the words with spaces
    return " ".join(reverse_word_index[num] for num in integers if num != PAD)

print(decode_integers(encoded))
print(decode_integers(train_data[0]))
display(train_labels[0])

that movie was just amazing so amazing
the as you with out themselves powerful lets loves their becomes reaching had journalist of lot from anyone to have after out atmosphere never more room titillate it so heart shows to years of every never going villaronga help moments or of every chest visual movie except her was several of enough more with is now current film as you of mine potentially unfortunately of you than him that with out themselves her get for was camp of you movie sometimes movie that with scary but pratfalls to story wonderful that in seeing in character to of 70s musicians with heart had shadows they of here that with her serious to have does when from why what have critics they is you that isn't one will very to as itself with other tricky in of seen over landed for anyone of gilmore's br show's to whether from than out themselves history he name half some br of 'n odd was two most of mean for 1 any an boat she he should is thought frog but of script you not while his

1
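
The decoded training review above reads as a jumble of unrelated words. A likely explanation, assuming `imdb.load_data` was called with its documented defaults (`start_char=1`, `oov_char=2`, `index_from=3`), is that the integers in `train_data` are shifted by 3 relative to the ranks in `word_index`. The sketch below shows a decoder that accounts for such a shift. The function `decode_with_offset`, the placeholder tokens, and the toy index are hypothetical and are not part of the original notebook.

```python
def decode_with_offset(integers, word_index, index_from=3):
    """Decode integers that were shifted by `index_from` when the data was loaded.

    Assumes the convention 0 = padding, 1 = start of review, 2 = unknown word.
    """
    reverse = {rank + index_from: word for word, rank in word_index.items()}
    special = {0: None, 1: "<START>", 2: "<UNK>"}  # None means "skip"
    words = []
    for num in integers:
        num = int(num)
        word = special[num] if num in special else reverse.get(num, "<UNK>")
        if word is not None:
            words.append(word)
    return " ".join(words)


# Toy word index (hypothetical); imdb.get_word_index() returns a dict of the same form.
toy_index = {"the": 1, "and": 2, "movie": 3}
print(decode_with_offset([0, 0, 1, 4, 6, 2], toy_index))
# <START> the movie <UNK>
```

If the same shift applies here, text encoded with the unshifted `word_index` would also not line up with what the model sees during training, so an encoder used for predictions would need the matching offset.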

###Create the Model
An **embedding layer** stores one vector per word. When called, it converts the sequences of word indices to sequences of vectors. These vectors are trainable. After training (on enough data), words with similar meanings often have similar vectors.

We use a word embedding layer as the first layer. Then we add an LSTM layer, followed by a dense layer with a single sigmoid output node that gives the predicted sentiment as a value between 0 (negative) and 1 (positive).

In the following, 32 stands for the output dimension of the vectors generated by the embedding layer. 

The following picture, from the TensorFlow tutorial, is for reference only. Unlike the architecture it shows, our model has no TextVectorization layer and does not use a bidirectional RNN.


![alt text](https://www.tensorflow.org/text/tutorials/images/bidirectional.png)

In [7]:
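model = tf.keras.Sequential([
    # Map each word index to a trainable 32-dimensional vector
    tf.keras.layers.Embedding(VOCAB_SIZE, 32),
    # Read the sequence of vectors and return the final 32-unit output
    tf.keras.layers.LSTM(32),
    # Single sigmoid node: near 1 means positive, near 0 means negative
    tf.keras.layers.Dense(1, activation="sigmoid")
])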

# model = tf.keras.Sequential([
#     tf.keras.layers.Embedding(
#         input_dim=VOCAB_SIZE,
#         output_dim=32,
#         # Use masking to handle the variable sequence lengths
#         mask_zero=True),
#     tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
#     tf.keras.layers.Dense(32, activation='relu'),
#     tf.keras.layers.Dense(1, activation='sigmoid')
# ])

In [8]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, None, 32)          2834688   
                                                                 
 lstm (LSTM)                 (None, 32)                8320      
                                                                 
 dense (Dense)               (None, 1)                 33        
                                                                 
Total params: 2,843,041
Trainable params: 2,843,041
Non-trainable params: 0
_________________________________________________________________
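
The parameter counts in the summary can be reproduced by hand. The short calculation below is a sketch that assumes the standard LSTM layout of four weight blocks, each with input weights, recurrent weights, and a bias; the constant names are chosen for this example.

```python
VOCAB_SIZE = 88584
EMBEDDING_DIM = 32
LSTM_UNITS = 32

# Embedding: one 32-dimensional vector per word index.
embedding_params = VOCAB_SIZE * EMBEDDING_DIM

# LSTM: 4 blocks (three gates plus the candidate), each with
# input weights, recurrent weights, and a bias vector.
lstm_params = 4 * (EMBEDDING_DIM * LSTM_UNITS + LSTM_UNITS * LSTM_UNITS + LSTM_UNITS)

# Dense: one weight per LSTM unit plus a single bias.
dense_params = LSTM_UNITS * 1 + 1

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
# 2834688 8320 33 2843041
```

Almost all of the parameters sit in the embedding layer, which is why a smaller vocabulary shrinks the model considerably.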


###Training
Compile and train the model.

In [9]:
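# Binary cross-entropy suits a single sigmoid output; 'acc' reports accuracy
model.compile(loss="binary_crossentropy", optimizer="rmsprop", metrics=['acc'])
# Alternative settings from the TensorFlow tutorial. Note that from_logits=True
# assumes the final Dense layer has no sigmoid activation.
# model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
#               optimizer=tf.keras.optimizers.Adam(1e-4),
#               metrics=['accuracy'])
# Train for 2 epochs, holding out 20% of the training data for validation
history = model.fit(train_data, train_labels, epochs=2, validation_split=0.2)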

Epoch 1/2
Epoch 2/2


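Evaluate the model on the test data to see how well it performs. `model.evaluate` returns the loss followed by the metrics given to `model.compile`, here the accuracy.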

In [10]:
results = model.evaluate(test_data, test_labels)
print(results)

[0.32370561361312866, 0.8659600019454956]
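
The two numbers above are the binary cross-entropy loss and the accuracy on the test set (about 86.6%). As a rough guide to what they measure, the sketch below computes both quantities by hand for a few made-up predictions. The function names, the labels, and the probabilities are assumptions for illustration, and the 0.5 threshold is an assumed cutoff.

```python
import numpy as np


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip to avoid log(0) for predictions of exactly 0 or 1.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(losses))


def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions that fall on the correct side of the threshold."""
    y_true = np.asarray(y_true)
    predicted = (np.asarray(y_pred) > threshold).astype(int)
    return float(np.mean(predicted == y_true))


labels = [1, 0, 1, 0]                 # hypothetical labels
probabilities = [0.9, 0.2, 0.4, 0.1]  # hypothetical model outputs

print(binary_crossentropy(labels, probabilities))
print(binary_accuracy(labels, probabilities))  # 0.75
```

The loss rewards confident correct predictions and penalizes confident wrong ones heavily, whereas the accuracy only checks which side of the threshold each prediction lands on.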


###Make Predictions
Use the trained network to make predictions on our own reviews.

We need to convert our review into the form that the network expects. To do this, we call the `encode_text()` function defined earlier.

In [11]:
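# Make a prediction

def predict(text):
  encoded_text = encode_text(text)
  # The model expects a batch, so place the review in an array of shape (1, MAXLEN)
  pred = np.zeros((1, MAXLEN))
  pred[0] = encoded_text
  result = model.predict(pred)
  # result[0] holds the sigmoid output: near 1 is positive, near 0 is negative
  print(result[0])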

positive_review = "That movie was! really loved it and would great watch it again because it was amazingly great"
predict(positive_review)

negative_review = "that movie really sucked. I hated it and wouldn't watch it again. Was one of the worst things I've ever watched"
predict(negative_review)

[0.8536767]
[0.30975097]
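
Each output is the value of the sigmoid node, so it can be read as a score between 0 (negative) and 1 (positive). One simple way to turn a score into a label is to compare it with a cutoff. In the sketch below, the function `sentiment_label` and the threshold of 0.5 are assumptions, not something fixed by the model.

```python
def sentiment_label(score, threshold=0.5):
    """Map a sigmoid output in [0, 1] to a label; 0.5 is an assumed cutoff."""
    return "positive" if score >= threshold else "negative"


# The two scores printed above.
for score in (0.8536767, 0.30975097):
    print(f"{score:.3f} -> {sentiment_label(score)}")
# 0.854 -> positive
# 0.310 -> negative
```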


##Sources

1. Chollet, François. Deep Learning with Python. Manning Publications Co., 2018.
2. “Text Classification with an RNN: TensorFlow Core.” TensorFlow, https://www.tensorflow.org/tutorials/text/text_classification_rnn.
3. “Understanding LSTM Networks.” Understanding LSTM Networks -- Colah's Blog, https://colah.github.io/posts/2015-08-Understanding-LSTMs/.