## Sentiment Analysis of Movie Reviews Using a Recurrent Neural Network
### Movie Review Dataset
We'll start by loading the IMDB movie review dataset from Keras. It contains 25,000 training reviews (plus 25,000 test reviews) from IMDB, each already preprocessed and labeled as either positive or negative. Each review is encoded as a sequence of integers, where each integer represents how common a word is in the entire dataset. For example, a word encoded by the integer 3 is the 3rd most common word in the dataset.
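To make the encoding concrete, here is a minimal sketch of frequency-rank encoding on a toy corpus. The corpus and the helper name `build_rank_index` are made up for illustration; this is not how Keras builds its index internally, only the same idea on a small scale.

```python
from collections import Counter

def build_rank_index(texts):
    """Map each word to its frequency rank (1 = most common). Hypothetical helper."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Sort by descending count; break ties alphabetically so the result is deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ordered, start=1)}

# Toy data, not taken from the IMDB dataset.
toy_corpus = [
    "the movie was great",
    "the movie was bad",
    "the plot was great",
]
rank_index = build_rank_index(toy_corpus)
encoded = [rank_index[w] for w in "the plot was bad".split()]
print(rank_index)
print(encoded)  # [1, 6, 2, 5]
```

Frequent words get small integers, so limiting the vocabulary to the N most common words is just a matter of dropping every integer above N.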


In [1]:
from keras.datasets import imdb
from keras.preprocessing import sequence
import keras
import tensorflow as tf
import os
import numpy as np

In [2]:
vocab_size = 88584
max_len = 250
batch_size = 64

(train_data, train_labels),(test_data,test_labels) = imdb.load_data(num_words=vocab_size)

In [3]:
train_data.shape

(25000,)

### Preprocessing
The network expects every input in a batch to have the same length, so we must make each review the same length. To do this we use pad_sequences, which by default works as follows:
- if the review is longer than 250 words, the extra words are trimmed from the start of the review (the last 250 words are kept)
- if the review is shorter than 250 words, the necessary number of 0's is added at the start to bring its length to 250.
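The two rules above can be sketched in plain Python. The function `pad_or_truncate` is a hypothetical stand-in written for this example; it only mirrors the default behaviour of `pad_sequences` (padding and truncating at the front) and is not the Keras implementation.

```python
def pad_or_truncate(tokens, max_len, pad_value=0):
    """Left-pad or left-truncate a list of ints to exactly max_len items."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if len(tokens) >= max_len:
        # Too long: drop tokens from the front, keep the last max_len.
        return tokens[-max_len:]
    # Too short: fill the front with the padding value.
    return [pad_value] * (max_len - len(tokens)) + tokens

short_review = pad_or_truncate([5, 8, 2], max_len=5)
long_review = pad_or_truncate([5, 8, 2, 9, 4, 7], max_len=5)
print(short_review)  # [0, 0, 5, 8, 2]
print(long_review)   # [8, 2, 9, 4, 7]
```

Padding at the front keeps the real words next to the end of the sequence, which is the part the LSTM sees last before it produces its output.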

In [4]:
train_data = sequence.pad_sequences(train_data, maxlen=max_len)
test_data = sequence.pad_sequences(test_data, maxlen=max_len)

In [5]:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size,32),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

In [6]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, None, 32)          2834688   
                                                                 
 lstm (LSTM)                 (None, 32)                8320      
                                                                 
 dense (Dense)               (None, 1)                 33        
                                                                 
Total params: 2843041 (10.85 MB)
Trainable params: 2843041 (10.85 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
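The parameter counts in the summary can be checked by hand. The sketch below assumes the standard LSTM layout of four gates, each with input weights, recurrent weights, and a bias; the variable names are chosen for this example.

```python
vocab_size = 88584
embed_dim = 32
lstm_units = 32

# Embedding: one 32-dimensional vector per vocabulary entry.
embedding_params = vocab_size * embed_dim

# LSTM: 4 gates, each with input weights, recurrent weights, and a bias vector.
lstm_params = 4 * (embed_dim * lstm_units + lstm_units * lstm_units + lstm_units)

# Dense: one weight per LSTM unit plus a single bias.
dense_params = lstm_units * 1 + 1

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
# 2834688 8320 33 2843041
```

Almost all of the parameters sit in the embedding table, so shrinking the vocabulary is the most direct way to make this model smaller.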


In [7]:
model.compile(loss="binary_crossentropy", optimizer="rmsprop", metrics=["acc"])
history = model.fit(train_data,train_labels, epochs=2, validation_split=0.2)

Epoch 1/2
Epoch 2/2


In [8]:
results = model.evaluate(test_data,test_labels)
print(results)

[0.29181602597236633, 0.877560019493103]
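The two numbers follow the order given to `compile`: the first is the binary cross-entropy loss (about 0.29) and the second is the accuracy (about 0.88). As a rough illustration of what the loss measures, here is a plain-Python sketch of binary cross-entropy on made-up labels and probabilities; the clipping constant `eps` is an assumption of this example, not a value read from Keras.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip to avoid log(0) for predictions of exactly 0 or 1.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Toy values for illustration only.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]
loss = binary_crossentropy(labels, probs)
print(round(loss, 4))
```

Confident correct predictions contribute little to the loss, while predictions on the wrong side of 0.5 are penalised more heavily the more confident they are.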


In [9]:
word_index = imdb.get_word_index()

In [10]:
import pandas as pd

In [11]:
vocab = pd.DataFrame([word_index])

In [12]:
vocab.head()

    fawn  tsukino  nunnery  sonja   vani  woods  spiders  hanging  woody  trawling  ...  copywrite  geysers  artbox  cronyn  hardboiled  voorhees'   35mm    'l'  paget  expands
0  34701    52006    52007  16816  63951   1408    16115     2345   2289     52008  ...      88581    52003   88582   52004       52005      88583  16815  88584  18509    20597


In [13]:
vocab = vocab.transpose()
vocab.head()

             0
fawn     34701
tsukino  52006
nunnery  52007
sonja    16816
vani     63951


In [14]:
vocab = vocab.sort_values(by=0)

In [15]:
print(vocab)

               0
the            1
and            2
a              3
of             4
to             5
...          ...
pipe's     88580
copywrite  88581
artbox     88582
voorhees'  88583
'l'        88584

[88584 rows x 1 columns]


In [16]:
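def encode_text(text):
    # Lowercase, strip punctuation, and split the text into word tokens.
    tokens = keras.preprocessing.text.text_to_word_sequence(text)
    # Map each word to its index; unknown words become 0.
    # Caveat: word_index holds raw frequency ranks, while imdb.load_data shifts
    # ranks by index_from=3 by default, so these ids are not aligned with train_data.
    tokens = [word_index.get(word, 0) for word in tokens]
    return sequence.pad_sequences([tokens], max_len)[0]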

In [17]:
text = "Captivating from start to finish! The storyline is engaging, characters are relatable, and the performances are top-notch. A perfect blend of humor, drama, and heartwarming moments. This movie is a must-watch for anyone looking for an uplifting and entertaining experience!"
encoded = encode_text(text)
print(encoded)

[    0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0   

In [18]:
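reverse_word_index = {value: key for (key, value) in word_index.items()}

def decode_integers(integers):
    PAD = 0
    # Skip padding and join the remaining words with single spaces.
    # Caveat: train_data ids are shifted by index_from=3 (the imdb.load_data default),
    # so looking them up directly as ranks yields the wrong words.
    return " ".join(reverse_word_index[num] for num in integers if num != PAD)

print(decode_integers(train_data[69]))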

akshay just from short in scene it man ii from out then could akshay this until states bad isn't i'd in of clear in start don't fits equally to mood is much way you or for soon well at you're only perry it is drugs area over resolution in of decades or going narrated deft responsible accents halfway to sound boring or lords dyer to stealing sophomoric i i of german little after one will keep this of bait various not depressing magic mature to is numbers not vermin relax jules like hand some in at ephemeral photographed here's having because go alone care br idiots frustration to vintage conniving having because in dog were right usually miles wow did tale opinion dubbing she member kirk jim 1978 suppose mormon wants morgan to begins br sense an without beautiful put this shakespeare inspirational to it's zombie trio bank i i deft there gothic good disappointed subtle br screen of relax jules it so bloodsucker produced breasts like involved in at points 3000 to la about cassidy leaves s
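The decoded text reads as nonsense because `imdb.load_data` reserves 0 for padding, 1 for the start of a sequence, and 2 for unknown words, and shifts every word rank by `index_from=3` by default, while `imdb.get_word_index()` returns the unshifted ranks. Below is a minimal sketch of offset-aware encoding and decoding on a toy vocabulary. The helper names and the toy index are hypothetical; applying the same shift to the real `word_index` is a possible fix, not something the notebook above does.

```python
INDEX_FROM = 3           # default offset used by keras.datasets.imdb.load_data
PAD, START, OOV = 0, 1, 2  # reserved ids under the default settings

# Toy word -> frequency-rank mapping (illustrative, not the real IMDB index).
toy_word_index = {"the": 1, "movie": 2, "great": 3}

def encode_with_offset(words, word_index, index_from=INDEX_FROM):
    """Encode words the way load_data does: start marker, then shifted ranks."""
    body = [word_index[w] + index_from if w in word_index else OOV for w in words]
    return [START] + body

def decode_with_offset(ids, word_index, index_from=INDEX_FROM):
    """Invert the shifted encoding; ids without a word are shown as '?'."""
    reverse = {rank + index_from: word for word, rank in word_index.items()}
    return " ".join(reverse.get(i, "?") for i in ids if i not in (PAD, START))

ids = encode_with_offset(["the", "movie", "was", "great"], toy_word_index)
decoded = decode_with_offset([PAD, PAD] + ids, toy_word_index)
print(ids)      # [1, 4, 5, 2, 6]
print(decoded)  # the movie ? great
```

With the shift applied on both sides, a decoded training review should read like the original sentence rather than a shuffled list of common words.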

In [19]:
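def predict(text):
    encoded_text = encode_text(text)
    # The model expects a batch, so wrap the single review in shape (1, max_len).
    pred = np.zeros((1, max_len))
    pred[0] = encoded_text
    result = model.predict(pred)
    # result[0] is the sigmoid output: the predicted probability of a positive review.
    print(result[0])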

In [20]:
good_reviews = [
    "Absolutely riveting! Brilliant storytelling, captivating performances, and a perfect balance of emotions. This movie is a true masterpiece!",
    "Incredible film! Engaging plot, well-developed characters, and outstanding cinematography. A must-see for any movie lover!",
    "A delightful experience! The actors shine, the dialogue is sharp, and the direction is flawless. A cinematic gem that leaves you wanting more.",
    "Mesmerizing from start to finish! The narrative is gripping, the visuals are stunning, and the music is enchanting. A cinematic triumph!",
    "Outstanding in every aspect! Phenomenal acting, breathtaking visuals, and a story that pulls at your heartstrings. A true work of art!"
]

bad_reviews = [
    "Disappointing and lackluster. The plot feels contrived, the characters are one-dimensional, and the dialogue is uninspired. Not worth the hype.",
    "A complete letdown. The story is predictable, the acting is wooden, and the pacing is sluggish. Save your time and skip this one.",
    "Underwhelming at best. The film lacks depth, the performances are forgettable, and the ending is unsatisfying. A forgettable experience.",
    "Uninspired and tedious. The narrative is dull, the characters are clichéd, and the direction lacks creativity. A forgettable film that fails to leave an impression.",
    "A missed opportunity. The potential was there, but the execution falls flat. Weak script, mediocre acting, and a lack of originality make this movie a disappointment."
]

In [21]:
for review in bad_reviews:
    predict(review)

[0.45679316]
[0.40833718]
[0.45369768]
[0.49596766]
[0.5870418]


#### Here 0 represents a negative review and 1 represents a positive review
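Since the model outputs a probability between 0 and 1, a common convention is to call anything above 0.5 positive. The sketch below applies that rule to the five scores printed for the bad reviews; the 0.5 threshold and the helper name `to_label` are assumptions of this example.

```python
# Scores printed above for the five hand-written bad reviews.
scores = [0.45679316, 0.40833718, 0.45369768, 0.49596766, 0.5870418]

def to_label(score, threshold=0.5):
    """Turn a sigmoid output into a sentiment label (threshold is an assumption)."""
    return "positive" if score > threshold else "negative"

predicted = [to_label(s) for s in scores]
print(predicted)
# ['negative', 'negative', 'negative', 'negative', 'positive']
```

Under this rule four of the five bad reviews are classified as negative, although every score sits close to 0.5, so the model is not very confident about any of them.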

In [39]:
for i in range(5):
    review = decode_integers(test_data[i+1])
    print('Actual review score is: ',test_labels[i+1],'\n The review is: ',review)
    print('The predicted review score is:')
    predict(review)

Actual review score is:  1 
 The review is:  quest are chase to being quickly of little it time hell to plot br of something long put are of every place this consequence council of interplay storytelling being nasty not of you warren in is failed club i i of films pay so sequences mightily film okay uses to received wackiness if time done for room sugar viewer as cartoon of gives to forgettable br be because many these of reflection sugar contained gives it wreck scene to more was two when had find as you another it of themselves probably who interplay storytelling if itself by br about 1950's films not would effects that her box to miike for if hero close seek end is very together movie of wheel got say kong sugar fred close bore there is playing lot of scriptures pan place trilogy of lacks br of their time much this men as on it is telling program br silliness okay orientation to frustration at corner rawlins she of sequences to political clearly in of drugs keep guy i i was throwing