# Name: Bhavesh Kumar Bohara
# MML2022013

### Sentiment Analysis

Definition: The process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral.

### IMDB Dataset: Movie Review Dataset

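This dataset contains 50,000 IMDB reviews, split evenly into 25,000 training and 25,000 test reviews. Each review is already preprocessed, labeled as either positive or negative, and encoded as a sequence of integers that represent how common each word is in the entire dataset. In the raw word index, for example, the integer 3 denotes the 3rd most common word. (In the arrays returned by `imdb.load_data`, these ranks are shifted up by 3 by default, because indices 0, 1 and 2 are reserved for padding, the start of a review and out-of-vocabulary words.)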
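The frequency-rank encoding described above can be sketched in plain Python. This is a minimal illustration on a made-up three-sentence corpus, not IMDB data; `build_rank_index` and `encode` are hypothetical helpers, and breaking frequency ties by first appearance is an assumption that may differ from how the IMDB index was built.

```python
from collections import Counter

def build_rank_index(corpus):
    """Map each word to its frequency rank (1 = most common)."""
    counts = Counter(word for doc in corpus for word in doc.split())
    # most_common() sorts by count; words with equal counts keep first-seen order
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}

def encode(doc, rank_index):
    """Replace every word of a document by its frequency rank."""
    return [rank_index[word] for word in doc.split()]

# Toy corpus (made up for illustration, not IMDB data)
corpus = ["the movie was great", "the plot was thin", "the cast was great"]
rank_index = build_rank_index(corpus)
print(encode("the movie was great", rank_index))  # [1, 4, 2, 3]
```

"the" and "was" each occur three times, so they receive ranks 1 and 2; "great" (twice) gets rank 3 and "movie" (once) rank 4.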

### Implementation for Movie Reviews


In [None]:
from keras.datasets import imdb
import keras
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences

VOCAB_SIZE = 88584  # number of words in the IMDB word index
MAXLEN = 250        # every review is padded or truncated to this length
BATCH_SIZE = 64     # not passed to model.fit() below, so Keras uses its default of 32

# Load the pre-tokenized reviews; word indices >= VOCAB_SIZE become the OOV marker
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=VOCAB_SIZE)
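The sketch below mimics, on a toy list of raw ranks, what the Keras documentation says `imdb.load_data` does to each review: prepend `start_char=1`, add `index_from=3` to every rank, and replace any resulting index that is not below `num_words` with `oov_char=2`. `to_load_data_format` is a hypothetical helper written for illustration, not Keras's own code; the raw ranks 12, 17 and 13 are those of "that", "movie" and "was" in the encoded example later in this notebook, and 90000 stands in for a rank outside the vocabulary.

```python
def to_load_data_format(raw_ranks, num_words, start_char=1, oov_char=2, index_from=3):
    """Mimic the documented imdb.load_data transform for one review (sketch)."""
    # Prepend the start marker and shift every rank past the reserved indices 0-2
    shifted = [start_char] + [rank + index_from for rank in raw_ranks]
    # Indices that fall outside the vocabulary are replaced by the OOV marker
    return [i if i < num_words else oov_char for i in shifted]

print(to_load_data_format([12, 17, 13, 90000], num_words=88584))  # [1, 15, 20, 16, 2]
```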

In [None]:
# Number of word indices in the second training review (train_data[1])
len(train_data[1])

189

In [None]:
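# Pad or truncate every review to exactly MAXLEN tokens; by default zeros
# are added, and surplus tokens removed, at the start of each sequence
train_data = pad_sequences(train_data, maxlen=MAXLEN)
test_data = pad_sequences(test_data, maxlen=MAXLEN)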

In [None]:
len(train_data[1])

In [None]:
len(train_data[5])
# Every review now has length MAXLEN (250).
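With its default arguments (`padding='pre'`, `truncating='pre'`, `value=0`), `pad_sequences` adds zeros to the front of short reviews and drops tokens from the front of long ones. The pure-Python `pad_pre` below is a hypothetical stand-in that reproduces only that default behavior for illustration; the real function also supports other modes and returns a NumPy array.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad or left-truncate each sequence to exactly maxlen items."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]  # 'pre' truncation keeps the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)  # 'pre' padding adds zeros in front
    return padded

print(pad_pre([[5, 6], [1, 2, 3, 4, 5]], maxlen=4))  # [[0, 0, 5, 6], [2, 3, 4, 5]]
```

Keeping the end of a long review means the LSTM always sees the final tokens right before it produces its output.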

### Creating the Model

In [None]:
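model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 32),      # word index -> 32-dimensional vector
    tf.keras.layers.LSTM(32),                        # reads the sequence, outputs a 32-dim summary
    tf.keras.layers.Dense(1, activation='sigmoid')   # probability that the review is positive
])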

In [None]:
model.summary()
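The parameter counts reported by `model.summary()` follow directly from the layer sizes. The sketch below recomputes them by hand with the standard formulas: an `Embedding` has `input_dim × output_dim` weights, an `LSTM` has four gates each with an input kernel, a recurrent kernel and a bias, and a `Dense` layer has one weight per input plus a bias. `count_params` is a hypothetical helper, and the formulas assume the Keras defaults used here (biases enabled).

```python
def count_params(vocab_size, embed_dim, lstm_units, dense_units=1):
    """Recompute the trainable-parameter count of Embedding -> LSTM -> Dense."""
    embedding = vocab_size * embed_dim
    # 4 gates x (input kernel + recurrent kernel + bias)
    lstm = 4 * (embed_dim * lstm_units + lstm_units * lstm_units + lstm_units)
    dense = lstm_units * dense_units + dense_units
    return embedding, lstm, dense, embedding + lstm + dense

print(count_params(88584, 32, 32))  # (2834688, 8320, 33, 2843041)
```

Almost all of the roughly 2.84 million parameters sit in the embedding table, which is why the vocabulary size dominates the model's memory footprint.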

In [None]:
model.compile(loss="binary_crossentropy", optimizer="rmsprop", metrics=['accuracy'])
# Hold out the last 20% of the training data for validation
history = model.fit(train_data, train_labels, epochs=10, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
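`validation_split=0.2` tells Keras to hold back part of the training arrays. According to the Keras documentation, the validation samples are taken from the end of the data, before any shuffling. The hypothetical `split_tail` below mirrors that on a plain list to show the resulting sizes for the 25,000 training reviews; the exact rounding Keras applies internally is an assumption here.

```python
def split_tail(samples, validation_split):
    """Split off the last `validation_split` fraction of samples as validation data."""
    split_at = int(len(samples) * (1 - validation_split))
    return samples[:split_at], samples[split_at:]

train_part, val_part = split_tail(list(range(25000)), 0.2)
print(len(train_part), len(val_part))  # 20000 5000
```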


In [None]:
model.save("lstm.h5")  # save the trained LSTM model to the HDF5 file "lstm.h5"

In [None]:
new_model = tf.keras.models.load_model('lstm.h5')
# Load the trained Keras model back from the file 'lstm.h5'

The loaded model is assigned to the variable `new_model`, which can be used to make predictions or to continue training if necessary.

In [None]:
results=new_model.evaluate(test_data,test_labels)
print(results)

[0.4557284712791443, 0.8637999892234802]


In [None]:
results=model.evaluate(test_data,test_labels)
print(results)

[0.4557284712791443, 0.8637999892234802]
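`evaluate` returns `[loss, accuracy]`, here about 0.456 and 0.864 for both the original and the reloaded model. For a sigmoid output trained with `binary_crossentropy`, the loss is the mean of −[y·log(p) + (1−y)·log(1−p)], and the `'accuracy'` metric counts a prediction as correct when p > 0.5 matches the label. The sketch below computes both on made-up toy values; clipping p with a small epsilon mirrors what Keras does to avoid log(0), though the exact epsilon used here (1e-7) is an assumption.

```python
import math

def binary_crossentropy(labels, probs, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

def binary_accuracy(labels, probs, threshold=0.5):
    """Fraction of predictions whose thresholded class matches the label."""
    correct = sum(int(p > threshold) == y for y, p in zip(labels, probs))
    return correct / len(labels)

labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.7]  # toy predictions, not model output
print(binary_crossentropy(labels, probs), binary_accuracy(labels, probs))
```

Two of the four toy predictions fall on the wrong side of 0.5, so the accuracy is 0.5, while the loss (about 0.612) also penalizes how confident each wrong prediction is.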


### Making Predictions

In [None]:
word_index = imdb.get_word_index()
# Retrieve the IMDB word index: a dict mapping each word to its frequency rank (1 = most common)

In [None]:
# Print the first ten (word, index) pairs; dict order is not frequency order
for word, index in list(word_index.items())[:10]:
    print(word, ':', index)

fawn : 34701
tsukino : 52006
nunnery : 52007
sonja : 16816
vani : 63951
woods : 1408
spiders : 16115
hanging : 2345
woody : 2289
trawling : 52008
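The listing above follows the dictionary's storage order, so it does not show the most common words. Sorting the entries by index does, since a smaller index means a more frequent word. The sketch uses a small dictionary with the same word → rank shape as `word_index`, built from values printed in this notebook; `top_words` is a hypothetical helper, and on the real index `top_words(word_index, 10)` would list the ten most frequent words.

```python
def top_words(index, n):
    """Return the n (word, rank) pairs with the smallest ranks, i.e. the most frequent."""
    return sorted(index.items(), key=lambda item: item[1])[:n]

# Ranks taken from outputs shown in this notebook
toy_index = {"fawn": 34701, "woods": 1408, "that": 12, "movie": 17, "was": 13}
for word, rank in top_words(toy_index, 3):
    print(word, ':', rank)
```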


In [None]:
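def encode_text(text):
    # Lower-case the text, remove most punctuation and split it into words
    tokens = keras.preprocessing.text.text_to_word_sequence(text)
    # Map each word to its index; unknown words become 0 (the padding value)
    tokens = [word_index[word] if word in word_index else 0 for word in tokens]
    # Pad/truncate to MAXLEN and return the single encoded review
    return pad_sequences([tokens], MAXLEN)[0]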
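One caveat: `imdb.get_word_index()` returns unshifted ranks, whereas the reviews returned by `imdb.load_data` were shifted by its default `index_from=3`, with 1 marking the start of a review and 2 marking out-of-vocabulary words. `encode_text` as written therefore produces indices three lower than the ones the model saw during training, and it maps unknown words to the padding value 0. The sketch below is one possible offset-aware alternative, assuming the default `start_char=1`, `oov_char=2` and `index_from=3`; `encode_shifted`, `decode_shifted` and the toy index are hypothetical, and the toy ranks are those of "that", "movie" and "was" from the encoded example in this notebook.

```python
def encode_shifted(tokens, word_index, num_words, start_char=1, oov_char=2, index_from=3):
    """Encode a token list the way imdb.load_data encodes its reviews (sketch)."""
    indices = [start_char]
    for word in tokens:
        rank = word_index.get(word)
        if rank is not None and rank + index_from < num_words:
            indices.append(rank + index_from)  # known word inside the vocabulary
        else:
            indices.append(oov_char)           # unknown or too-rare word
    return indices

def decode_shifted(indices, word_index, index_from=3):
    """Map shifted indices back to words, skipping padding and tagging markers."""
    inverted = {rank + index_from: word for word, rank in word_index.items()}
    inverted.update({1: "<start>", 2: "<oov>"})
    # Indices missing from the given word index are also shown as <oov>
    return " ".join(inverted.get(i, "<oov>") for i in indices if i != 0)

toy_index = {"that": 12, "movie": 17, "was": 13}
toy_encoded = encode_shifted(["that", "movie", "was", "superb"], toy_index, num_words=88584)
print(toy_encoded)                                      # [1, 15, 20, 16, 2]
print(decode_shifted([0, 0] + toy_encoded, toy_index))  # <start> that movie was <oov>
```

In the notebook the tokens would come from `keras.preprocessing.text.text_to_word_sequence(text)`, with `word_index` and `VOCAB_SIZE` passed in and the result padded with `pad_sequences([...], MAXLEN)`; predictions made this way would differ from the ones recorded below.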

In [None]:
text="that movie was amazing, i have to watch it again"
encoded=encode_text(text)
print(encoded)

[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0  12  17  13 477  10  25   

In [None]:
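# Invert the word index so integers can be mapped back to words
reverse_word_index = {value: key for (key, value) in word_index.items()}

def decode_integers(integers):
    PAD = 0
    # Skip padding and join the remaining words with single spaces
    return " ".join(reverse_word_index[num] for num in integers if num != PAD)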

print(decode_integers(encoded))

that movie was amazing i have to watch it again


In [None]:
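def predict(text):
    encoded_text = encode_text(text)
    # Add a batch dimension: shape (MAXLEN,) -> (1, MAXLEN)
    pred = encoded_text.reshape(1, MAXLEN)
    result = model.predict(pred)
    # result has shape (1, 1); print the probability that the review is positive
    print(result[0])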

In [None]:
positive_review="That was a good movie, i will definitely watch it again"
predict(positive_review)

negative_review="Don't waste your time watching this movie, so disappointing"
predict(negative_review)

[0.99264354]
[0.77117985]
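`predict` prints the raw sigmoid output, i.e. the model's estimated probability that a review is positive. Turning it into a label needs a threshold; 0.5 is the usual choice and is the cut-off the `'accuracy'` metric uses. With that threshold, the negative review's score of 0.77 is still labeled positive, so the model misclassifies it. `to_label` is a hypothetical helper shown on the two scores printed above.

```python
def to_label(prob, threshold=0.5):
    """Convert a sigmoid probability into a sentiment label."""
    return "positive" if prob > threshold else "negative"

# Scores printed by predict() above
for review, prob in [("positive_review", 0.99264354), ("negative_review", 0.77117985)]:
    print(review, to_label(prob))
```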
