<a href="https://colab.research.google.com/github/Shrilekhya/MachineLearning/blob/main/SentimentAnalysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [50]:
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
import pandas as pd

In [51]:
train_data = pd.read_csv('/content/imdb_reviews.csv')
test_data = pd.read_csv('/content/test_reviews.csv')

In [52]:
train_data.head()

Unnamed: 0,Reviews,Sentiment
0,<START this film was just brilliant casting lo...,positive
1,<START big hair big boobs bad music and a gian...,negative
2,<START this has to be one of the worst films o...,negative
3,<START the <UNK> <UNK> at storytelling the tra...,positive
4,<START worst mistake of my life br br i picked...,negative


In [53]:
word_indexes = pd.read_csv('/content/word_indexes.csv')
word_indexes.head()

Unnamed: 0,Words,Indexes
0,tsukino,52009
1,nunnery,52010
2,sonja,16819
3,vani,63954
4,woods,1411


In [54]:
# Convert the word_indexes DataFrame into a {word: index} dictionary

word_indexes = dict(zip(word_indexes.Words, word_indexes.Indexes))

In [55]:
word_indexes["<PAD>"] = 0
word_indexes["<START"] = 1
word_indexes["<UNK>"] = 2
word_indexes["<UNUSED>"] = 3
word_indexes["<U"] = 4

In [56]:
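# Encode a tokenized review as a list of word indexes.
# Words missing from the dictionary map to <UNK> instead of raising a KeyError.

def textEncoder(text):
  return [word_indexes.get(word, word_indexes["<UNK>"]) for word in text]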
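
To make the encoding step concrete, here is a minimal sketch on a toy vocabulary. The dictionary `toy_word_indexes` and the helper `encode_tokens` are hypothetical names used only for illustration (the real indexes come from `word_indexes.csv`), and the sketch assumes that unknown words should map to `<UNK>`.

```python
# Toy vocabulary for illustration only; the real values live in word_indexes.csv.
toy_word_indexes = {"<PAD>": 0, "<START": 1, "<UNK>": 2, "this": 14, "film": 22}

def encode_tokens(tokens, vocab):
    """Map each token to its index; tokens not in vocab fall back to <UNK>."""
    unk = vocab["<UNK>"]
    return [vocab.get(token, unk) for token in tokens]

# "rocks" is not in the toy vocabulary, so it is encoded as <UNK> (2).
tokens = "<START this film rocks".split()
encoded = encode_tokens(tokens, toy_word_indexes)
print(encoded)  # [1, 14, 22, 2]
```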

In [57]:
# Separate the review text (features) from the sentiment labels
X_train, y_train = train_data['Reviews'], train_data['Sentiment']
X_test, y_test = test_data['Reviews'], test_data['Sentiment']

In [58]:
X_train.head()

0    <START this film was just brilliant casting lo...
1    <START big hair big boobs bad music and a gian...
2    <START this has to be one of the worst films o...
3    <START the <UNK> <UNK> at storytelling the tra...
4    <START worst mistake of my life br br i picked...
Name: Reviews, dtype: object

Splitting each review into words, using whitespace as the delimiter

In [59]:
X_train = X_train.apply(lambda review: review.split())
X_test = X_test.apply(lambda review: review.split())

In [60]:
X_train.head()

0    [<START, this, film, was, just, brilliant, cas...
1    [<START, big, hair, big, boobs, bad, music, an...
2    [<START, this, has, to, be, one, of, the, wors...
3    [<START, the, <UNK>, <UNK>, at, storytelling, ...
4    [<START, worst, mistake, of, my, life, br, br,...
Name: Reviews, dtype: object

In [61]:
#Encoding all the words
X_train = X_train.apply(textEncoder)
X_test = X_test.apply(textEncoder)

In [62]:
X_train.head()

0    [1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, ...
1    [1, 194, 1153, 194, 8255, 78, 228, 5, 6, 1463,...
2    [1, 14, 47, 8, 30, 31, 7, 4, 249, 108, 7, 4, 5...
3    [1, 4, 2, 2, 33, 2804, 4, 2040, 432, 111, 153,...
4    [1, 249, 1323, 7, 61, 113, 10, 10, 13, 1637, 1...
Name: Reviews, dtype: object

In [63]:
#We encode the sentiments

def encodeSentiments(sent):
  if sent == "positive":
    return 1
  else:
    return 0

In [64]:
y_train = y_train.apply(encodeSentiments)
y_test = y_test.apply(encodeSentiments)

In [65]:
# The model needs every review to have the same length, so each review is fixed at 500 tokens.
# Shorter reviews are padded with 0 (<PAD>) at the end; longer reviews are truncated
# (by default, pad_sequences drops tokens from the start of the review).

X_train = keras.preprocessing.sequence.pad_sequences(X_train, maxlen=500, padding='post', value=word_indexes["<PAD>"])
X_test = keras.preprocessing.sequence.pad_sequences(X_test, maxlen=500, padding='post', value=word_indexes["<PAD>"])
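
As a rough illustration of what the padding call does, the sketch below mimics it in plain Python. The helper `pad_post` is a hypothetical stand-in, not part of Keras, and it assumes `pad_sequences` is left at its default `truncating='pre'`, so overlong sequences lose tokens from the front.

```python
def pad_post(sequences, maxlen, value=0):
    """Pure-Python sketch of padding='post' with the default 'pre' truncation."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # keep only the last maxlen tokens
        padded.append(seq + [value] * (maxlen - len(seq)))  # pad at the end
    return padded

# A short review gets trailing zeros; a long one loses its first token.
example = pad_post([[1, 14, 22], [1, 4, 2, 2, 33, 7]], maxlen=5)
print(example)  # [[1, 14, 22, 0, 0], [4, 2, 2, 33, 7]]
```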

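Here,

1st layer : embedding layer with a vocabulary of 10000 words; it maps each word index to a learned vector of length 16, so each review becomes a 500 x 16 matrix

2nd layer : global average pooling; it averages the word vectors over the 500 positions, giving one fixed-length vector of 16 values per review (this layer has no trainable parameters)

3rd layer : dense layer with 16 hidden units and relu as activation function

4th layer : dense output layer with a single unit and sigmoid as activation function, giving the probability that the review is positive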
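
The layer sizes above can be checked with a little arithmetic. The sketch below counts the trainable parameters by hand and shows what averaging over positions means on a tiny example; `global_average_pool` is a hypothetical helper written for illustration, not the Keras layer itself.

```python
vocab_size, embed_dim, hidden_units = 10000, 16, 16

embedding_params = vocab_size * embed_dim               # one 16-d vector per word
pooling_params = 0                                      # averaging has no weights
dense_params = embed_dim * hidden_units + hidden_units  # weights + biases
output_params = hidden_units * 1 + 1                    # weights + bias
total_params = embedding_params + pooling_params + dense_params + output_params
print(total_params)  # 160289

def global_average_pool(word_vectors):
    """Average a list of equal-length vectors position by position."""
    n = len(word_vectors)
    return [sum(column) / n for column in zip(*word_vectors)]

# Two 2-d "word vectors" collapse into a single 2-d vector.
pooled = global_average_pool([[1.0, 2.0], [3.0, 4.0]])
print(pooled)  # [2.0, 3.0]
```

Almost all of the parameters sit in the embedding table, which is why the vocabulary size has such a large effect on model size.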

In [68]:
model = keras.Sequential([keras.layers.Embedding(10000, 16, input_length=500),
                          keras.layers.GlobalAveragePooling1D(),
                          keras.layers.Dense(16, activation='relu'),
                          keras.layers.Dense(1, activation='sigmoid')])

In [69]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
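
For reference, the `binary_crossentropy` loss chosen above averages -(y*log(p) + (1-y)*log(1-p)) over the examples, where y is the 0/1 label and p is the sigmoid output. The sketch below is a plain-Python approximation for illustration only; the function name and the `eps` clipping value are assumptions, not the Keras implementation.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for label, prob in zip(y_true, y_pred):
        prob = min(max(prob, eps), 1 - eps)
        total += -(label * math.log(prob) + (1 - label) * math.log(1 - prob))
    return total / len(y_true)

# A confident correct positive (0.9) and a fairly confident correct negative (0.2).
loss = binary_crossentropy([1, 0], [0.9, 0.2])
print(round(loss, 3))  # roughly 0.164
```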

In [71]:
#training the model
output = model.fit(X_train, y_train, epochs=30, batch_size=512, validation_data=(X_test, y_test))

Epoch 1/30
...
Epoch 30/30


In [72]:
loss, accuracy = model.evaluate(X_test, y_test)



In [73]:
# Pick a random review from the test set (any valid row, including row 0)
randomIndex = np.random.randint(0, len(test_data))
user_review = test_data.iloc[randomIndex]
print(user_review)

Reviews      <START actually i have more a question than a ...
Sentiment                                             positive
Name: 323, dtype: object


In [74]:
user_review = X_test[randomIndex]
user_review = np.array([user_review])  # add a batch dimension: shape (1, 500)
user_sent = model.predict(user_review)[0][0]  # scalar probability of the positive class

if user_sent > 0.5:
  print("Positive Sentiment")
else:
  print("Negative Sentiment")

Positive Sentiment
