<a href="https://colab.research.google.com/github/biswa-13/TensorFlow-Practice/blob/master/TF5_Sentiment_Analysis_Step_By_Step_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Step-by-Step Guide to Sentiment Analysis Using TensorFlow:**
- Import the required packages and declare the variables
- Download the dataset
- Prepare the training and testing datasets
- Tokenize, sequence and pad the dataset
- Build the model using the Keras Embedding layer
- Train the model
- Test the model by predicting the sentiment of new reviews

Reference : https://colab.research.google.com/github/tensorflow/examples/blob/master/courses/udacity_intro_to_tensorflow_for_deep_learning/l09c04_nlp_embeddings_and_sentiment.ipynb#scrollTo=QXtfw-OY3WoZ

In [42]:
# - Import the required packages and declare the variables
print("Start - Import the required packages and declare the variables")
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

import numpy as np
import pandas as pd

vocab_size = 1000          # vocabulary size limit for the tokenizer and the embedding layer
embeding_dimensions = 16   # length of each word embedding vector
max_len = 100              # every review is padded or truncated to this many tokens
oov_token = "<OOv>"        # placeholder for out-of-vocabulary words
trunc_type = "post"        # cut overly long reviews at the end
padding_type = "post"      # pad short reviews at the end

print("Finish - Import the required packages and declare the variables")

Start - Import the required packages and declare the variables
Finish - Import the required packages and declare the variables


In [36]:
# - Downloading the Dataset
print("Start: Downloading the Dataset -->")
path = tf.keras.utils.get_file("reviewDataset.csv", "https://drive.google.com/uc?id=13ySLC_ue6Umt9RJYSeM2t-V0kCv-4C-P")

# Retrieving the downloaded dataset
dataset = pd.read_csv(path)
# Preview the first rows (Jupyter only displays this when it is the last expression in a cell)
dataset.head()
print("Finish: Downloading the Dataset -->")

Start: Downloading the Dataset -->
Finish: Downloading the Dataset -->


In [37]:
#- Prepare the training and testing dataset

print("Start - Prepare the training and testing dataset")
# Get the reviews and labels from the CSV file
reviews = dataset["text"]
labels = dataset["sentiment"]
train_size = int(len(reviews) * 0.8)

train_dataset = reviews[0: train_size]
train_labels = labels[0:train_size]

test_dataset = reviews[train_size:]
test_labels = labels[train_size:]
# Make labels into numpy arrays for use with the network later
updtd_train_lbls = np.array(train_labels)
updtd_test_lbls = np.array(test_labels)
print("Finish - Prepare the training and testing dataset")

Start - Prepare the training and testing dataset
Finish - Prepare the training and testing dataset
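
The split above takes the first 80% of the rows in file order. If the CSV happens to be grouped by sentiment, the test set could end up dominated by one class. Below is a minimal sketch of shuffling before splitting, written in plain Python so it runs without the dataset. The helper name `shuffled_split`, the seed and the toy data are illustrative assumptions, not part of the original notebook.

```python
import random

def shuffled_split(items, labels, train_fraction=0.8, seed=42):
    """Shuffle paired items/labels with a fixed seed, then split them."""
    if len(items) != len(labels):
        raise ValueError("items and labels must have the same length")
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    train_size = int(len(items) * train_fraction)
    train_idx, test_idx = indices[:train_size], indices[train_size:]
    train = ([items[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([items[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test

# Toy data standing in for dataset["text"] and dataset["sentiment"]
toy_reviews = [f"review {i}" for i in range(10)]
toy_labels = [i % 2 for i in range(10)]
(train_x, train_y), (test_x, test_y) = shuffled_split(toy_reviews, toy_labels)
print(len(train_x), len(test_x))
```

To try this with the notebook's data, one option is to pass plain lists, for example `reviews.tolist()` and `labels.tolist()`.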


In [45]:
# - Tokenize the dataset
print("Start - Tokenizing, Sequencing and Padding the dataset")
# -- initializing the tokenizer
tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_token)
# -- fitting the tokenizer on the training reviews only
tokenizer.fit_on_texts(train_dataset)
# -- retrieving the word index
wordIndex = tokenizer.word_index
# -- generating the text sequences
sequences_train = tokenizer.texts_to_sequences(train_dataset)
# -- applying the padding configuration
padding_train = pad_sequences(sequences=sequences_train, maxlen=max_len, padding=padding_type, truncating=trunc_type)

# generating sequences and padding for the test dataset
sequence_test = tokenizer.texts_to_sequences(test_dataset)
padding_test = pad_sequences(sequences=sequence_test, maxlen=max_len, padding=padding_type, truncating=trunc_type)
print("Finish - Tokenizing, Sequencing and Padding the dataset")

Start - Tokenizing, Sequencing and Padding the dataset
Finish - Tokenizing, Sequencing and Padding the dataset
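
To make the three steps above concrete, here is a simplified plain-Python sketch of tokenizing, sequencing and post-padding. It is not the Keras implementation: it only lowercases and splits on whitespace, and the function names (`build_word_index`, `to_sequences`, `pad_post`) are hypothetical. It assumes the same conventions as the cell above: index 0 is reserved for padding, index 1 for the out-of-vocabulary token, and words whose index is not below `num_words` are treated as out-of-vocabulary.

```python
from collections import Counter

def build_word_index(texts, oov_token="<OOV>"):
    """Assign indices by descending frequency; 1 is reserved for OOV."""
    counts = Counter(word for text in texts for word in text.lower().split())
    word_index = {oov_token: 1}
    for rank, (word, _) in enumerate(counts.most_common(), start=2):
        word_index[word] = rank
    return word_index

def to_sequences(texts, word_index, num_words, oov_token="<OOV>"):
    """Map each word to its index, falling back to the OOV index."""
    oov = word_index[oov_token]
    sequences = []
    for text in texts:
        seq = []
        for word in text.lower().split():
            idx = word_index.get(word, oov)
            seq.append(idx if idx < num_words else oov)
        sequences.append(seq)
    return sequences

def pad_post(sequences, max_len):
    """Truncate at the end, then pad with zeros at the end."""
    padded = []
    for seq in sequences:
        trimmed = seq[:max_len]
        padded.append(trimmed + [0] * (max_len - len(trimmed)))
    return padded

train_texts = ["good good good food food bad"]
word_index = build_word_index(train_texts)
seqs = to_sequences(["good bad pizza"], word_index, num_words=4)
padded = pad_post(seqs, max_len=5)
print(word_index)
print(padded)
```

In this toy run, "bad" is in the word index but falls outside the `num_words` limit, and "pizza" was never seen during fitting, so both map to the OOV index.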


In [51]:
# - Building the model using the keras Embeddings layer
print("Start - Building the model using the keras Embeddings layer")

# Build a basic sentiment network.
# Note the embedding layer is first,
# and the output is only 1 node as it is either 0 or 1 (negative or positive)

## initializing the model
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embeding_dimensions, input_length=max_len),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(6, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

## compiling the model
model.compile(loss="binary_crossentropy", optimizer= 'adam', metrics=['accuracy'])
model.summary()

print("Finish - Building the model using the keras Embeddings layer")

Start - Building the model using the keras Embeddings layer
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 100, 16)           16000     
_________________________________________________________________
flatten_1 (Flatten)          (None, 1600)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 6)                 9606      
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 7         
Total params: 25,613
Trainable params: 25,613
Non-trainable params: 0
_________________________________________________________________
Finish - Building the model using the keras Embeddings layer
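
The parameter counts in the summary above can be checked by hand. The sketch below redoes the arithmetic from the hyperparameters in the first cell; the variable names are chosen here for illustration only.

```python
vocab_size = 1000
embedding_dim = 16
max_len = 100
hidden_units = 6

# Embedding: one 16-dimensional vector per word index
embedding_params = vocab_size * embedding_dim
# Flatten has no weights; it only reshapes (100, 16) into 1600 values
flatten_size = max_len * embedding_dim
# Dense layers: one weight per input plus one bias, for each unit
dense_hidden_params = (flatten_size + 1) * hidden_units
dense_output_params = (hidden_units + 1) * 1

total = embedding_params + dense_hidden_params + dense_output_params
print(embedding_params, flatten_size, dense_hidden_params, dense_output_params, total)
```

Most of the weights sit in the embedding table and in the first dense layer, which connects all 1600 flattened values to its 6 units.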


In [52]:
# - Training the model
print("Start - Training the model")
epochs = 6
model.fit(padding_train, updtd_train_lbls, epochs= epochs, validation_data= (padding_test, updtd_test_lbls))
print("Finish - Training the model")

Start - Training the model
Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6
Finish - Training the model
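
Training minimizes the `binary_crossentropy` loss chosen when the model was compiled. As a rough illustration of what that loss measures, here is a plain-Python sketch of the standard formula averaged over a batch. The clipping constant `eps` is an assumption made here to avoid `log(0)`; this is not the Keras code.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over the batch."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

confident = binary_crossentropy([1, 0], [0.9, 0.1])   # confident and correct
uncertain = binary_crossentropy([1, 0], [0.5, 0.5])   # undecided
wrong = binary_crossentropy([1, 0], [0.1, 0.9])       # confident and wrong
print(confident, uncertain, wrong)
```

Predictions close to 0.5 give a loss near ln 2 (about 0.693), so a validation loss that stays around that value suggests the model has learned little.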


In [56]:
# - Testing the model
print("Start: Testing the model")
# Use the model to predict a review   
fake_reviews = ['I love this phone', 'I hate spaghetti', 
                'Everything was cold',
                'Everything was hot exactly as I wanted', 
                'Everything was green that is good but i dont like the green', 
                'the host seated us immediately',
                'they gave us free chocolate cake', 
                'not sure about the wilted flowers on the table',
                'only works when I stand on tippy toes', 
                'does not work when I stand on my head']

print(fake_reviews) 

# Create the sequences, padded and truncated the same way as the training data
sample_sequences = tokenizer.texts_to_sequences(fake_reviews)
fakes_padded = pad_sequences(sample_sequences, maxlen=max_len, padding=padding_type, truncating=trunc_type)

print('\nHOT OFF THE PRESS! HERE ARE SOME NEWLY MINTED, ABSOLUTELY GENUINE REVIEWS!\n')              

classes = model.predict(fakes_padded)

# The closer the class is to 1, the more positive the review is deemed to be
for x in range(len(fake_reviews)):
  print(fake_reviews[x])
  print(classes[x])
  print('\n')

# Try adding reviews of your own
# Add some negative words (such as "not") to the good reviews and see what happens
# For example:
# they gave us free chocolate cake and did not charge us
print("Finish: Testing the model")

Start: Testing the model
['I love this phone', 'I hate spaghetti', 'Everything was cold', 'Everything was hot exactly as I wanted', 'Everything was green that is good but i dont like the green', 'the host seated us immediately', 'they gave us free chocolate cake', 'not sure about the wilted flowers on the table', 'only works when I stand on tippy toes', 'does not work when I stand on my head']

HOT OFF THE PRESS! HERE ARE SOME NEWLY MINTED, ABSOLUTELY GENUINE REVIEWS!

I love this phone
[0.65867996]


I hate spaghetti
[0.5517792]


Everything was cold
[0.6035269]


Everything was hot exactly as I wanted
[0.601652]


Everything was green that is good but i dont like the green
[0.5872112]


the host seated us immediately
[0.5933626]


they gave us free chocolate cake
[0.5982133]


not sure about the wilted flowers on the table
[0.5296513]


only works when I stand on tippy toes
[0.60210544]


does not work when I stand on my head
[0.5013036]


Finish: Testing the model
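
The sigmoid output is a score between 0 and 1, not a class label. A small sketch of turning the scores printed above into labels follows; the 0.5 cut-off is a common convention and an assumption here, and `to_label` is a hypothetical helper. The scores are copied from the output above.

```python
reviews_and_scores = [
    ("I love this phone", 0.65867996),
    ("I hate spaghetti", 0.5517792),
    ("Everything was cold", 0.6035269),
    ("Everything was hot exactly as I wanted", 0.601652),
    ("Everything was green that is good but i dont like the green", 0.5872112),
    ("the host seated us immediately", 0.5933626),
    ("they gave us free chocolate cake", 0.5982133),
    ("not sure about the wilted flowers on the table", 0.5296513),
    ("only works when I stand on tippy toes", 0.60210544),
    ("does not work when I stand on my head", 0.5013036),
]

def to_label(score, threshold=0.5):
    """Map a sigmoid score to a sentiment label."""
    return "positive" if score >= threshold else "negative"

predicted = [to_label(score) for _, score in reviews_and_scores]
scores = [score for _, score in reviews_and_scores]

for (review, score), label in zip(reviews_and_scores, predicted):
    print(f"{score:.3f}  {label:8s}  {review}")
print("score range:", min(scores), "to", max(scores))
```

With this cut-off every sample review, including "I hate spaghetti", is labeled positive, and all scores fall in a narrow band just above 0.5. That suggests the model separates the classes only weakly after 6 epochs, so the ranking of the scores is more informative than the labels.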
