<center>
  <h1 style="
      background: linear-gradient(45deg, #4CAF50, #2196F3);
      color: #ffffff;
      font-size: 36px;
      font-weight: bold;
      font-family: 'Arial', sans-serif;
      border: 2px solid #2196F3;
      padding: 10px;
      border-radius: 8px;
      text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.5);
      margin: 20px 0;
  ">Sentiment Analysis</h1>
</center>


![imagegif url](https://i.pinimg.com/originals/52/ad/6a/52ad6a11c1dcb1692ff9e321bd520167.gif)


Sentiment analysis is the process of analyzing digital text to determine if the emotional tone of the message is positive or negative. Today, companies have large volumes of text data like emails, customer support chat transcripts, social media comments, and reviews. Sentiment analysis tools can scan this text to automatically determine the author’s attitude towards a topic. Companies use the insights from sentiment analysis to improve customer service and increase brand reputation. 


Why is sentiment analysis important?
Sentiment analysis, also known as opinion mining, is an important business intelligence tool that helps companies improve their products and services. Some of its benefits are described below.

Provide objective insights
Businesses can avoid personal bias associated with human reviewers by using artificial intelligence (AI)–based sentiment analysis tools. As a result, companies get consistent and objective results when analyzing customers’ opinions.

For example, consider the following sentence: 

I'm amazed by the speed of the processor but disappointed that it heats up quickly. 

Marketers might dismiss the discouraging part of the review and be positively biased towards the processor's performance. However, accurate sentiment analysis tools sort and classify text to pick up emotions objectively.

Build better products and services
A sentiment analysis system helps companies improve their products and services based on genuine and specific customer feedback. AI technologies identify real-world objects or situations (called entities) that customers associate with negative sentiment. In the example above, product engineers would focus on improving the processor's heat management because the text analysis software associated "disappointed" (negative sentiment) with the entities "processor" and "heats up".
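
To make the idea of pairing an entity with a sentiment concrete, here is a minimal sketch. It is not how production tools work: the two-word lexicon, the aspect list, and the function name `clause_sentiments` are all assumptions made up for this illustration, and splitting on "but" is only a crude stand-in for real clause detection.

```python
import re

# Tiny hand-made lexicon and aspect list, purely illustrative (assumptions).
SENTIMENT_WORDS = {"amazed": 1, "disappointed": -1}
ASPECT_TERMS = ["speed", "heats up"]

def clause_sentiments(sentence):
    """Split on 'but' and pair each aspect term with its clause's polarity."""
    results = []
    for clause in re.split(r"\bbut\b", sentence.lower()):
        # Sum the polarity of every known sentiment word in this clause.
        score = sum(SENTIMENT_WORDS.get(w, 0) for w in re.findall(r"[a-z']+", clause))
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        for aspect in ASPECT_TERMS:
            if aspect in clause:
                results.append((aspect, label))
    return results

review = "I'm amazed by the speed of the processor but disappointed that it heats up quickly."
print(clause_sentiments(review))
# [('speed', 'positive'), ('heats up', 'negative')]
```

A single sentence-level label would hide the fact that this review is positive about one aspect and negative about another.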

Analyze at scale
Businesses constantly mine information from a vast amount of unstructured data, such as emails, chatbot transcripts, surveys, customer relationship management records, and product feedback. Cloud-based sentiment analysis tools allow businesses to scale the process of uncovering customer emotions in textual data at an affordable cost. 

Real-time results
Businesses must be quick to respond to potential crises or market trends in today's fast-changing landscape. Marketers rely on sentiment analysis software to learn what customers feel about the company's brand, products, and services in real time and take immediate actions based on their findings. They can configure the software to send alerts when negative sentiments are detected for specific keywords.

In [None]:
import numpy as np
import pandas as pd
import glob
import matplotlib.pyplot as plt
import nltk
import re
import tensorflow as tf
import keras
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.layers import Embedding,LSTM,Dense
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.text import one_hot

<h2 style="
      background: linear-gradient(45deg, #3498db, #2c3e50);
      color: #ffffff;
      font-size: 24px;
      font-weight: bold;
      font-family: 'Arial', sans-serif;
      border: 2px solid #2c3e50;
      padding: 8px;
      border-radius: 6px;
      text-shadow: 1px 1px 2px rgba(0, 0, 0, 0.3);
      margin: 10px 0;
  ">Reading Dataset</h2>


In [None]:
files = glob.glob('../input/sentiment-labelled-sentences-data-set/sentiment labelled sentences/*.txt')
files

In [None]:
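data = []
# Indices 0, 1 and 3 are kept from the original selection; glob order is not
# guaranteed, so check the `files` listing above before relying on them.
for index in [0, 1, 3]:
    # Read each selected file, closing it when done.
    with open(files[index], "r") as f:
        for review in f:
            # Each line is "<sentence>\t<label>".
            x = review.strip().split('\t')
            data.append([x[0], int(x[1])])

data[0]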
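
The loader above assumes every line has the form `sentence<TAB>label`. A slightly more defensive parse, sketched here with a hypothetical helper name and a made-up sample line, splits on the last tab only, so a sentence that happens to contain a tab would still parse.

```python
def parse_labelled_line(line):
    """Split 'sentence<TAB>label' on the last tab and return (text, label)."""
    text, label = line.rstrip("\n").rsplit("\t", 1)
    return text.strip(), int(label)

# Hypothetical sample line, not taken from the data set.
sample = "Great phone, works as described.\t1\n"
print(parse_labelled_line(sample))
# ('Great phone, works as described.', 1)
```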

In [None]:
data = pd.DataFrame(data,columns=['reviews','labels'])
data.head()

In [None]:
data.shape

In [None]:
data.describe()

In [None]:
data['labels'].value_counts()

<h2 style="
      background: linear-gradient(45deg, #3498db, #2c3e50);
      color: #ffffff;
      font-size: 24px;
      font-weight: bold;
      font-family: 'Arial', sans-serif;
      border: 2px solid #2c3e50;
      padding: 8px;
      border-radius: 6px;
      text-shadow: 1px 1px 2px rgba(0, 0, 0, 0.3);
      margin: 10px 0;
  ">Preprocessing</h2>


In [None]:
nltk.download('stopwords')

In [None]:
oov_tok = '<oov>'
embedding_dims = 16
padding_type = 'post'
vocab_size = 1000
max_length = 20
trunc_type = 'post'

In [None]:
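ps = PorterStemmer()
# Build the stop-word set once; calling stopwords.words() per token is slow.
stop_words = set(stopwords.words('english'))
corpus = []
sentences = data.reviews.copy()
for sent in sentences:
    # Keep letters only, lowercase, and split into tokens.
    review = re.sub('[^a-zA-Z]', ' ', sent).lower().split()
    # Drop stop words and stem what remains.
    review = [ps.stem(word) for word in review if word not in stop_words]
    review = ' '.join(review)
    corpus.append(review)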

In [None]:
corpus[:10]
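
To see what the cleaning step does to a single sentence, here is a small standalone sketch. It leaves out stemming and uses a tiny hand-picked stop-word set as a stand-in for NLTK's much longer English list; the helper name `clean` and the sample sentence are assumptions for illustration.

```python
import re

# Stand-in for stopwords.words('english'); NLTK's real list is much longer.
STOP_WORDS = {"the", "is", "a", "and", "it", "was", "this"}

def clean(sentence, stop_words=STOP_WORDS):
    """Replace non-letters with spaces, lowercase, and drop stop words."""
    tokens = re.sub(r"[^a-zA-Z]", " ", sentence).lower().split()
    return " ".join(t for t in tokens if t not in stop_words)

print(clean("The battery is GREAT, and it lasted 2 days!"))
# battery great lasted days
```

Note that digits and punctuation disappear entirely, so "2 days" and "days" become indistinguishable after cleaning.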

In [None]:
onehot_representation = [one_hot(sent,vocab_size) for sent in corpus]
onehot_representation[:10]
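
Despite its name, `one_hot` returns hashed integer indices rather than one-hot vectors. As far as I understand, it hashes each word into the range 1 to `vocab_size - 1` and by default uses Python's built-in `hash`, which is salted per process for strings unless `PYTHONHASHSEED` is fixed. If that holds, indices can change after a kernel restart, and unrelated words can collide on the same index. The sketch below shows a deterministic alternative built on MD5; `stable_one_hot` is a hypothetical helper, not a Keras function.

```python
import hashlib

def stable_one_hot(text, n):
    """Map each whitespace-separated word to an index in [1, n-1] via MD5."""
    return [
        int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % (n - 1) + 1
        for word in text.split()
    ]

ids = stable_one_hot("great valu price", 1000)
print(ids)  # same three indices on every run and every machine
```

Index 0 is never produced, which leaves it free to act as the padding value.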

In [None]:
padded_sequences = pad_sequences(onehot_representation,padding=padding_type,maxlen=max_length)

In [None]:
padded_sequences[:10]
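
The call above sets `padding` but not `truncating`, so `trunc_type` is defined yet unused and `pad_sequences` falls back to its default of `'pre'`, which drops tokens from the start of an over-long sequence. The pure-Python sketch below mimics that behaviour for a single sequence; `pad_or_truncate` is a hypothetical helper, and its `padding="post"` default mirrors this notebook's call rather than the Keras default.

```python
def pad_or_truncate(seq, maxlen, padding="post", truncating="pre", value=0):
    """Return seq cut or zero-filled to exactly maxlen items."""
    if len(seq) > maxlen:
        # 'pre' keeps the last maxlen items, 'post' keeps the first maxlen.
        seq = seq[-maxlen:] if truncating == "pre" else seq[:maxlen]
    pad = [value] * (maxlen - len(seq))
    return seq + pad if padding == "post" else pad + seq

print(pad_or_truncate([5, 8, 2], 5))                               # [5, 8, 2, 0, 0]
print(pad_or_truncate([1, 2, 3, 4, 5, 6], 4))                      # [3, 4, 5, 6]
print(pad_or_truncate([1, 2, 3, 4, 5, 6], 4, truncating="post"))   # [1, 2, 3, 4]
```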

<h2 style="
      background: linear-gradient(45deg, #3498db, #2c3e50);
      color: #ffffff;
      font-size: 24px;
      font-weight: bold;
      font-family: 'Arial', sans-serif;
      border: 2px solid #2c3e50;
      padding: 8px;
      border-radius: 6px;
      text-shadow: 1px 1px 2px rgba(0, 0, 0, 0.3);
      margin: 10px 0;
  ">Splitting Dataset</h2>


In [None]:
reviews = np.array(padded_sequences)
labels = np.array(data['labels'])

In [None]:
training_size = int(len(reviews)*0.80)
training_reviews = reviews[0:training_size]
testing_reviews = reviews[training_size:]

training_labels = labels[0:training_size]
testing_labels = labels[training_size:]


training_labels_final = np.array(training_labels)
testing_labels_final = np.array(testing_labels)
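
Because rows were appended file by file, the slice-based split above takes its held-out rows from the end of the list rather than from a random sample. One possible alternative, sketched here with a hypothetical helper and an arbitrary seed, is to shuffle the row indices first.

```python
import random

def shuffled_split(n_samples, train_fraction=0.8, seed=42):
    """Return (train_indices, test_indices) after a seeded shuffle."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    cut = int(n_samples * train_fraction)
    return indices[:cut], indices[cut:]

train_idx, test_idx = shuffled_split(10)
print(len(train_idx), len(test_idx))  # 8 2
```

With NumPy arrays the index lists can be used directly, for example `reviews[train_idx]` and `labels[train_idx]`.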

In [None]:
print('\n the first review:',data['reviews'][0])
print('\n the sequence for the first review:',training_reviews[0])


<h2 style="
      background: linear-gradient(45deg, #3498db, #2c3e50);
      color: #ffffff;
      font-size: 24px;
      font-weight: bold;
      font-family: 'Arial', sans-serif;
      border: 2px solid #2c3e50;
      padding: 8px;
      border-radius: 6px;
      text-shadow: 1px 1px 2px rgba(0, 0, 0, 0.3);
      margin: 10px 0;
  ">Embedding Model</h2>


In [None]:
embedding_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size,embedding_dims,input_length=max_length),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(6,activation='relu'),
    tf.keras.layers.Dense(1,activation='sigmoid')
])

In [None]:
embedding_model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
embedding_model.summary()
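
The parameter counts reported by `summary()` can be checked by hand. The sketch below redoes the arithmetic for this model, using the hyperparameters set earlier (`vocab_size = 1000`, `embedding_dims = 16`); the helper names are made up for this illustration.

```python
def embedding_params(vocab_size, dims):
    """One learned vector of length dims per vocabulary index."""
    return vocab_size * dims

def dense_params(inputs, units):
    """Weight matrix plus one bias per unit."""
    return inputs * units + units

total = (
    embedding_params(1000, 16)   # Embedding: 16000
    + 0                          # GlobalAveragePooling1D has no weights
    + dense_params(16, 6)        # Dense(6): 102
    + dense_params(6, 1)         # Dense(1): 7
)
print(total)  # 16109
```

Almost all of the weights sit in the embedding table, so `vocab_size` and `embedding_dims` dominate the model size.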

In [None]:
epochs_num =20
history=embedding_model.fit(training_reviews,training_labels_final,epochs=epochs_num,validation_data=(testing_reviews,testing_labels_final))

In [None]:
def plot_graphs(history,string):
    plt.plot(history.history[string])
    plt.plot(history.history['val_'+string])
    plt.xlabel('epochs')
    plt.ylabel(string)
    plt.legend([string,'val_'+string])
    plt.show()


plot_graphs(history,'accuracy')
plot_graphs(history,'loss')

<h2 style="
      background: linear-gradient(45deg, #3498db, #2c3e50);
      color: #ffffff;
      font-size: 24px;
      font-weight: bold;
      font-family: 'Arial', sans-serif;
      border: 2px solid #2c3e50;
      padding: 8px;
      border-radius: 6px;
      text-shadow: 1px 1px 2px rgba(0, 0, 0, 0.3);
      margin: 10px 0;
  ">Predicting Sentiment in New Reviews For Embedding Model</h2>


In [None]:
new_reviews_to_predict = [
    "Worst purchase ever. Do not buy!",
    "Great value for the price.",
    "Terrible quality, very disappointed."
]

In [None]:
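# The tokenizer argument is accepted but never used: encoding is done with
# one_hot, exactly as in training, so both paths map words the same way.
tokenizer = Tokenizer()
def predict_sentiment(model, tokenizer, new_reviews, max_length=max_length, show_padded_sequence=True):
    """Clean, encode and pad new_reviews, then print the model's prediction for each."""
    corpus=[]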
    for sent in new_reviews:
        review = re.sub('[^a-zA-Z]', ' ', sent).lower().split()
        review = [ps.stem(word) for word in review if not word in stopwords.words('english')]
        review = ' '.join(review)
        corpus.append(review)
        
    onehot_representation = [one_hot(sent,vocab_size) for sent in corpus]   
    padded_sequences = pad_sequences(onehot_representation,padding=padding_type,maxlen=max_length)
    
    classes = model.predict(padded_sequences)

    for i in range(len(new_reviews)):
        if show_padded_sequence:
            print(f"Padded Sequence: {padded_sequences[i]}")
        print(f"Review: {new_reviews[i]}")
        print(f"Predicted Probability: {classes[i][0]}")
       
        threshold = 0.5
        if classes[i][0] >= threshold:
            print("Prediction: Positive Sentiment")
        else:
            print("Prediction: Negative Sentiment")
        print("\n")


predict_sentiment(embedding_model, tokenizer, new_reviews_to_predict)


<h2 style="
      background: linear-gradient(45deg, #3498db, #2c3e50);
      color: #ffffff;
      font-size: 24px;
      font-weight: bold;
      font-family: 'Arial', sans-serif;
      border: 2px solid #2c3e50;
      padding: 8px;
      border-radius: 6px;
      text-shadow: 1px 1px 2px rgba(0, 0, 0, 0.3);
      margin: 10px 0;
  ">LSTM Model</h2>


In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size,embedding_dims,input_length=max_length),
    tf.keras.layers.LSTM(embedding_dims,return_sequences=True,kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.LSTM(embedding_dims),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(8,activation='relu'),
    tf.keras.layers.Dense(1,activation='sigmoid'),

])

In [None]:
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
model.summary()
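
An LSTM layer has four gates, each with input weights, recurrent weights, and a bias, which is where its parameter count comes from. The sketch below applies that formula to this stack, assuming the default `use_bias=True` and the hyperparameters set earlier (`vocab_size = 1000`, `embedding_dims = 16`); the helper names are made up for this illustration.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(inputs, units):
    """Weight matrix plus one bias per unit."""
    return inputs * units + units

total = (
    1000 * 16                 # Embedding: 16000
    + lstm_params(16, 16)     # first LSTM: 2112
    + lstm_params(16, 16)     # second LSTM: 2112 (Dropout adds no weights)
    + dense_params(16, 8)     # Dense(8): 136
    + dense_params(8, 1)      # Dense(1): 9
)
print(total)  # 20369
```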

In [None]:
history1=model.fit(training_reviews,training_labels_final,epochs=epochs_num,validation_data=(testing_reviews,testing_labels_final))

In [None]:
plot_graphs(history1,'accuracy')
plot_graphs(history1,'loss')

<h2 style="
      background: linear-gradient(45deg, #3498db, #2c3e50);
      color: #ffffff;
      font-size: 24px;
      font-weight: bold;
      font-family: 'Arial', sans-serif;
      border: 2px solid #2c3e50;
      padding: 8px;
      border-radius: 6px;
      text-shadow: 1px 1px 2px rgba(0, 0, 0, 0.3);
      margin: 10px 0;
  ">Predicting Sentiment in New Reviews For LSTM Model</h2>


In [None]:
predict_sentiment(model, tokenizer, new_reviews_to_predict)