# **Restaurant Reviews Sentiment Analysis** using various approaches





## Dataset:
1000 rows × 2 columns
*   Review column : The customer's descriptive review of their experience
*   Liked column : Value signifying whether the experience was good or not (1: positive, 0: not positive)

## POLARITY DETECTION USING VADER :
### VADER : Valence Aware Dictionary and sEntiment Reasoner
VADER is a **lexicon** and **rule-based sentiment analysis tool** that is specifically attuned to sentiments expressed in social media. It is fully open-sourced under the MIT License.

It gives us a value for both **polarity** (positive/negative emotion) and **intensity** (strength of emotion). It takes a human-centric approach, combining qualitative analysis with empirical validation.
#### Lexicon
A lexicon is a list of words (a dictionary). For example, a positive lexicon lists positive words (like good, amaze, appreciable etc.), and a negative lexicon lists negative words (like bad, abolish, ambush etc.).
* VADER's lexicon was manually created and annotated by human raters.
* The dictionary is “valence aware,” which means the terms have an associated weight indicating how positive or negative they are, rather than just a binary positive-negative classification.

Additionally, while VADER’s lexicon maps the sentiment of individual word tokens, most users are interested in analyzing full sentences or paragraphs of text rather than individual words. VADER would not be fully effective without also considering contextual information found in the sentence that can change the sentiment of a word. For this reason, VADER has five heuristics (covering punctuation, capitalization, degree modifiers, the contrastive conjunction "but", and negation) designed to detect contextual clues that affect the intensity or polarity of words in a sentence.
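
The interplay between a valence-aware lexicon and contextual heuristics can be sketched with a toy example. Everything below is made up for illustration (the four-word lexicon, the weights, the two-token look-back window and the constants); it is not VADER's actual lexicon or rule set, only a minimal model of the idea that a booster strengthens a word and a negation flips it.

```python
# Toy illustration only: the lexicon, weights and constants are invented,
# and VADER's real lexicon and rules are considerably richer.
TOY_LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "nasty": -2.6}
NEGATIONS = {"not", "never", "no"}
BOOSTERS = {"very": 0.3, "extremely": 0.3}
NEGATION_SCALE = -0.74  # flip and dampen the valence of a negated word


def toy_valence(text):
    """Sum word valences, adjusting for a preceding booster or negation."""
    tokens = text.lower().replace(".", "").replace("!", "").split()
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in TOY_LEXICON:
            continue
        valence = TOY_LEXICON[token]
        previous = tokens[max(0, i - 2):i]  # look back up to two tokens
        for word in previous:
            if word in BOOSTERS:
                # push the valence further away from zero
                valence += BOOSTERS[word] if valence > 0 else -BOOSTERS[word]
        if any(word in NEGATIONS for word in previous):
            valence *= NEGATION_SCALE
        total += valence
    return round(total, 3)


for sentence in ("Crust is good.", "Crust is very good.", "Crust is not good."):
    print(sentence, toy_valence(sentence))
```

In this sketch the boosted sentence scores higher than the plain one, and the negated sentence ends up below zero.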

#### Advantages of VADER :
* It does not require any training data.
* It copes well with text containing emoticons, slang, conjunctions, capitalized words, punctuation and more, which makes it well suited to social media text.
* VADER can work with multiple domains.
* Good for evaluating public opinion, performing a competitive analysis, or enhancing customer experience.
* Easy to understand.

#### Disadvantages of VADER :
* Misspellings and grammatical mistakes may cause the analysis to overlook important words or usage.
* Sarcasm and irony may be misinterpreted.
* Analysis is language-specific.
* Domain-specific jargon, nomenclature, memes, or turns of phrase may not be recognized.
* It scores words from a fixed lexicon and adjusts them with only a handful of heuristics, so the wider context in which a word is used is largely ignored.

In [199]:
# importing required packages 
from nltk.sentiment import SentimentIntensityAnalyzer
import numpy as np
import pandas as pd
import operator
import time

In [200]:
# read data in a dataframe from the tsv file 
df = pd.read_table('/kaggle/input/restaurant-reviews/Restaurant_Reviews.tsv')
display(df)

|  | Review | Liked |
|---|---|---|
| 0 | Wow... Loved this place. | 1 |
| 1 | Crust is not good. | 0 |
| 2 | Not tasty and the texture was just nasty. | 0 |
| 3 | Stopped by during the late May bank holiday of... | 1 |
| 4 | The selection on the menu was great and so wer... | 1 |
| ... | ... | ... |
| 995 | I think food should have flavor and texture an... | 0 |
| 996 | Appetite instantly gone. | 0 |
| 997 | Overall I was not impressed and would not go b... | 0 |
| 998 | The whole experience was underwhelming, and I ... | 0 |


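#### SentimentIntensityAnalyzer().polarity_scores() takes in a string and returns a dictionary of scores in each of four categories:
1. **Negative Score** (`neg`) : proportion of the text that falls in the negative category
2. **Neutral Score** (`neu`) : proportion of the text that falls in the neutral category
3. **Positive Score** (`pos`) : proportion of the text that falls in the positive category
4. **Compound Score** (`compound`) : a metric that sums all the lexicon ratings and normalizes the result to lie between -1 (most extreme negative) and +1 (most extreme positive).

If the sentiment is calculated according to the compound score only, the usual thresholds are:
* **Positive** : compound score >= 0.05
* **Neutral** : (compound score > -0.05) and (compound score < 0.05)
* **Negative** : compound score <= -0.05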


The emotion intensity of each individual word in the lexicon is rated on a scale from **-4 to +4, where -4 is the most negative and +4 is the most positive. The midpoint 0 represents a neutral sentiment.** The compound score rescales the summed word ratings to the range -1 to +1.
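
How a sum of word ratings becomes a compound score, and how that score maps to a label, can be sketched as follows. The formula `x / sqrt(x^2 + alpha)` with `alpha = 15` is assumed here to match VADER's reference implementation; treat it as an assumption rather than a specification. The function names are hypothetical helpers, not part of NLTK.

```python
import math


def normalize(raw_score, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1).

    Assumed form of VADER's normalization; alpha=15 is an assumption.
    """
    return raw_score / math.sqrt(raw_score * raw_score + alpha)


def label_from_compound(compound, threshold=0.05):
    """Map a compound score to a label using the +/-0.05 cut-offs."""
    if compound >= threshold:
        return "pos"
    if compound <= -threshold:
        return "neg"
    return "neu"


# Raw valence sums chosen for illustration only.
for raw in (-4.0, -0.1, 0.0, 1.9, 4.0):
    compound = normalize(raw)
    print(f"raw={raw:+.1f}  compound={compound:+.4f}  label={label_from_compound(compound)}")
```

Small raw sums such as -0.1 stay inside the neutral band, while larger sums approach but never reach -1 or +1.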



In [201]:
start = time.time()
sia = SentimentIntensityAnalyzer()

# apply the analyzer to the 'Review' column and store the compound score in a new column, vader_sentiment_score
# polarity_scores() returns four scores (neg/neu/pos/compound); only the compound score is kept

df["vader_sentiment_score"] = df["Review"].apply(lambda x: sia.polarity_scores(x)["compound"])

end = time.time()

# total time taken
print(f"Runtime of the program is {(end - start)/60} minutes or {(end - start)} seconds")

Runtime of the program is 0.0027564446131388347 minutes or 0.16538667678833008 seconds


In [202]:
# populate the sentiment label column according to the value in vader_sentiment_score
# note: the cut-off used here is 0, not the +/-0.05 thresholds described above
# a string default keeps the choices and the default of np.select the same dtype
df["vader_sentiment_label"] = np.select([df["vader_sentiment_score"] < 0, df["vader_sentiment_score"] == 0, df["vader_sentiment_score"] > 0],['neg', 'neu', 'pos'], default='neu')


# create a predicted column to check the sentiment analysis results against the Liked column in the dataset
# 0 for negative and neutral, 1 for positive
df["vader_predicted"] = np.select([df["vader_sentiment_score"] < 0, df["vader_sentiment_score"] == 0, df["vader_sentiment_score"] > 0],[0, 0, 1])

display(df)

|  | Review | Liked | vader_sentiment_score | vader_sentiment_label | vader_predicted |
|---|---|---|---|---|---|
| 0 | Wow... Loved this place. | 1 | 0.5994 | pos | 1 |
| 1 | Crust is not good. | 0 | -0.3412 | neg | 0 |
| 2 | Not tasty and the texture was just nasty. | 0 | -0.5574 | neg | 0 |
| 3 | Stopped by during the late May bank holiday of... | 1 | 0.6908 | pos | 1 |
| 4 | The selection on the menu was great and so wer... | 1 | 0.6249 | pos | 1 |
| ... | ... | ... | ... | ... | ... |
| 995 | I think food should have flavor and texture an... | 0 | 0.0000 | neu | 0 |
| 996 | Appetite instantly gone. | 0 | 0.0000 | neu | 0 |
| 997 | Overall I was not impressed and would not go b... | 0 | -0.3724 | neg | 0 |
| 998 | The whole experience was underwhelming, and I ... | 0 | 0.0000 | neu | 0 |


Comparing the results of the VADER sentiment analysis with the labelled sentiment column (Liked)

In [203]:
vader_accuracy = (df.vader_predicted==df.Liked).mean()

print("Accuracy of sentiment analysis using VADER :")
print("%.2f%%"%(vader_accuracy*100))

Accuracy of sentiment analysis using VADER :
81.20%
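
A single accuracy figure does not show which kind of review is being misclassified. A minimal sketch of a confusion count is given below; the two lists are made-up stand-ins, and in the notebook they would be `df["Liked"]` and `df["vader_predicted"]`. The helper name `confusion_counts` is hypothetical.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count (actual, predicted) pairs for binary 0/1 labels."""
    counts = Counter(zip(actual, predicted))
    return {
        "true_pos": counts[(1, 1)],
        "false_pos": counts[(0, 1)],
        "true_neg": counts[(0, 0)],
        "false_neg": counts[(1, 0)],
    }


# Made-up labels for illustration only, not taken from the dataset.
liked = [1, 0, 0, 1, 1, 0]
predicted = [1, 0, 1, 1, 0, 0]

counts = confusion_counts(liked, predicted)
accuracy = (counts["true_pos"] + counts["true_neg"]) / len(liked)
print(counts, f"accuracy={accuracy:.2%}")
```

Because neutral scores are mapped to 0 in this notebook, a high false-negative count would point to positive reviews that the lexicon scored as neutral.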


In [204]:
#dropping vader analysis columns
df.drop(['vader_sentiment_score','vader_sentiment_label','vader_predicted'],axis=1, inplace=True)
#display(df)

## USING TextBlob :
TextBlob is a Python library for Natural Language Processing (NLP), built on top of the Natural Language Toolkit (NLTK).
It provides a simple API for common NLP tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

### Sentiment analysis using TextBlob :
TextBlob returns the polarity and subjectivity of a sentence.
* **Polarity** is a float value within the range [-1.0, 1.0], where 0 indicates neutral, +1 indicates a very positive sentiment and -1 represents a very negative sentiment.
* **Subjectivity** is a float value within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective. Subjective sentences express personal feelings, views, beliefs, opinions, allegations, desires, suspicions, and speculations, whereas objective sentences are factual.


In [205]:
from textblob import TextBlob

In [206]:
start = time.time()

#passing reviews to calculate polarity(negative/positive)
df["textblob_sentiment_polarity"] = df["Review"].apply(lambda x: TextBlob(str(x)).sentiment.polarity)

df["textblob_sentiment_subjectivity"] = df["Review"].apply(lambda x: TextBlob(str(x)).sentiment.subjectivity)

end = time.time()

# label order matches the conditions: < 0 is 'neg', == 0 is 'neu', > 0 is 'pos'
df["textblob_label"] = np.select([df["textblob_sentiment_polarity"] < 0, df["textblob_sentiment_polarity"] == 0, df["textblob_sentiment_polarity"] > 0],['neg', 'neu', 'pos'], default='neu')
# 0 for negative and neutral, 1 for positive
df["textblob_predicted"] = np.select([df["textblob_sentiment_polarity"] < 0, df["textblob_sentiment_polarity"] == 0, df["textblob_sentiment_polarity"] > 0],[0, 0, 1])

# total time taken
print(f"Runtime of the program is {(end - start)/60} minutes or {(end - start)} seconds")
display(df)

Runtime of the program is 0.006019385655721029 minutes or 0.3611631393432617 seconds


|  | Review | Liked | textblob_sentiment_polarity | textblob_sentiment_subjectivity | textblob_label | textblob_predicted |
|---|---|---|---|---|---|---|
| 0 | Wow... Loved this place. | 1 | 0.400000 | 0.900000 | pos | 1 |
| 1 | Crust is not good. | 0 | -0.350000 | 0.600000 | neg | 0 |
| 2 | Not tasty and the texture was just nasty. | 0 | -1.000000 | 1.000000 | neg | 0 |
| 3 | Stopped by during the late May bank holiday of... | 1 | 0.200000 | 0.700000 | pos | 1 |
| 4 | The selection on the menu was great and so wer... | 1 | 0.800000 | 0.750000 | pos | 1 |
| ... | ... | ... | ... | ... | ... | ... |
| 995 | I think food should have flavor and texture an... | 0 | 0.000000 | 0.000000 | neu | 0 |
| 996 | Appetite instantly gone. | 0 | 0.000000 | 0.666667 | neu | 0 |
| 997 | Overall I was not impressed and would not go b... | 0 | -0.166667 | 0.333333 | neg | 0 |
| 998 | The whole experience was underwhelming, and I ... | 0 | 0.100000 | 0.200000 | pos | 1 |


In [207]:
textblob_accuracy = (df.textblob_predicted==df.Liked).mean()
print("Accuracy of sentiment analysis using TextBlob :")
print("%.2f%%"%(textblob_accuracy*100))

Accuracy of sentiment analysis using TextBlob :
77.40%


In [208]:
#dropping textblob analysis columns
df.drop(['textblob_sentiment_polarity','textblob_sentiment_subjectivity','textblob_label','textblob_predicted'],axis=1, inplace=True)
#display(df)

## USING FLAIR
Flair is a simple natural language processing (NLP) library developed and open-sourced by Zalando Research. The Flair framework is built on top of PyTorch.
Flair's pretrained sentiment analysis model is described as trained on the IMDB dataset, although the file name in the load log below (sentiment-en-mix-distillbert_4.pt) suggests that the version used here is a DistilBERT-based model trained on a mix of data.
It is a pre-trained embedding-based model. This means that each word is represented as a vector, and words with similar vectors tend to be used in similar contexts. The model builds on these representations to determine the sentiment of a whole sentence.
Flair tends to be much slower than its rule-based counterparts, but because it is a trained NLP model rather than a fixed set of rules, it can be more accurate when trained well.
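
The idea that similar words have similar vectors can be illustrated with cosine similarity. The three-dimensional vectors below are invented for illustration; real embeddings have hundreds of dimensions and, in transformer models, depend on the surrounding sentence.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional embeddings, invented for illustration.
embeddings = {
    "delicious": [0.9, 0.8, 0.1],
    "tasty": [0.8, 0.9, 0.2],
    "nasty": [-0.7, -0.6, 0.3],
}

similar = cosine_similarity(embeddings["delicious"], embeddings["tasty"])
opposite = cosine_similarity(embeddings["delicious"], embeddings["nasty"])
print(f"delicious vs tasty: {similar:.3f}")
print(f"delicious vs nasty: {opposite:.3f}")
```

With these made-up vectors, "delicious" and "tasty" point in nearly the same direction, while "delicious" and "nasty" point in roughly opposite directions.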

In [209]:
#downloading and importing required packages
!pip install --user flair
from flair.models import TextClassifier
from flair.data import Sentence


In [210]:
sia = TextClassifier.load('en-sentiment')
start=time.time()
def flair_prediction(x):
    # wrap the review in a Flair Sentence and let the classifier label it
    sentence = Sentence(x)
    sia.predict(sentence)
    label = sentence.labels[0]
    # 1 for POSITIVE, 0 for anything else (i.e. NEGATIVE)
    return 1 if label.value == "POSITIVE" else 0

df["flair_sentiment_score"] = df["Review"].apply(flair_prediction)
end=time.time()
print(f"Runtime of the program is {(end - start)/60} minutes or {(end - start)} seconds")

2023-01-12 15:28:44,867 loading file /root/.flair/models/sentiment-en-mix-distillbert_4.pt
Runtime of the program is 0.7639596382776896 minutes or 45.83757829666138 seconds


In [211]:
display(df)

|  | Review | Liked | flair_sentiment_score |
|---|---|---|---|
| 0 | Wow... Loved this place. | 1 | 1 |
| 1 | Crust is not good. | 0 | 0 |
| 2 | Not tasty and the texture was just nasty. | 0 | 0 |
| 3 | Stopped by during the late May bank holiday of... | 1 | 1 |
| 4 | The selection on the menu was great and so wer... | 1 | 1 |
| ... | ... | ... | ... |
| 995 | I think food should have flavor and texture an... | 0 | 0 |
| 996 | Appetite instantly gone. | 0 | 1 |
| 997 | Overall I was not impressed and would not go b... | 0 | 0 |
| 998 | The whole experience was underwhelming, and I ... | 0 | 0 |


In [212]:
flair_accuracy = (df.flair_sentiment_score==df.Liked).mean()
print("Accuracy of sentiment analysis using Flair :")
print("%.2f%%"%(flair_accuracy*100))

Accuracy of sentiment analysis using Flair :
93.00%


In [213]:
df.drop(['flair_sentiment_score'],axis=1, inplace=True)

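## USING spaCyTextBlob
spaCyTextBlob is a spaCy pipeline component that exposes TextBlob's sentiment analysis (polarity, subjectivity and word-level assessments) on spaCy documents.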


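In [214]:
# install and import spaCy and the spacytextblob component before using them
!pip3 install spacy==3.2.0
!pip3 install spacytextblob
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

In [215]:
# load the small English pipeline and add the TextBlob sentiment component to it
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')

<spacytextblob.spacytextblob.SpacyTextBlob at 0x7f9719ef4d10>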

In [216]:
df_sent_score = []
df_sent_label = []
# these two lists are never reset, so they accumulate words across all reviews seen so far
positive_words = []
negative_words = []

total_pos = []
total_neg = []

for x in df['Review']:
    doc = nlp(x)
    # polarity of the whole review, rounded to two decimals
    sentiment = round(doc._.blob.polarity, 2)

    # anything that is not strictly positive is labelled "Negative"
    sent_label = "Positive" if sentiment > 0 else "Negative"

    df_sent_label.append(sent_label)
    df_sent_score.append(sentiment)

    # each assessment starts with (words, polarity, ...); keep the first word of each
    for y in doc._.blob.sentiment_assessments.assessments:
        if y[1] > 0:
            positive_words.append(y[0][0])
        elif y[1] < 0:
            negative_words.append(y[0][0])

    # cumulative, de-duplicated words up to and including this review
    total_pos.append(', '.join(set(positive_words)))
    total_neg.append(', '.join(set(negative_words)))
df["sentiement_score_spacy"] = df_sent_score
df["sentiment_label"] = df_sent_label
df["pos"] = total_pos
df["neg"] = total_neg
# 0 for negative and neutral, 1 for positive
df['spacy_predicted']=np.select([df["sentiement_score_spacy"] < 0, df["sentiement_score_spacy"] == 0, df["sentiement_score_spacy"] > 0],[0, 0, 1])
display(df)

|  | Review | Liked | sentiement_score_spacy | sentiment_label | pos | neg | spacy_predicted |
|---|---|---|---|---|---|---|---|
| 0 | Wow... Loved this place. | 1 | 0.40 | Positive | loved, wow |  | 1 |
| 1 | Crust is not good. | 0 | -0.35 | Negative | loved, wow | not | 0 |
| 2 | Not tasty and the texture was just nasty. | 0 | -1.00 | Negative | loved, wow | nasty, not | 0 |
| 3 | Stopped by during the late May bank holiday of... | 1 | 0.20 | Positive | loved, wow | late, nasty, not | 1 |
| 4 | The selection on the menu was great and so wer... | 1 | 0.80 | Positive | loved, great, wow | late, nasty, not | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | I think food should have flavor and texture an... | 0 | 0.00 | Negative | familiar, far, unique, pleased, elegantly, swe... | ridiculous, thin, dry, typical, strange, dead,... | 0 |
| 996 | Appetite instantly gone. | 0 | 0.00 | Negative | familiar, far, unique, pleased, elegantly, swe... | ridiculous, thin, dry, typical, strange, dead,... | 0 |
| 997 | Overall I was not impressed and would not go b... | 0 | -0.17 | Negative | familiar, far, unique, pleased, elegantly, swe... | ridiculous, thin, dry, typical, strange, dead,... | 0 |
| 998 | The whole experience was underwhelming, and I ... | 0 | 0.10 | Positive | familiar, far, unique, pleased, elegantly, swe... | ridiculous, thin, dry, typical, strange, dead,... | 1 |
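
The `pos` and `neg` columns above grow from row to row because the word lists are shared across reviews. If per-review word lists are wanted instead, one option is to build fresh lists for every review. The sketch below uses made-up assessment tuples shaped like the `(words, polarity, subjectivity, label)` entries that TextBlob returns; the helper name `split_assessed_words` and the numbers are hypothetical.

```python
def split_assessed_words(assessments):
    """Return (positive_words, negative_words) for a single review."""
    positive, negative = [], []
    for words, polarity, *_ in assessments:
        if polarity > 0:
            positive.append(words[0])  # first word of the assessed chunk
        elif polarity < 0:
            negative.append(words[0])
    return positive, negative


# Hypothetical assessments for two reviews; the values are invented.
review_assessments = [
    [(["wow"], 0.1, 1.0, None), (["loved"], 0.7, 0.8, None)],
    [(["not", "good"], -0.35, 0.6, None)],
]

for assessments in review_assessments:
    pos, neg = split_assessed_words(assessments)
    print(", ".join(pos), "|", ", ".join(neg))
```

In the notebook loop, `doc._.blob.sentiment_assessments.assessments` would be passed in place of the made-up tuples.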


In [217]:
spacy_accuracy = (df.spacy_predicted==df.Liked).mean()
print("Accuracy of sentiment analysis using spaCyTextBlob :")
print("%.2f%%"%(spacy_accuracy*100))

Accuracy of sentiment analysis using spaCyTextBlob :
77.50%
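
The four results reported above can be collected side by side. The accuracies and runtimes below are copied from the cell outputs in this notebook (runtimes rounded to three decimals); no runtime was measured for spaCyTextBlob, so it is left as `None`.

```python
# Accuracy (%) and runtime (s) as reported in the cell outputs above.
results = {
    "VADER": (81.20, 0.165),
    "TextBlob": (77.40, 0.361),
    "Flair": (93.00, 45.838),
    "spaCyTextBlob": (77.50, None),  # not timed in the notebook
}

# rank the approaches from most to least accurate
ranked = sorted(results.items(), key=lambda item: item[1][0], reverse=True)
for name, (accuracy, seconds) in ranked:
    runtime = f"{seconds:.3f} s" if seconds is not None else "not timed"
    print(f"{name:<14}{accuracy:>6.2f}%  {runtime}")
```

On this dataset Flair is the most accurate approach and also by far the slowest, while TextBlob and spaCyTextBlob score almost identically, which is expected since the latter wraps the former.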
