## Unsupervised Sentiment Analysis: Rule-Based Approach

* TextBlob
* Vader
* SentiWordNet

Author: Sumaia Parveen Shupti 

Created on: 8/5/2021

Updated on: 8/5/2021

In [1]:
#!pip install vaderSentiment

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

In [2]:
df1 = pd.read_csv('../input/womens-ecommerce-clothing-reviews/Womens Clothing E-Commerce Reviews.csv')

# Map a 1-5 star rating to a sentiment label (a rating of 3 is treated as neutral)
def analysis(score):
    if score == 1 or score == 2:
        return "Negative"
    elif score == 4 or score == 5:
        return "Positive"
    else:
        return 'Neutral'
    
df1['Sentiment_Original'] = df1['Rating'].apply(analysis)
df1.head()

Unnamed: 0.1,Unnamed: 0,Clothing ID,Age,Title,Review Text,Rating,Recommended IND,Positive Feedback Count,Division Name,Department Name,Class Name,Sentiment_Original
0,0,767,33,,Absolutely wonderful - silky and sexy and comf...,4,1,0,Initmates,Intimate,Intimates,Positive
1,1,1080,34,,Love this dress! it's sooo pretty. i happene...,5,1,4,General,Dresses,Dresses,Positive
2,2,1077,60,Some major design flaws,I had such high hopes for this dress and reall...,3,0,0,General,Dresses,Dresses,Neutral
3,3,1049,50,My favorite buy!,"I love, love, love this jumpsuit. it's fun, fl...",5,1,0,General Petite,Bottoms,Pants,Positive
4,4,847,47,Flattering shirt,This shirt is very flattering to all due to th...,5,1,6,General,Tops,Blouses,Positive


In [24]:
_counts = df1['Sentiment_Original'].value_counts().to_frame().reset_index()
_counts.columns = ["Sentiment", "Count"]
_counts

import plotly.express as px
fig = px.pie(_counts, values='Count', names='Sentiment', color_discrete_sequence=px.colors.sequential.RdBu, opacity = 0.9, title="Actual Labels")
fig.show()

In [4]:
col_name = 'Review Text'
# Take an explicit copy so later column assignments do not raise SettingWithCopyWarning
df = df1[[col_name]].copy()
df[col_name] = df[col_name].replace('', np.nan)
df = df.dropna()
df.head()

Unnamed: 0,Review Text
0,Absolutely wonderful - silky and sexy and comf...
1,Love this dress! it's sooo pretty. i happene...
2,I had such high hopes for this dress and reall...
3,"I love, love, love this jumpsuit. it's fun, fl..."
4,This shirt is very flattering to all due to th...


In [5]:
## Step 1: Cleaning the text

import re

# Define a function to clean the text
def clean(text):
    # Replace every run of non-letter characters (digits, punctuation) with a single space
    text = re.sub('[^A-Za-z]+', ' ', text) 
    return text

# Cleaning the text in the review column
df['Cleaned Reviews'] = df[col_name].apply(clean)

## Steps 2-4: Tokenization, POS tagging, stopwords removal

import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize
from nltk import pos_tag
nltk.download('stopwords')
from nltk.corpus import stopwords
nltk.download('wordnet')
from nltk.corpus import wordnet
# POS tagger dictionary
pos_dict = {'J':wordnet.ADJ, 'V':wordnet.VERB, 'N':wordnet.NOUN, 'R':wordnet.ADV}

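# Build the stopword set once instead of rebuilding it for every token
stop_words = set(stopwords.words('english'))

def token_stop_pos(text):
    # Tokenize, tag each token, drop stopwords, and map the tag to a WordNet POS
    tags = pos_tag(word_tokenize(text))
    newlist = []
    for word, tag in tags:
        if word.lower() not in stop_words:
            newlist.append((word, pos_dict.get(tag[0])))
    return newlist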

df['POS tagged'] = df['Cleaned Reviews'].apply(token_stop_pos)

## Step 5: Lemmatization (reducing each word to its base form)
    
from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()

def lemmatize(pos_data):
    # Lemmatize tokens that have a WordNet POS; keep all other tokens unchanged
    lemmas = [
        wordnet_lemmatizer.lemmatize(word, pos=pos) if pos else word
        for word, pos in pos_data
    ]
    return " ".join(lemmas)
    
df['Lemma'] = df['POS tagged'].apply(lemmatize)

[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /usr/share/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /usr/share/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
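
To make Step 1 and the tag lookup from Steps 2-4 concrete, the sketch below runs them on hand-made inputs. The sample review and the `(word, tag)` pairs are made up for illustration (they are not produced by `pos_tag`), and plain strings stand in for the `wordnet` constants (`wordnet.ADJ` is `'a'`, `wordnet.VERB` is `'v'`, `wordnet.NOUN` is `'n'`, `wordnet.ADV` is `'r'`). One side effect worth noticing is that apostrophes are replaced as well, so a contraction such as "it's" is split into "it" and "s".

```python
import re

def clean(text):
    # Same pattern as in the notebook: every run of non-letters becomes one space
    return re.sub('[^A-Za-z]+', ' ', text)

sample = "Love this dress! it's sooo pretty, 5 stars."  # made-up example
cleaned = clean(sample)
print(repr(cleaned))

# The first letter of a Penn Treebank tag (J, V, N, R) selects a WordNet POS;
# any other tag maps to None.
pos_dict = {'J': 'a', 'V': 'v', 'N': 'n', 'R': 'r'}

# Hand-written (word, tag) pairs standing in for pos_tag output
tagged = [("pretty", "JJ"), ("hopes", "NNS"), ("really", "RB"), ("this", "DT")]
mapped = [(word, pos_dict.get(tag[0])) for word, tag in tagged]
print(mapped)
```

Tokens that map to `None` are the ones the lemmatizer in Step 5 leaves unchanged.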


In [6]:
## TextBlob to the rescue

from textblob import TextBlob

# function to calculate subjectivity 
def getSubjectivity(review):
    return TextBlob(review).sentiment.subjectivity

# function to calculate polarity
def getPolarity(review):
    return TextBlob(review).sentiment.polarity

# Map a polarity score to a sentiment label (this redefines the rating-based analysis() above)
def analysis(score):
    if score < 0:
        return 'Negative'
    elif score == 0:
        return 'Neutral'
    else:
        return 'Positive'
    
df = pd.DataFrame(df[[col_name, 'Lemma', 'POS tagged']])
df['Polarity_TextBlob'] = df['Lemma'].apply(getPolarity) 
df['Sentiment_TextBlob'] = df['Polarity_TextBlob'].apply(analysis)
df.head()

Unnamed: 0,Review Text,Lemma,POS tagged,Polarity_TextBlob,Sentiment_TextBlob
0,Absolutely wonderful - silky and sexy and comf...,Absolutely wonderful silky sexy comfortable,"[(Absolutely, r), (wonderful, a), (silky, n), ...",0.633333,Positive
1,Love this dress! it's sooo pretty. i happene...,Love dress sooo pretty happen find store gla...,"[(Love, v), (dress, n), (sooo, a), (pretty, r)...",0.31875,Positive
2,I had such high hopes for this dress and reall...,high hope dress really want work initially o...,"[(high, a), (hopes, n), (dress, n), (really, r...",0.0823,Positive
3,"I love, love, love this jumpsuit. it's fun, fl...",love love love jumpsuit fun flirty fabulous ...,"[(love, v), (love, r), (love, v), (jumpsuit, n...",0.5,Positive
4,This shirt is very flattering to all due to th...,shirt flattering due adjustable front tie pe...,"[(shirt, n), (flattering, a), (due, a), (adjus...",0.458333,Positive


In [23]:
tb_counts = df.Sentiment_TextBlob.value_counts().to_frame().reset_index()
tb_counts.columns = ["Sentiment", "Count"]
tb_counts

import plotly.express as px
fig = px.pie(tb_counts, values='Count', names='Sentiment', color_discrete_sequence=px.colors.sequential.RdBu, opacity = 0.9, title="TextBlob Results")
fig.show()

In [8]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

# Return VADER's compound score, a normalized value between -1 and 1
def vadersentimentanalysis(review):
    vs = analyzer.polarity_scores(review)
    return vs['compound']

df['Polarity_Vader'] = df['Lemma'].apply(vadersentimentanalysis)
df.head()

Unnamed: 0,Review Text,Lemma,POS tagged,Polarity_TextBlob,Sentiment_TextBlob,Polarity_Vader
0,Absolutely wonderful - silky and sexy and comf...,Absolutely wonderful silky sexy comfortable,"[(Absolutely, r), (wonderful, a), (silky, n), ...",0.633333,Positive,0.8991
1,Love this dress! it's sooo pretty. i happene...,Love dress sooo pretty happen find store gla...,"[(Love, v), (dress, n), (sooo, a), (pretty, r)...",0.31875,Positive,0.971
2,I had such high hopes for this dress and reall...,high hope dress really want work initially o...,"[(high, a), (hopes, n), (dress, n), (really, r...",0.0823,Positive,0.9184
3,"I love, love, love this jumpsuit. it's fun, fl...",love love love jumpsuit fun flirty fabulous ...,"[(love, v), (love, r), (love, v), (jumpsuit, n...",0.5,Positive,0.9437
4,This shirt is very flattering to all due to th...,shirt flattering due adjustable front tie pe...,"[(shirt, n), (flattering, a), (due, a), (adjus...",0.458333,Positive,0.9062


In [9]:
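# Map the compound score to a label. This notebook uses cut-offs of ±0.5;
# the vaderSentiment README suggests ±0.05 as typical thresholds.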
def vader_analysis(compound):
    if compound >= 0.5:
        return 'Positive'
    elif compound <= -0.5 :
        return 'Negative'
    else:
        return 'Neutral'
    
df['Sentiment_Vader'] = df['Polarity_Vader'].apply(vader_analysis)
df.head()

Unnamed: 0,Review Text,Lemma,POS tagged,Polarity_TextBlob,Sentiment_TextBlob,Polarity_Vader,Sentiment_Vader
0,Absolutely wonderful - silky and sexy and comf...,Absolutely wonderful silky sexy comfortable,"[(Absolutely, r), (wonderful, a), (silky, n), ...",0.633333,Positive,0.8991,Positive
1,Love this dress! it's sooo pretty. i happene...,Love dress sooo pretty happen find store gla...,"[(Love, v), (dress, n), (sooo, a), (pretty, r)...",0.31875,Positive,0.971,Positive
2,I had such high hopes for this dress and reall...,high hope dress really want work initially o...,"[(high, a), (hopes, n), (dress, n), (really, r...",0.0823,Positive,0.9184,Positive
3,"I love, love, love this jumpsuit. it's fun, fl...",love love love jumpsuit fun flirty fabulous ...,"[(love, v), (love, r), (love, v), (jumpsuit, n...",0.5,Positive,0.9437,Positive
4,This shirt is very flattering to all due to th...,shirt flattering due adjustable front tie pe...,"[(shirt, n), (flattering, a), (due, a), (adjus...",0.458333,Positive,0.9062,Positive
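
The ±0.5 cut-offs are this notebook's choice, and the label a review receives depends on them. The sketch below is a minimal, library-free version of the labelling step with the cut-off exposed as a parameter. The `threshold` parameter, the function name `label_compound`, and the scores 0.30 and -0.20 are assumptions added for illustration; 0.8991 and 0.9184 are taken from the output above.

```python
def label_compound(compound, threshold=0.5):
    """Map a compound score in [-1, 1] to a label using a symmetric cut-off."""
    if compound >= threshold:
        return 'Positive'
    if compound <= -threshold:
        return 'Negative'
    return 'Neutral'

# 0.8991 and 0.9184 appear in the output above; 0.30 and -0.20 are made up
scores = [0.8991, 0.9184, 0.30, -0.20]

strict = [label_compound(s, threshold=0.5) for s in scores]
loose = [label_compound(s, threshold=0.05) for s in scores]
print(strict)
print(loose)
```

A wider neutral band moves mildly scored reviews out of the positive and negative classes, so the class totals should be read with the chosen cut-off in mind.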


In [22]:
vd_counts = df['Sentiment_Vader'].value_counts().to_frame().reset_index()
vd_counts.columns = ["Sentiment", "Count"]
vd_counts

import plotly.express as px
fig = px.pie(vd_counts, values='Count', names='Sentiment', color_discrete_sequence=px.colors.sequential.RdBu, opacity = 0.9, title="Vader Results")
fig.show()

In [11]:
nltk.download('sentiwordnet')
from nltk.corpus import sentiwordnet as swn

def sentiwordnetanalysis(pos_data):
    sentiment = 0
    tokens_count = 0
    for word, pos in pos_data:
        if not pos:
            continue
        lemma = wordnet_lemmatizer.lemmatize(word, pos=pos)
        if not lemma:
            continue
        
        synsets = wordnet.synsets(lemma, pos=pos)
        if not synsets:
            continue

        # Take the first sense, the most common
        synset = synsets[0]
        swn_synset = swn.senti_synset(synset.name())
        sentiment += swn_synset.pos_score() - swn_synset.neg_score()
        tokens_count += 1
        # print(swn_synset.pos_score(),swn_synset.neg_score(),swn_synset.obj_score())
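    # If no token could be scored, return a label rather than the integer 0
    # (the integer is what produces the stray `0` category in the recorded counts).
    if not tokens_count or sentiment == 0:
        return "Neutral"
    if sentiment > 0:
        return "Positive"
    return "Negative"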

df['Sentiment_SWNet'] = df['POS tagged'].apply(sentiwordnetanalysis)
df.head()

[nltk_data] Downloading package sentiwordnet to
[nltk_data]     /usr/share/nltk_data...
[nltk_data]   Package sentiwordnet is already up-to-date!


Unnamed: 0,Review Text,Lemma,POS tagged,Polarity_TextBlob,Sentiment_TextBlob,Polarity_Vader,Sentiment_Vader,Sentiment_SWNet
0,Absolutely wonderful - silky and sexy and comf...,Absolutely wonderful silky sexy comfortable,"[(Absolutely, r), (wonderful, a), (silky, n), ...",0.633333,Positive,0.8991,Positive,Positive
1,Love this dress! it's sooo pretty. i happene...,Love dress sooo pretty happen find store gla...,"[(Love, v), (dress, n), (sooo, a), (pretty, r)...",0.31875,Positive,0.971,Positive,Positive
2,I had such high hopes for this dress and reall...,high hope dress really want work initially o...,"[(high, a), (hopes, n), (dress, n), (really, r...",0.0823,Positive,0.9184,Positive,Negative
3,"I love, love, love this jumpsuit. it's fun, fl...",love love love jumpsuit fun flirty fabulous ...,"[(love, v), (love, r), (love, v), (jumpsuit, n...",0.5,Positive,0.9437,Positive,Positive
4,This shirt is very flattering to all due to th...,shirt flattering due adjustable front tie pe...,"[(shirt, n), (flattering, a), (due, a), (adjus...",0.458333,Positive,0.9062,Positive,Positive
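
The scoring logic can be followed without NLTK by swapping SentiWordNet for a tiny hand-made lexicon. In the sketch below, `TOY_LEXICON`, its scores, and the function name `toy_sentiment` are invented for illustration (they are not SentiWordNet values or part of any library), and the function returns a label in every case, including when no token can be scored.

```python
# Hypothetical (positive, negative) scores keyed by (lemma, pos); not real SentiWordNet data
TOY_LEXICON = {
    ("wonderful", "a"): (0.75, 0.0),
    ("love", "v"): (0.5, 0.0),
    ("flaw", "n"): (0.0, 0.5),
    ("dress", "n"): (0.0, 0.0),
}

def toy_sentiment(pos_data):
    """Sum positive-minus-negative scores over the tokens found in the lexicon."""
    sentiment = 0.0
    tokens_count = 0
    for word, pos in pos_data:
        if not pos:
            continue  # token has no WordNet part of speech
        entry = TOY_LEXICON.get((word.lower(), pos))
        if entry is None:
            continue  # token is not in the lexicon
        pos_score, neg_score = entry
        sentiment += pos_score - neg_score
        tokens_count += 1
    if tokens_count == 0 or sentiment == 0:
        return "Neutral"
    return "Positive" if sentiment > 0 else "Negative"

print(toy_sentiment([("wonderful", "a"), ("dress", "n")]))
print(toy_sentiment([("flaw", "n"), ("dress", "n")]))
print(toy_sentiment([("the", None)]))
```

Because the sum is not normalized by `tokens_count`, a long review with many weakly scored words can outweigh a short one with a single strong word; dividing by the count would be one possible variation.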


In [21]:
swn_counts = df['Sentiment_SWNet'].value_counts().to_frame().reset_index()
swn_counts.columns = ["Sentiment", "Count"]
swn_counts

import plotly.express as px
fig = px.pie(swn_counts, values='Count', names='Sentiment', color_discrete_sequence=px.colors.sequential.RdBu, opacity = 0.9, title="SentiWordNet Results")
fig.show()

In [13]:
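# Assign the Series (not a one-column DataFrame); pandas aligns it on the index,
# so rows dropped from df for missing review text are skipped automatically
df['Original'] = df1['Sentiment_Original']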
df.head()

Unnamed: 0,Review Text,Lemma,POS tagged,Polarity_TextBlob,Sentiment_TextBlob,Polarity_Vader,Sentiment_Vader,Sentiment_SWNet,Original
0,Absolutely wonderful - silky and sexy and comf...,Absolutely wonderful silky sexy comfortable,"[(Absolutely, r), (wonderful, a), (silky, n), ...",0.633333,Positive,0.8991,Positive,Positive,Positive
1,Love this dress! it's sooo pretty. i happene...,Love dress sooo pretty happen find store gla...,"[(Love, v), (dress, n), (sooo, a), (pretty, r)...",0.31875,Positive,0.971,Positive,Positive,Positive
2,I had such high hopes for this dress and reall...,high hope dress really want work initially o...,"[(high, a), (hopes, n), (dress, n), (really, r...",0.0823,Positive,0.9184,Positive,Negative,Neutral
3,"I love, love, love this jumpsuit. it's fun, fl...",love love love jumpsuit fun flirty fabulous ...,"[(love, v), (love, r), (love, v), (jumpsuit, n...",0.5,Positive,0.9437,Positive,Positive,Positive
4,This shirt is very flattering to all due to th...,shirt flattering due adjustable front tie pe...,"[(shirt, n), (flattering, a), (due, a), (adjus...",0.458333,Positive,0.9062,Positive,Positive,Positive
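
With the rating-derived label next to each prediction, the methods can also be compared review by review rather than only by class totals. The sketch below computes simple accuracy on the five rows displayed above, in plain Python. The helper name `accuracy` and the variable names are assumptions for illustration; on the full data the same comparison could be made between the DataFrame columns.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual one."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

# Labels copied from the five rows shown above
actual_labels = ["Positive", "Positive", "Neutral", "Positive", "Positive"]
predictions = {
    "TextBlob": ["Positive"] * 5,
    "Vader": ["Positive"] * 5,
    "SentiWordNet": ["Positive", "Positive", "Negative", "Positive", "Positive"],
}

for name, labels in predictions.items():
    print(name, accuracy(labels, actual_labels))
```

Five rows are far too few to rank the methods; the point is only the shape of the calculation.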


In [14]:
original = df.Original.value_counts().to_frame().reset_index()
original.columns = ['Sentiment', 'Actual Count']
original

| | Sentiment | Actual Count |
|---|---|---|
| 0 | Positive | 17448 |
| 1 | Neutral | 2823 |
| 2 | Negative | 2370 |


In [15]:
textblob = df.Sentiment_TextBlob.value_counts().to_frame().reset_index()
textblob.columns = ['Sentiment', 'TextBlob Count']
textblob

| | Sentiment | TextBlob Count |
|---|---|---|
| 0 | Positive | 21206 |
| 1 | Negative | 1294 |
| 2 | Neutral | 141 |


In [16]:
vader = df.Sentiment_Vader.value_counts().to_frame().reset_index()
vader.columns = ['Sentiment', 'Vader Count']
vader

| | Sentiment | Vader Count |
|---|---|---|
| 0 | Positive | 20516 |
| 1 | Neutral | 1980 |
| 2 | Negative | 145 |


In [17]:
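# Named swn_df so it does not shadow the `swn` SentiWordNet corpus imported earlier
swn_df = df.Sentiment_SWNet.value_counts().to_frame().reset_index()
swn_df.columns = ['Sentiment', 'SWNet Count']
swn_df

| | Sentiment | SWNet Count |
|---|---|---|
| 0 | Positive | 15841 |
| 1 | Negative | 5927 |
| 2 | Neutral | 872 |
| 3 | 0 | 1 |


In [18]:
# Inner joins on the shared 'Sentiment' column; labels missing from any table are dropped
comp = original.merge(textblob, on='Sentiment')
comp = comp.merge(vader, on='Sentiment')
comp = comp.merge(swn_df, on='Sentiment')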
comp

| | Sentiment | Actual Count | TextBlob Count | Vader Count | SWNet Count |
|---|---|---|---|---|---|
| 0 | Positive | 17448 | 21206 | 20516 | 15841 |
| 1 | Neutral | 2823 | 141 | 1980 | 872 |
| 2 | Negative | 2370 | 1294 | 145 | 5927 |


In [20]:
import plotly.express as px

fig = px.bar(comp, x="Sentiment", y=["Actual Count", "TextBlob Count", "Vader Count", "SWNet Count"], title="Comparison of Results: Actual vs. TextBlob vs. Vader vs. SentiWordNet", color_discrete_sequence=px.colors.sequential.RdBu)
fig.show()
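
The class totals are easier to compare as shares of each method's own total. The sketch below recomputes them in plain Python from the comparison table above; the dictionary layout and the helper name `shares` are just one convenient, illustrative way to hold the numbers.

```python
# Counts copied from the comparison table above, in the order (Positive, Neutral, Negative)
counts = {
    "Actual": (17448, 2823, 2370),
    "TextBlob": (21206, 141, 1294),
    "Vader": (20516, 1980, 145),
    "SWNet": (15841, 872, 5927),
}

def shares(row):
    """Convert a tuple of counts into percentages rounded to one decimal place."""
    total = sum(row)
    return tuple(round(100 * c / total, 1) for c in row)

for name, row in counts.items():
    print(name, shares(row))
```

Matching shares would still not show that a method labels individual reviews correctly, since errors in opposite directions can cancel out in the totals.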

## References

* https://www.alphabold.com/sentiment-analysis-the-lexicon-based-approach/
* https://www.analyticsvidhya.com/blog/2021/06/rule-based-sentiment-analysis-in-python/