Subjectivity detection aims to filter out neutral/non-opinionated reviews from opinionated reviews. The underlying assumption from our dataset after manual annotation of the test set is that the majority of reviews are opinionated.
Here, we annotate the reviews in the train set as subjective or objective using different methods. 

Hence we need to consider:

1. Seemingly opinionated reviews (based on the number of stars != 3) that contain more factual content than opinionated content. Eg description of book plot, content unrelated to the book at all (bought the book for school reading, bought the book for someone else, spam advertisement of the book)
2. Opinionated reviews which are not about the book itself (tricky because these reviews are subjective)
3. Due to the large amount of reviews in the train set, it is impossible to fully check if every annotation of subjective/objective produced by the methods discussed below are correct. Based on the underlying assumption that most reviews crawled are opinionated in nature, we also assume that all reviews annotated as subjective are opinionated. We select a few critical examples (based on points 1 and 2) as sanity check.

In [1]:
! pip install textblob
! pip install pandas
! pip install nltk
! pip install sklearn
! pip install contractions
! pip install emoji 

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sklearn
  Downloading sklearn-0.0.tar.gz (1.1 kB)
Building wheels for collected packages: sklearn
  Building wheel for sklearn (setup.py) ... [?25l[?25hdone
  Created wheel for sklearn: filename=sklearn-0.0-py2.py3-none-any.whl size=1310 sha256=55c7fd6d9b3b723f3f58f32534fb89a157c96a7ec963cbb13d9b76ffeed5bd1e
  Stored in directory: /root/.cache/pip/wheels/46/ef/c3/157e41f5ee1372d1be90b09f74f82b10e391eaacca8f22d33e
Successfully built sklearn
Installing collected packages: sklearn
Successfully installed sklearn-0.0
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/co

In [2]:
# Load the Drive helper and mount
from google.colab import drive

# This will prompt for authorization.
drive.mount('/content/drive/')

Mounted at /content/drive/


In [66]:
# import required libraries
import pandas as pd
import numpy as np
from textblob import TextBlob
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.preprocessing import LabelEncoder
from collections import defaultdict
from nltk.corpus import wordnet as wn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import model_selection, naive_bayes, svm
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import *
import re
import contractions
import emoji
import string 
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('sentiwordnet')
from nltk.corpus import stopwords, wordnet
from nltk.corpus import sentiwordnet as swn

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package sentiwordnet to /root/nltk_data...
[nltk_data]   Package sentiwordnet is already up-to-date!


In [4]:
path_to_folder = "/content/drive/My Drive/data/cz4045/"
raw_data = pd.read_csv(path_to_folder+'train_df_imbalanced.csv')
raw_data.head()

Unnamed: 0.1,Unnamed: 0,productAsin,ratingScore,reviewTitle,reviewReaction,reviewDescription,isVerified,category,languages,concat_review,polarity
0,9406,B08NLXR9V5,4,Girl Alone,One person found this helpful,"My first Blake Pierce book was enjoyable, fast...",True,mystery,Language.ENGLISH,Girl Alone. My first Blake Pierce book was enj...,1
1,5701,670062510,5,A great book for young people. It has a great...,,My favorite book when I was young. I read it ...,True,children,Language.ENGLISH,A great book for young people. It has a great...,1
2,13621,1542046599,1,More gripes than hypes,One person found this helpful,This is the first book I‚Äôve read by this aut...,True,mystery,Language.ENGLISH,More gripes than hypes. This is the first book...,-1
3,5021,399226907,5,Add this book to your collection,,Cute and educational book to teach counting an...,True,children,Language.ENGLISH,Add this book to your collection. Cute and edu...,1
4,21354,125030170X,2,Just okay.,,This is one of those books you can read in a c...,False,children,Language.ENGLISH,Just okay.. This is one of those books you can...,0


In [5]:
raw_data.loc[raw_data['Unnamed: 0'] == 8866]

Unnamed: 0.1,Unnamed: 0,productAsin,ratingScore,reviewTitle,reviewReaction,reviewDescription,isVerified,category,languages,concat_review,polarity
18707,8866,887431453,5,:*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*...,,:*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*...,True,children,Language.ENGLISH,:*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*...,1


In [6]:
def remove_emojis(data):
    emoj = re.compile("["
        u"\U0001F600-\U0001F64F"  
        u"\U0001F300-\U0001F5FF"  
        u"\U0001F680-\U0001F6FF"  
        u"\U0001F1E0-\U0001F1FF" 
        u"\U00002500-\U00002BEF"  
        u"\U00002702-\U000027B0"
        u"\U00002702-\U000027B0"
        u"\U000024C2-\U0001F251"
        u"\U0001f926-\U0001f937"
        u"\U00010000-\U0010ffff"
        u"\u2640-\u2642" 
        u"\u2600-\u2B55"
        u"\u200d"
        u"\u23cf"
        u"\u23e9"
        u"\u231a"
        u"\ufe0f"  
        u"\u3030"
                      "]+", re.UNICODE)
    return re.sub(emoj, '', data)

def remove_stopwords(reviews):
    STOPWORDS = stopwords.words('english')
    STOPWORDS.remove('not')
    STOPWORDS.remove('is')
    STOPWORDS.remove('but')
    if STOPWORDS is None:
        STOPWORDS = set(stopwords.words('english'))
    return ' '.join([word for word in reviews.split() if word not in STOPWORDS])
def remove_extra_whitespace(reviews):
    return " ".join(reviews.split())

def get_wordnet_pos(text):
    # Map POS tag to first character lemmatize() accepts
    tags = nltk.pos_tag(text)
    tag_dict = {"J": wordnet.ADJ,
                "N": wordnet.NOUN,
                "V": wordnet.VERB,
                "R": wordnet.ADV}
    tags = [tag_dict.get(tag[1][0],  wordnet.NOUN) for tag in tags]
    return tags

def lemmaSentence(reviews):
    lemmatizer = WordNetLemmatizer()
    lemma_text = ''
    tok_text = word_tokenize(reviews)
    tags = get_wordnet_pos(tok_text)
    for i in range(len(tok_text)):
        lemma_text = lemma_text + ' ' + lemmatizer.lemmatize(tok_text[i], tags[i])
    return lemma_text[1:]

def lower_case(review):
    
    return review.lower()

# change contraction words such sa I'm = I am, shouldn't = should not
def change_contractions(review):
    
    expanded_words = [contractions.fix(word) for word in review.split()]

    expanded_review = ' '.join(expanded_words)
    return expanded_review

# Remove Punctuations
def remove_punctuations(review):
    
    new_review = review.translate(str.maketrans('', '', string.punctuation))
    return new_review

# Remove numbers
def remove_numbers(review):
    
    mapping = str.maketrans('', '', string.digits)
    new_review = review.translate(mapping)
    
    return new_review

def change_to_apostrophe(review):

    funny_symbol = '‚äô'
    return re.sub(funny_symbol, "'", review)

  

In [7]:
# pre-processing using code from Train_Test_Processing.ipynb
def clean_text(data, textcol):

    data['cleaned_text'] = data[textcol].apply(lower_case)
    data['cleaned_text'] = data[textcol].apply(change_contractions)
    data['cleaned_text'] = data[textcol].apply(remove_emojis)
    data['cleaned_text'] = data[textcol].apply(remove_punctuations)
    data['cleaned_text'] = data[textcol].apply(remove_numbers)
    data['cleaned_text'] = data[textcol].apply(remove_stopwords)
    data['cleaned_text'] = data[textcol].apply(remove_extra_whitespace)
    data['cleaned_text'] = data[textcol].apply(lemmaSentence)
    data['cleaned_text'] = data[textcol].apply(change_to_apostrophe)
    
    return data

In [8]:
clean_data = clean_text(raw_data, 'concat_review')
clean_data.head()

Unnamed: 0.1,Unnamed: 0,productAsin,ratingScore,reviewTitle,reviewReaction,reviewDescription,isVerified,category,languages,concat_review,polarity,cleaned_text
0,9406,B08NLXR9V5,4,Girl Alone,One person found this helpful,"My first Blake Pierce book was enjoyable, fast...",True,mystery,Language.ENGLISH,Girl Alone. My first Blake Pierce book was enj...,1,Girl Alone. My first Blake Pierce book was enj...
1,5701,670062510,5,A great book for young people. It has a great...,,My favorite book when I was young. I read it ...,True,children,Language.ENGLISH,A great book for young people. It has a great...,1,A great book for young people. It has a great...
2,13621,1542046599,1,More gripes than hypes,One person found this helpful,This is the first book I‚Äôve read by this aut...,True,mystery,Language.ENGLISH,More gripes than hypes. This is the first book...,-1,More gripes than hypes. This is the first book...
3,5021,399226907,5,Add this book to your collection,,Cute and educational book to teach counting an...,True,children,Language.ENGLISH,Add this book to your collection. Cute and edu...,1,Add this book to your collection. Cute and edu...
4,21354,125030170X,2,Just okay.,,This is one of those books you can read in a c...,False,children,Language.ENGLISH,Just okay.. This is one of those books you can...,0,Just okay.. This is one of those books you can...


## Critical examples

As mentioned in the introduction, we will select a few critical examples to check the accuracy of the methods to be used for subjectivity annotation. All of the reviews in the critical examples should be discarded and thus the subjectivity scores should not be high in order for these reviews to be discarded according to an arbitrary threshold. We will see that this is difficult for certain subjective reviews which should be discarded.


In [9]:
critical_list = [2238, 55, 360, 756, 8866, 7244, 7890, 19070, 2182, 18719]
critical_examples = clean_data[clean_data['Unnamed: 0'].isin(critical_list)]
critical_examples = critical_examples.reset_index().drop(columns=['index'])
critical_examples

Unnamed: 0.1,Unnamed: 0,productAsin,ratingScore,reviewTitle,reviewReaction,reviewDescription,isVerified,category,languages,concat_review,polarity,cleaned_text
0,7890,1589255518,5,Baby shower gift!,,Bought this for my nephew to go into a baby sh...,True,children,Language.ENGLISH,Baby shower gift!. Bought this for my nephew t...,1,Baby shower gift!. Bought this for my nephew t...
1,7244,B01M0JHBEG,5,Perfect!,,We gave this book to our granddaughter for Hal...,True,children,Language.ENGLISH,Perfect!. We gave this book to our granddaught...,1,Perfect!. We gave this book to our granddaught...
2,360,451524934,5,"Ages like a fine wine with a dark, full-bodied...",3 people found this helpful,What can be said about this book that has not ...,True,humor_entertainment,Language.ENGLISH,"Ages like a fine wine with a dark, full-bodied...",-1,"Ages like a fine wine with a dark, full-bodied..."
3,2182,1451673310,5,I wish Amazon could sell translated to othe...,,I read all Bradbury stories tanslated to Rus...,True,humor_entertainment,Language.ENGLISH,I wish Amazon could sell translated to othe...,-1,I wish Amazon could sell translated to othe...
4,756,451526341,5,"Animal Farm, an Extremely Engaging, Dystopian ...",2 people found this helpful,Animal Farm is a dystopian book by George Orwe...,True,humor_entertainment,Language.ENGLISH,"Animal Farm, an Extremely Engaging, Dystopian ...",1,"Animal Farm, an Extremely Engaging, Dystopian ..."
5,19070,606389830,1,Your process sucks,,Stop forcing me to review a apbook I haven't r...,True,humor_entertainment,Language.ENGLISH,Your process sucks. Stop forcing me to review ...,-1,Your process sucks. Stop forcing me to review ...
6,2238,1451673310,4,School reading,,"I work at a school, and the 8th graders, read ...",True,humor_entertainment,Language.ENGLISH,"School reading. I work at a school, and the 8t...",1,"School reading. I work at a school, and the 8t..."
7,18719,B019MMUA8S,1,Arrived damaged,2,It looks like someone spilled something on it....,True,humor_entertainment,Language.ENGLISH,Arrived damaged. It looks like someone spilled...,-1,Arrived damaged. It looks like someone spilled...
8,55,451524934,5,More Relevant Now Than When I Read it in High ...,16 people found this helpful,Read this in the mid 70's as required high sch...,True,humor_entertainment,Language.ENGLISH,More Relevant Now Than When I Read it in High ...,-1,More Relevant Now Than When I Read it in High ...
9,8866,887431453,5,:*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*...,,:*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*...,True,children,Language.ENGLISH,:*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*...,1,:*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*...


Review mostly factual (eg describing plot of the book) without much insight to reviewer's actual thoughts on the book

In [10]:
critical_examples.iloc[2]['concat_review']



In [11]:
critical_examples.iloc[4]['concat_review']

'Animal Farm, an Extremely Engaging, Dystopian Novel. Animal Farm is a dystopian book by George Orwell about a farm full of animals all overthrowing their alcoholic, abusive farmer, Mr. Jones, in hopes of creating a better society within the farm, where everyone is equal. This execution is much harder than how the animals imagined it, and their seemingly perfect world slowly turns more and more into a society run by totalitarianism. Another book made by the same author, George Orwell, is yet another dystopian novel by the name of Nineteen Eighty-Four. Both of these books both involve communism, which could be representing Orwell‚Äôs political views. The story starts with three pigs, Old Major, Snowball, and Napoleon, as the main animals in charge, leading the revolt and helping all the other animals create a carefree, peaceful society. This leads the audience to believe that the three of them are the protagonists, and the antagonist is Mr. Jones, the farmer mistreating the animals. But

In [12]:
critical_examples.iloc[6]['concat_review']

'School reading. I work at a school, and the 8th graders, read the book. I was in one of the classes, so I wanted to know what the book was about, so that I would be able to help any students that may need help.'

In [13]:
critical_examples.iloc[8]['concat_review']

"More Relevant Now Than When I Read it in High School. Read this in the mid 70's as required high school reading.  Just re-read it as a 59 year old and was struck by how prophetic George Orwell was.  Regrettably, his fiction is becoming non-fiction, albeit 36 years later, thanks to the far left leanings of the Democratic Party and those media outlets that have tossed their journalistic integrity into Orwell's Memory Hole.  Bill DeBlasio has likely slept with this book under his pillow since his youth; the only logical explanation for how he continues to destroy NYC."

Reviewer did not actually read the book (book was a gift, etc)

In [14]:
critical_examples.iloc[0]['concat_review']

'Baby shower gift!. Bought this for my nephew to go into a baby shower basket. I always love gifting books. This one was perfect.'

In [15]:
critical_examples.iloc[1]['concat_review']

'Perfect!. We gave this book to our granddaughter for Halloween! So cute!'

Spam review

In [16]:
critical_examples.iloc[9]['concat_review']

':*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*¬®¬®*:¬∑.THANK.¬∑:*¬®¬®*:¬∑ YOU !!!!!. :*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*¬®¬®*:¬∑.THANK.¬∑:*¬®¬®*:¬∑ YOU !!!!! :*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*¬®¬®*:¬∑.THANK.¬∑:*¬®¬®*:¬∑ YOU !!!!! :*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*¬®¬®*:¬∑.THANK.¬∑:*¬®¬®*:¬∑ YOU !!!!!'

Complaints on Amazon and the company's processes

In [17]:
critical_examples.iloc[3]['concat_review']

'I wish  Amazon could  sell translated  to other  languages. I read  all Bradbury  stories tanslated to Russian, back in 1990 after "Perestroika "  . So had to order books via Amazon competitor,  and takes do long go get it from overseas...May be Amazon can get books from other countries.When I staryed to read it in English, it wasnot the same,  but I want to enjoy the story., and then to read it in English and learn more  English grammar, as clearly I\'m  not good in it'

In [18]:
critical_examples.iloc[5]['concat_review']

"Your process sucks. Stop forcing me to review a apbook I haven't read yet and try to get a life because you're really starting to irritate me with this commercial drivel."

In [19]:
critical_examples.iloc[7]['concat_review']

'Arrived damaged. It looks like someone spilled something on it. I was expecting a mint condition book.'

## Method 1: TextBlob

A popular tool for labelling subjectivity of text is Textblob which we will try first. 

In [20]:
def tb_add_subjectivity(row):
  review = TextBlob(row['cleaned_text'])
  subjectivity = review.sentiment.subjectivity
  return subjectivity 
def tb_add_polarity(row):
  review = TextBlob(row['cleaned_text'])
  polarity = review.sentiment.polarity
  return polarity 

In [21]:
clean_data['tb_subjectivity'] = clean_data.apply(tb_add_subjectivity, axis=1)
clean_data['tb_polarity'] = clean_data.apply(tb_add_polarity, axis=1)
clean_data.head()

Unnamed: 0.1,Unnamed: 0,productAsin,ratingScore,reviewTitle,reviewReaction,reviewDescription,isVerified,category,languages,concat_review,polarity,cleaned_text,tb_subjectivity,tb_polarity
0,9406,B08NLXR9V5,4,Girl Alone,One person found this helpful,"My first Blake Pierce book was enjoyable, fast...",True,mystery,Language.ENGLISH,Girl Alone. My first Blake Pierce book was enj...,1,Girl Alone. My first Blake Pierce book was enj...,0.458333,0.060714
1,5701,670062510,5,A great book for young people. It has a great...,,My favorite book when I was young. I read it ...,True,children,Language.ENGLISH,A great book for young people. It has a great...,1,A great book for young people. It has a great...,0.539394,0.271633
2,13621,1542046599,1,More gripes than hypes,One person found this helpful,This is the first book I‚Äôve read by this aut...,True,mystery,Language.ENGLISH,More gripes than hypes. This is the first book...,-1,More gripes than hypes. This is the first book...,0.367403,0.088295
3,5021,399226907,5,Add this book to your collection,,Cute and educational book to teach counting an...,True,children,Language.ENGLISH,Add this book to your collection. Cute and edu...,1,Add this book to your collection. Cute and edu...,0.58,0.355
4,21354,125030170X,2,Just okay.,,This is one of those books you can read in a c...,False,children,Language.ENGLISH,Just okay.. This is one of those books you can...,0,Just okay.. This is one of those books you can...,0.5,0.5


In [22]:
critical_examples['tb_subjectivity'] = critical_examples.apply(tb_add_subjectivity, axis=1)
critical_examples['tb_polarity'] = critical_examples.apply(tb_add_polarity, axis=1)
critical_examples

Unnamed: 0.1,Unnamed: 0,productAsin,ratingScore,reviewTitle,reviewReaction,reviewDescription,isVerified,category,languages,concat_review,polarity,cleaned_text,tb_subjectivity,tb_polarity
0,7890,1589255518,5,Baby shower gift!,,Bought this for my nephew to go into a baby sh...,True,children,Language.ENGLISH,Baby shower gift!. Bought this for my nephew t...,1,Baby shower gift!. Bought this for my nephew t...,0.8,0.75
1,7244,B01M0JHBEG,5,Perfect!,,We gave this book to our granddaughter for Hal...,True,children,Language.ENGLISH,Perfect!. We gave this book to our granddaught...,1,Perfect!. We gave this book to our granddaught...,1.0,0.8125
2,360,451524934,5,"Ages like a fine wine with a dark, full-bodied...",3 people found this helpful,What can be said about this book that has not ...,True,humor_entertainment,Language.ENGLISH,"Ages like a fine wine with a dark, full-bodied...",-1,"Ages like a fine wine with a dark, full-bodied...",0.498415,0.136967
3,2182,1451673310,5,I wish Amazon could sell translated to othe...,,I read all Bradbury stories tanslated to Rus...,True,humor_entertainment,Language.ENGLISH,I wish Amazon could sell translated to othe...,-1,I wish Amazon could sell translated to othe...,0.239583,0.020833
4,756,451526341,5,"Animal Farm, an Extremely Engaging, Dystopian ...",2 people found this helpful,Animal Farm is a dystopian book by George Orwe...,True,humor_entertainment,Language.ENGLISH,"Animal Farm, an Extremely Engaging, Dystopian ...",1,"Animal Farm, an Extremely Engaging, Dystopian ...",0.439359,0.233237
5,19070,606389830,1,Your process sucks,,Stop forcing me to review a apbook I haven't r...,True,humor_entertainment,Language.ENGLISH,Your process sucks. Stop forcing me to review ...,-1,Your process sucks. Stop forcing me to review ...,0.133333,-0.1
6,2238,1451673310,4,School reading,,"I work at a school, and the 8th graders, read ...",True,humor_entertainment,Language.ENGLISH,"School reading. I work at a school, and the 8t...",1,"School reading. I work at a school, and the 8t...",0.625,0.5
7,18719,B019MMUA8S,1,Arrived damaged,2,It looks like someone spilled something on it....,True,humor_entertainment,Language.ENGLISH,Arrived damaged. It looks like someone spilled...,-1,Arrived damaged. It looks like someone spilled...,0.0,0.0
8,55,451524934,5,More Relevant Now Than When I Read it in High ...,16 people found this helpful,Read this in the mid 70's as required high sch...,True,humor_entertainment,Language.ENGLISH,More Relevant Now Than When I Read it in High ...,-1,More Relevant Now Than When I Read it in High ...,0.465333,0.141333
9,8866,887431453,5,:*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*...,,:*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*...,True,children,Language.ENGLISH,:*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*...,1,:*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*...,0.0,0.0


In [23]:
# Using a threshold of 0.5, we see which reviews would have been labelled as subjective and not discarded by TextBlob
critical_examples.loc[critical_examples['tb_subjectivity'] >= 0.5]

Unnamed: 0.1,Unnamed: 0,productAsin,ratingScore,reviewTitle,reviewReaction,reviewDescription,isVerified,category,languages,concat_review,polarity,cleaned_text,tb_subjectivity,tb_polarity
0,7890,1589255518,5,Baby shower gift!,,Bought this for my nephew to go into a baby sh...,True,children,Language.ENGLISH,Baby shower gift!. Bought this for my nephew t...,1,Baby shower gift!. Bought this for my nephew t...,0.8,0.75
1,7244,B01M0JHBEG,5,Perfect!,,We gave this book to our granddaughter for Hal...,True,children,Language.ENGLISH,Perfect!. We gave this book to our granddaught...,1,Perfect!. We gave this book to our granddaught...,1.0,0.8125
6,2238,1451673310,4,School reading,,"I work at a school, and the 8th graders, read ...",True,humor_entertainment,Language.ENGLISH,"School reading. I work at a school, and the 8t...",1,"School reading. I work at a school, and the 8t...",0.625,0.5


In [24]:
# Using a stricter threshold of 0.3, we see which reviews would have been labelled as subjective and not discarded by TextBlob
critical_examples.loc[critical_examples['tb_subjectivity'] >= 0.3]

Unnamed: 0.1,Unnamed: 0,productAsin,ratingScore,reviewTitle,reviewReaction,reviewDescription,isVerified,category,languages,concat_review,polarity,cleaned_text,tb_subjectivity,tb_polarity
0,7890,1589255518,5,Baby shower gift!,,Bought this for my nephew to go into a baby sh...,True,children,Language.ENGLISH,Baby shower gift!. Bought this for my nephew t...,1,Baby shower gift!. Bought this for my nephew t...,0.8,0.75
1,7244,B01M0JHBEG,5,Perfect!,,We gave this book to our granddaughter for Hal...,True,children,Language.ENGLISH,Perfect!. We gave this book to our granddaught...,1,Perfect!. We gave this book to our granddaught...,1.0,0.8125
2,360,451524934,5,"Ages like a fine wine with a dark, full-bodied...",3 people found this helpful,What can be said about this book that has not ...,True,humor_entertainment,Language.ENGLISH,"Ages like a fine wine with a dark, full-bodied...",-1,"Ages like a fine wine with a dark, full-bodied...",0.498415,0.136967
4,756,451526341,5,"Animal Farm, an Extremely Engaging, Dystopian ...",2 people found this helpful,Animal Farm is a dystopian book by George Orwe...,True,humor_entertainment,Language.ENGLISH,"Animal Farm, an Extremely Engaging, Dystopian ...",1,"Animal Farm, an Extremely Engaging, Dystopian ...",0.439359,0.233237
6,2238,1451673310,4,School reading,,"I work at a school, and the 8th graders, read ...",True,humor_entertainment,Language.ENGLISH,"School reading. I work at a school, and the 8t...",1,"School reading. I work at a school, and the 8t...",0.625,0.5
8,55,451524934,5,More Relevant Now Than When I Read it in High ...,16 people found this helpful,Read this in the mid 70's as required high sch...,True,humor_entertainment,Language.ENGLISH,More Relevant Now Than When I Read it in High ...,-1,More Relevant Now Than When I Read it in High ...,0.465333,0.141333


In [25]:
objective_reviews = clean_data.loc[clean_data['tb_subjectivity'] < 0.3]

In [26]:
objective_reviews['ratingScore'].value_counts()

1    413
5    368
4    103
2     50
3     47
Name: ratingScore, dtype: int64

In [27]:
len(objective_reviews)

981

In [28]:
len(clean_data.loc[clean_data['tb_subjectivity'] < 0.5])

6144

If we use a threshold of 0.5, a substantial amount of reviews will be discarded (over 6000), whereas a stricter threshold of 0.3 will result in nearly 1000 reviews being discarded.

Based on the critical examples, a threshold of 0.5 will result in 0.4 accuracy, while a stricter threshold of 0.3 will result in 0.7 accuracy. Observing the wrongly kept reviews for a threshold of 0.5, these reviews indeed expressed some opinion, however the opinion wasn't strongly related to the book being reviewed.

In [29]:
i = 0
print(objective_reviews.iloc[i]['concat_review'])
print('subjectivity: ', objective_reviews.iloc[i]['tb_subjectivity'])
print('polarity: ', objective_reviews.iloc[i]['tb_polarity'])
print('stars: ', objective_reviews.iloc[i]['ratingScore'])

just Read IT Already. I‚Äôve passed by this book time and again‚Ä¶didn‚Äôt want the aggravation of going back and forth between youth and adult  But last night I sampled it and finished all but the last chapter as light peaked under the window shade. I had to save the last scoop of dessert ofElliott and Macy. The pace increased with each  encounter and the alternative times built tension. The love felt real and the loss crushed and the forgiveness saved.  it
subjectivity:  0.2444444444444444
polarity:  0.1222222222222222
stars:  5


In [30]:
i = 0
review = TextBlob(objective_reviews.iloc[i]['concat_review'])
for w in review.sentences:
  print(w)
  print(w.sentiment)

just Read IT Already.
Sentiment(polarity=0.0, subjectivity=0.0)
I‚Äôve passed by this book time and again‚Ä¶didn‚Äôt want the aggravation of going back and forth between youth and adult  But last night I sampled it and finished all but the last chapter as light peaked under the window shade.
Sentiment(polarity=0.1, subjectivity=0.22666666666666666)
I had to save the last scoop of dessert ofElliott and Macy.
Sentiment(polarity=0.0, subjectivity=0.06666666666666667)
The pace increased with each  encounter and the alternative times built tension.
Sentiment(polarity=0.0, subjectivity=0.0)
The love felt real and the loss crushed and the forgiveness saved.
Sentiment(polarity=0.19999999999999998, subjectivity=0.3333333333333333)
it
Sentiment(polarity=0.0, subjectivity=0.0)


## Method 2: POS tagging and SentiWordNet

credits: https://www.kaggle.com/code/yommnamohamed/sentiment-analysis-using-sentiwordnet/notebook


In [31]:
pos=neg=obj=count=0

postagging = []

for review in clean_data['cleaned_text']:
  l = word_tokenize(review)
  postagging.append(nltk.pos_tag(l))

clean_data['pos_tags'] = postagging
clean_data.head()

Unnamed: 0.1,Unnamed: 0,productAsin,ratingScore,reviewTitle,reviewReaction,reviewDescription,isVerified,category,languages,concat_review,polarity,cleaned_text,tb_subjectivity,tb_polarity,pos_tags
0,9406,B08NLXR9V5,4,Girl Alone,One person found this helpful,"My first Blake Pierce book was enjoyable, fast...",True,mystery,Language.ENGLISH,Girl Alone. My first Blake Pierce book was enj...,1,Girl Alone. My first Blake Pierce book was enj...,0.458333,0.060714,"[(Girl, NNP), (Alone, NNP), (., .), (My, PRP$)..."
1,5701,670062510,5,A great book for young people. It has a great...,,My favorite book when I was young. I read it ...,True,children,Language.ENGLISH,A great book for young people. It has a great...,1,A great book for young people. It has a great...,0.539394,0.271633,"[(A, DT), (great, JJ), (book, NN), (for, IN), ..."
2,13621,1542046599,1,More gripes than hypes,One person found this helpful,This is the first book I‚Äôve read by this aut...,True,mystery,Language.ENGLISH,More gripes than hypes. This is the first book...,-1,More gripes than hypes. This is the first book...,0.367403,0.088295,"[(More, RBR), (gripes, NNS), (than, IN), (hype..."
3,5021,399226907,5,Add this book to your collection,,Cute and educational book to teach counting an...,True,children,Language.ENGLISH,Add this book to your collection. Cute and edu...,1,Add this book to your collection. Cute and edu...,0.58,0.355,"[(Add, VB), (this, DT), (book, NN), (to, TO), ..."
4,21354,125030170X,2,Just okay.,,This is one of those books you can read in a c...,False,children,Language.ENGLISH,Just okay.. This is one of those books you can...,0,Just okay.. This is one of those books you can...,0.5,0.5,"[(Just, RB), (okay, RB), (.., VB), (This, DT),..."


In [32]:
def penn_to_wn(tag):
  if tag.startswith('J'):
    return wn.ADJ
  elif tag.startswith('N'):
    return wn.NOUN
  elif tag.startswith('R'):
    return wn.ADV
  elif tag.startswith('V'):
    return wn.VERB
  return None 

In [33]:
lemmatizer = WordNetLemmatizer()
# Returns list of pos-neg and objective score. But returns empty list if not present in senti wordnet.
def get_sentiment(word, tag):
  wn_tag = penn_to_wn(tag)
  if wn_tag not in (wn.NOUN, wn.ADJ, wn.ADV):
    return []
  lemma = lemmatizer.lemmatize(word, pos=wn_tag)
  if not lemma:
    return []
  #Synset is a special kind of a simple interface that is present in NLTK to look up words in WordNet. 
  #Synset instances are the groupings of synonymous words that express the same concept. 
  #Some of the words have only one Synset and some have several.
  synsets = wn.synsets(word, pos=wn_tag)
  if not synsets:
    return []
  # Take the first sense, the most common
  synset = synsets[0]
  swn_synset = swn.senti_synset(synset.name())
  return [synset.name(), swn_synset.pos_score(),swn_synset.neg_score(),swn_synset.obj_score()]

In [34]:
senti_score = []

for pos_val in clean_data['pos_tags']:
  senti_val = [get_sentiment(x,y) for (x,y) in pos_val]
  for score in senti_val:
    try:
      pos = pos + score[1] #positive score is stored at 2nd position
      neg = neg + score[2] #negative score is stored at 3rd position
    except:
      continue
  senti_score.append(pos-neg)
  pos=neg=0

clean_data['senti_score'] = senti_score

In [35]:
overall = []
for i in range(len(clean_data)):
  if (clean_data['senti_score'][i] < 0.05) & (clean_data['senti_score'][i] > -0.05):
    overall.append(0)
  else:
    overall.append(1)
clean_data['swn_subjectivity'] = overall

In [36]:
critical_examples = clean_data[clean_data['Unnamed: 0'].isin(critical_list)]
critical_examples = critical_examples.reset_index().drop(columns=['index'])
critical_examples

Unnamed: 0.1,Unnamed: 0,productAsin,ratingScore,reviewTitle,reviewReaction,reviewDescription,isVerified,category,languages,concat_review,polarity,cleaned_text,tb_subjectivity,tb_polarity,pos_tags,senti_score,swn_subjectivity
0,7890,1589255518,5,Baby shower gift!,,Bought this for my nephew to go into a baby sh...,True,children,Language.ENGLISH,Baby shower gift!. Bought this for my nephew t...,1,Baby shower gift!. Bought this for my nephew t...,0.8,0.75,"[(Baby, NNP), (shower, VBD), (gift, NN), (!, ....",0.25,1
1,7244,B01M0JHBEG,5,Perfect!,,We gave this book to our granddaughter for Hal...,True,children,Language.ENGLISH,Perfect!. We gave this book to our granddaught...,1,Perfect!. We gave this book to our granddaught...,1.0,0.8125,"[(Perfect, JJ), (!, .), (., .), (We, PRP), (ga...",1.0,1
2,360,451524934,5,"Ages like a fine wine with a dark, full-bodied...",3 people found this helpful,What can be said about this book that has not ...,True,humor_entertainment,Language.ENGLISH,"Ages like a fine wine with a dark, full-bodied...",-1,"Ages like a fine wine with a dark, full-bodied...",0.498415,0.136967,"[(Ages, NNS), (like, IN), (a, DT), (fine, JJ),...",1.5,1
3,2182,1451673310,5,I wish Amazon could sell translated to othe...,,I read all Bradbury stories tanslated to Rus...,True,humor_entertainment,Language.ENGLISH,I wish Amazon could sell translated to othe...,-1,I wish Amazon could sell translated to othe...,0.239583,0.020833,"[(I, PRP), (wish, VBP), (Amazon, NNP), (could,...",-0.75,1
4,756,451526341,5,"Animal Farm, an Extremely Engaging, Dystopian ...",2 people found this helpful,Animal Farm is a dystopian book by George Orwe...,True,humor_entertainment,Language.ENGLISH,"Animal Farm, an Extremely Engaging, Dystopian ...",1,"Animal Farm, an Extremely Engaging, Dystopian ...",0.439359,0.233237,"[(Animal, NNP), (Farm, NNP), (,, ,), (an, DT),...",0.0,0
5,19070,606389830,1,Your process sucks,,Stop forcing me to review a apbook I haven't r...,True,humor_entertainment,Language.ENGLISH,Your process sucks. Stop forcing me to review ...,-1,Your process sucks. Stop forcing me to review ...,0.133333,-0.1,"[(Your, PRP$), (process, NN), (sucks, NNS), (....",1.125,1
6,2238,1451673310,4,School reading,,"I work at a school, and the 8th graders, read ...",True,humor_entertainment,Language.ENGLISH,"School reading. I work at a school, and the 8t...",1,"School reading. I work at a school, and the 8t...",0.625,0.5,"[(School, NNP), (reading, NN), (., .), (I, PRP...",0.875,1
7,18719,B019MMUA8S,1,Arrived damaged,2,It looks like someone spilled something on it....,True,humor_entertainment,Language.ENGLISH,Arrived damaged. It looks like someone spilled...,-1,Arrived damaged. It looks like someone spilled...,0.0,0.0,"[(Arrived, NNP), (damaged, VBD), (., .), (It, ...",0.0,0
8,55,451524934,5,More Relevant Now Than When I Read it in High ...,16 people found this helpful,Read this in the mid 70's as required high sch...,True,humor_entertainment,Language.ENGLISH,More Relevant Now Than When I Read it in High ...,-1,More Relevant Now Than When I Read it in High ...,0.465333,0.141333,"[(More, RBR), (Relevant, JJ), (Now, RB), (Than...",1.625,1
9,8866,887431453,5,:*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*...,,:*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*...,True,children,Language.ENGLISH,:*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*...,1,:*¬®¬®*:¬∑.EXCELLENT.¬∑:*¬®¬®*:¬∑.PRODUCT.¬∑:*...,0.0,0.0,"[(:, :), (*, NN), (¬®¬®, VBZ), (*, NNS), (:, :...",0.0,0


In [37]:
objective_reviews_tb_swn = clean_data.loc[(clean_data['tb_subjectivity'] < 0.5) & (clean_data['swn_subjectivity'] == 0)]

In [38]:
objective_reviews_tb_swn

Unnamed: 0.1,Unnamed: 0,productAsin,ratingScore,reviewTitle,reviewReaction,reviewDescription,isVerified,category,languages,concat_review,polarity,cleaned_text,tb_subjectivity,tb_polarity,pos_tags,senti_score,swn_subjectivity
144,1025,B01IW9TM5O,5,Can't wait for his next book -- Destined A Com...,,"The book is miraculous, such superb understand...",True,humor_entertainment,Language.ENGLISH,Can't wait for his next book -- Destined A Com...,1,Can't wait for his next book -- Destined A Com...,0.435357,0.163849,"[(Ca, NNP), (n't, RB), (wait, VBD), (for, IN),...",0.0,0
152,497,451526341,5,Timeless classic,,This book was mentioned by our prime minister ...,True,humor_entertainment,Language.ENGLISH,Timeless classic. This book was mentioned by o...,1,Timeless classic. This book was mentioned by o...,0.378571,0.130952,"[(Timeless, NNP), (classic, JJ), (., .), (This...",0.0,0
339,525,451526341,4,Please enter your headline.,,Please enter your headline.,True,humor_entertainment,Language.ENGLISH,Please enter your headline.. Please enter your...,1,Please enter your headline.. Please enter your...,0.000000,0.000000,"[(Please, NNP), (enter, VB), (your, PRP$), (he...",0.0,0
392,17644,399587683,5,Loved it,,I couldn't put it down cuz the book was Ah-ama...,False,romance,Language.ENGLISH,Loved it. I couldn't put it down cuz the book ...,1,Loved it. I couldn't put it down cuz the book ...,0.459722,0.192795,"[(Loved, VBN), (it, PRP), (., .), (I, PRP), (c...",0.0,0
414,756,451526341,5,"Animal Farm, an Extremely Engaging, Dystopian ...",2 people found this helpful,Animal Farm is a dystopian book by George Orwe...,True,humor_entertainment,Language.ENGLISH,"Animal Farm, an Extremely Engaging, Dystopian ...",1,"Animal Farm, an Extremely Engaging, Dystopian ...",0.439359,0.233237,"[(Animal, NNP), (Farm, NNP), (,, ,), (an, DT),...",0.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20638,16283,1984806734,4,Second Read,,I don‚Äôt know why I read this book a second t...,True,romance,Language.ENGLISH,Second Read. I don‚Äôt know why I read this bo...,1,Second Read. I don‚Äôt know why I read this bo...,0.378125,0.175000,"[(Second, JJ), (Read, NNP), (., .), (I, PRP), ...",0.0,0
20656,4979,399226907,5,Kids love this book!,,I was a kindergarten teacher for 16 years and ...,True,children,Language.ENGLISH,Kids love this book!. I was a kindergarten tea...,1,Kids love this book!. I was a kindergarten tea...,0.466667,0.437500,"[(Kids, NNS), (love, VBP), (this, DT), (book, ...",0.0,0
20843,10241,1542046599,4,Four Stars,,Keeps your attention,True,mystery,Language.ENGLISH,Four Stars. Keeps your attention,1,Four Stars. Keeps your attention,0.000000,0.000000,"[(Four, CD), (Stars, NNP), (., .), (Keeps, NNP...",0.0,0
20852,8826,887431453,5,Workbook,,My granddaughter loves it!! She will start kin...,True,children,Language.ENGLISH,Workbook. My granddaughter loves it!! She will...,1,Workbook. My granddaughter loves it!! She will...,0.000000,0.000000,"[(Workbook, NNP), (., .), (My, PRP$), (grandda...",0.0,0


In [39]:
# Wrongly kept critical examples 
critical_examples.loc[(critical_examples['tb_subjectivity'] >= 0.5) & (critical_examples['swn_subjectivity'] == 1)]

Unnamed: 0.1,Unnamed: 0,productAsin,ratingScore,reviewTitle,reviewReaction,reviewDescription,isVerified,category,languages,concat_review,polarity,cleaned_text,tb_subjectivity,tb_polarity,pos_tags,senti_score,swn_subjectivity
0,7890,1589255518,5,Baby shower gift!,,Bought this for my nephew to go into a baby sh...,True,children,Language.ENGLISH,Baby shower gift!. Bought this for my nephew t...,1,Baby shower gift!. Bought this for my nephew t...,0.8,0.75,"[(Baby, NNP), (shower, VBD), (gift, NN), (!, ....",0.25,1
1,7244,B01M0JHBEG,5,Perfect!,,We gave this book to our granddaughter for Hal...,True,children,Language.ENGLISH,Perfect!. We gave this book to our granddaught...,1,Perfect!. We gave this book to our granddaught...,1.0,0.8125,"[(Perfect, JJ), (!, .), (., .), (We, PRP), (ga...",1.0,1
6,2238,1451673310,4,School reading,,"I work at a school, and the 8th graders, read ...",True,humor_entertainment,Language.ENGLISH,"School reading. I work at a school, and the 8t...",1,"School reading. I work at a school, and the 8t...",0.625,0.5,"[(School, NNP), (reading, NN), (., .), (I, PRP...",0.875,1


Using a threshold of 0.5 for TextBlob subjectivity score, we see that both tb and swn wrongly kept the same reviews as before. 

In [40]:
# Wrongly kept critical examples with stricter threshold
critical_examples.loc[(critical_examples['tb_subjectivity'] >= 0.3) & (critical_examples['swn_subjectivity'] == 1)]

Unnamed: 0.1,Unnamed: 0,productAsin,ratingScore,reviewTitle,reviewReaction,reviewDescription,isVerified,category,languages,concat_review,polarity,cleaned_text,tb_subjectivity,tb_polarity,pos_tags,senti_score,swn_subjectivity
0,7890,1589255518,5,Baby shower gift!,,Bought this for my nephew to go into a baby sh...,True,children,Language.ENGLISH,Baby shower gift!. Bought this for my nephew t...,1,Baby shower gift!. Bought this for my nephew t...,0.8,0.75,"[(Baby, NNP), (shower, VBD), (gift, NN), (!, ....",0.25,1
1,7244,B01M0JHBEG,5,Perfect!,,We gave this book to our granddaughter for Hal...,True,children,Language.ENGLISH,Perfect!. We gave this book to our granddaught...,1,Perfect!. We gave this book to our granddaught...,1.0,0.8125,"[(Perfect, JJ), (!, .), (., .), (We, PRP), (ga...",1.0,1
2,360,451524934,5,"Ages like a fine wine with a dark, full-bodied...",3 people found this helpful,What can be said about this book that has not ...,True,humor_entertainment,Language.ENGLISH,"Ages like a fine wine with a dark, full-bodied...",-1,"Ages like a fine wine with a dark, full-bodied...",0.498415,0.136967,"[(Ages, NNS), (like, IN), (a, DT), (fine, JJ),...",1.5,1
6,2238,1451673310,4,School reading,,"I work at a school, and the 8th graders, read ...",True,humor_entertainment,Language.ENGLISH,"School reading. I work at a school, and the 8t...",1,"School reading. I work at a school, and the 8t...",0.625,0.5,"[(School, NNP), (reading, NN), (., .), (I, PRP...",0.875,1
8,55,451524934,5,More Relevant Now Than When I Read it in High ...,16 people found this helpful,Read this in the mid 70's as required high sch...,True,humor_entertainment,Language.ENGLISH,More Relevant Now Than When I Read it in High ...,-1,More Relevant Now Than When I Read it in High ...,0.465333,0.141333,"[(More, RBR), (Relevant, JJ), (Now, RB), (Than...",1.625,1


With a stricter threshold of 0.3 for tb, together with swn, one less review was wrongly kept! The review correctly discarded was:

In [41]:
critical_examples.iloc[4]['concat_review']

'Animal Farm, an Extremely Engaging, Dystopian Novel. Animal Farm is a dystopian book by George Orwell about a farm full of animals all overthrowing their alcoholic, abusive farmer, Mr. Jones, in hopes of creating a better society within the farm, where everyone is equal. This execution is much harder than how the animals imagined it, and their seemingly perfect world slowly turns more and more into a society run by totalitarianism. Another book made by the same author, George Orwell, is yet another dystopian novel by the name of Nineteen Eighty-Four. Both of these books both involve communism, which could be representing Orwell‚Äôs political views. The story starts with three pigs, Old Major, Snowball, and Napoleon, as the main animals in charge, leading the revolt and helping all the other animals create a carefree, peaceful society. This leads the audience to believe that the three of them are the protagonists, and the antagonist is Mr. Jones, the farmer mistreating the animals. But

In [42]:
objective_reviews_tb_swn['ratingScore'].value_counts()

5    210
1    153
4     57
2     22
3     14
Name: ratingScore, dtype: int64

In [43]:
len(objective_reviews_tb_swn)

456

In [44]:
i = 0
print(objective_reviews_tb_swn.iloc[i]['concat_review'])
print('textblob:', objective_reviews_tb_swn.iloc[i]['tb_subjectivity'])
print('senti score:', objective_reviews_tb_swn.iloc[i]['senti_score'])
print('stars:', objective_reviews_tb_swn.iloc[i]['ratingScore'])

Can't wait for his next book -- Destined A Comedian?. The book is miraculous, such superb understanding of oneself, society and humanity, and such lucidity and humor narrating the horrors and triumphs of growing up in poverty, discrimination and violence in South Africa! Trevor's life is a miracle -- imagine a black girl saying to a white man that let's commit a crime mating (intercourse between black and white was punishable for five years of prison term)  to produce a baby for myself to own (to love and be loved back), and raising him in the post-apartheid era's poverty & violence to become the host of The Daily Show at age of 31!Many stories of his life were tough and seemed hopeless, but I was often laughing out loud and seeing in my mind's eyes Trevor cracking witty jokes telling them. The only regret is that he didn't tell us how a high school graduate barely making through the days selling pirated CDs and stolen goods in the hood got to become the smartest, beloved, politically 

In [45]:
objective_reviews_tb_swn.to_csv(path_to_folder+'objective_reviews.csv')

In [46]:
subjective_reviews_tb_swn = clean_data.loc[(clean_data['tb_subjectivity'] >= 0.5) & (clean_data['swn_subjectivity'] == 1)]

In [47]:
len(subjective_reviews_tb_swn)

13796

In [48]:
subjective_reviews_tb_swn['ratingScore'].value_counts()

5    7748
1    2919
4    1845
2     698
3     586
Name: ratingScore, dtype: int64

In [49]:
# Drop all 3 star review as it is ambiguous if it's positive or negative; some are positive, some are negative
subjective_reviews_tb_swn = subjective_reviews_tb_swn.loc[subjective_reviews_tb_swn['ratingScore'] != 3]

In [50]:
len(subjective_reviews_tb_swn)

13210

In [51]:
# Change polarity
# positive (1) -> 4, 5 stars
# negative (0) -> 1, 2 stars
def change_polarity(row):
  if row['ratingScore'] >= 4:
    val = 1
  else:
    val = 0
  return val 

In [52]:
subjective_reviews_tb_swn['polarity'] = clean_data.apply(change_polarity, axis=1)

In [53]:
subjective_reviews_tb_swn['polarity'].value_counts()

1    9593
0    3617
Name: polarity, dtype: int64

In [54]:
subjective_reviews_tb_swn.to_csv(path_to_folder+'subjective_reviews.csv')

## Evaluation on test

In [55]:
test_raw1 = pd.read_csv(path_to_folder + 'test_df_Bryson.csv')
test_raw2 = pd.read_csv(path_to_folder + 'test_df_Gx.csv')
test_raw3 = pd.read_csv(path_to_folder + 'test_df_Kelvin.csv')
df_list = [test_raw1, test_raw2, test_raw3]
test = pd.concat(df_list, ignore_index=True)
test = test.drop(columns=['Unnamed: 0', 'Unnamed: 0.1', 'Unnamed: 0.1.1'])
test['concat_review'] = test['reviewTitle'] + '. ' + test['reviewDescription']

In [56]:
test = clean_text(test, 'concat_review')
test.head()

Unnamed: 0,productAsin,ratingScore,reviewTitle,reviewReaction,reviewDescription,isVerified,category,languages,Annotator_1,Annotator_2,concat_review,cleaned_text
0,1982137452,1,The content is all messed up,,I started this book this week for my book club...,True,children,Language.ENGLISH,-1,-1,The content is all messed up. I started this b...,The content is all messed up. I started this b...
1,125030170X,1,Duplicate copy.Damaged book.,,Pages missing.,True,children,Language.ENGLISH,-1,-1,Duplicate copy.Damaged book.. Pages missing.,Duplicate copy.Damaged book.. Pages missing.
2,63215381,1,Awful,,I gave up after 38% of my Kindle. Yes we were ...,True,children,Language.ENGLISH,-1,-1,Awful. I gave up after 38% of my Kindle. Yes w...,Awful. I gave up after 38% of my Kindle. Yes w...
3,60935464,1,Syrupy Overload,3.0,The book is an example of leading the witness.,True,children,Language.ENGLISH,-1,-1,Syrupy Overload. The book is an example of lea...,Syrupy Overload. The book is an example of lea...
4,1501161938,1,Couldn‚Äôt read it; type too small!,1.0,"Beware, the type is TINY, I mean TINY. I am 60...",True,children,Language.ENGLISH,-1,-1,"Couldn‚Äôt read it; type too small!. Beware, t...","Couldn‚Äôt read it; type too small!. Beware, t..."


In [57]:
def change_polarity_test(row):
  if row['polarity'] == 0:
    val = 0
  else:
    val = 1
  return val

In [58]:
test['polarity'] = test['Annotator_1']
test['polarity'] = test.apply(change_polarity_test, axis=1)
test.head()

Unnamed: 0,productAsin,ratingScore,reviewTitle,reviewReaction,reviewDescription,isVerified,category,languages,Annotator_1,Annotator_2,concat_review,cleaned_text,polarity
0,1982137452,1,The content is all messed up,,I started this book this week for my book club...,True,children,Language.ENGLISH,-1,-1,The content is all messed up. I started this b...,The content is all messed up. I started this b...,1
1,125030170X,1,Duplicate copy.Damaged book.,,Pages missing.,True,children,Language.ENGLISH,-1,-1,Duplicate copy.Damaged book.. Pages missing.,Duplicate copy.Damaged book.. Pages missing.,1
2,63215381,1,Awful,,I gave up after 38% of my Kindle. Yes we were ...,True,children,Language.ENGLISH,-1,-1,Awful. I gave up after 38% of my Kindle. Yes w...,Awful. I gave up after 38% of my Kindle. Yes w...,1
3,60935464,1,Syrupy Overload,3.0,The book is an example of leading the witness.,True,children,Language.ENGLISH,-1,-1,Syrupy Overload. The book is an example of lea...,Syrupy Overload. The book is an example of lea...,1
4,1501161938,1,Couldn‚Äôt read it; type too small!,1.0,"Beware, the type is TINY, I mean TINY. I am 60...",True,children,Language.ENGLISH,-1,-1,"Couldn‚Äôt read it; type too small!. Beware, t...","Couldn‚Äôt read it; type too small!. Beware, t...",1


In [61]:
def evaluate(row):
  # TextBlob
  review = TextBlob(row['cleaned_text'])
  tb_subjectivity = review.sentiment.subjectivity
  # SentiWordNet
  pos=neg=obj=count=0
  l = word_tokenize(row['cleaned_text'])
  postagging = nltk.pos_tag(l)
  row['pos_tags'] = postagging
  senti_val = [get_sentiment(x,y) for (x,y) in postagging]
  for score in senti_val:
    try:
      pos = pos + score[1] #positive score is stored at 2nd position
      neg = neg + score[2] #negative score is stored at 3rd position
    except:
      continue
  senti_score = pos-neg
  row['senti_score'] = senti_score
  if (senti_score < 0.05) & (senti_score > -0.05):
    swn_subjectivity = 0
  else:
    swn_subjectivity = 1 
  row['tb_subjectivity'] = tb_subjectivity
  row['swn_subjectivity'] = swn_subjectivity
  # Final verdict
  if (tb_subjectivity < 0.5) & (swn_subjectivity == 0):
    return 0
  else:
    return 1

In [62]:
test['pred'] = test.apply(evaluate, axis=1)

In [63]:
y_true = test['polarity'].tolist()
y_pred = test['pred'].tolist()

In [64]:
accuracy_score(y_true, y_pred)

0.9377682403433476

In [67]:
precision_score(y_true, y_pred)

0.9561787905346187

In [68]:
recall_score(y_true, y_pred)

0.9797934440951953

In [69]:
f1_score(y_true, y_pred)

0.9678420935905966

In [65]:
print(classification_report(y_true, y_pred))

              precision    recall  f1-score   support

           0       0.06      0.03      0.04       103
           1       0.96      0.98      0.97      2227

    accuracy                           0.94      2330
   macro avg       0.51      0.50      0.50      2330
weighted avg       0.92      0.94      0.93      2330



We used the manually annotated test data to evaluate how good the TextBlob + SentiWordNet annotator is. Test accuracy of 93.8% was obtained which is optimistic.