<a href="https://colab.research.google.com/github/Shuaib-8/NLP-growth-hacking-project/blob/master/task2-dict-based-sentiment-anal/dictionary-based-sentiment-analyser.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Creating a Dictionary-based Sentiment Analyser

Using the small corpus of product reviews generated in task 1, the objective is to build a dictionary-based sentiment analyser.
<br>
<br>
Topics covered in this task:
* Word and sentence tokenization 
* Review score classification 
* Insights surrounding score and review comparisons 
* Correlation analysis surrounding review groups 
* Accounting for negation within sentiment analyser



In [1]:
## Imports needed for the analysis
import pandas as pd 
import numpy as np 
import os, pathlib
import matplotlib.pyplot as plt 
import altair
import pandas_bokeh
pandas_bokeh.output_notebook()
%matplotlib inline 

### Load small corpus dataset

In [2]:
# Move back into the task1 folder where the review (small) corpus is located
path = pathlib.Path.home() / 'Desktop/nlp-map-project/task1-create-dataset'
try:
    os.chdir(path)  # os.chdir returns None, so the result is not reassigned to path
except FileNotFoundError:
    print('Already in directory/folder, carry on!')

In [3]:
df = pd.read_csv('small_corpus.csv')

When dealing with NLP/ML problems, we should first make sure there are no missing values; otherwise they will cause problems further down the line.

In [4]:
# there are some missing reviews,
# which is not substantial relative to the size of the dataset/corpus
df.isna().sum()

ratings    0
reviews    4
dtype: int64

In [5]:
df[df['reviews'].isna()]

|      | ratings | reviews |
|------|---------|---------|
| 686  | 1       | NaN     |
| 2590 | 4       | NaN     |
| 3197 | 5       | NaN     |
| 3470 | 5       | NaN     |


In [6]:
# fill the missing reviews with the placeholder string 'None'
df['reviews'] = df['reviews'].fillna('None')

In [7]:
df[df['reviews'] == 'None']

|      | ratings | reviews |
|------|---------|---------|
| 686  | 1       | None    |
| 2590 | 4       | None    |
| 3197 | 5       | None    |
| 3470 | 5       | None    |


In [8]:
# check to ensure there's no nulls 
assert df['reviews'].notna().all()

In [9]:
review_sample = df['reviews'].head().tolist()
rating_sample = df['ratings'].head().tolist()

In [10]:
print(*rating_sample)
for review in review_sample:
    print(10*'-')
    print(review)
#print(*review_sample, sep='\n') # each review is separated by a line of '-'

1 1 1 1 1
----------
Recently UBISOFT had to settle a huge class-action suit brought against the company for bundling (the notoriously harmful) StarFORCE DRM with its released games. So what the geniuses at the helm do next? They decide to make the same mistake yet again - by choosing the same DRM scheme that made BIOSHOCK, MASS EFFECT and SPORE infamous: SecuROM 7.xx with LIMITED ACTIVATIONS!

MASS EFFECT can be found in clearance bins only months after its release; SPORE not only undersold miserably but also made history as the boiling point of gamers lashing back, fed up with idiotic DRM schemes. And the clueless MBAs that run an art-form as any other commodity business decided that, "hey, why not jump into THAT mud-pond ourselves?"

The original FAR CRY was such a GREAT game that any sequel of it would have to fight an uphill battle to begin with (especially without its original developing team). Now imagine shooting this sequel on the foot with a well known, much hated and totally

### Word and sentence tokenization

In [11]:
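# import relevant tokenization modules from nltk
# (both tokenizers need NLTK's Punkt models, e.g. nltk.download('punkt');
#  depending on the NLTK version the resource may be named 'punkt_tab')
from nltk.tokenize import word_tokenize, sent_tokenize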

In [12]:
# text normalisation (here, lowercasing) is a useful first step for text analysis,
# followed by tokenising each review into words
word_tokenization = df['reviews'].str.lower().apply(word_tokenize)
word_tokenization

0       [recently, ubisoft, had, to, settle, a, huge, ...
1        [code, did, n't, work, ,, got, me, a, refund, .]
2       [these, do, not, work, at, all, ,, all, i, get...
3       [well, let, me, start, by, saying, that, when,...
4       [dont, waste, your, money, ,, you, will, just,...
                              ...                        
4495    [nice, long, micro, usb, cable, ,, battery, la...
4496    [i, 've, been, having, a, great, time, with, t...
4497                                                  [d]
4498    [really, pretty, ,, funny, ,, interesting, gam...
4499    [i, had, a, lot, of, fun, playing, this, game,...
Name: reviews, Length: 4500, dtype: object
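The output above shows that `word_tokenize` splits contractions: review 1's `didn't` becomes `did` and `n't`. The scoring functions defined later keep only tokens for which `str.isalnum()` is true, so the short check below, using review 1's tokens copied from that output, shows which tokens survive that filter. Note that the negation token `n't` is discarded together with the punctuation.

```python
# Tokens copied from the word_tokenize output for review 1 shown above
tokens = ['code', 'did', "n't", 'work', ',', 'got', 'me', 'a', 'refund', '.']

# The same filter used inside the scoring functions: keep alphanumeric tokens only
kept = [t for t in tokens if t.isalnum()]
dropped = [t for t in tokens if not t.isalnum()]

print(kept)     # ['code', 'did', 'work', 'got', 'me', 'a', 'refund']
print(dropped)  # ["n't", ',', '.']
```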

In [13]:
sent_tokenization = df['reviews'].str.lower().apply(sent_tokenize)
sent_tokenization

0       [recently ubisoft had to settle a huge class-a...
1                    [code didn't work, got me a refund.]
2       [these do not work at all, all i get is static...
3       [well let me start by saying that when i first...
4       [dont waste your money, you will just end up u...
                              ...                        
4495    [nice long micro usb cable, battery lasts a lo...
4496    [i've been having a great time with this game....
4497                                                  [d]
4498    [really pretty, funny, interesting game., work...
4499    [i had a lot of fun playing this game, if your...
Name: reviews, Length: 4500, dtype: object

### Download the NLTK `opinion_lexicon`

In [14]:
# corresponding module import 
import nltk
nltk.download('opinion_lexicon')

[nltk_data] Downloading package opinion_lexicon to
[nltk_data]     /Users/ShuaibAhmed/nltk_data...
[nltk_data]   Package opinion_lexicon is already up-to-date!


True

In [15]:
from nltk.corpus import opinion_lexicon

In [16]:
# Examine the first 10 entries of each word list
positive = opinion_lexicon.positive()[:10]
negative = opinion_lexicon.negative()[:10]
words = sorted(opinion_lexicon.words())[:10] # sorted alphabetically

In [17]:
print(negative)
print(positive)
print(words)

['2-faced', '2-faces', 'abnormal', 'abolish', 'abominable', 'abominably', 'abominate', 'abomination', 'abort', 'aborted']
['a+', 'abound', 'abounds', 'abundance', 'abundant', 'accessable', 'accessible', 'acclaim', 'acclaimed', 'acclamation']
['2-faced', '2-faces', 'a+', 'abnormal', 'abolish', 'abominable', 'abominably', 'abominate', 'abomination', 'abort']


In [18]:
# check the length of each corpus/set 
print(len(opinion_lexicon.positive()))
print(len(opinion_lexicon.negative()))
print(len(opinion_lexicon.words()))

2006
4783
6789


In [19]:
# helper to check whether a word appears in the positive or negative list of opinion_lexicon
def word_check(word):
    if word in opinion_lexicon.positive():
        return f'{word} is positive'
    elif word in opinion_lexicon.negative():
        return f'{word} is negative' 
    else: 
        return f'{word} not covered in lexicon' 

In [20]:
print(word_check('sad'))
print(word_check('bad'))
print(word_check('wonderful'))
# purposely checking that lexicon is always lowercase 
print(word_check('AWESOME')) 
print(word_check('awesome'))
print(word_check('none'))

sad is negative
bad is negative
wonderful is positive
AWESOME not covered in lexicon
awesome is positive
none not covered in lexicon


### Classify reviews - negative (-1) to positive (+1)

It is recommended to score the reviews in two steps:
<br>
1) First, score each sentence of a review from -1 to +1 based on the number of positive and negative words it contains.
<br>
2) Then compute the sentiment score of each review from the scores of the sentences it was split into.
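The notebook's `sentiment` function (below) scores each review's text in a single pass. As a minimal sketch of the two-step approach described above, the code below scores each sentence and then averages the sentence scores per review. The tiny `POSITIVE`/`NEGATIVE` sets are hypothetical stand-ins for `opinion_lexicon`, the regex splits stand in for `sent_tokenize`/`word_tokenize`, and averaging the sentence scores is an assumed aggregation rule.

```python
import re

# Hypothetical stand-ins for opinion_lexicon.positive() / .negative()
POSITIVE = frozenset({"great", "good", "fun", "love"})
NEGATIVE = frozenset({"bad", "boring", "broken", "waste"})


def score_sentence(sentence):
    """Step 1: net count of lexicon words divided by word count, so the score lies in [-1, 1]."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    if not words:
        return 0.0
    net = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return net / len(words)


def score_review(review):
    """Step 2 (assumed aggregation): mean of the review's sentence scores."""
    # naive split on ., ! and ? as a stand-in for sent_tokenize
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    if not sentences:
        return 0.0
    return sum(score_sentence(s) for s in sentences) / len(sentences)


review = "Great fun. The ending was bad and a total waste of time!"
print(score_review(review))    # 0.4  (sentence scores 1.0 and -0.2)
print(score_sentence(review))  # 0.0  (whole text scored in one pass)
```

Because every sentence carries equal weight, a short, strongly positive sentence can outweigh a longer mixed one, which is why the two scores differ for the example review.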

In [21]:
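def sentiment(sentence):
    try:
        score = 0
        # keep lowercase alphanumeric tokens only (punctuation and tokens such as "n't" are dropped)
        words = [word.lower() for word in word_tokenize(sentence) if word.isalnum()]
        for word in words:
            if word in opinion_lexicon.positive():
                score += 1
            elif word in opinion_lexicon.negative():
                score -= 1
        # normalise by word count so the score lies within the -1/+1 range
        return score / len(words)
    except ZeroDivisionError:
        # no alphanumeric tokens left (e.g. '3/5', ':)'): this prints 0 but returns None,
        # which pandas stores as NaN
        print(0)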

In [22]:
# Applying this function to every row of the dataframe (4.5k rows) took almost 4 hours to complete,
# most likely because each `word in opinion_lexicon.positive()` check walks through the whole word list
#sentiment = df['reviews'].apply(sentiment)
#sentiment
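The long runtime is consistent with each `word in opinion_lexicon.positive()` check scanning the word list linearly, once per word. A minimal sketch of a faster variant, assuming the two lists are converted to sets once up front (here replaced by small hypothetical sets so the example runs without NLTK data) and that the input is already tokenised, e.g. by `word_tokenize`. It also returns 0.0 for inputs with no alphanumeric tokens instead of printing 0 and returning `None`.

```python
# Hypothetical stand-ins; in the notebook these could be built once with
# POSITIVE = set(opinion_lexicon.positive()); NEGATIVE = set(opinion_lexicon.negative())
POSITIVE = frozenset({"good", "great", "fun"})
NEGATIVE = frozenset({"bad", "broken", "waste"})


def sentiment_fast(tokens):
    """Same scoring rule as `sentiment`, but with set lookups and a 0.0 fallback."""
    words = [t.lower() for t in tokens if t.isalnum()]
    if not words:  # e.g. '3/5' or ':)' leave no alphanumeric tokens
        return 0.0
    score = sum(1 if w in POSITIVE else -1 if w in NEGATIVE else 0 for w in words)
    return score / len(words)  # normalised to the -1/+1 range


print(sentiment_fast(["Great", "fun", ",", "bad", "ending", "."]))  # 0.25
print(sentiment_fast([":)"]))                                      # 0.0
```

With pandas, this could be applied as `df['reviews'].apply(lambda r: sentiment_fast(word_tokenize(r)))`.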

In [23]:
# Move into the task2 folder where the precomputed sentiment scores are saved
path = pathlib.Path.home() / 'Desktop/nlp-map-project/task2-dict-based-sentiment-anal'
try:
    os.chdir(path)  # os.chdir returns None, so the result is not reassigned to path
except FileNotFoundError:
    print('Already in directory/folder, carry on!')

In [24]:
df = pd.read_csv('review_corpus_dictionary_based_sentiment.csv')
df

|      | ratings | reviews | dictionary_review_sentiment |
|------|---------|---------|-----------------------------|
| 0    | 1 | Recently UBISOFT had to settle a huge class-ac... | -0.027197 |
| 1    | 1 | code didn't work, got me a refund. | 0.285714 |
| 2    | 1 | these do not work at all, all i get is static ... | 0.000000 |
| 3    | 1 | well let me start by saying that when i first ... | -0.028476 |
| 4    | 1 | Dont waste your money, you will just end up us... | 0.000000 |
| ...  | ... | ... | ... |
| 4495 | 5 | Nice long micro USB cable, battery lasts a lon... | 0.058824 |
| 4496 | 5 | I've been having a great time with this game. ... | 0.166667 |
| 4497 | 5 | d | 0.000000 |
| 4498 | 5 | Really pretty, funny, interesting game. Works ... | 0.307692 |


### Score and review comparison insights 

Compare the scores of the product reviews with the product ratings using a plot. In this step, you need to accomplish three sub-tasks:
<br>
<br>
1) Create a plot of the distribution of the ratings. Explore which is the most common rating
<br>
2) Create a plot of the distribution of the sentiment scores. Explore which is the most common score range
<br>
3) Create a plot about the relation of the sentiment scores and product ratings

#### plot of the distribution of the ratings

In [25]:
# Counter records the frequency of each rating
from collections import Counter

In [26]:
ratings = df['ratings'].tolist()
ratings_freq = Counter(ratings)
print(f'distribution of ratings: {ratings_freq}')

distribution of ratings: Counter({1: 1500, 5: 1500, 2: 500, 3: 500, 4: 500})


In [27]:
ratings_freq_df = pd.DataFrame({'ratings': [*ratings_freq.keys()], 'freq':[*ratings_freq.values()]})
ratings_freq_df 

|   | ratings | freq |
|---|---------|------|
| 0 | 1       | 1500 |
| 1 | 2       | 500  |
| 2 | 3       | 500  |
| 3 | 4       | 500  |
| 4 | 5       | 1500 |


In [28]:
# Similar solution using pandas
df['ratings'].value_counts()

5    1500
1    1500
3     500
2     500
4     500
Name: ratings, dtype: int64

In [29]:
df['ratings'].value_counts().plot_bokeh(kind='bar', xlabel='ratings', ylabel='frequency', title='Distribution of the review ratings');

The most common ratings in this sample are 1 and 5 (1,500 reviews each, versus 500 for each of 2, 3 and 4), i.e. the extremes of the scale.
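Since ratings 1 and 5 tie for the top spot, a tie-aware check is safer than `Counter.most_common(1)`, which would return only one of them. A minimal sketch using the counts printed above:

```python
from collections import Counter

# Counts copied from the Counter output above
ratings_freq = Counter({1: 1500, 5: 1500, 2: 500, 3: 500, 4: 500})

top = max(ratings_freq.values())
# every rating that shares the highest frequency
modes = sorted(r for r, c in ratings_freq.items() if c == top)
print(modes, top)  # [1, 5] 1500
```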

#### plot of the distribution of the sentiment scores

Before plotting the distribution of sentiment scores, it is worth checking again that the data is intact, i.e. that there are no missing values.

In [30]:
df[df.dictionary_review_sentiment.isnull()]

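|      | ratings | reviews | dictionary_review_sentiment |
|------|---------|---------|-----------------------------|
| 2144 | 3       | 3/5     | NaN |
| 3468 | 5       | A+      | NaN |
| 3474 | 5       | 10/10   | NaN |
| 3870 | 5       | A+!     | NaN |
| 4316 | 5       | :)      | NaN |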


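As seen here, there are 5 missing scores. These numerical/emoji abbreviated reviews leave no alphanumeric tokens after the `isalnum` filter, so `len(words)` is zero; the `ZeroDivisionError` handler in `sentiment` prints 0 but returns `None`, which is stored as NaN. Since there are only five of them, we can simply filter them out.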

In [31]:
df_sentiment_fil = df[np.isfinite(df['dictionary_review_sentiment'])]
df_sentiment_fil['dictionary_review_sentiment']

0      -0.027197
1       0.285714
2       0.000000
3      -0.028476
4       0.000000
          ...   
4495    0.058824
4496    0.166667
4497    0.000000
4498    0.307692
4499    0.038462
Name: dictionary_review_sentiment, Length: 4495, dtype: float64

In [32]:
# np.histogram works on a flat NumPy array of scores;
# with density=True, `hist` holds densities (normalised so the histogram integrates to 1 over the range), not raw counts
hist, bin_edges = np.histogram(df_sentiment_fil['dictionary_review_sentiment'].values, density=True)

In [33]:
print(len(bin_edges))
print(len(hist))

11
10
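Because `density=True`, the `hist` values (later plotted as `bin_count`) are densities rather than counts: each bar's height multiplied by its bin width (0.2 here) gives the share of reviews in that bin, and the shares sum to 1. The pure-Python sketch below mimics `np.histogram(values, bins=10, range=(-1, 1), density=True)` on a few made-up scores to make that relationship explicit; it is an illustration, not NumPy's implementation. (The notebook relies on the default range, i.e. the data's minimum and maximum, which happen to be -1 and 1 here.)

```python
# Made-up scores standing in for the review sentiment column
scores = [-0.9, -0.85, -0.1, 0.05, 0.1, 0.15, 0.3, 0.5]


def density_histogram(values, lo=-1.0, hi=1.0, bins=10):
    """Return (densities, edges) like np.histogram(..., density=True) for values in [lo, hi]."""
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]  # one more edge than bins
    counts = [0] * bins
    for v in values:
        # the last bin is closed on the right, so v == hi falls in the final bin
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    densities = [c / (len(values) * width) for c in counts]
    return densities, edges


densities, edges = density_histogram(scores)
bin_width = 2.0 / 10
shares = [d * bin_width for d in densities]  # fraction of scores per bin
print(len(edges), len(densities))  # 11 10
print(round(sum(shares), 6))       # 1.0
```

For example, a bar of height 1.4 corresponds to roughly 1.4 × 0.2 = 28% of the reviews.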


In [34]:
bin_edges.round(2)

array([-1. , -0.8, -0.6, -0.4, -0.2,  0. ,  0.2,  0.4,  0.6,  0.8,  1. ])

`np.histogram` returns one more edge than there are bins (11 edges for 10 bins), so a single edge value such as -1.0 cannot label a bar. Instead, each bar is labelled with its pair of consecutive edges, e.g. -1.0 to -0.8.

In [35]:
labels = [(str(label[0]), str(label[1])) for label in zip(bin_edges.round(2), bin_edges[1:].round(2))]
print(labels)
labels = [" ".join(label) for label in labels]
print('labels: {}'.format(labels))

[('-1.0', '-0.8'), ('-0.8', '-0.6'), ('-0.6', '-0.4'), ('-0.4', '-0.2'), ('-0.2', '0.0'), ('0.0', '0.2'), ('0.2', '0.4'), ('0.4', '0.6'), ('0.6', '0.8'), ('0.8', '1.0')]
labels: ['-1.0 -0.8', '-0.8 -0.6', '-0.6 -0.4', '-0.4 -0.2', '-0.2 0.0', '0.0 0.2', '0.2 0.4', '0.4 0.6', '0.6 0.8', '0.8 1.0']


In [36]:
df_sent = pd.DataFrame({'sentiment_score_range':labels, 'bin_count':hist})

In [37]:
df_sent.plot_bokeh(kind='bar', x='sentiment_score_range', y='bin_count', figsize=(550, 550), \
    title='Distribution of the sentiment scores');

#### plot about the relation of the sentiment scores and product ratings

In [38]:
df_sentiment_fil = df[np.isfinite(df['dictionary_review_sentiment'])]
df_sentiment_fil[['ratings', 'dictionary_review_sentiment']]

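|      | ratings | dictionary_review_sentiment |
|------|---------|-----------------------------|
| 0    | 1       | -0.027197 |
| 1    | 1       | 0.285714  |
| 2    | 1       | 0.000000  |
| 3    | 1       | -0.028476 |
| 4    | 1       | 0.000000  |
| ...  | ...     | ...       |
| 4495 | 5       | 0.058824  |
| 4496 | 5       | 0.166667  |
| 4497 | 5       | 0.000000  |
| 4498 | 5       | 0.307692  |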


In [39]:
rating_sentiments_chart = altair.Chart(df_sentiment_fil).mark_bar()\
    .encode(x="ratings", y="dictionary_review_sentiment", color="ratings", \
            tooltip=["ratings", "dictionary_review_sentiment"])\
    .interactive()
rating_sentiments_chart

### Exploring Correlation of the sentiment scores and product ratings

Measure the correlation of the sentiment scores and product ratings. Try out more methods. Study the contradictions, namely those cases where the rating is high but the score is low, or the other way around.
* Choose the most effective correlation measure.

In [40]:
# scipy.stats provides several correlation measures; choose one based on the nature of the data
# each function returns the coefficient and a p-value, but we only need the coefficient
from scipy import stats

# Pearson: parametric measure of the linear relationship between the two columns;
# its p-value assumes each variable is normally distributed
pearsonr, _ = stats.pearsonr(df_sentiment_fil['ratings'], df_sentiment_fil['dictionary_review_sentiment'])
print(f'pearson correlation coefficient is: {pearsonr:.2f}')

# Spearman: rank-based (nonparametric) measure of a monotonic, possibly non-linear relationship;
# it makes no normality assumption
spearmanr, _ = stats.spearmanr(df_sentiment_fil['ratings'], df_sentiment_fil['dictionary_review_sentiment'])
print(f'spearman correlation coefficient is: {spearmanr:.2f}')

pearson correlation coefficient is: 0.41
spearman correlation coefficient is: 0.57


The Pearson correlation coefficient (0.41) indicates a weak-to-moderate positive correlation between product ratings and the corresponding sentiment of reviews.
<br>
The Spearman correlation coefficient is noticeably higher (0.57), indicating a moderate positive correlation between product ratings and the corresponding sentiment of reviews.
<br>
<br>
Hence, the Spearman correlation is arguably more appropriate in this case: the ratings are ordinal (1 to 5, with many ties), and Spearman only assumes a monotonic relationship, so it can account for non-linear associations between the two variables.
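Spearman's coefficient is Pearson's coefficient computed on ranks, with tied values given their average rank (as `scipy.stats.spearmanr` also does). The stdlib-only sketch below, on made-up data, illustrates why it can come out higher: for a relationship that is monotonic but not linear, Spearman reaches 1 while Pearson stays below it.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but non-linear
print(f"pearson: {pearson(x, y):.2f}")    # pearson: 0.94
print(f"spearman: {spearman(x, y):.2f}")  # spearman: 1.00
```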

### Accounting for negation - improving sentiment analyser 

In [41]:
from nltk.sentiment.util import mark_negation

A quick scan of the product reviews shows that some of them contain negation, such as `'code didn't work..., graphics are not good..., just doesn't feel right...'`.
<br>
A human reader easily recognises these phrases as negative, but a dictionary-based analyser that simply counts lexicon words only sees words such as `work`, `good` and `right`, and so scores them as positive (e.g. "Not Good" and "doesn't work" both score 0.5 in the contradictions listed below). We therefore need rules that handle negation so that such phrases do not throw off the analyser's predictions.

In [42]:
for idx, review in df['reviews'].items():
    rating = df.at[idx, 'ratings']
    sent = df.at[idx, 'dictionary_review_sentiment']
    if rating == 5 and sent < -0.2:
        # highest rating (5), but the rules-based sentiment analyser predicts a clearly negative score
        print(f'({rating}, {sent}) ---- {review}')
    if rating == 1 and sent > 0.2:
        # lowest rating (1), but the rules-based sentiment analyser predicts a clearly positive score
        print(f'({rating}, {sent}) ---- {review}')

(1, 0.2857142857142857) ---- code didn't work, got me a refund.
(1, 0.6666666666666666) ---- Never worked right
(1, 0.5) ---- Not Good
(1, 0.3333333333333333) ---- not kid appropriate
(1, 0.5) ---- doesn't work
(1, 0.5) ---- Never worked.
(1, 0.25) ---- No fun at all!
(1, 0.25) ---- Didn't work to will
(1, 0.3333333333333333) ---- Not well made
(1, 0.5) ---- Doesn't work
(1, 0.3333333333333333) ---- Did not work.
(1, 0.3333333333333333) ---- returning don't work
(1, 0.3333333333333333) ---- Does not work correctly with xbox
(1, 0.5) ---- Doesn't work
(1, 0.2857142857142857) ---- Good but PS3 is better than PC.
(1, 0.25) ---- Not very well written
(1, 0.3333333333333333) ---- Does not work
(1, 0.5) ---- doesn't work
(1, 0.25) ---- no good for me.
(1, 0.25) ---- He didn't like it
(5, -0.2857142857142857) ---- Killing zombies, how can you go wrong!


Testing out the `mark_negation` function on certain sentences.

In [43]:
sent_1 = "code didn't work, got me a refund."
sent_2 = "no good for me."
print(mark_negation(sent_1.split()))
print(mark_negation(sent_2.split()))

['code', "didn't", 'work,_NEG', 'got_NEG', 'me_NEG', 'a_NEG', 'refund._NEG']
['no', 'good_NEG', 'for_NEG', 'me._NEG']


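`mark_negation` appends `_NEG` to every token that follows a negation cue (here `didn't` and `no`) until a clause-ending punctuation token or the end of the input, which flags words, such as `good` in "no good", whose usual polarity is reversed by context. Because `str.split()` leaves punctuation attached to words (`work,`, `refund.`), no standalone punctuation token closes the scope here, so it runs to the end of each sentence.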
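The scope rule can be approximated in a few lines. The sketch below is a simplified re-implementation written for illustration (not NLTK's code, and with an abbreviated cue list): a cue such as `no` or a token ending in `n't` opens a scope, a standalone clause punctuation token (`.`, `:`, `;`, `!`, `?`) closes it, and tokens in between get `_NEG`. It also shows why token order matters: if the `isalnum` filter runs first, as in `sentiment_neg_update` below, the `n't` cue and the closing `.` are already gone.

```python
import re

# Abbreviated cue list and clause punctuation, chosen for illustration
NEGATION_CUES = {"not", "no", "never", "none", "nothing", "dont", "doesnt", "didnt"}
CLAUSE_PUNCT = re.compile(r"^[.:;!?]$")


def mark_negation_sketch(tokens):
    """Append '_NEG' to tokens between a negation cue and the next clause punctuation."""
    marked, in_scope = [], False
    for tok in tokens:
        low = tok.lower()
        if in_scope and CLAUSE_PUNCT.match(tok):
            in_scope = False          # punctuation closes the scope and stays unmarked
            marked.append(tok)
        elif in_scope:
            marked.append(tok + "_NEG")
        elif low in NEGATION_CUES or low.endswith("n't"):
            in_scope = True           # the cue itself stays unmarked
            marked.append(tok)
        else:
            marked.append(tok)
    return marked


# word_tokenize-style tokens for review 1 (see the tokenisation output above)
tokens = ['code', 'did', "n't", 'work', ',', 'got', 'me', 'a', 'refund', '.']
with_cue = mark_negation_sketch(tokens)
filtered_first = mark_negation_sketch([t for t in tokens if t.isalnum()])
print(with_cue)
print(filtered_first)  # no cue left, so nothing is marked
```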

In [44]:
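def sentiment_neg_update(sentence):
    try:
        score = 0
        words = [word.lower() for word in word_tokenize(sentence) if word.isalnum()]
        # NOTE: the isalnum filter above has already removed "n't" and clause punctuation, so
        # apostrophe negations (e.g. "didn't") are not detected and a not/no scope runs to the end of the text
        words = mark_negation(words)
        for word in words:
            if word in opinion_lexicon.positive():
                score += 1
            # every token inside a negation scope counts as -1, whether or not it is a lexicon word
            elif word in opinion_lexicon.negative() or "_NEG" in word:
                score -= 1
        # normalise by word count so the score lies within the -1/+1 range
        return score / len(words)
    except ZeroDivisionError:
        # prints 0 but returns None (stored as NaN)
        print(0)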
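One possible alternative, sketched here as an assumption rather than as the notebook's method, is to flip the polarity of lexicon words inside a negation scope and leave other scoped words neutral, detecting negation before any tokens are filtered out. The small lexicon sets are hypothetical stand-ins for `opinion_lexicon`, the input is assumed to be `word_tokenize` output, and, unlike NLTK's default scope rule, a comma also ends the scope here.

```python
import re

# Hypothetical stand-ins for sets built from opinion_lexicon
POSITIVE = frozenset({"good", "great", "fun", "work"})
NEGATIVE = frozenset({"bad", "boring", "broken"})
NEGATION_CUES = frozenset({"not", "no", "never"})
SCOPE_END = re.compile(r"^[.,:;!?]$")  # assumption: a comma also closes the scope


def sentiment_flip(tokens):
    """Score tokens in [-1, 1], reversing the polarity of lexicon words in a negation scope."""
    score, n_words, negated = 0, 0, False
    for tok in tokens:
        low = tok.lower()
        if SCOPE_END.match(low):
            negated = False            # punctuation closes the scope
            continue
        if low in NEGATION_CUES or low.endswith("n't"):
            negated = True             # cue opens a scope (cues are not lexicon words here)
        polarity = (low in POSITIVE) - (low in NEGATIVE)
        score += -polarity if negated else polarity
        if low.isalnum():              # count words the same way as the notebook's filter
            n_words += 1
    return score / n_words if n_words else 0.0


print(sentiment_flip(["does", "n't", "work"]))                  # -0.5
print(sentiment_flip(["not", "bad"]))                           # 0.5
print(sentiment_flip(["great", "game", ",", "not", "boring"]))  # 0.5
```

The design choice is that negation changes the sign of sentiment-bearing words only, so scoped filler words such as `me` or `a` no longer pull the score down, and a negated negative word ("not bad") counts as positive.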

In [45]:
df.ratings.head(20).tolist()

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

In [46]:
df.reviews.head(20).apply(sentiment_neg_update)

0    -0.855649
1     0.285714
2    -0.869565
3    -0.931323
4    -0.980769
5     0.117647
6    -0.567010
7     0.000000
8    -0.466667
9    -0.235294
10   -0.947826
11   -0.204082
12   -0.794872
13    0.045455
14   -0.446809
15   -0.983972
16   -0.987500
17   -0.928571
18   -0.090909
19   -0.750000
Name: reviews, dtype: float64

To see how the rules-based sentiment analyser performs once it handles negation via the `_NEG` mark in `sentiment_neg_update`, we test it on the first twenty reviews, all rated 1. Most of the scores (16 of 20) are now negative, a noticeable improvement over the original `sentiment` function in matching the poor rating. A few scores are still non-negative; for example, review 1 ("code didn't work, got me a refund.") keeps its score of 0.285714, because its `n't` token is removed by the `isalnum` filter before `mark_negation` runs.

In [47]:
# sentiment = df['reviews'].apply(sentiment_neg_update)
# sentiment

In [48]:
df = pd.read_csv('review_corpus_dictionary_based_sentiment_negation.csv')

In [49]:
df[['ratings', 'dictionary_review_sentiment', 'dictionary_review_sentiment_updated']].tail(20)

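|      | ratings | dictionary_review_sentiment | dictionary_review_sentiment_updated |
|------|---------|-----------------------------|-------------------------------------|
| 4480 | 5       | 0.035088 | -0.122807 |
| 4481 | 5       | 0.333333 | 0.333333  |
| 4482 | 5       | 0.045455 | -0.477273 |
| 4483 | 5       | 0.036585 | -0.097561 |
| 4484 | 5       | 0.056604 | 0.056604  |
| 4485 | 5       | 0.1      | 0.1       |
| 4486 | 5       | 0.333333 | 0.333333  |
| 4487 | 5       | 0.5      | 0.5       |
| 4488 | 5       | 0.056338 | -0.43662  |
| 4489 | 5       | 0.2      | 0.2       |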


Examining the last rows of the reviews dataset (small corpus), all of which have a rating of 5, we would expect the sentiment analyser to compute positive scores (>0) in the `dictionary_review_sentiment_updated` column. However, handling negation has unintended consequences: in the rows shown, four reviews (4480, 4482, 4483 and 4488) that the original computation (`dictionary_review_sentiment`) scored as positive now receive negative scores, while the remaining rows are unchanged.

### Evaluating sentiment analyser 

To check whether the results improved overall, we need to compare the scores of the product reviews with the product ratings again.

#### plot of the distribution of the sentiment scores

In [50]:
df_sentiment_fil = df[np.isfinite(df['dictionary_review_sentiment_updated'])]
df_sentiment_fil['dictionary_review_sentiment_updated']

0      -0.855649
1       0.285714
2      -0.869565
3      -0.931323
4      -0.980769
          ...   
4495    0.058824
4496    0.166667
4497    0.000000
4498    0.307692
4499   -0.192308
Name: dictionary_review_sentiment_updated, Length: 4495, dtype: float64

In [51]:
hist, bin_edges = np.histogram(df_sentiment_fil['dictionary_review_sentiment_updated'].values, density=True)

In [52]:
print(len(bin_edges))
print(len(hist))

11
10


In [53]:
bin_edges.round(2)

array([-1. , -0.8, -0.6, -0.4, -0.2,  0. ,  0.2,  0.4,  0.6,  0.8,  1. ])

In [54]:
labels = [(str(label[0]), str(label[1])) for label in zip(bin_edges.round(2), bin_edges[1:].round(2))]
print(labels)
labels = [" ".join(label) for label in labels]
print('labels: {}'.format(labels))

[('-1.0', '-0.8'), ('-0.8', '-0.6'), ('-0.6', '-0.4'), ('-0.4', '-0.2'), ('-0.2', '0.0'), ('0.0', '0.2'), ('0.2', '0.4'), ('0.4', '0.6'), ('0.6', '0.8'), ('0.8', '1.0')]
labels: ['-1.0 -0.8', '-0.8 -0.6', '-0.6 -0.4', '-0.4 -0.2', '-0.2 0.0', '0.0 0.2', '0.2 0.4', '0.4 0.6', '0.6 0.8', '0.8 1.0']


In [55]:
df_sent = pd.DataFrame({'sentiment_score_range':labels, 'bin_count':hist})

In [56]:
df_sent.plot_bokeh(kind='bar', x='sentiment_score_range', y='bin_count', figsize=(550, 550), \
    title='Distribution of the sentiment scores');

Compared with the distribution of sentiment scores computed first, there are clear differences.
<br>
First, the distribution is now bimodal, with two peaks in the (density-normalised) bin heights: about 1.1 in the range (-1.0, -0.8) and about 1.4 in the range (0.0, 0.2). Since each bin is 0.2 wide, these correspond to roughly 22% and 28% of the reviews respectively.
<br>
Second, the bins across the negative range (-1.0, 0.0) now carry noticeably more weight, while the bins across the positive range (0.0, 1.0) show only a marginal increase.
<br>
<br>
Altogether, updating the dictionary-based analyser to account for negation shifts its predictions toward the negative side, so it now produces more negative sentiment scores than positive ones.

In [58]:
df_sentiment_fil = df[np.isfinite(df['dictionary_review_sentiment_updated'])]
df_sentiment_fil[['ratings', 'dictionary_review_sentiment_updated']]

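|      | ratings | dictionary_review_sentiment_updated |
|------|---------|-------------------------------------|
| 0    | 1       | -0.855649 |
| 1    | 1       | 0.285714  |
| 2    | 1       | -0.869565 |
| 3    | 1       | -0.931323 |
| 4    | 1       | -0.980769 |
| ...  | ...     | ...       |
| 4495 | 5       | 0.058824  |
| 4496 | 5       | 0.166667  |
| 4497 | 5       | 0.000000  |
| 4498 | 5       | 0.307692  |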


In [60]:
rating_sentiments_chart_updated = altair.Chart(df_sentiment_fil).mark_bar()\
    .encode(x="ratings", y="dictionary_review_sentiment_updated", color="ratings", \
            tooltip=["ratings", "dictionary_review_sentiment_updated"])\
    .interactive()
rating_sentiments_chart_updated

Comparing ratings with sentiment scores, accounting for negation in the rules-based analyser produces some notable changes relative to the same visualisation without negation rules:
* Negative ratings (ratings < 3) - a marginal change: the highest score for ratings 1 and 2 is slightly lower (0.5 for both); for rating 1 it falls from about 0.67
* Positive ratings (ratings > 3) - the lowest score for ratings 4 and 5 has dropped, from around -0.3 to -1 for both
<br>
<br>
Hence, consistent with the distribution of sentiment scores, accounting for negation makes the analyser more prone to predicting negative scores for positively rated reviews, which is the opposite of what we want.

In [61]:
# each scipy.stats function returns the coefficient and a p-value, but we only need the coefficient
from scipy import stats

# Pearson: parametric measure of the linear relationship between the two columns;
# its p-value assumes each variable is normally distributed
pearsonr, _ = stats.pearsonr(df_sentiment_fil['ratings'], df_sentiment_fil['dictionary_review_sentiment_updated'])
print(f'pearson correlation coefficient is: {pearsonr:.2f}')

# Spearman: rank-based (nonparametric) measure of a monotonic, possibly non-linear relationship;
# it makes no normality assumption
spearmanr, _ = stats.spearmanr(df_sentiment_fil['ratings'], df_sentiment_fil['dictionary_review_sentiment_updated'])
print(f'spearman correlation coefficient is: {spearmanr:.2f}')

pearson correlation coefficient is: 0.39
spearman correlation coefficient is: 0.42


Finally, the correlations between ratings and the negation-aware sentiment scores from the updated rules-based analyser are weaker than before:
* pearson - decreased from 0.41 to 0.39
* spearman - decreased from 0.57 to 0.42

Overall, for this sample of reviews, adding negation handling to the dictionary-based sentiment analyser did not improve its agreement with the product ratings: both correlation measures fell, Spearman's substantially, which makes it harder to draw reliable insights from the scores.