# <div class='alert alert-info'>Sentiment Analysis</div>

**Sentiment analysis is the process of classifying whether a block of text is positive, negative, or neutral. It is a form of contextual text mining that gauges the social sentiment around a brand and helps a business judge whether a product it is manufacturing is likely to find demand in the market. The goal of sentiment analysis is to analyze people’s opinions in a way that helps businesses expand. It focuses not only on polarity (positive, negative, and neutral) but also on emotions (happy, sad, angry, etc.). It can be implemented with rule-based, automatic, or hybrid Natural Language Processing approaches.**

## <font color='blue'>VADER</font>

**VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. VADER uses a sentiment lexicon: a list of lexical features (e.g., words) that are generally labeled according to their semantic orientation as either positive or negative. VADER not only reports positivity and negativity scores but also tells us how positive or negative a sentiment is.**

## <font color='blue'>Let's start coding</font>

In [1]:
import nltk

In [2]:
#nltk.download('vader_lexicon') 
#You only need to do this once

In [3]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer as ss

In [4]:
sia=ss()

**Let's start**

**1)**

In [5]:
a="Data Science is the best thing to study"

In [6]:
sia.polarity_scores(a)

{'neg': 0.0, 'neu': 0.625, 'pos': 0.375, 'compound': 0.6369}

**Inference**:<br>
The neutral score is 0.625<br>
The positive score is 0.375<br>
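
To get a feel for where a compound value such as 0.6369 comes from, here is a minimal sketch of the normalization step. As far as I know, the reference VADER implementation sums the word valences and squashes the sum into (-1, 1) with x / sqrt(x² + alpha), using alpha = 15. The function name below is hypothetical (it is not part of the NLTK API), and the valence of 3.2 for "best" is an assumption about the lexicon entry.

```python
import math

def normalize_valence(total_valence, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return total_valence / math.sqrt(total_valence ** 2 + alpha)

# Assumption: "best" is the only scored word in the first example
# and its lexicon valence is 3.2.
print(round(normalize_valence(3.2), 3))
```

Under those assumptions the result lands at roughly 0.637, in line with the compound score reported above.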

**2)**

In [7]:
a="Machine Learning is the best ever thing I have studied and I gained Good and very helpful knowledge related to it!!!"

In [8]:
sia.polarity_scores(a)

{'neg': 0.0, 'neu': 0.523, 'pos': 0.477, 'compound': 0.9283}

**Inference**:<br>
We can see that the positive score has increased and there is still no negative score.

**3)**

In [9]:
a="Chinese Food with Roti is the Worst Combination and can make you feel bad for a long time!"

In [10]:
sia.polarity_scores(a)

{'neg': 0.345, 'neu': 0.655, 'pos': 0.0, 'compound': -0.8356}

**Inference**<br>
Now you can see that there is no positive score; instead we get a negative score.

A compound score below 0 indicates a negative sentence, whereas a score above 0 indicates a positive sentence.
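
A cut-off at exactly 0 leaves no room for neutral text. A common convention, suggested in the VADER project's documentation, is to treat compound scores between -0.05 and 0.05 as neutral. The helper below is a minimal sketch of that rule; its name and the default threshold argument are assumptions for illustration, not part of NLTK.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score to a label, with a neutral band around zero."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

# Compound scores taken from the three examples above, plus a neutral case.
for score in (0.6369, 0.9283, -0.8356, 0.0):
    print(score, label_from_compound(score))
```

For a data set that only has Positive and Negative labels, a plain zero cut-off is still a reasonable simplification.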

# <div class='alert alert-info'>Let's start working with Data Frames</div>

In [11]:
import pandas as pd
import numpy as np

In [13]:
df=pd.read_csv("Review.csv")

In [14]:
df.head()

Unnamed: 0,sentiment,review
0,Negative,I had no background knowledge of this movie be...
1,Negative,I am a huge Jane Austen fan and I ordered the ...
2,Negative,Nothing to say but Wow! Has anyone actually ha...
3,Negative,i like Jane Austin novels. I love Pride and Pr...
4,Negative,In this day and age of incredible special movi...


In [15]:
df['sentiment'].value_counts()

Negative    5081
Positive    4919
Name: sentiment, dtype: int64

**Null Values handling**

In [16]:
df.isnull().sum()

sentiment    0
review       0
dtype: int64

There are no null values, so there is no need to drop anything.

Let's check whether any reviews consist only of whitespace.

In [18]:
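blanks=[]
# itertuples() yields (index, sentiment, review) for each row
for i,s,r in df.itertuples():
    # flag reviews that are strings containing only whitespace
    if isinstance(r,str) and r.isspace():
        blanks.append(i)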

In [19]:
blanks

[]

There are no blanks, so we do not need to remove anything.

In [20]:
#If we had blanks, we would drop them like this:
#df.drop(blanks,inplace=True)
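
One detail worth knowing about the blank check: `str.isspace()` returns `False` for an empty string, so a review that is `""` would not be flagged by it. The sketch below shows a slightly broader test based on `strip()`. The helper name and the sample rows are hypothetical; the rows simply mimic the (index, sentiment, review) shape that `itertuples()` yields.

```python
def is_blank(text):
    """True for strings that are empty or contain only whitespace."""
    return isinstance(text, str) and not text.strip()

# Hypothetical rows in the same (index, sentiment, review) shape as itertuples().
rows = [
    (0, "Negative", "Too slow."),
    (1, "Positive", "   "),
    (2, "Positive", ""),
    (3, "Negative", float("nan")),  # non-string values are ignored
]

blank_rows = [i for i, s, r in rows if is_blank(r)]
print(blank_rows)
```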

In [23]:
df.head(3)

Unnamed: 0,sentiment,review
0,Negative,I had no background knowledge of this movie be...
1,Negative,I am a huge Jane Austen fan and I ordered the ...
2,Negative,Nothing to say but Wow! Has anyone actually ha...


In [27]:
df.iloc[0]['review']  #just check what happens if we print this

"I had no background knowledge of this movie before I bought it, but it sounded cool and I've been wanting to see a really kick-butt Viking movie for awhile now... alas, this film was not what I was looking for. I had hoped for the best, but instead, was delivered a boring Nordic soap-opera that seemed to drag on too long despite its 84 minute running time. The film's premise is intriguing enough: It's about a Viking warlord who defies his God and Odin is so enraged that he curses the warlord's son, named Barek, to death and rebirth as a Berserker. This Barek guy is then forced to live enraged, insane, and violent lifetime after lifetime. The movie is filmed competently enough, with some rich cinematography and quasi-good performances by the actors, but again, I found myself bored and questioning when this dribble would end. The filmmakers had a chance to make something rather entertaining and semi-unique but they dropped the ball. Perhaps it could've been improved with some cheap expl

In [31]:
sia.polarity_scores(df.iloc[0]['review'])

{'neg': 0.208, 'neu': 0.665, 'pos': 0.127, 'compound': -0.9813}

**The negative score is higher than the positive score and the compound score is strongly negative, which agrees with the Negative label given in the sentiment column.**

In [32]:
df['sia.polarity_scores']=df['review'].apply(lambda r:sia.polarity_scores(r))

In [33]:
df

Unnamed: 0,sentiment,review,sia.polarity_scores
0,Negative,I had no background knowledge of this movie be...,"{'neg': 0.208, 'neu': 0.665, 'pos': 0.127, 'co..."
1,Negative,I am a huge Jane Austen fan and I ordered the ...,"{'neg': 0.028, 'neu': 0.889, 'pos': 0.083, 'co..."
2,Negative,Nothing to say but Wow! Has anyone actually ha...,"{'neg': 0.131, 'neu': 0.735, 'pos': 0.134, 'co..."
3,Negative,i like Jane Austin novels. I love Pride and Pr...,"{'neg': 0.113, 'neu': 0.678, 'pos': 0.209, 'co..."
4,Negative,In this day and age of incredible special movi...,"{'neg': 0.081, 'neu': 0.726, 'pos': 0.193, 'co..."
...,...,...,...
9995,Positive,I first saw this movie back in the early'90's ...,"{'neg': 0.048, 'neu': 0.855, 'pos': 0.097, 'co..."
9996,Positive,"NYC, 2022: The Greenhouse effect, vanished oce...","{'neg': 0.061, 'neu': 0.851, 'pos': 0.089, 'co..."
9997,Positive,"Those individuals familiar with Asian cinema, ...","{'neg': 0.124, 'neu': 0.796, 'pos': 0.08, 'com..."
9998,Positive,"The kids, aged 7 to 14, got such a huge kick o...","{'neg': 0.069, 'neu': 0.693, 'pos': 0.238, 'co..."


**The dictionaries are cut off in the display, so let's pull the compound score out into its own column**

In [34]:
df['compound']=df['sia.polarity_scores'].apply(lambda x:x['compound'])
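
If all four scores are wanted as columns rather than just the compound value, one possible approach is to build a DataFrame from the list of dictionaries and join it back. This is a sketch on a hypothetical two-row frame (the column name `scores` and the review texts are placeholders); the dictionaries themselves are copied from the single-sentence outputs earlier in the notebook.

```python
import pandas as pd

# Hypothetical miniature frame standing in for the review data.
demo = pd.DataFrame({
    "review": ["first example", "third example"],
    "scores": [
        {"neg": 0.0, "neu": 0.625, "pos": 0.375, "compound": 0.6369},
        {"neg": 0.345, "neu": 0.655, "pos": 0.0, "compound": -0.8356},
    ],
})

# One column per dictionary key, aligned on the original index.
expanded = pd.DataFrame(demo["scores"].tolist(), index=demo.index)
demo = demo.join(expanded)

print(demo[["neg", "neu", "pos", "compound"]])
```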

In [41]:
df.tail()

Unnamed: 0,sentiment,review,sia.polarity_scores,compound
9995,Positive,I first saw this movie back in the early'90's ...,"{'neg': 0.048, 'neu': 0.855, 'pos': 0.097, 'co...",0.7985
9996,Positive,"NYC, 2022: The Greenhouse effect, vanished oce...","{'neg': 0.061, 'neu': 0.851, 'pos': 0.089, 'co...",0.3759
9997,Positive,"Those individuals familiar with Asian cinema, ...","{'neg': 0.124, 'neu': 0.796, 'pos': 0.08, 'com...",-0.9788
9998,Positive,"The kids, aged 7 to 14, got such a huge kick o...","{'neg': 0.069, 'neu': 0.693, 'pos': 0.238, 'co...",0.9792
9999,Positive,I so love this movie! The animation is great (...,"{'neg': 0.0, 'neu': 0.663, 'pos': 0.337, 'comp...",0.99


In [42]:
df['compound_score']=df['compound'].apply(lambda x:'Positive' if x>0 else 'Negative')

In [43]:
df.tail()

Unnamed: 0,sentiment,review,sia.polarity_scores,compound,compound_score
9995,Positive,I first saw this movie back in the early'90's ...,"{'neg': 0.048, 'neu': 0.855, 'pos': 0.097, 'co...",0.7985,Positive
9996,Positive,"NYC, 2022: The Greenhouse effect, vanished oce...","{'neg': 0.061, 'neu': 0.851, 'pos': 0.089, 'co...",0.3759,Positive
9997,Positive,"Those individuals familiar with Asian cinema, ...","{'neg': 0.124, 'neu': 0.796, 'pos': 0.08, 'com...",-0.9788,Negative
9998,Positive,"The kids, aged 7 to 14, got such a huge kick o...","{'neg': 0.069, 'neu': 0.693, 'pos': 0.238, 'co...",0.9792,Positive
9999,Positive,I so love this movie! The animation is great (...,"{'neg': 0.0, 'neu': 0.663, 'pos': 0.337, 'comp...",0.99,Positive


So, among the rows shown, only index 9997 is misclassified; all the others are right.

Let's check the remaining rows and get an overall report.

In [44]:
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix

In [46]:
accuracy_score(df['sentiment'],df['compound_score'])

0.6935

So the accuracy score is about 0.69.

**<font color='blue'>If we had guessed at random, we would probably have got an accuracy of around 0.5, or perhaps even less.<br><br>
So we are definitely doing better.</font>**
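
The 0.5 figure can be made concrete with the class counts from `value_counts()` above. The sketch below computes two simple baselines: always predicting the majority class, and guessing each class with probability 0.5. The variable names are just illustrative.

```python
# Class counts as reported by df['sentiment'].value_counts()
counts = {"Negative": 5081, "Positive": 4919}
total = sum(counts.values())

# Always predicting the most frequent class
majority_baseline = max(counts.values()) / total

# Guessing each class with probability 0.5, whatever the true label is
uniform_guess = sum((n / total) * 0.5 for n in counts.values())

print(f"majority-class baseline: {majority_baseline:.4f}")
print(f"uniform random guess:    {uniform_guess:.4f}")
```

Both baselines sit close to 0.5, so 0.6935 is a clear improvement over guessing.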


In [48]:
print(classification_report(df['sentiment'],df['compound_score']))

              precision    recall  f1-score   support

    Negative       0.79      0.54      0.64      5081
    Positive       0.64      0.85      0.73      4919

    accuracy                           0.69     10000
   macro avg       0.72      0.70      0.69     10000
weighted avg       0.72      0.69      0.69     10000



In [49]:
print(confusion_matrix(df['sentiment'],df['compound_score']))

[[2740 2341]
 [ 724 4195]]
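
To connect the confusion matrix with the classification report, the figures can be recomputed by hand. In scikit-learn's convention the rows are the true labels and the columns are the predicted labels, both in sorted label order (Negative, then Positive). The sketch below hard-codes the matrix printed above; the `report` dictionary is just an illustrative container.

```python
# Confusion matrix from above: rows = true labels, columns = predictions.
matrix = [[2740, 2341],
          [724, 4195]]
labels = ["Negative", "Positive"]

total = sum(sum(row) for row in matrix)
accuracy = sum(matrix[i][i] for i in range(len(labels))) / total
print(f"accuracy: {accuracy:.4f}")

report = {}
for i, label in enumerate(labels):
    correct = matrix[i][i]
    predicted = sum(row[i] for row in matrix)  # column sum: predicted as this label
    actual = sum(matrix[i])                    # row sum: truly this label
    precision = correct / predicted
    recall = correct / actual
    f1 = 2 * precision * recall / (precision + recall)
    report[label] = {"precision": precision, "recall": recall, "f1": f1}
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The low recall for Negative (2341 of 5081 negative reviews were predicted Positive) shows that most of the errors come from negative reviews being scored as positive.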
