<h1> VADER: Valence Aware Dictionary and sEntiment Reasoner </h1>
<h2> In this notebook, I assign a polarity score to each record in my data using the VADER implementation in NLTK. </h2>
<h2> Vectorization and word embedding will also be considered. </h2>

In [1]:
from IPython.display import display
import pandas as pd
import numpy as np

In [2]:
# Load the preprocessed dataset into a DataFrame
df = pd.read_csv("preprocessed_2.csv")
df

Unnamed: 0,body,score,permalink
0,youd want eat early much eco impact least tend...,1,avoid meat dairy single big way
1,dude like trump policy know want say like poli...,0,trump tape give fbi spanish authority
2,nice policy deluded patriot think america grea...,1,trump administration put steel aluminum
3,right stay away whole industry die bother r am...,-3,avoid meat dairy single big way
4,think steak blue cheese dinner edit suck vegan...,-6,avoid meat dairy single big way
...,...,...,...
3254,anti zionism inherently antisemitic defined cr...,1,israel ambassador u say radical leave college
3255,well lot right wing israeli jew one power diff...,4,israel ambassador u say radical leave college
3256,read comment say anything two paragraph follow...,-3,fox friend warns muslim enclave west
3257,israel palestine debate give people socially a...,0,israel ambassador u say radical leave college


<h2> Task 1: Acquiring the sentiment score</h2>

In [3]:
import nltk
nltk.download('vader_lexicon')  # download the VADER lexicon that NLTK's analyzer relies on

from nltk.sentiment.vader import SentimentIntensityAnalyzer
# Import NLTK's VADER implementation; the analyzer created below
# returns the sentiment scores for any input text

sid = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\malik\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [4]:
neu = ' seems reasonable does it not?'
pos = 'i love cake'
neg = 'i hate drones'
print(sid.polarity_scores(neu))
print(sid.polarity_scores(pos))
print(sid.polarity_scores(neg))

{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
{'neg': 0.0, 'neu': 0.192, 'pos': 0.808, 'compound': 0.6369}
{'neg': 0.787, 'neu': 0.213, 'pos': 0.0, 'compound': -0.5719}
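The compound values above are consistent with the way VADER squashes a summed valence x into the range [-1, 1], namely x / sqrt(x² + α) with α = 15. The following is a minimal sketch of that step only, not the full analyzer. It assumes the lexicon rates "love" at 3.2 and "hate" at -2.7 (ratings recalled from the VADER lexicon, not shown anywhere in this notebook) and that no other word in either sentence carries valence.

```python
import math

def normalize(score, alpha=15):
    """Squash a summed valence score into the range [-1, 1]."""
    return score / math.sqrt(score * score + alpha)

# Assumed lexicon ratings (not part of the notebook output)
love_valence = 3.2
hate_valence = -2.7

print(round(normalize(love_valence), 4))
print(round(normalize(hate_valence), 4))
```

Under those assumptions the two printed values should land close to the 0.6369 and -0.5719 reported for 'i love cake' and 'i hate drones'.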


In [5]:
# Rename columns: 'score' -> 'vote', 'permalink' -> 'topic', 'body' -> 'comment'
df.rename(columns={"score": "vote", "permalink": "topic", "body": "comment"}, inplace=True)
df.tail(10)

Unnamed: 0,comment,vote,topic
3249,videri make wonderful milk chocolate product a...,4,court revive nestl child slavery lawsuit
3250,sure tell truth though go step far say iran or...,1,khashoggi disclose saudi use chemical
3251,clue brazil judge comment base exactly,1,brazil elect rightwing candidate jair bolsonaro
3252,great link enjoy hahaha wonder whyu gettin vot...,0,woman blow tunis capital
3253,oppose country military,0,belgium purchase usmade f jet
3254,anti zionism inherently antisemitic defined cr...,1,israel ambassador u say radical leave college
3255,well lot right wing israeli jew one power diff...,4,israel ambassador u say radical leave college
3256,read comment say anything two paragraph follow...,-3,fox friend warns muslim enclave west
3257,israel palestine debate give people socially a...,0,israel ambassador u say radical leave college
3258,high school history enough go event lead trans...,3,brazil fearful lgbt community prepares


In [6]:
# Sneak peek at the distribution of topics in the dataframe
df['topic'].value_counts()

eighth amendment effectively ban               57
un say credible report china hold              51
cost lifesaving heroin withdrawal drug soar    39
fbi contradicts trump claim china hack         38
late trump say misspoke russia                 37
                                               ..
lindsey graham receive campaign donation        1
bolton say u sanction stay russia               1
german budget surplus hit record billion        1
alcohol safe drink global study confirm         1
fox friend warns muslim enclave west            1
Name: topic, Length: 1130, dtype: int64

# Generate VADER compound score for "comment"

In [7]:
df['m_scores'] = df['comment'].apply(lambda comment: sid.polarity_scores(comment))

df['m_compound'] = df['m_scores'].apply(lambda d: d['compound'])

# Label each comment from its compound score:
#   'pos' if compound >  0.09
#   'neg' if compound < -0.09
#   'neu' otherwise
df['m_comscore'] = df['m_compound'].apply(lambda score: 'pos' if score > 0.09 else 'neg' if score < -0.09 else 'neu')
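The nested conditional in the lambda can be hard to read at a glance. A minimal sketch of the same rule as a named function follows; `label_sentiment` and its `threshold` parameter are hypothetical names introduced here for illustration, with the notebook's 0.09 cutoff as the default.

```python
def label_sentiment(compound, threshold=0.09):
    """Map a VADER compound score to 'pos', 'neg', or 'neu'."""
    if compound > threshold:
        return 'pos'
    if compound < -threshold:
        return 'neg'
    return 'neu'

# Compound scores taken from rows displayed in this notebook
print(label_sentiment(0.9531))   # pos
print(label_sentiment(-0.4588))  # neg
print(label_sentiment(0.0772))   # neu (inside the +/-0.09 band)
```

It could be applied in the same way as the lambda, for example `df['m_compound'].apply(label_sentiment)`.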

In [8]:
df.tail(20)

Unnamed: 0,comment,vote,topic,m_scores,m_compound,m_comscore
3239,let honest prolly know mean,4,u military deploy troop southern,"{'neg': 0.0, 'neu': 0.548, 'pos': 0.452, 'comp...",0.5106,pos
3240,yeah reese wholly owned subsidiary hershey lis...,3,court revive nestl child slavery lawsuit,"{'neg': 0.0, 'neu': 0.864, 'pos': 0.136, 'comp...",0.296,pos
3241,gon na back comment anything donald,-1,u military deploy troop southern,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0,neu
3242,muslim commit atrocity attribute ideology indi...,2,brazil fearful lgbt community prepares,"{'neg': 0.0, 'neu': 0.694, 'pos': 0.306, 'comp...",0.296,pos
3243,except actually know live brazil,1,jair bolsonaro elect president brazil,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0,neu
3244,except often actual criticism israel right bas...,1,israel ambassador u say radical leave college,"{'neg': 0.561, 'neu': 0.439, 'pos': 0.0, 'comp...",-0.7506,neg
3245,regular dude thing learn guy talk stuff learn ...,12,russian malware infect u government computer,"{'neg': 0.0, 'neu': 0.734, 'pos': 0.266, 'comp...",0.5719,pos
3246,tell name farmer right label,3,court revive nestl child slavery lawsuit,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0,neu
3247,talk extreme group u antisemitic example sjp r...,20,israel ambassador u say radical leave college,"{'neg': 0.346, 'neu': 0.573, 'pos': 0.08, 'com...",-0.891,neg
3248,evidence every post positive china thread tend...,-6,china allow rhino horn tiger bone use,"{'neg': 0.265, 'neu': 0.623, 'pos': 0.112, 'co...",-0.5994,neg


In [9]:
# Check the distribution of the comment sentiment labels
df['m_comscore'].value_counts()

pos    1420
neg    1281
neu     558
Name: m_comscore, dtype: int64

# Generate VADER compound score for "topic"

In [10]:
df['t_scores'] = df['topic'].apply(lambda topic: sid.polarity_scores(topic))

df['t_compound'] = df['t_scores'].apply(lambda d: d['compound'])

# Same +/-0.09 thresholds as for the comments
df['t_comscore'] = df['t_compound'].apply(lambda score: 'pos' if score > 0.09 else 'neg' if score < -0.09 else 'neu')


In [11]:
df.head(10)

Unnamed: 0,comment,vote,topic,m_scores,m_compound,m_comscore,t_scores,t_compound,t_comscore
0,youd want eat early much eco impact least tend...,1,avoid meat dairy single big way,"{'neg': 0.0, 'neu': 0.874, 'pos': 0.126, 'comp...",0.0772,neu,"{'neg': 0.306, 'neu': 0.694, 'pos': 0.0, 'comp...",-0.296,neg
1,dude like trump policy know want say like poli...,0,trump tape give fbi spanish authority,"{'neg': 0.0, 'neu': 0.445, 'pos': 0.555, 'comp...",0.9531,pos,"{'neg': 0.0, 'neu': 0.794, 'pos': 0.206, 'comp...",0.0772,neu
2,nice policy deluded patriot think america grea...,1,trump administration put steel aluminum,"{'neg': 0.155, 'neu': 0.464, 'pos': 0.381, 'co...",0.9217,pos,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0,neu
3,right stay away whole industry die bother r am...,-3,avoid meat dairy single big way,"{'neg': 0.354, 'neu': 0.417, 'pos': 0.229, 'co...",-0.4588,neg,"{'neg': 0.306, 'neu': 0.694, 'pos': 0.0, 'comp...",-0.296,neg
4,think steak blue cheese dinner edit suck vegan...,-6,avoid meat dairy single big way,"{'neg': 0.132, 'neu': 0.868, 'pos': 0.0, 'comp...",-0.4404,neg,"{'neg': 0.306, 'neu': 0.694, 'pos': 0.0, 'comp...",-0.296,neg
5,lol yeah sure nothing want relinquish grasp eu...,-42,merkel tell obama felt compel run,"{'neg': 0.098, 'neu': 0.319, 'pos': 0.583, 'co...",0.7251,pos,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0,neu
6,border clear separation hamas lead gaza strip ...,3,israel invest hundred million dollar,"{'neg': 0.376, 'neu': 0.53, 'pos': 0.094, 'com...",-0.9287,neg,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0,neu
7,yes relationship north korea look like year hi...,19,eu announces retaliation trump tariff,"{'neg': 0.0, 'neu': 0.47, 'pos': 0.53, 'compou...",0.7845,pos,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0,neu
8,although agree prior poster stretch credulity ...,16,france macron say u tariff illegal,"{'neg': 0.138, 'neu': 0.681, 'pos': 0.181, 'co...",0.2023,pos,"{'neg': 0.474, 'neu': 0.526, 'pos': 0.0, 'comp...",-0.5574,neg
9,capitalism cause innovation technology drive t...,3,trump administration put steel aluminum,"{'neg': 0.0, 'neu': 0.559, 'pos': 0.441, 'comp...",0.9218,pos,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0,neu


In [12]:
# Check the distribution of the topic sentiment labels
df['t_comscore'].value_counts()


neu    1403
neg    1346
pos     510
Name: t_comscore, dtype: int64

In [13]:
df.to_csv("vaderlysis_3.1.csv", index=False)

In [14]:
# Drop the raw score dictionaries; the compound scores and labels are kept
df.drop(['m_scores', 't_scores'], axis=1, inplace=True)

In [15]:
df.head(5)

Unnamed: 0,comment,vote,topic,m_compound,m_comscore,t_compound,t_comscore
0,youd want eat early much eco impact least tend...,1,avoid meat dairy single big way,0.0772,neu,-0.296,neg
1,dude like trump policy know want say like poli...,0,trump tape give fbi spanish authority,0.9531,pos,0.0772,neu
2,nice policy deluded patriot think america grea...,1,trump administration put steel aluminum,0.9217,pos,0.0,neu
3,right stay away whole industry die bother r am...,-3,avoid meat dairy single big way,-0.4588,neg,-0.296,neg
4,think steak blue cheese dinner edit suck vegan...,-6,avoid meat dairy single big way,-0.4404,neg,-0.296,neg


<h2> Use scikit-learn to create a one-hot encoding for the m_comscore and t_comscore features </h2>

<h3> After some consideration, I have decided to drop the 'vote' column, as it has little effect in determining the sentiment of our comments. </h3>
<h3> If it were kept, it would be important to normalize it. </h3>
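For reference, here is a minimal sketch of what normalizing the votes could look like. It assumes z-score standardization, which is one common choice and not something this notebook prescribes. The `standardize` helper is a hypothetical name, and the sample values are the votes of the first ten rows displayed earlier.

```python
import pandas as pd

def standardize(values):
    """Return z-scores: subtract the mean, divide by the population std."""
    return (values - values.mean()) / values.std(ddof=0)

# Votes from the first ten rows shown earlier in the notebook
votes = pd.Series([1, 0, 1, -3, -6, -42, 3, 19, 16, 3], name='vote')
vote_z = standardize(votes)
print(vote_z.round(3))
```

A single heavily downvoted comment such as the -42 above dominates the scale, which is why some form of rescaling would matter if the column were used as a feature.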

In [16]:
# Drop the numeric compound scores and the vote column; only the labels are kept
df.drop(['m_compound', 't_compound', 'vote'], axis=1, inplace=True)

In [17]:
df.rename(columns={"m_comscore": "m_score", "t_comscore": "t_score"}, inplace= True)
df

Unnamed: 0,comment,topic,m_score,t_score
0,youd want eat early much eco impact least tend...,avoid meat dairy single big way,neu,neg
1,dude like trump policy know want say like poli...,trump tape give fbi spanish authority,pos,neu
2,nice policy deluded patriot think america grea...,trump administration put steel aluminum,pos,neu
3,right stay away whole industry die bother r am...,avoid meat dairy single big way,neg,neg
4,think steak blue cheese dinner edit suck vegan...,avoid meat dairy single big way,neg,neg
...,...,...,...,...
3254,anti zionism inherently antisemitic defined cr...,israel ambassador u say radical leave college,neg,neu
3255,well lot right wing israeli jew one power diff...,israel ambassador u say radical leave college,neg,neu
3256,read comment say anything two paragraph follow...,fox friend warns muslim enclave west,neg,pos
3257,israel palestine debate give people socially a...,israel ambassador u say radical leave college,neg,neu
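The one-hot encoding named in the heading of this section is not shown in the cells here. A minimal sketch with scikit-learn's `OneHotEncoder` follows, run on a small stand-in frame that copies the labels of the first five rows. The names `demo`, `labels`, and `one_hot` are introduced here for illustration, and the category list is fixed up front so that every label gets a column even when it is absent from the sample.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Stand-in frame with the same label columns as the notebook's df
demo = pd.DataFrame({
    'm_score': ['neu', 'pos', 'pos', 'neg', 'neg'],
    't_score': ['neg', 'neu', 'neu', 'neg', 'neg'],
})

features = ['m_score', 't_score']
labels = ['neg', 'neu', 'pos']

# One category list per feature keeps the output columns stable
encoder = OneHotEncoder(categories=[labels, labels])
# fit_transform returns a sparse matrix by default; densify it for display
encoded = encoder.fit_transform(demo[features]).toarray()

# Build column names such as 'm_score_neg' from the learned categories
columns = [
    f'{feature}_{category}'
    for feature, categories in zip(features, encoder.categories_)
    for category in categories
]
one_hot = pd.DataFrame(encoded, columns=columns, index=demo.index)
print(one_hot)
```

On the full data, the same encoder could be fitted on `df[['m_score', 't_score']]` and the result joined back to the dataframe by index.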


In [18]:
# Store the cleaned and scored data as CSV
df.to_csv("vader_result_3.2.csv", index=False)

# ...

<h1> End... </h1>