# Sentiment Analysis on the SVOs 

Here I calculate the sentiment of the text surrounding each SVO triplet for which I already have EPA values. This takes several steps:

1) Retrieve the whole document.
2) Create a moving window around the triplet.
3) Extract that text.
4) Conduct sentiment analysis on that text.

Let's begin by importing the necessary libraries. 

In [1]:
import spacy 
from spacy.matcher import Matcher
import textacy
import pandas as pd 
import numpy as np 
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()
nlp = spacy.load('en_core_web_sm')

from spacy.symbols import NOUN, PROPN, VERB
from spacy.tokens import Doc, Span, Token

Now I import the two dataframes: (1) the full-text articles and (2) the triplets for which I already have scores.

In [2]:
# Import full articles (the variable is named `vox`, but this file holds the Breitbart articles)
vox = pd.read_csv("~/Documents/moral_templates/Data/breitbart_articles.csv")
# Drop NAs before continuing with the analysis 
# Remember we did this in the original analysis
vox = vox.dropna(subset=['clean_strings'])
# Import known SVOs 
already_known = pd.read_csv("~/Documents/moral_templates/Data/bb_already_known.csv")

Let's check out these datasets to make sure everything is correct. 

In [3]:
# Check already known dataframe
already_known.head()

| Unnamed: 0 | index | subject | verb | object | start | end | subj_dep | subj_tag | obj_dep | obj_tag | Document | publication | clean_verbs | clean_objs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | group | is urging | americans | 0 | 7 | nsubj | NN | dobj | NNPS | 6 | Breitbart | urge | american |
| 1 | 175 | landrieu | derided | critics | 193 | 202 | nsubj | NNP | dobj | NNS | 10 | Breitbart | deride | critic |
| 2 | 191 | city | serves | tourists | 658 | 664 | nsubj | NN | dobj | NNS | 10 | Breitbart | serve | tourist |
| 3 | 249 | julia hahn | has followed | boss | 177 | 209 | nsubj | NNP | dobj | NN | 14 | Breitbart | follow | boss |
| 4 | 261 | faith | has inspired | men | 54 | 61 | nsubj | NN | dobj | NNS | 16 | Breitbart | inspire | man |


Everything looks okay. 

I am going to write two functions: one that identifies the triplet's document and extracts the text around the triplet, and one that scores the sentiment of that text.

In [4]:
def extract_text(row, padding):
    """Return the text of a padded token window around the triplet in `row`."""
    triplet = already_known.iloc[row]
    # Parse the article that the triplet was extracted from
    doc = nlp(vox.iloc[triplet['Document']]['clean_strings'])
    # Clamp the padded window to the document boundaries
    start = max(int(triplet['start']) - padding, 0)
    end = min(int(triplet['end']) + padding + 2, len(doc))
    return doc[start:end].text


def extract_sentiment(row, padding):
    """Return the VADER polarity scores for the window around the triplet in `row`."""
    return sid.polarity_scores(extract_text(row, padding))
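
To make the boundary handling easier to check, here is a minimal sketch of the same clamping logic on plain integers, without spaCy. The helper name `window_bounds` and the document lengths (100 and 220 tokens) are assumptions made up for illustration; the `start` and `end` values are taken from the first two rows shown earlier.

```python
def window_bounds(start, end, padding, n_tokens):
    """Clamp a padded [start, end) token window to a document of n_tokens tokens."""
    lo = max(start - padding, 0)            # never go before the first token
    hi = min(end + padding + 2, n_tokens)   # never go past the last token
    return lo, hi

# Document lengths below are hypothetical, chosen to trigger each clamp
print(window_bounds(0, 7, 50, 100))      # window is cut off at the start of the document
print(window_bounds(193, 202, 50, 220))  # window is cut off at the end of the document
```

The first call is clamped at 0 on the left, the second at the document length on the right, which mirrors the two `if`/`else` branches in the functions above.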

Run `extract_sentiment` across all triplets, padding each event with 50 tokens on either side (the end of the window gets 2 extra tokens).

This takes a while. 

In [5]:
list_sentiments = []

for x in range(len(already_known)):
    sent = extract_sentiment(row = x, padding = 50)
    if (x % 100 == 0):
        print(f'working on row {x}')
    list_sentiments.append(sent)

working on row 0
working on row 100
working on row 200
working on row 300
working on row 400
working on row 500
working on row 600
working on row 700
working on row 800
working on row 900
working on row 1000
working on row 1100
working on row 1200
working on row 1300
working on row 1400
working on row 1500
working on row 1600
working on row 1700
working on row 1800
working on row 1900
working on row 2000
working on row 2100
working on row 2200
working on row 2300
working on row 2400
working on row 2500
working on row 2600
working on row 2700
working on row 2800
working on row 2900
working on row 3000
working on row 3100
working on row 3200
working on row 3300
working on row 3400
working on row 3500
working on row 3600
working on row 3700
working on row 3800
working on row 3900
working on row 4000
working on row 4100
working on row 4200
working on row 4300
working on row 4400
working on row 4500
working on row 4600
working on row 4700
working on row 4800
working on row 4900
working on r
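
Much of the running time probably comes from re-parsing the full article for every triplet, even though several triplets can share a document (rows 1 and 2 above both come from document 10). One possible speed-up, sketched here under assumptions, is to cache the parsed document. The `texts` dictionary and the whitespace tokenizer are stand-ins so the sketch runs without spaCy; in the notebook the cached function would call `nlp` on `vox['clean_strings']` instead.

```python
from functools import lru_cache

# Hypothetical stand-in corpus; in the notebook this would be vox['clean_strings']
texts = {0: "the group is urging americans to act", 1: "the city serves tourists"}
parse_calls = []  # records which documents were actually parsed

@lru_cache(maxsize=None)
def parse_document(doc_id):
    """Parse a document once; repeated calls return the cached result."""
    parse_calls.append(doc_id)
    # Stand-in for nlp(text): plain whitespace tokenization, NOT spaCy
    return tuple(texts[doc_id].split())

# Three triplets, two of which come from document 0
for doc_id in [0, 0, 1]:
    tokens = parse_document(doc_id)

print(parse_calls)  # each document id appears only once
```

An unbounded cache keeps every parsed document in memory, so for a large corpus a finite `maxsize` (or sorting the triplets by document) may be the safer choice.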

Split the scores into one list per sentiment measure so that we can build a tidy dataset and later add it to our original data frame.

In [6]:
negative_list = []
neutral_list = []
positive_list = []
compound_list = []

for sents in list_sentiments: 
    neg = sents['neg']
    neu = sents['neu']
    pos = sents['pos']
    com = sents['compound']
    negative_list.append(neg)
    neutral_list.append(neu)
    positive_list.append(pos)
    compound_list.append(com) 

Turn the lists into a dictionary and then into a data frame.

In [7]:
sent_dictionary = {'negative': negative_list,
                   'neutral': neutral_list,
                   'positive': positive_list,
                   'compound': compound_list}
sent_df = pd.DataFrame(sent_dictionary)
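
As an aside, the four lists and the dictionary can be built in one step with a dictionary comprehension. This is only a sketch of an alternative, not a change to the analysis: the scores below are made-up values in the shape that `polarity_scores` returns, and the names `example_sentiments`, `column_names` and `columns` are introduced here for illustration.

```python
# Made-up scores for illustration, in the shape returned by polarity_scores
example_sentiments = [
    {'neg': 0.1, 'neu': 0.7, 'pos': 0.2, 'compound': 0.3},
    {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0},
]

# Map VADER's short keys to the longer column names used in this notebook
column_names = {'neg': 'negative', 'neu': 'neutral',
                'pos': 'positive', 'compound': 'compound'}

# One list per measure, keyed by the final column name
columns = {new: [scores[old] for scores in example_sentiments]
           for old, new in column_names.items()}

print(columns)
```

The resulting dictionary has the same layout as `sent_dictionary`, so it could be passed to `pd.DataFrame` in the same way.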


Add new dataframe to our existing data.


In [8]:
df = pd.concat([already_known, sent_df], axis = 1)
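
Note that `pd.concat` with `axis = 1` aligns rows on the index rather than on position. Here `already_known` comes straight from `read_csv`, so its default index should line up with `sent_df`. The sketch below, using made-up frames, shows what would happen if rows had been filtered beforehand, and how `reset_index(drop=True)` guards against it.

```python
import pandas as pd

# Made-up frames for illustration only
left = pd.DataFrame({'subject': ['group', 'city']}, index=[5, 6])  # e.g. after filtering rows
right = pd.DataFrame({'compound': [0.3, -0.1]})                    # default index 0, 1

# Indexes differ, so the rows do not line up and missing values appear
misaligned = pd.concat([left, right], axis=1)

# Resetting the index makes the concatenation positional again
aligned = pd.concat([left.reset_index(drop=True), right], axis=1)

print(len(misaligned), len(aligned))
```

A quick check such as `len(df) == len(already_known)` after the concatenation would catch this kind of mismatch.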

Finally, save the new dataset.

In [9]:
df.to_csv("~/Documents/moral_templates/Data/known_triplets_sentiments_bb.csv")