# INFOMCDMMC Critical Data Mining of Media Culture

## Utrecht University, MSc Applied Data Science


### Team members:
* Meagan Loerakker, m.b.loerakker@students.uu.nl
* Celesta Terwisscha van Scheltinga, c.c.m.terwisschavanscheltinga@students.uu.nl
* Nina Alblas, n.m.alblas@students.uu.nl
* Berber van Drunen, b.p.vandrunen@students.uu.nl
* Debarupa Roy Choudhury, d.roychoudhury@students.uu.nl

# Comparing sentiment analysis results

In [44]:
# Stats
import pandas as pd

# Support
import re
import csv
import string

In [45]:
sentiments_df = pd.read_csv("data/vader_textblob_sentiments.csv").iloc[:, 1:]
sentiments_df.head(2)

Unnamed: 0,filename,outlet,title,description,datetime,body,year,month,preprocessed_description,preprocessed_body,vader_desc_sentiment,vader_body_sentiment,textblob_desc_sentiment_no_cutoff,textblob_body_sentiment_no_cutoff,textblob_desc_sentiment,textblob_body_sentiment
0,2010-06-gears-of-war-3-beast.html,Wired,Gears of War 3 Co-op Makes Beasts of Gamers,"LOS ANGELES — Back in 2008, Gears of War 2 int...",2010-06-17 16:22:00.000,"LOS ANGELES – Back in 2008, Gears of War 2 in...",2010,6,los angeles gear war introduced horde mode co ...,los angeles gear war introduced horde mode co ...,0,-1,0,0,1,0
1,sponsored-story-innovating-for-the-individual....,Wired,WIRED Brand Lab | Innovating for the Individual,What every leader can learn from the technolog...,2021-08-27 12:14:31.296,Innovative technology is making healthcare mor...,2021,8,leader learn transforming healthcare,innovative making healthcare personal bit inge...,0,1,0,0,0,1


In [46]:
# load manual sentiment analysis (200 articles)
all_manual_coding = pd.read_csv("data/sentiment_labeling.csv", index_col=0)
all_manual_coding

Unnamed: 0,description,Berber,Celesta,Debarupa,Meagan,Nina
13369,When the Food and Drug Administration recently...,1,1,1,1,1
12938,Apple CEO Tim Cook surprised everyone at this ...,1,1,1,1,1
13322,"Over the last few days, an unknown Go player n...",-1,-1,0,0,0
13201,A devious device looking suspiciously like the...,-1,-1,-1,-1,-1
14670,"By tracking how people live their values, busi...",0,0,0,0,0
...,...,...,...,...,...,...
1240,A debate is raging over the social media giant...,x,x,x,x,0
1937,Opinion: Kids today have an online presence st...,x,x,x,x,-1
16121,New report heralds mass data monitoring in hig...,x,x,x,x,0
13541,"Speaking in Washington, D.C. earlier today, fo...",x,x,x,x,0


In [47]:
# get the articles we manually labeled together
shared_coding = all_manual_coding.iloc[:20, :]
individual_coding = all_manual_coding.iloc[20:, :]
shared_coding.head(2)

Unnamed: 0,description,Berber,Celesta,Debarupa,Meagan,Nina
13369,When the Food and Drug Administration recently...,1,1,1,1,1
12938,Apple CEO Tim Cook surprised everyone at this ...,1,1,1,1,1


In [48]:
# get the articles we labelled unanimously
uni_coding = shared_coding.apply(lambda x : x.Berber == x.Celesta == x.Debarupa == x.Meagan == x.Nina, axis=1)
shared_uni_coding = shared_coding[uni_coding]
shared_uni_coding

Unnamed: 0,description,Berber,Celesta,Debarupa,Meagan,Nina
13369,When the Food and Drug Administration recently...,1,1,1,1,1
12938,Apple CEO Tim Cook surprised everyone at this ...,1,1,1,1,1
13201,A devious device looking suspiciously like the...,-1,-1,-1,-1,-1
14670,"By tracking how people live their values, busi...",0,0,0,0,0
14675,Furhat Robotics is on a quest to create a mach...,0,0,0,0,0
17394,"Frightening, informative and skeptical takes o...",-1,-1,-1,-1,-1
17607,"In “Uncharted,” two former Harvard colleagues ...",0,0,0,0,0
13703,Australia has rolled out a pilot program using...,0,0,0,0,0
13840,Google has faced widespread public backlash an...,-1,-1,-1,-1,-1
13817,A Canadian grocery chain says its introducing ...,0,0,0,0,0
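
The unanimity filter above can also be written without a row-wise lambda. The following is a minimal sketch on a hypothetical toy frame (`toy_shared` and its values are illustrative, not the project data); it assumes every coder column is filled in for the shared articles.

```python
import pandas as pd

# Toy stand-in for shared_coding: one text column plus one column per coder.
toy_shared = pd.DataFrame(
    {
        "description": ["a", "b", "c"],
        "Berber": [1, -1, 0],
        "Celesta": [1, -1, 0],
        "Debarupa": [1, 0, 0],
        "Meagan": [1, 0, 0],
        "Nina": [1, 0, 0],
    },
    index=[13369, 13322, 14670],
)

coder_columns = ["Berber", "Celesta", "Debarupa", "Meagan", "Nina"]

# A row is unanimous when its coder columns hold exactly one distinct value.
is_unanimous = toy_shared[coder_columns].nunique(axis=1) == 1
toy_unanimous = toy_shared[is_unanimous]

print(toy_unanimous.index.tolist())  # [13369, 14670]
```

Selecting the coder columns by name avoids comparing each pair of columns by hand and keeps working if another coder is added to the list.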


In [49]:
# combine unanimous shared labels and individual labels
manual_coding = pd.concat([shared_uni_coding, individual_coding])
manual_coding

Unnamed: 0,description,Berber,Celesta,Debarupa,Meagan,Nina
13369,When the Food and Drug Administration recently...,1,1,1,1,1
12938,Apple CEO Tim Cook surprised everyone at this ...,1,1,1,1,1
13201,A devious device looking suspiciously like the...,-1,-1,-1,-1,-1
14670,"By tracking how people live their values, busi...",0,0,0,0,0
14675,Furhat Robotics is on a quest to create a mach...,0,0,0,0,0
...,...,...,...,...,...,...
1240,A debate is raging over the social media giant...,x,x,x,x,0
1937,Opinion: Kids today have an online presence st...,x,x,x,x,-1
16121,New report heralds mass data monitoring in hig...,x,x,x,x,0
13541,"Speaking in Washington, D.C. earlier today, fo...",x,x,x,x,0


In [50]:
# create new column with all manual sentiments combined
manual_coding["All"] = manual_coding.iloc[:, 1:].apply(lambda row : row.loc[row != "x"].values[0], axis=1)
manual_coding = manual_coding.sort_index()
manual_coding

Unnamed: 0,description,Berber,Celesta,Debarupa,Meagan,Nina,All
784,Humans have been snapping pics of their plates...,x,x,x,1,x,1
786,A month-long battle between gangs of green and...,0,x,x,x,x,0
890,Just because Idris Elba does it doesn't make i...,x,-1,x,x,x,-1
1237,Though having Elmo run the show would be fun.,0,x,x,x,x,0
1240,A debate is raging over the social media giant...,x,x,x,x,0,0
...,...,...,...,...,...,...,...
17418,Here are some trends to watch.,0,x,x,x,x,0
17529,Although banks are hiring armies of legal and ...,x,x,x,-1,x,-1
17607,"In “Uncharted,” two former Harvard colleagues ...",0,0,0,0,0,0
17677,There are lessons to be learned — and some ben...,x,x,x,1,x,1
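
To make the "first label that is not `x`" rule concrete, here is a small worked example on hypothetical rows (`toy_manual` is illustrative). It assumes, as in the table above, that `x` marks "not coded by this person" and that every row has at least one real label; a row consisting only of `x` would raise an `IndexError`.

```python
import pandas as pd

# Toy stand-in for manual_coding: labels are strings because of the "x" marker.
toy_manual = pd.DataFrame(
    {
        "description": ["shared article", "coded by Meagan", "coded by Nina"],
        "Berber": ["0", "x", "x"],
        "Celesta": ["0", "x", "x"],
        "Debarupa": ["0", "x", "x"],
        "Meagan": ["0", "1", "x"],
        "Nina": ["0", "x", "-1"],
    }
)

coder_columns = ["Berber", "Celesta", "Debarupa", "Meagan", "Nina"]

# For each row, keep the first label that is not the "x" placeholder.
toy_manual["All"] = toy_manual[coder_columns].apply(
    lambda row: row.loc[row != "x"].values[0], axis=1
)

# The combined labels are still strings; cast them before comparing with numeric columns.
combined = toy_manual["All"].astype("int64")
print(combined.tolist())  # [0, 1, -1]
```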


In [51]:
# get the articles that were manually coded from sentiments df
sentiments_df_manual = sentiments_df[sentiments_df.index.isin(manual_coding.index)] 
sentiments_df_manual.head(2)

Unnamed: 0,filename,outlet,title,description,datetime,body,year,month,preprocessed_description,preprocessed_body,vader_desc_sentiment,vader_body_sentiment,textblob_desc_sentiment_no_cutoff,textblob_body_sentiment_no_cutoff,textblob_desc_sentiment,textblob_body_sentiment
784,story-grand-theft-auto-v-purple-green-alien-wa...,Wired,An Alien War Took Over Grand Theft Auto V. It ...,A month-long battle between gangs of green and...,2020-05-15 17:14:00.000,Adam Long was scrolling through the Grand Thef...,2020,5,month long battle gang green purple alien spil...,adam long scrolling grand theft auto v subredd...,-1,-1,0,0,-1,-1
786,story-control-review.html,Wired,'Control' Is a Paranoiac's Dream Turned Into a...,"On the surface, the game feels cold and brutal...",2019-08-29 16:32:00.000,It's easy for videogames to be surreal. By the...,2019,8,surface game feel cold brutal alive,easy videogames surreal nature design game goo...,-1,1,0,0,-1,1


In [52]:
# Add manual coding to the sentiments df; assign() aligns on the index and returns a new frame,
# which avoids writing into a slice of sentiments_df
sentiments_df_manual = sentiments_df_manual.assign(manual_coding=manual_coding["All"].astype("int64"))
sentiments_df_manual.head(2)

Unnamed: 0,filename,outlet,title,description,datetime,body,year,month,preprocessed_description,preprocessed_body,vader_desc_sentiment,vader_body_sentiment,textblob_desc_sentiment_no_cutoff,textblob_body_sentiment_no_cutoff,textblob_desc_sentiment,textblob_body_sentiment,manual_coding
784,story-grand-theft-auto-v-purple-green-alien-wa...,Wired,An Alien War Took Over Grand Theft Auto V. It ...,A month-long battle between gangs of green and...,2020-05-15 17:14:00.000,Adam Long was scrolling through the Grand Thef...,2020,5,month long battle gang green purple alien spil...,adam long scrolling grand theft auto v subredd...,-1,-1,0,0,-1,-1,1
786,story-control-review.html,Wired,'Control' Is a Paranoiac's Dream Turned Into a...,"On the surface, the game feels cold and brutal...",2019-08-29 16:32:00.000,It's easy for videogames to be surreal. By the...,2019,8,surface game feel cold brutal alive,easy videogames surreal nature design game goo...,-1,1,0,0,-1,1,0
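
In the outputs above, index 784 shows a different description in the manual coding table than in the sentiments table, so it may be worth confirming that both frames use the same index before combining them by index. A minimal sketch of such a check, on hypothetical toy frames (`toy_sentiments`, `toy_labels` and their contents are illustrative):

```python
import pandas as pd

# Toy analyzer output with a default RangeIndex.
toy_sentiments = pd.DataFrame(
    {"description": ["text A", "text B", "text C"], "vader_desc_sentiment": [1, 0, -1]},
    index=[0, 1, 2],
)

# Toy manual labels whose index 1 refers to a different article.
toy_labels = pd.DataFrame(
    {"description": ["text A", "text C"], "All": [1, -1]},
    index=[0, 1],
)

# Join on the index and keep both description columns side by side.
joined = toy_sentiments.join(
    toy_labels, how="inner", lsuffix="_sentiments", rsuffix="_labels"
)

# Rows whose descriptions differ were paired with the wrong article.
mismatched = joined[joined["description_sentiments"] != joined["description_labels"]]
print(mismatched.index.tolist())  # [1]
```

An empty result would indicate that the two indices refer to the same articles; any listed index points at a label attached to a different text.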


In [53]:
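def compare_sentiments(df, analyzer_results):
    """
    Compares the 'analyzer_results' column of df with the 'manual_coding'
    column and returns the fraction of rows where the two labels agree,
    rounded to two decimals and formatted as a string.
    """

    # number of articles where the analyzer label equals the manual label
    matches = (df[analyzer_results] == df["manual_coding"]).sum()
    total = df.shape[0]

    return f"{round(matches / total, 2)}"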
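
A single agreement fraction does not show which labels an analyzer confuses. As a hedged illustration (toy values, not the project data), the same comparison can be broken down per class with `pd.crosstab`:

```python
import pandas as pd

# Toy labels: manual coding versus one analyzer column.
toy = pd.DataFrame(
    {
        "manual_coding": [1, 0, -1, 0, 1, -1],
        "vader_desc_sentiment": [1, -1, -1, 0, 0, -1],
    }
)

# Fraction of rows where analyzer and manual label agree.
agreement = float((toy["vader_desc_sentiment"] == toy["manual_coding"]).mean())

# Rows are manual labels, columns are analyzer labels; the diagonal holds the agreements.
confusion = pd.crosstab(
    toy["manual_coding"],
    toy["vader_desc_sentiment"],
    rownames=["manual"],
    colnames=["analyzer"],
)

print(round(agreement, 2))  # 0.67
print(confusion)
```

Off-diagonal cells show, for example, how often a manually neutral description was scored as negative.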

In [54]:
# vader on descriptions vs manual coding
vader_desc = compare_sentiments(sentiments_df_manual, "vader_desc_sentiment")

In [55]:
# vader on bodies vs manual coding
vader_body = compare_sentiments(sentiments_df_manual, "vader_body_sentiment")

In [65]:
# The code that ran VADER on text without pre-processing has been lost;
# the proportions below are the results from that run, entered by hand.

# vader on descriptions vs manual coding (not pre-processed)
vader_desc_no_preprocess = 0.29

# vader on bodies vs manual coding (not pre-processed)
vader_body_no_preprocess = 0.21

In [56]:
# textblob on descriptions vs manual coding
textblob_desc = compare_sentiments(sentiments_df_manual, "textblob_desc_sentiment_no_cutoff")

In [57]:
# textblob on bodies vs manual coding
textblob_body = compare_sentiments(sentiments_df_manual, "textblob_body_sentiment_no_cutoff")

In [58]:
# textblob on descriptions (edited cut-off points) vs manual coding
textblob_desc_cutoff = compare_sentiments(sentiments_df_manual, "textblob_desc_sentiment")

In [59]:
# textblob on bodies (edited cut-off points) vs manual coding
textblob_body_cutoff = compare_sentiments(sentiments_df_manual, "textblob_body_sentiment")

In [66]:
compared_sentiments = pd.DataFrame(
    {
        "VADER": [vader_desc, vader_body],
        "VADER (un-preprocessed)": [vader_desc_no_preprocess, vader_body_no_preprocess],
        "TextBlob": [textblob_desc, textblob_body],
        "TextBlob (edited cut-off)": [textblob_desc_cutoff, textblob_body_cutoff],
    },
    index=["Description", "Body"],
)
compared_sentiments

Unnamed: 0,VADER,VADER (un-preprocessed),TextBlob,TextBlob (edited cut-off)
Description,0.32,0.29,0.47,0.34
Body,0.25,0.21,0.49,0.3
