# NRC Emotional Lexicon

This is the [NRC Emotional Lexicon](http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm): "The NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). The annotations were manually done by crowdsourcing."

I don't trust it, but everyone uses it.

In [1]:
import pandas as pd

In [3]:
filepath = "NRC-Emotion-Lexicon-Wordlevel-v0.92.txt"
emolex_df = pd.read_csv(filepath,  names=["word", "emotion", "association"], skiprows=45, sep='\t')
emolex_df.head(12)

Unnamed: 0,word,emotion,association
0,abandonment,joy,0
1,abandonment,negative,1
2,abandonment,positive,0
3,abandonment,sadness,1
4,abandonment,surprise,1
5,abandonment,trust,0
6,abate,anger,0
7,abate,anticipation,0
8,abate,disgust,0
9,abate,fear,0


Seems kind of simple. A column for a word, a column for an emotion, and whether it't associated or not. You see "aback aback aback aback" because there's a row for every word-emotion pair.

## What emotions are covered?

Let's look at the 'emotion' column. What can we talk about?

In [4]:
emolex_df.emotion.unique()

array(['joy', 'negative', 'positive', 'sadness', 'surprise', 'trust',
       'anger', 'anticipation', 'disgust', 'fear'], dtype=object)

In [5]:
emolex_df.emotion.value_counts()

trust           14178
joy             14178
surprise        14178
sadness         14178
negative        14178
positive        14178
disgust         14177
anticipation    14177
anger           14177
fear            14177
Name: emotion, dtype: int64

## How many words does each emotion have?

Each emotion doesn't have 14182 words associated with it, unfortunately! `1` means "is associated" and `0` means "is not associated."

We're only going to care about "is associated."

In [6]:
emolex_df[emolex_df.association == 1].emotion.value_counts()

negative        3322
positive        2312
fear            1473
anger           1245
trust           1230
sadness         1189
disgust         1058
anticipation     839
joy              689
surprise         534
Name: emotion, dtype: int64

In theory things could be *kind of* angry or *kind of* joyous, but it doesn't work like that. If you want to spend a few hundred dollars on Mechnical Turk, though, *your own personal version can.*

## What if I just want the angry words?

In [7]:
emolex_df[(emolex_df.association == 1) & (emolex_df.emotion == 'anger')].word

126             abhor
136         abhorrent
226           abolish
256       abomination
586             abuse
             ...     
141176       wrongful
141186        wrongly
141426           yell
141456           yelp
141596          youth
Name: word, Length: 1245, dtype: object

## Reshaping

You can also reshape the data in order to look at it a slightly different way

In [8]:
emolex_words = emolex_df.pivot(index='word', columns='emotion', values='association').reset_index()
emolex_words.head()

emotion,word,anger,anticipation,disgust,fear,joy,negative,positive,sadness,surprise,trust
0,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,abandonment,,,,,0.0,1.0,0.0,1.0,1.0,0.0
2,abate,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,abatement,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,abba,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0


You can now pull out individual words...

In [9]:
# If you didn't reset_index you could do this more easily
# by doing emolex_words.loc['charitable']
emolex_words[emolex_words.word == 'charitable']

emotion,word,anger,anticipation,disgust,fear,joy,negative,positive,sadness,surprise,trust
1998,charitable,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0


...or individual emotions....

In [10]:
emolex_words[emolex_words.anger == 1].head()

emotion,word,anger,anticipation,disgust,fear,joy,negative,positive,sadness,surprise,trust
14,abhor,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0
15,abhorrent,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0
24,abolish,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
27,abomination,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0
60,abuse,1.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0


...or multiple emotions!

In [11]:
emolex_words[(emolex_words.joy == 1) & (emolex_words.negative == 1)].head()

emotion,word,anger,anticipation,disgust,fear,joy,negative,positive,sadness,surprise,trust
58,abundance,0.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0
1015,balm,0.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0
1379,boisterous,1.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0
1913,celebrity,1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0
2001,charmed,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0


The useful part is going to be just getting words for a **single emotion.**

In [12]:
# Angry words
emolex_words[emolex_words.anger == 1].word

14             abhor
15         abhorrent
24           abolish
27       abomination
60             abuse
            ...     
14118       wrongful
14119        wrongly
14143           yell
14146           yelp
14160          youth
Name: word, Length: 1245, dtype: object

In [13]:
filepath = "Twitter_RunDisney201602_Top1000.csv"
test_df = pd.read_csv(filepath)

In [14]:
test_df.head()

Unnamed: 0,date,username,to,replies,retweets,favorites,text,geo,mentions,hashtags,id,permalink
0,2016-02-29 23:53:38,KristenTutas,,0,0,2,Gave in and bought my finisher photo. #runDisn...,,,#runDisney,704454510274617344,https://twitter.com/KristenTutas/status/704454...
1,2016-02-29 23:43:20,loki1891,,0,0,0,New day. New week. New run. Feeling good. #Emb...,,,#EmbracingtheGrind #DarkSideChallenge #runDisn...,704451920367390722,https://twitter.com/loki1891/status/7044519203...
2,2016-02-29 23:02:03,PixieDustSaving,,0,20,6,Think you can't #runDisney? then read Christin...,,,#runDisney #cignaruntogether #running #5k #Disney,704441531932012545,https://twitter.com/PixieDustSaving/status/704...
3,2016-02-29 22:26:13,LoveDisneyRun,,0,1,1,Happy leap day! Even though I am not a usual l...,,,#runDisney,704432513595412480,https://twitter.com/LoveDisneyRun/status/70443...
4,2016-02-29 20:11:19,AceGordon,DrawPlayDave,0,0,0,@DrawPlayDave ...or a #RunDisney person. Not w...,,@DrawPlayDave,#RunDisney,704398565146284032,https://twitter.com/AceGordon/status/704398565...


In [15]:
test_df['text']

0      Gave in and bought my finisher photo. #runDisn...
1      New day. New week. New run. Feeling good. #Emb...
2      Think you can't #runDisney? then read Christin...
3      Happy leap day! Even though I am not a usual l...
4      @DrawPlayDave ...or a #RunDisney person. Not w...
                             ...                        
995    I'm ready! #PHM2016 #princesshalfmarathon2016 ...
996    Run Happily Ever After #runDisney PM Ed. http:...
997    Starting to get excited to add these to my col...
998    Good luck tomorrow to all the runners! #Prince...
999    Good luck tomorrow to all the runners! #Prince...
Name: text, Length: 1000, dtype: object

In [70]:
#Convert text to lowercase
text = test_df['text'].str.lower()

In [91]:
#remove punctuation
text = text.str.replace(".","")
text = text.str.replace(",","")
text = text.str.replace("?","")
text = text.str.replace("!","")
text = text.str.replace(";","")
text = text.str.replace("-","")
text = text.str.replace("(","")
text = text.str.replace(")","")
text = text.str.replace("'","")

#remove "rt"
text = text.str.replace("rt","")
#remove usernames
text = text.str.replace("^@\w+", "")

#remove links
text = text.str.replace("^http\w+", "")

#remove leading spaces
text = text.str.replace("^ ", "")

#remove trailing spaces
text = text.str.replace(" $", "")

#remove multiple spaces
text = text.str.replace("  ", " ")

#remove stop words
#

In [100]:
tweet_list = text.str.split(" ",expand=False)
tweet_list.head()

0    [gave, in, and, bought, my, finisher, photo, #...
1    [new, day, new, week, new, run, feeling, good,...
2    [think, you, cant, #rundisney, then, read, chr...
3    [happy, leap, day, even, though, i, am, not, a...
4    [or, a, #rundisney, person, not, weirdjustdedi...
Name: text, dtype: object

In [141]:
emotions = emolex_words.columns.values
emotions = emotions[1:len(emotions)-1]
col = ['tweet']
for e in emotions:
    col.append(e)
    
sentiment_df = pd.DataFrame(columns=col)
tweet_df = pd.DataFrame(columns=emotions)

In [162]:
emolex_words = emolex_words.fillna(0)
for tweet in tweet_list: #[['abandonment','great']]: #tweet_list:
    for i in tweet:
        if i != "None":
            #print(i)
            tweet_df = tweet_df.append(emolex_words.loc[emolex_words['word']==i][emotions],sort=False)
    sentiment_df = sentiment_df.append(tweet_df.sum(),sort='False', ignore_index=True)
    tweet_df = tweet_df[0:0]

In [106]:
col[1:len(col)-1]

array(['anger', 'anticipation', 'disgust', 'fear', 'joy', 'negative',
       'positive', 'sadness', 'surprise'], dtype=object)

In [113]:
emolex_words = emolex_words.fillna(0)
emolex_words.head()

emotion,word,anger,anticipation,disgust,fear,joy,negative,positive,sadness,surprise,trust
0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,abandonment,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0
2,abate,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,abatement,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,abba,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0


In [120]:
emolex_words.loc[emolex_words['word']=='abandonment']

emotion,anger,anticipation,disgust,fear,joy,negative,positive,sadness,surprise
1,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0


In [156]:
sentiment_df.head()
sentiment_df = sentiment_df.drop(columns=['tweet'])

In [165]:
sentiment_df.mean()

anger           0.047406
anticipation    0.227191
disgust         0.022958
fear            0.052475
joy             0.210197
negative        0.083184
positive        0.365832
sadness         0.037269
surprise        0.090936
dtype: float64

In [166]:
sentiment_df.sum()

anger            159.0
anticipation     762.0
disgust           77.0
fear             176.0
joy              705.0
negative         279.0
positive        1227.0
sadness          125.0
surprise         305.0
dtype: float64

In [164]:
sentiment_df

Unnamed: 0,anger,anticipation,disgust,fear,joy,negative,positive,sadness,surprise
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
4,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0
...,...,...,...,...,...,...,...,...,...
3349,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3350,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0
3351,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0
3352,0.0,3.0,0.0,0.0,2.0,0.0,2.0,0.0,2.0
