# NRC Emotion Lexicon

This is the [NRC Emotion Lexicon](http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm): "The NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). The annotations were manually done by crowdsourcing."

I don't trust it, but everyone uses it.

In [1]:
import pandas as pd

In [7]:
filepath = "NRC-Emotion-Lexicon-v0.92/NRC-emotion-lexicon-wordlevel-alphabetized-v0.92.txt"
emolex_df = pd.read_csv(filepath, names=["word", "emotion", "association"], skiprows=45, sep='\t')
emolex_df.head()

,word,emotion,association
0,aback,anger,0
1,aback,anticipation,0
2,aback,disgust,0
3,aback,fear,0
4,aback,joy,0


Seems kind of simple. A column for the word, a column for the emotion, and a column for whether they're associated or not. You see "aback aback aback aback" because there's a row for every word-emotion pair.

## What emotions are covered?

Let's look at the 'emotion' column. What can we talk about?

In [9]:
emolex_df.emotion.unique()

array(['anger', 'anticipation', 'disgust', 'fear', 'joy', 'negative',
       'positive', 'sadness', 'surprise', 'trust'], dtype=object)

In [11]:
emolex_df.emotion.value_counts()

fear            14182
anticipation    14182
disgust         14182
surprise        14182
sadness         14182
joy             14182
negative        14182
trust           14182
anger           14182
positive        14182
Name: emotion, dtype: int64

## How many words does each emotion have?

No emotion actually has 14182 words associated with it, unfortunately! That number is just the count of rows per emotion, one for every word in the lexicon. `1` means "is associated" and `0` means "is not associated."

We're only going to care about "is associated."

In [13]:
emolex_df[emolex_df.association == 1].emotion.value_counts()

negative        3324
positive        2312
fear            1476
anger           1247
trust           1231
sadness         1191
disgust         1058
anticipation     839
joy              689
surprise         534
Name: emotion, dtype: int64

In theory things could be *kind of* angry or *kind of* joyous, but this lexicon doesn't work like that: every association is a flat 0 or 1. If you want to spend a few hundred dollars on Mechanical Turk, though, *your own personal version can.*
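Since the association column only ever holds 0 or 1, you can also get the per-emotion counts by summing it instead of filtering. Here's a minimal sketch on a toy frame (`toy_df` is a made-up stand-in for `emolex_df`, with values copied from rows shown in this notebook) rather than the full lexicon:

```python
import pandas as pd

# Toy stand-in for emolex_df. The 0/1 values are copied from rows
# displayed in this notebook, not loaded from the lexicon file.
rows = [
    ("aback", "anger", 0), ("aback", "fear", 0), ("aback", "joy", 0),
    ("abandoned", "anger", 1), ("abandoned", "fear", 1), ("abandoned", "joy", 0),
    ("abhor", "anger", 1), ("abhor", "fear", 1), ("abhor", "joy", 0),
]
toy_df = pd.DataFrame(rows, columns=["word", "emotion", "association"])

# Approach 1: keep only the associated rows, then count rows per emotion.
by_filter = toy_df[toy_df.association == 1].emotion.value_counts()

# Approach 2: add up the 0/1 column per emotion.
by_sum = toy_df.groupby("emotion").association.sum()

print(by_filter)
print(by_sum)
```

The one difference: an emotion with no associated words (joy, in this toy data) drops out of the filtered count entirely, while the sum keeps it around with a 0.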

## What if I just want the angry words?

In [15]:
emolex_df[(emolex_df.association == 1) & (emolex_df.emotion == 'anger')].word

30          abandoned
40        abandonment
170             abhor
180         abhorrent
270           abolish
300       abomination
630             abuse
1120         accursed
1130       accusation
1150          accused
1160          accuser
1170         accusing
1470       actionable
1650            adder
2390        adversary
2400          adverse
2410        adversity
2500         advocacy
2840          affront
2920        aftermath
3030       aggravated
3040      aggravating
3050      aggravation
3080       aggression
3090       aggressive
3100        aggressor
3140         agitated
3150        agitation
3190            agony
3570       alcoholism
             ...     
138470        warlike
138530           warp
138600        warrior
138680         wasted
138690       wasteful
139330          wench
139550           whip
139950        willful
140020          wimpy
140030          wince
140220       wireless
140290          witch
140300     witchcraft
140610            wop
140640    
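That result is a pandas Series that still carries the original row numbers. If all you want is to ask "is this word angry?", a plain Python `set` is probably easier to work with. A rough sketch, again on a toy frame (`toy_df` and `angry_words` are names I'm making up here):

```python
import pandas as pd

# Toy stand-in for emolex_df (values copied from rows shown in this notebook).
toy_df = pd.DataFrame(
    [
        ("abacus", "anger", 0),
        ("abandoned", "anger", 1),
        ("abhor", "anger", 1),
        ("abandoned", "joy", 0),
    ],
    columns=["word", "emotion", "association"],
)

# Same filter as above, but keep just the words and drop the row numbers.
is_angry = (toy_df.association == 1) & (toy_df.emotion == "anger")
angry_words = set(toy_df.loc[is_angry, "word"])

print("abhor" in angry_words)   # True
print("abacus" in angry_words)  # False
```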

## Reshaping

You can also reshape the data to look at it in a slightly different way: one row per word, one column per emotion.

In [21]:
reshaped = emolex_df.pivot(index='word', columns='emotion', values='association')
reshaped.head()

word,anger,anticipation,disgust,fear,joy,negative,positive,sadness,surprise,trust
aback,0,0,0,0,0,0,0,0,0,0
abacus,0,0,0,0,0,0,0,0,0,1
abandon,0,0,0,1,0,1,0,1,0,0
abandoned,1,0,0,1,0,1,0,1,0,0
abandonment,1,0,0,1,0,1,0,1,1,0
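If `pivot` feels like magic, here's the same move on four rows. This is only a sketch with toy data (`toy_long` and `toy_wide` are made-up names, values copied from the table above). It also shows one thing to watch for: `pivot` expects each word-emotion pair to show up exactly once, and raises a `ValueError` if it doesn't.

```python
import pandas as pd

# Four rows in the same "long" layout as emolex_df.
toy_long = pd.DataFrame(
    [
        ("abacus", "fear", 0),
        ("abacus", "trust", 1),
        ("abandon", "fear", 1),
        ("abandon", "trust", 0),
    ],
    columns=["word", "emotion", "association"],
)

# Long -> wide: words become the index, emotions become the columns.
toy_wide = toy_long.pivot(index="word", columns="emotion", values="association")
print(toy_wide)

# Repeat one (word, emotion) pair and pivot refuses to reshape.
duplicated = pd.concat([toy_long, toy_long.iloc[[0]]], ignore_index=True)
try:
    duplicated.pivot(index="word", columns="emotion", values="association")
except ValueError as err:
    print("pivot failed:", err)
```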


You can now pull out individual words...

In [38]:
reshaped.loc['charitable']

emotion
anger           0
anticipation    1
disgust         0
fear            0
joy             1
negative        0
positive        1
sadness         0
surprise        0
trust           1
Name: charitable, dtype: int64

...or individual emotions...

In [34]:
reshaped[reshaped.anger == 1].head()

word,anger,anticipation,disgust,fear,joy,negative,positive,sadness,surprise,trust
abandoned,1,0,0,1,0,1,0,1,0,0
abandonment,1,0,0,1,0,1,0,1,1,0
abhor,1,0,1,1,0,1,0,0,0,0
abhorrent,1,0,1,1,0,1,0,0,0,0
abolish,1,0,0,0,0,1,0,0,0,0


...or multiple emotions!

In [39]:
reshaped[(reshaped.joy == 1) & (reshaped.negative == 1)].head()

word,anger,anticipation,disgust,fear,joy,negative,positive,sadness,surprise,trust
abundance,0,1,1,0,1,1,1,0,0,1
balm,0,1,0,0,1,1,1,0,0,0
boisterous,1,1,0,0,1,1,1,0,0,0
celebrity,1,1,1,0,1,1,1,0,1,1
charmed,0,0,0,0,1,1,1,0,0,0
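
You can go the other direction, too: start with a word and get back just the names of the emotions it's tagged with. A small sketch, assuming a wide table shaped like `reshaped` (the two rows are copied from the outputs above; `emotions_for` is a hypothetical helper, not something pandas or the lexicon provides):

```python
import pandas as pd

emotions = ["anger", "anticipation", "disgust", "fear", "joy",
            "negative", "positive", "sadness", "surprise", "trust"]

# Two rows in the same wide layout as `reshaped`.
toy_wide = pd.DataFrame(
    [
        [0, 0, 0, 1, 0, 1, 0, 1, 0, 0],  # abandon
        [0, 1, 0, 0, 1, 0, 1, 0, 0, 1],  # charitable
    ],
    index=pd.Index(["abandon", "charitable"], name="word"),
    columns=emotions,
)

def emotions_for(word, wide=toy_wide):
    """Return the column names where this word's row holds a 1."""
    row = wide.loc[word]
    return row[row == 1].index.tolist()

print(emotions_for("charitable"))
# ['anticipation', 'joy', 'positive', 'trust']
```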
