# Emotion Analysis

This notebook analyzes the level of anger, fear and sadness present in the open-ended survey comments. The comments can be viewed individually or grouped into themes, sub-themes and agreement levels. 

### Method

This method assumes that individual words are associated with emotions. To determine the emotion of a comment, the words in the comment were compared to the [NRC Affect Intensity Lexicon v0.5](http://saifmohammad.com/WebDocs/NRC-AffectIntensity-Lexicon.txt) (Saif M. Mohammad). The lexicon contains 6,000 words associated with anger, fear, sadness and joy. Each word has a score from 0 to 1; a higher score indicates that the word is more strongly associated with that emotion. For each emotion, the scores of the matching words in a comment were summed to give the comment's score for that emotion. The emotion with the highest score was taken as the main emotion of the comment.
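
The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the code in `src.analysis.emotion_analysis`: the `TOY_LEXICON` rows, the `score_comment` helper and the whitespace tokenizer are assumptions made for the example. Only the three anger scores come from the real lexicon; the fear and sadness entries are made-up placeholders, and counting every occurrence of a word is also an assumption.

```python
from collections import defaultdict

# (term, score, emotion) rows, mirroring the lexicon's columns.
# Anger rows match the lexicon preview in this notebook;
# the fear and sadness rows are hypothetical placeholders.
TOY_LEXICON = [
    ("outraged", 0.964, "anger"),
    ("brutality", 0.959, "anger"),
    ("hatred", 0.953, "anger"),
    ("afraid", 0.80, "fear"),      # hypothetical score
    ("lonely", 0.70, "sadness"),   # hypothetical score
]


def score_comment(text, lexicon_rows=TOY_LEXICON):
    """Sum lexicon scores per emotion and pick the highest-scoring emotion."""
    words = text.lower().split()  # naive tokenizer (assumption)
    totals = defaultdict(float)
    for term, score, emotion in lexicon_rows:
        # Each occurrence of the term adds its score to that emotion's total.
        totals[emotion] += score * words.count(term)
    totals = dict(totals)
    best = max(totals, key=totals.get)
    # If no lexicon word matched, the comment has no main emotion.
    main = best if totals[best] > 0 else None
    return totals, main


totals, main = score_comment("i was outraged by the hatred and afraid")
print(totals, main)
```

In this toy example the comment matches two anger words and one fear word, so anger has the largest total and becomes the main emotion.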


To compare scores across comments and find the angriest, saddest and most fearful comments, the scores were divided by the **total number of words** in the comment.
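
A small sketch of this normalization, assuming the raw per-comment scores already sit in a pandas DataFrame. The `anger` column name follows the score table shown later in this notebook, but the two rows, the `anger_normalized` column and the whitespace-based word count are made up for illustration.

```python
import pandas as pd

# Two hypothetical comments with their raw (summed) anger scores.
scores = pd.DataFrame({
    "text": [
        "outraged",
        "outraged by the hatred shown in that long meeting last week",
    ],
    "anger": [0.964, 1.917],
})

# Word count per comment, using a plain whitespace split (assumption).
n_words = scores["text"].str.split().str.len()

# Normalized score: raw score divided by the number of words in the comment.
scores["anger_normalized"] = scores["anger"] / n_words

print(scores[["anger", "anger_normalized"]])
```

The longer comment has the higher raw score, but after dividing by its 11 words it ranks below the one-word comment. Normalizing this way makes comments of different lengths comparable, at the cost of favouring short comments.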


The lexicon also covers joy, but joy was left out of this analysis because the comments are assumed to be negative and the goal was to understand the underlying causes of employees’ negative sentiment. Joy was included at first but was removed after it turned out to be the prevailing emotion. On inspection, joy was not the correct label for those comments: the matching words are common words that were being used with a negative connotation, as in "not good".

The matcher only checks whether a word is present; it does not take the word's context into account. In the examples of anger, fear and sadness that were reviewed, the words were being used in a sense appropriate for this analysis.
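
The limitation is easy to reproduce with a tiny presence-based matcher. The `JOY_WORDS` set is a hypothetical stand-in for the joy entries of the lexicon, and `contains_joy_word` is an illustrative helper rather than a function from `src`.

```python
# Hypothetical stand-in for joy terms in the lexicon.
JOY_WORDS = {"good", "happy", "great"}


def contains_joy_word(text):
    """Return True if any word of the comment appears in JOY_WORDS."""
    words = (word.strip('.,!?"') for word in text.lower().split())
    return any(word in JOY_WORDS for word in words)


# Both comments are flagged, although the second one is clearly negative.
print(contains_joy_word("The new schedule is good."))
print(contains_joy_word("The new schedule is not good."))
```

Because the check ignores the preceding "not", a negated comment is treated the same as a positive one, which is consistent with joy being over-reported in the first pass of the analysis.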

### Instructions for use

This notebook can be used to look at different emotions, comments, subthemes and themes. Change the parameters of the functions to look at different examples.

### Info about working directories

This notebook is set up to run from the project root directory. To switch the working directory, follow the instructions in the cell below.



In [4]:
# This code chunk changes the working directory to the project root

import os
# uncomment and run this line once before proceeding
#os.chdir("..")   # comment this line out again after running it once
os.getcwd()

# the file path displayed is the current working directory
# this should be the project root for the code below to run

'C:\\Users\\payla\\Documents\\MDS\\Capstone\\DSCI_591_capstone-BCStats'

In [5]:
# Required packages
import pandas as pd
import numpy as np
import time

import spacy
# Load English model for SpaCy
nlp = spacy.load('en_core_web_sm')

In [6]:
# remove once notebook is finished
# ensure packages reload after every change 

%load_ext autoreload
%autoreload 2
import src

from src.analysis.emotion_analysis import *

## Load Data and Lexicon

In [32]:
# read full data
data_full = pd.read_csv("./data/interim/desensitized_qualitative-data2018.csv",
                        usecols=[0, 1, 2, 3, 4, 5, 6],
                        names=["USERID", "text", "code1", "code2", "code3", "code4", "code5"],
                        skiprows=1)

# read agreement data
data_agreement = pd.read_csv("./data/interim/linking_joined_qual_quant.csv",
                             usecols=[0, 1, 4, 5, 6, 7, 8])

# load lexicon
lexicon = pd.read_csv("http://saifmohammad.com/WebDocs/NRC-AffectIntensity-Lexicon.txt", 
                      sep="\t", 
                      skiprows=35) 
# read in data legend
legend = pd.read_csv("./references/data-dictionaries/theme_subtheme_names.csv")

In [33]:
display(data_full.head(3))
display(data_agreement.head(3))
display(lexicon.head(3))

Unnamed: 0,USERID,text,code1,code2,code3,code4,code5
0,192723-544650,I would suggest having a developmental growth ...,62,13.0,,,
1,188281-540434,Base decisions regarding fish and wildlife on ...,116,,,,
2,191202-862188,"Improved office space (fix HVAC, etc) but NO LWS",102,51.0,,,


Unnamed: 0,USERID,code,question,diff,text,theme,subtheme_description
0,191202-862188,102,Q39,0,"Improved office space (fix HVAC, etc) but NO LWS","Tools, Equipment & Physical Environment","Improve facilities (e.g. office space, noise l..."
1,173110-932228,14,Q46,1,Administration people should have better oppor...,Career & Personal Development,Provide opportunities for career advancement
2,185914-180608,24,Q20,0,We are the lowest paid in Canada with a worklo...,Compensation & Benefits,Increase salary


Unnamed: 0,term,score,AffectDimension
0,outraged,0.964,anger
1,brutality,0.959,anger
2,hatred,0.953,anger


## Process Full Comment Data and Add Theme Names

In [42]:
data_full = src.analysis.emotion_analysis.pre_process_comments(data_full, legend)
data_full.head()




Unnamed: 0,USERID,code,text,theme,subtheme_description
0,192723-544650,62,I would suggest having a developmental growth ...,Staffing Practices,Focus on Human Resources planning (recruitment...
1,188281-540434,116,Base decisions regarding fish and wildlife on ...,"Vision, Mission & Goals",Reduce political influence
2,191202-862188,102,"Improved office space (fix HVAC, etc) but NO LWS","Tools, Equipment & Physical Environment","Improve facilities (e.g. office space, noise l..."
3,174789-230694,51,Get rid of Leading Workplace Strategies and gi...,Flexible Work Environment,Improve and/or expand Leading Workplace Strate...
4,189787-166634,114,upgrading accessibility for Deaf and Hard of H...,"Vision, Mission & Goals",Pay attention to the public interest and servi...


## Obtain Emotion Scores for Each Comment

In [44]:
start = time.time()
full_scores = src.analysis.emotion_analysis.obtain_emotion_scores(data_full, 
                                                                  lexicon, 
                                                                  anger=True, 
                                                                  fear=True, 
                                                                  sadness=True, 
                                                                  joy=False)
end = time.time()
print((end - start) / 60, "mins")
display(full_scores.head(3))

9.223340121905009 mins


Unnamed: 0,USERID,code,text,theme,subtheme_description,anger,fear,sad
0,192723-544650,62,i would suggest having a developmental growth ...,Staffing Practices,Focus on Human Resources planning (recruitment...,0.0,0.0,0.0
1,188281-540434,116,base decisions regarding fish and wildlife on ...,"Vision, Mission & Goals",Reduce political influence,0.0,0.0,0.0
2,191202-862188,102,"improved office space (fix hvac, etc) but no lws","Tools, Equipment & Physical Environment","Improve facilities (e.g. office space, noise l...",0.0,0.0,0.0


### Overall Emotions

In [None]:
plot_all = src.analysis.emotion_analysis.plot_data(data=full_scores)

In [None]:
full_scores.shape

In [None]:
# agreement_scores is not created in any cell above and must be computed before this cell is run
agreement_scores.shape

In [None]:
src.analysis.emotion_analysis.filter_emotionless_comments(agreement_scores).shape

In [None]:
30934 - 19713  # = 11221, the count entered for the "emotionless" category further down

In [None]:
aa = one_hot_emotions(agreement_scores, groupby="code", agreement=False)
aa.head()

In [None]:
one_hot_emotions(agreement_scores, groupby=None, agreement=False)

In [None]:
full = src.analysis.emotion_analysis.create_bar_plot_percent(full_scores)

In [None]:
full.savefig("./reports/figures/final_pres/full_emotions.png", dpi=300);

In [None]:
full_scores.head()
sup = full_scores[full_scores["theme"]=="Supervisors"]
sup_plot = src.analysis.emotion_analysis.create_bar_plot_percent(sup)
sup_plot.savefig("./reports/figures/final_pres/sup_emotions.png", dpi=300);

In [None]:
create_bar_plot_percent(data)

In [None]:
a1 = filter_depth(12, "code", False, aa)
#print(a1)

In [None]:
a = pd.Series([14,12], index=["sad","anger"])
a.rename({"sad":"sadness"})



In [None]:
create_bar_plot(agreement=None, data=aa, title="hi")

In [None]:
a = pd.Series([11221], index=["emotionless"])

aa = pd.concat([aa, a])  # Series.append was removed in pandas 2.0
aa

In [None]:
aaa = aa/aa.sum()

In [None]:
aaa.plot.bar(rot=0)

In [None]:
src.analysis.emotion_analysis.create_bar_plot_percent(agreement_scores)

In [None]:
def emotionless_count():
    # placeholder: the function body was never written
    pass

In [None]:
src.analysis.emotion_analysis.plot_data(data=agreement_scores)

In [None]:
plot_12 = src.analysis.emotion_analysis.plot_data(data=agreement_scores, 
                                                  depth="subtheme", 
                                                  name=12)

In [None]:
plot_stress = src.analysis.emotion_analysis.plot_data(data=agreement_scores, 
                                                      depth="theme", 
                                                      name="Stress & Workload")

In [None]:
## need to move this to a viz script
plot_stress.savefig("./reports/figures/final_pres/emotion_stress_workload.png");

plot_all.savefig("./reports/figures/final_pres/emotion_all.png");




In [None]:
top_fear = src.analysis.emotion_analysis.display_top_emotions(full_scores, "fear", 5)
top_fear

In [None]:
src.analysis.emotion_analysis.examine_emotion_scoring(full_scores, "fear", lexicon, 18497)

In [None]:
src.analysis.emotion_analysis.examine_emotion_scoring(full_scores, "fear", lexicon, 10000, normalize=True)

In [None]:
src.analysis.emotion_analysis.display_top_emotions(full_scores, "fear", 5, normalize=True)

In [None]:
# can easily change description and split into maybe 2 tables? 
# the 2nd one with a column of emotion name, any emotion, emotion max
src.analysis.emotion_analysis.summary(agreement_scores)

In [None]:
sample = pd.read_excel("./data/raw/2018_WES_Qual_Samples.xlsx", 
                       usecols=[0, 1, 2, 3, 4, 5, 6], 
                       names=["USERID", "text", "code1", "code2", "code3", "code4", "code5"])
sample.head()


sample = src.analysis.emotion_analysis.format_raw_comments(sample)
sample = src.analysis.emotion_analysis.get_theme_labels(sample, legend)

In [None]:
presentation_ex = None  # placeholder: the assignment was left unfinished

In [None]:
sample_x = src.analysis.emotion_analysis.obtain_emotion_scores(sample, 
                                                               lexicon, 
                                                               anger=True, 
                                                               fear=True, 
                                                               sadness=True, 
                                                               joy=False)

In [None]:
src.analysis.emotion_analysis.examine_emotion_scoring(sample_x, "fear", lexicon)

In [None]:
plot_all = src.analysis.emotion_analysis.plot_data(data=sample_x)

In [None]:
themes = full_scores["theme"].unique()

for theme in themes:
    src.analysis.emotion_analysis.plot_data(data=full_scores, 
                                            depth="theme", 
                                            name=theme)

In [None]:
# plot every sub-theme, identified by its numeric code
codes = full_scores["code"].unique()

for code in codes:
    src.analysis.emotion_analysis.plot_data(data=full_scores, 
                                            depth="subtheme", 
                                            name=code)

In [None]:
benefits_all = src.analysis.emotion_analysis.plot_data(data=agreement_scores, 
                                                       depth="subtheme", 
                                                       name=24,
                                                       agreement="all")
type(benefits_all)

In [None]:
benefits_all.savefig("./reports/figures/final_pres/benefits_all.png");

In [None]:
supervisor = src.analysis.emotion_analysis.plot_data(data=full_scores, 
                                                     depth="theme", 
                                                     name="Supervisors")