# CHI Square Test
To look further into differences in our data, we conduct multiple chi² tests to see whether there are significant differences between females and males, and between grad students and PhDs, in the emotion, affect, level of interest and arousal-valence attributes.

## Import relevant libraries

In [1]:
import numpy as np
import pandas as pd
from os import listdir
import matplotlib.pyplot as plt
import itertools as it
from statsmodels.stats.multitest import multipletests
import sklearn.preprocessing as pp
import statsmodels.api as sm
#import nltk
import scipy.stats as st
import statsmodels.formula.api as smf
import seaborn as sns
import Helper as hp

## Load .csv data with results of OpenSMILE Analysis
First we load the .csv data and clean it (removing NaNs), then we store the information of all files in separate pandas DataFrames containing the affect, emotion, level of interest and valence/arousal values for all participants.

In [2]:
data = pd.read_csv("CHI_2019_FULL.csv")

#Set Labels 
emotion_label = ['Anger', 'Boredom', 'Disgust', 'Fear', 'Happiness', 'Emo_Neutral', 'Sadness']
affect_label = ['Aggressiv', 'Cheerful', 'Intoxicated', 'Nervous', 'Aff_Neutral', 'Tired']
loi_label = ['Disinterest', 'Normal', 'High Interest']

#Get specific data and save it into new data frames
# We take explicit copies with the pandas .copy(deep=True) function to prevent the SettingWithCopyWarning we would
# otherwise get when adding columns below. The original `data` frame is only read from and stays untouched
df_emotion = data[['Anger', 'Boredom', 'Disgust', 'Fear', 'Happiness', 'Emo_Neutral', 'Sadness', 'Filename']].copy(deep=True)
df_affect = data[['Aggressiv', 'Cheerful', 'Intoxicated', 'Nervous', 'Aff_Neutral', 'Tired', 'Filename']].copy(deep=True)
df_loi = data[['Disinterest', 'Normal', 'High Interest', 'Filename']].copy(deep=True)
df_ar_val = data[['Arousal', 'Valence', 'Filename']].copy(deep=True)
#For further usage, we want to append the CharacterID as a column, which is saved with other information in the filename
#Since we only want the digits, we can remove all non-digit characters of the filename column and append the column to the df

# Raw strings keep '\D' from being read as an (invalid) string escape sequence
df_emotion['Char_ID'] = df_emotion['Filename'].replace(r'\D+', '', regex=True)
df_affect['Char_ID'] = df_affect['Filename'].replace(r'\D+', '', regex=True)
df_loi['Char_ID'] = df_loi['Filename'].replace(r'\D+', '', regex=True)
df_ar_val['Char_ID'] = df_ar_val['Filename'].replace(r'\D+', '', regex=True)



## Let's load information about the speakers
The speaker information is saved in a single .csv file containing four important columns: ID, Age, Sex and Academic Status. The OpenSMILE .csv files loaded above are named after the corresponding speaker ID (e.g. the speaker with ID 0 has the two files 0_a.csv and 0_b.csv), so the two tables can be linked via this ID.
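
As a minimal sketch of this link, the toy example below joins a feature table to a speaker table and lists any files whose ID has no matching speaker. The two small DataFrames and their values are made up for illustration; only the column names `Filename`, `ID`, `Sex`, `Arousal` and `Char_ID` are taken from this notebook.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables (values are made up).
features = pd.DataFrame({
    "Filename": ["0_a.csv", "0_b.csv", "1_a.csv", "7_a.csv"],
    "Arousal": [0.12, 0.30, -0.05, 0.44],
})
speakers = pd.DataFrame({
    "ID": [0, 1],
    "Sex": ["Female", "Male"],
})

# Take the first run of digits in the filename as the speaker ID.
features["Char_ID"] = (
    features["Filename"].str.extract(r"(\d+)", expand=False).astype(int)
)

# Left join; indicator=True adds a '_merge' column that marks unmatched rows.
merged = features.merge(
    speakers, how="left", left_on="Char_ID", right_on="ID", indicator=True
)
unmatched = merged.loc[merged["_merge"] == "left_only", "Filename"].tolist()
print(unmatched)
```

Note that `str.extract` keeps only the first digit group, whereas the cells in this notebook strip every non-digit character; the two agree as long as a filename contains a single number.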

In [3]:
char_data = pd.read_csv("CHI_2019_CharacterData.csv")  

#Join above tables and Character Tables

#To Join DataFrames we have to cast the column on which we want to join to int, so that both columns have the same data type
char_data['ID'] = char_data['ID'].astype(int)
df_ar_val['Char_ID'] = df_ar_val['Char_ID'].astype(int)
df_emotion['Char_ID'] = df_emotion['Char_ID'].astype(int)
df_affect['Char_ID'] = df_affect['Char_ID'].astype(int)
df_loi['Char_ID'] = df_loi['Char_ID'].astype(int)

#Save new data frames
df_ar_val_char = df_ar_val.merge(char_data, how = 'left', left_on='Char_ID', right_on='ID')
df_emotion_char = df_emotion.merge(char_data, how = 'left', left_on='Char_ID', right_on= 'ID')
df_affect_char = df_affect.merge(char_data, how = 'left', left_on='Char_ID', right_on= 'ID')
df_loi_char = df_loi.merge(char_data, how = 'left', left_on='Char_ID', right_on= 'ID')

#Now, we only want to have data containing information about the answers
#For that we need to extract from the filename column, whether the file was part of an answer
#a = answer, p = presentation, q = question
#sentence_type should be the same for all tables, but just to be sure
# regex=True is required: recent pandas versions treat the pattern as a literal string by default
arval_sentence_type = df_ar_val_char.Filename.str.replace(r'\d+', '', regex=True).str[3:-4]
df_ar_val_char['SentenceType'] = arval_sentence_type
emo_sentence_type = df_emotion_char.Filename.str.replace(r'\d+', '', regex=True).str[3:-4]
df_emotion_char['SentenceType'] = emo_sentence_type
aff_sentence_type = df_affect_char.Filename.str.replace(r'\d+', '', regex=True).str[3:-4]
df_affect_char['SentenceType'] = aff_sentence_type
loi_sentence_type = df_loi_char.Filename.str.replace(r'\d+', '', regex=True).str[3:-4]
df_loi_char['SentenceType'] = loi_sentence_type

#Now select only those who have SentenceType == 'a'
df_ar_val_char = df_ar_val_char.loc[df_ar_val_char['SentenceType'] == 'a']
df_emotion_char = df_emotion_char.loc[df_emotion_char['SentenceType'] == 'a']
df_affect_char = df_affect_char.loc[df_affect_char['SentenceType'] == 'a']
df_loi_char = df_loi_char.loc[df_loi_char['SentenceType'] == 'a']

affect_label.remove('Intoxicated')
df_affect_char = df_affect_char.drop(['Intoxicated'], axis = 1)
norm_test = pp.normalize(df_affect_char[affect_label], norm = 'l1')
df_affect_char[affect_label] = norm_test

df_loi_char['Normal Interest'] = df_loi_char['Disinterest'] + df_loi_char['Normal']
df_loi_char = df_loi_char.drop(['Disinterest', 'Normal'], axis = 1)
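
The renormalization step above (dropping 'Intoxicated' and applying `pp.normalize(..., norm = 'l1')`) rescales each row so that the remaining affect scores sum to 1 again. A minimal sketch in plain NumPy, assuming the scores are non-negative and using made-up numbers:

```python
import numpy as np

# Hypothetical affect scores for two recordings, one column per class
# (Aggressiv, Cheerful, Intoxicated, Nervous, Aff_Neutral, Tired).
scores = np.array([
    [0.10, 0.20, 0.30, 0.10, 0.20, 0.10],
    [0.05, 0.25, 0.50, 0.05, 0.10, 0.05],
])

# Drop the 'Intoxicated' column (index 2).
reduced = np.delete(scores, 2, axis=1)

# L1 normalization: divide each row by the sum of its absolute values.
row_sums = np.abs(reduced).sum(axis=1, keepdims=True)
renormalized = reduced / row_sums

print(renormalized.sum(axis=1))
```

The relative proportions between the remaining classes are unchanged; only their scale differs.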

## Chi-squared Test of Independence
We start with the characteristic sex. The null hypothesis states that the two categorical variables, sex and e.g. emotion, are independent.

Since we have float data and chi² needs integer data, such as observation counts, we have to convert our data. To illustrate how this is done, we'll look at a specific emotion, 'Anger'. We need to make sure that our count table has no cells with a value of less than 5, since such small counts make the chi² approximation unreliable and may distort the result. So we calculate the quartiles of our emotion 'Anger', which gives us three thresholds to compare the float data against. This way, we can count how many samples fall into the 1st, 2nd, 3rd or 4th quartile. We want to compare two (or more) groups, so we first sort only the female values into the quartiles and then do the same for the male values. This yields a 2x4 table. An example table is printed below. This table is used to calculate the chi² statistic.

Note that the function 'calcFrequencyTable' takes a pd.DataFrame, not a pd.Series, and returns an array of pd.DataFrames. This means that the function calculates these tables for all the emotions defined in e.g. emotion_label.

In [4]:
#Example Frequency Table for the emotion 'Anger':
#Since the function does the table calculation for all different emotions, we only want to select the first table
#which holds the table for 'anger' (since it's the first element, see declaration of emotion_label at the start)
anger_table = hp.calcFrequencyTable(df_emotion_char, emotion_label, 'Sex')[0]
anger_table

| | 1st Quartile | 2nd Quartile | 3rd Quartile | 4th Quartile |
|---|---|---|---|---|
| Male | 27 | 28 | 24 | 21 |
| Female | 22 | 20 | 24 | 27 |
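
The quartile binning described above can be sketched with `pd.qcut` and `pd.crosstab`. This is not the implementation of `hp.calcFrequencyTable`, which is not shown here; it is a minimal illustration on randomly generated data, assuming the quartile thresholds are computed on the pooled values of both groups.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

# Hypothetical data: one float score per recording plus a group label.
df = pd.DataFrame({
    "Anger": rng.random(200),
    "Sex": np.repeat(["Female", "Male"], 100),
})

# Bin the scores at the pooled quartiles (three thresholds, four bins).
labels = ["1st Quartile", "2nd Quartile", "3rd Quartile", "4th Quartile"]
df["Bin"] = pd.qcut(df["Anger"], q=4, labels=labels)

# Cross-tabulate group against quartile: a 2x4 table of counts.
table = pd.crosstab(df["Sex"], df["Bin"])

# Chi-squared test of independence on the count table.
chi2_stat, p_value, dof, expected = chi2_contingency(table.to_numpy())

print(table)
print(chi2_stat, p_value, dof)
# Rule of thumb from the text: every expected count should be at least 5.
print((expected >= 5).all())
```

Binning at quartiles spreads the observations evenly over the four columns, which is what keeps the cell counts comfortably above 5 for groups of this size.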


In [5]:
print('EMOTION\n')
emo_sex_chi2 = hp.chi2(df_emotion_char, emotion_label,'Sex',  True)
print('\nAFFECT\n')
aff_sec_chi2 = hp.chi2(df_affect_char, affect_label,'Sex',  True)
print('\nAROUSAL-VALENCE\n')
ar_val_sec_chi2 = hp.chi2(df_ar_val_char, ['Arousal', 'Valence'], 'Sex', True)
print('\nLEVEL OF INTEREST\n')
loi_sec_chi2 = hp.chi2(df_loi_char, ['Normal Interest', 'High Interest'], 'Sex', True)
#Have a look at residuals for significant results
print('\nResiduals of: '+ affect_label[3])
aff_sec_chi2[1][3]

EMOTION

Chi square of Anger : 2.3427332034964516 with p-value of: 0.5043838259554803
Chi square of Boredom : 14.329590008046232 with p-value of: 0.002489182209427164
Chi square of Disgust : 50.95429938373198 with p-value of: 5.002998277346449e-11
Chi square of Fear : 11.826296997293543 with p-value of: 0.008002510240578584
Chi square of Happiness : 7.172215501792116 with p-value of: 0.06660661048725251
Chi square of Emo_Neutral : 12.82761420159462 with p-value of: 0.0050246252659379475
Chi square of Sadness : 27.494867740106802 with p-value of: 4.635997808400464e-06

AFFECT

Chi square of Aggressiv : 33.569184862116884 with p-value of: 2.4426308550526904e-07
Chi square of Cheerful : 19.24081248628483 with p-value of: 0.0002437759363511484
Chi square of Nervous : 0.9139829749103936 with p-value of: 0.8220521512810797
Chi square of Aff_Neutral : 7.514502505303194 with p-value of: 0.05718698914632772
Chi square of Tired : 15.107824637919686 with p-value of: 0.0017267856144965957

AROUSAL

| | 1st Quartile | 2nd Quartile | 3rd Quartile | 4th Quartile |
|---|---|---|---|---|
| Male | 2.188309 | 0.709699 | -0.956626 | -1.956422 |
| Female | -2.188309 | -0.709699 | 0.956626 | 1.956422 |


If we look at the p-values for the different emotions, we see significant differences (p < 0.05) in boredom, disgust, fear, neutral and sadness, so for these emotions we can reject the null hypothesis of independence.
To further investigate where the differences are, we'll have a look at the standardized residuals.
Regarding boredom, we see that the main differences between females and males lie in the 1st and 2nd quartile: females have a residual of ~3.11 in the 1st quartile, while males have a residual of ~3.04 in the 2nd quartile. Regarding disgust, we see that females tend to have values above the median (positive residuals in the 3rd and 4th quartile), whereas males tend to have values below the median (positive residuals in the 1st and 2nd quartile). Looking at fear, we see that females tend to have lower values (residual of ~3.11 in the 1st quartile), while males have a residual of ~1.37 for the 2nd quartile and ~2.04 for the 3rd quartile; the 4th quartile does not show great differences. This means that males tend to have more values around the median than females. Looking at neutral (emotion), we can see that males and females only differ in the 1st and 2nd quartile (female 1st quartile ~3.1; male 2nd quartile ~2.7). Regarding sadness, we see that females and males differ in the extremes: females have a residual of ~4.10 for the 1st quartile, whereas males have a residual of ~4.04 for the 4th quartile. This implies that females tend to have lower values for sadness than males.
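
For reference, residuals of this kind can be computed directly from a frequency table. The sketch below uses the 'Anger' counts printed earlier and assumes adjusted standardized residuals; the formula that `hp.chi2` actually uses is not shown in this notebook, although the exact mirror symmetry between the two rows of the printed residual tables is consistent with this choice.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from the 'Anger' frequency table (rows: Male, Female).
observed = np.array([
    [27, 28, 24, 21],
    [22, 20, 24, 27],
])

chi2_stat, p_value, dof, expected = chi2_contingency(observed)

n = observed.sum()
row_share = observed.sum(axis=1, keepdims=True) / n
col_share = observed.sum(axis=0, keepdims=True) / n

# Adjusted standardized residual per cell (assumed formula):
# (observed - expected) / sqrt(expected * (1 - row share) * (1 - column share))
adjusted = (observed - expected) / np.sqrt(
    expected * (1 - row_share) * (1 - col_share)
)

print(round(chi2_stat, 4), round(p_value, 4))
print(np.round(adjusted, 2))
```

A positive residual means the cell holds more observations than expected under independence. Absolute values above roughly 2 are commonly read as a notable deviation.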

Looking at the affect p-values, we also see statistically significant differences in aggressive, cheerful and tired, meaning the two populations differ significantly from each other, so we again reject the null hypothesis of independence. Note that intoxicated was removed from affect_label above, so the printed output contains no p-value for it; the residuals discussed for it below presumably come from a run with the full label set.
Looking further at the residuals, we are able to see where the differences are. Regarding aggressive, we see that males tend to have lower values (1st quartile ~4.17 for males) than females (4th quartile ~4.62 for females). Looking at cheerful, we see that females tend to have lower values than males, since the residual for the 4th quartile for males is ~4.04, whereas the residuals for females for the 2nd and 3rd quartile are ~2.96 and ~1.62. Regarding intoxication, we are able to see that females tend to have higher values (3rd and 4th quartile positive), whereas males have lower intoxication values (1st quartile ~4.84). Regarding tiredness, we see the largest differences in the 1st and 4th quartile: males have higher values for tiredness (4th quartile ~3.71) than females (1st quartile ~3.44).

For arousal-valence, the two populations also differ significantly in arousal. Looking at the residuals, we see that females tend to have higher values than males (for males, only the 1st quartile has a positive residual, ~5.50).

Regarding level of interest, we only see a statistically significant difference in disinterest. Looking at the residuals, we see that females have lower values for disinterest than males (1st and 2nd quartile positive).

So now we know that females and males differ significantly in how their values are distributed across the quartiles.
Now we move on to academic status, the null hypothesis being that the variables academic status and e.g. emotion are independent.

In [6]:
print('EMOTION\n')
emo_aca_chi2 = hp.chi2(df_emotion_char, emotion_label,'Academic' , True)
print('\nAFFECT\n')
aff_aca_chi2 = hp.chi2(df_affect_char, affect_label,'Academic', True)
print('\nAROUSAL-VALENCE\n')
ar_val_aca_chi2 = hp.chi2(df_ar_val_char, ['Arousal', 'Valence'],  'Academic',True)
print('\nLEVEL OF INTEREST\n')
loi_aca_chi2 = hp.chi2(df_loi_char, ['Normal Interest', 'High Interest'],'Academic', True)
ar_val_aca_chi2[1][0]

EMOTION

Chi square of Anger : 1.7308569566634084 with p-value of: 0.630095518089683
Chi square of Boredom : 3.565982404692082 with p-value of: 0.31230471520765724
Chi square of Disgust : 1.2303681981101335 with p-value of: 0.7457303545737766
Chi square of Fear : 3.565982404692082 with p-value of: 0.31230471520765724
Chi square of Happiness : 1.5640273704789833 with p-value of: 0.6675735927389963
Chi square of Emo_Neutral : 2.898664059954383 with p-value of: 0.40751451037583464
Chi square of Sadness : 2.0964622200037435 with p-value of: 0.5526288118443592

AFFECT

Chi square of Aggressiv : 1.7308569566634082 with p-value of: 0.630095518089683
Chi square of Cheerful : 4.566959921798631 with p-value of: 0.20639481596610365
Chi square of Nervous : 1.2303681981101335 with p-value of: 0.7457303545737766
Chi square of Aff_Neutral : 2.5650048875855327 with p-value of: 0.4636576974639637
Chi square of Tired : 3.1906339458060216 with p-value of: 0.3631566939975995

AROUSAL-VALENCE

Chi square o

| | 1st Quartile | 2nd Quartile | 3rd Quartile | 4th Quartile |
|---|---|---|---|---|
| Grad Student | -0.087982 | -2.584596 | 0.928866 | 1.750855 |
| PhD | 0.087982 | 2.584596 | -0.928866 | -1.750855 |


Looking at emotion, we see that Grad Students and PhDs do not differ significantly.

The same goes for affect: we can't see any significant results.

Looking at arousal-valence, we see that Grad Students and PhDs differ in arousal. By looking at the residuals, we see that PhDs have lower values (2nd quartile ~2.58) than Grad Students (4th quartile ~1.75).

Looking at level of interest, we can see that Grad Students and PhDs do not differ.

So, PhDs and Grad Students only differ in arousal.

Again, we do not know yet where exactly those differences are.


Now let's look at Native Speaker

In [7]:
print('EMOTION\n')
emo_age_chi2 = hp.chi2(df_emotion_char, emotion_label,'IsNativeSpeaker', True)
print('\nAFFECT\n')
aff_age_chi2 = hp.chi2(df_affect_char, affect_label, 'IsNativeSpeaker', True)
print('\nAROUSAL-VALENCE\n')
ar_val_age_chi2 = hp.chi2(df_ar_val_char, ['Arousal', 'Valence'],'IsNativeSpeaker' ,True)
print('\nLEVEL OF INTEREST\n')
loi_age_chi2 = hp.chi2(df_loi_char, ['Normal Interest', 'High Interest'], 'IsNativeSpeaker',  True)

EMOTION

Chi square of Anger : 11.26940272372267 with p-value of: 0.08039863374095739
Chi square of Boredom : 6.394149593917929 with p-value of: 0.3805145730152919
Chi square of Disgust : 7.990760722774417 with p-value of: 0.23878098066550357
Chi square of Fear : 4.328970930399322 with p-value of: 0.6322513003326771
Chi square of Happiness : 7.7809837005172415 with p-value of: 0.25459217703899356
Chi square of Emo_Neutral : 5.322094237370444 with p-value of: 0.5032142212041371
Chi square of Sadness : 13.739048646339313 with p-value of: 0.03269092269960082

AFFECT

Chi square of Aggressiv : 8.407671375324114 with p-value of: 0.2097311846223672
Chi square of Cheerful : 4.332230360790049 with p-value of: 0.6318130320940132
Chi square of Nervous : 5.425185426139566 with p-value of: 0.49054478084272624
Chi square of Aff_Neutral : 2.253536219749351 with p-value of: 0.8949671352410078
Chi square of Tired : 8.134986241088537 with p-value of: 0.22837978186136848

AROUSAL-VALENCE

Chi square of 

## Post-Hoc tests for age and native speaker, as they have three different groups

A significant p-value for the characteristic 'IsNativeSpeaker' only tells us that the groups differ somewhere, not which groups differ significantly from each other, so pairwise post-hoc tests with Bonferroni correction are run for this characteristic.
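
A minimal sketch of such a pairwise post-hoc procedure is shown here. It is not the code of `hp.chi2_post_hoc`, which is not shown in this notebook. The counts in the table are made up, the 0.05 threshold is an assumption, and the Bonferroni correction is written out by hand (multiply each p-value by the number of tests and cap at 1).

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical 3x4 frequency table (group x quartile); counts are made up.
table = pd.DataFrame(
    [[30, 25, 20, 15],
     [18, 22, 24, 26],
     [12, 20, 25, 33]],
    index=["Asian Non-Native", "Europ. Non-Native", "Native Speaker"],
    columns=["1st Quartile", "2nd Quartile", "3rd Quartile", "4th Quartile"],
)

# All unordered pairs of groups.
pairs = list(combinations(table.index, 2))

# One chi-squared test per pair of groups, on the 2x4 sub-table.
raw_p = np.array([
    chi2_contingency(table.loc[list(pair)].to_numpy())[1] for pair in pairs
])

# Bonferroni: multiply each p-value by the number of tests, cap at 1.
corrected_p = np.minimum(raw_p * len(pairs), 1.0)
reject = corrected_p < 0.05  # assumed significance level

for pair, p, r in zip(pairs, corrected_p, reject):
    print(pair, round(float(p), 4), bool(r))
```

With three groups there are three pairwise comparisons, so each raw p-value is tripled before it is compared against the threshold.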

In [8]:
print('EMOTION\n')
print('post-hoc emotions and different groups')
emo_reject_list, emo_corrected_p_vals, emo_combinations, emo_residuals= hp.chi2_post_hoc(df_emotion_char,emotion_label, 'IsNativeSpeaker', 'bonferroni', True, True)
print('\nAFFECT\n')
print('\n post-hoc affect and different groups')
aff_reject_list, aff_corrected_p_vals, aff_combinations, aff_residuals = hp.chi2_post_hoc(df_affect_char, affect_label, 'IsNativeSpeaker' ,'bonferroni', True, True)
print('\nAROUSAL-VALENCE\n')
print('\n post-hoc arousal-valence and different groups')
ar_val_reject_list, ar_val_corrected_p_vals, ar_val_combinations, ar_val_residuals = hp.chi2_post_hoc(df_ar_val_char, ['Arousal', 'Valence'], 'IsNativeSpeaker', 'bonferroni',True, True)
print('\nLEVEL OF INTEREST\n')
print('\n post-hoc level of interest and different groups')
loi_reject_list, loi_corrected_p_vals, loi_combinations, loi_residuals = hp.chi2_post_hoc(df_loi_char, ['Normal Interest', 'High Interest'], 'IsNativeSpeaker', 'bonferroni', True, True)

EMOTION

post-hoc emotions and different groups
Anger
Combinations: [('Asian Non-Native', 'Europ. Non-Native'), ('Asian Non-Native', 'Native Speaker'), ('Europ. Non-Native', 'Native Speaker')]
Reject List: [False False False]
Corrected p-values: [0.27190604 0.98106031 1.        ]
Boredom
Combinations: [('Asian Non-Native', 'Europ. Non-Native'), ('Asian Non-Native', 'Native Speaker'), ('Europ. Non-Native', 'Native Speaker')]
Reject List: [False False False]
Corrected p-values: [1.         0.65289629 1.        ]
Disgust
Combinations: [('Asian Non-Native', 'Europ. Non-Native'), ('Asian Non-Native', 'Native Speaker'), ('Europ. Non-Native', 'Native Speaker')]
Reject List: [False False False]
Corrected p-values: [0.63398621 1.         1.        ]
Fear
Combinations: [('Asian Non-Native', 'Europ. Non-Native'), ('Asian Non-Native', 'Native Speaker'), ('Europ. Non-Native', 'Native Speaker')]
Reject List: [False False False]
Corrected p-values: [1. 1. 1.]
Happiness
Combinations: [('Asian Non-Nati