# CHI Square Test
To look further into differences in our data, we conduct multiple chi² tests to see whether there are significant differences between females/males and grad students/PhDs regarding the different emotion, affect, level of interest and arousal-valence attributes.

## Import relevant libraries

In [1]:
import numpy as np
import pandas as pd
from os import listdir
import matplotlib.pyplot as plt
import itertools as it
from statsmodels.stats.multitest import multipletests
import statsmodels.api as sm
#import nltk
import scipy.stats as st
import statsmodels.formula.api as smf
import seaborn as sns
import Helper as hp

## Load .csv data with results of OpenSMILE Analysis
First we load the .csv data and clean it (removing NaNs), then we store the information in separate pandas DataFrames containing the affect, emotion, level of interest and valence/arousal values for all participants.

In [2]:
data = pd.read_csv("CHI_2019_FULL.csv")

#Set Labels 
emotion_label = ['Anger', 'Boredom', 'Disgust', 'Fear', 'Happiness', 'Emo_Neutral', 'Sadness']
affect_label = ['Aggressiv', 'Cheerful', 'Intoxicated', 'Nervous', 'Aff_Neutral', 'Tired']
loi_label = ['Disinterest', 'Normal', 'High Interest']

#Get specific data and save it into new data frames
# We use the pandas .copy(deep=True) function so that each new data frame is an independent copy. Without it, adding the
# Char_ID column below would trigger a SettingWithCopyWarning, because we would be writing to a slice of `data`
df_emotion = data[['Anger', 'Boredom', 'Disgust', 'Fear', 'Happiness', 'Emo_Neutral', 'Sadness', 'Filename']].copy(deep=True)
df_affect = data[['Aggressiv', 'Cheerful', 'Intoxicated', 'Nervous', 'Aff_Neutral', 'Tired', 'Filename']].copy(deep=True)
df_loi = data[['Disinterest', 'Normal', 'High Interest', 'Filename']].copy(deep=True)
df_ar_val = data[['Arousal', 'Valence', 'Filename']].copy(deep=True)
#For further usage, we want to append the CharacterID as a column, which is saved with other information in the filename
#Since we only want the digits, we can remove all non-digit characters of the filename column and append the column to the df

# Raw string (r'...') so that \D is passed to the regex engine instead of being treated as a string escape
df_emotion['Char_ID'] = df_emotion['Filename'].replace(r'\D+', '', regex=True)
df_affect['Char_ID'] = df_affect['Filename'].replace(r'\D+', '', regex=True)
df_loi['Char_ID'] = df_loi['Filename'].replace(r'\D+', '', regex=True)
df_ar_val['Char_ID'] = df_ar_val['Filename'].replace(r'\D+', '', regex=True)
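To make the ID extraction concrete, here is a minimal sketch on a few made-up file names (the names are assumptions that follow the `0_a.csv` pattern, not the real data). It also shows an alternative that keeps only the leading digits, which may be safer if a file name ever contains a second number.

```python
import pandas as pd

# Hypothetical file names following the "<speaker id>_<part>.csv" pattern
filenames = pd.Series(['0_a.csv', '0_b.csv', '12_a.csv'])

# Variant used above: drop every non-digit character
char_ids = filenames.str.replace(r'\D+', '', regex=True).astype(int)
print(char_ids.tolist())

# Alternative: keep only the digits at the start of the name.
# With a name like '3_a_2019.csv', removing all non-digits would give 32019.
leading_ids = filenames.str.extract(r'^(\d+)', expand=False).astype(int)
print(leading_ids.tolist())
```

Both variants give the same IDs as long as the speaker ID is the only number in the file name.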

## Let's load information about the speakers
The speaker information is saved in a single .csv file containing four important columns: ID, Age, Sex and Academic Status. The previously loaded OpenSMILE .csv files are named after the corresponding speaker ID (e.g. the speaker with ID 0 has two files, 0_a.csv and 0_b.csv), so the two sources can be linked via this ID.

In [3]:
char_data = pd.read_csv("CHI_2019_CharacterData.csv")  

#Join above tables and Character Tables

#To Join DataFrames we have to cast the column on which we want to join to int, so that both columns have the same data type
char_data['ID'] = char_data['ID'].astype(int)
df_ar_val['Char_ID'] = df_ar_val['Char_ID'].astype(int)
df_emotion['Char_ID'] = df_emotion['Char_ID'].astype(int)
df_affect['Char_ID'] = df_affect['Char_ID'].astype(int)
df_loi['Char_ID'] = df_loi['Char_ID'].astype(int)

#Save new data frames
df_ar_val_char = df_ar_val.merge(char_data, how = 'left', left_on='Char_ID', right_on='ID')
df_emotion_char = df_emotion.merge(char_data, how = 'left', left_on='Char_ID', right_on= 'ID')
df_affect_char = df_affect.merge(char_data, how = 'left', left_on='Char_ID', right_on= 'ID')
df_loi_char = df_loi.merge(char_data, how = 'left', left_on='Char_ID', right_on= 'ID')
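A left merge silently leaves NaNs in the speaker columns when an ID has no match. As a sketch on toy data (the two small frames below are made up for illustration), the `validate` and `indicator` options of `DataFrame.merge` can be used to check the join:

```python
import pandas as pd

# Toy stand-ins for the feature table and the speaker table
features = pd.DataFrame({'Char_ID': [0, 0, 1], 'Arousal': [0.2, 0.4, 0.1]})
speakers = pd.DataFrame({'ID': [0, 1], 'Sex': ['Female', 'Male']})

merged = features.merge(
    speakers, how='left', left_on='Char_ID', right_on='ID',
    validate='many_to_one',  # raises MergeError if a speaker ID appears twice
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)

# Rows whose Char_ID was not found in the speaker table
unmatched = merged[merged['_merge'] == 'left_only']
print(len(unmatched))
```

If `unmatched` is not empty, those rows would carry NaN in 'Sex', 'Academic' and the other speaker columns and would need to be handled before building frequency tables.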

## Chi-squared Test of Independence
We start with the characteristic sex. The null hypothesis states that the two categorical variables, sex and e.g. emotion, are independent.

Our data consists of floats, but the chi² test needs integer data such as observation counts, so we have to convert it. To illustrate how this is done, we look at a specific emotion, 'Anger'. We need to make sure that no cell of the observation table has a count below 5, since small cell counts make the chi² approximation unreliable and may distort the result. So we calculate the quartiles of 'Anger', which gives us three thresholds against which to compare the float values. This way, we can count how many samples fall into the 1st, 2nd, 3rd or 4th quartile.

We want to compare two (or more) groups, so we first sort the female values into the quartiles, then the male values. This yields a 2x4 table; an example table is printed below. This table is used to calculate the chi² statistic.

Note that the function 'calcFrequencyTable' takes a pd.DataFrame, not a pd.Series, and returns an array of pd.DataFrames. This means that the function calculates these tables for all attributes passed to it, e.g. all emotions defined in emotion_label.
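The internals of `hp.calcFrequencyTable` are not shown here, so the following is only a sketch of how such a quartile table could be built with plain pandas. The function name `quartile_frequency_table` and the random toy data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def quartile_frequency_table(df, value_col, group_col):
    """Count, per group, how many values fall into each pooled quartile."""
    labels = ['1st Quartile', '2nd Quartile', '3rd Quartile', '4th Quartile']
    # The three thresholds come from the pooled data, not from each group separately
    bins = pd.qcut(df[value_col], q=4, labels=labels)
    return pd.crosstab(df[group_col], bins)

# Toy data: 200 random scores with a randomly assigned group
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'Anger': rng.random(200),
    'Sex': rng.choice(['Female', 'Male'], size=200),
})

table = quartile_frequency_table(toy, 'Anger', 'Sex')
print(table)
```

Because the thresholds are computed on the pooled values, each column holds about a quarter of all samples, and only the split between the groups varies from column to column.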

In [4]:
#Example Frequency Table for the emotion 'Anger':
#Since the function does the table calculation for all different emotions, we only want to select the first table
#which holds the table for 'anger' (since it's the first element, see declaration of emotion_label at the start)
anger_table = hp.calcFrequencyTable(df_emotion_char, emotion_label, 'Sex')[0]
anger_table

| | 1st Quartile | 2nd Quartile | 3rd Quartile | 4th Quartile |
|---|---|---|---|---|
| Male | 95 | 95 | 84 | 72 |
| Female | 64 | 64 | 74 | 87 |
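As a cross-check, the 'Anger' table above can be passed directly to `scipy.stats.chi2_contingency`. This is a sketch of the underlying test, not necessarily what `hp.chi2` does internally. It also reports the smallest expected count, which is the quantity the rule of thumb about counts below 5 is usually applied to.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from the 'Anger' frequency table
observed = np.array([
    [95, 95, 84, 72],   # Male
    [64, 64, 74, 87],   # Female
])

# For a 2x4 table the degrees of freedom are (2 - 1) * (4 - 1) = 3
stat, p_value, dof, expected = chi2_contingency(observed)

print(f"chi2 = {stat:.4f}, p = {p_value:.4f}, dof = {dof}")
print("smallest expected count:", expected.min())
```

The statistic and p-value agree with the 'Anger' line that `hp.chi2` prints for 'Sex' in the next cell (about 9.09 and 0.028).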


In [5]:
print('EMOTION\n')
emo_sex_chi2 = hp.chi2(df_emotion_char, emotion_label,'Sex',  True)
print('\nAFFECT\n')
aff_sec_chi2 = hp.chi2(df_affect_char, affect_label,'Sex',  True)
print('\nAROUSAL-VALENCE\n')
ar_val_sec_chi2 = hp.chi2(df_ar_val_char, ['Arousal', 'Valence'], 'Sex', True)
print('\nLEVEL OF INTEREST\n')
loi_sec_chi2 = hp.chi2(df_loi_char, loi_label, 'Sex', True)

EMOTION

Chi square of Anger : 9.092786065531897 with p-value of: 0.02808234627053869
Chi square of Boredom : 12.239709267119355 with p-value of: 0.00660554330757788
Chi square of Disgust : 109.6988212832297 with p-value of: 1.2739455656408401e-23
Chi square of Fear : 18.894717476138283 with p-value of: 0.0002874502594385523
Chi square of Happiness : 12.202308955669732 with p-value of: 0.00672131054196095
Chi square of Emo_Neutral : 13.457065327350893 with p-value of: 0.003745557064868627
Chi square of Sadness : 90.38613513570625 with p-value of: 1.8097503676619159e-19

AFFECT

Chi square of Aggressiv : 68.62268523190446 with p-value of: 8.416722589251198e-15
Chi square of Cheerful : 20.553654037648435 with p-value of: 0.0001303114781882345
Chi square of Intoxicated : 47.144151863824035 with p-value of: 3.23867113053682e-10
Chi square of Nervous : 13.261074854046214 with p-value of: 0.004104722638686567
Chi square of Aff_Neutral : 8.32808613529152 with p-value of: 0.03969615199887199
C

If we have a look at the p-values for the different emotions, we can see that all 7 are below 0.05. Sex and each emotion are therefore not independent, and the null hypothesis can be rejected in every case.

Looking at the affect p-values, we also see statistical significance for each of the affects, meaning the two populations differ significantly from each other, so we again reject the null hypothesis.

For Arousal-Valence, too, the populations differ and the null hypothesis can be rejected.
We can say the same for level of interest.

So now we know that females and males differ significantly in how they are distributed across the quartiles. It would be nice to see where exactly, i.e. in which cells, they differ.




Now we move on to academic status. The null hypothesis is that the variables academic status and e.g. emotion are independent.

In [6]:
print('EMOTION\n')
emo_aca_chi2 = hp.chi2(df_emotion_char, emotion_label,'Academic' , True)
print('\nAFFECT\n')
aff_aca_chi2 = hp.chi2(df_affect_char, affect_label,'Academic', True)
print('\nAROUSAL-VALENCE\n')
ar_val_aca_chi2 = hp.chi2(df_ar_val_char, ['Arousal', 'Valence'],  'Academic',True)
print('\nLEVEL OF INTEREST\n')
loi_aca_chi2 = hp.chi2(df_loi_char, loi_label,'Academic', True)

EMOTION

Chi square of Anger : 3.8437655772062964 with p-value of: 0.27883633698186394
Chi square of Boredom : 8.701162554811255 with p-value of: 0.03353961684760745
Chi square of Disgust : 1.6669951390651905 with p-value of: 0.6442962853849514
Chi square of Fear : 1.9683141508362738 with p-value of: 0.5790091984938832
Chi square of Happiness : 0.5499244103750169 with p-value of: 0.9077940355757476
Chi square of Emo_Neutral : 12.634185504434129 with p-value of: 0.005498345963915411
Chi square of Sadness : 2.0776251649903594 with p-value of: 0.5564525906473599

AFFECT

Chi square of Aggressiv : 2.537087501098846 with p-value of: 0.4686258115855658
Chi square of Cheerful : 1.8374600939752055 with p-value of: 0.6068173238559944
Chi square of Intoxicated : 7.236822301177585 with p-value of: 0.0647205194112024
Chi square of Nervous : 1.9417457362820423 with p-value of: 0.5845851905251991
Chi square of Aff_Neutral : 8.162190401848669 with p-value of: 0.042775988731364435
Chi square of Tired 

Looking at emotion, we see that grad students and PhDs only differ significantly in the attributes 'Boredom' and 'Emo_Neutral'; in the other emotions, they do not differ significantly.

If we take a look at affect, we only see statistical significance for neutral affect; as before, they do not differ significantly in the other affects.

Looking at Arousal-Valence, we see that they do not differ significantly in arousal, but they do in valence. This is interesting, because valence values have a smaller range than arousal values, so one might have expected the difference to occur in arousal.

Looking at Level of Interest, we can see that grad students and PhDs only differ significantly in Disinterest.

Again, we do not yet know where exactly those differences are.


Now let's look at native speaker status.

In [7]:
print('EMOTION\n')
emo_age_chi2 = hp.chi2(df_emotion_char, emotion_label,'IsNativeSpeaker', True)
print('\nAFFECT\n')
aff_age_chi2 = hp.chi2(df_affect_char, affect_label, 'IsNativeSpeaker', True)
print('\nAROUSAL-VALENCE\n')
ar_val_age_chi2 = hp.chi2(df_ar_val_char, ['Arousal', 'Valence'],'IsNativeSpeaker' ,True)
print('\nLEVEL OF INTEREST\n')
loi_age_chi2 = hp.chi2(df_loi_char, loi_label, 'IsNativeSpeaker',  True)

EMOTION

Chi square of Anger : 11.102065448090253 with p-value of: 0.08527273528883235
Chi square of Boredom : 4.62194958228796 with p-value of: 0.5931305839658287
Chi square of Disgust : 7.662786467044468 with p-value of: 0.2638637657684275
Chi square of Fear : 2.977315059447067 with p-value of: 0.8116886037220786
Chi square of Happiness : 7.718293348406944 with p-value of: 0.25947681958295843
Chi square of Emo_Neutral : 2.4087261604730092 with p-value of: 0.8785395486879244
Chi square of Sadness : 18.918739148715694 with p-value of: 0.0043030938878381424

AFFECT

Chi square of Aggressiv : 12.976158097668039 with p-value of: 0.04341612434351856
Chi square of Cheerful : 6.761513988035048 with p-value of: 0.34346652078310386
Chi square of Intoxicated : 15.035076357178989 with p-value of: 0.019985647333725716
Chi square of Nervous : 7.246370341827431 with p-value of: 0.29866286547218546
Chi square of Aff_Neutral : 11.165704099942833 with p-value of: 0.08338818204281284
Chi square of Tire

## Post-Hoc tests for age and native speaker, as they have three different groups

If a significant p-value is found for a characteristic with three groups, such as 'Age' or 'IsNativeSpeaker', we do not yet know which groups differ significantly from each other, so post-hoc testing is done for these characteristics. The cell below runs it for 'IsNativeSpeaker'.
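The idea behind such a post-hoc test can be sketched as follows: run a chi² test for every pair of groups, then correct the resulting p-values for multiple comparisons. The function `pairwise_chi2` and the counts below are invented for illustration; `hp.chi2_post_hoc` may be implemented differently.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from statsmodels.stats.multitest import multipletests

def pairwise_chi2(table, method='bonferroni', alpha=0.05):
    """Chi-squared test for every pair of rows, with corrected p-values."""
    pairs = list(combinations(table.index, 2))
    raw_p = []
    for pair in pairs:
        # 2 x k sub-table containing only the two groups being compared
        _, p, _, _ = chi2_contingency(table.loc[list(pair)])
        raw_p.append(p)
    reject, corrected_p, _, _ = multipletests(raw_p, alpha=alpha, method=method)
    return pairs, np.array(raw_p), reject, corrected_p

# Invented counts, NOT taken from the study data
toy_table = pd.DataFrame(
    [[30, 25, 20, 15], [20, 22, 24, 26], [18, 21, 25, 28]],
    index=['Asian Non-Native', 'Europ. Non-Native', 'Native Speaker'],
    columns=['1st Quartile', '2nd Quartile', '3rd Quartile', '4th Quartile'],
)

pairs, raw_p, reject, corrected_p = pairwise_chi2(toy_table)
print("Combinations:", pairs)
print("Reject List:", reject)
print("Corrected p-values:", corrected_p)
```

With three groups there are three pairwise comparisons, so the Bonferroni correction multiplies each raw p-value by 3 and caps the result at 1. This is why many corrected p-values in the output below are exactly 1.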

In [8]:
print('EMOTION\n')
print('post-hoc emotions and different groups')
emo_reject_list, emo_corrected_p_vals, emo_combinations, emo_residuals= hp.chi2_post_hoc(df_emotion_char,emotion_label, 'IsNativeSpeaker', 'bonferroni', True, True)
print('\nAFFECT\n')
print('\n post-hoc affect and different groups')
aff_reject_list, aff_corrected_p_vals, aff_combinations, aff_residuals = hp.chi2_post_hoc(df_affect_char, affect_label, 'IsNativeSpeaker' ,'bonferroni', True, True)
print('\nAROUSAL-VALENCE\n')
print('\n post-hoc arousal-valence and different groups')
ar_val_reject_list, ar_val_corrected_p_vals, ar_val_combinations, ar_val_residuals = hp.chi2_post_hoc(df_ar_val_char, ['Arousal', 'Valence'], 'IsNativeSpeaker', 'bonferroni',True, True)
print('\nLEVEL OF INTEREST\n')
print('\n post-hoc level of interest and different groups')
loi_reject_list, loi_corrected_p_vals, loi_combinations, loi_residuals = hp.chi2_post_hoc(df_loi_char, loi_label, 'IsNativeSpeaker', 'bonferroni', True, True)

EMOTION

post-hoc emotions and different groups
Anger
Combinations: [('Asian Non-Native', 'Europ. Non-Native'), ('Asian Non-Native', 'Native Speaker'), ('Europ. Non-Native', 'Native Speaker')]
Reject List: [False False False]
Corrected p-values: [0.10972194 0.79674406 1.        ]
Boredom
Combinations: [('Asian Non-Native', 'Europ. Non-Native'), ('Asian Non-Native', 'Native Speaker'), ('Europ. Non-Native', 'Native Speaker')]
Reject List: [False False False]
Corrected p-values: [0.89316409 1.         1.        ]
Disgust
Combinations: [('Asian Non-Native', 'Europ. Non-Native'), ('Asian Non-Native', 'Native Speaker'), ('Europ. Non-Native', 'Native Speaker')]
Reject List: [False False False]
Corrected p-values: [0.4063049  0.67907227 1.        ]
Fear
Combinations: [('Asian Non-Native', 'Europ. Non-Native'), ('Asian Non-Native', 'Native Speaker'), ('Europ. Non-Native', 'Native Speaker')]
Reject List: [False False False]
Corrected p-values: [1. 1. 1.]
Happiness
Combinations: [('Asian Non-Nati

## Further Analysis
Now that we know we have significant p-values, we should investigate in which cells the populations differ from each other. For this, we can calculate the residuals, i.e. the difference between the observed table and a table containing the expected counts, for which the chi² null hypothesis of independence holds.
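As a sketch of this idea, the residuals for the 'Anger' by 'Sex' table from above can be computed with SciPy. The raw residual is observed minus expected. The Pearson residual, which divides by the square root of the expected count, is a common scaled variant and is added here as an assumption; the original analysis may use a different one.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from the 'Anger' frequency table
observed = np.array([
    [95, 95, 84, 72],   # Male
    [64, 64, 74, 87],   # Female
])

# 'expected' holds the counts we would see if sex and 'Anger' were independent
stat, _, _, expected = chi2_contingency(observed)

raw_residuals = observed - expected
pearson_residuals = raw_residuals / np.sqrt(expected)

print(np.round(raw_residuals, 2))
print(np.round(pearson_residuals, 2))
```

The squared Pearson residuals sum to the chi² statistic, so the cells with the largest absolute residuals contribute most to the significant result. Here that is the 4th quartile, where males are under-represented and females over-represented.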