# Discrete emotions - ZuCo Analysis

This analysis investigates the linear relationship between discrete emotion scores and the following eye-tracking measures from the ZuCo dataset:

- Mean Pupil Size (MPS)
- Total Reading Time (TRT)
- Gaze Duration (GD)
- First Fixation Duration (FFD)


Since only a few emotions are usually associated with a single sentence, most emotion fields of each sentence are zero. To bring out the relationship between discrete emotions and the ZuCo measures, we therefore associate each sentence only with its highest-scoring emotion: the emotion label is treated as a categorical variable (one indicator column per emotion) paired with a single score value.
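
The encoding described above corresponds to a regression of roughly the following form. This is a sketch that assumes ordinary least squares with an intercept; the symbols $s_i$, $e_i$ and $\beta$ are notation introduced here for illustration and do not appear in the notebook.

```latex
y_i = \beta_0 + \beta_{\text{score}}\, s_i
      + \sum_{e \in E} \beta_e \,\mathbb{1}[e_i = e] + \varepsilon_i,
\qquad
E = \{\text{Anger}, \text{Anticipation}, \text{Disgust}, \text{Fear},
      \text{Joy}, \text{Sadness}, \text{Trust}\}
```

Here $y_i$ is one of the four measures for sentence $i$, $s_i$ is the score of its dominant emotion, and $e_i$ is that emotion's label. A sentence with no positive emotion score has $s_i = 0$ and all indicators equal to zero, so it contributes only through the intercept.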

In [2]:
import pandas as pd
import statsmodels.api as sm

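# Make the parent directory importable so that the local
# BackwardElimination module can be found.
# Insert at position 1: position 0 is the script path (or '' in a REPL).
import sys
sys.path.insert(1, '../')
import BackwardElimination as be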

## Importing the dataset

In [3]:
dataset = pd.read_csv(r'../Lexicons/Emotion_Sentences_Cross_Analysis.csv')
# Drop the first three columns, which are not used in this analysis
dataset = dataset.iloc[:,3:]

## Handling missing data

In [4]:
# Keep only the sentences that have a Mean Pupil Size (MPS) value
dataset = dataset[dataset.MPS.notnull()]


## Handling categorical data

In [5]:
# Emotion score columns used as predictors
X = dataset[['Anger','Anticipation','Disgust','Fear','Joy','Sadness','Trust']]
categorical_structure = {'Score': [], 'Anger': [], 'Anticipation':[], 'Disgust':[], 'Fear':[], 'Joy':[], 'Sadness':[], 'Trust':[]}
df = pd.DataFrame(categorical_structure, columns = ['Score', 'Anger', 'Anticipation', 'Disgust', 'Fear', 'Joy', 'Sadness', 'Trust'])
for i, row in X.iterrows():
    # Find the emotion with the highest strictly positive score for this sentence
    maxValue = 0
    category = ''
    struct = {'Score': 0, 'Anger': 0, 'Anticipation': 0, 'Disgust': 0, 'Fear': 0, 'Joy': 0, 'Sadness': 0, 'Trust':0}
    for column, value in row.items():  # Series.iteritems() was removed in pandas 2.0
        if value > maxValue:
            maxValue = value
            category = column
    # One-hot encode the dominant emotion and keep its score;
    # sentences with no positive score stay all-zero
    if maxValue > 0:
        struct[category] = 1
        struct['Score'] = maxValue
    df.loc[i] = struct
X = df
print(X)

     Score  Anger  Anticipation  Disgust  Fear  Joy  Sadness  Trust
0    0.953    0.0           0.0      0.0   0.0  0.0      1.0    0.0
1    1.469    0.0           0.0      0.0   0.0  0.0      1.0    0.0
2    1.530    0.0           0.0      0.0   0.0  0.0      0.0    1.0
4    0.765    0.0           0.0      0.0   0.0  1.0      0.0    0.0
5    0.765    0.0           0.0      0.0   0.0  1.0      0.0    0.0
..     ...    ...           ...      ...   ...  ...      ...    ...
384  0.984    0.0           1.0      0.0   0.0  0.0      0.0    0.0
386  2.140    0.0           0.0      0.0   0.0  0.0      0.0    1.0
387  0.942    0.0           0.0      0.0   1.0  0.0      0.0    0.0
388  0.912    0.0           0.0      0.0   0.0  1.0      0.0    0.0
389  1.828    0.0           0.0      0.0   0.0  0.0      0.0    1.0

[322 rows x 8 columns]
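
The row-by-row encoding above can also be written in vectorized form. The following is a minimal sketch, not the notebook's own code: the helper name `dominant_emotion_frame` and the toy scores are made up for illustration, and it assumes the emotion columns contain no missing values.

```python
import pandas as pd

EMOTIONS = ['Anger', 'Anticipation', 'Disgust', 'Fear', 'Joy', 'Sadness', 'Trust']


def dominant_emotion_frame(scores: pd.DataFrame) -> pd.DataFrame:
    """Keep only the highest-scoring emotion per row (hypothetical helper).

    Returns a frame with a 'Score' column plus one indicator column per
    emotion. Rows without any positive score are left all-zero.
    """
    scores = scores[EMOTIONS]
    top_score = scores.max(axis=1)
    # idxmax returns the first column holding the maximum, which matches
    # the strict '>' comparison used in the loop version.
    top_label = scores.idxmax(axis=1)
    has_emotion = top_score > 0

    encoded = pd.DataFrame(
        {e: ((top_label == e) & has_emotion).astype(float) for e in EMOTIONS},
        index=scores.index,
    )
    encoded.insert(0, 'Score', top_score.where(has_emotion, 0.0))
    return encoded


# Toy input: three sentences with invented emotion scores
toy = pd.DataFrame(
    [
        [0.0, 0.0, 0.0, 0.0, 0.2, 0.953, 0.1],  # Sadness dominates
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],    # no emotion at all
        [0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 1.53],   # Trust dominates
    ],
    columns=EMOTIONS,
)
result = dominant_emotion_frame(toy)
print(result)
```

The vectorized version avoids growing a DataFrame one row at a time with `df.loc[i] = ...`, which becomes slow on larger inputs, while keeping the original index so that the rows stay aligned with the targets.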


In [6]:
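# Targets are taken by position: the last four columns are MPS, TRT, GD and FFD
y_MPS = dataset.iloc[:, -4].values
y_TRT = dataset.iloc[:, -3].values
y_GD = dataset.iloc[:, -2].values
y_FFD = dataset.iloc[:, -1].values

# Add an intercept column ('const') to the design matrix
X = sm.add_constant(X)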

## Model building

### Discrete emotions - Mean Pupil Size

In [7]:
#model_MPS = sm.OLS(y_MPS, X).fit()
model_MPS = be.backWardEliminationMLR(X,y_MPS)
model_MPS.summary()

const           0.000000
Score           0.014080
Anger           0.358006
Anticipation    0.194719
Disgust         0.074971
Fear            0.132502
Joy             0.138589
Sadness         0.004923
Trust           0.103014
dtype: float64
 
const           0.000000
Score           0.019031
Anticipation    0.263768
Disgust         0.092539
Fear            0.167743
Joy             0.194948
Sadness         0.007194
Trust           0.153670
dtype: float64
 
const      0.000000
Score      0.036161
Disgust    0.145222
Fear       0.275305
Joy        0.393104
Sadness    0.014318
Trust      0.343305
dtype: float64
 
const      0.000000
Score      0.053992
Disgust    0.183029
Fear       0.352928
Sadness    0.021314
Trust      0.557053
dtype: float64
 
const      0.000000
Score      0.063318
Disgust    0.209424
Fear       0.406501
Sadness    0.025568
dtype: float64
 
const      0.000000
Score      0.057850
Disgust    0.217690
Sadness    0.027826
dtype: float64
 
const      0.000000
Score      0.

Dep. Variable:     y                 R-squared:           0.012
Model:             OLS               Adj. R-squared:      0.009
Method:            Least Squares     F-statistic:         3.916
Date:              Thu, 12 Nov 2020  Prob (F-statistic):  0.0487
Time:              16:25:53          Log-Likelihood:      -1897.3
No. Observations:  322               AIC:                 3799.0
Df Residuals:      320               BIC:                 3806.0
Df Model:          1
Covariance Type:   nonrobust

              coef  std err        t  P>|t|    [0.025    0.975]
const    3307.6707    5.042  655.972  0.000  3297.750  3317.591
Sadness    42.2022   21.327    1.979  0.049     0.243    84.161

Omnibus:        0.295   Durbin-Watson:     1.057
Prob(Omnibus):  0.863   Jarque-Bera (JB):  0.17
Skew:           0.048   Prob(JB):          0.919
Kurtosis:       3.057   Cond. No.          4.37


### Discrete emotions - Total Reading Time

In [8]:
#model_TRT = sm.OLS(y_TRT, X).fit()
model_TRT = be.backWardEliminationMLR(X,y_TRT)
model_TRT.summary()

const           5.197006e-44
Score           1.060497e-13
Anger           2.630875e-02
Anticipation    1.820093e-01
Disgust         5.134209e-01
Fear            1.721780e-01
Joy             9.726974e-02
Sadness         1.280460e-01
Trust           1.208267e-01
dtype: float64
 
const           2.959603e-48
Score           1.243555e-14
Anger           3.071341e-02
Anticipation    2.271819e-01
Fear            1.324419e-01
Joy             5.458568e-02
Sadness         8.636468e-02
Trust           5.982200e-02
dtype: float64
 
const      1.230786e-65
Score      3.395404e-17
Anger      5.051014e-02
Fear       5.135654e-02
Joy        4.207759e-03
Sadness    1.894614e-02
Trust      1.445259e-03
dtype: float64
 
const      1.168243e-66
Score      1.361204e-16
Anger      3.487117e-02
Joy        1.245349e-02
Sadness    3.924785e-02
Trust      5.908939e-03
dtype: float64
 


Dep. Variable:     y                 R-squared:           0.215
Model:             OLS               Adj. R-squared:      0.202
Method:            Least Squares     F-statistic:         17.28
Date:              Thu, 12 Nov 2020  Prob (F-statistic):  4.03e-15
Time:              16:25:53          Log-Likelihood:      -2688.0
No. Observations:  322               AIC:                 5388.0
Df Residuals:      316               BIC:                 5411.0
Df Model:          5
Covariance Type:   nonrobust

              coef  std err        t  P>|t|     [0.025    0.975]
const    2370.5539  106.528   22.253  0.000   2160.959  2580.148
Score    1009.1625  115.417    8.744  0.000    782.080  1236.245
Anger    1001.4030  472.585    2.119  0.035     71.592  1931.214
Joy      -473.5416  188.405   -2.513  0.012   -844.228  -102.856
Sadness  -564.4215  272.644   -2.070  0.039  -1100.849   -27.994
Trust    -418.8669  151.129   -2.772  0.006   -716.212  -121.521

Omnibus:        35.137  Durbin-Watson:     1.716
Prob(Omnibus):  0.0     Jarque-Bera (JB):  43.339
Skew:           0.849   Prob(JB):          3.88e-10
Kurtosis:       3.592   Cond. No.          11.6


### Discrete emotions - Gaze Duration

In [9]:
#model_GD = sm.OLS(y_GD, X).fit()
model_GD = be.backWardEliminationMLR(X,y_GD)
model_GD.summary()

const           6.935651e-43
Score           1.220703e-15
Anger           1.787976e-02
Anticipation    2.177907e-01
Disgust         7.650132e-01
Fear            1.607421e-01
Joy             1.000480e-01
Sadness         8.603127e-02
Trust           8.296540e-02
dtype: float64
 
const           1.193837e-46
Score           1.901986e-16
Anger           1.862610e-02
Anticipation    2.310096e-01
Fear            1.388456e-01
Joy             7.109362e-02
Sadness         6.666582e-02
Trust           5.112575e-02
dtype: float64
 
const      1.046505e-63
Score      3.276767e-19
Anger      3.121372e-02
Fear       5.485680e-02
Joy        6.450246e-03
Sadness    1.349203e-02
Trust      1.107743e-03
dtype: float64
 
const      1.052364e-64
Score      1.330207e-18
Anger      2.110263e-02
Joy        1.807168e-02
Sadness    2.840073e-02
Trust      4.502185e-03
dtype: float64
 


Dep. Variable:     y                 R-squared:           0.24
Model:             OLS               Adj. R-squared:      0.228
Method:            Least Squares     F-statistic:         19.92
Date:              Thu, 12 Nov 2020  Prob (F-statistic):  2.9e-17
Time:              16:25:53          Log-Likelihood:      -2609.4
No. Observations:  322               AIC:                 5231.0
Df Residuals:      316               BIC:                 5254.0
Df Model:          5
Covariance Type:   nonrobust

              coef  std err        t  P>|t|    [0.025    0.975]
const    1814.1242   83.462   21.736  0.000  1649.913  1978.336
Score     848.1656   90.426    9.380  0.000   670.253  1026.078
Anger     858.1543  370.257    2.318  0.021   129.674  1586.635
Joy      -350.7989  147.610   -2.377  0.018  -641.221   -60.377
Sadness  -470.3269  213.609   -2.202  0.028  -890.602   -50.052
Trust    -338.7763  118.405   -2.861  0.005  -571.738  -105.815

Omnibus:        40.421  Durbin-Watson:     1.813
Prob(Omnibus):  0.0     Jarque-Bera (JB):  52.856
Skew:           0.889   Prob(JB):          3.33e-12
Kurtosis:       3.881   Cond. No.          11.6


### Discrete emotions - First Fixation Duration

In [10]:
#model_FFD = sm.OLS(y_FFD, X).fit()
model_FFD = be.backWardEliminationMLR(X,y_FFD)
model_FFD.summary()

const           4.668699e-43
Score           1.121621e-16
Anger           2.237150e-02
Anticipation    2.895264e-01
Disgust         7.576530e-01
Fear            1.680459e-01
Joy             8.705369e-02
Sadness         8.387613e-02
Trust           6.749537e-02
dtype: float64
 
const           7.642367e-47
Score           1.555927e-17
Anger           2.339865e-02
Anticipation    3.099159e-01
Fear            1.450235e-01
Joy             6.050777e-02
Sadness         6.454900e-02
Trust           3.994493e-02
dtype: float64
 
const      1.629698e-63
Score      3.090657e-20
Anger      3.525389e-02
Fear       6.563292e-02
Joy        6.783832e-03
Sadness    1.581320e-02
Trust      1.112419e-03
dtype: float64
 
const      1.243014e-64
Score      1.160705e-19
Anger      2.431948e-02
Joy        1.801829e-02
Sadness    3.178114e-02
Trust      4.196856e-03
dtype: float64
 


Dep. Variable:     y                 R-squared:           0.251
Model:             OLS               Adj. R-squared:      0.239
Method:            Least Squares     F-statistic:         21.22
Date:              Thu, 12 Nov 2020  Prob (F-statistic):  2.68e-18
Time:              16:25:53          Log-Likelihood:      -2575.9
No. Observations:  322               AIC:                 5164.0
Df Residuals:      316               BIC:                 5186.0
Df Model:          5
Covariance Type:   nonrobust

              coef  std err        t  P>|t|    [0.025    0.975]
const    1633.2365   75.206   21.717  0.000  1485.269  1781.204
Score     790.8084   81.481    9.705  0.000   630.495   951.122
Anger     754.9779  333.631    2.263  0.024    98.559  1411.397
Joy      -316.2457  133.008   -2.378  0.018  -577.939   -54.552
Sadness  -415.1232  192.479   -2.157  0.032  -793.825   -36.422
Trust    -307.6947  106.692   -2.884  0.004  -517.612   -97.778

Omnibus:        42.162  Durbin-Watson:     1.83
Prob(Omnibus):  0.0     Jarque-Bera (JB):  56.35
Skew:           0.904   Prob(JB):          5.81e-13
Kurtosis:       3.967   Cond. No.          11.6
