## ZuCo - Discrete emotion multiple linear regression analysis

With this analysis, we look for any sign of a linear relationship between the ZuCo feature values of the single words that make up the dataset's sentences and the emotion intensity scores that the NRC Emotion Intensity Lexicon assigns to the same words.

In [1]:
import pandas as pd
import statsmodels.api as sm
from sklearn.preprocessing import MinMaxScaler

# Make the parent directory importable so that the local
# BackwardElimination module can be found
import sys
sys.path.insert(1, '../')  # index 0 is the script path (or '' in a REPL)
import BackwardElimination as be

## Import the ZuCo dataset

In [2]:
zuco_ds = pd.read_csv('../Lexicons/ZuCo_words_dataset.csv')

## Normalizing values

In [3]:
scaler = MinMaxScaler()
# Scale every feature column (all columns except 'Word') to [0, 1]
zuco_ds.iloc[:, 1:] = scaler.fit_transform(zuco_ds.iloc[:, 1:])
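`MinMaxScaler` rescales each feature column independently to [0, 1] with x' = (x - min) / (max - min), where min and max are taken per column. The toy example below (invented values; only the 'Word'-first column layout is borrowed from the notebook) checks this against a manual computation.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy data laid out like the ZuCo file: a 'Word' column followed by
# numeric feature columns (values are made up for illustration).
toy = pd.DataFrame({
    "Word": ["the", "angry", "dog"],
    "TRT": [100.0, 300.0, 200.0],
    "FFD": [50.0, 80.0, 110.0],
})

features = toy.columns[1:]
scaled = MinMaxScaler().fit_transform(toy[features])

# Manual min-max scaling, column by column: (x - min) / (max - min)
cols = toy[features]
manual = (cols - cols.min()) / (cols.max() - cols.min())

print(scaled)
print(np.allclose(scaled, manual.to_numpy()))  # True
```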

## Load the discrete emotion intensity scores from the NRC Emotion Intensity Lexicon

In [4]:
anger_lex = pd.read_csv('../Lexicons/NRC_Emotion_Intensity_Lexicon/NRC-Emotion-Intensity-anger-scores.csv')
anticipation_lex = pd.read_csv('../Lexicons/NRC_Emotion_Intensity_Lexicon/NRC-Emotion-Intensity-anticipation-scores.csv')
disgust_lex = pd.read_csv('../Lexicons/NRC_Emotion_Intensity_Lexicon/NRC-Emotion-Intensity-disgust-scores.csv')
fear_lex = pd.read_csv('../Lexicons/NRC_Emotion_Intensity_Lexicon/NRC-Emotion-Intensity-fear-scores.csv')
joy_lex = pd.read_csv('../Lexicons/NRC_Emotion_Intensity_Lexicon/NRC-Emotion-Intensity-joy-scores.csv')
sadness_lex = pd.read_csv('../Lexicons/NRC_Emotion_Intensity_Lexicon/NRC-Emotion-Intensity-sadness-scores.csv')
surprise_lex = pd.read_csv('../Lexicons/NRC_Emotion_Intensity_Lexicon/NRC-Emotion-Intensity-surprise-scores.csv')
trust_lex = pd.read_csv('../Lexicons/NRC_Emotion_Intensity_Lexicon/NRC-Emotion-Intensity-trust-scores.csv')
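The eight `read_csv` calls differ only in the emotion name, so they can be generated from a single path template. A minimal sketch, assuming the directory layout and file names used above; `lexicon_path` and `load_lexicons` are hypothetical helpers, not part of the original notebook. The `reader` parameter exists only so the path logic can be exercised without the files.

```python
import pandas as pd

LEXICON_DIR = "../Lexicons/NRC_Emotion_Intensity_Lexicon"
EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

def lexicon_path(emotion):
    """Build the CSV path for one emotion, following the file naming used above."""
    return f"{LEXICON_DIR}/NRC-Emotion-Intensity-{emotion}-scores.csv"

def load_lexicons(emotions=EMOTIONS, reader=pd.read_csv):
    """Return a dict mapping each emotion name to whatever `reader` returns for its path."""
    return {emotion: reader(lexicon_path(emotion)) for emotion in emotions}

# Usage (requires the lexicon files to be present):
# lexicons = load_lexicons()
# anger_lex = lexicons["anger"]
```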

## Intersect each discrete emotion lexicon with the ZuCo word dataset

In [5]:
# Keep only the words present in both ZuCo and the lexicon, then drop the join key
anger_ds = pd.merge(zuco_ds, anger_lex, how='inner', on='Word')
anger_ds = anger_ds.drop(columns=['Word'])

anticipation_ds = pd.merge(zuco_ds, anticipation_lex, how='inner', on='Word')
anticipation_ds = anticipation_ds.drop(columns=['Word'])

disgust_ds = pd.merge(zuco_ds, disgust_lex, how='inner', on='Word')
disgust_ds = disgust_ds.drop(columns=['Word'])

fear_ds = pd.merge(zuco_ds, fear_lex, how='inner', on='Word')
fear_ds = fear_ds.drop(columns=['Word'])

joy_ds = pd.merge(zuco_ds, joy_lex, how='inner', on='Word')
joy_ds = joy_ds.drop(columns=['Word'])

sadness_ds = pd.merge(zuco_ds, sadness_lex, how='inner', on='Word')
sadness_ds = sadness_ds.drop(columns=['Word'])

surprise_ds = pd.merge(zuco_ds, surprise_lex, how='inner', on='Word')
surprise_ds = surprise_ds.drop(columns=['Word'])

trust_ds = pd.merge(zuco_ds, trust_lex, how='inner', on='Word')
trust_ds = trust_ds.drop(columns=['Word'])
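All eight merge/drop pairs perform the same operation: an inner join on `Word`, which keeps only the words present in both the ZuCo data and the lexicon, followed by dropping the join key so that only numeric columns remain for the regression. A hedged sketch on toy data; `intersect_with_lexicon` is a hypothetical name and the values are invented.

```python
import pandas as pd

def intersect_with_lexicon(zuco_ds, lexicon):
    """Inner-join ZuCo features with an emotion lexicon on 'Word', then drop the key."""
    merged = pd.merge(zuco_ds, lexicon, how="inner", on="Word")
    return merged.drop(columns=["Word"])

# Toy inputs: only "fury" appears in both tables.
zuco_toy = pd.DataFrame({"Word": ["fury", "table"], "TRT": [0.9, 0.1]})
anger_toy = pd.DataFrame({"Word": ["fury", "rage"], "Score": [0.95, 0.91]})

result = intersect_with_lexicon(zuco_toy, anger_toy)
print(result)  # a single row: TRT = 0.9, Score = 0.95
```

Words missing from either side silently disappear, which is why the number of observations differs from one emotion to the next in the summaries below.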

## Build multiple linear regression models
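For each emotion, the full model regresses the lexicon intensity score of word $i$ on the four ZuCo features used in the notebook, with the intercept $\beta_0$ supplied by `sm.add_constant`:

$$
\text{Score}_i = \beta_0 + \beta_1\,\text{MPS}_i + \beta_2\,\text{TRT}_i + \beta_3\,\text{GD}_i + \beta_4\,\text{FFD}_i + \varepsilon_i
$$

A feature is considered linearly related to the emotion score only if its coefficient $\beta_j$ differs from zero at the significance level SL = 0.05.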

In [6]:
# Fit the full model on all four ZuCo features
X = anger_ds[['MPS', 'TRT', 'GD', 'FFD']]
y = anger_ds['Score']
X = sm.add_constant(X)  # add the intercept column
# Backward elimination (used for the other emotions) is disabled here:
# model = be.backWardEliminationMLR(X, y)
model = sm.OLS(y, X).fit()
model.summary()

| Field | Value | Field | Value |
|---|---|---|---|
| Dep. Variable: | Score | R-squared: | 0.04 |
| Model: | OLS | Adj. R-squared: | -0.056 |
| Method: | Least Squares | F-statistic: | 0.4121 |
| Date: | Sat, 21 Nov 2020 | Prob (F-statistic): | 0.799 |
| Time: | 11:12:25 | Log-Likelihood: | -0.8297 |
| No. Observations: | 45 | AIC: | 11.66 |
| Df Residuals: | 40 | BIC: | 20.69 |
| Df Model: | 4 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.3258 | 0.218 | 1.492 | 0.144 | -0.116 | 0.767 |
| MPS | 0.0992 | 0.450 | 0.220 | 0.827 | -0.810 | 1.009 |
| TRT | 0.6506 | 0.960 | 0.677 | 0.502 | -1.291 | 2.592 |
| GD | 1.0696 | 2.144 | 0.499 | 0.621 | -3.263 | 5.402 |
| FFD | -0.7596 | 0.906 | -0.838 | 0.407 | -2.591 | 1.071 |

| Field | Value | Field | Value |
|---|---|---|---|
| Omnibus: | 4.715 | Durbin-Watson: | 1.889 |
| Prob(Omnibus): | 0.095 | Jarque-Bera (JB): | 2.384 |
| Skew: | 0.292 | Prob(JB): | 0.304 |
| Kurtosis: | 2.035 | Cond. No. | 70.3 |


No significant linear relationship was found between the ZuCo features and the anger scores (SL = 0.05): the overall F-test gives p = 0.799 and every coefficient has p > 0.05.

In [7]:
X = anticipation_ds[['MPS','TRT','GD','FFD']]
y = anticipation_ds['Score']
X = sm.add_constant(X)

model = be.backWardEliminationMLR(X,y)
model.summary()

const    2.505866e-08
MPS      7.141985e-01
TRT      7.807327e-01
GD       7.992330e-01
FFD      7.584872e-01
dtype: float64
 
const    1.865584e-08
MPS      6.879633e-01
TRT      8.517625e-01
FFD      5.344069e-01
dtype: float64
 
const    1.513918e-08
MPS      7.079841e-01
FFD      5.394793e-01
dtype: float64
 
const    7.937494e-20
FFD      5.358977e-01
dtype: float64
 
const    1.540213e-52
dtype: float64
 


| Field | Value | Field | Value |
|---|---|---|---|
| Dep. Variable: | Score | R-squared: | -0.0 |
| Model: | OLS | Adj. R-squared: | -0.0 |
| Method: | Least Squares | F-statistic: | |
| Date: | Sun, 15 Nov 2020 | Prob (F-statistic): | |
| Time: | 18:30:50 | Log-Likelihood: | 56.487 |
| No. Observations: | 83 | AIC: | -111.0 |
| Df Residuals: | 82 | BIC: | -108.6 |
| Df Model: | 0 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.4948 | 0.014 | 36.569 | 0.000 | 0.468 | 0.522 |

| Field | Value | Field | Value |
|---|---|---|---|
| Omnibus: | 1.345 | Durbin-Watson: | 1.871 |
| Prob(Omnibus): | 0.51 | Jarque-Bera (JB): | 1.348 |
| Skew: | -0.216 | Prob(JB): | 0.51 |
| Kurtosis: | 2.55 | Cond. No. | 1.0 |


No significant linear relationship was found between the ZuCo features and the anticipation scores (SL = 0.05): backward elimination removed all four features, leaving an intercept-only model.

In [8]:
X = disgust_ds[['MPS','TRT','GD','FFD']]
y = disgust_ds['Score']
X = sm.add_constant(X)

model = be.backWardEliminationMLR(X,y)
model.summary()

const    0.998591
MPS      0.417207
TRT      0.430732
GD       0.712350
FFD      0.755057
dtype: float64
 
const    0.998429
MPS      0.314107
TRT      0.456123
GD       0.840333
dtype: float64
 
const    0.976442
MPS      0.311308
TRT      0.361046
dtype: float64
 
const    0.930323
MPS      0.176793
dtype: float64
 
const    4.894083e-08
dtype: float64
 


| Field | Value | Field | Value |
|---|---|---|---|
| Dep. Variable: | Score | R-squared: | 0.0 |
| Model: | OLS | Adj. R-squared: | 0.0 |
| Method: | Least Squares | F-statistic: | |
| Date: | Sun, 15 Nov 2020 | Prob (F-statistic): | |
| Time: | 18:30:50 | Log-Likelihood: | 1.2067 |
| No. Observations: | 25 | AIC: | -0.4133 |
| Df Residuals: | 24 | BIC: | 0.8055 |
| Df Model: | 0 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.3672 | 0.047 | 7.803 | 0.000 | 0.270 | 0.464 |

| Field | Value | Field | Value |
|---|---|---|---|
| Omnibus: | 3.204 | Durbin-Watson: | 1.985 |
| Prob(Omnibus): | 0.201 | Jarque-Bera (JB): | 2.125 |
| Skew: | 0.518 | Prob(JB): | 0.346 |
| Kurtosis: | 2.018 | Cond. No. | 1.0 |


No significant linear relationship was found between the ZuCo features and the disgust scores (SL = 0.05): backward elimination removed all four features, leaving an intercept-only model.

In [9]:
X = fear_ds[['MPS','TRT','GD','FFD']]
y = fear_ds['Score']
X = sm.add_constant(X)

model = be.backWardEliminationMLR(X,y)
model.summary()

const    0.499155
MPS      0.081821
TRT      0.169964
GD       0.500827
FFD      0.197703
dtype: float64
 
const    0.423467
MPS      0.105003
TRT      0.093356
FFD      0.204596
dtype: float64
 
const    0.182373
MPS      0.139532
TRT      0.267310
dtype: float64
 
const    0.227606
MPS      0.210156
dtype: float64
 
const    1.471035e-18
dtype: float64
 


| Field | Value | Field | Value |
|---|---|---|---|
| Dep. Variable: | Score | R-squared: | -0.0 |
| Model: | OLS | Adj. R-squared: | -0.0 |
| Method: | Least Squares | F-statistic: | |
| Date: | Sun, 15 Nov 2020 | Prob (F-statistic): | |
| Time: | 18:30:50 | Log-Likelihood: | -1.7442 |
| No. Observations: | 55 | AIC: | 5.488 |
| Df Residuals: | 54 | BIC: | 7.496 |
| Df Model: | 0 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.4492 | 0.034 | 13.215 | 0.000 | 0.381 | 0.517 |

| Field | Value | Field | Value |
|---|---|---|---|
| Omnibus: | 5.902 | Durbin-Watson: | 2.091 |
| Prob(Omnibus): | 0.052 | Jarque-Bera (JB): | 2.594 |
| Skew: | 0.219 | Prob(JB): | 0.273 |
| Kurtosis: | 2.03 | Cond. No. | 1.0 |


No significant linear relationship was found between the ZuCo features and the fear scores (SL = 0.05): backward elimination removed all four features, leaving an intercept-only model.

In [10]:
X = joy_ds[['MPS','TRT','GD','FFD']]
y = joy_ds['Score']
X = sm.add_constant(X)

model = be.backWardEliminationMLR(X,y)
model.summary()

const    0.000006
MPS      0.424350
TRT      0.782228
GD       0.554726
FFD      0.697249
dtype: float64
 
const    0.000005
MPS      0.443127
GD       0.596355
FFD      0.723418
dtype: float64
 
const    0.000003
MPS      0.463895
GD       0.144797
dtype: float64
 
const    2.023268e-17
GD       1.267469e-01
dtype: float64
 
const    6.083037e-48
dtype: float64
 


| Field | Value | Field | Value |
|---|---|---|---|
| Dep. Variable: | Score | R-squared: | -0.0 |
| Model: | OLS | Adj. R-squared: | -0.0 |
| Method: | Least Squares | F-statistic: | |
| Date: | Sun, 15 Nov 2020 | Prob (F-statistic): | |
| Time: | 18:30:50 | Log-Likelihood: | 31.264 |
| No. Observations: | 110 | AIC: | -60.53 |
| Df Residuals: | 109 | BIC: | -57.83 |
| Df Model: | 0 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.4469 | 0.017 | 25.619 | 0.000 | 0.412 | 0.481 |

| Field | Value | Field | Value |
|---|---|---|---|
| Omnibus: | 3.074 | Durbin-Watson: | 1.983 |
| Prob(Omnibus): | 0.215 | Jarque-Bera (JB): | 2.609 |
| Skew: | 0.269 | Prob(JB): | 0.271 |
| Kurtosis: | 2.47 | Cond. No. | 1.0 |


No significant linear relationship was found between the ZuCo features and the joy scores (SL = 0.05): backward elimination removed all four features, leaving an intercept-only model.

In [11]:
X = sadness_ds[['MPS','TRT','GD','FFD']]
y = sadness_ds['Score']
X = sm.add_constant(X)

model = be.backWardEliminationMLR(X,y)
model.summary()

const    0.579814
MPS      0.372548
TRT      0.647064
GD       0.807428
FFD      0.594650
dtype: float64
 
const    0.604012
MPS      0.359862
TRT      0.649858
FFD      0.539480
dtype: float64
 
const    0.582035
MPS      0.335805
FFD      0.172193
dtype: float64
 
const    0.000441
FFD      0.169958
dtype: float64
 
const    2.056652e-15
dtype: float64
 


| Field | Value | Field | Value |
|---|---|---|---|
| Dep. Variable: | Score | R-squared: | -0.0 |
| Model: | OLS | Adj. R-squared: | -0.0 |
| Method: | Least Squares | F-statistic: | |
| Date: | Sun, 15 Nov 2020 | Prob (F-statistic): | |
| Time: | 18:30:50 | Log-Likelihood: | -6.022 |
| No. Observations: | 50 | AIC: | 14.04 |
| Df Residuals: | 49 | BIC: | 15.96 |
| Df Model: | 0 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.4452 | 0.039 | 11.418 | 0.000 | 0.367 | 0.524 |

| Field | Value | Field | Value |
|---|---|---|---|
| Omnibus: | 13.13 | Durbin-Watson: | 2.234 |
| Prob(Omnibus): | 0.001 | Jarque-Bera (JB): | 3.262 |
| Skew: | 0.129 | Prob(JB): | 0.196 |
| Kurtosis: | 1.776 | Cond. No. | 1.0 |


No significant linear relationship was found between the ZuCo features and the sadness scores (SL = 0.05): backward elimination removed all four features, leaving an intercept-only model.

In [12]:
X = surprise_ds[['MPS','TRT','GD','FFD']]
y = surprise_ds['Score']
X = sm.add_constant(X)

model = be.backWardEliminationMLR(X,y)
model.summary()

const    0.029491
MPS      0.721343
TRT      0.830113
GD       0.329318
FFD      0.276748
dtype: float64
 
const    0.020900
MPS      0.686175
GD       0.293879
FFD      0.213888
dtype: float64
 
const    0.000169
GD       0.319062
FFD      0.230809
dtype: float64
 
const    0.000066
FFD      0.438223
dtype: float64
 
const    9.127711e-14
dtype: float64
 


| Field | Value | Field | Value |
|---|---|---|---|
| Dep. Variable: | Score | R-squared: | -0.0 |
| Model: | OLS | Adj. R-squared: | -0.0 |
| Method: | Least Squares | F-statistic: | |
| Date: | Sun, 15 Nov 2020 | Prob (F-statistic): | |
| Time: | 18:30:50 | Log-Likelihood: | 15.939 |
| No. Observations: | 30 | AIC: | -29.88 |
| Df Residuals: | 29 | BIC: | -28.48 |
| Df Model: | 0 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.3479 | 0.026 | 13.171 | 0.000 | 0.294 | 0.402 |

| Field | Value | Field | Value |
|---|---|---|---|
| Omnibus: | 1.001 | Durbin-Watson: | 2.618 |
| Prob(Omnibus): | 0.606 | Jarque-Bera (JB): | 1.009 |
| Skew: | 0.356 | Prob(JB): | 0.604 |
| Kurtosis: | 2.452 | Cond. No. | 1.0 |


No significant linear relationship was found between the ZuCo features and the surprise scores (SL = 0.05): backward elimination removed all four features, leaving an intercept-only model.

In [13]:
X = trust_ds[['MPS','TRT','GD','FFD']]
y = trust_ds['Score']
X = sm.add_constant(X)

model = be.backWardEliminationMLR(X,y)
model.summary()

const    1.462442e-10
MPS      7.698936e-01
TRT      8.541483e-01
GD       9.617501e-01
FFD      7.747600e-01
dtype: float64
 
const    1.194705e-10
MPS      7.707021e-01
TRT      8.152153e-01
FFD      6.090411e-01
dtype: float64
 
const    1.032591e-10
MPS      7.181858e-01
FFD      4.777018e-01
dtype: float64
 
const    3.559462e-35
FFD      4.448646e-01
dtype: float64
 
const    5.169434e-100
dtype: float64
 


| Field | Value | Field | Value |
|---|---|---|---|
| Dep. Variable: | Score | R-squared: | 0.0 |
| Model: | OLS | Adj. R-squared: | 0.0 |
| Method: | Least Squares | F-statistic: | |
| Date: | Sun, 15 Nov 2020 | Prob (F-statistic): | |
| Time: | 18:30:50 | Log-Likelihood: | 95.682 |
| No. Observations: | 167 | AIC: | -189.4 |
| Df Residuals: | 166 | BIC: | -186.2 |
| Df Model: | 0 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.5141 | 0.011 | 48.551 | 0.000 | 0.493 | 0.535 |

| Field | Value | Field | Value |
|---|---|---|---|
| Omnibus: | 0.681 | Durbin-Watson: | 1.884 |
| Prob(Omnibus): | 0.712 | Jarque-Bera (JB): | 0.768 |
| Skew: | 0.146 | Prob(JB): | 0.681 |
| Kurtosis: | 2.84 | Cond. No. | 1.0 |


No significant linear relationship was found between the ZuCo features and the trust scores (SL = 0.05): backward elimination removed all four features, leaving an intercept-only model.

### Conclusions

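The results do not show any meaningful sign of a linear relationship between the ZuCo feature values and any of the eight discrete emotions. For anger, the full four-feature model is not significant (F-test p = 0.799); for the other seven emotions, backward elimination discards every feature and leaves only the intercept.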