# Discrete emotions - VAD Multiple linear regression analysis

With this analysis, we investigate the linear relationship between discrete emotion scores and each of the VAD (Valence, Arousal, Dominance) values, taken individually.
We perform this analysis on the sentences from the ZuCo dataset.

Since only a few emotions are usually associated with a single sentence, most emotion fields for each sentence are zero. To bring out the relationship between discrete emotions and VAD values more clearly, we associate each sentence only with its highest-scoring emotion: the emotion label is treated as a categorical variable (one indicator column per emotion), and the corresponding score is kept in a separate `Score` column.

In [3]:
import pandas as pd
import statsmodels.api as sm

# Add the parent directory to the module search path so that
# BackwardElimination can be imported (index 0 is the script path, or '' in a REPL)
import sys
sys.path.insert(1, '../')
import BackwardElimination as be

## Importing the dataset

In [4]:
dataset = pd.read_csv(r'../Lexicons/Emotion_Sentences_Cross_Analysis.csv')

## Handling categorical variables

In [5]:
X = dataset[['Anger','Anticipation','Disgust','Fear','Joy','Sadness','Trust']]
categorical_structure = {'Score': [], 'Anger': [], 'Anticipation':[], 'Disgust':[], 'Fear':[], 'Joy':[], 'Sadness':[], 'Trust':[]}
df = pd.DataFrame(categorical_structure, columns = ['Score', 'Anger', 'Anticipation', 'Disgust', 'Fear', 'Joy', 'Sadness', 'Trust'])
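for i, row in X.iterrows():
    maxValue = 0
    category = ''
    struct = {'Score': 0, 'Anger': 0, 'Anticipation': 0, 'Disgust': 0, 'Fear': 0, 'Joy': 0, 'Sadness': 0, 'Trust': 0}
    # Find the emotion with the highest score for this sentence
    # (Series.iteritems() was removed in pandas 2.0; items() is the replacement)
    for column, value in row.items():
        if value > maxValue:
            maxValue = value
            category = column
    # Flag the dominant emotion and keep its score; sentences with no
    # positive score keep an all-zero row
    if maxValue > 0:
        struct[category] = 1
        struct['Score'] = maxValue
    df.loc[i] = struct
X = df

print(df)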

     Score  Anger  Anticipation  Disgust  Fear  Joy  Sadness  Trust
0    0.953    0.0           0.0      0.0   0.0  0.0      1.0    0.0
1    1.469    0.0           0.0      0.0   0.0  0.0      1.0    0.0
2    1.530    0.0           0.0      0.0   0.0  0.0      0.0    1.0
3    0.477    0.0           0.0      0.0   0.0  0.0      0.0    1.0
4    0.765    0.0           0.0      0.0   0.0  1.0      0.0    0.0
..     ...    ...           ...      ...   ...  ...      ...    ...
385  0.000    0.0           0.0      0.0   0.0  0.0      0.0    0.0
386  2.140    0.0           0.0      0.0   0.0  0.0      0.0    1.0
387  0.942    0.0           0.0      0.0   1.0  0.0      0.0    0.0
388  0.912    0.0           0.0      0.0   0.0  1.0      0.0    0.0
389  1.828    0.0           0.0      0.0   0.0  0.0      0.0    1.0

[390 rows x 8 columns]
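
The same encoding can also be written with vectorized pandas operations. The sketch below is an illustrative alternative, not the notebook's own code: the `toy` frame and its values are made up, and `encode_dominant_emotion` is a hypothetical helper that assumes the same rule as the loop (keep only the highest-scoring emotion, first one wins on ties, all zeros when no score is positive).

```python
import pandas as pd

emotions = ['Anger', 'Anticipation', 'Disgust', 'Fear', 'Joy', 'Sadness', 'Trust']

# Hypothetical toy scores standing in for the real dataset columns.
toy = pd.DataFrame(
    [[0.0, 0.2, 0.0, 0.0, 0.0, 0.953, 0.0],
     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
     [0.1, 0.0, 0.0, 0.0, 0.765, 0.0, 0.3]],
    columns=emotions,
)

def encode_dominant_emotion(scores: pd.DataFrame) -> pd.DataFrame:
    """Keep only the highest-scoring emotion per row: one-hot flag plus its score."""
    max_score = scores.max(axis=1)
    dominant = scores.idxmax(axis=1)  # first column wins on ties
    # One indicator column per emotion, including emotions that never dominate.
    flags = (
        pd.get_dummies(dominant)
        .reindex(columns=scores.columns, fill_value=0)
        .astype(float)
    )
    # Rows without any positive score become all zeros.
    flags = flags.mul((max_score > 0).astype(float), axis=0)
    flags.insert(0, 'Score', max_score.clip(lower=0))
    return flags

encoded = encode_dominant_emotion(toy)
print(encoded)
```

Building the frame in one pass avoids growing it row by row with `df.loc[i] = ...`, which becomes slow on larger datasets.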


## Building the models

In [6]:
X = sm.add_constant(X)             
y_arousal = dataset.iloc[:, 0].values
y_valence = dataset.iloc[:, 1].values
y_dominance = dataset.iloc[:, 2].values

### Discrete emotions - Arousal

In [7]:
model_a = be.backWardEliminationMLR(X,y_arousal)
model_a.summary()

const           8.841825e-152
Score            5.164889e-01
Anger            2.992963e-03
Anticipation     3.348986e-02
Disgust          8.106571e-01
Fear             5.528919e-03
Joy              2.451124e-04
Sadness          1.496589e-02
Trust            2.769780e-03
dtype: float64
 
const           7.372537e-160
Score            4.807523e-01
Anger            3.035631e-03
Anticipation     3.233777e-02
Fear             5.334126e-03
Joy              1.747773e-04
Sadness          1.465941e-02
Trust            2.010465e-03
dtype: float64
 
const           1.377174e-160
Anger            2.178958e-03
Anticipation     1.320635e-02
Fear             2.066158e-03
Joy              1.102394e-05
Sadness          4.582568e-03
Trust            2.649087e-05
dtype: float64
 


| | | | |
|---|---|---|---|
| Dep. Variable: | y | R-squared: | 0.074 |
| Model: | OLS | Adj. R-squared: | 0.06 |
| Method: | Least Squares | F-statistic: | 5.121 |
| Date: | Wed, 11 Nov 2020 | Prob (F-statistic): | 4.47e-05 |
| Time: | 11:18:23 | Log-Likelihood: | 464.65 |
| No. Observations: | 390 | AIC: | -915.3 |
| Df Residuals: | 383 | BIC: | -887.5 |
| Df Model: | 6 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.4244 | 0.009 | 46.827 | 0.000 | 0.407 | 0.442 |
| Anger | 0.1061 | 0.034 | 3.086 | 0.002 | 0.038 | 0.174 |
| Anticipation | 0.0351 | 0.014 | 2.490 | 0.013 | 0.007 | 0.063 |
| Fear | 0.0611 | 0.020 | 3.102 | 0.002 | 0.022 | 0.100 |
| Joy | 0.0595 | 0.013 | 4.455 | 0.000 | 0.033 | 0.086 |
| Sadness | 0.0550 | 0.019 | 2.852 | 0.005 | 0.017 | 0.093 |
| Trust | 0.0453 | 0.011 | 4.253 | 0.000 | 0.024 | 0.066 |

| | | | |
|---|---|---|---|
| Omnibus: | 5.492 | Durbin-Watson: | 1.852 |
| Prob(Omnibus): | 0.064 | Jarque-Bera (JB): | 5.774 |
| Skew: | 0.196 | Prob(JB): | 0.0557 |
| Kurtosis: | 3.449 | Cond. No. | 10.7 |


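Backward elimination retains the following independent variables, each of which has a statistically significant linear relationship with the dependent variable Arousal at the significance level (SL) given below: Anger, Anticipation, Fear, Joy, Sadness, and Trust. All six coefficients are positive. Note that the R-squared is 0.074, so these variables explain only a small share of the variance in Arousal: the relationship is statistically significant rather than strong.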

SL = 0.05

### Discrete emotions - Valence

In [10]:
model_v = be.backWardEliminationMLR(X,y_valence)
model_v.summary()

const           3.887524e-183
Score            5.144582e-01
Anger            1.388131e-01
Anticipation     8.943099e-01
Disgust          3.493114e-02
Fear             2.324677e-02
Joy              2.944271e-02
Sadness          9.047965e-07
Trust            5.229422e-01
dtype: float64
 
const      6.454123e-221
Score       4.577692e-01
Anger       1.218085e-01
Disgust     2.626632e-02
Fear        1.185885e-02
Joy         1.235364e-02
Sadness     5.807530e-08
Trust       4.719339e-01
dtype: float64
 
const      3.331202e-245
Score       1.993854e-01
Anger       8.909630e-02
Disgust     1.469504e-02
Fear        3.218256e-03
Joy         1.194608e-02
Sadness     8.095020e-10
dtype: float64
 
const      5.731501e-309
Anger       8.214411e-02
Disgust     1.427300e-02
Fear        3.385242e-03
Joy         9.761047e-03
Sadness     1.328030e-09
dtype: float64
 
const      3.058089e-310
Disgust     1.597878e-02
Fear        4.112623e-03
Joy         7.507071e-03
Sadness     1.986814e-09
dtype: float64

| | | | |
|---|---|---|---|
| Dep. Variable: | y | R-squared: | 0.139 |
| Model: | OLS | Adj. R-squared: | 0.13 |
| Method: | Least Squares | F-statistic: | 15.55 |
| Date: | Wed, 11 Nov 2020 | Prob (F-statistic): | 8.35e-12 |
| Time: | 11:21:08 | Log-Likelihood: | 385.87 |
| No. Observations: | 390 | AIC: | -761.7 |
| Df Residuals: | 385 | BIC: | -741.9 |
| Df Model: | 4 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.6516 | 0.005 | 122.335 | 0.000 | 0.641 | 0.662 |
| Disgust | -0.0838 | 0.035 | -2.420 | 0.016 | -0.152 | -0.016 |
| Fear | -0.0635 | 0.022 | -2.887 | 0.004 | -0.107 | -0.020 |
| Joy | 0.0353 | 0.013 | 2.688 | 0.008 | 0.009 | 0.061 |
| Sadness | -0.1318 | 0.021 | -6.146 | 0.000 | -0.174 | -0.090 |

| | | | |
|---|---|---|---|
| Omnibus: | 22.115 | Durbin-Watson: | 1.789 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 34.095 |
| Skew: | -0.408 | Prob(JB): | 3.95e-08 |
| Kurtosis: | 4.197 | Cond. No. | 7.68 |


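Backward elimination retains the following independent variables, each of which has a statistically significant linear relationship with the dependent variable Valence at the significance level (SL) given below: Disgust, Fear, Joy, and Sadness. Disgust, Fear, and Sadness have negative coefficients, while Joy has a positive one. The R-squared is 0.139, so the model explains only a modest share of the variance in Valence.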

SL = 0.05

### Discrete emotions - Dominance

In [9]:
model_d = be.backWardEliminationMLR(X,y_dominance)
model_d.summary()

const           4.078654e-177
Score            5.141481e-01
Anger            2.697950e-03
Anticipation     2.049672e-02
Disgust          2.218889e-01
Fear             1.180720e-01
Joy              8.156312e-01
Sadness          6.163363e-04
Trust            2.048237e-05
dtype: float64
 
const           7.781420e-216
Score            4.034662e-01
Anger            2.433888e-03
Anticipation     9.220274e-03
Disgust          2.278735e-01
Fear             1.066298e-01
Sadness          8.521514e-05
Trust            1.630355e-08
dtype: float64
 
const           1.421412e-229
Anger            2.176567e-03
Anticipation     5.956372e-03
Disgust          2.043135e-01
Fear             8.107809e-02
Sadness          1.208260e-04
Trust            3.303422e-11
dtype: float64
 
const           5.158850e-235
Anger            2.646232e-03
Anticipation     9.060093e-03
Fear             1.010671e-01
Sadness          7.298513e-05
Trust            7.111984e-11
dtype: float64
 
const           5.966377e-247
An

| | | | |
|---|---|---|---|
| Dep. Variable: | y | R-squared: | 0.18 |
| Model: | OLS | Adj. R-squared: | 0.172 |
| Method: | Least Squares | F-statistic: | 21.15 |
| Date: | Wed, 11 Nov 2020 | Prob (F-statistic): | 8.87e-16 |
| Time: | 11:20:42 | Log-Likelihood: | 418.8 |
| No. Observations: | 390 | AIC: | -827.6 |
| Df Residuals: | 385 | BIC: | -807.8 |
| Df Model: | 4 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.5765 | 0.007 | 82.558 | 0.000 | 0.563 | 0.590 |
| Anger | 0.1102 | 0.038 | 2.911 | 0.004 | 0.036 | 0.185 |
| Anticipation | 0.0329 | 0.014 | 2.353 | 0.019 | 0.005 | 0.060 |
| Sadness | -0.0864 | 0.020 | -4.250 | 0.000 | -0.126 | -0.046 |
| Trust | 0.0609 | 0.009 | 6.491 | 0.000 | 0.042 | 0.079 |

| | | | |
|---|---|---|---|
| Omnibus: | 8.364 | Durbin-Watson: | 2.106 |
| Prob(Omnibus): | 0.015 | Jarque-Bera (JB): | 13.057 |
| Skew: | 0.099 | Prob(JB): | 0.00146 |
| Kurtosis: | 3.874 | Cond. No. | 10.1 |


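Backward elimination retains the following independent variables, each of which has a statistically significant linear relationship with the dependent variable Dominance at the significance level (SL) given below: Anger, Anticipation, Sadness, and Trust. Anger, Anticipation, and Trust have positive coefficients, while Sadness has a negative one. The R-squared is 0.18, so the model explains only a modest share of the variance in Dominance.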

SL = 0.05
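
Because each sentence carries at most one emotion flag and `Score` was eliminated from all three models, the fitted value for a sentence reduces to the intercept plus the coefficient of its dominant emotion (or the intercept alone if that emotion was eliminated). The sketch below works this out from the coefficients reported in the three summaries; the `coefficients` dictionary and the `predict` helper are illustrative names, not part of the notebook.

```python
# Coefficients copied from the three OLS summaries reported in this notebook.
coefficients = {
    'Arousal': {'const': 0.4244, 'Anger': 0.1061, 'Anticipation': 0.0351,
                'Fear': 0.0611, 'Joy': 0.0595, 'Sadness': 0.0550, 'Trust': 0.0453},
    'Valence': {'const': 0.6516, 'Disgust': -0.0838, 'Fear': -0.0635,
                'Joy': 0.0353, 'Sadness': -0.1318},
    'Dominance': {'const': 0.5765, 'Anger': 0.1102, 'Anticipation': 0.0329,
                  'Sadness': -0.0864, 'Trust': 0.0609},
}

def predict(dimension, emotion=None):
    """Fitted VAD value for a sentence whose dominant emotion is `emotion`."""
    coefs = coefficients[dimension]
    # Emotions removed by backward elimination add nothing to the intercept.
    return coefs['const'] + coefs.get(emotion, 0.0)

for dimension in coefficients:
    for emotion in ('Anger', 'Joy', 'Sadness', 'Trust'):
        print(f"{dimension:<10} {emotion:<8} {predict(dimension, emotion):.4f}")
```

For example, a sentence dominated by Sadness gets a fitted Valence of 0.6516 - 0.1318 = 0.5198, while a sentence dominated by Trust gets a fitted Dominance of 0.5765 + 0.0609 = 0.6374.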