
**Covariance Matrix** - variance–covariance matrix - variance matrix

> The covariance matrix is a square matrix giving the covariance between each pair of elements of a given random vector; its diagonal holds the variances. It generalizes the notion of variance to multiple dimensions. **Covariance** is a measure of how much two variables vary together.
> 
> $Cov = \begin{bmatrix} Var(x) & Cov(x,y) & Cov(x,z)\\Cov(y,x) & Var(y) & Cov(y,z)\\Cov(z,x) & Cov(z,y) & Var(z) \end{bmatrix}$
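>
> For reference, each entry can be written out with the standard definition below. The symbols $\bar{x}$, $\bar{y}$ (sample means) and $n$ (number of observations) are notation introduced here; dividing by $n$ gives the population form, while dividing by $n-1$ gives the sample form.
>
> $Cov(x,y) = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}), \qquad Var(x) = Cov(x,x)$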

**MANOVA** - Multivariate Analysis of Variance
>  MANOVA is an extension of ANOVA (Analysis of Variance) that analyzes multiple dependent variables simultaneously. While ANOVA tests for differences in group means on a single dependent variable, MANOVA tests whether the groups differ on the combination of the dependent variables (their mean vectors), taking the covariance between those variables into account.
>
>  |  |  |  |
> |--|--|--|
> | $H_0$ | $μ_1 = μ_2 = ... = μ_n$ | → There is no significant difference in the combination (or covariance) of the dependent variables between the groups |
> | $H_1$ | $∃\, i,j \text{ such that } i ≠ j \text{ and } μ_i ≠ μ_j$ | → There is a significant difference between groups on at least one of the dependent variables |
>
> | Number of <br> Dependent <br> Variables | Number of <br> Independent <br> Variables | Test |
> |:-:|:-:|:-:|
> | 1 | 1 | One-Way ANOVA |
> | 1 | 2 | Two-Way ANOVA |
> | 2+ | 1 | One-Way MANOVA |
> | 2+ | 2 | Two-Way MANOVA |

**One-Way MANOVA**
> Used to determine whether multiple dependent variables differ across groups of a single independent variable.
>
> |  |  |
> |--|--|
> | $H_0$	|→ The independent variable has no effect on the dependent variables.|
> | $H_1$	|→ The independent variable has an effect on the dependent variables.|
> 
> **ASSUMPTIONS**
> > → Observations must be independent. <br>
> > → Each dependent variable must be continuous. <br>
> > → The dependent variables must be normally distributed within each group of the independent variable. <br>
> > → Variances must be homogeneous. <br>
> > → Variance-Covariance matrices must be homogeneous.
> > > **Box’s M Test** is used to test the homogeneity of the variance-covariance matrix.

	
**Two-Way MANOVA**
> Used to determine whether multiple dependent variables differ across groups of two independent variables.
>
> |  |  |
> |--|--|
> | $H_0$	|→ The independent variables have no effect on the dependent variables.|
> | $H_1$	|→ The independent variables have an effect on the dependent variables.|
> 
> **ASSUMPTIONS**
> > → Observations must be independent. <br>
> > → Each dependent variable must be continuous. <br>
> > → The dependent variables must be normally distributed within each group of the independent variables. <br>
> > → Variances must be homogeneous. <br>
> > → Variance-Covariance matrices must be homogeneous.
> > > **Box’s M Test** is used to test the homogeneity of the variance-covariance matrix.

<p style="background-image: linear-gradient(to right, #0aa98f, #68dab2)"> &nbsp; </p>

In [1]:
import numpy as np
import pandas as pd
import pingouin as pg
from statsmodels.multivariate.manova import MANOVA

import matplotlib.pyplot as plt

α = alpha = 0.05

<p style="background-image: linear-gradient(to right, #0aa98f, #68dab2)"> &nbsp; </p>

<p style="background-image: linear-gradient(#0aa98f, #ffffff 10%); font-weight:bold;"> 
  &nbsp;  COVARIANCE MATRIX </p>

In [2]:
x = [45, 37, 42, 35, 39]
y = [38, 31, 26, 28, 33]
data = np.array([x,y])
print(data)

[[45 37 42 35 39]
 [38 31 26 28 33]]


In [3]:
covariance = np.cov(data, bias=True)
print(covariance)

# print(np.var(x))
# print(np.var(y))

[[12.64  7.68]
 [ 7.68 17.36]]
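
A minimal NumPy sketch of what `bias=True` changes, reusing the same `x` and `y` as above. The names `centered`, `cross_products`, `cov_population` and `cov_sample` are illustrative only. With `bias=True` the cross-products are divided by $n$ (population form); the `np.cov` default divides by $n-1$ (sample form).

```python
import numpy as np

x = [45, 37, 42, 35, 39]
y = [38, 31, 26, 28, 33]
data = np.array([x, y], dtype=float)   # rows = variables, columns = observations
n = data.shape[1]

# Center each variable on its own mean, then form the cross-product matrix.
centered = data - data.mean(axis=1, keepdims=True)
cross_products = centered @ centered.T

cov_population = cross_products / n        # matches np.cov(data, bias=True)
cov_sample = cross_products / (n - 1)      # matches np.cov(data), the default

print(cov_population)
print(cov_sample)
```

The diagonal of `cov_population` equals `np.var(x)` and `np.var(y)`, which is what the commented-out prints in the cell above would show.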


<p style="background-image: linear-gradient(to right, #0aa98f, #68dab2)"> &nbsp; </p>

<p style="background-image: linear-gradient(#0aa98f, #ffffff 10%); font-weight:bold;"> 
 &nbsp; ONE-WAY MANOVA </p>

**Subject :** Comparison of attitudes towards different product groups based on gender <br>
**Data :** 10_attitude_towards_product.csv

|  |  |
|--|--|
| $H_0$ | → There is no significant difference in attitudes towards the product groups based on gender. |
| $H_1$ | → There is at least one significant difference in attitudes towards the product groups based on gender. |

In [4]:
data = pd.read_csv('data/10_attitude_towards_product.csv')
data.sample(3)
# data.info()

Unnamed: 0,Product_ID,Male,Female
18,1,6,8
42,3,6,8
24,2,6,6


In [5]:
data['Product_ID'] = data['Product_ID'].astype('category')
data['Product_ID'].cat.categories

Index([1, 2, 3], dtype='int64')

**1. Normality Test**

In [6]:
data['Product_ID'].value_counts().to_frame().T

Product_ID,1,2,3
count,20,20,20


Each product group has 20 observations. Because ANOVA/MANOVA tests are fairly robust to moderate departures from normality, and the Central Limit Theorem supports approximate normality of the group means, we treat the normality assumption as met.

In [7]:
# cats = data['Product_ID'].cat.categories
# groups = data.columns[-2:] # ['Male', 'Female']

# fig, axs = plt.subplots(len(groups), len(cats), figsize=(12,8))

# for i, group in enumerate(groups):
#     for j, cat in enumerate(cats):
#         filt = data['Product_ID']==cat
#         pg.qqplot(data[filt][group], ax=axs[i,j])
#         axs[i,j].set_title(f'Product {cat} & {group}')
#         axs[i,j].set_xlabel(None)
#         axs[i,j].set_ylabel(None)
# plt.show()

In [8]:
# display(
#     pg.normality(data['Male']),
#     pg.normality(data['Female']),
#     pg.normality(data, dv='Male', group='Product_ID'),
#     pg.normality(data, dv='Female', group='Product_ID')
# )

**2. Homogeneity Test**

In [9]:
display(
    pg.homoscedasticity(data, dv='Male', group='Product_ID', center='mean'),
    # pg.homoscedasticity(data, dv='Male', group='Product_ID', method='bartlett'),
    pg.homoscedasticity(data, dv='Female', group='Product_ID', center='mean'),
    # pg.homoscedasticity(data, dv='Female', group='Product_ID', method='bartlett')
)

Unnamed: 0,W,pval,equal_var
levene,0.258799,0.772881,True


Unnamed: 0,W,pval,equal_var
levene,1.693663,0.192965,True
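
For intuition about the `W` column, here is a rough sketch of Levene's statistic with `center='mean'`, written in plain NumPy. The helper `levene_w` and the small score arrays are made up for illustration; they are not the notebook's dataset and not pingouin's implementation. The idea is to replace each score by its absolute deviation from its group mean and then compare the between-group and within-group spread of those deviations.

```python
import numpy as np

def levene_w(groups):
    """Levene's W using absolute deviations from each group's mean."""
    z = [np.abs(g - g.mean()) for g in groups]          # absolute deviations per group
    sizes = np.array([len(g) for g in groups])
    n_total, k = sizes.sum(), len(groups)
    z_means = np.array([zi.mean() for zi in z])
    z_grand = np.concatenate(z).mean()
    between = np.sum(sizes * (z_means - z_grand) ** 2)
    within = sum(np.sum((zi - zm) ** 2) for zi, zm in zip(z, z_means))
    return (n_total - k) / (k - 1) * between / within

scores = np.array([5.0, 6.0, 7.0, 6.0, 5.0, 7.0])       # hypothetical scores

# Same spread in every group (only the level differs): W is zero.
w_same = levene_w([scores, scores + 2.0, scores - 1.0])

# One group with three times the spread: W grows.
w_diff = levene_w([scores, 3.0 * scores, scores])

print(w_same, w_diff)
```

A small `W` (and a large p-value, as in the two tables above) is consistent with equal variances across groups.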


**3. Variance-Covariance Homogeneity** - Box’s M Test

In [10]:
pg.box_m(data, dvs=['Male', 'Female'], group='Product_ID')

Unnamed: 0,Chi2,df,pval,equal_cov
box,7.383242,6.0,0.286854,True
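
The sketch below shows one common way to compute the Box's M chi-square approximation by hand, under the assumption that the textbook formula applies; `box_m_chi2` and the synthetic groups are hypothetical and only meant to show where the `Chi2` and `df` columns come from. With $p$ dependent variables and $k$ groups the degrees of freedom are $p(p+1)(k-1)/2$, which gives 6 for two variables and three groups, as in the table above.

```python
import numpy as np

def log_det(matrix):
    """Natural log of the determinant (assumes a positive-definite matrix)."""
    return np.linalg.slogdet(matrix)[1]

def box_m_chi2(groups):
    """Chi-square approximation of Box's M for a list of (n_i, p) arrays."""
    k = len(groups)
    p = groups[0].shape[1]
    dof = np.array([len(g) - 1 for g in groups], dtype=float)
    covs = [np.cov(g, rowvar=False) for g in groups]     # unbiased group covariances
    pooled = sum(d * s for d, s in zip(dof, covs)) / dof.sum()
    m_stat = dof.sum() * log_det(pooled) - sum(d * log_det(s) for d, s in zip(dof, covs))
    correction = ((2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (k - 1))
                  * (np.sum(1 / dof) - 1 / dof.sum()))
    chi2 = m_stat * (1 - correction)
    df = p * (p + 1) * (k - 1) // 2
    return chi2, df

rng = np.random.default_rng(0)
base = rng.normal(size=(20, 2))                          # synthetic data, 2 variables

# Shifting a group changes its mean but not its covariance matrix.
chi2_equal, df_equal = box_m_chi2([base, base + 3.0, base - 1.0])

# Scaling a group inflates its covariance matrix.
chi2_unequal, df_unequal = box_m_chi2([base, 3.0 * base, base - 1.0])

print(chi2_equal, df_equal)
print(chi2_unequal, df_unequal)
```

A p-value can then be read from the chi-square distribution with `df` degrees of freedom, for example with `scipy.stats.chi2.sf(chi2, df)`.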


**4. Test Implementation** - One-Way MANOVA

In [11]:
model = MANOVA.from_formula('Male+Female~Product_ID', data=data)
print(model.mv_test())

                  Multivariate linear model
                                                              
--------------------------------------------------------------
       Intercept         Value  Num DF  Den DF F Value  Pr > F
--------------------------------------------------------------
          Wilks' lambda  0.0404 2.0000 56.0000 665.8804 0.0000
         Pillai's trace  0.9596 2.0000 56.0000 665.8804 0.0000
 Hotelling-Lawley trace 23.7814 2.0000 56.0000 665.8804 0.0000
    Roy's greatest root 23.7814 2.0000 56.0000 665.8804 0.0000
--------------------------------------------------------------
                                                              
--------------------------------------------------------------
        Product_ID       Value  Num DF  Den DF  F Value Pr > F
--------------------------------------------------------------
           Wilks' lambda 0.9805 4.0000 112.0000  0.2764 0.8927
          Pillai's trace 0.0195 4.0000 114.0000  0.2813 0.8895
  Hotelling

**Wilks' lambda** is the most commonly reported of the four statistics for the independent variable. We base the decision on the $Pr>F$ column: in the **Product_ID** table, Wilks' lambda gives $p = 0.8927 > α = 0.05$, so $H_0$ cannot be rejected.
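
To make the Wilks' lambda value less of a black box, here is a small NumPy sketch of the statistic for a one-way design: the ratio $|E| / |E + H|$ of the within-group to the total sums-of-squares-and-cross-products matrices. The helpers `wilks_lambda` and `wilks_to_f_two_dvs` and the random groups are assumptions made for illustration; this is not how statsmodels is implemented internally. The F conversion shown is the exact transform for the special case of two dependent variables, which is the case in this notebook.

```python
import numpy as np

def wilks_lambda(groups):
    """Wilks' lambda for a list of (n_i, p) arrays, one array per group."""
    stacked = np.vstack(groups)
    grand_mean = stacked.mean(axis=0)
    p = stacked.shape[1]
    within = np.zeros((p, p))     # E: within-group cross-products
    between = np.zeros((p, p))    # H: between-group cross-products
    for g in groups:
        group_mean = g.mean(axis=0)
        centered = g - group_mean
        within += centered.T @ centered
        shift = (group_mean - grand_mean)[:, None]
        between += len(g) * (shift @ shift.T)
    return np.linalg.det(within) / np.linalg.det(within + between)

def wilks_to_f_two_dvs(lam, n_total, n_groups):
    """Exact F transform of Wilks' lambda for exactly two dependent variables."""
    df_h = n_groups - 1           # hypothesis degrees of freedom
    df_e = n_total - n_groups     # error degrees of freedom
    root = np.sqrt(lam)
    f_value = (1 - root) / root * (df_e - 1) / df_h
    return f_value, 2 * df_h, 2 * (df_e - 1)

rng = np.random.default_rng(0)
base = [rng.normal(loc=5.0, scale=1.0, size=(20, 2)) for _ in range(3)]
shifted = [base[0], base[1] + 5.0, base[2] - 5.0]   # same spread, very different means

lam_same = wilks_lambda(base)          # groups drawn from one distribution
lam_shifted = wilks_lambda(shifted)    # clearly separated groups

# Plug in the rounded lambda from the Product_ID table (60 rows, 3 groups).
f_value, num_df, den_df = wilks_to_f_two_dvs(0.9805, n_total=60, n_groups=3)

print(lam_same, lam_shifted)
print(f_value, num_df, den_df)
```

Values of lambda near 1 mean the groups explain little of the variation; values near 0 mean strong group separation. The degrees of freedom come out as 4 and 112, matching the Wilks' lambda row of the **Product_ID** table, and the F value agrees with the reported 0.2764 up to the rounding of lambda.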

<p style="background-image: linear-gradient(#f87674, #ffffff 10%); font-weight:bold;"> 
    &nbsp; </p>

**5. Test Implementation** - Post-hoc - **UNNECESSARY IN THIS CASE**

In [12]:
test_m = pg.pairwise_tukey(data, dv='Male', between='Product_ID')
test_f = pg.pairwise_tukey(data, dv='Female', between='Product_ID')

# Decision is True when p > alpha, i.e. H0 is not rejected for that pair
test_m['Decision'] = test_m['p-tukey'] > alpha
test_f['Decision'] = test_f['p-tukey'] > alpha

display(test_m, test_f)

Unnamed: 0,A,B,mean(A),mean(B),diff,se,T,p-tukey,hedges,Decision
0,1,2,5.35,5.25,0.1,0.317059,0.315399,0.946694,0.092898,True
1,1,3,5.35,5.5,-0.15,0.317059,-0.473098,0.884199,-0.152002,True
2,2,3,5.25,5.5,-0.25,0.317059,-0.788497,0.711571,-0.249165,True


Unnamed: 0,A,B,mean(A),mean(B),diff,se,T,p-tukey,hedges,Decision
0,1,2,7.85,7.6,0.25,0.34931,0.715696,0.755252,0.240793,True
1,1,3,7.85,7.7,0.15,0.34931,0.429418,0.903532,0.138612,True
2,2,3,7.6,7.7,-0.1,0.34931,-0.286279,0.955863,-0.080027,True


<p style="background-image: linear-gradient(to right, #ee2965, #e31837)"> &nbsp; </p>

<p style="background-image: linear-gradient(to right, #0aa98f, #68dab2)"> &nbsp; </p>

<p style="background-image: linear-gradient(#0aa98f, #ffffff 10%); font-weight:bold;"> 
 &nbsp; TWO-WAY MANOVA </p>
    
*We assume that the assumptions are met.*<br>

**Subject :** The effect of employees' positions and departments on their performance and ownership <br>
**Data :** 11_employee_perfrmance_ownership.csv

|  |  |
|--|--|
| $H_0$ | → The positions and departments of employees have no effect on their performance and ownership. |
| $H_1$ | → The positions and departments of employees have an effect on their performance and ownership. |

In [13]:
data = pd.read_csv('data/11_employee_perfrmance_ownership.csv')
data.sample(3)

Unnamed: 0,Position,Department,Performance,Ownership
23,Manager,R&D,7,5
7,Supervisor,Finance,6,5
8,Manager,Accounting,6,5


**1. Test Implementation** - Two-Way MANOVA

In [14]:
formula = 'Performance+Ownership~Position+Department+Position:Department'
model = MANOVA.from_formula(formula, data=data)

print(model.mv_test())

                  Multivariate linear model
                                                             
-------------------------------------------------------------
        Intercept        Value  Num DF  Den DF F Value Pr > F
-------------------------------------------------------------
           Wilks' lambda 0.1028 2.0000 20.0000 87.2904 0.0000
          Pillai's trace 0.8972 2.0000 20.0000 87.2904 0.0000
  Hotelling-Lawley trace 8.7290 2.0000 20.0000 87.2904 0.0000
     Roy's greatest root 8.7290 2.0000 20.0000 87.2904 0.0000
-------------------------------------------------------------
                                                             
-------------------------------------------------------------
         Position        Value  Num DF  Den DF F Value Pr > F
-------------------------------------------------------------
           Wilks' lambda 0.9144 4.0000 40.0000  0.4578 0.7662
          Pillai's trace 0.0872 4.0000 42.0000  0.4788 0.7511
  Hotelling-Lawley trace 0

In each of the **Position**, **Department** and **Position:Department** tables, the Wilks' lambda $p$ value ($Pr>F$) is greater than 0.05, so $H_0$ cannot be rejected for either main effect or for the interaction.

<p style="background-image: linear-gradient(#f87674, #ffffff 10%); font-weight:bold;"> 
    &nbsp; </p>

**2. Test Implementation** - Post-hoc - **UNNECESSARY IN THIS CASE**

In [15]:
test_pp = pg.pairwise_tukey(data, dv='Performance', between='Position')
test_pd = pg.pairwise_tukey(data, dv='Performance', between='Department')
test_op = pg.pairwise_tukey(data, dv='Ownership', between='Position')
test_od = pg.pairwise_tukey(data, dv='Ownership', between='Department')

# Decision is True when p > alpha, i.e. H0 is not rejected for that pair
test_pp['Decision'] = test_pp['p-tukey'] > alpha
test_pd['Decision'] = test_pd['p-tukey'] > alpha
test_op['Decision'] = test_op['p-tukey'] > alpha
test_od['Decision'] = test_od['p-tukey'] > alpha

display(test_pp, test_pd, test_op, test_od)

Unnamed: 0,A,B,mean(A),mean(B),diff,se,T,p-tukey,hedges,Decision
0,Manager,Supervisor,7.0,6.181818,0.818182,0.439133,1.863177,0.166959,0.756762,True
1,Manager,Worker,7.0,6.272727,0.727273,0.439133,1.656157,0.238519,0.696496,True
2,Supervisor,Worker,6.181818,6.272727,-0.090909,0.439133,-0.20702,0.976669,-0.083734,True


Unnamed: 0,A,B,mean(A),mean(B),diff,se,T,p-tukey,hedges,Decision
0,Accounting,Finance,6.5,6.25,0.25,0.548759,0.455573,0.968002,0.256838,True
1,Accounting,Production,6.5,6.727273,-0.227273,0.557011,-0.408022,0.976627,-0.18832,True
2,Accounting,R&D,6.5,6.5,0.0,0.708445,0.0,1.0,0.0,True
3,Finance,Production,6.25,6.727273,-0.477273,0.45813,-1.041785,0.726639,-0.410039,True
4,Finance,R&D,6.25,6.5,-0.25,0.633652,-0.394538,0.978775,-0.22647,True
5,Production,R&D,6.727273,6.5,0.227273,0.640812,0.354663,0.984394,0.167575,True


Unnamed: 0,A,B,mean(A),mean(B),diff,se,T,p-tukey,hedges,Decision
0,Manager,Supervisor,4.727273,4.545455,0.181818,0.538337,0.33774,0.939186,0.140701,True
1,Manager,Worker,4.727273,4.909091,-0.181818,0.538337,-0.33774,0.939186,-0.128441,True
2,Supervisor,Worker,4.545455,4.909091,-0.363636,0.538337,-0.67548,0.779407,-0.297597,True


Unnamed: 0,A,B,mean(A),mean(B),diff,se,T,p-tukey,hedges,Decision
0,Accounting,Finance,5.333333,5.0,0.333333,0.59446,0.560733,0.942839,0.262882,True
1,Accounting,Production,5.333333,4.090909,1.242424,0.6034,2.059039,0.190493,1.133251,True
2,Accounting,R&D,5.333333,4.75,0.583333,0.767445,0.760098,0.871593,0.524159,True
3,Finance,Production,5.0,4.090909,0.909091,0.496283,1.831798,0.279448,0.699956,True
4,Finance,R&D,5.0,4.75,0.25,0.686424,0.364207,0.983147,0.177769,True
5,Production,R&D,4.090909,4.75,-0.659091,0.69418,-0.949452,0.77861,-0.532236,True


<p style="background-image: linear-gradient(to right, #ee2965, #e31837)"> &nbsp; </p>

<p style="background-image: linear-gradient(to right, #0aa98f, #68dab2)"> &nbsp; </p>