## MANOVA (Multivariate Analysis of Variance)

In [1]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.multivariate.manova import MANOVA

In [9]:
# Data Preparation
url = 'https://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv'
df = pd.read_csv(url, index_col=0)
# Replace dots literally (regex=False) so the column names are valid in a formula
df.columns = df.columns.str.replace(".", "_", regex=False)
df.head()


| | Sepal_Length | Sepal_Width | Petal_Length | Petal_Width | Species |
|---|---|---|---|---|---|
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


In [10]:
# One-way MANOVA: four flower measurements as responses, Species as the factor
maov = MANOVA.from_formula('Sepal_Length + Sepal_Width + Petal_Length + Petal_Width ~ Species', data=df)

In [11]:
print(maov.mv_test())

                   Multivariate linear model
                                                                
----------------------------------------------------------------
       Intercept         Value  Num DF  Den DF   F Value  Pr > F
----------------------------------------------------------------
          Wilks' lambda  0.0170 4.0000 144.0000 2086.7720 0.0000
         Pillai's trace  0.9830 4.0000 144.0000 2086.7720 0.0000
 Hotelling-Lawley trace 57.9659 4.0000 144.0000 2086.7720 0.0000
    Roy's greatest root 57.9659 4.0000 144.0000 2086.7720 0.0000
----------------------------------------------------------------
                                                                
----------------------------------------------------------------
        Species          Value  Num DF  Den DF   F Value  Pr > F
----------------------------------------------------------------
          Wilks' lambda  0.0234 8.0000 288.0000  199.1453 0.0000
         Pillai's trace  1.1919 8.0000 290.00

### Description
- Value: The value of the multivariate test statistic named in that row (Wilks' lambda, Pillai's trace, Hotelling-Lawley trace, or Roy's greatest root). These values are not F statistics themselves; each one is converted to an exact or approximate F statistic in order to obtain a p-value.

- Num DF: The numerator degrees of freedom of that F test. In MANOVA they depend on both the number of response variables and the degrees of freedom of the effect being tested. For the Species rows shown above, 4 response variables x (3 species - 1) = 8.

- Den DF: The denominator degrees of freedom of the F test. They are derived from the residual (within-group) degrees of freedom and from the particular statistic, which is why different rows of the same block can report different values.

- F Value: The F statistic derived from the multivariate statistic in the "Value" column. Unlike univariate ANOVA, it is not simply the mean square between (MSB) divided by the mean square within (MSW); it is a transformation of the statistic that follows an F-distribution exactly or approximately. For fixed degrees of freedom, a larger F value means stronger evidence against the null hypothesis.

- Pr > F: The p-value of the test, that is, the probability of observing an F value at least as large as the one obtained if the null hypothesis of no effect were true. Smaller p-values indicate stronger evidence against the null hypothesis.

By examining the F value and p-value for each term, you can judge whether the groups differ on the response variables considered jointly.
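To make the link between "Value" and "F Value" concrete, the sketch below applies Rao's F approximation to the Wilks' lambda reported for Species. The helper name `rao_f_from_wilks` is hypothetical, the residual degrees of freedom (147, assuming 150 flowers minus 3 species) are an assumption not printed in the output, and statsmodels may organise the same computation differently internally.

```python
import math

def rao_f_from_wilks(wilks, n_responses, hypothesis_df, error_df):
    """Rao's F approximation for Wilks' lambda (illustrative helper)."""
    p, q, v = n_responses, hypothesis_df, error_df
    denom = p**2 + q**2 - 5
    # s defaults to 1 when the usual formula is undefined (e.g. p=1, q=2)
    s = math.sqrt((p**2 * q**2 - 4) / denom) if denom > 0 else 1.0
    r = v - (p - q + 1) / 2
    u = (p * q - 2) / 4
    num_df = p * q
    den_df = r * s - 2 * u
    root = wilks ** (1 / s)
    f_value = (1 - root) / root * den_df / num_df
    return f_value, num_df, den_df

# Species row: Wilks' lambda = 0.0234 (rounded), 4 responses, 3 - 1 = 2 hypothesis df,
# and an ASSUMED 147 residual df (150 observations - 3 groups).
f_value, num_df, den_df = rao_f_from_wilks(0.0234, 4, 2, 147)
print(num_df, den_df, round(f_value, 1))
```

The degrees of freedom match the printed 8 and 288, and the F value lands close to the printed 199.1453; the small gap comes from starting with the rounded value 0.0234.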

## Hypothesis

- Null Hypothesis (H0): All "Species" groups have the same population mean vector for the response variables "Sepal_Length", "Sepal_Width", "Petal_Length", and "Petal_Width".

- Alternative Hypothesis (H1): At least one "Species" group has a mean vector that differs from the others on these response variables.
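Written in symbols, with each mean vector collecting the four measurements of one species, the two hypotheses read as follows. The notation is introduced here for illustration only.

$$
H_0:\ \boldsymbol{\mu}_{\text{setosa}} = \boldsymbol{\mu}_{\text{versicolor}} = \boldsymbol{\mu}_{\text{virginica}}
\qquad
H_1:\ \boldsymbol{\mu}_j \neq \boldsymbol{\mu}_k \ \text{for at least one pair } j \neq k,
\qquad \boldsymbol{\mu}_g \in \mathbb{R}^4
$$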

The MANOVA output reports how the categorical variable "Species" relates to the four response variables considered jointly. It contains one block per model term; each block is explained in turn:

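First Section (Intercept):
These rows test whether the intercept (the baseline mean vector of the four measurements) is zero. Because the measurements are positive lengths and widths, this hypothesis is rejected trivially and is rarely of interest. All four statistics are functions of the eigenvalues of E^-1 H, where H is the hypothesis and E the error sums-of-squares-and-cross-products matrix:
- Wilks' lambda: det(E) / det(H + E), the share of generalized variance that the effect does not explain. Values close to 0 indicate a strong effect; values close to 1 indicate none.
- Pillai's trace: The sum of eigenvalue / (1 + eigenvalue) over all eigenvalues. Larger values indicate a stronger effect.
- Hotelling-Lawley trace: The sum of the eigenvalues. Larger values indicate a stronger effect.
- Roy's greatest root: The largest eigenvalue, which measures the effect along the single linear combination of responses that is most affected.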
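The Intercept block has a single non-zero eigenvalue, which is why its Hotelling-Lawley trace and Roy's greatest root are identical and all four rows share one F value. As a quick check under that reading, the other printed numbers can be recovered from that one eigenvalue:

```python
# Hotelling-Lawley trace from the Intercept rows of the printed table.
# With a single eigenvalue it coincides with Roy's greatest root.
eigenvalue = 57.9659

wilks = 1 / (1 + eigenvalue)            # Wilks' lambda
pillai = eigenvalue / (1 + eigenvalue)  # Pillai's trace
f_value = eigenvalue * 144 / 4          # F = eigenvalue * Den DF / Num DF

print(round(wilks, 4), round(pillai, 4), round(f_value, 2))
```

The results agree with the printed 0.0170, 0.9830 and 2086.7720 up to rounding of the input.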

Second Section (Species):
These rows test the hypothesis of interest: whether the mean vector of the four measurements is the same for every species.
- Wilks' lambda: The share of generalized variance in the responses that "Species" does not account for. The reported 0.0234 is close to 0, so the species are strongly separated.
- Pillai's trace: A sum over eigenvalues that ranges from 0 to the smaller of the number of responses and the number of groups minus one (here 2). The reported 1.1919 is well above 0, again pointing to a strong effect.
- Hotelling-Lawley trace: The sum of the eigenvalues of the between-group variation relative to the within-group variation. Larger values indicate a stronger effect.
- Roy's greatest root: The largest of those eigenvalues, that is, the separation between species along the single best-discriminating linear combination of the responses.

For each statistic, the "Value" column gives the statistic itself, and the "Num DF" (numerator degrees of freedom) and "Den DF" (denominator degrees of freedom) columns give the degrees of freedom of the F-distribution used to assess its significance.
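The four statistics can also be computed directly from raw data, which shows how they relate to one another. The following is a minimal sketch on made-up toy data (three groups, two responses); the variable names are illustrative and this is not how statsmodels is necessarily implemented.

```python
import numpy as np

# Made-up toy data: 3 groups, 4 observations each, 2 response variables.
groups = [
    np.array([[1.0, 2.0], [2.0, 1.5], [1.5, 2.5], [2.5, 3.0]]),
    np.array([[3.0, 2.5], [4.0, 3.5], [3.5, 2.0], [4.5, 3.0]]),
    np.array([[5.0, 5.5], [6.0, 4.5], [5.5, 6.0], [6.5, 5.0]]),
]
grand_mean = np.vstack(groups).mean(axis=0)

# E: within-group (error) sums of squares and cross-products.
E = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)

# H: between-group (hypothesis) sums of squares and cross-products.
H = sum(
    len(g) * np.outer(g.mean(axis=0) - grand_mean, g.mean(axis=0) - grand_mean)
    for g in groups
)

# The eigenvalues of E^-1 H drive all four statistics.
eigvals = np.linalg.eigvals(np.linalg.solve(E, H)).real

wilks = np.prod(1 / (1 + eigvals))
pillai = np.sum(eigvals / (1 + eigvals))
hotelling_lawley = np.sum(eigvals)
roy = np.max(eigvals)

print(wilks, pillai, hotelling_lawley, roy)
```

Wilks' lambda computed this way equals det(E) / det(E + H), and Roy's greatest root can never exceed the Hotelling-Lawley trace because it is one of the terms in that sum.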

The "F Value" column displays the F statistic derived from each multivariate statistic. The "Pr > F" column gives the corresponding p-value: the probability of obtaining an F value at least this large if "Species" had no effect. In this case, every "Pr > F" value shown is printed as 0.0000, meaning it is smaller than the four displayed decimals can resolve rather than exactly zero. This is strong evidence that the "Species" groups differ on the multivariate response variables.
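Since the summary rounds p-values to four decimals, it can help to read the unrounded numbers. The sketch below assumes that the object returned by `mv_test()` exposes a `results` dictionary with one entry per model term, each holding a `'stat'` DataFrame whose rows and columns match the printed table; check this against your installed statsmodels version. The data are synthetic stand-ins (made up) so the example runs without a download.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Synthetic stand-in data (made up): 3 groups with shifted means.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 20),
    "y1": rng.normal(size=60) + np.repeat([0.0, 1.0, 2.0], 20),
    "y2": rng.normal(size=60) + np.repeat([0.0, 0.5, 1.5], 20),
})

results = MANOVA.from_formula("y1 + y2 ~ group", data=toy).mv_test()

# One table per model term; rows are the four statistics.
stat = results.results["group"]["stat"]
p_value = float(stat.loc["Wilks' lambda", "Pr > F"])

# Scientific notation shows values that the summary prints as 0.0000.
print(f"{p_value:.3e}")
```

For the iris model in this notebook, the analogous lookup would use the key `'Species'` on `maov.mv_test()`.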

Overall, the MANOVA results lead us to reject the null hypothesis: the mean vector of "Sepal_Length", "Sepal_Width", "Petal_Length", and "Petal_Width" differs significantly across the "Species" groups.