ANOVA, short for Analysis of Variance, and also called AOV, is a hypothesis testing technique. The most common use case for ANOVA is when you do an experiment in which your outcome variable is numeric, and your explanatory variable is a categorical variable with three or more classes.

For example, suppose a new agricultural crop growth product is being trialled, and you measure the performance of two new treatments and a control group (which receives no new treatment: it is either left untreated or treated with the existing method).
You measure a numerical outcome (for example, kilograms of harvest) in the three groups (treatment 1, treatment 2, and control).
You then compute the average kilograms of harvest for each group and observe that the averages differ. However, you need to determine whether the differences are large enough to conclude that the outcomes are significantly different, rather than the result of random variation or sampling error.
This is what ANOVA is designed for.
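
To make the idea of "large enough differences" concrete, here is a minimal sketch of how the one-way ANOVA F statistic compares the variation between group means with the variation inside each group. The harvest values and the helper `one_way_f` are hypothetical and only serve as an illustration; the result is cross-checked against `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy.stats import f_oneway

# Hypothetical harvest weights (kg), for illustration only.
groups = {
    "control": np.array([20.1, 21.3, 19.8, 20.7]),
    "treatment_1": np.array([22.4, 23.1, 21.9, 22.8]),
    "treatment_2": np.array([24.0, 23.5, 24.8, 24.3]),
}


def one_way_f(samples):
    """Return the one-way ANOVA F statistic for a list of 1-D arrays."""
    all_values = np.concatenate(samples)
    grand_mean = all_values.mean()
    k = len(samples)       # number of groups
    n = all_values.size    # total number of observations
    # Variation of the group means around the grand mean.
    ss_between = sum(s.size * (s.mean() - grand_mean) ** 2 for s in samples)
    # Variation of the observations around their own group mean.
    ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


samples = list(groups.values())
f_manual = one_way_f(samples)
f_scipy = f_oneway(*samples).statistic
print(f_manual, f_scipy)
```

A large F value means the group means are spread out much more than the noise within the groups would explain.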

MANOVA is a multivariate version of the ANOVA model. Multivariate here indicates the fact that there are multiple dependent variables instead of just one.

The goal of a MANOVA is still to detect whether the treatment groups differ from one another. However, the effect is now measured across multiple continuous variables jointly rather than on just one.

In this case, let’s do a study in which the goal is to test whether different plant growth products lead to significantly different plant growth.

Therefore, we will have three treatments:

Treatment 1 (control, no product)
Treatment 2 (product 1)
Treatment 3 (product 2)

We will use three measurements for defining plant growth:

1. height of the plant
2. width of the plant
3. weight of the plant

Let's import the data with height, width and weight for each treatment type.
For ANOVA, we test only Weight.
For MANOVA, we test all three.

In [18]:
import pandas as pd
data = pd.read_csv(r'C:\Users\2084820\OneDrive - Cognizant\Documents\DS sessions\plant_growth_multivariate_data.csv')

In [24]:
data

Unnamed: 0,Treatment,Height,Width,Weight
0,1,15.8,3.9,29.4
1,1,15.1,3.8,29.9
2,1,14.8,4.1,30.1
3,1,14.4,4.7,30.2
4,1,15.1,3.7,30.9
5,2,15.7,4.8,31.2
6,2,15.9,4.3,31.4
7,2,15.4,4.5,31.8
8,2,16.7,5.4,32.44
9,2,16.9,5.8,32.6
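
Before running any test, it can help to look at the per-treatment averages. The sketch below rebuilds only the ten rows displayed above into a small DataFrame (the name `shown_rows` is introduced here for illustration) and computes the group means with `groupby`.

```python
import pandas as pd

# Only the ten rows displayed above, retyped so the example is self-contained.
shown_rows = pd.DataFrame(
    {
        "Treatment": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
        "Height": [15.8, 15.1, 14.8, 14.4, 15.1, 15.7, 15.9, 15.4, 16.7, 16.9],
        "Width": [3.9, 3.8, 4.1, 4.7, 3.7, 4.8, 4.3, 4.5, 5.4, 5.8],
        "Weight": [29.4, 29.9, 30.1, 30.2, 30.9, 31.2, 31.4, 31.8, 32.44, 32.6],
    }
)

# Mean of each measurement per treatment.
group_means = shown_rows.groupby("Treatment")[["Height", "Width", "Weight"]].mean()
print(group_means)
```

With the full dataset loaded, the same `groupby` call can be applied to `data` directly.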


In [23]:
# One-way ANOVA on Weight across the three treatments
from scipy.stats import f_oneway

# Collect the Weight values of each treatment group
weights_by_treatment = [data.loc[data["Treatment"] == t, "Weight"] for t in (1, 2, 3)]
f_oneway(*weights_by_treatment)

F_onewayResult(statistic=49.26710941017562, pvalue=1.6372198388045576e-06)

Since the p-value is less than 0.05, we reject the null hypothesis that all treatments have the same mean weight.
There is a significant difference in plant weight between the treatments.
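
The link between the F statistic and the p-value can be sketched as follows. The p-value is the upper-tail probability of the F distribution with (k - 1, n - k) degrees of freedom. The group count `k = 3` comes from the study design; the total sample size `n = 15` is an assumption, since only part of the data is displayed above.

```python
from scipy.stats import f

f_stat = 49.26710941017562  # F statistic reported by f_oneway above
k = 3    # number of treatment groups
n = 15   # ASSUMPTION: total number of plants (5 per treatment)

# Upper-tail probability of F(k - 1, n - k) beyond the observed statistic.
p_value = f.sf(f_stat, k - 1, n - k)

alpha = 0.05
print(p_value, p_value < alpha)
```

Under these assumptions the computed value should agree with the p-value printed by `f_oneway`.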

In [25]:
#MANOVA applied
from statsmodels.multivariate.manova import MANOVA

# fit manova
manova_result = MANOVA.from_formula('Height + Width + Weight ~ Treatment', data)
print(manova_result.mv_test())

                   Multivariate linear model
                                                                
----------------------------------------------------------------
       Intercept         Value   Num DF  Den DF  F Value  Pr > F
----------------------------------------------------------------
          Wilks' lambda   0.0014 3.0000 11.0000 2541.6075 0.0000
         Pillai's trace   0.9986 3.0000 11.0000 2541.6075 0.0000
 Hotelling-Lawley trace 693.1657 3.0000 11.0000 2541.6075 0.0000
    Roy's greatest root 693.1657 3.0000 11.0000 2541.6075 0.0000
----------------------------------------------------------------
                                                                
----------------------------------------------------------------
           Treatment        Value  Num DF  Den DF F Value Pr > F
----------------------------------------------------------------
              Wilks' lambda 0.1005 3.0000 11.0000 32.8119 0.0000
             Pillai's trace 0.8995 3.0000 11.

Pillai’s trace is generally regarded as relatively conservative: it yields a significant result less readily, so the group differences have to be larger before it reports significance.

Wilks’ lambda is another often-used test statistic, and the Hotelling-Lawley trace and Roy’s greatest root are further alternatives. There is no absolute consensus in the statistical literature as to which test statistic should be preferred.
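
To show how the four statistics relate to each other, the sketch below computes them with NumPy from the between-group matrix H and the within-group matrix E. The data are randomly generated and purely hypothetical. Roy's greatest root is taken here as the largest eigenvalue of E⁻¹H, which appears to be the convention in the statsmodels output above; other texts define it differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 3 groups x 5 plants x 3 measurements (height, width, weight).
true_means = np.array([[15.0, 4.0, 30.0], [16.0, 5.0, 32.0], [17.0, 5.5, 34.0]])
Y = np.vstack([m + rng.normal(scale=0.5, size=(5, 3)) for m in true_means])
labels = np.repeat([1, 2, 3], 5)

grand_mean = Y.mean(axis=0)
p = Y.shape[1]
H = np.zeros((p, p))  # between-group (hypothesis) sums of squares and cross-products
E = np.zeros((p, p))  # within-group (error) sums of squares and cross-products
for g in np.unique(labels):
    Yg = Y[labels == g]
    d = (Yg.mean(axis=0) - grand_mean)[:, None]
    H += Yg.shape[0] * (d @ d.T)
    R = Yg - Yg.mean(axis=0)
    E += R.T @ R

# Eigenvalues of E^-1 H drive all four statistics.
eigvals = np.linalg.eigvals(np.linalg.solve(E, H)).real

wilks = np.linalg.det(E) / np.linalg.det(E + H)
pillai = np.trace(H @ np.linalg.inv(H + E))
hotelling_lawley = np.trace(H @ np.linalg.inv(E))
roy = eigvals.max()
print(wilks, pillai, hotelling_lawley, roy)
```

Small values of Wilks' lambda and large values of the other three statistics both point towards a group effect.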

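The p-values are shown in the rightmost column and are all below 0.05, which indicates that treatment has a significant effect on plant growth when height, width and weight are considered jointly. The MANOVA result on its own does not show which of the individual measurements differ.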
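
One detail worth checking is how the formula interprets `Treatment`. The column appears to hold integer codes, and a formula treats a numeric column as a continuous predictor unless told otherwise. Wrapping it as `C(Treatment)` marks it as categorical. The sketch below compares both on randomly generated, hypothetical data (the DataFrame `demo` is not the dataset used above).

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(1)

# Hypothetical data: 3 treatments x 5 plants, integer-coded treatment labels.
treatment = np.repeat([1, 2, 3], 5)
demo = pd.DataFrame(
    {
        "Treatment": treatment,
        "Height": 14 + treatment + rng.normal(scale=0.5, size=15),
        "Width": 4 + 0.5 * treatment + rng.normal(scale=0.4, size=15),
        "Weight": 28 + 2 * treatment + rng.normal(scale=0.6, size=15),
    }
)

# Treatment used as stored: a single numeric predictor.
numeric_test = MANOVA.from_formula(
    "Height + Width + Weight ~ Treatment", data=demo
).mv_test()

# Treatment explicitly marked as categorical with three levels.
categorical_test = MANOVA.from_formula(
    "Height + Width + Weight ~ C(Treatment)", data=demo
).mv_test()

print(numeric_test)
print(categorical_test)
```

The two versions report different degrees of freedom for the treatment term, so it is worth confirming that the fitted model matches the intended design.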
