## Notebook by **[Aden](https://twitter.com/Aden_Rajput_)**
- adenrajput@gmail.com

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats

In [2]:
df = pd.read_csv("diet.csv")
df.head()

Unnamed: 0,id,gender,age,height,diet_type,initial_weight,med_weight,final_weight,sex
0,1,Female,22,159,A,58,40.628648,54.2,0
1,2,Female,46,192,A,60,66.940705,54.0,0
2,3,Female,55,170,A,64,74.338654,63.3,0
3,4,Female,33,171,A,64,42.475798,61.1,0
4,5,Female,50,170,A,65,86.663066,62.2,0


In [3]:
df.diet_type.unique()

array(['A', 'B', 'C'], dtype=object)

In [4]:
# One-way ANOVA: does mean final weight differ between diet A, diet B, and diet C?
# H0: mean final weight is the same for all three diets
# H1: at least one diet has a different mean final weight

# apply anova
f, p = scipy.stats.f_oneway(df[df.diet_type == "A"].final_weight, df[df.diet_type == "B"].final_weight, df[df.diet_type == "C"].final_weight)

#interpret
alpha = 0.05
if p < alpha:
    print("Reject H0 = Diets are different")
else:
    print("Fail to reject H0 = Diets are not different")

Fail to reject H0 = Diets are not different
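To make the F statistic less of a black box, the following sketch computes it by hand from the between-group and within-group sums of squares and compares the result with `scipy.stats.f_oneway`. The three groups are synthetic (the means, spread, and sample sizes are assumptions for illustration, not values from `diet.csv`).

```python
import numpy as np
import scipy.stats

# Synthetic final weights for three hypothetical diet groups (not the notebook's data)
rng = np.random.default_rng(0)
groups = {
    "A": rng.normal(70, 8, size=25),
    "B": rng.normal(70, 8, size=25),
    "C": rng.normal(72, 8, size=25),
}

samples = list(groups.values())
all_values = np.concatenate(samples)
grand_mean = all_values.mean()
k = len(samples)      # number of groups
n = all_values.size   # total number of observations

# Between-group and within-group sums of squares
ss_between = sum(s.size * (s.mean() - grand_mean) ** 2 for s in samples)
ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)

# F = mean square between / mean square within
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))
p_manual = scipy.stats.f.sf(f_manual, k - 1, n - k)

f_scipy, p_scipy = scipy.stats.f_oneway(*samples)
print(f"manual: F = {f_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  F = {f_scipy:.4f}, p = {p_scipy:.4f}")
```

Both lines should agree up to floating-point error, since `f_oneway` performs the same decomposition.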


- Two-way ANOVA (Analysis of Variance) tests the effect of two independent variables (diet_type and gender) on a continuous dependent variable (final_weight), and can also test whether the two variables interact.
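An interaction between the two factors is tested only when the model formula contains an interaction term. As a hedged sketch, the formula operator `*` expands to both main effects plus the `diet_type:gender` interaction. The `demo` frame below is synthetic, with an interaction built in on purpose (all effect sizes are assumptions), and the choice of Type II sums of squares is also an assumption.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Synthetic balanced design: 3 diets x 2 genders (not the notebook's data)
rng = np.random.default_rng(1)
n_per_cell = 15
rows = []
for diet in ["A", "B", "C"]:
    for gender in ["Female", "Male"]:
        base = 62 if gender == "Female" else 78
        # Hypothetical interaction: diet C lowers weight only in the Male group
        shift = -6 if (diet == "C" and gender == "Male") else 0
        for w in rng.normal(base + shift, 4, size=n_per_cell):
            rows.append({"diet_type": diet, "gender": gender, "final_weight": w})
demo = pd.DataFrame(rows)

# "*" means: C(diet_type) + C(gender) + C(diet_type):C(gender)
model_int = ols("final_weight ~ C(diet_type) * C(gender)", data=demo).fit()
table = anova_lm(model_int, typ=2)
print(table)
```

The row labelled `C(diet_type):C(gender)` carries the F test for the interaction. A formula written with `+` only, as in the next cell, fits main effects and has no such row.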

In [5]:
# TWO WAY ANOVA on diet and gender 


from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# two-way ANOVA with main effects only (no diet_type:gender interaction term)
model = ols("final_weight ~ C(diet_type) + C(gender)", data=df).fit()

# Print the ANOVA table
anova_table = anova_lm(model)
print(anova_table)
print("_________________")
# Extract the p-values, skipping the Residual row (it has no F test)
p_values = anova_table['PR(>F)'].drop('Residual')

# Check if the p-value for each factor is less than 0.05
for factor, p_value in p_values.items():
    if p_value < 0.05:
        print(f"{factor} has a significant effect on final_weight (p = {p_value:.10f})")
    else:
        print(f"{factor} has no significant effect on final_weight (p = {p_value:.10f})")



                df       sum_sq      mean_sq          F        PR(>F)
C(diet_type)   2.0    81.234570    40.617285   1.343881  2.672943e-01
C(gender)      1.0  2613.633642  2613.633642  86.475816  5.805535e-14
Residual      72.0  2176.118499    30.223868        NaN           NaN
_________________
C(diet_type) has no significant effect on final_weight (p = 0.2672943456)
C(gender) has a significant effect on final_weight (p = 0.0000000000)


- ANCOVA is used to determine whether two or more groups differ significantly on a continuous dependent variable (final_weight) while controlling for the effect of one or more continuous covariates (age).
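A per-coefficient p-value compares one diet level with the reference level. To test the diet factor as a whole after adjusting for age, one option is an ANOVA table computed on the fitted model. The sketch below uses a synthetic `demo` frame; its effect sizes and the choice of Type II sums of squares are assumptions, not taken from the notebook.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Synthetic data: 30 people per diet, ages 20 to 59 (not the notebook's data)
rng = np.random.default_rng(2)
n = 90
demo = pd.DataFrame({
    "diet_type": np.repeat(["A", "B", "C"], n // 3),
    "age": rng.integers(20, 60, size=n),
})

# Hypothetical effects: weight rises slightly with age, diet C shifts it down
diet_effect = demo["diet_type"].map({"A": 0.0, "B": 0.0, "C": -3.0})
demo["final_weight"] = 60 + 0.2 * demo["age"] + diet_effect + rng.normal(0, 4, size=n)

# ANCOVA as a linear model: categorical factor plus continuous covariate
ancova = ols("final_weight ~ C(diet_type) + age", data=demo).fit()

# One F test per term, each adjusted for the other term
ancova_table = anova_lm(ancova, typ=2)
print(ancova_table)
```

The `C(diet_type)` row has 2 degrees of freedom and answers "do the diets differ at all, given age?", which the individual coefficient p-values do not answer directly.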

In [6]:
# ANCOVA

# Perform the ANCOVA
model = ols("final_weight ~ C(diet_type) + C(gender) + age", data=df).fit()

# Print the summary of the model
print(model.summary())
print("_____________________________")


# Extract the p-value of each coefficient (the Intercept is included)
p_values = model.pvalues

# Check if the p-value for each coefficient is less than 0.05
for name, p_value in p_values.items():
    if p_value < 0.05:
        print(f"{name} has a significant effect on final_weight (p = {p_value:.10f})")
    else:
        print(f"{name} has no significant effect on final_weight (p = {p_value:.10f})")


                            OLS Regression Results                            
Dep. Variable:           final_weight   R-squared:                       0.555
Model:                            OLS   Adj. R-squared:                  0.530
Method:                 Least Squares   F-statistic:                     22.11
Date:                Tue, 24 Jan 2023   Prob (F-statistic):           6.99e-12
Time:                        19:49:39   Log-Likelihood:                -235.19
No. Observations:                  76   AIC:                             480.4
Df Residuals:                      71   BIC:                             492.0
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
Intercept            63.3937      2.92

- In a one-way MANOVA there is only one independent variable, and the goal is to determine whether the means of several dependent variables, taken jointly, differ between the levels of that variable. In this case, the independent variable is "gender" and the dependent variables are "initial_weight", "med_weight", and "final_weight".

- The test determines whether the mean values of the dependent variables differ between the groups defined by the independent variable (here, Male and Female). The test statistic and its p-value indicate whether the difference is statistically significant.
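As a rough illustration of what the test returns, the sketch below runs a one-way MANOVA on synthetic data through the statsmodels formula interface and pulls out Wilks' lambda. The `demo` frame and its group offset are assumptions for illustration. The `stat` lookup relies on the result being keyed by the term name `gender`, which is how recent statsmodels versions name the hypotheses for a formula-built model.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Synthetic data: 40 people per gender, three correlated-in-mean weight measures
rng = np.random.default_rng(3)
n = 40
gender = np.repeat(["Female", "Male"], n)
offset = np.where(gender == "Male", 12.0, 0.0)  # hypothetical group difference
demo = pd.DataFrame({
    "gender": gender,
    "initial_weight": 65 + offset + rng.normal(0, 5, size=2 * n),
    "med_weight": 63 + offset + rng.normal(0, 5, size=2 * n),
    "final_weight": 61 + offset + rng.normal(0, 5, size=2 * n),
})

# Dependent variables on the left of "~", the grouping variable on the right
fit = MANOVA.from_formula(
    "initial_weight + med_weight + final_weight ~ gender", data=demo
)
res = fit.mv_test()

# Table of the four multivariate statistics for the gender term
stat = res.results["gender"]["stat"]
print(stat)

wilks_p = float(stat.loc["Wilks' lambda", "Pr > F"])
print(f"Wilks' lambda p-value: {wilks_p:.3g}")
```

The table lists Wilks' lambda, Pillai's trace, the Hotelling-Lawley trace, and Roy's greatest root; with only two groups they lead to the same F test.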

In [7]:
df.head()

Unnamed: 0,id,gender,age,height,diet_type,initial_weight,med_weight,final_weight,sex
0,1,Female,22,159,A,58,40.628648,54.2,0
1,2,Female,46,192,A,60,66.940705,54.0,0
2,3,Female,55,170,A,64,74.338654,63.3,0
3,4,Female,33,171,A,64,42.475798,61.1,0
4,5,Female,50,170,A,65,86.663066,62.2,0


In [11]:
from statsmodels.multivariate.manova import MANOVA

# keep only the columns the test needs and drop rows with missing values
# (coercing the whole frame to numeric would turn gender into NaN,
# and dropna would then remove every row)
manova_df = df[["gender", "initial_weight", "med_weight", "final_weight"]].dropna()

# dependent variables on the left of "~", independent variable on the right;
# the formula interface encodes gender and adds the intercept
manova = MANOVA.from_formula(
    "initial_weight + med_weight + final_weight ~ gender", data=manova_df
)
results = manova.mv_test()

# print the results
print(results)
