## Two-Way ANOVA

## Hypotheses

1. Main Effect of Watering Frequency (Independent Variable 1)

H₀ (Null Hypothesis): Mean plant height is the same across all watering frequencies.
H₁ (Alternative Hypothesis): Mean plant height differs for at least one watering frequency.

2. Main Effect of Sunlight Exposure (Independent Variable 2)

H₀ (Null Hypothesis): Mean plant height is the same across all levels of sunlight exposure.
H₁ (Alternative Hypothesis): Mean plant height differs for at least one level of sunlight exposure.

3. Interaction Effect Between Watering Frequency and Sunlight Exposure
   
H₀ (Null Hypothesis): There is no interaction between watering frequency and sunlight exposure; the effect of one factor on mean plant height does not depend on the level of the other.
H₁ (Alternative Hypothesis): There is an interaction between watering frequency and sunlight exposure; the effect of one factor on mean plant height depends on the level of the other.
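
The three hypothesis pairs above can be written against the usual two-way ANOVA model. The symbols are notation introduced here for illustration (they do not appear in the original analysis): $\alpha_i$ is the effect of watering frequency $i$, $\beta_j$ the effect of sunlight level $j$, and $(\alpha\beta)_{ij}$ their interaction.

$$
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}
$$

$$
\begin{aligned}
\text{Watering frequency:} \quad & H_0: \alpha_i = 0 \ \text{for all } i, & H_1: \alpha_i \neq 0 \ \text{for some } i \\
\text{Sunlight exposure:} \quad & H_0: \beta_j = 0 \ \text{for all } j, & H_1: \beta_j \neq 0 \ \text{for some } j \\
\text{Interaction:} \quad & H_0: (\alpha\beta)_{ij} = 0 \ \text{for all } i, j, & H_1: (\alpha\beta)_{ij} \neq 0 \ \text{for some } i, j
\end{aligned}
$$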

Importing the libraries

In [2]:
import warnings
warnings.filterwarnings('ignore')
from scipy import stats
import statsmodels.api as sm
from statsmodels.formula.api import ols
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Loading the dataset

In [4]:
growth = pd.read_csv('TwoWayAnova.csv')

In [5]:
growth.head(10)

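|   | water | sun | height |
|---|-------|--------|-----|
| 0 | Daily | Low    | 5.0 |
| 1 | Daily | Low    | 5.2 |
| 2 | Daily | Low    | 5.6 |
| 3 | Daily | Low    | 4.3 |
| 4 | Daily | Low    | 4.8 |
| 5 | Daily | Medium | 6.4 |
| 6 | Daily | Medium | 6.2 |
| 7 | Daily | Medium | 4.7 |
| 8 | Daily | Medium | 5.5 |
| 9 | Daily | Medium | 5.8 |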


In [6]:
growth.groupby(['sun','water'])['height'].mean()

sun     water 
High    Daily     5.78
        Weekly    5.32
Low     Daily     4.98
        Weekly    5.22
Medium  Daily     5.72
        Weekly    6.06
Name: height, dtype: float64

This groups the rows of the `growth` DataFrame by each combination
of the `sun` and `water` columns
and calculates the mean `height` for each combination.

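Next, Ordinary Least Squares (OLS) regression from the statsmodels library is used to fit a linear model with `water` as the only (categorical) predictor. Passing the fitted model to `anova_lm` gives a one-way ANOVA table for watering frequency.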

In [7]:
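growth_model_1 = ols('height ~ water', data=growth).fit()

In [8]:
# anova_lm takes the keyword `typ` (not `type`). typ=1 gives the sequential
# (Type I) table shown below; with a single factor, Type I and Type II
# sums of squares coincide.
ano2test_1 = sm.stats.anova_lm(growth_model_1, typ=1)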

In [9]:
ano2test_1

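|          | df   | sum_sq    | mean_sq  | F        | PR(>F)   |
|----------|------|-----------|----------|----------|----------|
| water    | 2.0  | 0.628333  | 0.314167 | 0.823038 | 0.449813 |
| Residual | 27.0 | 10.306333 | 0.381716 | NaN      | NaN      |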


water: p-value = 0.449813 > 0.05, so we fail to reject H₀.

H₀ (Null Hypothesis) is retained: there is no significant evidence that mean plant height differs across watering frequencies.
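
As a check on how the `F` and `PR(>F)` columns follow from `sum_sq` and `df`, the sketch below recomputes them from the values in the `water` table. The function name `f_test_two_numerator_df` is hypothetical, and the closed-form p-value it uses is valid only when the effect has exactly 2 numerator degrees of freedom; for other cases a library routine such as `scipy.stats.f.sf` would be needed.

```python
def f_test_two_numerator_df(ss_effect, ss_resid, df_resid):
    """Return (F, p) for an effect with exactly 2 numerator degrees of freedom."""
    ms_effect = ss_effect / 2          # mean square of the effect
    ms_resid = ss_resid / df_resid     # mean square of the residual
    f_stat = ms_effect / ms_resid
    # Survival function of the F(2, df_resid) distribution in closed form.
    p_value = (1 + 2 * f_stat / df_resid) ** (-df_resid / 2)
    return f_stat, p_value


# sum_sq and df values copied from the one-way table for `water`.
f_stat, p_value = f_test_two_numerator_df(0.628333, 10.306333, 27)
print(f"F = {f_stat:.4f}, p = {p_value:.4f}")  # F = 0.8230, p = 0.4498
```

The same function applied to the `sun` table values (`3.140667`, `7.794`, `27`) should reproduce that table's F and p-value in the same way.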

In [10]:
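growth_model_2 = ols('height ~ sun', data=growth).fit()

In [11]:
# anova_lm takes the keyword `typ` (not `type`). typ=1 gives the sequential
# (Type I) table shown below; with a single factor, Type I and Type II
# sums of squares coincide.
ano2test_2 = sm.stats.anova_lm(growth_model_2, typ=1)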

In [12]:
ano2test_2

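|          | df   | sum_sq   | mean_sq  | F        | PR(>F)   |
|----------|------|----------|----------|----------|----------|
| sun      | 2.0  | 3.140667 | 1.570333 | 5.439954 | 0.010349 |
| Residual | 27.0 | 7.794    | 0.288667 | NaN      | NaN      |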


sun: p-value = 0.010349 < 0.05, so we reject H₀.

H₁ (Alternative Hypothesis) is supported: mean plant height differs for at least one level of sunlight exposure.
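
To show where the `sum_sq` value for `sun` comes from, the sketch below rebuilds it from the six cell means printed by the `groupby` call. It assumes a balanced design with 5 plants per cell, which the first rows of the data suggest but the notebook does not state explicitly; `cell_means` and `N_PER_CELL` are names introduced here for illustration.

```python
# Cell means copied from the groupby output: (sun, water) -> mean height.
cell_means = {
    ("High", "Daily"): 5.78, ("High", "Weekly"): 5.32,
    ("Low", "Daily"): 4.98, ("Low", "Weekly"): 5.22,
    ("Medium", "Daily"): 5.72, ("Medium", "Weekly"): 6.06,
}
N_PER_CELL = 5  # assumption: every (sun, water) cell holds 5 plants

# With equal cell sizes, the grand mean is the plain average of the cell means.
grand_mean = sum(cell_means.values()) / len(cell_means)

ss_sun = 0.0
for level in sorted({sun for sun, _ in cell_means}):
    means = [m for (sun, _), m in cell_means.items() if sun == level]
    level_mean = sum(means) / len(means)   # marginal mean for this sun level
    n_level = N_PER_CELL * len(means)      # plants observed at this sun level
    ss_sun += n_level * (level_mean - grand_mean) ** 2

print(round(ss_sun, 6))  # 3.140667, matching sum_sq for `sun`
```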

In [13]:
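# Additive model: main effects of water and sun, no interaction term.
growth_model_3 = ols('height ~ water + sun', data=growth).fit()

In [14]:
# anova_lm takes the keyword `typ` (not `type`). typ=1 gives the sequential
# (Type I) table shown below, in which each term is adjusted only for the
# terms listed before it in the formula.
ano2test2 = sm.stats.anova_lm(growth_model_3, typ=1)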

In [15]:
ano2test2

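|          | df   | sum_sq   | mean_sq  | F        | PR(>F)   |
|----------|------|----------|----------|----------|----------|
| water    | 2.0  | 0.628333 | 0.314167 | 1.147515 | 0.333584 |
| sun      | 2.0  | 3.461833 | 1.730917 | 6.32229  | 0.005997 |
| Residual | 25.0 | 6.8445   | 0.27378  | NaN      | NaN      |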


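In the additive model, water remains non-significant (p-value = 0.333584 > 0.05) and sun remains significant (p-value = 0.005997 < 0.05).

water*sun: p-value = 0.136537 > 0.05, so we fail to reject H₀. The additive model fitted above contains no interaction term, so this p-value does not appear in its table; it would have to come from a model that includes the interaction, such as `height ~ water*sun`.

H₀ (Null Hypothesis) is retained: there is no significant evidence of an interaction between watering frequency and sunlight exposure on plant height.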

## Summary:

Watering frequency: fail to reject H₀. Mean plant height does not differ significantly across watering frequencies (p-value = 0.449813).

Sunlight exposure: reject H₀. Mean plant height differs significantly across levels of sunlight exposure (p-value = 0.010349).

Interaction: fail to reject H₀. Mean plant height is not significantly affected by the interaction between watering frequency and sunlight exposure (p-value = 0.136537).

## Inference:

In this scenario, sunlight exposure is the only factor with a
statistically significant effect on plant height, while watering frequency alone is not
a significant factor. There is no significant evidence of an interaction between the two factors, which does not prove that no interaction exists.