# Anova_test

ANOVA is a parametric statistical technique that helps in finding out if there is a significant difference between the mean of three or more groups. It checks the impact of various factors by comparing groups (samples) on the basis of their respective mean. 

We can use this only when: 

the samples have a normal distribution.
the samples are selected at random and should be independent of one another.
all groups have equal standard deviations.

## One Way Anova
It is a type of hypothesis test where only one factor is considered. We use F-statistic to perform a one-way analysis of variance.One-Way ANOVA in Python: One-way ANOVA (also known as “analysis of variance”) is a test that is used to find out whether there exists a statistically significant difference between the mean values of more than one group.

Hypothesis involved:
A one-way ANOVA has the below given null and alternative hypotheses:

H0 (null hypothesis): μ1 = μ2 = μ3 = … = μk (It implies that the means of all the population are equal)
H1 (null hypothesis): It states that there will be at least one population mean that differs from the rest

Statement:

Q 1. Researchers took 20 cars of the same to take part in a study. These cars are randomly doped with one of the four-engine oils and allowed to run freely for 100 kilometers each. At the end of the journey, the performance of each of the cars is noted. Before proceeding further we need to install the SciPy library in our system. You can install this library by using the below command in the terminal:

Stepwise Implementation
Conducting a One-Way ANOVA test in Python is a step by step process and these steps are explained below:

Step 1: Creating data groups.

The very first step is to create three arrays that will keep the information of cars 

In [4]:
import numpy as np
import scipy.stats as stats
from numpy.random import randn
import seaborn as sns 
from scipy.stats import norm
import scipy

In [2]:
# Performance when each of the engine 
# oil is applied
performance1 = [89, 89, 88, 78, 79]
performance2 = [93, 92, 94, 89, 88]
performance3 = [89, 88, 89, 93, 90]
performance4 = [81, 78, 81, 92, 82]


Step 2: Conduct the one-way ANOVA:

Python provides us f_oneway() function from SciPy library using which we can conduct the One-Way ANOVA.

In [5]:
# Importing library
from scipy.stats import f_oneway

# Performance when each of the engine 
# oil is applied
performance1 = [89, 89, 88, 78, 79]
performance2 = [93, 92, 94, 89, 88]
performance3 = [89, 88, 89, 93, 90]
performance4 = [81, 78, 81, 92, 82]

# Conduct the one-way ANOVA
f_oneway(performance1, performance2, performance3, performance4)


F_onewayResult(statistic=4.625000000000002, pvalue=0.016336459839780215)

# P value

Step 3: Analyse the result:

The F statistic and p-value turn out to be equal to 4.625 and 0.016336498 respectively. Since the p-value is less than 0.05 hence we would reject the null hypothesis. This implies that we have sufficient proof to say that there exists a difference in the performance among four different engine oils.

# Two way ANOVA

 ### Two-Way ANOVA in statistics stands for Analysis of Variance and it is used to check whether there is a statistically significant difference between the mean value of three or more that has been divided into two factors. In simple words, ANOVA is a test conducted in statistics and it is used to interpret the difference between the mean value of at least three groups. 

#### The main objective of a two-way ANOVA is to find out how two factors affect a response variable and to find out whether there is a relation between the two factors on the response variable.

### Q 2. Let us consider an example in which scientists need to know whether plant growth is affected by fertilizers and watering frequency. They planted exactly 30 plants and allowed them to grow for six months under different conditions for fertilizers and watering frequency. After exactly six months, they recorded the heights of each plant centimeters. Performing a Two-Way ANOVA in Python is a step by step process and these are discussed below:

Step 1: Import Libraries

In [1]:
# Importing libraries 
import numpy as np 
import pandas as pd


Step 2: Enter the data.

Let us create a pandas DataFrame that consist of the following three variables:

1. fertilizers:  how frequently each plant was fertilized that is daily or weekly.
2. watering:  how frequently each plant was watered that is daily or weekly.
3. height:  the height of each plant (in inches) after six months.

In [3]:
# Create a dataframe 
dataframe = pd.DataFrame({'Fertilizer': np.repeat(['daily', 'weekly'], 15), 
						'Watering': np.repeat(['daily', 'weekly'], 15), 
						'height': [14, 16, 15, 15, 16, 13, 12, 11, 14, 
									15, 16, 16, 17, 18, 14, 13, 14, 14, 
									14, 15, 16, 16, 17, 18, 14, 13, 14, 
									14, 14, 15]}) 
dataframe


Unnamed: 0,Fertilizer,Watering,height
0,daily,daily,14
1,daily,daily,16
2,daily,daily,15
3,daily,daily,15
4,daily,daily,16
5,daily,daily,13
6,daily,daily,12
7,daily,daily,11
8,daily,daily,14
9,daily,daily,15


Step 3: Conduct the two-way ANOVA:

To perform the two-way ANOVA, the Statsmodels library provides us with anova_lm() function. The syntax of the function is given below, 

Syntax:

sm.stats.anova_lm(model, type=2)

Parameters:

model: It represents model statistics
type: It represents the type of Anova test to perform that is { I or II or III or 1 or 2 or 3 }

In [7]:
# Importing libraries 
import statsmodels.api as sm 
from statsmodels.formula.api import ols 

# Performing two-way ANOVA 
model = ols('height ~ C(Fertilizer) + C(Watering) + C(Fertilizer):C(Watering)', 
            data=dataframe).fit() 
result = sm.stats.anova_lm(model, type=2) 
  
# Print the result 
print(result) 

                             df     sum_sq   mean_sq         F    PR(>F)
C(Fertilizer)               1.0   0.033333  0.033333  0.012069  0.913305
C(Watering)                 1.0   1.027463  1.027463  0.372012  0.546828
C(Fertilizer):C(Watering)   1.0   0.577010  0.577010  0.208918  0.651144
Residual                   28.0  77.333333  2.761905       NaN       NaN


# P value

The p-value for the interaction effect (0.904053) is greater than 0.05 which depicts that there is no significant interaction effect between fertilizer frequency and watering frequency.

https://www.geeksforgeeks.org/how-to-perform-a-one-way-anova-in-python/

https://www.geeksforgeeks.org/how-to-perform-a-two-way-anova-in-python/