### One Way ANOVA (Analysis of Variance)

A one-way ANOVA (“analysis of variance”) compares the means of three or more independent groups to determine whether there is a statistically significant difference between the corresponding population means.


Steps in ANOVA<br>
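1) <b>Calculate each group mean and the overall mean (the mean of all observations, which equals the mean of the group means when every group has the same size)</b><br>

2) <b>Compute SSR (Regression sum of squares, i.e. the between-group sum of squares)</b><br>
SSR = n * sum((individual_mean_of_each_group - overall_mean)^2)<br>
where n is the number of observations in a group (assumed to be the same for each group) and the sum runs over the groups

3) <b>Compute SSE (Error sum of squares)</b><br>
SSE = sum((xi - x_mean)^2)<br>
where <br>
xi = individual values of each group<br>
x_mean = mean of the group that xi belongs to<br>
and the sum runs over every observation in every group

4) <b>Compute SST (Total Sum of Squares)</b><br>
SST = SSR + SSE<br>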

5) Populate the ANOVA Table
<img src="ANOVA_table1.png">

where <br>
n = total number of observations across all groups (unlike the n in step 2, which is the size of a single group), K = number of groups

6) Compare f_stat with the critical F value (f_critical) for the chosen alpha and the degrees of freedom df_t and df_er<br>
7) Make inference for the hypothesis<br>
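
The steps above can be sketched directly with NumPy. This is a minimal sketch, not a reference implementation: the function name `one_way_anova` and the sample groups are hypothetical, and it assumes that df_t = K - 1 and df_er = n - K, which is the usual layout of a one-way ANOVA table. Each group is weighted by its own size, which reduces to the `n * sum(...)` form of step 2 when all groups have the same size.

```python
import numpy as np
from scipy.stats import f


def one_way_anova(*groups):
    """Hypothetical helper: one-way ANOVA computed step by step."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                      # K: number of groups
    n = sum(len(g) for g in groups)      # n: total number of observations
    # Step 1: overall mean of all observations
    overall_mean = np.concatenate(groups).mean()

    # Step 2: between-group sum of squares (SSR)
    ssr = sum(len(g) * (g.mean() - overall_mean) ** 2 for g in groups)
    # Step 3: within-group (error) sum of squares (SSE)
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
    # Step 4: total sum of squares
    sst = ssr + sse

    # Step 5: degrees of freedom, mean squares and F statistic
    df_t, df_er = k - 1, n - k           # assumed definitions, see lead-in
    f_stat = (ssr / df_t) / (sse / df_er)
    p_value = f.sf(f_stat, df_t, df_er)  # upper-tail probability
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "df_t": df_t, "df_er": df_er, "F": f_stat, "p": p_value}


# Illustrative data only, not taken from the exercises
result = one_way_anova([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(result)
```

For this toy data the group means are 2, 3 and 6, so SSR = 26, SSE = 6 and F = (26 / 2) / (6 / 6) = 13.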

#### Q1)
We want to know whether or not three different exam prep programs lead to different mean scores on a certain exam. To test this, we recruit 30 students to participate in a study and split them into three groups.

The students in each group are randomly assigned to use one of the three exam prep programs for the next three weeks to prepare for an exam. At the end of the three weeks, all of the students take the same exam. The exam scores for each group are shown below:

Perform a one-way ANOVA test to determine if the mean exam score is different between the three groups.

<img src="anova_ex1.png" height="150" width="150" align="left">

In [13]:
import numpy as np
import math
import pandas as pd
import statistics as st

In [41]:
# 1) H0 (null hypothesis): μ1 = μ2 = μ3 = … = μk (all the population means are equal)
# 2) H1 (alternative hypothesis): at least one population mean is different from the rest

In [1]:
g1 = [85,86,88,75,78,94,98,79,71,80]
g2 = [91,92,93,85,87,84,82,88,95,96]
g3 = [79,78,88,94,92,85,83,85,82,81]


In [46]:
from scipy.stats import f_oneway

In [48]:
f_stat,p = f_oneway(g1,g2,g3)
print('f_stat',f_stat,'p',p)

f_stat 2.3575322551335636 p 0.11384795345837218


The p-value (0.1138) is greater than 0.05, which is equivalent to the F test statistic (2.3575) being less than the F critical value for 2 and 27 degrees of freedom, so we fail to reject the null hypothesis at α = 0.05. This means we don’t have sufficient evidence to say that there is a statistically significant difference between the mean exam scores of the three groups.
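
The F critical value referred to above does not have to come from a printed table. As a sketch, it can be obtained from `scipy.stats.f.ppf`; the value `alpha = 0.05` is an assumption here, since Q1 does not state a significance level.

```python
from scipy.stats import f, f_oneway

g1 = [85, 86, 88, 75, 78, 94, 98, 79, 71, 80]
g2 = [91, 92, 93, 85, 87, 84, 82, 88, 95, 96]
g3 = [79, 78, 88, 94, 92, 85, 83, 85, 82, 81]

alpha = 0.05                        # assumed significance level
k = 3                               # number of groups
n = len(g1) + len(g2) + len(g3)     # total observations
df_t, df_er = k - 1, n - k          # 2 and 27

f_stat, p = f_oneway(g1, g2, g3)
# Critical value: the (1 - alpha) quantile of the F(df_t, df_er) distribution
f_critical = f.ppf(1 - alpha, df_t, df_er)

reject_h0 = f_stat > f_critical
print('f_stat', f_stat, 'f_critical', f_critical)
print('Reject H0' if reject_h0 else 'Fail to reject H0')
```

Comparing `f_stat` with `f_critical` and comparing `p` with `alpha` always lead to the same decision, so either check can be used.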

#### Q2) 
Apply One Way ANOVA using alpha=0.05

g1 = [210,240,270,270,300]<br>
g2 = [210,240,240,270,270]<br>
g3 = [180,210,210,210,240]


In [2]:
# 1) H0 (null hypothesis): μ1 = μ2 = μ3 = … = μk (all the population means are equal)
# 2) H1 (alternative hypothesis): at least one population mean is different from the rest

g1 = [210,240,270,270,300]
g2 = [210,240,240,270,270]
g3 = [180,210,210,210,240]
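
One possible way to complete this exercise is to reuse the `f_oneway` approach from Q1. This is a sketch rather than the author's solution; only the decision rule `p < alpha` is added to what the notebook already uses.

```python
from scipy.stats import f_oneway

g1 = [210, 240, 270, 270, 300]
g2 = [210, 240, 240, 270, 270]
g3 = [180, 210, 210, 210, 240]

alpha = 0.05  # significance level given in the exercise

f_stat, p = f_oneway(g1, g2, g3)
print('f_stat', f_stat, 'p', p)

if p < alpha:
    print('Reject H0')
else:
    print('Fail to reject H0')
```

Working the steps by hand gives group means of 258, 246 and 210, an overall mean of 238, SSR = 6240 and SSE = 9000, so F = (6240 / 2) / (9000 / 12) = 4.16.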



#### Q3)
Using the following data, perform a one-way ANOVA using α=0.05<br>
g1 = [51,45,33,45,67]<br>
g2 = [23,43,23,43,45]<br>
g3 = [56,76,74,87,56]


In [None]:
# 1) H0 (null hypothesis): μ1 = μ2 = μ3 = … = μk (all the population means are equal)
# 2) H1 (alternative hypothesis): at least one population mean is different from the rest

In [3]:
g1 = [51,45,33,45,67]
g2 = [23,43,23,43,45]
g3 = [56,76,74,87,56]
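
For this exercise, the ANOVA table from step 5 can be populated explicitly. The sketch below builds it as a pandas DataFrame; the row and column labels (`Treatment`, `Error`, `Total`, `SS`, `df`, `MS`, `F`) are assumptions about the table layout, not taken from the figure.

```python
import numpy as np
import pandas as pd
from scipy.stats import f

groups = [np.array([51, 45, 33, 45, 67], dtype=float),
          np.array([23, 43, 23, 43, 45], dtype=float),
          np.array([56, 76, 74, 87, 56], dtype=float)]

alpha = 0.05
k = len(groups)                       # number of groups
n = sum(len(g) for g in groups)       # total observations
overall_mean = np.concatenate(groups).mean()

# Sums of squares
ssr = sum(len(g) * (g.mean() - overall_mean) ** 2 for g in groups)
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Degrees of freedom, mean squares, F statistic and critical value
df_t, df_er = k - 1, n - k
msr, mse = ssr / df_t, sse / df_er
f_stat = msr / mse
f_critical = f.ppf(1 - alpha, df_t, df_er)

anova_table = pd.DataFrame(
    {"SS": [ssr, sse, ssr + sse],
     "df": [df_t, df_er, n - 1],
     "MS": [msr, mse, np.nan],
     "F": [f_stat, np.nan, np.nan]},
    index=["Treatment", "Error", "Total"],
)
print(anova_table)
print('f_critical', f_critical)
print('Reject H0' if f_stat > f_critical else 'Fail to reject H0')
```

By hand, SSE = 612.8 + 515.2 + 732.8 = 1860.8 and SSR is approximately 3022.93, giving F of roughly 9.75 on 2 and 12 degrees of freedom.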


#### Link Used -> http://rstudio-pubs-static.s3.amazonaws.com/228015_d8d0ddab79664707890681a9a75cf16d.html

#### Q4)
Perform a one-way ANOVA using the following summary at alpha = 0.01
<img src="anova_ex3.png">

#### Q5)
A clinical psychologist has run a between-subjects experiment comparing two treatments for depression (cognitive-behavioral therapy (CBT) and client-centered therapy (CCT)) against a control condition. Subjects were randomly assigned to an experimental condition. After 12 weeks, the subjects’ depression scores were measured using the CESD depression scale. The data are summarized as follows:

<img src="anova_ex4.png">

Use a one-way ANOVA with α=.01 for the test.
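
When only summary statistics are available, `f_oneway` cannot be used because it needs the raw observations. The sketch below assumes the summary provides each group's size, mean and sample standard deviation; the helper name `anova_from_summary` is hypothetical and the numbers passed to it are placeholders, not the values from the figures.

```python
import numpy as np
from scipy.stats import f


def anova_from_summary(ns, means, sds, alpha=0.01):
    """Hypothetical helper: one-way ANOVA from group sizes, means and sample SDs."""
    ns = np.asarray(ns, dtype=float)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)

    k, n = len(ns), ns.sum()
    overall_mean = (ns * means).sum() / n            # size-weighted mean
    ssr = (ns * (means - overall_mean) ** 2).sum()   # between-group SS
    sse = ((ns - 1) * sds ** 2).sum()                # within-group SS from the SDs
    df_t, df_er = k - 1, n - k

    f_stat = (ssr / df_t) / (sse / df_er)
    p_value = f.sf(f_stat, df_t, df_er)
    f_critical = f.ppf(1 - alpha, df_t, df_er)
    return f_stat, p_value, f_critical


# Placeholder numbers for illustration only; replace with the summary values
f_stat, p, f_critical = anova_from_summary(
    ns=[10, 10, 10], means=[20.0, 25.0, 30.0], sds=[5.0, 5.0, 5.0])
print('f_stat', f_stat, 'p', p, 'f_critical', f_critical)
```

With the placeholder values, SSR = 500 and SSE = 675, so F = (500 / 2) / (675 / 27) = 10.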

### Chi-Square Test of Independence

A Chi-Square Test of Independence is used to determine whether or not there is a significant association between two categorical variables.

Examples:-
We want to know if gender is associated with political party preference so we survey 500 voters and record their gender and political party preference.<br>
We want to know if a person’s favorite color is associated with their favorite sport so we survey 100 people and ask them about their preferences for both.<br>

Hypothesis<br>

    H0: (null hypothesis) The two variables are independent.
    H1: (alternative hypothesis) The two variables are not independent. (i.e. they are associated)

Formula for the Chi-Square test

<img src="chi_square_formula.png">

where<br>
O = Observed Value<br>
E = Expected Value<br>
Expected value = (row sum * column sum) / table sum.

#### Q1)
We want to know whether or not gender is associated with political party preference. We take a simple random sample of 500 voters and survey them on their political party preference. Alpha = 0.05. 
The following table shows the results of the survey:-
<img src="chi_square_ex1.png" width="500">

### Solution
1) Compute the expected value
<img src="chi_square_ex1_expected_val.png" width="500">

2) Compute X2 (chi square values)
<img src="chi_square_ex1_chi_square_val.png" width ="500">

3) Calculate the sum of the X2 values obtained in step 2<br>
Sum = Σ(O-E)^2/E = 0.2174 + 0.2174 + 0.0676 + 0.0676 + 0.1471 + 0.1471 = 0.8642 (the terms are rounded; summing the unrounded terms gives 0.8640)

4) dof_rows = (2-1) = 1, dof_cols = (3-1) = 2<br>
dof = dof_rows * dof_cols = 1 * 2 = 2

5) The p-value associated with X2 ≈ 0.864 and 2 degrees of freedom is 0.649198
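
The five steps above can be reproduced with NumPy and `scipy.stats.chi2`. This is a minimal sketch using the observed counts from the survey table; the variable names are illustrative only.

```python
import numpy as np
from scipy.stats import chi2

# Observed counts: 2 rows x 3 columns
observed = np.array([[120, 90, 40],
                     [110, 95, 45]], dtype=float)
alpha = 0.05

# Step 1: expected value = (row sum * column sum) / table sum
row_sums = observed.sum(axis=1, keepdims=True)
col_sums = observed.sum(axis=0, keepdims=True)
expected = row_sums * col_sums / observed.sum()

# Steps 2 and 3: per-cell (O - E)^2 / E, then their sum
chi_stat = ((observed - expected) ** 2 / expected).sum()

# Step 4: degrees of freedom = (rows - 1) * (columns - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Step 5: p-value is the upper-tail probability of the chi-square distribution
p = chi2.sf(chi_stat, dof)
chi_critical = chi2.ppf(1 - alpha, dof)

print('expected\n', expected)
print('chi_stat', chi_stat, 'dof', dof, 'p', p, 'chi_critical', chi_critical)
```

Because the per-cell terms are not rounded here, `chi_stat` comes out slightly below the hand-computed 0.8642.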


In [72]:
from scipy.stats import chi2_contingency

In [76]:
# H0: Gender and political party preference are independent.
# H1: Gender and political party preference are not independent
data = [[120,90,40],[110,95,45]]
alpha=0.05
chi_stat,p,dof,expected = chi2_contingency(data)
print('chi_stat',chi_stat)
print('p',p)
print('dof',dof)
print('expected\n',expected)
if p<alpha:
    print('H0 is rejected')
else:
    print('Fail to reject H0')

chi_stat 0.8640353908896108
p 0.6491978887380976
dof 2
expected
 [[115.   92.5  42.5]
 [115.   92.5  42.5]]
Fail to reject H0


Since this p-value is not less than 0.05, we fail to reject the null hypothesis. This means we do not have sufficient evidence to say that there is an association between gender and political party preference.

#### Q2) Test whether Gender and Like Shopping are independent. 
Gender =  ['F','F','F','M','M','F','F','M','M','F','M']<br>
Like Shopping = ['Y','N','Y','N','N','N','Y','Y','Y','Y','N']

In [99]:
df =  pd.DataFrame({'Gender': ['F','F','F','M','M','F','F','M','M','F','M'],
                    'Like Shopping':['Y','N','Y','N','N','N','Y','Y','Y','Y','N']})
df.head()

|   | Gender | Like Shopping |
|---|--------|---------------|
| 0 | F      | Y             |
| 1 | F      | N             |
| 2 | F      | Y             |
| 3 | M      | N             |
| 4 | M      | N             |


In [100]:
# Ho: Gender and Like Shopping are independent.
# H1: Gender and Like Shopping are not independent.
contingency_table = pd.crosstab(df["Gender"],df["Like Shopping"])
contingency_table

| Gender \ Like Shopping | N | Y |
|------------------------|---|---|
| F                      | 2 | 4 |
| M                      | 3 | 2 |


In [101]:
chi_stat,p,dof,expected = chi2_contingency(contingency_table)
print('chi_stat',chi_stat)
print('p',p)
print('dof',dof)
print('expected\n',expected)
if p<alpha:
    print('H0 is rejected')
else:
    print('Fail to reject H0')

chi_stat 0.07638888888888876
p 0.782252069887464
dof 1
expected
 [[2.72727273 3.27272727]
 [2.27272727 2.72727273]]
Fail to reject H0
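
Two details of this result are worth checking. For a 2x2 table, `chi2_contingency` applies Yates' continuity correction by default (`correction=True`), which is why the statistic is so small. In addition, every expected count here is below 5, a situation in which the chi-square approximation is usually considered unreliable; Fisher's exact test is a common alternative. The sketch below compares the three options on the same contingency table.

```python
from scipy.stats import chi2_contingency, fisher_exact

# Contingency table from the crosstab above: rows F, M; columns N, Y
table = [[2, 4],
         [3, 2]]
alpha = 0.05

# Default behaviour for 2x2 tables: Yates' continuity correction applied
chi_yates, p_yates, dof, expected = chi2_contingency(table)

# Plain Pearson chi-square statistic, no continuity correction
chi_plain, p_plain, _, _ = chi2_contingency(table, correction=False)

# Fisher's exact test, which does not rely on the chi-square approximation
odds_ratio, p_fisher = fisher_exact(table)

print('with correction   ', chi_yates, p_yates)
print('without correction', chi_plain, p_plain)
print('fisher exact p    ', p_fisher)
```

Without the correction the statistic is 11 * (2*2 - 4*3)^2 / (6 * 5 * 5 * 6) = 0.7822, compared with 0.0764 with it.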
