# Chi-Square Test
## Chi-Square Independence Test
#### Used to determine whether there is a significant association between two categorical variables

>Chi-Square Test of Independence uses the following null and alternative hypotheses:
>
>- H0: (null hypothesis) The two variables are independent.
>- H1: (alternative hypothesis) The two variables are not independent.
>
>Degrees of freedom: calculated as (#rows - 1) × (#columns - 1)
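
To make the degrees-of-freedom formula concrete, here is a minimal sketch that computes the expected counts and the test statistic by hand with NumPy, using the observed counts from Example 1. It assumes the usual Pearson form of the statistic (the sum of (observed - expected)² / expected over all cells); the variable names are illustrative only.

```python
import numpy as np

# Observed counts from Example 1 (2 rows x 3 columns)
observed = np.array([[120, 90, 40],
                     [110, 95, 45]])

# Degrees of freedom: (#rows - 1) * (#columns - 1)
n_rows, n_cols = observed.shape
dof = (n_rows - 1) * (n_cols - 1)

# Marginal totals of the contingency table
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
grand_total = observed.sum()

# Expected count of each cell under independence:
# row total * column total / grand total
expected = np.outer(row_totals, col_totals) / grand_total

# Pearson chi-square statistic: sum of (O - E)^2 / E over all cells
chi_square = ((observed - expected) ** 2 / expected).sum()

print('degrees of freedom :', dof)
print('expected counts :')
print(expected)
print('chi-square statistic :', chi_square)
```

These hand-computed values should agree with what `stats.chi2_contingency` reports for the same table.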

Criteria
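>If the p-value is greater than or equal to 0.05, the result is not statistically significant and we fail to reject the null hypothesis; this does not prove that the variables are independent. If the p-value is less than 0.05, we reject the null hypothesis in favor of the alternative hypothesis.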
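
The criterion above can be sketched as a small helper. `decide` is a hypothetical function written for illustration, not part of SciPy, and the default `alpha` of 0.05 is the significance level assumed throughout this document.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 only when p_value < alpha."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.65))   # p-value well above alpha
print(decide(0.01))   # p-value below alpha
```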

### Example 1

![data](./Images/chi_ind_data1.png)

In [12]:
# Import packages
import scipy.stats as stats

# Recreate the data above as a nested list (rows x columns)
data = [[120, 90, 40],
        [110, 95, 45]]

# Perform the Chi-Square Test of Independence.
# chi2_contingency returns the statistic, the p-value,
# the degrees of freedom, and the expected counts (a NumPy array).
chi_square_statistic, p_value, dof, expected = stats.chi2_contingency(data)


# chi square test statistic and p value
print('chi_square_test_statistic is : ' + str(chi_square_statistic))
print('p_value : ' + str(round(p_value,2)))
print('degree of freedom : ' + str(dof))
print('expected array : ' + str(expected))


chi_square_test_statistic is : 0.8640353908896108
p_value : 0.65
degree of freedom : 2
expected array : [[115.   92.5  42.5]
 [115.   92.5  42.5]]


NOTE:
>H0 (the two variables are independent) is not rejected because the p-value (0.6491978887380976) is greater than the alpha value of 0.05. The expected values for the data are returned by the chi-square independence test shown above.
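
As a cross-check on the conclusion, the p-value can be recovered from the test statistic and the degrees of freedom with the chi-square survival function, and the same decision can be reached by comparing the statistic with a critical value. This is a minimal sketch that reuses the numbers printed in Example 1; the variable names are illustrative.

```python
from scipy import stats

# Values reported in Example 1
chi_square_statistic = 0.8640353908896108
dof = 2
alpha = 0.05

# p-value: probability of a statistic at least this large under H0
p_value = stats.chi2.sf(chi_square_statistic, dof)

# Critical value: H0 is rejected only if the statistic exceeds it
critical_value = stats.chi2.ppf(1 - alpha, dof)
reject_h0 = bool(chi_square_statistic > critical_value)

print('p_value :', p_value)
print('critical value :', critical_value)
print('reject H0 :', reject_h0)
```

Both routes lead to the same decision: the statistic is far below the critical value, so H0 is not rejected.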

### Example 2

![data](./Images/chi_ind_data2.png)

In [13]:
import pandas as pd
import scipy.stats as stats
import numpy as np

# Create a contingency table (also known as a cross tabulation)
dct = {
    'Gryffindor': [79, 82],
    'Hufflepuff': [122, 130],
    'Ravenclaw': [204, 240],
    'Slytherin': [74, 69],
}
crosstab = pd.DataFrame(dct, index=['No', 'Yes'])
# print(crosstab)

#perform the Chi-Square Test of Independence
chi2, pval, dof, expected = stats.chi2_contingency(crosstab)

print(f'χ² = {chi2:.3f}, p = {pval:.2f}, degrees of freedom = {dof}')
print(expected)

χ² = 1.643, p = 0.65, degrees of freedom = 3
[[ 77.119 120.708 212.676  68.497]
 [ 83.881 131.292 231.324  74.503]]


NOTE:
>H0 (the two variables are independent) is not rejected because the p-value (approximately 0.65) is greater than the alpha value of 0.05. The expected values for the data are returned by the chi-square independence test shown above.
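
Example 2 types the contingency table in directly. When the data instead arrive as one row per observation, `pd.crosstab` can build the same table. The sketch below assumes hypothetical column names `house` and `answer` (the original figure is not reproduced here) and expands the counts from Example 2 into that long format purely for illustration.

```python
import pandas as pd
import scipy.stats as stats

# Counts from Example 2, keyed by column label and then row label
counts = {
    'Gryffindor': {'No': 79, 'Yes': 82},
    'Hufflepuff': {'No': 122, 'Yes': 130},
    'Ravenclaw': {'No': 204, 'Yes': 240},
    'Slytherin': {'No': 74, 'Yes': 69},
}

# Expand to one row per observation ('house' and 'answer' are assumed names)
rows = [
    {'house': house, 'answer': answer}
    for house, answers in counts.items()
    for answer, n in answers.items()
    for _ in range(n)
]
observations = pd.DataFrame(rows)

# Cross-tabulate the two categorical variables
crosstab = pd.crosstab(observations['answer'], observations['house'])

# Perform the Chi-Square Test of Independence on the rebuilt table
chi2, pval, dof, expected = stats.chi2_contingency(crosstab)
print(crosstab)
print(f'χ² = {chi2:.3f}, p = {pval:.2f}, degrees of freedom = {dof}')
```

`pd.crosstab` sorts the row and column labels, which here reproduces the layout of the table used in Example 2.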