In [None]:
"""
The Chi-square test is a statistical method used to determine whether there is a significant association between two categorical variables,
or whether observed counts fit an expected distribution (goodness of fit).
It compares the expected frequencies (the values you would expect if the null hypothesis were true)
with the observed frequencies (the values you actually observe).

Steps for performing the Chi-square test:

1. State the hypotheses:
   Null hypothesis (𝐻0): There is no association between the variables.
   Alternative hypothesis (𝐻1): There is an association between the variables.

2. Create a contingency table (a table showing the observed frequency of each combination of categories).

3. Calculate the expected frequency of each cell using the formula:

   E = (row total × column total) / grand total

4. Calculate the Chi-square statistic:

   χ² = Σ (O − E)² / E

   where:
   O = Observed frequency
   E = Expected frequency

5. Compare the calculated Chi-square value with the critical value from the Chi-square distribution table,
   based on the significance level and the degrees of freedom, df = (rows − 1) × (columns − 1).
   Reject 𝐻0 if the calculated value exceeds the critical value.
"""

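The comparison in the last step can also be done without a printed table. The sketch below is one possible way to do it, assuming SciPy is available; `chi_square_decision` is a hypothetical helper name, and the statistic passed to it is an illustrative value, not a result from this notebook.

```python
from scipy.stats import chi2


def chi_square_decision(statistic, df, alpha=0.05):
    """Return (critical_value, p_value, reject_h0) for a chi-square statistic.

    Hypothetical helper for illustration only.
    """
    # Critical value: the (1 - alpha) quantile of the chi-square distribution
    critical_value = chi2.ppf(1 - alpha, df)
    # p-value: probability of a statistic at least this large under H0
    p_value = chi2.sf(statistic, df)
    # Reject H0 when the statistic exceeds the critical value
    reject_h0 = bool(statistic > critical_value)
    return critical_value, p_value, reject_h0


# Illustrative input (assumed): a statistic of 4.0 with 1 degree of freedom
crit, p, reject = chi_square_decision(statistic=4.0, df=1)
print(crit, p, reject)  # critical value is about 3.84, so H0 is rejected
```

Comparing the statistic with the critical value and comparing the p-value with alpha always lead to the same decision.
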
In [None]:
"""
Example 1:

A die is rolled 120 times and the following results are obtained:

Face 1: 22 times
Face 2: 17 times
Face 3: 20 times
Face 4: 26 times
Face 5: 22 times
Face 6: 13 times

Test at a 5% level of significance whether the die is fair.
"""

In [1]:
import numpy as np
import scipy.stats as st

In [2]:
# H0 -> die is fair
# Ha -> die is not fair
ob = np.array([22,17,20,26,22,13])
ex = np.array([20,20,20,20,20,20])
df = 6 - 1
# critical value from the chi-square table (alpha = 0.05, df = 5)
chi_table = 11.07

In [4]:
chi = np.sum(np.square(ob-ex)/ex)
chi

5.1000000000000005
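
As a cross-check on the manual calculation, the same goodness-of-fit test can be run with SciPy. This is a sketch rather than the notebook's own method: it uses `scipy.stats.chisquare` for the statistic and p-value and `scipy.stats.chi2.ppf` in place of the printed table; the variable names `critical`, `stat` and `p_value` are chosen here for illustration.

```python
import numpy as np
import scipy.stats as st

# Observed counts for faces 1..6 (from Example 1)
ob = np.array([22, 17, 20, 26, 22, 13])
# Under H0 (fair die) every face is expected equally often: 120 / 6 = 20
ex = np.full(ob.size, ob.sum() / ob.size)

# Critical value at the 5% level with df = 6 - 1 = 5
critical = st.chi2.ppf(0.95, ob.size - 1)

# chisquare returns the test statistic and its p-value
result = st.chisquare(f_obs=ob, f_exp=ex)
stat, p_value = result.statistic, result.pvalue

print(stat, critical, p_value)
print("reject H0" if stat > critical else "fail to reject H0")
```

The statistic matches the value computed by hand, and the p-value is well above 0.05, so both routes give the same decision.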

In [None]:
"""
chi_table > chi
so we fail to reject H0 at the 5% level: the data are consistent with a fair die
"""

![](2.png)

In [28]:
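# H0 -> no association between the variables
# Ha -> there is an association between the variables
# df = (rows - 1) * (cols - 1)

df = (2-1)*(4-1)
# critical value from the chi-square table (alpha = 0.05, df = 3)
chi1_table = 7.815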

In [17]:
row1 = np.array([40,45,25,10])
row2 = np.array([35,30,20,30])
sum_r1 = np.sum(row1)
sum_r2 = np.sum(row2)
sum_col = row1 + row2
sum_row = np.array([sum_r1,sum_r2])
sum_r1,sum_r2,sum_col,sum_row

(120, 115, array([75, 75, 45, 40]), array([120, 115]))

In [20]:
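# expected frequency of each cell: (row total * column total) / grand total
total = np.sum(sum_row)
exp = []
for i in sum_row:
    for j in sum_col:
        val = (i*j)/total
        exp.append(val)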

exp

[38.297872340425535,
 38.297872340425535,
 22.97872340425532,
 20.425531914893618,
 36.702127659574465,
 36.702127659574465,
 22.02127659574468,
 19.574468085106382]
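
The nested loop above can also be written as a single outer product. The following is a minimal sketch assuming the observed counts are stored as a 2-D array; the names `observed`, `row_totals`, `col_totals` and `expected` are introduced here for illustration.

```python
import numpy as np

# Observed contingency table: 2 rows x 4 columns
observed = np.array([[40, 45, 25, 10],
                     [35, 30, 20, 30]])

row_totals = observed.sum(axis=1)   # one total per row
col_totals = observed.sum(axis=0)   # one total per column
grand_total = observed.sum()

# E[i, j] = row_totals[i] * col_totals[j] / grand_total
expected = np.outer(row_totals, col_totals) / grand_total
print(expected)
```

Keeping `expected` in the same 2-D shape as `observed` means the two arrays line up cell by cell, so no manual flattening is needed before computing the statistic.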

In [22]:
obs = np.array([40,45,25,10,35,30,20,30])

In [27]:
chi1 = np.sum((np.square(obs - exp))/exp)
chi1

13.788747987117553
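
The whole test of independence can be cross-checked in one call with `scipy.stats.chi2_contingency`, which computes the expected frequencies, the statistic, the degrees of freedom and the p-value from the observed table. This is a sketch for verification, not the notebook's original approach; the variable names are chosen here for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed contingency table: 2 rows x 4 columns
observed = np.array([[40, 45, 25, 10],
                     [35, 30, 20, 30]])

# Returns the statistic, p-value, degrees of freedom and expected table.
# Yates' continuity correction only applies when dof == 1, so it has no effect here.
stat, p_value, dof, expected = chi2_contingency(observed)

print(stat, p_value, dof)
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

The statistic and degrees of freedom agree with the manual values, and the p-value falls below 0.05.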

In [None]:
"""
chi1 > chi1_table
so we reject H0: there is a significant association between the variables
"""