# CHI-SQUARE TEST




# **1. State the Hypotheses:**
**The chi-square test of independence is a hypothesis test, so it has a null hypothesis (H0) and an alternative hypothesis (H1):**

**H0: the variables are independent; there is no relationship between the two categorical variables. Knowing the value of one variable does not help to predict the value of the other variable.**

**H1: the variables are dependent; there is a relationship between the two categorical variables. Knowing the value of one variable helps to predict the value of the other variable.**

# **2. Compute the chi-square statistic:**
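
For reference, the statistic computed in this section compares each observed count with the count expected under independence. The symbols below ($O_{ij}$, $E_{ij}$, $R_i$, $C_j$, $N$) are notation introduced here for illustration and do not appear in the original notebook:

$$
\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i \, C_j}{N}
$$

Here $O_{ij}$ is the observed count in row $i$ and column $j$, $R_i$ is the total of row $i$, $C_j$ is the total of column $j$, and $N$ is the grand total. For example, the expected count for "Very Satisfied" and "Smart Thermostat" is $120 \times 240 / 600 = 48$.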

In [1]:
import pandas as pd
import numpy as np
import scipy.stats as stats
from scipy.stats import chi2_contingency

In [2]:
data = {
    'Satisfaction': ['Very Satisfied','Satisfied','Neutral','Unsatisfied','Very Unsatisfied','Total'],
    'Smart Thermostat': [50,80,60,30,20,240],
    'Smart Light': [70,100,90,50,50,360],
    'Total': [120,180,150,80,70,600]
}
contingency_table = pd.DataFrame(data)
contingency_table

| | Satisfaction | Smart Thermostat | Smart Light | Total |
|---|---|---|---|---|
| 0 | Very Satisfied | 50 | 70 | 120 |
| 1 | Satisfied | 80 | 100 | 180 |
| 2 | Neutral | 60 | 90 | 150 |
| 3 | Unsatisfied | 30 | 50 | 80 |
| 4 | Very Unsatisfied | 20 | 50 | 70 |
| 5 | Total | 240 | 360 | 600 |


In [3]:
observed = np.array([[50,70],
                     [80,100],
                     [60,90],
                     [30,50],
                     [20,50]])
observed

array([[ 50,  70],
       [ 80, 100],
       [ 60,  90],
       [ 30,  50],
       [ 20,  50]])

In [4]:
# chi2_contingency returns the statistic, p-value, degrees of freedom and expected frequencies
result = chi2_contingency(observed)
result

Chi2ContingencyResult(statistic=5.638227513227513, pvalue=0.22784371130697179, dof=4, expected_freq=array([[ 48.,  72.],
       [ 72., 108.],
       [ 60.,  90.],
       [ 32.,  48.],
       [ 28.,  42.]]))

In [5]:
# Expected frequencies under independence, copied from the chi2_contingency output above
expected = np.array([
       [ 48.,  72.],
       [ 72., 108.],
       [ 60.,  90.],
       [ 32.,  48.],
       [ 28.,  42.]])
expected

array([[ 48.,  72.],
       [ 72., 108.],
       [ 60.,  90.],
       [ 32.,  48.],
       [ 28.,  42.]])

In [6]:
# Sum (observed - expected)^2 / expected over all cells; one .sum() covers the whole array
chi_squared_stats = (((observed - expected)**2)/expected).sum()
chi_squared_stats

5.638227513227513
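
As a cross-check, the expected counts can be derived from the row and column totals instead of being typed in by hand, and the resulting statistic can be compared with what `chi2_contingency` reports. This is a minimal sketch; the names `expected_manual`, `chi2_manual` and `expected_scipy` are chosen here for illustration and are not part of the original notebook.

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[50, 70],
                     [80, 100],
                     [60, 90],
                     [30, 50],
                     [20, 50]])

# Expected count per cell: (row total * column total) / grand total
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
expected_manual = np.outer(row_totals, col_totals) / observed.sum()

# Chi-square statistic from the definition
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# The result unpacks as (statistic, p-value, degrees of freedom, expected frequencies)
stat, p_value, dof, expected_scipy = chi2_contingency(observed)

print(np.allclose(expected_manual, expected_scipy))  # expected counts agree
print(np.isclose(chi2_manual, stat))                 # statistics agree
```

The two statistics agree here because the table has more than one degree of freedom; for a 2×2 table, `chi2_contingency` applies Yates' continuity correction by default, so the manual value would differ unless `correction=False` is passed.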

# **3. Determine the Critical Value:**

**Use a significance level (alpha) of 0.05 and the degrees of freedom, which for a test of independence is (number of rows - 1) × (number of columns - 1) = (5 - 1) × (2 - 1) = 4.**
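
For a contingency table, the degrees of freedom can be read off the table's shape rather than hard-coded. The sketch below assumes the same 5×2 table of observed counts used earlier; the variable names `n_rows`, `n_cols` and `dof` are illustrative.

```python
import numpy as np

observed = np.array([[50, 70],
                     [80, 100],
                     [60, 90],
                     [30, 50],
                     [20, 50]])

# Degrees of freedom for a test of independence: (rows - 1) * (columns - 1)
n_rows, n_cols = observed.shape
dof = (n_rows - 1) * (n_cols - 1)
print("Degrees of freedom:", dof)
```

This matches the `dof=4` reported by `chi2_contingency` for the same table.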

In [7]:
critical_value = stats.chi2.ppf(q = 0.95, # Find the critical value for 95% confidence
                      df = 4)

print("Critical value: ",critical_value)

Critical value:  9.487729036781154


# **4. Make a Decision:**

**Compare the chi-square statistic with the critical value to decide whether to reject the null hypothesis.**

In [9]:
if chi_squared_stats > critical_value:
    print("Reject the null hypothesis")
else:
    # A non-significant result does not prove independence, so H0 is not "accepted"
    print("Fail to reject the null hypothesis")

Fail to reject the null hypothesis
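
An equivalent way to reach the same decision is to compare the p-value with alpha instead of comparing the statistic with the critical value. The following is a sketch that re-enters the statistic and degrees of freedom from the cells above as literals; `alpha`, `p_value` and `decision` are names introduced here for illustration.

```python
from scipy import stats

alpha = 0.05
chi_squared_stat = 5.638227513227513  # statistic computed earlier in the notebook
dof = 4                               # (5 - 1) * (2 - 1)

# Survival function: P(chi-square with `dof` degrees of freedom >= statistic)
p_value = stats.chi2.sf(chi_squared_stat, dof)

if p_value < alpha:
    decision = "Reject the null hypothesis"
else:
    decision = "Fail to reject the null hypothesis"

print("p-value:", p_value)
print(decision)
```

The p-value is about 0.228, which matches the `pvalue` reported by `chi2_contingency` and is well above 0.05, so the data give no significant evidence of an association between device type and satisfaction level.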
