# Chi-Square test

Objective:
To use the Chi-Square test for independence to determine if there's a significant association between the type of smart home device purchased (Smart Thermostats vs. Smart Lights) and the customer satisfaction level.



# Hypothesis
H0 (Null): There is no association between the device type and customer satisfaction, i.e. the two variables are independent.

H1 (Alternate): There is an association between the device type and customer satisfaction, i.e. the two variables are dependent.

In [30]:
import numpy as np
import pandas as pd

data = np.array([[50,70],
                 [80,100],
                 [60,90],
                 [30,50],
                 [20,50]]) # data without labels

In [31]:
row_totals = data.sum(axis=1) # axis=1 sums across the columns, giving one total per row
col_totals = data.sum(axis=0) # axis=0 sums down the rows, giving one total per column
grand_total = data.sum()
print('row totals:',row_totals)
print('column total:',col_totals)
print('grand total:',grand_total)

row totals: [120 180 150  80  70]
column total: [240 360]
grand total: 600


# Calculating the expected frequencies
# Formula: E = (row_total * column_total) / grand_total


In [32]:
Expected = np.zeros(data.shape) # zero-filled placeholder with the same shape as data; an empty array would not work here because it lacks that shape

In [33]:
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        Expected[i][j] = (row_totals[i] * col_totals[j]) / grand_total
        
Expected

array([[ 48.,  72.],
       [ 72., 108.],
       [ 60.,  90.],
       [ 32.,  48.],
       [ 28.,  42.]])
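
The nested loop can also be written as a single outer product. The following is a minimal sketch, assuming the same `data` array as above; the name `expected_vectorized` is introduced here only for illustration.

```python
import numpy as np

data = np.array([[50, 70],
                 [80, 100],
                 [60, 90],
                 [30, 50],
                 [20, 50]])

row_totals = data.sum(axis=1)   # one total per row
col_totals = data.sum(axis=0)   # one total per column
grand_total = data.sum()

# np.outer computes row_totals[i] * col_totals[j] for every cell at once
expected_vectorized = np.outer(row_totals, col_totals) / grand_total
print(expected_vectorized)
```

This should reproduce the table computed by the loop, without the need for a placeholder array.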

# Computing the chi-square statistic

The formula for the chi-square statistic is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

O: observed frequency (the actual data collected).

E: expected frequency (calculated under the null hypothesis).

In [34]:
chi_square = 0

for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        chi_square += (data[i][j] - Expected[i][j])**2 / Expected[i][j]
chi_square

5.638227513227513
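
As a cross-check, the same sum can be taken over whole arrays instead of cell by cell. This is a sketch under the assumption that the observed counts are the `data` array from earlier; `expected` and `chi_square_vectorized` are illustrative names.

```python
import numpy as np

data = np.array([[50, 70],
                 [80, 100],
                 [60, 90],
                 [30, 50],
                 [20, 50]])

# Expected counts under independence: (row total * column total) / grand total
expected = np.outer(data.sum(axis=1), data.sum(axis=0)) / data.sum()

# Element-wise (O - E)^2 / E, then summed over every cell
chi_square_vectorized = ((data - expected) ** 2 / expected).sum()
print(chi_square_vectorized)
```

The result should agree with the loop-based value up to floating-point rounding.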

# Calculating the degrees of freedom

In [35]:
# degree of freedom
rows = data.shape[0]
print(rows)
cols = data.shape[1]
print(cols)

5
2


In [36]:
dof = (rows-1) * (cols-1) # degrees of freedom
dof

4

# Determining the critical value from the chi-square table

In [38]:
critical_value = 9.488  # from the chi-square table for dof = 4 and significance level alpha = 0.05
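
Instead of reading the table, the critical value can be computed from the chi-square distribution. The sketch below assumes SciPy is installed (it is not used elsewhere in this notebook) and uses `scipy.stats.chi2.ppf`, the inverse CDF; `critical_value_computed` is an illustrative name.

```python
from scipy.stats import chi2

alpha = 0.05   # significance level
dof = 4        # (rows - 1) * (cols - 1) for a 5 x 2 table

# The critical value is the point with 1 - alpha of the distribution below it
critical_value_computed = chi2.ppf(1 - alpha, dof)
print(round(critical_value_computed, 3))
```

Rounded to three decimals this should match the table value of 9.488.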

# Making a decision based on the values

In [39]:
if chi_square > critical_value:
    print('Reject the null hypothesis (H0)')

else:
    # Failing to reject H0 does not prove H1; it only means the evidence is insufficient
    print('Fail to reject the null hypothesis (H0)')

Fail to reject the null hypothesis (H0)
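
The same decision can be reached through a p-value rather than a critical value. The following is a hedged sketch, assuming SciPy is available; it uses `scipy.stats.chi2_contingency`, which computes the statistic, p-value, degrees of freedom and expected counts in one call. Yates' continuity correction is only applied when there is one degree of freedom, so it does not affect this 5 x 2 table.

```python
import numpy as np
from scipy.stats import chi2_contingency

data = np.array([[50, 70],
                 [80, 100],
                 [60, 90],
                 [30, 50],
                 [20, 50]])

alpha = 0.05  # significance level

# Returns the test statistic, p-value, degrees of freedom and expected counts
stat, p_value, dof, expected = chi2_contingency(data)
print('chi-square:', stat)
print('p-value:', p_value)
print('dof:', dof)

if p_value < alpha:
    print('Reject the null hypothesis (H0)')
else:
    print('Fail to reject the null hypothesis (H0)')
```

For this table the p-value comes out at roughly 0.23, well above 0.05, which agrees with the critical-value comparison.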


# Conclusion

Since the chi-square value (5.638) is less than the critical value (9.488), we fail to reject the null hypothesis. At the 0.05 significance level, the data do not provide evidence of a significant association between device type and customer satisfaction.