# Chi-Square Test for Association between Device Type and Customer Satisfaction


## Introduction

This analysis applies a Chi-Square test of independence to investigate whether there is a statistically significant association between the type of device purchased (Smart Thermostat vs. Smart Light) and customer satisfaction levels at Mizzare Corporation.

### Background
Mizzare Corporation collected customer satisfaction data across five levels (Very Satisfied, Satisfied, Neutral, Unsatisfied, and Very Unsatisfied) for two types of smart home devices: **Smart Thermostats** and **Smart Lights**.

### Dataset
The data is presented in a contingency table, showing the counts of customers in each satisfaction level for both device types:

| Satisfaction Level | Smart Thermostat | Smart Light | Total   |
|--------------------|------------------|-------------|---------|
| Very Satisfied     | 50               | 70          | 120     |
| Satisfied          | 80               | 100         | 180     |
| Neutral            | 60               | 90          | 150     |
| Unsatisfied        | 30               | 50          | 80      |
| Very Unsatisfied   | 20               | 50          | 70      |
| **Total**          | **240**          | **360**     | **600** |
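
As a quick sanity check, the marginal totals in the table can be recomputed from the cell counts. The sketch below assumes NumPy is available; the variable names (`counts`, `row_totals`, `col_totals`, `grand_total`) are illustrative and not part of the original analysis.

```python
import numpy as np

# Cell counts from the contingency table:
# rows = satisfaction levels, columns = (Smart Thermostat, Smart Light)
counts = np.array([[50, 70],
                   [80, 100],
                   [60, 90],
                   [30, 50],
                   [20, 50]])

row_totals = counts.sum(axis=1)   # one total per satisfaction level
col_totals = counts.sum(axis=0)   # one total per device type
grand_total = counts.sum()        # total number of customers

print("Row totals:", row_totals.tolist())
print("Column totals:", col_totals.tolist())
print("Grand total:", int(grand_total))
```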

### Objective
To use the **Chi-Square test for independence** to determine whether there is a statistically significant association between the type of smart home device purchased and the level of customer satisfaction.

The analysis proceeds in four steps:
1. Formulate the Hypotheses.
2. Compute the Chi-Square Statistic.
3. Determine the Critical Value and the p-value.
4. Make a Decision based on the significance level.


## Hypotheses

- **Null Hypothesis (H₀)**: There is no association between the type of smart home device (Smart Thermostat or Smart Light) and customer satisfaction levels. (The variables are independent.)

- **Alternative Hypothesis (H₁)**: There is an association between the type of smart home device and customer satisfaction levels. (The variables are dependent.)


## Methodology: Chi-Square Test for Independence

To test the hypotheses, we use the Chi-Square statistic:

$$
\chi^2 = \sum \frac{(O - E)^2}{E}
$$



Where:
- $O$ = Observed frequency (actual counts from the data)
- $E$ = Expected frequency (calculated based on the assumption that there is no association between variables)
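
Under independence, the expected count for a cell follows from the table margins. This standard formula is not written out in the original text; the symbols $R_i$, $C_j$, and $N$ are introduced here for the total of row $i$, the total of column $j$, and the grand total:

$$
E_{ij} = \frac{R_i \times C_j}{N}
$$

For example, for Very Satisfied customers who bought a Smart Thermostat:

$$
E_{11} = \frac{120 \times 240}{600} = 48
$$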

The Chi-Square test will produce:
- **Chi-Square Statistic** ($\chi^2$): Measures the difference between observed and expected frequencies.
- **p-value**: Probability of obtaining a Chi-Square statistic at least as extreme as the observed one, assuming the null hypothesis is true.
- **Degrees of Freedom (df)**: Calculated as $(r - 1) \times (c - 1)$, where $r$ and $c$ are the number of rows and columns in the table, respectively.
- **Critical Value**: The threshold value corresponding to the significance level $\alpha = 0.05$.

If the p-value is less than 0.05 or, equivalently, the Chi-Square statistic exceeds the critical value, we reject the null hypothesis.


In [4]:
import numpy as np
from scipy.stats import chi2_contingency

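# Observed counts: rows = satisfaction levels (Very Satisfied ... Very Unsatisfied),
# columns = (Smart Thermostat, Smart Light)
observed = np.array([[50, 70],
                     [80, 100],
                     [60, 90],
                     [30, 50],
                     [20, 50]])

# Chi-Square test of independence; returns the statistic, the p-value,
# the degrees of freedom, and the expected counts under independence
chi2_stat, p_val, dof, expected = chi2_contingency(observed)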


print("Chi-Square Test Results")
print(f"Chi-Square Statistic: {chi2_stat:.2f}")
print(f"p-value: {p_val:.3f}")
print(f"Degrees of Freedom: {dof}")
print("\nExpected Frequencies (if there was no association):")
print(expected)


Chi-Square Test Results
Chi-Square Statistic: 5.64
p-value: 0.228
Degrees of Freedom: 4

Expected Frequencies (if there was no association):
[[ 48.  72.]
 [ 72. 108.]
 [ 60.  90.]
 [ 32.  48.]
 [ 28.  42.]]
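
The statistic reported above can be reproduced directly from the formula $\chi^2 = \sum (O - E)^2 / E$. The following is a minimal sketch, assuming NumPy and SciPy are installed; names such as `expected_manual`, `contributions`, and `chi2_manual` are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[50, 70],
                     [80, 100],
                     [60, 90],
                     [30, 50],
                     [20, 50]])

# Expected counts under independence: (row total * column total) / grand total
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
expected_manual = np.outer(row_totals, col_totals) / observed.sum()

# Per-cell contributions (O - E)^2 / E, summed to give the statistic
contributions = (observed - expected_manual) ** 2 / expected_manual
chi2_manual = contributions.sum()

# Cross-check against SciPy's result
chi2_scipy, _, _, expected_scipy = chi2_contingency(observed)

print(f"Manual chi-square: {chi2_manual:.2f}")
print(f"SciPy chi-square:  {chi2_scipy:.2f}")
print("Per-cell contributions:")
print(contributions.round(2))
```

The per-cell contributions show which cells drive the statistic; in this table the Very Unsatisfied row accounts for most of it.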


## Results and Interpretation

Using the test output, we compare the p-value with the significance level and the Chi-Square statistic with the critical value at $\alpha = 0.05$.

- **Degrees of Freedom (df)**: $(r - 1) \times (c - 1) = (5 - 1) \times (2 - 1) = 4$
- **Significance Level**: $\alpha = 0.05$
- **Critical Value**: for a Chi-Square distribution with 4 degrees of freedom at $\alpha = 0.05$, computed as follows:


In [3]:
from scipy.stats import chi2

# Significance level and degrees of freedom
alpha = 0.05
dof = 4

# Critical value: the (1 - alpha) quantile of the chi-square distribution
critical_value = chi2.ppf(1 - alpha, dof)

print(f"Critical Value for Chi-Square with df={dof} at alpha={alpha}: {critical_value:.2f}")


Critical Value for Chi-Square with df=4 at alpha=0.05: 9.49
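
The critical-value rule and the p-value rule always lead to the same decision, because both are derived from the same Chi-Square distribution. A short sketch of that equivalence, assuming SciPy is installed and taking the statistic from the test output (rounded to 5.64), might look like this; the variable names are illustrative.

```python
from scipy.stats import chi2

alpha = 0.05
dof = 4
chi2_stat = 5.64  # rounded statistic taken from the test output

# Critical value: the point with upper-tail probability alpha
critical_value = chi2.ppf(1 - alpha, dof)

# p-value: the upper-tail probability beyond the observed statistic
p_value = chi2.sf(chi2_stat, dof)

reject_by_critical_value = chi2_stat > critical_value
reject_by_p_value = p_value < alpha

print(f"p-value: {p_value:.3f}")
print(f"Reject H0 (critical-value rule): {reject_by_critical_value}")
print(f"Reject H0 (p-value rule): {reject_by_p_value}")
```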


## Conclusion

After performing the Chi-Square test, we find the following:

- **Chi-Square Statistic**: 5.64
- **p-value**: 0.228 (greater than 0.05)
- **Critical Value**: 9.49 (for $\alpha = 0.05$ and 4 degrees of freedom)

### Interpretation
Since the p-value (0.228) is greater than the significance level (0.05) and the Chi-Square statistic (5.64) is less than the critical value (9.49), we **fail to reject the null hypothesis**. 

### Final Decision
We conclude that the data show **no statistically significant association** between the type of device purchased (Smart Thermostat or Smart Light) and customer satisfaction levels. Failing to reject the null hypothesis does not prove that the variables are independent; it means that this dataset does not provide sufficient evidence, at the 0.05 level, that customer satisfaction depends on device type.
