# CHI-SQUARE TEST

## Association between Device Type and Customer Satisfaction

## Objective:
Use the Chi-Square test for independence to determine whether there is a statistically significant association between the type of smart home device purchased (Smart Thermostat vs. Smart Light) and the customer satisfaction level.


In [11]:
import numpy as np
import scipy.stats as stats
import pandas as pd

# 1. State the Hypotheses

In [12]:
print("Hypotheses:")
print("H0: There is no association between the type of smart home device and customer satisfaction level.")
print("H1: There is an association between the type of smart home device and customer satisfaction level.")

Hypotheses:
H0: There is no association between the type of smart home device and customer satisfaction level.
H1: There is an association between the type of smart home device and customer satisfaction level.


# 2. Compute the Chi-Square Statistic
### The Chi-Square statistic is computed using the formula: χ² = Σ((O-E)^2 / E), summed over every cell of the contingency table, where O is the observed frequency and E is the frequency expected if device type and satisfaction were independent.

In [13]:
# Create the observed frequency table
observed = np.array([
    [50, 70],
    [80, 100],
    [60, 90],
    [30, 50],
    [20, 50]
])

In [14]:
# Create a DataFrame for better visualization
satisfaction_levels = ['Very Satisfied', 'Satisfied', 'Neutral', 'Unsatisfied', 'Very Unsatisfied']
devices = ['Smart Thermostat', 'Smart Light']
df_observed = pd.DataFrame(observed, index=satisfaction_levels, columns=devices)
df_observed['Total'] = df_observed.sum(axis=1)
df_observed.loc['Total'] = df_observed.sum()

print("\nObserved Frequency Table:")
print(df_observed)


Observed Frequency Table:
                  Smart Thermostat  Smart Light  Total
Very Satisfied                  50           70    120
Satisfied                       80          100    180
Neutral                         60           90    150
Unsatisfied                     30           50     80
Very Unsatisfied                20           50     70
Total                          240          360    600


In [15]:
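# Calculate expected frequencies under independence:
# E = (row total * column total) / grand total
total = df_observed.loc['Total', 'Total']
row_totals = df_observed.loc[satisfaction_levels, 'Total']
col_totals = df_observed.loc['Total', devices]
expected = np.outer(row_totals, col_totals) / total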

print("\nExpected Frequency Table:")
df_expected = pd.DataFrame(expected, index=satisfaction_levels, columns=devices)
print(df_expected)


Expected Frequency Table:
                  Smart Thermostat  Smart Light
Very Satisfied                48.0         72.0
Satisfied                     72.0        108.0
Neutral                       60.0         90.0
Unsatisfied                   32.0         48.0
Very Unsatisfied              28.0         42.0
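
The expected counts above can be cross-checked with SciPy's `scipy.stats.contingency.expected_freq`, and the same array can be used to check a common rule of thumb for the chi-square approximation (every expected count at least 5). The rule of thumb and the `_check` variable names are assumptions added for illustration; they are not part of the original analysis.

```python
import numpy as np
from scipy.stats.contingency import expected_freq

# Observed counts: rows are satisfaction levels, columns are device types
observed = np.array([
    [50, 70],
    [80, 100],
    [60, 90],
    [30, 50],
    [20, 50],
])

# Expected counts under independence: (row total * column total) / grand total
expected_check = expected_freq(observed)

# Rule of thumb (assumption): the approximation is usually considered
# reasonable when every expected count is at least 5
min_expected = expected_check.min()
all_cells_ok = bool((expected_check >= 5).all())

print(expected_check)
print(f"Smallest expected count: {min_expected:.1f}; all cells >= 5: {all_cells_ok}")
```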


In [16]:
# Calculate Chi-Square statistic
chi2_stat = np.sum((observed - expected)**2 / expected)

print(f"\nChi-Square Statistic: {chi2_stat:.4f}")


Chi-Square Statistic: 5.6382
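
As a sanity check on the manual calculation, the same test can be run in one call with `scipy.stats.chi2_contingency`. This is a minimal sketch rather than a replacement for the step-by-step version; `correction=False` is passed so that no continuity correction is applied (SciPy only applies it when there is one degree of freedom, so it would not affect this table anyway). The `_check` names are illustrative.

```python
import numpy as np
from scipy import stats

# Observed counts: rows are satisfaction levels, columns are device types
observed = np.array([
    [50, 70],
    [80, 100],
    [60, 90],
    [30, 50],
    [20, 50],
])

# Returns the statistic, p-value, degrees of freedom and expected counts
chi2_check, p_check, dof_check, expected_check = stats.chi2_contingency(
    observed, correction=False
)

print(f"Chi-Square statistic: {chi2_check:.4f}")
print(f"Degrees of freedom:   {dof_check}")
print(f"P-value:              {p_check:.4f}")
```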


# 3. Determine the Critical Value
### The critical value is obtained from the chi-square distribution with (r-1)(c-1) degrees of freedom, where r is the number of rows and c is the number of columns in the contingency table.

In [17]:
df = (len(satisfaction_levels) - 1) * (len(devices) - 1)  # degrees of freedom
alpha = 0.05
critical_value = stats.chi2.ppf(1 - alpha, df)

In [18]:
print("\nCritical Value:")
print(f"Degrees of freedom: {df}")
print(f"Significance level (alpha): {alpha}")
print(f"Critical value: {critical_value:.4f}")


Critical Value:
Degrees of freedom: 4
Significance level (alpha): 0.05
Critical value: 9.4877
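
To see how the threshold depends on the chosen significance level, the sketch below evaluates `stats.chi2.ppf` for a few common alpha values at 4 degrees of freedom. Only alpha = 0.05 is used in this analysis; the other levels and the names `dof` and `critical_values` are assumptions added for illustration.

```python
from scipy import stats

dof = 4  # (5 - 1) * (2 - 1) for a 5 x 2 contingency table

# A smaller alpha demands stronger evidence, so the critical value grows
critical_values = {a: stats.chi2.ppf(1 - a, dof) for a in (0.10, 0.05, 0.01)}

for a, cv in critical_values.items():
    print(f"alpha = {a:.2f} -> critical value = {cv:.4f}")
```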


# 4. Make a Decision
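### We reject H0 if the Chi-Square statistic exceeds the critical value; otherwise we fail to reject it. We also calculate the p-value, the probability of observing a statistic at least this large if H0 were true, for a more precise interpretation.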

In [19]:
# Survival function = 1 - CDF, but more accurate for very small tail probabilities
p_value = stats.chi2.sf(chi2_stat, df)

print("\nDecision:")
if chi2_stat > critical_value:
    print("Reject the null hypothesis")
    print(f"Chi-Square statistic ({chi2_stat:.4f}) > Critical value ({critical_value:.4f})")
else:
    print("Fail to reject the null hypothesis")
    print(f"Chi-Square statistic ({chi2_stat:.4f}) <= Critical value ({critical_value:.4f})")

print(f"\nP-value: {p_value:.4f}")


Decision:
Fail to reject the null hypothesis
Chi-Square statistic (5.6382) <= Critical value (9.4877)

P-value: 0.2278
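
The critical-value rule and the p-value rule always lead to the same decision, because the critical value is the point where the upper-tail probability equals alpha. A minimal sketch, using the rounded statistic reported above as a hard-coded input (the variable names are illustrative):

```python
from scipy import stats

chi2_value = 5.6382  # statistic reported above (rounded)
dof = 4
alpha = 0.05

critical = stats.chi2.ppf(1 - alpha, dof)  # upper-tail threshold
p = stats.chi2.sf(chi2_value, dof)         # upper-tail probability

# Both comparisons express the same condition on the upper tail
reject_by_critical = chi2_value > critical
reject_by_p = p < alpha

print(f"Reject by critical value: {reject_by_critical}")
print(f"Reject by p-value:        {reject_by_p}")
```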


# 5. Conclusion

In [20]:
print("\nConclusion:")
if p_value < alpha:
    print("There is sufficient evidence at the chosen significance level to conclude that the type of smart home device and customer satisfaction level are associated.")
else:
    print("There is not enough evidence to suggest a significant association between the type of smart home device and customer satisfaction level.")


Conclusion:
There is not enough evidence to suggest a significant association between the type of smart home device and customer satisfaction level.
