# CHI-SQUARE TEST

## Association between Device Type and Customer Satisfaction
Background:
Mizzare Corporation has collected data on customer satisfaction levels for two types of smart home devices: Smart Thermostats and Smart Lights. They want to determine if there's a significant association between the type of device purchased and the customer's satisfaction level.


In [1]:
import numpy as np
import pandas as pd
from scipy.stats import chi2, norm

## Objective:
To use the Chi-Square test for independence to determine if there's a significant association between the type of smart home device purchased (Smart Thermostats vs. Smart Lights) and the customer satisfaction level.


In [2]:
obs = np.array([
    [50, 70],   # Very Satisfied
    [80, 100],  # Satisfied
    [60, 90],   # Neutral
    [30, 50],   # Unsatisfied
    [20, 50]    # Very Unsatisfied
])

rows = ["Very Satisfied","Satisfied","Neutral","Unsatisfied","Very Unsatisfied"]
cols = ["Smart Thermostat","Smart Light"]

In [3]:
df_obs = pd.DataFrame(obs, index=rows, columns=cols)
print("Observed contingency table:\n")
print(df_obs)

Observed contingency table:

                  Smart Thermostat  Smart Light
Very Satisfied                  50           70
Satisfied                       80          100
Neutral                         60           90
Unsatisfied                     30           50
Very Unsatisfied                20           50


In [4]:
# Totals
total = obs.sum()
row_sums = obs.sum(axis=1)
col_sums = obs.sum(axis=0)
total

np.int64(600)

In [5]:
# Expected counts
expected = np.outer(row_sums, col_sums) / total
df_expected = pd.DataFrame(expected, index=rows, columns=cols)
print("\nExpected counts (under H0 of independence):\n")
print(df_expected.round(3))


Expected counts (under H0 of independence):

                  Smart Thermostat  Smart Light
Very Satisfied                48.0         72.0
Satisfied                     72.0        108.0
Neutral                       60.0         90.0
Unsatisfied                   32.0         48.0
Very Unsatisfied              28.0         42.0
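
As a sanity check on the table above, each expected count under independence is (row total × column total) / grand total. The snippet below is a minimal sketch that recomputes one cell by hand from the observed counts. The variable names are illustrative and are not part of the original notebook.

```python
# Totals taken from the observed contingency table
row_total_very_satisfied = 50 + 70                 # "Very Satisfied" row total
col_total_thermostat = 50 + 80 + 60 + 30 + 20      # "Smart Thermostat" column total
grand_total = 600                                  # all respondents

# Expected count for the (Very Satisfied, Smart Thermostat) cell under H0
expected_cell = row_total_very_satisfied * col_total_thermostat / grand_total
print(expected_cell)  # 48.0
```

The result matches the first entry of the expected-counts table.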


Assignment Tasks:
1. State the Hypotheses:

In [6]:
 """ H0 = evidence of association
     Ha= no evidence of association """

' H0 = evidence of association\n    Ha= no evidence of association '

2. Compute the Chi-Square Statistic:

In [7]:
chi2_stat = ((obs - expected)**2 / expected).sum()
print(f"\nChi-square statistic = {chi2_stat:.4f}")


Chi-square statistic = 5.6382
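
To see where the statistic comes from, it can help to look at each cell's contribution (O - E)^2 / E rather than only the sum. The following sketch rebuilds the observed and expected tables so that it runs on its own. The names `contrib`, `row` and `col` are illustrative.

```python
import numpy as np

# Observed counts (rows: Very Satisfied ... Very Unsatisfied; cols: Thermostat, Light)
obs = np.array([[50, 70], [80, 100], [60, 90], [30, 50], [20, 50]])
expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()

# Per-cell contribution to the chi-square statistic
contrib = (obs - expected) ** 2 / expected
row, col = np.unravel_index(contrib.argmax(), contrib.shape)

print(contrib.round(4))
print(f"Sum of contributions = {contrib.sum():.4f}")
print(f"Largest contribution at row {row}, column {col}: {contrib[row, col]:.4f}")
```

The largest single contribution comes from the Very Unsatisfied / Smart Thermostat cell (about 2.29), and the Neutral row contributes nothing because its observed and expected counts coincide.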


3. Determine the Critical Value:
Use a significance level (alpha) of 0.05. For a test of independence the degrees of freedom are (number of rows - 1) × (number of columns - 1), here (5 - 1) × (2 - 1) = 4.

In [8]:
df = (obs.shape[0]-1) * (obs.shape[1]-1)
p_value = chi2.sf(chi2_stat, df)   # survival function, equal to 1 - cdf
crit_val = chi2.ppf(0.95, df)
print("Degrees of freedom =", df)
print(f"p-value = {p_value:.6g}")
print(f"Critical chi-square value at alpha=0.05 = {crit_val:.4f}")

Degrees of freedom = 4
p-value = 0.227844
Critical chi-square value at alpha=0.05 = 9.4877


4. Make a Decision:
Compare the Chi-Square statistic with the critical value to decide whether to reject the null hypothesis.


In [9]:

if chi2_stat > crit_val:
    print("Decision: Reject H0 (evidence of association)\n")
else:
    print("Decision: Fail to reject H0 (no evidence of association)\n")

Decision: Fail to reject H0 (no evidence of association)
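
The manual calculation can be cross-checked against SciPy's built-in `chi2_contingency`. This is a sketch for verification only, not the method used in the notebook. `correction=False` is passed so that no continuity correction is applied, which matches the hand computation.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Same observed table as above
obs = np.array([[50, 70], [80, 100], [60, 90], [30, 50], [20, 50]])

# Returns the statistic, p-value, degrees of freedom and expected counts
chi2_check, p_check, dof_check, expected_check = chi2_contingency(obs, correction=False)
print(f"chi2 = {chi2_check:.4f}, p-value = {p_check:.4f}, dof = {dof_check}")
```

The statistic, p-value and degrees of freedom agree with the values computed step by step, so the decision not to reject H0 is unchanged.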



# HYPOTHESIS TESTING

Background:
Bombay Hospitality Ltd. operates a franchise model for producing exotic Norwegian dinners throughout New England. The operating cost for a franchise in a week (W) is given by the equation W = $1,000 + $5X, where X represents the number of units produced in a week. Recent feedback from restaurant owners suggests that this cost model may no longer be accurate, as their observed weekly operating costs are higher.


In [10]:
# Given model: W = 1000 + 5X, where X ~ Normal(mean=600, sd=25)
mu_X = 600
sd_X = 25

Objective:
To investigate the restaurant owners' claim about the increase in weekly operating costs using hypothesis testing.


### Data Provided:
•	The theoretical weekly operating cost model: W = $1,000 + $5X


•	Sample of 25 restaurants with a mean weekly cost of Rs. 3,050


•	Number of units produced in a week (X) follows a normal distribution with a mean (μ) of 600 units and a standard deviation (σ) of 25 units


## Assignment Tasks:

1. State the Hypotheses statement:

In [11]:
""" H0 = no evidence that costs are higher
HA = evidence that costs are higher than model"""

' H0 = no evidence that costs are higher\nHA = evidence that costs are higher than model'

### 2. Calculate the Test Statistic:
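Use the following formula to calculate the test statistic (t). Because σ is taken as known from the cost model, the statistic is compared against the standard normal distribution and is called z in the code:

$$ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$

where:

•	x̄ = sample mean weekly cost (Rs. 3,050)

•	μ = theoretical mean weekly cost according to the cost model (W = $1,000 + $5X for X = 600 units)

•	σ = standard deviation of weekly cost, 5 × 25 = 125 (the factor 5 in the cost model scales the standard deviation of X)

•	n = sample size (25 restaurants)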


In [12]:
mu_W = 1000 + 5 * mu_X   # theoretical mean
sd_W = 5 * sd_X          # standard deviation

# Sample info
n = 25
xbar = 3050.0  # sample mean

se = sd_W / np.sqrt(n)   # standard error
z_stat = (xbar - mu_W) / se

print("Hypothesis Test for Weekly Operating Cost:\n")
print(f"Theoretical mean of W = {mu_W}")
print(f"Standard deviation of W = {sd_W}")
print(f"Sample size n = {n}, sample mean = {xbar}")
print(f"Standard error = {se:.3f}")
print(f"Test statistic z = {z_stat:.4f}")

Hypothesis Test for Weekly Operating Cost:

Theoretical mean of W = 4000
Standard deviation of W = 125
Sample size n = 25, sample mean = 3050.0
Standard error = 25.000
Test statistic z = -38.0000
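
The theoretical mean (4000) and standard deviation (125) of W follow from the linear cost model: adding 1000 shifts the mean, and multiplying by 5 scales both the mean and the standard deviation. A quick simulation can confirm this. The sketch below assumes a sample size of one million draws and a fixed seed, both chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)                       # fixed seed for reproducibility
x = rng.normal(loc=600, scale=25, size=1_000_000)    # simulated weekly units X
w = 1000 + 5 * x                                     # cost model applied to each draw

sim_mean = w.mean()
sim_sd = w.std()
print(f"Simulated mean of W = {sim_mean:.1f}")
print(f"Simulated sd of W   = {sim_sd:.1f}")
```

The simulated values should land close to 4000 and 125, matching `mu_W` and `sd_W`.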


3. Determine the Critical Value:
Using the alpha level of 5% (α = 0.05), determine the critical value from the standard normal (Z) distribution table.


In [13]:
# One-sided test (owners claim higher cost)
alpha = 0.05
z_crit = norm.ppf(1 - alpha)   # critical z (right-tailed)
p_val = norm.sf(z_stat)        # right-tailed p-value (survival function, equal to 1 - cdf)

print(f"Critical z at alpha=0.05 = {z_crit:.4f}")
print(f"p-value = {p_val:.6g}")

Critical z at alpha=0.05 = 1.6449
p-value = 1


4. Make a Decision:
Compare the test statistic with the critical value to decide whether to reject the null hypothesis.


In [14]:
if z_stat > z_crit:
    print("Decision: Reject H0 (evidence that costs are higher than model)\n")
else:
    print("Decision: Fail to reject H0 (no evidence that costs are higher)\n")

Decision: Fail to reject H0 (no evidence that costs are higher)
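
Steps 2 to 4 can be collected into one small helper. The function below is a hypothetical convenience wrapper, not part of the original notebook, and assumes a right-tailed test with known population standard deviation.

```python
import numpy as np
from scipy.stats import norm

def right_tailed_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Test H0: mu = mu0 against Ha: mu > mu0 when sigma is known.

    Returns the z statistic, the right-tailed p-value and the decision.
    """
    se = sigma / np.sqrt(n)                    # standard error of the sample mean
    z = (xbar - mu0) / se                      # test statistic
    p_value = norm.sf(z)                       # P(Z > z)
    reject = bool(z > norm.ppf(1 - alpha))     # compare with the critical value
    return z, p_value, reject

z, p, reject = right_tailed_z_test(xbar=3050.0, mu0=4000, sigma=125, n=25)
print(f"z = {z:.4f}, p-value = {p:.6g}, reject H0: {reject}")
```

With the values from this exercise the helper reproduces z = -38 and does not reject H0, because the sample mean lies below the model mean rather than above it.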



5. Conclusion:
Based on the decision in step 4, conclude whether there is strong evidence to support the restaurant owners' claim that the weekly operating costs are higher than the model suggests.


In [17]:
""" From our observation, the conclusion is there is no evidence to support the restaurant owners"""

' From our observation, the conclusion is there is no evidence to support the restaurant owners'