## Chi-Square Test — Association between Device Type and Customer Satisfaction
### Objective:
We have a contingency table of customer satisfaction counts (5 levels) for two device types (Smart Thermostat and Smart Light). We want to test whether device type and satisfaction level are independent (no association) using the chi-square test of independence.


### Step 1: State the Hypotheses

- Null Hypothesis (H₀): There is **no association** between device type and customer satisfaction level.  
- Alternative Hypothesis (H₁): There **is an association** between device type and customer satisfaction level.


In [1]:
# Import libraries
import numpy as np
import pandas as pd

In [3]:
# Observed frequency table
observed = np.array([
    [50, 70],    # Very Satisfied
    [80, 100],   # Satisfied
    [60, 90],    # Neutral
    [30, 50],    # Unsatisfied
    [20, 50]     # Very Unsatisfied
])

In [4]:
categories = ["Very Satisfied", "Satisfied", "Neutral", "Unsatisfied", "Very Unsatisfied"]
devices = ["Smart Thermostat", "Smart Light"]

In [5]:
# Create dataframe
df_observed = pd.DataFrame(observed, index=categories, columns=devices)
print("Observed Frequencies:")
print(df_observed)

Observed Frequencies:
                  Smart Thermostat  Smart Light
Very Satisfied                  50           70
Satisfied                       80          100
Neutral                         60           90
Unsatisfied                     30           50
Very Unsatisfied                20           50


### Calculate Expected Frequencies

In [6]:
# Row totals and column totals
row_totals = observed.sum(axis=1).reshape(-1, 1)
col_totals = observed.sum(axis=0).reshape(1, -1)
total = observed.sum()

# Calculate expected frequencies = (row_total * col_total) / grand total
expected = row_totals @ col_totals / total

df_expected = pd.DataFrame(expected, index=categories, columns=devices)
print("\nExpected Frequencies:")
print(df_expected.round(2))


Expected Frequencies:
                  Smart Thermostat  Smart Light
Very Satisfied                48.0         72.0
Satisfied                     72.0        108.0
Neutral                       60.0         90.0
Unsatisfied                   32.0         48.0
Very Unsatisfied              28.0         42.0
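
Before computing the statistic, it can be worth checking that the chi-square approximation is reasonable. A common rule of thumb asks for every expected count to be at least 5. This check is not part of the original analysis, so treat it as an added assumption. A minimal sketch, rebuilding the expected table from the observed counts:

```python
import numpy as np

# Observed counts copied from the table above
observed = np.array([[50, 70], [80, 100], [60, 90], [30, 50], [20, 50]])

# Expected counts under independence: (row total * column total) / grand total
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Rule-of-thumb check (an assumption added here, not from the original analysis)
min_expected = expected.min()
all_at_least_5 = bool((expected >= 5).all())

print(f"Smallest expected count: {min_expected:.1f}")
print(f"All expected counts >= 5: {all_at_least_5}")
```

Here the smallest expected count is 28, so the rule of thumb is comfortably met.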


### Compute the Chi-Square Statistic


In [7]:
# Chi-Square formula: sum((Observed - Expected)^2 / Expected)
chi_square_stat = ((observed - expected)**2 / expected).sum()

print(f"\nChi-Square Statistic: {chi_square_stat:.4f}")


Chi-Square Statistic: 5.6382
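
To see where the statistic comes from, the per-cell terms (O − E)² / E can be laid out as a table. The sketch below is illustrative only; the name `contributions` is introduced here and does not appear in the original notebook.

```python
import numpy as np
import pandas as pd

# Observed counts and labels copied from the cells above
observed = np.array([[50, 70], [80, 100], [60, 90], [30, 50], [20, 50]])
categories = ["Very Satisfied", "Satisfied", "Neutral", "Unsatisfied", "Very Unsatisfied"]
devices = ["Smart Thermostat", "Smart Light"]

# Expected counts under independence
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Contribution of each cell to the chi-square statistic
contributions = pd.DataFrame(
    (observed - expected) ** 2 / expected, index=categories, columns=devices
)

print(contributions.round(4))
print("\nContribution per satisfaction level:")
print(contributions.sum(axis=1).round(4))
print(f"\nTotal: {contributions.to_numpy().sum():.4f}")
```

The "Very Unsatisfied" row contributes the most to the total, while the "Neutral" row contributes nothing because its observed and expected counts coincide.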


### Determine Degrees of Freedom and Critical Value

In [8]:
from scipy.stats import chi2

# Degrees of freedom = (number of rows - 1) * (number of columns - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Significance level
alpha = 0.05

# Critical value for chi-square distribution
critical_value = chi2.ppf(1 - alpha, dof)

print(f"Degrees of Freedom: {dof}")
print(f"Critical Value at alpha={alpha}: {critical_value:.4f}")

Degrees of Freedom: 4
Critical Value at alpha=0.05: 9.4877


### Decision Making

In [9]:
if chi_square_stat > critical_value:
    decision = "Reject the null hypothesis"
else:
    decision = "Fail to reject the null hypothesis"

print(f"Decision: {decision}")

Decision: Fail to reject the null hypothesis
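
The same decision can be reached through a p-value instead of a critical value. The sketch below assumes the rounded statistic printed above and uses `scipy.stats.chi2.sf`, the upper-tail probability of the chi-square distribution; the variable `p_value` is introduced here for illustration.

```python
from scipy.stats import chi2

chi_square_stat = 5.6382  # statistic printed above (rounded to 4 decimals)
dof = 4                   # (5 - 1) * (2 - 1)
alpha = 0.05

# p-value: probability of a statistic at least this large if H0 is true
p_value = chi2.sf(chi_square_stat, dof)

if p_value < alpha:
    decision = "Reject the null hypothesis"
else:
    decision = "Fail to reject the null hypothesis"

print(f"p-value: {p_value:.4f}")
print(f"Decision: {decision}")
```

Comparing the p-value with alpha is equivalent to comparing the statistic with the critical value, so both routes give the same decision.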


I calculated row, column, and grand totals, then computed expected frequencies as (row total × column total) / grand total. I calculated the chi-square statistic by summing (O − E)² / E, determined the degrees of freedom as (5−1)(2−1) = 4, and compared the statistic with the critical value at α = 0.05. Since 5.6382 < 9.4877 (equivalently, p > 0.05), I failed to reject the null hypothesis: the data show no significant association between device type and satisfaction level.
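
As a cross-check on the manual steps, SciPy's `chi2_contingency` runs the whole test in one call. This is a sketch for verification, not part of the original workflow. For a 5×2 table the continuity correction does not apply, since SciPy only uses it when there is 1 degree of freedom.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts copied from the table above
observed = np.array([[50, 70], [80, 100], [60, 90], [30, 50], [20, 50]])

# Returns the statistic, p-value, degrees of freedom, and expected counts
stat, p_value, dof, expected = chi2_contingency(observed)

print(f"Chi-Square Statistic: {stat:.4f}")
print(f"p-value: {p_value:.4f}")
print(f"Degrees of Freedom: {dof}")
print("Expected Frequencies:")
print(expected)
```

The statistic, degrees of freedom, and expected frequencies should match the values computed by hand above.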

## Conclusion

The chi-square test was performed to check whether there is an association between the type of smart home device purchased and customer satisfaction level.
The calculated chi-square statistic was 5.6382, and the critical value at the 5% significance level with 4 degrees of freedom was 9.4877.
Since the test statistic is less than the critical value, we fail to reject the null hypothesis.
This means that there is not sufficient evidence to conclude that there is an association between device type and customer satisfaction.
In other words, these data do not show that satisfaction levels differ between Smart Thermostat and Smart Light customers.

## II. Hypothesis Testing

Weekly Operating Cost Analysis for Bombay Hospitality Ltd.

## Objective:
To test the claim that actual weekly operating costs are higher than predicted by the theoretical model $W = 1000 + 5X$.
* Here W is the weekly operating cost and X is the number of units produced weekly.
* Restaurant owners claim that actual weekly operating costs are higher than suggested by this model.
* A random sample of 25 restaurants shows an average weekly cost of ₹3050.
* X follows a normal distribution with mean μX = 600 units and standard deviation σX = 25 units.
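
Because W is a linear function of X, the mean and standard deviation of the weekly cost follow directly from those of X. The derivation below spells out that step. The symbols $\mu_W$ and $\sigma_W$ are notation introduced here for the theoretical mean and standard deviation of the weekly cost; $\bar{x}$ is the sample mean cost and $n$ the sample size.

```latex
\begin{aligned}
\mu_W    &= E[1000 + 5X] = 1000 + 5\mu_X = 1000 + 5(600) = 4000 \\
\sigma_W &= \sqrt{\operatorname{Var}(1000 + 5X)} = 5\sigma_X = 5(25) = 125 \\
SE       &= \frac{\sigma_W}{\sqrt{n}} = \frac{125}{\sqrt{25}} = 25 \\
Z        &= \frac{\bar{x} - \mu_W}{SE} = \frac{3050 - 4000}{25} = -38
\end{aligned}
```

The additive constant 1000 shifts the mean but does not affect the spread, which is why only the factor 5 appears in $\sigma_W$.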

In [None]:
import numpy as np
from scipy.stats import norm

# --- 0. DATA AND PARAMETERS ---
print("--- 0. Data and Parameters ---")

x_bar = 3050            # Sample mean weekly cost (Rs. 3,050)
X_mean = 600            # Mean number of units produced (mu_X)
std_X = 25              # Standard deviation of units produced (sigma_X)
n = 25                  # Sample size (restaurants)
alpha = 0.05            # Significance level

# Theoretical Cost Model: W = 1000 + 5X

print(f"Sample Mean Cost (x_bar): Rs. {x_bar}")
print(f"Sample Size (n): {n}")
print(f"Significance Level (alpha): {alpha}\n")

# --- 1. STATE THE HYPOTHESES ---
print("--- 1. State the Hypotheses ---")

# Calculate Theoretical Mean Weekly Cost (mu) - the population parameter under H0
# mu = 1000 + 5 * X_mean
mu = 1000 + 5 * X_mean

print(f"Theoretical Mean Cost (mu): Rs. {mu}")
print(f"H0 (Null Hypothesis): mu = {mu} (The cost model is accurate)")
print(f"H1 (Alternative Hypothesis): mu > {mu} (The costs are higher - restaurant owners' claim)\n")
# This is a right-tailed test.

# --- 2. CALCULATE THE TEST STATISTIC (Z) ---
print("--- 2. Calculate the Test Statistic (Z) ---")

# Calculate the Population Standard Deviation of Weekly Cost (sigma)
# sigma = 5 * std_X
sigma = 5 * std_X

# Calculate the Standard Error (SE)
# SE = sigma / sqrt(n)
SE = sigma / np.sqrt(n)

# Calculate the Z-Test Statistic
# Z = (x_bar - mu) / SE
Z_test_statistic = (x_bar - mu) / SE

print(f"Standard Deviation of Cost (sigma): Rs. {sigma}")
print(f"Standard Error (SE): Rs. {SE:.2f}")
print(f"Test Statistic (Z): {Z_test_statistic:.4f}\n")

# --- 3. DETERMINE THE CRITICAL VALUE ---
print("--- 3. Determine the Critical Value ---")

# Critical Value for a right-tailed Z-test at alpha=0.05
# Critical value is Z_(1-alpha)
Z_critical = norm.ppf(1 - alpha)

print(f"Critical Value (Z_critical) at alpha={alpha}: {Z_critical:.4f}\n")

# --- 4. MAKE A DECISION ---
print("--- 4. Make a Decision ---")

# Decision Rule for right-tailed test: Reject H0 if Z_test_statistic > Z_critical
if Z_test_statistic > Z_critical:
    decision = "Reject the Null Hypothesis (H0)"
    reason = f"Since {Z_test_statistic:.4f} > {Z_critical:.4f}"
else:
    decision = "Fail to Reject the Null Hypothesis (H0)"
    reason = f"Since {Z_test_statistic:.4f} is not > {Z_critical:.4f}"

print(f"Decision: {decision}")
print(f"Reason: {reason}\n")


--- 0. Data and Parameters ---
Sample Mean Cost (x_bar): Rs. 3050
Sample Size (n): 25
Significance Level (alpha): 0.05

--- 1. State the Hypotheses ---
Theoretical Mean Cost (mu): Rs. 4000
H0 (Null Hypothesis): mu = 4000 (The cost model is accurate)
H1 (Alternative Hypothesis): mu > 4000 (The costs are higher - restaurant owners' claim)

--- 2. Calculate the Test Statistic (Z) ---
Standard Deviation of Cost (sigma): Rs. 125
Standard Error (SE): Rs. 25.00
Test Statistic (Z): -38.0000

--- 3. Determine the Critical Value ---
Critical Value (Z_critical) at alpha=0.05: 1.6449

--- 4. Make a Decision ---
Decision: Fail to Reject the Null Hypothesis (H0)
Reason: Since -38.0000 is not > 1.6449
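
The critical-value decision above can also be expressed as a p-value. The sketch below assumes the Z statistic printed above and uses `scipy.stats.norm.sf` for the right-tail probability; `p_value` is a name introduced here for illustration.

```python
from scipy.stats import norm

Z_test_statistic = -38.0  # test statistic printed above
alpha = 0.05

# Right-tailed p-value: P(Z >= observed Z) under H0
p_value = norm.sf(Z_test_statistic)

if p_value < alpha:
    decision = "Reject the Null Hypothesis (H0)"
else:
    decision = "Fail to Reject the Null Hypothesis (H0)"

print(f"p-value: {p_value:.4f}")
print(f"Decision: {decision}")
```

Because the statistic lies far in the left tail, the right-tail probability is essentially 1, which agrees with the decision reached from the critical value.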



## Conclusion

Based on the hypothesis test, we fail to reject the null hypothesis. The calculated test statistic ($Z = -38.0000$) falls far outside the rejection region ($Z > 1.6449$). There is therefore not enough statistical evidence to support the restaurant owners' claim that actual weekly operating costs are higher than the theoretical model's estimate. In fact, the sample mean of Rs. 3,050 is far below the theoretical mean of Rs. 4,000, which contradicts the claim of higher costs.