**CHI-SQUARE TEST**

In [1]:
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, chi2

In [2]:
# Step 1: State the Hypotheses
# ----------------------------
"""
Hypotheses:
Null Hypothesis (H0): There is no association between the type of smart home device purchased and customer satisfaction.
Alternative Hypothesis (H1): There is a significant association between the type of smart home device purchased and customer satisfaction.
"""


In [3]:
# Step 2: Data Setup
# --------------------
# We create the observed contingency table using the provided data
data = np.array([
    [50, 70],  # Very Satisfied
    [80, 100], # Satisfied
    [60, 90],  # Neutral
    [30, 50],  # Unsatisfied
    [20, 50]   # Very Unsatisfied
])

# Convert to a pandas DataFrame for better visualization (optional)
df = pd.DataFrame(data, columns=['Smart Thermostat', 'Smart Light'],
                  index=['Very Satisfied', 'Satisfied', 'Neutral', 'Unsatisfied', 'Very Unsatisfied'])
print("Observed Contingency Table:\n", df)

Observed Contingency Table:
                   Smart Thermostat  Smart Light
Very Satisfied                  50           70
Satisfied                       80          100
Neutral                         60           90
Unsatisfied                     30           50
Very Unsatisfied                20           50


In [4]:
# Step 3: Compute the Chi-Square Statistic
# -----------------------------------------
"""
We use the chi2_contingency function from the scipy.stats module to compute the Chi-Square statistic,
the p-value, the degrees of freedom, and the expected frequencies.
"""
chi2_stat, p_val, dof, expected = chi2_contingency(data)

# Display the expected frequencies
expected_df = pd.DataFrame(expected, columns=['Smart Thermostat', 'Smart Light'],
                           index=['Very Satisfied', 'Satisfied', 'Neutral', 'Unsatisfied', 'Very Unsatisfied'])
print("\nExpected Frequencies:\n", expected_df)



Expected Frequencies:
                   Smart Thermostat  Smart Light
Very Satisfied                48.0         72.0
Satisfied                     72.0        108.0
Neutral                       60.0         90.0
Unsatisfied                   32.0         48.0
Very Unsatisfied              28.0         42.0


In [5]:
# Step 4: Determine the Critical Value
# -------------------------------------
"""
We use the chi2 distribution object from the scipy.stats module (its ppf method) to find the critical value.
We have a significance level of 0.05 and degrees of freedom calculated as:
dof = (number of rows - 1) * (number of columns - 1) = (5 - 1) * (2 - 1) = 4

Critical Value: the 95th percentile of the Chi-Square distribution with 4 degrees of freedom (alpha = 0.05).
"""
alpha = 0.05
critical_value = chi2.ppf(1 - alpha, dof)
print(f"\nChi-Square Statistic: {chi2_stat}")
print(f"Degrees of Freedom: {dof}")
print(f"Critical Value at alpha = 0.05: {critical_value}")


Chi-Square Statistic: 5.638227513227513
Degrees of Freedom: 4
Critical Value at alpha = 0.05: 9.487729036781154


In [6]:
# Step 5: Make a Decision
# -------------------------
"""
Compare the computed Chi-Square statistic with the critical value.
If the Chi-Square statistic is greater than the critical value, we reject the null hypothesis.
Otherwise, we fail to reject the null hypothesis.
"""
if chi2_stat > critical_value:
    print("\nDecision: Reject the Null Hypothesis (H0).")
    print("Conclusion: There is a significant association between the type of smart home device purchased and customer satisfaction.")
else:
    print("\nDecision: Fail to Reject the Null Hypothesis (H0).")
    print("Conclusion: There is no significant association between the type of smart home device purchased and customer satisfaction.")



Decision: Fail to Reject the Null Hypothesis (H0).
Conclusion: There is no significant association between the type of smart home device purchased and customer satisfaction.


In [7]:
# Additionally, we can also interpret the p-value
print(f"\np-value: {p_val}")
if p_val < alpha:
    print("Since p-value is less than 0.05, we reject the Null Hypothesis.")
else:
    print("Since p-value is greater than 0.05, we fail to reject the Null Hypothesis.")


p-value: 0.22784371130697179
Since p-value is greater than 0.05, we fail to reject the Null Hypothesis.


Explanation of the Code:

Step 1: State the Hypotheses

Null Hypothesis (H0): There is no association between the type of device and customer satisfaction.

Alternative Hypothesis (H1): There is an association between the type of device and customer satisfaction.

Step 2: Data Setup

We construct the observed contingency table using the given data. The table shows customer satisfaction counts for both "Smart Thermostat" and "Smart Light."

Step 3: Compute the Chi-Square Statistic

We use the chi2_contingency() function from scipy.stats, which computes the Chi-Square statistic, p-value, degrees of freedom, and expected frequencies.
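
To make clear what chi2_contingency() computes, the expected frequencies and the statistic can be reproduced by hand. The following is a minimal sketch, assuming the same 5×2 table as in Step 2; the names observed, row_totals, col_totals, expected_manual, and chi2_manual are illustrative and do not appear in the original notebook.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts (rows: satisfaction levels, columns: device types)
observed = np.array([
    [50, 70],   # Very Satisfied
    [80, 100],  # Satisfied
    [60, 90],   # Neutral
    [30, 50],   # Unsatisfied
    [20, 50],   # Very Unsatisfied
])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (5, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
grand_total = observed.sum()

# Expected count under independence: row total * column total / grand total
expected_manual = row_totals * col_totals / grand_total

# Pearson Chi-Square statistic: sum of (O - E)^2 / E over all cells
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# Cross-check against SciPy (no continuity correction is applied here,
# because SciPy only uses it when there is a single degree of freedom)
chi2_scipy, _, dof_scipy, expected_scipy = chi2_contingency(observed)

print(expected_manual)
print(f"manual: {chi2_manual:.6f}, scipy: {chi2_scipy:.6f}, dof: {dof_scipy}")
```

The largest contributions come from the "Very Unsatisfied" row, where the observed counts (20 and 50) differ most from the expected counts (28 and 42).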

Step 4: Determine the Critical Value

Using the degrees of freedom (dof = 4), and a significance level of alpha = 0.05, we compute the critical value using the chi2.ppf() function from scipy.stats.

Step 5: Make a Decision

We compare the Chi-Square statistic with the critical value and make a decision based on the comparison. If the Chi-Square statistic is greater than the critical value, we reject the null hypothesis.
We also check the p-value for further confirmation.
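
The critical-value rule and the p-value rule always give the same decision, because both refer to the upper tail of the same Chi-Square distribution. The sketch below illustrates this, assuming the statistic and degrees of freedom reported in Step 4; the variable names reject_by_critical_value and reject_by_p_value are illustrative only.

```python
from scipy.stats import chi2

chi2_stat = 5.638227513227513  # statistic reported in Step 4
dof = 4
alpha = 0.05

critical_value = chi2.ppf(1 - alpha, dof)  # upper-tail cutoff for alpha
p_val = chi2.sf(chi2_stat, dof)            # upper-tail probability of the statistic

# The two rules are equivalent: stat > cutoff exactly when p-value < alpha
reject_by_critical_value = chi2_stat > critical_value
reject_by_p_value = p_val < alpha

print(f"critical value: {critical_value:.4f}, p-value: {p_val:.4f}")
print(f"reject by critical value: {reject_by_critical_value}")
print(f"reject by p-value: {reject_by_p_value}")
```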

**HYPOTHESIS TESTING**

In [8]:
import numpy as np
from scipy.stats import norm

# Step 1: Hypotheses
"""
H0: The mean weekly operating cost follows the theoretical model W = 1000 + 5X.
H1: The mean weekly operating cost is greater than the theoretical value.
"""


In [10]:
# Step 2: Data and Parameters
# Given data
sample_mean = 3050  # sample mean weekly cost (Rs.)
X_mean = 600        # mean number of units produced
X_std_dev = 25      # standard deviation of units produced
n = 25              # sample size

# Theoretical weekly operating cost according to the cost model
theoretical_mean = 1000 + 5 * X_mean  # W = 1000 + 5X for X = 600
print(f"Theoretical mean weekly cost: {theoretical_mean} Rs")

# Standard deviation of weekly operating costs
std_dev_weekly_cost = 5 * X_std_dev  # sigma = 5 * 25
print(f"Standard deviation of weekly cost: {std_dev_weekly_cost} Rs")


Theoretical mean weekly cost: 4000 Rs
Standard deviation of weekly cost: 125 Rs


In [11]:
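# Step 3: Calculate the test statistic
# z = (x̄ - μ) / (σ / sqrt(n)); σ is treated as known, so this is a z statistic
# (the variable keeps the name t_statistic, which the later cells rely on)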
t_statistic = (sample_mean - theoretical_mean) / (std_dev_weekly_cost / np.sqrt(n))
print(f"Test Statistic (t): {t_statistic}")

Test Statistic (t): -38.0


In [12]:
# Step 4: Determine the critical value for α = 0.05 (one-tailed test)
alpha = 0.05
critical_value = norm.ppf(1 - alpha)  # critical value from Z-table for 95% confidence
print(f"Critical value for α = 0.05: {critical_value}")

Critical value for α = 0.05: 1.6448536269514722


In [13]:
# Step 5: Make a Decision
if t_statistic > critical_value:
    print("\nDecision: Reject the Null Hypothesis (H0).")
    print("Conclusion: There is evidence that the weekly operating costs are higher than the model suggests.")
else:
    print("\nDecision: Fail to Reject the Null Hypothesis (H0).")
    print("Conclusion: There is no strong evidence to support the claim that the weekly operating costs are higher.")

# Additional: p-value (optional)
p_value = norm.sf(t_statistic)  # upper-tail probability, same as 1 - norm.cdf(t_statistic)
print(f"p-value: {p_value}")
if p_value < alpha:
    print("Since p-value is less than 0.05, reject the Null Hypothesis.")
else:
    print("Since p-value is greater than 0.05, fail to reject the Null Hypothesis.")


Decision: Fail to Reject the Null Hypothesis (H0).
Conclusion: There is no strong evidence to support the claim that the weekly operating costs are higher.
p-value: 1.0
Since p-value is greater than 0.05, fail to reject the Null Hypothesis.


Explanation of the Code:

State the Hypotheses:

Null Hypothesis (H0): The mean weekly operating cost follows the theoretical model $W = 1000 + 5X$.

Alternative Hypothesis (H1): The mean weekly operating cost is greater than the theoretical value.

Data and Parameters:

Theoretical weekly operating cost is calculated as $W = 1000 + 5 \times 600 = 4000$ Rs.
Standard deviation for weekly cost is $5 \times 25 = 125$ Rs. The sample mean is given as 3,050 Rs.
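
Both figures follow from the fact that the cost is a linear function of the number of units produced. As a short worked derivation, writing $\mu_X = 600$ and $\sigma_X = 25$ for the mean and standard deviation of the units produced (symbols introduced here for illustration):

$$
\begin{aligned}
E[W] &= E[1000 + 5X] = 1000 + 5\,\mu_X = 1000 + 5 \times 600 = 4000 \\
\sigma_W &= \sqrt{\operatorname{Var}(1000 + 5X)} = |5|\,\sigma_X = 5 \times 25 = 125
\end{aligned}
$$

The additive constant 1000 shifts the mean but does not change the spread, which is why only the factor 5 appears in the standard deviation.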

Critical Value:

Using the Z-distribution and the given alpha of 0.05 (one-tailed test), the critical value is calculated using the ppf function from scipy.stats.norm.

Make a Decision:

The test statistic is compared with the critical value to determine whether to reject the null hypothesis.
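
The comparison can be wrapped in a small reusable function. This is a minimal sketch of an upper-tailed one-sample z-test, assuming the population standard deviation is known; the function name upper_tail_z_test and its parameters are hypothetical and not part of the original notebook.

```python
import numpy as np
from scipy.stats import norm

def upper_tail_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Upper-tailed one-sample z-test (hypothetical helper, sigma assumed known)."""
    standard_error = sigma / np.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    critical_value = norm.ppf(1 - alpha)  # upper-tail cutoff
    p_value = norm.sf(z)                  # P(Z >= z)
    return z, critical_value, p_value, bool(z > critical_value)

# Values from the cells above: sample mean 3050, model mean 4000, sigma 125, n 25
z, critical_value, p_value, reject_h0 = upper_tail_z_test(3050, 4000, 125, 25)

print(f"z = {z:.1f}, critical value = {critical_value:.4f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Because the sample mean lies below the model value, z is negative and an upper-tailed test cannot reject H0, whatever the sample size.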


Conclusion:

The calculated test statistic $t = -38$ is far below the critical value of 1.64, and the p-value (1.0) is greater than 0.05. Therefore, we fail to reject the null hypothesis.
The sample mean of 3,050 Rs. is lower than the model value of 4,000 Rs., so there is no evidence to support the restaurant owners' claim that the weekly operating costs are higher than the model suggests.
