# Hypothesis Testing


Background:
Bombay hospitality Ltd. operates a franchise model for producing exotic Norwegian dinners throughout New England. The operating cost for a franchise in a week (W) is given by the equation W = $1,000 + $5X, where X represents the number of units produced in a week. Recent feedback from restaurant owners suggests that this cost model may no longer be accurate, as their observed weekly operating costs are higher.
Objective:
To investigate the restaurant owners' claim about the increase in weekly operating costs using hypothesis testing.
Data Provided:
•	The theoretical weekly operating cost model: W = $1,000 + $5X
•	Sample of 25 restaurants with a mean weekly cost of Rs. 3,050
•	Number of units produced in a week (X) follows a normal distribution with a mean (μ) of 600 units and a standard deviation (σ) of 25 units
Assignment Tasks:
1. State the Hypotheses:
2. Calculate the Test Statistic:
Use the following formula to calculate the test statistic (t):
$$t = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
where:
•	x̄ = sample mean weekly cost (Rs. 3,050)
•	μ = theoretical mean weekly cost according to the cost model (W = $1,000 + $5X for X = 600 units)
•	σ = standard deviation of the weekly cost = 5 × 25 = 125 (cost per unit times the standard deviation of X)
•	n = sample size (25 restaurants)
3. Determine the Critical Value:
Using the alpha level of 5% (α = 0.05), determine the critical value from the standard normal (Z) distribution table.
4. Make a Decision:
Compare the test statistic with the critical value to decide whether to reject the null hypothesis.
5. Conclusion:
Based on the decision in step 4, conclude whether there is strong evidence to support the restaurant owners' claim that the weekly operating costs are higher than the model suggests.

Submission Guidelines:
•	Prepare a Python file detailing each step of your hypothesis testing process.
•	Include calculations for the test statistic and the critical value.
•	Provide a clear conclusion based on your analysis.



In [3]:
# First we have to import necessary libraries
import numpy as np
import scipy.stats as stats

In [4]:
#### STEP 1: Defining hypotheses
# H0: mu = 4000
# H1: mu > 4000

In [5]:
#### STEP 2: Calculating test stats
# Sample mean (x)
sample_mean = 3050

In [6]:
sample_mean

3050

In [7]:
# Theoretical mean according to the model
mu = 1000 + 5 * 600       # theoretical mean cost at X = 600 units
# mu = 4000

# Standard deviation of the weekly cost
sigma = 5 * 25            # cost per unit times the standard deviation of units produced
# sample size (n)
n = 25

# Calculating test stats (t)
test_statistic = (sample_mean - mu) / (sigma/np.sqrt(n))

In [8]:
mu

4000

In [9]:
sigma

125
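
The value 125 follows from the linear cost model: adding the constant 1,000 does not change the spread, and multiplying X by 5 scales its standard deviation by 5. A short derivation, where $\sigma_W$ and $\mathrm{SE}$ are shorthand introduced here for the standard deviation of the weekly cost and the standard error of the sample mean:

$$
\sigma_W = \sqrt{\operatorname{Var}(1000 + 5X)} = 5\,\sigma_X = 5 \times 25 = 125,
\qquad
\mathrm{SE} = \frac{\sigma_W}{\sqrt{n}} = \frac{125}{\sqrt{25}} = 25
$$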

In [10]:
test_statistic

-38.0
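
As a cross-check on the value above, the calculation can be wrapped in a small helper. This is a minimal sketch using only the standard library. The function name `z_statistic` is illustrative and not part of the original notebook.

```python
import math

def z_statistic(sample_mean, mu, sigma, n):
    """One-sample z statistic: (sample mean - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu) / standard_error

# Values from the assignment: sample mean 3050, mu = 1000 + 5*600, sigma = 5*25, n = 25
z = z_statistic(sample_mean=3050, mu=1000 + 5 * 600, sigma=5 * 25, n=25)
print(z)
```

The sign matters here: a negative statistic means the sample mean lies below the model's mean, which is the opposite direction from the owners' claim.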

In [11]:
#### STEP 3: Determine the critical value
alpha = 0.05
critical_value = stats.norm.ppf(1 - alpha)    # z alpha for a one-tailed (right-tailed) test

In [12]:
critical_value

1.6448536269514722

In [13]:
#### STEP 4: Make a decision
if test_statistic > critical_value:    # right-tailed test (H1: mu > 4000): reject only for large positive values
    decision = "Reject the null hypothesis"
else:
    decision = "Fail to reject the null hypothesis"


In [82]:
decision

'Fail to reject the null hypothesis'
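
The same decision can be reached through a p-value instead of a critical value. The sketch below assumes the right-tailed alternative stated in Step 1 (H1: mu > 4000) and uses the standard library's `statistics.NormalDist` rather than SciPy, so it serves as an independent check rather than a copy of the notebook's method.

```python
from statistics import NormalDist

alpha = 0.05
z = -38.0  # test statistic computed in Step 2

std_normal = NormalDist()                        # mean 0, standard deviation 1
critical_value = std_normal.inv_cdf(1 - alpha)   # about 1.645
p_value = 1 - std_normal.cdf(z)                  # P(Z >= z) for a right-tailed test

reject_h0 = p_value < alpha
print(f"critical value: {critical_value:.3f}")
print(f"p-value: {p_value:.3f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Because the statistic is far into the left tail, the right-tailed p-value is essentially 1, which agrees with the critical-value comparison.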

#### STEP 5: Conclusion
if decision == "Reject the null hypothesis":
    conclusion = "There is strong evidence to support the claim that weekly operating costs are higher."
else:
    conclusion = "There is not enough evidence to support the claim that weekly operating costs are higher."

#Output Results 
print("Test Statistic:", test_statistic)
print("Critical Value:", critical_value)
print("Decision:", decision)
print("Conclusion:", conclusion)

#### Conclusion

In [90]:
#There is no evidence that weekly operating costs have increased. Instead, the data suggests that actual costs are lower than expected.

##### Interpretation of Results

In [None]:
#The test statistic Z = −38 is far below zero, meaning the observed sample mean cost is much lower than the cost predicted by the model.
#For this right-tailed test (H1: mu > 4000), H0 is rejected only if Z exceeds the critical value 1.645. Since −38 < 1.645, we fail to reject 𝐻0.

# Chi-Square Test

Association between Device Type and Customer Satisfaction

Background:
Mizzare Corporation has collected data on customer satisfaction levels for two types of smart home devices: Smart Thermostats and Smart Lights. They want to determine if there's a significant association between the type of device purchased and the customer's satisfaction level.
Data Provided:
The data is summarized in a contingency table showing the counts of customers in each satisfaction level for both types of devices:
| Satisfaction | Smart Thermostat | Smart Light | Total |
|---|---|---|---|
| Very Satisfied | 50 | 70 | 120 |
| Satisfied | 80 | 100 | 180 |
| Neutral | 60 | 90 | 150 |
| Unsatisfied | 30 | 50 | 80 |
| Very Unsatisfied | 20 | 50 | 70 |
| Total | 240 | 360 | 600 |

Objective:
To use the Chi-Square test for independence to determine if there's a significant association between the type of smart home device purchased (Smart Thermostats vs. Smart Lights) and the customer satisfaction level.
Assignment Tasks:
1. State the Hypotheses:
2. Compute the Chi-Square Statistic:
3. Determine the Critical Value:
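Using a significance level (alpha) of 0.05 and the degrees of freedom, which for a test of independence is (number of rows − 1) × (number of columns − 1), determine the critical value from the Chi-Square distribution.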
4. Make a Decision:
Compare the Chi-Square statistic with the critical value to decide whether to reject the null hypothesis.
Submission Guidelines:
•	Provide a detailed report of your analysis, including each step outlined in the assignment tasks in a python file.
•	Include all calculations, the Chi-Square statistic, the critical value, and your conclusion.


In [17]:
import numpy as np
import scipy.stats as stats

In [18]:
##### Step 1: State the Hypotheses
# Null Hypothesis (H₀): There is no association between device type and customer satisfaction level.
# Alternative Hypothesis (H₁): There is a significant association between device type and customer satisfaction.

In [19]:
# Step 2: Compute the Chi-Square Statistic

# Given data: Observed frequencies
observed = np.array([[50, 70],        # Very Satisfied
                     [80, 100],       # Satisfied
                     [60, 90],        # Neutral
                     [30, 50],        # Unsatisfied
                     [20, 50]])       # Very Unsatisfied

In [84]:
observed

array([[ 50,  70],
       [ 80, 100],
       [ 60,  90],
       [ 30,  50],
       [ 20,  50]])

In [20]:
# Compute the row totals, column totals, and grand total
row_totals = np.sum(observed, axis=1)
col_totals = np.sum(observed, axis=0)
grand_total = np.sum(observed)

In [21]:
row_totals

array([120, 180, 150,  80,  70])

In [22]:
col_totals

array([240, 360])

In [23]:
grand_total

600

In [24]:
# Calculate the expected frequencies under independence: (row total * column total) / grand total
expected = np.outer(row_totals, col_totals) / grand_total

In [25]:
expected

array([[ 48.,  72.],
       [ 72., 108.],
       [ 60.,  90.],
       [ 32.,  48.],
       [ 28.,  42.]])
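
The expected counts can be reproduced without NumPy, which also makes it easy to check a common rule of thumb for the Chi-Square approximation: every expected count should be at least 5. The sketch below is a plain-Python illustration. The threshold of 5 is a widely used guideline, not a requirement stated in the assignment.

```python
# Observed counts: rows are satisfaction levels, columns are (Smart Thermostat, Smart Light)
observed = [[50, 70], [80, 100], [60, 90], [30, 50], [20, 50]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E_ij = (row total i) * (column total j) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

smallest_expected = min(min(row) for row in expected)
print(expected[0])               # expected counts for the Very Satisfied row
print(smallest_expected >= 5)    # rule-of-thumb check
```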

In [26]:
# Calculate the chi-square statistic: sum of (observed - expected)^2 / expected over all cells
chi_square_stat = np.sum((observed - expected)**2 / expected)

In [27]:
chi_square_stat

5.638227513227513
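
To see where the statistic comes from, it can help to break it down by row. The following sketch, written in plain Python with the observed and expected counts hard-coded from the cells above, sums the per-cell contributions for each satisfaction level. The variable names are illustrative.

```python
labels = ["Very Satisfied", "Satisfied", "Neutral", "Unsatisfied", "Very Unsatisfied"]
observed = [[50, 70], [80, 100], [60, 90], [30, 50], [20, 50]]
expected = [[48.0, 72.0], [72.0, 108.0], [60.0, 90.0], [32.0, 48.0], [28.0, 42.0]]

# Contribution of each cell: (observed - expected)^2 / expected
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# Total contribution of each satisfaction level (both device types combined)
row_contrib = {label: sum(row) for label, row in zip(labels, contributions)}
chi_square_stat = sum(row_contrib.values())
largest_row = max(row_contrib, key=row_contrib.get)

for label, value in row_contrib.items():
    print(f"{label}: {value:.3f}")
print(f"Total: {chi_square_stat:.3f} (largest contribution: {largest_row})")
```

Most of the statistic comes from the Very Unsatisfied row, where the observed split between devices differs most from the expected split, while the Neutral row contributes nothing because observed and expected counts coincide.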

In [28]:
#### Step 3: Determine the Critical Value
# Degrees of freedom (df) = (number of rows - 1) * (number of columns - 1)
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)

In [29]:
df

4

In [30]:
# Significance level (alpha) = 0.05
alpha = 0.05
critical_value = stats.chi2.ppf(1 - alpha, df)

In [31]:
critical_value

9.487729036781154

In [32]:
# p-value for the chi-square statistic
p_value = 1 - stats.chi2.cdf(chi_square_stat, df)

In [33]:
p_value

0.22784371130697179
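
For an even number of degrees of freedom the Chi-Square tail probability has a closed form, so this p-value can be checked by hand. With 4 degrees of freedom, P(X ≥ x) = e^(−x/2)(1 + x/2). The helper below is a sketch of that special case only. The name `chi2_sf_df4` is made up for this illustration and the formula does not apply to other degrees of freedom.

```python
import math

def chi2_sf_df4(x):
    """Upper-tail probability P(X >= x) for a chi-square distribution with 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)

p_value = chi2_sf_df4(5.638227513227513)   # statistic from Step 2
tail_at_critical = chi2_sf_df4(9.487729036781154)  # should be close to alpha = 0.05

print(round(p_value, 4))
print(round(tail_at_critical, 4))
```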

In [34]:
##### Step 4: Make a Decision
if chi_square_stat > critical_value:
    decision = "Reject the null hypothesis"
else:
    decision = "Fail to reject the null hypothesis"

In [86]:
decision

'Fail to reject the null hypothesis'

In [35]:
# Display the results
print(f"Chi-Square Statistic: {chi_square_stat:.2f}")
print(f"Critical Value: {critical_value:.2f}")
print(f"p-value: {p_value:.2f}")
print(f"Decision: {decision}")

Chi-Square Statistic: 5.64
Critical Value: 9.49
p-value: 0.23
Decision: Fail to reject the null hypothesis


#### Steps Explained:

##### State the Hypotheses:

In [38]:
# Null Hypothesis (H₀): No association between device type and customer satisfaction.
# Alternative Hypothesis (H₁): There is an association between device type and customer satisfaction.

##### Compute the Chi-Square Statistic:

In [40]:
# We input the observed data in a contingency table and calculate the expected frequencies based on row and column totals.

##### Determine the Critical Value:

In [42]:
# Using the degrees of freedom and significance level (α = 0.05), the critical value is determined from a Chi-Square distribution table.

##### Conclusion

In [44]:
# Final Conclusion: The observed data do not provide statistical evidence of an association between device type and customer satisfaction.

#### Report summary

In [95]:
#Since our calculated 𝜒2 value (5.638) is less than the critical value (9.488), we fail to reject 𝐻0. The p-value (0.23) is likewise greater than 0.05.
#This means that there is not enough statistical evidence to conclude that customer satisfaction is dependent on the device type.
#In simpler terms, the data do not show a statistically significant association between the type of smart home device (Smart Thermostat vs. Smart Light) and customer satisfaction.
