In [2]:
import scipy.stats as stats
import pandas as pd
import numpy as np
from statsmodels.stats.weightstats import ztest

I. Problem Statement

Background:
Bombay hospitality Ltd. operates a franchise model for producing exotic Norwegian dinners throughout New England. The operating cost for a franchise in a week (W) is given by the equation W = $1,000 + $5X, where X represents the number of units produced in a week. Recent feedback from restaurant owners suggests that this cost model may no longer be accurate, as their observed weekly operating costs are higher.
Objective:
To investigate the restaurant owners' claim about the increase in weekly operating costs using hypothesis testing.
Data Provided:
        ●	The theoretical weekly operating cost model: W = $1,000 + $5X
        ●	Sample of 25 restaurants with a mean weekly cost of Rs. 3,050
        ●	Number of units produced in a week (X) follows a normal distribution with a mean (μ) of 600 units and a standard deviation (σ) of 25 units
Assignment Tasks:
1. State the Hypotheses statement:
2. Calculate the Test Statistic:
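Use the following formula to calculate the test statistic (z):
z = (x̄ − μ) / (σ / √n)
where:
        ●	x̄ = sample mean weekly cost (Rs. 3,050)
        ●	μ = theoretical mean weekly cost according to the cost model (W = $1,000 + $5X for X = 600 units, which gives 4,000)
        ●	σ = 5 × 25 = 125, the standard deviation of the weekly cost (the factor 5 in W = $1,000 + $5X scales the standard deviation of X)
        ●	n = sample size (25 restaurants)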
3. Determine the Probability and compare:
Using the alpha level of 5% (α = 0.05),
4. Make a Decision:
Compare the test statistic with the critical value to decide whether to reject the null hypothesis.
5. Conclusion:
Based on the decision in step 4, conclude whether there is strong evidence to support the restaurant owners' claim that the weekly operating costs are higher than the model suggests.

Submission Guidelines:
        ●	Prepare a python file detailing each step of your hypothesis testing process.
        ●	Include calculations for the test statistic and the critical value.
        ●	Provide a clear conclusion based on your analysis.
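
Because W is a linear function of X, the mean and standard deviation of the weekly cost follow directly from those of X: E[W] = 1000 + 5·E[X] and SD[W] = 5·SD[X]. The following is a minimal sketch of that step; the helper name `cost_model_moments` and its parameter names are illustrative and not part of the assignment.

```python
def cost_model_moments(mean_units, sd_units, fixed_cost=1000.0, unit_cost=5.0):
    """Mean and standard deviation of W = fixed_cost + unit_cost * X."""
    # The fixed cost shifts the mean but does not affect the spread.
    mean_cost = fixed_cost + unit_cost * mean_units
    # Scaling X by unit_cost scales its standard deviation by |unit_cost|.
    sd_cost = abs(unit_cost) * sd_units
    return mean_cost, sd_cost


mu_w, sigma_w = cost_model_moments(mean_units=600, sd_units=25)
print(mu_w, sigma_w)  # 4000.0 125.0
```

These are the values of μ and σ that enter the z statistic in task 2.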



In [None]:
'''
1. Stating the Hypotheses

H0: μ ≤ 4000. The mean weekly operating cost is not higher than the cost model predicts.
Ha: μ > 4000. The mean weekly operating cost is higher than the cost model predicts, as the restaurant owners claim.
'''

In [None]:
'''
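2. Calculating the Test Statistic
'''

# Given
x_bar = 3050                    # sample mean weekly cost
pop_mean = 4000                 # W = 1000 + 5X with X = 600
pop_std_dev = 125               # 5 * 25: the factor 5 in W = 1000 + 5X scales the standard deviation of X
n = 25                          # sample size
s = pop_std_dev/np.sqrt(n)      # standard error of the sample mean
alpha = 0.05
CI = 0.95
critical_area = CI + alpha/2    # cumulative area for a two-sided critical value
print("Critical Area =",critical_area)

# z = (sample mean - hypothesized mean) / standard error
zscore = (x_bar-pop_mean)/s
print("z score =", zscore)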

Critical Area = 0.975
z score = -38.0


In [None]:
'''
3. Conclusion
'''

# The alternative hypothesis is one-sided (costs are higher), so the test is right-tailed:
# use the upper-tail critical value at 1 - alpha instead of the two-sided critical_area.
zcritical = stats.norm.ppf(1 - alpha)
print("z critical =", zcritical)
print("z score =", zscore)
print("Rejection Region = z >", zcritical)
# Reject H0 only when the z-score falls in the upper tail.
print("Reject Null Hypothesis" if zscore > zcritical else "Fail to Reject Null Hypothesis")


# Cross-check with statsmodels on simulated data drawn around the sample mean.
# The draws are random, so the statistic changes on every run; ztest is two-sided by default.
sample_data = stats.norm.rvs(size=n, loc=x_bar, scale=s)
zscore2 = ztest(x1=sample_data, value=pop_mean)
print(zscore2)


'''
The claim is one-sided, so the critical value at alpha = 0.05 is 1.645. The z-score is -38.0, which is far below the critical value, so we fail to reject the Null Hypothesis at the 5% significance level. The sample mean (3,050) is actually lower than the cost model's expected weekly cost (4,000), so this sample provides no evidence for the restaurant owners' claim that weekly costs exceed the Cost Model's output.
'''

z critical = 1.6448536269514722
z score = -38.0
Rejection Region = z > 1.6448536269514722
Fail to Reject Null Hypothesis
(np.float64(-186.71685182856456), np.float64(0.0))


"\nThe claim is one-sided, so the critical value at alpha = 0.05 is 1.645. The z-score is -38.0, which is far below the critical value, so we fail to reject the Null Hypothesis at the 5% significance level. The sample mean (3,050) is actually lower than the cost model's expected weekly cost (4,000), so this sample provides no evidence for the restaurant owners' claim that weekly costs exceed the Cost Model's output.\n"
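
The owners' claim is directional (costs are higher), which corresponds to a right-tailed test. As a hedged cross-check, the sketch below computes the z statistic, the one-sided critical value, and the p-value from the summary figures given in the problem. The helper name `right_tailed_z_test` is hypothetical and not part of the assignment.

```python
import numpy as np
from scipy import stats


def right_tailed_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """One-sample z-test of H0: mu <= mu0 against Ha: mu > mu0."""
    standard_error = sigma / np.sqrt(n)
    z = (x_bar - mu0) / standard_error
    z_crit = stats.norm.ppf(1 - alpha)   # upper-tail critical value
    p_value = stats.norm.sf(z)           # P(Z >= z) under H0
    return z, z_crit, p_value, bool(z > z_crit)


z, z_crit, p_value, reject = right_tailed_z_test(x_bar=3050, mu0=4000, sigma=125, n=25)
print(f"z = {z:.1f}, critical = {z_crit:.3f}, p-value = {p_value:.3f}, reject H0: {reject}")
# z = -38.0, critical = 1.645, p-value = 1.000, reject H0: False
```

A p-value near 1 reflects that the sample mean lies below the hypothesized mean, which is the opposite direction from the alternative.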

II. Problem Statement
Background:
In quality control processes, especially when dealing with high-value items, destructive sampling is a necessary but costly method to ensure product quality. The test to determine whether an item meets the quality standards destroys the item, leading to the requirement of small sample sizes due to cost constraints.
Scenario

A manufacturer of print-heads for personal computers is interested in estimating the mean durability of their print-heads in terms of the number of characters printed before failure. To assess this, the manufacturer conducts a study on a small sample of print-heads due to the destructive nature of the testing process.
Data

A total of 15 print-heads were randomly selected and tested until failure. The durability of each print-head (in millions of characters) was recorded as follows:
1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29
Assignment Tasks

a. Build 99% Confidence Interval Using Sample Standard Deviation
Assuming the sample is representative of the population, construct a 99% confidence interval for the mean number of characters printed before the print-head fails using the sample standard deviation. Explain the steps you take and the rationale behind using the t-distribution for this task.

b. Build 99% Confidence Interval Using Known Population Standard Deviation
If it were known that the population standard deviation is 0.2 million characters, construct a 99% confidence interval for the mean number of characters printed before failure.



In [55]:
data = [1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29]
n = len(data) #15
degree_of_freedom = n - 1

# Sample mean: the average of the observations
x_bar = sum(data) / n

# Sample standard deviation, using n - 1 in the denominator (Bessel's correction)
s = np.sqrt(sum((x - x_bar)**2 for x in data) / (n - 1))

CI = 0.99
# A 99% two-sided confidence level leaves 0.5% in each tail, so the critical value sits at a cumulative area of 99.5%
critical_area = CI + (1 - CI)/2

In [56]:
# Choosing the t-distribution for the lower and upper limits of the population mean, since the population standard deviation is unknown and the sample is small (n < 30).
tcritical = stats.t.ppf(critical_area, degree_of_freedom)
margin_of_error = tcritical * (s/np.sqrt(n))

print(tcritical)
print(margin_of_error)

lower_mean = x_bar - margin_of_error
upper_mean = x_bar + margin_of_error

print(lower_mean)
print(upper_mean)

2.976842734370834
0.1484693282281759
1.0901973384384906
1.3871359948948425
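
As a cross-check on the manual calculation, SciPy can build the same t-based interval directly. This is a sketch that assumes the 15 observations listed in the problem; `stats.sem` uses the n - 1 denominator by default, matching the sample standard deviation above.

```python
import numpy as np
from scipy import stats

data = [1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
        1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29]

n = len(data)
x_bar = np.mean(data)
standard_error = stats.sem(data)  # s / sqrt(n), with ddof=1 by default

# 99% two-sided interval from the t-distribution with n - 1 degrees of freedom
lower, upper = stats.t.interval(0.99, n - 1, loc=x_bar, scale=standard_error)
print(f"{lower:.4f} {upper:.4f}")  # 1.0902 1.3871
```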


In [57]:
# Given
pop_std_dev = 0.2

# Standard error of the mean, based on the known population standard deviation
standard_error = pop_std_dev/np.sqrt(n)

zcritical = stats.norm.ppf(critical_area)

# Margin of error for the z-interval: critical value times the standard error
margin_of_error = zcritical * standard_error


lower_mean = x_bar - margin_of_error
upper_mean = x_bar + margin_of_error


print(zcritical)
print(round(lower_mean, 4))
print(round(upper_mean, 4))

2.5758293035489004
1.1057
1.3717
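
When the population standard deviation is known, the margin of error is z·σ/√n: the critical value multiplies the standard error of the mean, not the raw standard deviation. The following is a hedged sketch using `scipy.stats.norm.interval`, assuming σ = 0.2 as stated in part b and the 15 observations from the problem.

```python
import numpy as np
from scipy import stats

data = [1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
        1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29]

sigma = 0.2                               # known population standard deviation (part b)
n = len(data)
x_bar = np.mean(data)
standard_error = sigma / np.sqrt(n)       # sigma / sqrt(n)

# 99% two-sided interval from the standard normal distribution
lower, upper = stats.norm.interval(0.99, loc=x_bar, scale=standard_error)
print(f"{lower:.3f} {upper:.3f}")  # 1.106 1.372
```

The z-based interval is narrower than the t-based one because the normal critical value (about 2.576) is smaller than the t critical value with 14 degrees of freedom (about 2.977).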
