
# Background
In quality control, especially for high-value items, destructive sampling is a necessary but costly way to verify product quality. Because the test that determines whether an item meets the quality standard destroys the item, cost constraints force small sample sizes.

# Scenario
A manufacturer of print-heads for personal computers is interested in estimating the mean durability of their print-heads in terms of the number of characters printed before failure. To assess this, the manufacturer conducts a study on a small sample of print-heads due to the destructive nature of the testing process.

# Data
A total of 15 print-heads were randomly selected and tested until failure. The durability of each print-head (in millions of characters) was recorded as follows:
1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29


In [11]:
import pandas as pd
import numpy as np
import math
from scipy import stats

We have 15 observations, each giving the number of characters printed (in millions) before failure.

In [12]:
data = [1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29]

In [13]:
n = len(data)
n

15

In [14]:
sample_mean = np.mean(data)

# n is the sample size and sample_mean is the sample mean (not population values)
print('sample size:', n)
print('sample mean:', sample_mean)

sample size: 15
sample mean: 1.2386666666666666


# a. Build 99% Confidence Interval Using Sample Standard Deviation
Assuming the sample is representative of the population, construct a 99% confidence interval for the mean number of characters printed before the print-head fails using the sample standard deviation. Explain the steps you take and the rationale behind using the t-distribution for this task.


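When the sample is small (a common rule of thumb is n < 30) and the population standard deviation is unknown, we estimate it with the sample standard deviation s. That estimate adds uncertainty, so we use the t-distribution with n - 1 degrees of freedom instead of the normal (z) distribution: its heavier tails give a larger critical value and therefore a wider interval. This also assumes the durability values are approximately normally distributed.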
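To make the rationale concrete, the sketch below compares the two-sided 99% critical value of the normal distribution with t critical values for a few sample sizes. The sample sizes other than 15 are illustrative choices, not part of the exercise.

```python
from scipy import stats

confidence = 0.99
alpha = 1 - confidence

# Two-sided critical value from the standard normal distribution
z_crit = stats.norm.ppf(1 - alpha / 2)

# Two-sided t critical values for several sample sizes (df = n - 1).
# Only n = 15 comes from the exercise; the others are for illustration.
t_crit = {n: stats.t.ppf(1 - alpha / 2, n - 1) for n in (5, 15, 30, 100, 1000)}

print(f"z critical value: {z_crit:.3f}")
for n, t in t_crit.items():
    print(f"n = {n:>4}: t critical value = {t:.3f}")
```

The t critical value is always larger than the z value and approaches it as the sample size grows, which is why the distinction matters most for small samples like this one.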

In [15]:
# sample standard deviation: ddof=1 divides by n - 1 instead of n
sample_std = np.std(data, ddof=1)
print('sample_std:', sample_std)

sample_std: 0.19316412956959936


In [16]:
confidence = 0.99  # confidence level
alpha = 1 - confidence  # significance level
deg_free = n - 1  # degrees of freedom
t_score = stats.t.ppf(1 - alpha / 2, deg_free)  # two-sided critical value

In [17]:
t_score

2.97684273411266

We can confirm the t-score against a t-distribution table for a two-tailed test: with 14 degrees of freedom and a significance level of 0.01 (0.005 in each tail), the critical value is 2.977.

We now calculate the margin of error as t_score * sample_std / sqrt(n).
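Written out with the values computed so far (rounded for display), the interval formula looks like this. The symbols $\bar{x}$, $s$, and $t_{\alpha/2,\,n-1}$ are notation introduced here for the sample mean, the sample standard deviation, and the t critical value.

$$
\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
\;\approx\; 1.2387 \pm 2.977 \times \frac{0.1932}{\sqrt{15}}
\;\approx\; 1.2387 \pm 0.1485
$$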

In [24]:
# margin of error: t_score * s / sqrt(n)
margin_err = t_score * (sample_std / math.sqrt(n))

# lower and upper bounds of the confidence interval for the mean
lower_t_score = sample_mean - margin_err
upper_t_score = sample_mean + margin_err

print('margin error:',margin_err)
print('lower_t_score:',lower_t_score)
print('upper_t_score:',upper_t_score)
print(f'the true average is between {lower_t_score:.4f} to {upper_t_score:.4f}')

margin error: 0.1484693282152996
lower_t_score: 1.090197338451367
upper_t_score: 1.3871359948819662
the true average is between 1.0902 to 1.3871


We are 99% confident that the true mean durability lies between 1.0902 and 1.3871 million characters.
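As a cross-check, the same interval can be obtained in one call with `stats.t.interval`. This is a minimal sketch that repeats the data so it runs on its own. The names `standard_error`, `ci_lower`, and `ci_upper` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

data = [1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
        1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29]

n = len(data)
sample_mean = np.mean(data)

# Standard error of the mean; stats.sem uses ddof=1 by default
standard_error = stats.sem(data)

# Two-sided 99% interval from the t-distribution with n - 1 degrees of freedom
ci_lower, ci_upper = stats.t.interval(0.99, n - 1, loc=sample_mean, scale=standard_error)

print(f"99% t-interval: {ci_lower:.4f} to {ci_upper:.4f}")
```

Agreement with the manual calculation suggests the formula was applied correctly.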

# b. Build 99% Confidence Interval Using Known Population Standard Deviation
If it were known that the population standard deviation is 0.2 million characters, construct a 99% confidence interval for the mean number of characters printed before failure.


Here the population standard deviation is known to be exactly 0.2, so we use the z-distribution (standard normal distribution) instead of the t-distribution.

In [25]:
# known standard deviation
sigma = 0.2

z_score = stats.norm.ppf(1 - alpha / 2)  # 99% confidence: alpha = 0.01, so 0.005 in each tail

margin_error_z =z_score *(sigma/math.sqrt(n))

# intervals

lower_z = sample_mean - margin_error_z
upper_z = sample_mean + margin_error_z

print(' population standard dev:', sigma)
print('z_score:', z_score)
print(' margin_error_z:',margin_error_z)
print(f'confidence interval (z_distribution):{lower_z: .4f} to {upper_z: 0.4f}')

 population standard dev: 0.2
z_score: 2.5758293035489004
 margin_error_z: 0.13301525327090588
confidence interval (z_distribution): 1.1057 to  1.3717


With a known population standard deviation, the interval is slightly narrower: 1.1057 to 1.3717.
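The z-interval can also be reproduced with only the Python standard library, which gives an independent check on the SciPy result. This is a sketch. The names `x_bar`, `z_crit`, `z_lower`, and `z_upper` are introduced here for illustration.

```python
import math
from statistics import NormalDist, mean

data = [1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
        1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29]

sigma = 0.2          # known population standard deviation (given in part b)
confidence = 0.99

# Two-sided critical value of the standard normal distribution
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

x_bar = mean(data)
margin = z_crit * sigma / math.sqrt(len(data))
z_lower, z_upper = x_bar - margin, x_bar + margin

print(f"99% z-interval: {z_lower:.4f} to {z_upper:.4f}")
```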

# Conclusion

The t-distribution gave an interval of 1.09 to 1.39.

The z-distribution gave an interval of 1.11 to 1.37.

The t-based interval is wider than the z-based interval. Knowing the population standard deviation removes the uncertainty that comes from estimating it from a small sample, so the z-based interval is more precise. The t-based interval covers a wider range to account for that extra uncertainty.
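The sketch below separates the two ingredients of each margin of error, the critical value and the standard deviation, to show where the difference in width comes from. The values of `s` and `t_crit` are copied from the outputs in part a, and the ratio names are introduced here for illustration.

```python
import math
from statistics import NormalDist

n = 15
s = 0.19316412956959936     # sample standard deviation (output of part a)
sigma = 0.2                 # known population standard deviation (part b)
t_crit = 2.97684273411266   # t critical value, df = 14 (output of part a)
z_crit = NormalDist().inv_cdf(0.995)  # z critical value for 99% confidence

t_margin = t_crit * s / math.sqrt(n)
z_margin = z_crit * sigma / math.sqrt(n)

critical_ratio = t_crit / z_crit
spread_ratio = s / sigma
margin_ratio = t_margin / z_margin

print(f"critical value ratio (t / z): {critical_ratio:.3f}")
print(f"spread ratio (s / sigma):     {spread_ratio:.3f}")
print(f"margin ratio (t / z):         {margin_ratio:.3f}")
```

In this sample s is slightly smaller than the assumed sigma, so the wider t-interval comes entirely from the larger critical value.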

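If the population standard deviation really is known to be 0.2 million characters, we are 99% confident that the true mean number of characters printed before failure lies between 1.1057 and 1.3717 million characters. If it is not known, the t-based interval of 1.0902 to 1.3871 million characters from part a is the one to report.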