# Estimation and Confidence Intervals

### Background
In quality control, especially for high-value items, destructive sampling is a necessary but costly way to verify product quality. Because the test that determines whether an item meets the quality standard also destroys the item, cost constraints keep sample sizes small.
### Scenario
A manufacturer of print-heads for personal computers is interested in estimating the mean durability of their print-heads in terms of the number of characters printed before failure. To assess this, the manufacturer conducts a study on a small sample of print-heads due to the destructive nature of the testing process.
### Data
A total of 15 print-heads were randomly selected and tested until failure. The durability of each print-head (in millions of characters) was recorded as follows:
1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29


In [None]:
import numpy as np

durability_data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])
print("Durability data (in millions of characters):")
print(durability_data)

Durability data (in millions of characters):
[1.13 1.55 1.43 0.92 1.25 1.36 1.32 0.85 1.07 1.48 1.2  1.33 1.18 1.22
 1.29]


# Task
Analyze the provided print-head durability data to estimate the mean durability and construct a confidence interval for the mean. Summarize the findings.

## Calculate summary statistics

### Subtask:
Calculate the mean and standard deviation of the durability data.


**Reasoning**:
Calculate the mean and standard deviation of the durability data using numpy functions and print the results.



In [None]:
mean_durability = np.mean(durability_data)
# Note: np.std divides by n by default (ddof=0); pass ddof=1 for the n - 1 sample formula
std_durability = np.std(durability_data)

print(f"Mean durability: {mean_durability:.4f} million characters")
print(f"Standard deviation of durability: {std_durability:.4f} million characters")

Mean durability: 1.2387 million characters
Standard deviation of durability: 0.1866 million characters
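
One caveat worth checking: `np.std` divides by n by default (`ddof=0`), which is the population formula. When the standard deviation is estimated from a sample, the usual choice is the n − 1 divisor (`ddof=1`). The sketch below compares the two for the same data; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

durability_data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
                            1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])

# ddof=0 (NumPy default): divide by n, the population formula
std_population_formula = np.std(durability_data)
# ddof=1: divide by n - 1, the usual sample standard deviation
std_sample_formula = np.std(durability_data, ddof=1)

print(f"ddof=0: {std_population_formula:.4f} million characters")
print(f"ddof=1: {std_sample_formula:.4f} million characters")
```

With 15 observations the `ddof=1` value is larger by a factor of sqrt(15/14), roughly 3.5%, so intervals built from it would come out slightly wider than those based on the default.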


## Construct a confidence interval

### Subtask:
Calculate a confidence interval for the mean durability of the print-heads.


**Reasoning**:
Calculate the confidence interval for the mean durability based on the provided data and previously calculated mean and standard deviation.
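
For reference, the two-sided t-interval for a mean has the standard textbook form below, where $\bar{x}$ is the sample mean, $s$ the standard deviation computed from the sample, $n$ the sample size, and $t_{\alpha/2,\,n-1}$ the upper $\alpha/2$ critical value of the t-distribution with $n-1$ degrees of freedom. These symbols are introduced here for illustration and do not appear in the original notebook.

$$
\bar{x} \;\pm\; t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
$$

With the values reported in this notebook ($\bar{x} = 1.2387$, $s = 0.1866$, $n = 15$) and $t_{0.025,\,14} \approx 2.1448$, the margin of error works out to about $2.1448 \times 0.1866 / \sqrt{15} \approx 0.1033$ million characters.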



In [None]:
from scipy.stats import t
from math import sqrt

n = len(durability_data)
confidence_level = 0.95
alpha = 1 - confidence_level
degrees_freedom = n - 1

critical_t_value = t.ppf(1 - alpha/2, degrees_freedom)

standard_error = std_durability / sqrt(n)

margin_of_error = critical_t_value * standard_error

lower_bound = mean_durability - margin_of_error
upper_bound = mean_durability + margin_of_error

print(f"{confidence_level*100:.0f}% Confidence Interval for the Mean Durability:")
print(f"({lower_bound:.4f}, {upper_bound:.4f}) million characters")

95% Confidence Interval for the Mean Durability:
(1.1353, 1.3420) million characters
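
As a cross-check, SciPy can produce the same interval in one call through `scipy.stats.t.interval`. The sketch below is an alternative to the step-by-step calculation, not the method used in the notebook; it keeps the notebook's standard deviation (NumPy default, `ddof=0`) so the numbers are comparable.

```python
import numpy as np
from scipy import stats

durability_data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
                            1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])

n = len(durability_data)
mean = durability_data.mean()
# Same standard deviation as the notebook cell (NumPy default, ddof=0)
standard_error = np.std(durability_data) / np.sqrt(n)

# Positional arguments: confidence level, then degrees of freedom
lower, upper = stats.t.interval(0.95, n - 1, loc=mean, scale=standard_error)
print(f"95% CI via t.interval: ({lower:.4f}, {upper:.4f}) million characters")
```

The confidence level is passed positionally because its keyword name differs between SciPy versions.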


## Interpret the results

### Subtask:
Explain the meaning of the calculated statistics and the confidence interval in the context of the problem.


**Reasoning**:
Explain the meaning of the calculated statistics and the confidence interval in the context of the problem based on the previously calculated values.



In [None]:
print("Interpretation of the Results:")
print(f"\n1. Mean Durability: The calculated mean durability of {mean_durability:.4f} million characters represents the average number of characters that this sample of 15 print-heads lasted before failing. It serves as our best point estimate for the true average durability of all print-heads produced by the manufacturer.")
print(f"\n2. Standard Deviation: The standard deviation of {std_durability:.4f} million characters indicates the typical spread or variability in the durability among the print-heads in our sample. A larger standard deviation would suggest more variation in how long individual print-heads last, while a smaller one indicates that the durabilities are clustered more closely around the mean.")
print(f"\n3. Confidence Interval: We calculated a {confidence_level*100:.0f}% confidence interval for the mean durability to be ({lower_bound:.4f}, {upper_bound:.4f}) million characters.")
print(f"This means that if we were to repeat this sampling and confidence interval calculation process many times, approximately {confidence_level*100:.0f}% of the intervals constructed would contain the true mean durability of all print-heads.")
print(f"\n4. Conclusion: Based on this sample and our {confidence_level*100:.0f}% confidence interval, we are {confidence_level*100:.0f}% confident that the true average durability of the manufacturer's print-heads is between {lower_bound:.4f} million and {upper_bound:.4f} million characters. This interval provides a range of plausible values for the population mean durability, taking into account the variability in our sample data.")

Interpretation of the Results:

1. Mean Durability: The calculated mean durability of 1.2387 million characters represents the average number of characters that this sample of 15 print-heads lasted before failing. It serves as our best point estimate for the true average durability of all print-heads produced by the manufacturer.

2. Standard Deviation: The standard deviation of 0.1866 million characters indicates the typical spread or variability in the durability among the print-heads in our sample. A larger standard deviation would suggest more variation in how long individual print-heads last, while a smaller one indicates that the durabilities are clustered more closely around the mean.

3. Confidence Interval: We calculated a 95% confidence interval for the mean durability to be (1.1353, 1.3420) million characters.
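This means that if we were to repeat this sampling and confidence interval calculation process many times, approximately 95% of the intervals constructed would contain the true mean durability of all print-heads.

4. Conclusion: Based on this sample and our 95% confidence interval, we are 95% confident that the true average durability of the manufacturer's print-heads is between 1.1353 million and 1.3420 million characters. This interval provides a range of plausible values for the population mean durability, taking into account the variability in our sample data.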
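
The repeated-sampling reading of a confidence level can be checked by simulation. The sketch below draws many samples of 15 from a hypothetical normal population, builds a 95% t-interval from each, and counts how often the interval contains the true mean. The population mean and standard deviation (1.24 and 0.19) are assumptions chosen only to resemble the data; they are not known properties of the print-heads.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)

# Hypothetical population parameters, chosen only for illustration
true_mean, true_sd = 1.24, 0.19
n, confidence_level, n_repeats = 15, 0.95, 10_000

critical_t = t.ppf(1 - (1 - confidence_level) / 2, n - 1)

# Each row is one simulated study of n print-heads
samples = rng.normal(true_mean, true_sd, size=(n_repeats, n))
sample_means = samples.mean(axis=1)
sample_sds = samples.std(axis=1, ddof=1)  # n - 1 divisor
margins = critical_t * sample_sds / np.sqrt(n)

# An interval "covers" when the true mean lies between its bounds
covered = (sample_means - margins <= true_mean) & (true_mean <= sample_means + margins)
coverage = covered.mean()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The share should land close to 0.95. Any single interval either contains the true mean or does not; the 95% describes the long-run behaviour of the procedure.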

## Summary:

### Data Analysis Key Findings

*   The mean durability of the print-heads is estimated to be approximately 1.2387 million characters, based on the sample data.
*   The standard deviation of the print-head durability is approximately 0.1866 million characters, indicating the spread of durability values around the mean in the sample.
*   A 95% confidence interval for the true mean durability of the print-heads was calculated to be (1.1353, 1.3420) million characters.

### Insights or Next Steps

*   We are 95% confident that the true average durability of the manufacturer's print-heads falls within the range of 1.1353 million to 1.3420 million characters.
*   Further testing with a larger sample size could narrow the confidence interval and provide a more precise estimate of the true mean durability.
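
To get a feel for how much a larger sample would help, the sketch below recomputes the 95% margin of error for several sample sizes. It assumes the standard deviation stays near the 0.1866 observed here, which a new sample would not reproduce exactly; the sample sizes are arbitrary illustrations.

```python
from math import sqrt
from scipy.stats import t

# Assumption: the spread stays near the value observed in this sample
assumed_std = 0.1866
confidence_level = 0.95

margins = {}
for n in (15, 30, 60, 120):
    critical_t = t.ppf(1 - (1 - confidence_level) / 2, n - 1)
    margins[n] = critical_t * assumed_std / sqrt(n)
    print(f"n = {n:3d}: margin of error about {margins[n]:.4f} million characters")
```

Because the margin shrinks roughly in proportion to 1/sqrt(n), halving it takes about four times as many print-heads, which is a real cost when every test destroys the item.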


### Assignment Tasks


#### a. Build 99% Confidence Interval Using Sample Standard Deviation

In [None]:
from scipy.stats import t
from math import sqrt

n = len(durability_data)
confidence_level_99 = 0.99
alpha_99 = 1 - confidence_level_99
degrees_freedom = n - 1

critical_t_value_99 = t.ppf(1 - alpha_99/2, degrees_freedom)

standard_error = std_durability / sqrt(n)

margin_of_error_99 = critical_t_value_99 * standard_error

lower_bound_99 = mean_durability - margin_of_error_99
upper_bound_99 = mean_durability + margin_of_error_99

print(f"{confidence_level_99*100:.0f}% Confidence Interval for the Mean Durability:")
print(f"({lower_bound_99:.4f}, {upper_bound_99:.4f}) million characters")

99% Confidence Interval for the Mean Durability:
(1.0952, 1.3821) million characters


We use the t-distribution here because the population standard deviation is unknown and must be estimated from a small sample (n < 30). The t-distribution accounts for the extra uncertainty that this estimate introduces; the resulting interval also assumes that durabilities are approximately normally distributed.




#### b. Build 99% Confidence Interval Using Known Population Standard Deviation

In [None]:
from scipy.stats import norm
from math import sqrt

# Known population standard deviation
population_std_dev = 0.2

n = len(durability_data)
confidence_level_99 = 0.99
alpha_99 = 1 - confidence_level_99

# For a known population standard deviation, we use the Z-distribution
critical_z_value_99 = norm.ppf(1 - alpha_99/2)

# Standard error with known population standard deviation
standard_error_known_std = population_std_dev / sqrt(n)

margin_of_error_known_std = critical_z_value_99 * standard_error_known_std

lower_bound_known_std = mean_durability - margin_of_error_known_std
upper_bound_known_std = mean_durability + margin_of_error_known_std

print(f"{confidence_level_99*100:.0f}% Confidence Interval for the Mean Durability (Known Population Std Dev):")
print(f"({lower_bound_known_std:.4f}, {upper_bound_known_std:.4f}) million characters")

99% Confidence Interval for the Mean Durability (Known Population Std Dev):
(1.1057, 1.3717) million characters


### Comparison and Interpretation of the 99% Confidence Intervals

We have calculated two different 99% confidence intervals for the mean durability of the print-heads:

1.  **Using the Sample Standard Deviation (t-distribution):** (1.0952, 1.3821) million characters
2.  **Using a Known Population Standard Deviation (Z-distribution):** (1.1057, 1.3717) million characters

**Comparison:**

*   The confidence interval calculated using the **sample standard deviation (t-distribution)** is slightly wider than the interval calculated using the **known population standard deviation (Z-distribution)**.
    *   Margin of Error (Sample Std Dev): 0.1434 million characters
    *   Margin of Error (Known Population Std Dev): 0.1330 million characters

**Interpretation:**

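The intervals differ in width for two reasons: the critical value (t with 14 degrees of freedom versus z) and the standard deviation plugged in (the estimate from the sample versus the assumed 0.2 million characters). Here the larger t critical value outweighs the smaller sample estimate of the standard deviation.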

*   When the population standard deviation is unknown and we use the sample standard deviation as an estimate (as in the first case), we introduce additional uncertainty. The **t-distribution** accounts for this by having heavier tails than the normal distribution, resulting in a slightly larger critical value and thus a wider confidence interval. This wider interval reflects the greater uncertainty when estimating the population standard deviation from a small sample.
*   When the population standard deviation is known (as in the second case), there is less uncertainty regarding the variability of the population. The **Z-distribution** is used, which leads to a smaller critical value and a narrower confidence interval.
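
The effect of the heavier tails can be seen directly in the critical values. The sketch below compares the 99% two-sided critical value of the standard normal with t critical values at several degrees of freedom; the degrees of freedom other than 14 are arbitrary choices for illustration.

```python
from scipy.stats import norm, t

confidence_level = 0.99
# Upper quantile level for a two-sided interval: 1 - alpha/2
quantile_level = 1 - (1 - confidence_level) / 2

z_critical = norm.ppf(quantile_level)
print(f"z critical value:               {z_critical:.4f}")

t_criticals = {}
for df in (14, 30, 100, 1000):
    t_criticals[df] = t.ppf(quantile_level, df)
    print(f"t critical value (df = {df:4d}):  {t_criticals[df]:.4f}")
```

With 14 degrees of freedom the t critical value is noticeably larger than the z value, and the gap closes as the degrees of freedom grow.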

**Conclusion:**

Both intervals provide a range of plausible values for the true mean durability of the print-heads with 99% confidence. However, the interval based on the known population standard deviation provides a slightly more precise estimate (narrower interval) because it assumes less uncertainty about the population variability.

In a real-world scenario, it is more common not to know the population standard deviation, especially with destructive testing and small sample sizes. Therefore, the confidence interval calculated using the sample standard deviation and the t-distribution is generally a more realistic and appropriate estimate in such situations, unless there is strong historical data or theoretical reason to assume a known population standard deviation.

### Overall Summary

In this analysis, we estimated the mean durability of print-heads based on a sample of 15.

*   The sample mean durability was found to be **1.2387 million characters**.
*   We calculated two 99% confidence intervals for the true mean durability:
    *   Using the sample standard deviation (t-distribution): **(1.0952, 1.3821) million characters**
    *   Using a known population standard deviation of 0.2 million characters (Z-distribution): **(1.1057, 1.3717) million characters**

These confidence intervals suggest that we are 99% confident that the true average durability of the print-heads falls within these ranges. The interval calculated using the sample standard deviation is slightly wider, reflecting the greater uncertainty when the population standard deviation is unknown and estimated from a small sample.

This analysis provides valuable insights into the expected lifespan of the print-heads and can be used by the manufacturer for quality control and product improvement decisions. Further testing with a larger sample size could provide a more precise estimate of the mean durability.