
In [1]:
# prompt: Background
# In quality control processes, especially when dealing with high-value items, destructive sampling is a necessary but costly method to ensure product quality. The test to determine whether an item meets the quality standards destroys the item, leading to the requirement of small sample sizes due to cost constraints.
# Scenario
# A manufacturer of print-heads for personal computers is interested in estimating the mean durability of their print-heads in terms of the number of characters printed before failure. To assess this, the manufacturer conducts a study on a small sample of print-heads due to the destructive nature of the testing process.
# Data
# A total of 15 print-heads were randomly selected and tested until failure. The durability of each print-head (in millions of characters) was recorded as follows:
# 1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29

import numpy as np
from scipy import stats

# Data of print-head durability (in millions of characters)
data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])

# Sample size
n = len(data)

# Calculate the sample mean
sample_mean = np.mean(data)

# Calculate the sample standard deviation
sample_std = np.std(data, ddof=1)  # ddof=1 for sample standard deviation

# Degrees of freedom
degrees_of_freedom = n - 1

# Confidence level
confidence_level = 0.95

# t-critical value for a two-sided interval: the (1 + confidence_level) / 2 quantile leaves (1 - confidence_level) / 2 in each tail
t_critical = stats.t.ppf((1 + confidence_level) / 2, degrees_of_freedom)

# Calculate the margin of error
margin_of_error = t_critical * (sample_std / np.sqrt(n))

# Calculate the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

# Print the results
print(f"Sample Mean: {sample_mean:.4f} million characters")
print(f"Sample Standard Deviation: {sample_std:.4f} million characters")
print(f"Degrees of Freedom: {degrees_of_freedom}")
print(f"t-critical value: {t_critical:.4f}")
print(f"Margin of Error: {margin_of_error:.4f} million characters")
print(f"95% Confidence Interval for the Mean Durability: ({confidence_interval[0]:.4f}, {confidence_interval[1]:.4f}) million characters")


Sample Mean: 1.2387 million characters
Sample Standard Deviation: 0.1932 million characters
Degrees of Freedom: 14
t-critical value: 2.1448
Margin of Error: 0.1070 million characters
95% Confidence Interval for the Mean Durability: (1.1317, 1.3456) million characters
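
As a cross-check, the same 95% interval can be obtained from SciPy's built-in helpers instead of assembling it by hand. The sketch below assumes `stats.sem` (which uses `ddof=1` by default) and `stats.t.interval`; the confidence level is passed positionally because its keyword name differs between SciPy versions.

```python
import numpy as np
from scipy import stats

# Print-head durability (in millions of characters), same data as above
data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
                 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])

# Standard error of the mean; stats.sem uses ddof=1 by default
standard_error = stats.sem(data)

# Two-sided 95% t-interval centred on the sample mean, with n - 1 degrees of freedom
ci_low, ci_high = stats.t.interval(0.95, len(data) - 1,
                                   loc=np.mean(data), scale=standard_error)

print(f"95% CI via stats.t.interval: ({ci_low:.4f}, {ci_high:.4f}) million characters")
```

If the manual calculation is correct, both approaches should agree to within rounding.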


In [2]:
# prompt: a. Build 99% Confidence Interval Using Sample Standard Deviation
# Assuming the sample is representative of the population, construct a 99% confidence interval for the mean number of characters printed before the print-head fails using the sample standard deviation. Explain the steps you take and the rationale behind using the t-distribution for this task.

import numpy as np
from scipy import stats

# Data of print-head durability (in millions of characters)
data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])

# Sample size
n = len(data)

# Calculate the sample mean
sample_mean = np.mean(data)

# Calculate the sample standard deviation
sample_std = np.std(data, ddof=1)  # ddof=1 for sample standard deviation

# Degrees of freedom
degrees_of_freedom = n - 1

# Confidence level for 99%
confidence_level = 0.99

# t-critical value for a two-sided 99% interval: the 0.995 quantile leaves 0.5% in each tail
t_critical = stats.t.ppf((1 + confidence_level) / 2, degrees_of_freedom)

# Calculate the margin of error
margin_of_error = t_critical * (sample_std / np.sqrt(n))

# Calculate the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

# Print the results
print(f"Sample Mean: {sample_mean:.4f} million characters")
print(f"Sample Standard Deviation: {sample_std:.4f} million characters")
print(f"Degrees of Freedom: {degrees_of_freedom}")
print(f"t-critical value (99%): {t_critical:.4f}")
print(f"Margin of Error (99%): {margin_of_error:.4f} million characters")
print(f"99% Confidence Interval for the Mean Durability: ({confidence_interval[0]:.4f}, {confidence_interval[1]:.4f}) million characters")

# Explanation:
print("\nExplanation:")
print("1. **Why t-distribution?**")
print("   - We use the t-distribution because the population standard deviation is unknown and is estimated by the sample standard deviation.")
print("   - For approximately normal data, the standardized sample mean then follows a t-distribution with n-1 degrees of freedom. The difference from the normal distribution matters most for small samples such as n = 15.")
print("   - The t-distribution has heavier tails than the normal distribution, which accounts for the added uncertainty of estimating the standard deviation from the sample.")
print("2. **Steps:**")
print("   - **Calculate Sample Statistics:** We compute the sample mean and the sample standard deviation from the given data.")
print("   - **Determine Degrees of Freedom:** Degrees of freedom are calculated as n-1, where n is the sample size.")
print("   - **Find t-critical value:** We find the t-critical value for a two-sided interval at the 99% confidence level with the calculated degrees of freedom.")
print("   - **Compute Margin of Error:** This is the t-critical value multiplied by the standard error of the mean (sample standard deviation divided by the square root of the sample size).")
print("   - **Construct Confidence Interval:** The 99% confidence interval is the sample mean minus and plus the margin of error.")
print("3. **Interpretation**")
print("    - We are 99% confident that the true mean print-head durability (in millions of characters) lies within the interval calculated above.")


Sample Mean: 1.2387 million characters
Sample Standard Deviation: 0.1932 million characters
Degrees of Freedom: 14
t-critical value (99%): 2.9768
Margin of Error (99%): 0.1485 million characters
99% Confidence Interval for the Mean Durability: (1.0902, 1.3871) million characters

Explanation:
1. **Why t-distribution?**
   - We use the t-distribution because the population standard deviation is unknown and is estimated by the sample standard deviation.
   - For approximately normal data, the standardized sample mean then follows a t-distribution with n-1 degrees of freedom. The difference from the normal distribution matters most for small samples such as n = 15.
   - The t-distribution has heavier tails than the normal distribution, which accounts for the added uncertainty of estimating the standard deviation from the sample.
2. **Steps:**
   - **Calculate Sample Statistics:** We compute the sample mean and the sample standard deviation from the given data.
   - **Determine Degrees of Freedom:** Degrees of freedom are calculated as n-1, where n is the sample size.
   - **Find t-critical value:** We find the t-critical value for a two-sided interval at the 99% confidence level with the calculated degrees of freedom.
   - **Compute Margin of Error:** This is the t-critical value multiplied by the standard error of the mean (sample standard deviation divided by the square root of the sample size).
   - **Construct Confidence Interval:** The 99% confidence interval is the sample mean minus and plus the margin of error.
3. **Interpretation**
    - We are 99% confident that the true mean print-head durability (in millions of characters) lies within the interval calculated above.
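
To illustrate the extra uncertainty that the t-distribution carries, the sketch below compares the two-sided 99% t-critical value at several degrees of freedom with the corresponding standard normal value. The list of degrees of freedom is an arbitrary choice for illustration; only 14 comes from the print-head sample.

```python
from scipy import stats

confidence_level = 0.99
upper_quantile = (1 + confidence_level) / 2  # 0.995 for a two-sided 99% interval

# Critical value of the standard normal distribution, for reference
z_critical = stats.norm.ppf(upper_quantile)

# Illustrative degrees of freedom; 14 matches the n = 15 print-head sample
dfs = [4, 14, 29, 99, 999]
t_criticals = [stats.t.ppf(upper_quantile, df) for df in dfs]

for df, t_crit in zip(dfs, t_criticals):
    print(f"df = {df:4d}: t-critical = {t_crit:.4f}")
print(f"normal   : z-critical = {z_critical:.4f}")
```

The t-critical value is always larger than the z-critical value and shrinks toward it as the degrees of freedom grow, which is why the choice of distribution matters most for small samples.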

In [3]:
# prompt: b. Build 99% Confidence Interval Using Known Population Standard Deviation
# If it were known that the population standard deviation is 0.2 million characters, construct a 99% confidence interval for the mean number of characters printed before failure.

import numpy as np
from scipy import stats

# Data of print-head durability (in millions of characters)
data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29])

# Sample size
n = len(data)

# Calculate the sample mean
sample_mean = np.mean(data)

# Known population standard deviation
population_std = 0.2

# Confidence level
confidence_level = 0.99

# z-critical value for a two-sided 99% interval: the 0.995 quantile of the standard normal distribution
z_critical = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate the margin of error
margin_of_error = z_critical * (population_std / np.sqrt(n))

# Calculate the confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

# Print the results
print(f"Sample Mean: {sample_mean:.4f} million characters")
print(f"Population Standard Deviation: {population_std:.4f} million characters")
print(f"z-critical value (99%): {z_critical:.4f}")
print(f"Margin of Error (99%): {margin_of_error:.4f} million characters")
print(f"99% Confidence Interval for the Mean Durability: ({confidence_interval[0]:.4f}, {confidence_interval[1]:.4f}) million characters")

# Explanation:
print("\nExplanation:")
print("1. **Why z-distribution?**")
print("   - We use the standard normal (z) distribution because the population standard deviation is known, so it does not have to be estimated from the sample.")
print("   - The resulting interval is exact when the population is normal. For non-normal populations it relies on the central limit theorem and is only approximate for small samples.")
print("2. **Steps:**")
print("   - **Calculate Sample Statistics:** We compute the sample mean from the given data.")
print("   - **Determine z-critical value:** We find the z-critical value for a two-sided interval at the 99% confidence level.")
print("   - **Compute Margin of Error:** This is the z-critical value multiplied by the standard error of the mean (population standard deviation divided by the square root of the sample size).")
print("   - **Construct Confidence Interval:** The 99% confidence interval is the sample mean minus and plus the margin of error.")
print("3. **Interpretation**")
print("    - We are 99% confident that the true mean print-head durability (in millions of characters) lies within the interval calculated above.")


Sample Mean: 1.2387 million characters
Population Standard Deviation: 0.2000 million characters
z-critical value (99%): 2.5758
Margin of Error (99%): 0.1330 million characters
99% Confidence Interval for the Mean Durability: (1.1057, 1.3717) million characters

Explanation:
1. **Why z-distribution?**
   - We use the standard normal (z) distribution because the population standard deviation is known, so it does not have to be estimated from the sample.
   - The resulting interval is exact when the population is normal. For non-normal populations it relies on the central limit theorem and is only approximate for small samples.
2. **Steps:**
   - **Calculate Sample Statistics:** We compute the sample mean from the given data.
   - **Determine z-critical value:** We find the z-critical value for a two-sided interval at the 99% confidence level.
   - **Compute Margin of Error:** This is the z-critical value multiplied by the standard error of the mean (population standard deviation divided by the square root of the sample size).
   - **Construct Confidence Interval:** The 99% confidence interval is the sample mean minus and plus the margin of error.
3. **Interpretation**
    - We are 99% confident that the true mean print-head durability (in millions of characters) lies within the interval calculated above.
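
Since the t-based and z-based intervals differ only in the critical value and the standard deviation used, both can be sketched with one small helper. The function name `mean_confidence_interval` and its `population_std` parameter are hypothetical, introduced here only for illustration; they are not part of SciPy.

```python
import numpy as np
from scipy import stats

def mean_confidence_interval(data, confidence=0.99, population_std=None):
    """Two-sided CI for the mean (hypothetical helper).

    Uses the z-distribution when population_std is given,
    otherwise the t-distribution with the sample standard deviation.
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    mean = data.mean()
    upper_quantile = (1 + confidence) / 2
    if population_std is None:
        critical = stats.t.ppf(upper_quantile, n - 1)  # sigma unknown: t with n - 1 df
        std = data.std(ddof=1)
    else:
        critical = stats.norm.ppf(upper_quantile)      # sigma known: standard normal
        std = population_std
    margin = critical * std / np.sqrt(n)
    return mean - margin, mean + margin

data = [1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
        1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29]

t_interval = mean_confidence_interval(data)                      # part a
z_interval = mean_confidence_interval(data, population_std=0.2)  # part b

print(f"99% t-interval: ({t_interval[0]:.4f}, {t_interval[1]:.4f})")
print(f"99% z-interval: ({z_interval[0]:.4f}, {z_interval[1]:.4f})")
```

On this data the z-interval is narrower even though 0.2 is slightly larger than the sample standard deviation, because the z-critical value is smaller than the t-critical value with 14 degrees of freedom.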