#### About

> Confidence intervals for means and proportions

A confidence interval is a range of values, computed from a sample, that is constructed to contain an unknown population parameter at a stated confidence level. A 95% confidence level means that if the sampling procedure were repeated many times, about 95% of the resulting intervals would contain the true parameter. Confidence intervals for means and proportions estimate, from a sample, the range within which the true population mean or proportion plausibly lies.
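The repeated-sampling interpretation can be checked with a small simulation. The following is a minimal sketch, not part of the original analysis: the population mean, standard deviation, sample size, and number of trials are all assumed values chosen for illustration.

```python
import numpy as np
from scipy import stats

# Assumed population and simulation settings (illustrative only)
rng = np.random.default_rng(0)
true_mean, true_sd, n, trials = 150, 10, 30, 2000

# Draw `trials` independent samples of size n (one sample per row)
samples = rng.normal(loc=true_mean, scale=true_sd, size=(trials, n))
means = samples.mean(axis=1)
std_errs = samples.std(axis=1, ddof=1) / np.sqrt(n)

# 95% t-interval for each sample
t_crit = stats.t.ppf(0.975, df=n - 1)
lower = means - t_crit * std_errs
upper = means + t_crit * std_errs

# Fraction of intervals that contain the true mean
coverage = np.mean((lower <= true_mean) & (true_mean <= upper))
print("Empirical coverage: {:.3f}".format(coverage))
```

The printed coverage should land close to 0.95, with some variation depending on the seed and the number of trials.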

- Confidence intervals for means: Suppose we want to estimate the average weight of all apples produced by a certain farm, but it is not feasible to weigh every apple. We can randomly select a sample of apples and compute the sample mean weight. The sample mean will generally differ from the true population mean, so we construct a confidence interval around it to express that uncertainty. Because the population standard deviation is unknown and must be estimated from the sample, the interval uses the t-distribution with n - 1 degrees of freedom.


In [1]:
import numpy as np
from scipy import stats

# Generate a random sample of 30 apple weights (simulated from a normal distribution)
np.random.seed(42)
sample = np.random.normal(loc=150, scale=10, size=30)

# Sample mean and sample standard deviation (ddof=1 uses the n - 1 denominator)
sample_mean = np.mean(sample)
sample_std = np.std(sample, ddof=1)

# Compute the 95% confidence interval for the true population mean weight
# using the critical value of the t-distribution with n - 1 degrees of freedom
t_crit = stats.t.ppf(0.975, df=len(sample)-1)
margin = t_crit * (sample_std / np.sqrt(len(sample)))
lower = sample_mean - margin
upper = sample_mean + margin
print("95% Confidence Interval: [{:.2f}, {:.2f}]".format(lower, upper))

95% Confidence Interval: [144.76, 151.48]
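As a cross-check, the same interval can be obtained from SciPy's `stats.t.interval` together with `stats.sem`. This is a sketch that regenerates the same seeded sample; the names `manual` and `ci` are introduced here for illustration only.

```python
import numpy as np
from scipy import stats

# Recreate the same seeded sample of apple weights
np.random.seed(42)
sample = np.random.normal(loc=150, scale=10, size=30)

n = len(sample)
sample_mean = np.mean(sample)
std_err = stats.sem(sample)  # sample standard deviation (ddof=1) / sqrt(n)

# Manual t-interval
t_crit = stats.t.ppf(0.975, df=n - 1)
manual = (sample_mean - t_crit * std_err, sample_mean + t_crit * std_err)

# Same interval from SciPy; the confidence level is passed positionally
# because the keyword name of this argument has differed across SciPy versions
ci = stats.t.interval(0.95, n - 1, loc=sample_mean, scale=std_err)

print("Manual:     [{:.2f}, {:.2f}]".format(*manual))
print("t.interval: [{:.2f}, {:.2f}]".format(*ci))
```

Both lines should print the same bounds, since `t.interval` centres the interval at `loc` and scales the t critical value by `scale`.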


- Confidence intervals for proportions: Suppose we want to estimate the proportion of customers who are satisfied with a certain product, but it is not feasible to survey all customers. We can randomly select a sample of customers and compute the sample proportion of satisfied customers. However, the sample proportion is likely to differ from the true population proportion. We can construct a confidence interval to estimate the range of values within which the true population proportion is likely to fall.

In [2]:
import numpy as np
from statsmodels.stats.proportion import proportion_confint

In [5]:
# Generate a random sample of 100 customer satisfaction ratings
# (0 = not satisfied, 1 = satisfied), each satisfied with probability 0.8
np.random.seed(42)
sample = np.random.binomial(n=1, p=0.8, size=100)

In [6]:
# Compute the sample proportion of satisfied customers
sample_prop = np.mean(sample)

# Compute the 95% confidence interval for the true population proportion of satisfied customers
ci = proportion_confint(count=np.sum(sample), nobs=len(sample), alpha=0.05)
print("95% Confidence Interval: [{:.2f}, {:.2f}]".format(ci[0], ci[1]))

95% Confidence Interval: [0.74, 0.90]
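By default, `proportion_confint` uses the normal approximation (`method='normal'`), often called the Wald interval: p̂ ± z · sqrt(p̂(1 - p̂) / n). The sketch below computes that formula by hand with the standard library only. The counts (82 satisfied out of 100) are hypothetical values chosen for illustration, not read from the sample above.

```python
import math
from statistics import NormalDist

# Hypothetical counts for illustration (assumption, not taken from the sample above)
count, nobs = 82, 100
alpha = 0.05

p_hat = count / nobs                              # sample proportion
z = NormalDist().inv_cdf(1 - alpha / 2)           # two-sided normal critical value
std_err = math.sqrt(p_hat * (1 - p_hat) / nobs)   # standard error of the proportion

lower = p_hat - z * std_err
upper = p_hat + z * std_err
print("95% Confidence Interval: [{:.2f}, {:.2f}]".format(lower, upper))
```

The normal approximation can be inaccurate when the sample is small or the proportion is close to 0 or 1; `proportion_confint` accepts other `method` values, such as `'wilson'`, for those cases.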
