***
**Computing Confidence Intervals in Python**
***

Question:
Ten randomly selected tablets of a dosage form with a labelled potency of 100 mg were assayed according to a quality control specification, giving the results below (in mg):

101.8, 102.6, 99.8, 104.9, 103.8, 104.5, 100.7, 106.3, 100.6, 105.0

- Calculate a 95% confidence interval for the mean potency of the batch from which the sample came

Some notes
***
- We took a small sample from the population
- We have the potency results of the sample
- The sample mean changes from sample to sample, so the small sample we have cannot tell us the mean potency of the population exactly.
- Because we don't have access to the entire population, we use the sample we have to estimate the mean potency of the entire population (the batch).
- A confidence interval quantifies this uncertainty. It is a range computed from the sample, constructed so that if we repeated the sampling many times, the stated proportion of those intervals (the confidence level, 95% in this case) would contain the true population mean.
***
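The repeated-sampling idea in the last note can be checked by simulation. The sketch below assumes a hypothetical normal population with mean 103 mg and standard deviation 2.2 mg. These values are chosen only for illustration and are not known properties of the batch. It draws many samples of 10 tablets, builds a 95% t-interval from each as x̄ ± t·s/√n (using `stats.t.ppf`, which gives the same interval as `stats.t.interval`), and counts how often the interval contains the true mean.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical population parameters (assumptions for illustration only)
TRUE_MEAN = 103.0   # mg
TRUE_SD = 2.2       # mg
N = 10              # sample size, as in the question
N_REPS = 10_000     # number of repeated samples
CONFIDENCE = 0.95

rng = np.random.default_rng(seed=0)

# Each row is one simulated sample of N tablets
samples = rng.normal(TRUE_MEAN, TRUE_SD, size=(N_REPS, N))
means = samples.mean(axis=1)                          # sample means vary from row to row
ses = samples.std(axis=1, ddof=1) / np.sqrt(N)        # standard error per sample

# Two-sided critical value from the t distribution with N - 1 degrees of freedom
t_crit = stats.t.ppf((1 + CONFIDENCE) / 2, df=N - 1)
lower = means - t_crit * ses
upper = means + t_crit * ses

# Fraction of intervals that contain the (known, simulated) true mean
coverage = np.mean((lower <= TRUE_MEAN) & (TRUE_MEAN <= upper))

print(f"Sample means range from {means.min():.2f} to {means.max():.2f} mg")
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The printed fraction should land close to 0.95. Individual intervals move around with each sample, but the procedure captures the true mean at about the stated rate.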
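Because our sample contains only 10 values and the population standard deviation is unknown (we have to estimate it from the sample), we use the t-interval instead of the z-interval. The z-interval is typically used when the population standard deviation is known, or as an approximation for larger samples (common rules of thumb put this at a sample size above roughly 30).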
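To see what the choice of t over z changes in practice, the sketch below compares the two-sided 95% critical values of the t distribution with 9 degrees of freedom and of the standard normal distribution. It also shows the half-widths each would give for this sample, using the sample standard deviation (`ddof=1`). The larger t value widens the interval to account for the extra uncertainty from estimating the standard deviation from only 10 values.

```python
import numpy as np
import scipy.stats as stats

sample = np.array([101.8, 102.6, 99.8, 104.9, 103.8, 104.5, 100.7, 106.3, 100.6, 105.0])
n = len(sample)
se = np.std(sample, ddof=1) / np.sqrt(n)  # estimated standard error of the mean

# Critical values for a two-sided 95% interval
t_crit = stats.t.ppf(0.975, df=n - 1)  # t distribution, n - 1 = 9 degrees of freedom
z_crit = stats.norm.ppf(0.975)         # standard normal distribution

print(f"t critical value (df={n - 1}): {t_crit:.3f}")  # 2.262
print(f"z critical value:           {z_crit:.3f}")     # 1.960

# Half-widths of the two intervals around the sample mean
t_half = t_crit * se
z_half = z_crit * se
print(f"t-interval half-width: {t_half:.3f} mg")
print(f"z-interval half-width: {z_half:.3f} mg")
```

With only 10 observations, the z-interval would be noticeably narrower than the t-interval and would overstate how precisely the sample pins down the mean.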

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats

In [4]:
sample = np.array([101.8, 102.6, 99.8, 104.9, 103.8, 104.5, 100.7, 106.3, 100.6, 105.0])

In [6]:
confidence = 0.95  # 95% confidence level
n = len(sample)  # sample size
df = n - 1  # degrees of freedom = n - 1 (sample size - 1)
sample_mean = np.mean(sample)
# sample standard deviation: ddof=1 divides by n - 1 (np.std's default, ddof=0, divides by n)
sample_std = np.std(sample, ddof=1)

# standard error of the mean = sample std / sqrt(n)
standard_error = sample_std / np.sqrt(n)

# now we compute the 95% confidence interval of the population mean:
confidence_interval = stats.t.interval(confidence, df, loc=sample_mean, scale=standard_error)


In [7]:
lower, upper = confidence_interval
print(f"95% CI: ({lower:.4f}, {upper:.4f})")

95% CI: (101.4133, 104.5867)

**Interpretation of this result**: If we took many samples of size n = 10 and computed a 95% confidence interval from each, about 95% (our confidence level) of those intervals would contain the true population mean. The interval from our sample, 101.41 mg to 104.59 mg, is one of them.

So we can be 95% confident that the mean potency of the batch lies between 101.41 mg and 104.59 mg. The true mean is a fixed value, so this particular interval either contains it or does not; the 95% describes the reliability of the procedure, not a probability for this one interval.

**What if we used a 99% confidence level instead?**

In [8]:
confidence_interval2 = stats.t.interval(0.99, df, loc=sample_mean, scale=standard_error)

In [9]:
lower2, upper2 = confidence_interval2
print(f"99% CI: ({lower2:.4f}, {upper2:.4f})")

99% CI: (100.7205, 105.2795)

Raising the confidence level widens the confidence interval. We become more confident that the interval contains the true mean, but the estimate is less precise, because the range of plausible values is larger.
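The interval returned by `stats.t.interval` is x̄ ± t·s/√n, where t is the (1 + confidence)/2 quantile of the t distribution with n - 1 degrees of freedom. The sketch below recomputes it by hand, using a hypothetical helper named `t_interval`, and checks the result against SciPy. It prints intervals at 90%, 95%, and 99% so that the widening described above is visible. The 90% level is an extra illustration that is not part of the original question.

```python
import numpy as np
import scipy.stats as stats

sample = np.array([101.8, 102.6, 99.8, 104.9, 103.8, 104.5, 100.7, 106.3, 100.6, 105.0])
n = len(sample)
mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(n)  # standard error using the sample SD

def t_interval(confidence):
    """Hypothetical helper: return (lower, upper) for mean +/- t * s / sqrt(n)."""
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return mean - t_crit * se, mean + t_crit * se

widths = {}
for level in (0.90, 0.95, 0.99):
    lower, upper = t_interval(level)
    widths[level] = upper - lower
    # Cross-check the hand calculation against SciPy's built-in helper
    ref_lower, ref_upper = stats.t.interval(level, n - 1, loc=mean, scale=se)
    assert np.isclose(lower, ref_lower) and np.isclose(upper, ref_upper)
    print(f"{level:.0%} CI: ({lower:.2f}, {upper:.2f}) mg, width {widths[level]:.2f} mg")
```

Only the critical value changes between levels; the sample mean and standard error stay the same, which is why the interval grows symmetrically around 103.0 mg.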