In [1]:
import math
import scipy.stats as stats

The `Sample` class contains the `confidence_interval` method, which calculates a confidence interval (95% by default) for the population mean (µ), assuming the data are a random sample from a normally distributed population whose variance is unknown. It uses the following steps:

1. **Calculate the sample mean (x̄) and the sample standard deviation (s):**

    $$
    \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
    $$

    $$
    s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}
    $$

   where $x_i$ are the individual measurements, and $n$ is the sample size.

2. **Find the t-value that corresponds to the desired level of confidence (95% in this case) and the degrees of freedom (n-1):**

   For a 95% confidence interval and $n-1$ degrees of freedom, you can look up the t-value in a t-table or compute it with software. Because the interval is two-sided, the remaining 5% is split equally between the two tails, so the required value is the 0.975 quantile of the t-distribution, denoted $t_{n-1, 0.975}$. In SciPy this is `stats.t.ppf(0.975, n - 1)`.

3. **Calculate the standard error (SE):**

    $$
    SE = \frac{s}{\sqrt{n}}
    $$

4. **Calculate the margin of error (ME):**

    $$
    ME = t_{n-1, 0.975} \cdot SE
    $$

5. **Calculate the 95% confidence interval for the mean:**

    $$
    \text{Lower limit: } \bar{x} - ME
    $$

    $$
    \text{Upper limit: } \bar{x} + ME
    $$
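As a hand-worked check of the five steps above, the following sketch applies them to the example data used later in this notebook, using only the Python standard library. The critical value `t_crit` is hard-coded from a t-table for 9 degrees of freedom. This is an assumption that holds only for a sample of size 10; the class below obtains the value from `stats.t.ppf` instead.

```python
import math
from statistics import mean, stdev

data = [13, 17, 15, 23, 27, 29, 18, 27, 20, 24]
n = len(data)

# Step 1: sample mean and sample standard deviation (stdev uses the n - 1 divisor)
x_bar = mean(data)
s = stdev(data)

# Step 2: t_{n-1, 0.975} for n - 1 = 9 degrees of freedom, read from a t-table
# (assumption: hard-coded, so it is only valid when n == 10)
t_crit = 2.262157

# Step 3: standard error of the mean
se = s / math.sqrt(n)

# Step 4: margin of error
me = t_crit * se

# Step 5: lower and upper limits of the interval
lower, upper = x_bar - me, x_bar + me

print(f"mean = {x_bar:.4f}, s = {s:.4f}, SE = {se:.4f}, ME = {me:.4f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

The limits should agree with the output of `sample.confidence_interval()` further down, since both follow the same formulas.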

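The 95% confidence interval for the mean µ is the range between the lower and upper limits. The 95% describes the procedure rather than any single interval: if samples of size $n$ were drawn repeatedly and an interval computed from each one, about 95% of those intervals would contain the true population mean.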
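The repeated-sampling interpretation can be checked by simulation. This sketch draws many samples of size 10 from a hypothetical normal population (µ = 20 and σ = 5 are arbitrary choices), builds the t-interval from each sample with the formulas above, and counts how often the interval contains µ. The t critical value is again taken from a t-table for 9 degrees of freedom, an assumption tied to the sample size of 10.

```python
import math
import random

random.seed(0)

TRUE_MU, TRUE_SIGMA = 20.0, 5.0  # hypothetical population parameters
N = 10                           # sample size
T_CRIT = 2.262157                # t_{9, 0.975} from a t-table (valid only for N == 10)
TRIALS = 10_000

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(N)]
    # Sample mean and sample standard deviation (n - 1 divisor)
    x_bar = sum(sample) / N
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (N - 1))
    me = T_CRIT * s / math.sqrt(N)
    # Count the interval as a "hit" if it covers the true mean
    if x_bar - me <= TRUE_MU <= x_bar + me:
        hits += 1

coverage = hits / TRIALS
print(f"Fraction of intervals containing mu: {coverage:.3f}")
```

With 10,000 trials the observed fraction is typically within about half a percentage point of 0.95; the exact figure depends on the random seed.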

In [26]:
class Sample:
    """Confidence intervals for the mean of a sample drawn from a normal population."""

    def __init__(self, data, population_variance=None):
        if len(data) < 2:
            raise ValueError("At least two observations are needed to estimate the standard deviation.")
        self.data = data
        self.sample_size = len(data)
        self.degrees_of_freedom = self.sample_size - 1
        # Sample mean and sample standard deviation (divisor n - 1)
        self.mean = sum(data) / self.sample_size
        self.std_dev = math.sqrt(sum((x - self.mean) ** 2 for x in data) / self.degrees_of_freedom)
        self.population_variance = population_variance

    def confidence_interval(self, confidence=0.95):
        """Interval for mu when the population variance is unknown (t-distribution)."""
        t_value = stats.t.ppf((1 + confidence) / 2, self.degrees_of_freedom)
        standard_error = self.std_dev / math.sqrt(self.sample_size)
        margin_of_error = t_value * standard_error
        lower_limit = self.mean - margin_of_error
        upper_limit = self.mean + margin_of_error

        # Report the requested confidence level instead of a hard-coded 95%
        print(f"The {confidence * 100:g}% confidence interval for the population mean is: ({lower_limit:.4f}, {upper_limit:.4f})")

        return lower_limit, upper_limit

    def confidence_interval_known_var(self, confidence=0.95):
        """Interval for mu when the population variance is known (standard normal)."""
        if self.population_variance is None:
            raise ValueError("Population variance must be provided to use this method.")

        z_value = stats.norm.ppf((1 + confidence) / 2)
        standard_error = math.sqrt(self.population_variance) / math.sqrt(self.sample_size)
        margin_of_error = z_value * standard_error
        lower_limit = self.mean - margin_of_error
        upper_limit = self.mean + margin_of_error

        print(f"The {confidence * 100:g}% confidence interval for the population mean is: ({lower_limit:.4f}, {upper_limit:.4f})")

        return lower_limit, upper_limit

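When calculating the confidence interval for the population mean (µ), the method you use depends on whether the population variance (σ²) is known or unknown. The difference lies in the sampling distribution of the standardized sample mean: when σ is known it follows the standard normal (Z) distribution, while when σ is replaced by the sample estimate s it follows Student's t-distribution with $n-1$ degrees of freedom.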
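Assuming the observations are independent draws from a normal population, the two cases correspond to two pivotal quantities:

$$
Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)
\qquad
T = \frac{\bar{x} - \mu}{s / \sqrt{n}} \sim t_{n-1}
$$

Solving $-z_{1-\alpha/2} \le Z \le z_{1-\alpha/2}$ for $\mu$ gives $\bar{x} \pm z_{1-\alpha/2} \, \sigma / \sqrt{n}$; the same steps applied to $T$ give $\bar{x} \pm t_{n-1, 1-\alpha/2} \, s / \sqrt{n}$.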

1. **Known population variance (σ²):**

   When the population variance is known, we can use the standard normal (Z) distribution to calculate the confidence interval. This is because, under the assumption of normality, the sample mean (x̄) follows a normal distribution with mean µ and variance σ²/n, where n is the sample size. The standard error (SE) is calculated using the known population variance:

   $$
   SE = \frac{\sigma}{\sqrt{n}}
   $$

   The margin of error (ME) is calculated using the critical value of the standard normal distribution for the desired confidence level $1 - \alpha$ (e.g., $z_{0.975} \approx 1.96$ for a 95% confidence interval, where $\alpha = 0.05$):

   $$
   ME = z_{1-\alpha/2} \cdot SE
   $$

   Finally, the confidence interval is calculated as:

   $$
   \text{Lower limit: } \bar{x} - ME
   $$

   $$
   \text{Upper limit: } \bar{x} + ME
   $$
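A standard-library sketch of the known-variance interval, assuming the same data and σ² = 5 as the example in the next cell. Here `statistics.NormalDist().inv_cdf` supplies the z critical value in place of `stats.norm.ppf`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

data = [13, 17, 15, 23, 27, 29, 18, 27, 20, 24]
sigma2 = 5          # population variance, assumed known
confidence = 0.95

n = len(data)
x_bar = sum(data) / n
# Two-sided critical value: the (1 + confidence) / 2 quantile of N(0, 1)
z = NormalDist().inv_cdf((1 + confidence) / 2)
se = math.sqrt(sigma2) / math.sqrt(n)
me = z * se
lower, upper = x_bar - me, x_bar + me

print(f"z = {z:.4f}, SE = {se:.4f}, ME = {me:.4f}")
print(f"{confidence * 100:g}% CI: ({lower:.4f}, {upper:.4f})")
```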

In [31]:
# Example usage:
data = [13, 17, 15, 23, 27, 29, 18, 27, 20, 24]
population_variance = 5

known_var_sample = Sample(data, population_variance)
lower_limit, upper_limit = known_var_sample.confidence_interval_known_var()

The 95% confidence interval for the population mean is: (19.9141, 22.6859)


2. **Unknown population variance:**

   When the population variance is unknown, we use the sample standard deviation (s) as an estimate of the population standard deviation (σ). In this case, we cannot use the standard normal (Z) distribution directly, because the sample standard deviation is also an estimate with its own variability. Instead, we use the t-distribution, which accounts for the additional uncertainty introduced by using the sample standard deviation.

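   The t-distribution is a family of distributions indexed by the degrees of freedom ($n-1$), where $n$ is the sample size. Its tails are heavier than those of the standard normal, so $t_{n-1, 0.975}$ exceeds 1.96 and the resulting interval is wider; as $n$ increases, the t-distribution approaches the standard normal distribution.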

   The standard error (SE) is calculated using the sample standard deviation:

   $$
   SE = \frac{s}{\sqrt{n}}
   $$

   The margin of error (ME) is calculated using the t critical value for the desired confidence level $1 - \alpha$ and $n-1$ degrees of freedom:

   $$
   ME = t_{n-1, 1-\alpha/2} \cdot SE
   $$

   Finally, the confidence interval is calculated as:

   $$
   \text{Lower limit: } \bar{x} - ME
   $$

   $$
   \text{Upper limit: } \bar{x} + ME
   $$
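To see the t critical value approach the normal one, the sketch below computes $t_{n-1, 0.975}$ for several degrees of freedom without SciPy. The helpers `t_pdf`, `t_cdf`, and `t_quantile` are hypothetical names introduced here for illustration: the CDF is obtained by integrating the t density with Simpson's rule, and the quantile by bisection. It is a numerical approximation only; the class above relies on `stats.t.ppf`, which is the more robust choice in practice.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids the overflow math.gamma would hit for large df
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=1000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry of the density about 0."""
    if x == 0:
        return 0.5
    b = abs(x)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # integral of the density from 0 to |x|
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p, df):
    """Value t with P(T <= t) = p, for 0.5 < p < 1, found by bisection."""
    if not 0.5 < p < 1:
        raise ValueError("p must lie in (0.5, 1)")
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # grow the bracket until it contains the quantile
        lo, hi = hi, 2 * hi
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)
for df in (2, 5, 9, 30, 100, 1000):
    print(f"df = {df:4d}: t_(df, 0.975) = {t_quantile(0.975, df):.4f}   (z_0.975 = {z:.4f})")
```

The printed values decrease as `df` grows and get close to 1.96 for large `df`, which is why the t and z intervals nearly coincide for large samples.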

In summary, when the population variance is known, we can use the standard normal (Z) distribution to calculate the confidence interval for µ, as the sample mean follows a normal distribution. When the population variance is unknown, we use the t-distribution to account for the uncertainty introduced by estimating the population standard deviation with the sample standard deviation.

In [32]:
# Example usage:
data = [13, 17, 15, 23, 27, 29, 18, 27, 20, 24]


sample = Sample(data)
lower_limit, upper_limit = sample.confidence_interval()

The 95% confidence interval for the population mean is: (17.3522, 25.2478)
