In [1]:
import math
import scipy.stats as stats

In [71]:
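class Sample:
    """Summary statistics and confidence intervals for a one-dimensional sample.

    Each confidence_interval* method stores its results as attributes on the
    instance and returns self, so later calls overwrite shared attributes such
    as standard_error and margin_of_error.
    """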
    def __init__(self, data, population_variance=None):
        self.data = data
        self.mean = sum(data) / len(data)
        self.std_dev = math.sqrt(sum((x - self.mean) ** 2 for x in data) / (len(data) - 1))
        self.sample_size = len(data)
        self.degrees_of_freedom = self.sample_size - 1
        self.population_variance = population_variance

    def confidence_interval(self, confidence=0.95):
        self.t_value = stats.t.ppf((1 + confidence) / 2, self.degrees_of_freedom)
        self.standard_error = self.std_dev / math.sqrt(self.sample_size)
        self.margin_of_error = self.t_value * self.standard_error
        self.lower_limit = self.mean - self.margin_of_error
        self.upper_limit = self.mean + self.margin_of_error
        
        return self
          
    def confidence_interval_known_var(self, confidence=0.95):
        if self.population_variance is None:
            raise ValueError("Population variance must be provided to use this method.")

        self.z_value = stats.norm.ppf((1 + confidence) / 2)
        self.standard_error = math.sqrt(self.population_variance) / math.sqrt(self.sample_size)
        self.margin_of_error = self.z_value * self.standard_error
        self.lower_limit = self.mean - self.margin_of_error
        self.upper_limit = self.mean + self.margin_of_error
        
        return self
    
    def confidence_interval_variance(self, confidence=0.95):
        self.sample_variance = self.std_dev ** 2
        self.chi_squared_lower = stats.chi2.ppf((1 - confidence) / 2, self.degrees_of_freedom)
        self.chi_squared_upper = stats.chi2.ppf((1 + confidence) / 2, self.degrees_of_freedom)
        
        self.variance_lower_limit = (self.degrees_of_freedom * self.sample_variance) / self.chi_squared_upper
        self.variance_upper_limit = (self.degrees_of_freedom * self.sample_variance) / self.chi_squared_lower

        return self
    
    def confidence_interval_proportion(self, count, confidence=0.95):
        self.proportion = count / self.sample_size
        self.z_value = stats.norm.ppf((1 + confidence) / 2)
        self.standard_error = math.sqrt((self.proportion * (1 - self.proportion)) / self.sample_size)
        self.margin_of_error = self.z_value * self.standard_error
        self.proportion_lower_limit = self.proportion - self.margin_of_error
        self.proportion_upper_limit = self.proportion + self.margin_of_error
        
        return self


---

**Calculating Confidence Interval for Population Mean (µ)**

When calculating the confidence interval for the population mean (µ), the method you use depends on whether the population variance (σ²) is known or unknown. If σ² is known, the standardized sample mean follows the standard normal (Z) distribution. If σ² must be estimated from the sample, the standardized sample mean follows a t-distribution instead, which gives a wider interval.


---

**1. Unknown population variance:**

In this case, we use the sample standard deviation (s) as an estimate of the population standard deviation (σ) and the t-distribution to calculate the confidence interval. The t-distribution is a family of distributions indexed by the degrees of freedom (n-1), where n is the sample size. As the sample size increases, the t-distribution approaches the standard normal distribution.
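
To make the convergence concrete, the sketch below prints the two-sided 95% critical value of the t-distribution for several degrees of freedom next to the corresponding normal value. It assumes SciPy is available (the notebook already imports it); the particular degrees of freedom are an arbitrary choice for illustration.

```python
import scipy.stats as stats

# Two-sided 95% critical values: t for several degrees of freedom, and z.
z_crit = stats.norm.ppf(0.975)
dfs = [1, 5, 9, 30, 100, 1000]  # illustrative choices, not from the data
t_crits = [stats.t.ppf(0.975, df) for df in dfs]

for df, t_crit in zip(dfs, t_crits):
    print(f"df = {df:>4}: t = {t_crit:.3f}")
print(f"normal   : z = {z_crit:.3f}")
```

The t critical value is always larger than the z value, so for the same standard error a t-based interval is wider; the gap shrinks as the degrees of freedom grow.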

Follow these steps to calculate the confidence interval:

1. **Calculate the sample mean (x̄) and the sample standard deviation (s):**

    $$
    \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
    $$

    $$
    s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}
    $$

    where $x_i$ are the individual measurements, and $n$ is the sample size.

2. **Find the t-value that corresponds to the desired level of confidence (95% in this case) and the degrees of freedom (n-1):**

   For a 95% confidence interval and $n-1$ degrees of freedom, you can look up the t-value in a t-table or use a t-distribution calculator. The two-tailed t-value for a 95% confidence interval with $n-1$ degrees of freedom is denoted as $t_{n-1, 0.975}$.

3. **Calculate the standard error (SE):**

    $$
    SE = \frac{s}{\sqrt{n}}
    $$

4. **Calculate the margin of error (ME):**

    $$
    ME = t_{n-1, 0.975} \cdot SE
    $$

5. **Calculate the 95% confidence interval for the mean:**

    $$
    \text{Lower limit: } \bar{x} - ME
    $$

    $$
    \text{Upper limit: } \bar{x} + ME
    $$

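The 95% confidence interval for the mean µ is the range between the lower and upper limits. Informally, you can be 95% confident that the true population mean lies within this interval. More precisely, if the sampling were repeated many times, about 95% of the intervals built this way would contain µ.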
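
One way to check what "95%" means here is a small simulation: draw many samples from a population whose mean is known, build the t-interval for each, and count how often the interval contains the true mean. The sketch below uses only the standard library. The population parameters, the seed, and the number of trials are hypothetical values chosen for illustration, and the critical value is hard-coded as $t_{9, 0.975} \approx 2.262$ for samples of size 10.

```python
import math
import random
import statistics

random.seed(0)

TRUE_MEAN = 20.0  # hypothetical population mean
TRUE_SD = 5.0     # hypothetical population standard deviation
N = 10            # sample size, so n - 1 = 9 degrees of freedom
T_CRIT = 2.262    # t_{9, 0.975}, rounded to three decimals
TRIALS = 2000     # number of simulated samples

covered = 0
for _ in range(TRIALS):
    draws = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    xbar = statistics.fmean(draws)
    se = statistics.stdev(draws) / math.sqrt(N)  # stdev divides by n - 1
    if xbar - T_CRIT * se <= TRUE_MEAN <= xbar + T_CRIT * se:
        covered += 1

coverage = covered / TRIALS
print(f"Intervals containing the true mean: {coverage:.1%}")
```

The printed share should land close to 95%, with some variation from the random draws.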

In [58]:
# Create a list of sample data 
data = [13, 17, 15, 23, 27, 29, 18, 27, 20, 24]

# Create a Sample object with the sample data 
sample = Sample(data)

# Calculate the confidence interval; the method stores the results on the Sample object and returns it
ci_object = sample.confidence_interval()

# Print the values in a readable format
print(f"Sample Mean: {ci_object.mean:.2f}")
print(f"t-value: {ci_object.t_value:.2f}")
print(f"Standard Error: {ci_object.standard_error:.2f}")
print(f"Margin of Error: {ci_object.margin_of_error:.2f}")
print(f"The 95% confidence interval for the population mean is: ({ci_object.lower_limit:.4f}, {ci_object.upper_limit:.4f})")


Sample Mean: 21.30
t-value: 2.26
Standard Error: 1.75
Margin of Error: 3.95
The 95% confidence interval for the population mean is: (17.3522, 25.2478)



---
**2. Known population variance (σ²):**

When the population variance is known, we can use the standard normal (Z) distribution to calculate the confidence interval. This is because, under the assumption of normality, the sample mean (x̄) follows a normal distribution with mean µ and variance σ²/n, where n is the sample size. 

Follow these steps to calculate the confidence interval:

1. **Calculate the sample mean (x̄):**

    $$
    \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
    $$

    where $x_i$ are the individual measurements, and $n$ is the sample size.

2. **Calculate the standard error (SE) using the known population variance:**

    $$
    SE = \frac{\sigma}{\sqrt{n}}
    $$

3. **Calculate the margin of error (ME) using the Z-score for the desired confidence level, where α = 1 − confidence (e.g., $Z_{0.025} \approx 1.96$ for a 95% confidence interval):**
    
    $$
    ME = Z_{\alpha/2} \cdot SE
    $$
    
4. **Calculate the confidence interval for the mean:**

    $$
    \text{Lower limit: } \bar{x} - ME
    $$

    $$
    \text{Upper limit: } \bar{x} + ME
    $$
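
The effect of the confidence level on the margin of error can be sketched with the standard library's `statistics.NormalDist`. The values σ² = 5 and n = 10 match this notebook's known-variance example; the three confidence levels are chosen only for illustration.

```python
import math
from statistics import NormalDist

sigma2 = 5  # known population variance, as in this notebook's example
n = 10      # sample size
se = math.sqrt(sigma2 / n)  # same as sigma / sqrt(n)

margins = {}
for confidence in (0.90, 0.95, 0.99):
    # Upper alpha/2 critical value of the standard normal distribution
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    margins[confidence] = z * se
    print(f"{confidence:.0%}: z = {z:.3f}, ME = {margins[confidence]:.3f}")
```

A higher confidence level requires a larger Z-score and therefore a wider interval for the same data.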

In [56]:
# Create a list of sample data and the known population variance
data = [13, 17, 15, 23, 27, 29, 18, 27, 20, 24]
population_variance = 5

# Create a Sample object with the sample data and population variance
sample = Sample(data, population_variance)

# Calculate the confidence interval; the method stores the results on the Sample object and returns it
ci_object = sample.confidence_interval_known_var()

# Print the values in a readable format
print(f"Sample Mean: {ci_object.mean:.2f}")
print(f"z-value: {ci_object.z_value:.2f}")
print(f"Standard Error: {ci_object.standard_error:.2f}")
print(f"Margin of Error: {ci_object.margin_of_error:.2f}")
print(f"The 95% confidence interval for the population mean is: ({ci_object.lower_limit:.4f}, {ci_object.upper_limit:.4f})")

Sample Mean: 21.30
z-value: 1.96
Standard Error: 0.71
Margin of Error: 1.39
The 95% confidence interval for the population mean is: (19.9141, 22.6859)


---

**In summary, when the population variance is known, we can use the standard normal (Z) distribution to calculate the confidence interval for µ, as the sample mean follows a normal distribution. When the population variance is unknown, we use the t-distribution to account for the uncertainty introduced by estimating the population standard deviation with the sample standard deviation. The 95% confidence interval for the mean µ is given by the range between the lower and upper limits. This means that you can be 95% confident that the true population mean lies within this interval.**


---

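**Calculating Confidence Interval for Population Variance (σ²)**

To calculate the confidence interval for the variance of a normally distributed population, we use the Chi-squared (χ²) distribution: under normality, $(n-1)s^2/\sigma^2$ follows a χ² distribution with $n-1$ degrees of freedom. The method `confidence_interval_variance` implements the following steps: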

1. **Calculate the sample variance (s²):**

    $$
    s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}
    $$

   where $x_i$ are the individual measurements, $n$ is the sample size, and $\bar{x}$ is the sample mean.

2. **Find the Chi-squared (χ²) values that correspond to the desired level of confidence (95% in this case) and the degrees of freedom (n-1):**

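   For a 95% confidence interval, $\alpha = 1 - 0.95 = 0.05$. With $n-1$ degrees of freedom, you can look up the χ² values in a Chi-squared table or use a Chi-squared distribution calculator. The two values needed are the $\frac{\alpha}{2}$ and $1 - \frac{\alpha}{2}$ quantiles of the distribution, denoted $\chi^2_{n-1, \frac{\alpha}{2}}$ and $\chi^2_{n-1, 1 - \frac{\alpha}{2}}$. Because the χ² distribution is not symmetric, the two values are not mirror images of each other, and the larger one is used for the lower limit.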

3. **Calculate the confidence interval for the variance (σ²):**

    $$
    \text{Lower limit: } \frac{(n-1) \cdot s^2}{\chi^2_{n-1, 1 - \frac{\alpha}{2}}}
    $$

    $$
    \text{Upper limit: } \frac{(n-1) \cdot s^2}{\chi^2_{n-1, \frac{\alpha}{2}}}
    $$

The confidence interval for the population variance (σ²) is given by the range between the lower and upper limits. This means that you can be 95% confident that the true population variance lies within this interval.

In [69]:
# Create a list of sample data
data = [2100, 2302, 1951, 2067, 2415, 1993, 2099, 2146, 2278, 2019]

# Create a Sample object with the sample data (no population variance is needed here)
sample = Sample(data)

# Calculate the confidence interval for the variance; the method returns the same Sample object
ci_object_var = sample.confidence_interval_variance()

# Print the values in a readable format
print(f"Sample Variance: {ci_object_var.sample_variance:.2f}")
print(f"Chi-squared lower value: {ci_object_var.chi_squared_lower:.2f}")
print(f"Chi-squared upper value: {ci_object_var.chi_squared_upper:.2f}")
print(f"The 95% confidence interval for the population variance is: ({ci_object_var.variance_lower_limit:.4f}, {ci_object_var.variance_upper_limit:.4f})")

Sample Variance: 22382.22
Chi-squared lower value: 2.70
Chi-squared upper value: 19.02
The 95% confidence interval for the population variance is: (10589.4159, 74596.6462)
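
An interval for the population standard deviation (σ) is often easier to read because it is in the units of the data. Since the square root is monotonic, one way to obtain it is to take the square roots of the variance limits. A minimal sketch, reusing the limits printed above; the variable names are illustrative and not part of the `Sample` class.

```python
import math

# Variance limits copied from the printed output of the variance example
var_lower, var_upper = 10589.4159, 74596.6462

# Square roots preserve the ordering, so they bound sigma with the same confidence
sd_lower = math.sqrt(var_lower)
sd_upper = math.sqrt(var_upper)
print(f"95% CI for the standard deviation: ({sd_lower:.1f}, {sd_upper:.1f})")
```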


---


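**Calculating Confidence Interval for a Population Proportion (p)**

When calculating the confidence interval for the proportion (p) of a binary characteristic, we can use the standard normal (Z) distribution as an approximation to the sampling distribution of the sample proportion. The approximation works best when the sample is large and p̂ is not close to 0 or 1. The steps are as follows: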

1. **Calculate the sample proportion (p̂):**

    $$
    \hat{p} = \frac{\text{count}}{n}
    $$

   where `count` is the number of occurrences of the binary characteristic in the sample, and `n` is the sample size.

2. **Find the Z-value that corresponds to the desired level of confidence (95% in this case):**

   For a 95% confidence interval, you can look up the Z-value in a Z-table or use a Z-distribution calculator. The two-tailed Z-value for a 95% confidence interval is denoted as $Z_{\alpha/2}$.

3. **Calculate the standard error (SE) for the proportion:**

    $$
    SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}
    $$

4. **Calculate the margin of error (ME) for the proportion:**

    $$
    ME = Z_{\alpha/2} \cdot SE
    $$

5. **Calculate the 95% confidence interval for the proportion:**

    $$
    \text{Lower limit: } \hat{p} - ME
    $$

    $$
    \text{Upper limit: } \hat{p} + ME
    $$

The 95% confidence interval for the proportion p is given by the range between the lower and upper limits. This means that you can be 95% confident that the true population proportion lies within this interval.
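
The steps above can also be sketched as a standalone function that works from the counts alone, using the standard library's `statistics.NormalDist` for the Z-value. The function name `proportion_interval` is hypothetical and not part of the `Sample` class; the inputs of 12 successes out of 30 match the example dataset in this notebook.

```python
import math
from statistics import NormalDist

def proportion_interval(count, n, confidence=0.95):
    """Normal-approximation interval for a proportion, following the steps above."""
    p_hat = count / n                                # step 1: sample proportion
    z = NormalDist().inv_cdf((1 + confidence) / 2)   # step 2: Z-value
    se = math.sqrt(p_hat * (1 - p_hat) / n)          # step 3: standard error
    margin = z * se                                  # step 4: margin of error
    return p_hat - margin, p_hat + margin            # step 5: interval limits

lower, upper = proportion_interval(count=12, n=30)
print(f"({lower:.4f}, {upper:.4f})")
```

This sketch does not clip the limits to the range [0, 1], so for very small samples or extreme proportions the limits can fall outside it.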

In [73]:
# Create a dataset with binary characteristic (1 for success, 0 for failure)
data = [1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1,1,0,0,0,1]

# Instantiate a Sample object with the dataset
sample = Sample(data)

# Modify the sample object with the confidence_interval_proportion method
count = sum(data)  # Count the number of successes (1s) in the dataset
sample.confidence_interval_proportion(count)

# Print out the internal values of the method
print("Proportion:", sample.proportion)
print("Z-value:", sample.z_value)
print("Standard Error:", sample.standard_error)
print("Margin of Error:", sample.margin_of_error)
print("Proportion Lower Limit:", sample.proportion_lower_limit)
print("Proportion Upper Limit:", sample.proportion_upper_limit)

Proportion: 0.4
Z-value: 1.959963984540054
Standard Error: 0.08944271909999159
Margin of Error: 0.1753045081153163
Proportion Lower Limit: 0.2246954918846837
Proportion Upper Limit: 0.5753045081153163
