## Notebook Overview

This notebook demonstrates how to calculate a confidence interval for the mean of a dataset using a normal (z-score) approximation.

It covers the following steps:
1.  **Data Generation**: Creating a synthetic dataset from a normal distribution.
2.  **Sampling**: Taking a random sample from the generated dataset.
3.  **Confidence Interval Calculation (Manual)**: Manually computing the mean, standard deviation, margin of error, and the 95% confidence interval for a sample.
4.  **Confidence Interval Function**: Defining a reusable Python function `confidence_interval` that computes the interval for any dataset. It accepts a `confidence` argument, but the z-value is currently fixed at 1.96 (95%).
5.  **Function Application**: Applying the `confidence_interval` function to samples of different sizes (10 to 1,000 points) to observe how the interval narrows as the sample size grows.
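The calculation in steps 3 to 5 is the standard normal-approximation interval for a mean. Here $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size, and $\Phi^{-1}$ the standard normal quantile function, so a 95% level gives $z = \Phi^{-1}(0.975) \approx 1.96$:

$$
\bar{x} \pm z \cdot \frac{s}{\sqrt{n}}, \qquad z = \Phi^{-1}\!\left(\frac{1 + \text{confidence}}{2}\right)
$$

The term $s/\sqrt{n}$ is the standard error of the mean, which is why the interval shrinks as the sample grows.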

In [24]:
import numpy as np
import scipy.stats as stats

# Generate a synthetic population of 1,000 values from a normal distribution (mean 70, SD 10)
Data = np.random.normal(70, 10, 1000)

# Draw a random sample of 50 values from the population (with replacement, NumPy's default)
sample = np.random.choice(Data, 50)

# Calculate the sample mean and the sample standard deviation (ddof=1 divides by n - 1)
mean = np.mean(sample)
std = np.std(sample, ddof=1)
print("Mean:", mean)
print("Standard Deviation:", std)

# z-score for a two-sided 95% confidence interval (97.5th percentile of the standard normal)
z = 1.96
# Margin of error: z times the standard error of the mean
margin = z * (std / np.sqrt(len(sample)))

# Calculate the lower and upper bounds of the confidence interval
lower = mean - margin
upper = mean + margin

print("Confidence Interval:", lower, upper)

Mean: 72.76179726647696
Standard Deviation: 11.576143006586463
Confidence Interval: 69.55305413225949 75.97054040069443
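As a quick check on the output above, the bounds can be reproduced by hand from the printed mean and standard deviation. This is a minimal sketch that simply plugs those printed values (and n = 50) into the margin-of-error formula; the variable names are illustrative and chosen so they do not overwrite the notebook's `mean`, `std`, or `margin`.

```python
import math

# Printed sample statistics from the cell above (n = 50)
printed_mean = 72.76179726647696
printed_std = 11.576143006586463
n_sample = 50

# Margin of error = z * (s / sqrt(n)) with z = 1.96 for 95% confidence
margin_check = 1.96 * (printed_std / math.sqrt(n_sample))
print(round(margin_check, 3))  # 3.209
print(round(printed_mean - margin_check, 3), round(printed_mean + margin_check, 3))  # 69.553 75.971
```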


In [25]:
def confidence_interval(data, confidence=0.95):
    """
    Calculates a normal-approximation confidence interval for the mean of a dataset.

    Args:
        data (array-like): The dataset for which to calculate the confidence interval.
        confidence (float): The confidence level (e.g., 0.95 for a 95% CI).
            Note: currently ignored; the z-value is fixed at 1.96 (95%).

    Returns:
        tuple: The lower and upper bounds of the confidence interval.
    """
    data = np.asarray(data)
    mean = np.mean(data)
    std = np.std(data, ddof=1)  # sample standard deviation
    n = len(data)
    # z is defined locally instead of being read from the global set in the previous cell.
    # It is fixed at 1.96 (95%), so `confidence` has no effect yet; computing it as
    # stats.norm.ppf((1 + confidence) / 2) would generalize the function.
    z = 1.96
    margin = z * (std / np.sqrt(n))
    lower = mean - margin
    upper = mean + margin
    return lower, upper

# Call the function with a sample of 10 data points
confidence_interval(np.random.choice(Data,10))

(np.float64(62.33632332867761), np.float64(75.23527930473402))
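The comments in the function suggest deriving `z` from `confidence` with `scipy.stats.norm.ppf`. A minimal sketch of that generalization follows. `confidence_interval_z` is a hypothetical name (not part of the original notebook); the sketch assumes the normal approximation is adequate for the sample at hand and uses the sample standard deviation (`ddof=1`).

```python
import numpy as np
import scipy.stats as stats


def confidence_interval_z(data, confidence=0.95):
    """Normal-approximation CI for the mean, with z derived from `confidence`.

    Hypothetical helper: a sketch of the ppf-based generalization.
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    if n < 2:
        raise ValueError("need at least two observations")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    mean = data.mean()
    std = data.std(ddof=1)  # sample standard deviation
    # Two-sided critical value: e.g. 0.95 -> ppf(0.975), about 1.96
    z = stats.norm.ppf((1 + confidence) / 2)
    margin = z * std / np.sqrt(n)
    return mean - margin, mean + margin


# Usage: the same illustrative sample at three confidence levels
rng = np.random.default_rng(0)
demo_sample = rng.normal(70, 10, 50)
for level in (0.90, 0.95, 0.99):
    lo, hi = confidence_interval_z(demo_sample, level)
    print(f"{level:.0%}: ({lo:.2f}, {hi:.2f})")
```

Higher confidence levels produce wider intervals for the same data. Raising on fewer than two points avoids the undefined (NaN) standard deviation that `ddof=1` would otherwise return.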

In [14]:
confidence_interval(np.random.choice(Data,100))


(np.float64(68.9417627076407), np.float64(73.05406164115895))

In [20]:
confidence_interval(np.random.choice(Data,200))


(np.float64(68.84898261952391), np.float64(71.39563992131637))

In [22]:
confidence_interval(np.random.choice(Data,500))


(np.float64(68.20065248759362), np.float64(70.16305376813283))

In [23]:
confidence_interval(np.random.choice(Data,1000))

(np.float64(69.17739222892219), np.float64(70.40823321981655))
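To make the trend in these outputs concrete, the sketch below compares each observed interval width (upper minus lower, copied from the outputs above) with the width the formula predicts, 2 × 1.96 × σ / √n. It assumes the sample standard deviation is close to σ = 10, the value used to generate `Data`; any particular sample's SD will differ somewhat, so only rough agreement is expected.

```python
import math

# Widths of the intervals printed above (upper - lower), keyed by sample size
observed = {
    10: 75.23527930473402 - 62.33632332867761,
    100: 73.05406164115895 - 68.9417627076407,
    200: 71.39563992131637 - 68.84898261952391,
    500: 70.16305376813283 - 68.20065248759362,
    1000: 70.40823321981655 - 69.17739222892219,
}

# Predicted width, assuming the sample SD is close to the generating sigma of 10
sigma, z_95 = 10, 1.96
for n_obs, width in observed.items():
    expected = 2 * z_95 * sigma / math.sqrt(n_obs)
    print(f"n={n_obs:5d}  observed={width:6.2f}  expected~{expected:6.2f}")
```

Because the width scales as 1/√n, quadrupling the sample size roughly halves the interval.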

## Summary

The notebook first computes a 95% confidence interval step by step for a sample of 50 points, then wraps the calculation in a `confidence_interval` function and applies it to samples of 10, 100, 200, 500, and 1000 points. As the sample size grows, the interval narrows roughly in proportion to 1/√n, from a width of about 12.9 at n = 10 to about 1.2 at n = 1000. The `z` value is currently fixed at 1.96, which corresponds to a 95% confidence level only; to support other levels, `z` should be computed from `confidence` with `scipy.stats.norm.ppf((1 + confidence) / 2)`.
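Following the recommendation above, this short sketch lists the two-sided critical values that `scipy.stats.norm.ppf` gives for a few common confidence levels; any of them could replace the fixed 1.96. The loop variable is named `z_crit` so it does not overwrite the notebook's `z`.

```python
import scipy.stats as stats

# Two-sided critical values: z = ppf((1 + confidence) / 2)
for confidence in (0.90, 0.95, 0.99):
    z_crit = stats.norm.ppf((1 + confidence) / 2)
    print(f"{confidence:.0%} -> z = {z_crit:.3f}")
# 90% -> z = 1.645
# 95% -> z = 1.960
# 99% -> z = 2.576
```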