<a href="https://colab.research.google.com/github/annakasper1/QNC/blob/main/Confidence_Intervals_and_Bootstrapping.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Exercise: Compute confidence/credible intervals using the four methods below for simulated data sampled from a Gaussian population with mean mu = 10 and standard deviation sigma = 2, for n = 5, 10, 20, 40, 80, 160, 1000, at a 95% confidence level.

**Method 1: Using Z-scores, which assume large n, Gaussian distribution, and/or known standard deviation (sigma).**

ChatGPT prompt: create a python script that will calculate the confidence interval using the knowledge that the mean = 10, the confidence level = 95%, and that the standard deviation = 2. The script should calculate the confidence intervals using the following list of sample sizes (n): 2, 5, 10, 20, 40, 80, 160, and 1000. Utilize the z-scores to calculate the confidence intervals, since the standard deviation is known.

The confidence interval obtained by the code for n = 80 was double-checked manually.
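
For reference, the n = 80 check can be reproduced by hand from the z-interval formula. This is a worked sketch with intermediate values rounded, using the usual rounded critical value z ≈ 1.96:

$$
\mathrm{SEM} = \frac{\sigma}{\sqrt{n}} = \frac{2}{\sqrt{80}} \approx 0.2236,
\qquad
\mathrm{CI} = \mu \pm z \cdot \mathrm{SEM} \approx 10 \pm 1.96 \times 0.2236 \approx 10 \pm 0.438 = (9.56,\ 10.44)
$$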


In [None]:
import math
import scipy.stats as stats

# Given values
mean = 10
std_dev = 2
confidence_level = 0.95
sample_sizes = [2, 5, 10, 20, 40, 80, 160, 1000]

# z-critical value for two-tailed 95% confidence interval
z_critical = stats.norm.ppf((1 + confidence_level) / 2)

# Loop through sample sizes
for n in sample_sizes:
    # Standard error of the mean
    sem = std_dev / math.sqrt(n)
    print(f"n={n:<5} SEM={sem:.2f}, Z-critical={z_critical:.2f}")  # Print the standard error of the mean and the critical value

    margin_of_error = z_critical * sem
    print(f"n={n:<5} Margin of error={margin_of_error:.2f}")  # Print the margin of error

    ci_lower = mean - margin_of_error
    ci_upper = mean + margin_of_error
    print(f"n={n:<5} CI=({ci_lower:.2f}, {ci_upper:.2f})")



n=2     SEM=1.41, Z-critical=1.96
n=2     Margin of error=2.77
n=2     CI=(7.23, 12.77)
n=5     SEM=0.89, Z-critical=1.96
n=5     Margin of error=1.75
n=5     CI=(8.25, 11.75)
n=10    SEM=0.63, Z-critical=1.96
n=10    Margin of error=1.24
n=10    CI=(8.76, 11.24)
n=20    SEM=0.45, Z-critical=1.96
n=20    Margin of error=0.88
n=20    CI=(9.12, 10.88)
n=40    SEM=0.32, Z-critical=1.96
n=40    Margin of error=0.62
n=40    CI=(9.38, 10.62)
n=80    SEM=0.22, Z-critical=1.96
n=80    Margin of error=0.44
n=80    CI=(9.56, 10.44)
n=160   SEM=0.16, Z-critical=1.96
n=160   Margin of error=0.31
n=160   CI=(9.69, 10.31)
n=1000  SEM=0.06, Z-critical=1.96
n=1000  Margin of error=0.12
n=1000  CI=(9.88, 10.12)
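
The script above centers every interval on the known population mean, so no data are actually simulated. The following is a minimal sketch of the same z-based calculation applied to simulated samples, still treating sigma = 2 as known. The helper name `z_interval` and the seed are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from scipy import stats


def z_interval(sample, sigma, confidence_level=0.95):
    """Two-sided z-interval for the mean, centered on the sample mean.

    Assumes the population standard deviation `sigma` is known.
    """
    sample = np.asarray(sample, dtype=float)
    sem = sigma / np.sqrt(sample.size)  # standard error of the mean
    z_critical = stats.norm.ppf((1 + confidence_level) / 2)
    center = sample.mean()
    return center - z_critical * sem, center + z_critical * sem


rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
for n in [2, 5, 10, 20, 40, 80, 160, 1000]:
    sample = rng.normal(loc=10, scale=2, size=n)  # simulated data
    lower, upper = z_interval(sample, sigma=2)
    print(f"n={n:<5} CI=({lower:.2f}, {upper:.2f})")
```

The interval widths match the table above, but the centers now move with each simulated sample, so individual intervals will not always contain 10.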


**Method 2: Using t-table values with Bessel's correction to minimize bias; assuming small n values.**

ChatGPT prompt: Create a python script that will calculate the confidence interval using the knowledge that the mean = 10, the confidence level = 95%, and that the standard deviation = 2. The script should calculate the confidence intervals using the following list of sample sizes (n): 2, 5, 10, 20, 40, 80, 160, and 1000. Assume that all sample sizes are considered small and use the t-table.

The confidence interval obtained by the code for n = 40 was double-checked manually.
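
For reference, the n = 40 check can be reproduced by hand. This is a worked sketch with intermediate values rounded, using the two-tailed 95% critical value for 39 degrees of freedom, t ≈ 2.023:

$$
\mathrm{SEM} = \frac{2}{\sqrt{40}} \approx 0.3162,
\qquad
\mathrm{CI} = 10 \pm t_{0.975,\,39} \cdot \mathrm{SEM} \approx 10 \pm 2.023 \times 0.3162 \approx 10 \pm 0.640 = (9.360,\ 10.640)
$$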


In [None]:
import math
import scipy.stats as stats

# Given values
mean = 10
std_dev = 2
confidence_level = 0.95
sample_sizes = [2, 5, 10, 20, 40, 80, 160, 1000]

# Loop through sample sizes
for n in sample_sizes:
    # Standard error of the mean
    sem = std_dev / math.sqrt(n)

    # Degrees of freedom
    df = n - 1

    # t-critical value for two-tailed 95% confidence
    t_critical = stats.t.ppf((1 + confidence_level) / 2, df)

    # Margin of error
    margin_of_error = t_critical * sem

    # Confidence interval
    ci_lower = mean - margin_of_error
    ci_upper = mean + margin_of_error

    print(f"n={n:<5} CI=({ci_lower:.3f}, {ci_upper:.3f})")


n=2     CI=(-7.969, 27.969)
n=5     CI=(7.517, 12.483)
n=10    CI=(8.569, 11.431)
n=20    CI=(9.064, 10.936)
n=40    CI=(9.360, 10.640)
n=80    CI=(9.555, 10.445)
n=160   CI=(9.688, 10.312)
n=1000  CI=(9.876, 10.124)
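
The script above plugs in the known population standard deviation, so Bessel's correction never comes into play. The following is a minimal sketch of a t-interval computed from a simulated sample, where the standard deviation is estimated with `ddof=1` (Bessel's correction). The helper name `t_interval` and the seed are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def t_interval(sample, confidence_level=0.95):
    """Two-sided t-interval for the mean, using the sample SD with Bessel's correction."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    sample_sd = sample.std(ddof=1)  # divide by n - 1 rather than n
    sem = sample_sd / np.sqrt(n)
    t_critical = stats.t.ppf((1 + confidence_level) / 2, df=n - 1)
    center = sample.mean()
    return center - t_critical * sem, center + t_critical * sem


rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
for n in [2, 5, 10, 20, 40, 80, 160, 1000]:
    sample = rng.normal(loc=10, scale=2, size=n)  # simulated data
    lower, upper = t_interval(sample)
    print(f"n={n:<5} CI=({lower:.3f}, {upper:.3f})")
```

Because both the center and the estimated standard deviation now vary with the sample, the widths no longer shrink perfectly smoothly with n, especially for the smallest samples.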


**Method 3: Bootstrapping; no assumptions about the population distribution.**

ChatGPT prompt: Switch to the bootstrapping method.

In [None]:
import numpy as np

# Given values
true_mean = 10
true_std = 2
confidence_level = 0.95
sample_sizes = [2, 5, 10, 20, 40, 80, 160, 1000]

# Number of bootstrap iterations
n_bootstrap = 10000

# Loop through sample sizes
for n in sample_sizes:
    # Generate one sample of size n from the normal population
    sample = np.random.normal(loc=true_mean, scale=true_std, size=n)

    # Bootstrap: resample with replacement and calculate means
    boot_means = []
    for _ in range(n_bootstrap):
        resample = np.random.choice(sample, size=n, replace=True)
        boot_means.append(np.mean(resample))

    # Compute confidence interval using percentiles
    alpha = (1 - confidence_level) / 2
    lower = np.percentile(boot_means, 100 * alpha)
    upper = np.percentile(boot_means, 100 * (1 - alpha))

    print(f"n={n:<5} CI=({lower:.3f}, {upper:.3f})")


n=2     CI=(11.308, 12.076)
n=5     CI=(8.975, 11.398)
n=10    CI=(8.357, 10.495)
n=20    CI=(8.861, 10.898)
n=40    CI=(9.729, 10.980)
n=80    CI=(9.430, 10.211)
n=160   CI=(9.507, 10.095)
n=1000  CI=(10.022, 10.268)
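
Because both the sample and the resamples are random, the intervals above change on every run. The following is a minimal sketch of the same percentile bootstrap, vectorized and driven by a seeded generator so that results can be reproduced. The helper name `bootstrap_ci` and the seed are illustrative assumptions.

```python
import numpy as np


def bootstrap_ci(sample, n_bootstrap=10_000, confidence_level=0.95, rng=None):
    """Percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng() if rng is None else rng
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # Draw all resample indices at once: one row per bootstrap replicate
    idx = rng.integers(0, n, size=(n_bootstrap, n))
    boot_means = sample[idx].mean(axis=1)
    alpha = (1 - confidence_level) / 2
    lower, upper = np.percentile(boot_means, [100 * alpha, 100 * (1 - alpha)])
    return lower, upper


rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
for n in [2, 5, 10, 20, 40, 80, 160, 1000]:
    sample = rng.normal(loc=10, scale=2, size=n)  # simulated data
    lower, upper = bootstrap_ci(sample, rng=rng)
    print(f"n={n:<5} CI=({lower:.3f}, {upper:.3f})")
```

The index matrix holds `n_bootstrap * n` entries, so for very large samples the explicit loop used above may be the more memory-friendly choice.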


**Method 4: Bayesian credible intervals; use Bayes' Rule.**

ChatGPT prompt: Switch to using Bayes' rule to calculate Bayesian credible intervals.

In [None]:
import numpy as np
import scipy.stats as stats

# Known values
true_mean = 10
true_std = 2
confidence_level = 0.95
sample_sizes = [2, 5, 10, 20, 40, 80, 160, 1000]

# Weak prior on the mean: Normal(mean=0, variance=1e6)
prior_mean = 0
prior_var = 1e6

alpha = (1 - confidence_level) / 2

for n in sample_sizes:
    # Simulate observed data from population
    sample = np.random.normal(loc=true_mean, scale=true_std, size=n)
    sample_mean = np.mean(sample)

    # Posterior variance
    post_var = 1 / (1/prior_var + n/(true_std**2))

    # Posterior mean
    post_mean = post_var * (prior_mean/prior_var + n*sample_mean/(true_std**2))

    # Posterior standard deviation
    post_std = np.sqrt(post_var)

    # Credible interval (percentiles of posterior Normal)
    lower = stats.norm.ppf(alpha, loc=post_mean, scale=post_std)
    upper = stats.norm.ppf(1-alpha, loc=post_mean, scale=post_std)

    print(f"n={n:<5} Bayesian CI=({lower:.3f}, {upper:.3f})")


n=2     Bayesian CI=(6.227, 11.771)
n=5     Bayesian CI=(8.151, 11.657)
n=10    Bayesian CI=(7.845, 10.324)
n=20    Bayesian CI=(9.560, 11.313)
n=40    Bayesian CI=(9.695, 10.934)
n=80    Bayesian CI=(9.760, 10.636)
n=160   Bayesian CI=(9.632, 10.252)
n=1000  Bayesian CI=(9.907, 10.155)
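
With a prior this weak, the posterior is dominated by the data, so the credible interval should nearly coincide with a z-interval centered on the sample mean. The following is a minimal sketch that checks this numerically, using the same conjugate Normal update as the script above. The helper name `normal_posterior_interval` and the seed are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def normal_posterior_interval(sample, sigma, prior_mean=0.0, prior_var=1e6,
                              confidence_level=0.95):
    """Credible interval for the mean under a Normal prior and known sigma."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # Conjugate update: precisions (inverse variances) add
    post_var = 1 / (1 / prior_var + n / sigma**2)
    post_mean = post_var * (prior_mean / prior_var + n * sample.mean() / sigma**2)
    alpha = (1 - confidence_level) / 2
    lower, upper = stats.norm.ppf([alpha, 1 - alpha],
                                  loc=post_mean, scale=np.sqrt(post_var))
    return lower, upper


rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
sample = rng.normal(loc=10, scale=2, size=20)  # simulated data

bayes_lower, bayes_upper = normal_posterior_interval(sample, sigma=2)

# z-interval centered on the same sample mean, for comparison
z_critical = stats.norm.ppf(0.975)
sem = 2 / np.sqrt(sample.size)
z_lower, z_upper = sample.mean() - z_critical * sem, sample.mean() + z_critical * sem

print(f"Bayesian: ({bayes_lower:.4f}, {bayes_upper:.4f})")
print(f"z-based:  ({z_lower:.4f}, {z_upper:.4f})")
```

A more informative prior (smaller `prior_var`) would pull the interval toward `prior_mean` and narrow it, most visibly for small n.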
