[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/PennNGG/Quantitative-Neuroscience/blob/master/Answers%20to%20Exercises/Python/Confidence%20Intervals%20and%20Bootstrapping%20Exercise%20Answers.ipynb)

# Getting Started with Code


Matlab code is found in the [NGG Statistics GitHub Repository](https://github.com/PennNGG/Statistics.git) under "Concepts/ConfidenceIntervals.m".


Python code is included below. First run the code cell just below to make sure all of the required Python modules are loaded, then you can run the other cell(s).

In [1]:
import scipy.stats as st
import numpy as np

# Exercises

Compute confidence/credible intervals based on the four methods described [here](https://github.com/PennNGG/Quantitative-Neuroscience/blob/master/Concepts/Python/Confidence%20Intervals%20and%20Bootstrapping.ipynb) for simulated data sampled from a population that is Gaussian distributed with mean $\mu$=10 and standard deviation $\sigma$=2, for *n*=5, 10, 20, 40, 80, 160, 1000 at a 95% confidence level.
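
The analytic methods all take the form "sample mean ± (critical value) × (standard error)". As a quick orientation, the sketch below (not part of the original notebook) tabulates the two-sided 95% critical values from the Gaussian and from the t distribution with n-1 degrees of freedom, along with the half-width of the known-$\sigma$ interval. The names `confidence` and `critical_values` are illustrative choices, not identifiers from the course code.

```python
import numpy as np
import scipy.stats as st

sigma = 2          # known population standard deviation from the exercise
confidence = 0.95  # illustrative name for the confidence level

def critical_values(n, confidence=0.95):
    """Return (z, t) two-sided critical values for a sample of size n."""
    upper_tail = 1 - (1 - confidence) / 2   # e.g., 0.975 for a 95% interval
    z = st.norm.ppf(upper_tail)             # Gaussian critical value
    t = st.t.ppf(upper_tail, df=n - 1)      # t critical value, n-1 degrees of freedom
    return z, t

for n in [5, 10, 20, 40, 80, 160, 1000]:
    z, t = critical_values(n, confidence)
    half_width = z * sigma / np.sqrt(n)     # half-width of the known-sigma interval
    print(f'n = {n:4d}: z = {z:.3f}, t = {t:.3f}, z*sigma/sqrt(n) = {half_width:.2f}')
```

The t critical value is always larger than the Gaussian one and shrinks toward it as n grows, which is why the t-based interval is wider for small samples.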

In [2]:
# Exercise: Compute confidence/credible intervals for simulated data sampled from a population that is Gaussian distributed with mean mu=10 and standard deviation sigma=2, for n=5, 10, 20, 40, 80, 160, 1000 at a 95% confidence level.
mu = 10
sigma = 2
alpha = 0.95  # confidence level (the corresponding significance level is 1-alpha)
num_bootstraps = 1000

# Loop through the ns. Note that the different approaches converge on the same answer as n gets large
for n in [5, 10, 20, 40, 80, 160, 1000]:
   
   # Simulate some data
   samples = np.random.normal(mu, sigma, n)
   
   # Save the mean
   sample_mean = np.mean(samples)
   
   # Show the mean, n
   print(f'n = {n}, mean = {sample_mean:.2f}')
   
   # Method 1: analytic solution assuming Gaussian
   #
   # Get the z-score for the given confidence level. The lower-tail quantile from ppf is negative, so negate it to get a positive multiplier that is subtracted for the lower bound and added for the upper bound
   z = -st.norm.ppf((1-alpha)/2)

   # 1a. Use the given sigma
   sem = sigma/np.sqrt(n)
   print(f'1a: CI=[{sample_mean-sem*z:.2f}, {sample_mean+sem*z:.2f}]')
      
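   # 1b. Use the sample sigma
   #     BEST IF n IS LARGE (>30)
   #     Use ddof=1 for the sample standard deviation (n-1 in the denominator); np.std defaults to ddof=0
   sem = np.std(samples, ddof=1)/np.sqrt(n)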
   print(f'1b: CI=[{sample_mean-sem*z:.2f}, {sample_mean+sem*z:.2f}]')
   
   # Method 2: analytic solution assuming t-distribution
   #      BEST IF n IS SMALL (<30) ... note that as n increases, the t distribution approaches a Gaussian and methods 1 and 2 become more and more similar

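   # Get the cutoff using the t distribution with n-1 degrees of freedom (negated, as above, to get a positive multiplier)
   t = -st.t.ppf((1-alpha)/2,df=n-1)
   # The t-based interval uses the sample standard deviation (ddof=1)
   sem = np.std(samples, ddof=1)/np.sqrt(n)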
   print(f'2 : CI=[{sample_mean-sem*t:.2f}, {sample_mean+sem*t:.2f}]')
   
   # Method 3: bootstrap!
   # Resample the data with replacement to get new estimates of mu 
   # Note that here we do not make any assumptions about the nature of the real distribution.
   mu_star = [np.mean(np.random.choice(samples, size=n, replace=True)) for _ in range(num_bootstraps)]
   
   # Now report the CI directly from the bootstrapped distribution
   print(f'3 : CI=[{np.percentile(mu_star, 100*(1-alpha)/2):.2f}, {np.percentile(mu_star, 100*(alpha+(1-alpha)/2)):.2f}]')
            
   # Method 4: Credible interval
   # See the Canvas discussion -- under these assumptions (i.e., data generated from a Gaussian distribution with known sigma), the answer is exactly the same as with Method 1a, above. Note that this equivalence is NOT true in general, which means that frequentist confidence intervals and Bayesian credible intervals can give different answers for certain distributions.
   
   # Formatting
   print('----')


n = 5, mean = 7.57
1a: CI=[5.81, 9.32]
1b: CI=[6.48, 8.65]
2 : CI=[6.02, 9.11]
3 : CI=[6.53, 8.60]
----
n = 10, mean = 10.04
1a: CI=[8.80, 11.28]
1b: CI=[9.22, 10.87]
2 : CI=[9.09, 10.99]
3 : CI=[9.21, 10.81]
----
n = 20, mean = 9.84
1a: CI=[8.96, 10.72]
1b: CI=[8.85, 10.83]
2 : CI=[8.79, 10.89]
3 : CI=[8.83, 10.77]
----
n = 40, mean = 9.85
1a: CI=[9.23, 10.47]
1b: CI=[9.23, 10.47]
2 : CI=[9.21, 10.49]
3 : CI=[9.23, 10.44]
----
n = 80, mean = 10.01
1a: CI=[9.57, 10.45]
1b: CI=[9.58, 10.45]
2 : CI=[9.57, 10.45]
3 : CI=[9.58, 10.44]
----
n = 160, mean = 10.08
1a: CI=[9.77, 10.39]
1b: CI=[9.79, 10.38]
2 : CI=[9.78, 10.38]
3 : CI=[9.78, 10.37]
----
n = 1000, mean = 9.94
1a: CI=[9.81, 10.06]
1b: CI=[9.81, 10.06]
2 : CI=[9.81, 10.06]
3 : CI=[9.81, 10.07]
----
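
The code above describes Method 4 only in a comment. A minimal sketch of that equivalence follows, assuming a flat (uniform) prior on the mean; the prior, the grid-based posterior, the seed, and the choice of n=20 are all assumptions made here for illustration and are not taken from the Canvas discussion. It evaluates the posterior over candidate means on a grid and compares the resulting equal-tailed 95% credible interval with the known-sigma interval from Method 1a.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)   # seeded generator so the sketch is repeatable
mu, sigma, n, confidence = 10, 2, 20, 0.95
samples = rng.normal(mu, sigma, n)
sample_mean = samples.mean()
sem = sigma / np.sqrt(n)         # standard error using the known sigma

# Frequentist interval with known sigma (Method 1a)
z = st.norm.ppf(1 - (1 - confidence) / 2)
ci_1a = (sample_mean - z * sem, sample_mean + z * sem)

# Posterior over candidate means on a grid, ASSUMING a flat prior,
# so the posterior is proportional to the likelihood
mu_grid = np.linspace(sample_mean - 6 * sem, sample_mean + 6 * sem, 20001)
log_like = st.norm.logpdf(samples[:, None], loc=mu_grid, scale=sigma).sum(axis=0)
posterior = np.exp(log_like - log_like.max())   # subtract the max for numerical stability
posterior /= posterior.sum()                    # normalize to sum to 1 on the grid

# Equal-tailed credible interval from the cumulative posterior
cdf = np.cumsum(posterior)
lower = mu_grid[np.searchsorted(cdf, (1 - confidence) / 2)]
upper = mu_grid[np.searchsorted(cdf, 1 - (1 - confidence) / 2)]

print(f'1a: CI=[{ci_1a[0]:.2f}, {ci_1a[1]:.2f}]')
print(f'4 : CI=[{lower:.2f}, {upper:.2f}]')
```

The two intervals agree up to the grid resolution. With an informative prior or a non-Gaussian likelihood, the credible interval would generally differ.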

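The "best if n is small" and "best if n is large" notes in the code can be checked empirically. The sketch below is an illustration rather than part of the original answers: it repeats the n=5 experiment many times and counts how often the z-based and t-based intervals, both built from the sample standard deviation, contain the true mean. The seed and `num_experiments` are arbitrary illustrative choices.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(1)   # seeded generator so the sketch is repeatable
mu, sigma, n, confidence = 10, 2, 5, 0.95
num_experiments = 20000          # illustrative number of repeated experiments

# Each row is one simulated experiment of n samples
data = rng.normal(mu, sigma, size=(num_experiments, n))
means = data.mean(axis=1)
sems = data.std(axis=1, ddof=1) / np.sqrt(n)   # sample standard error per experiment

upper_tail = 1 - (1 - confidence) / 2
z = st.norm.ppf(upper_tail)
t = st.t.ppf(upper_tail, df=n - 1)

# Fraction of experiments whose interval contains the true mean
coverage_z = np.mean(np.abs(means - mu) <= z * sems)
coverage_t = np.mean(np.abs(means - mu) <= t * sems)
print(f'z-based coverage: {coverage_z:.3f}')
print(f't-based coverage: {coverage_t:.3f}')
```

At n=5 the t-based interval should cover the true mean close to 95% of the time, while the z-based interval with the sample standard deviation falls noticeably short. Raising `n` brings the two together.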

# Credits

Copyright 2021 by Joshua I. Gold, University of Pennsylvania