- @Author: Carson Hanel
- Class  : STAT 489 Principles of Data Science and Statistics
- Prof.  : Alan Dabney, Ph.D.
- Label  : Confidence interval testing, creation of random experiments.

Random Experiment Generation:
----
First, import all necessary packages:

In [1]:
import numpy as np
import random

Below is a simulation of the operating characteristics of confidence intervals, that is, how often they actually contain the true parameter.

To begin:
---
- Statement     : "I'm 95% confident that $\mu$ is between a and b."
- Interpretation: "95% of all 95% CIs constructed this way will contain $\mu$."
 
Simulation:
---
- $n$: sample size
- $\mu$: population mean
- $\sigma$: population standard deviation
- $N$: number of simulations

Explanation: 
---
For each of $N$ times:
- Simulate a dataset of size $n$ from a normal distribution with parameters $\mu$ and $\sigma$.
- Compute the sample statistics and use them to construct a 95% confidence interval.
- Check and store whether $\mu$ lies inside the confidence interval.
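
The steps above can be sketched as a single function. This is a minimal sketch rather than the notebook's own code: the function name `simulate_coverage`, the `seed` argument, and the use of `np.random.default_rng` are assumptions introduced here for illustration, and the interval uses the normal critical value 1.96.

```python
import numpy as np

def simulate_coverage(n, mu, sigma, N, seed=0):
    """Estimate how often a 95% normal-approximation CI contains mu (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Each column is one simulated dataset of size n.
    data = rng.normal(loc=mu, scale=sigma, size=(n, N))
    y_bar = data.mean(axis=0)              # one sample mean per dataset
    s = data.std(axis=0)                   # divides by n (NumPy's default)
    half_width = 1.96 * s / np.sqrt(n)
    # True where the interval [y_bar - hw, y_bar + hw] contains mu.
    covered = (y_bar - half_width <= mu) & (mu <= y_bar + half_width)
    return covered.mean()

coverage = simulate_coverage(n=100, mu=5.0, sigma=2.0, N=1000)
print(coverage)
```

Because the generator is seeded, repeated calls with the same arguments return the same estimate; changing `seed` shows how much the estimate varies from run to run.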

In [2]:
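# N: number of simulated experiments; n: sample size of each experiment
N = 1000
n = 100

# Generate simulated data: each column is one dataset of n draws
# from the standard normal distribution (mu = 0, sigma = 1).
sim = np.random.standard_normal((n, N))
print(sim.shape)

# Show the top-left corner (first 2 rows, first 3 columns) of the simulated data.
sim[:2, :3]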

(100L, 1000L)


array([[-1.06490253,  1.7048808 , -0.61522383],
       [ 0.05755169,  0.16063181,  0.45558841]])

Computing Confidence Intervals:
----
For a given simulated dataset:

- $\bar{y} = \frac{1}{n}\sum_{i = 1}^{n} y_i$
- $s^2 = \frac{1}{n}\sum_{i = 1}^{n}(y_i - \bar{y})^2$
- 95% confidence interval: $\bar{y} \pm 1.96\,\frac{s}{\sqrt{n}}$
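
As a quick check, the sketch below computes the sample mean and the divide-by-$n$ variance by hand on a small made-up dataset and compares the result with NumPy. The array `y` is illustrative only. Note that `np.std` divides by $n$ by default (`ddof=0`); passing `ddof=1` gives the $n - 1$ version instead.

```python
import numpy as np

y = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up data
n = len(y)

y_bar = y.sum() / n
var_n = ((y - y_bar) ** 2).sum() / n           # divides by n
var_n1 = ((y - y_bar) ** 2).sum() / (n - 1)    # divides by n - 1

print(y_bar, var_n)
# The hand-computed values agree with NumPy for the matching ddof.
print(np.isclose(np.sqrt(var_n), np.std(y)))
print(np.isclose(np.sqrt(var_n1), np.std(y, ddof=1)))
```

For this dataset the mean is 5 and the divide-by-$n$ variance is 4. With $n = 100$, as in this notebook, the two divisors give nearly identical standard deviations.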


Now, we'll compute the 95% confidence intervals for our generated data:

In [4]:
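# Row i holds the lower and upper bounds of the CI for dataset i.
sim_ci = np.zeros((N, 2))
for i in range(N):
    sim_i   = sim[:, i]           # i-th simulated dataset (column i)
    x_bar_i = np.mean(sim_i)      # sample mean
    s_i     = np.std(sim_i)       # sample standard deviation (divides by n)
    half_width   = 1.96 * s_i / np.sqrt(n)
    sim_ci[i, 0] = x_bar_i - half_width
    sim_ci[i, 1] = x_bar_i + half_width

print(sim_ci.shape)
print(sim_ci[:5, :])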

(1000L, 2L)
[[-0.14656928  0.26078124]
 [-0.29011492  0.12910485]
 [-0.25896131  0.12693091]
 [-0.26386062  0.15472119]
 [-0.03330084  0.36579819]]
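
The loop above can also be written without explicit iteration. The following is a sketch of an equivalent vectorized computation, not the notebook's original cell. It regenerates its own data with a seeded generator (`np.random.default_rng(0)`, an assumption made here so the block runs on its own), so its numbers will differ from the output shown above.

```python
import numpy as np

N, n = 1000, 100
rng = np.random.default_rng(0)       # seeded for reproducibility (assumption)
sim = rng.standard_normal((n, N))    # each column is one dataset

x_bar = sim.mean(axis=0)             # one mean per simulated dataset
s = sim.std(axis=0)                  # one standard deviation per dataset
half_width = 1.96 * s / np.sqrt(n)

# Column 0 holds the lower bounds, column 1 the upper bounds.
sim_ci = np.column_stack((x_bar - half_width, x_bar + half_width))
print(sim_ci.shape)
```

Reducing along `axis=0` computes the statistic for every column at once, which avoids the Python-level loop over the $N$ datasets.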


Compute the proportion of 95% confidence intervals that contain $\mu$ (here $\mu = 0$, since the data are standard normal):

In [5]:
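mu = 0  # true mean of the standard normal data
cvrg = np.zeros(N)
for i in range(N):
    # Record a 1 when the i-th interval contains the true mean.
    if sim_ci[i, 0] <= mu <= sim_ci[i, 1]:
        cvrg[i] = 1
print(np.mean(cvrg))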

0.949
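
To judge whether 0.949 is close to the nominal 0.95, one rough check is the Monte Carlo standard error of a proportion estimated from $N$ independent simulations, $\sqrt{p(1 - p)/N}$. The sketch below is an added illustration that uses only the values reported above; the variable names are hypothetical.

```python
import math

N = 1000          # number of simulated intervals
p_hat = 0.949     # observed coverage from the run above
nominal = 0.95    # coverage the intervals are designed to achieve

# Standard error of a proportion based on N independent trials.
se = math.sqrt(nominal * (1 - nominal) / N)
# How many standard errors the observed coverage is from nominal.
z = (p_hat - nominal) / se
print(se, z)
```

The standard error is roughly 0.007, so an observed coverage of 0.949 lies well within one standard error of 0.95.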
