# Test for confidence intervals

*SG2227 Saleh Rezaeiravesh and Philipp Schlatter*

Simple test for a confidence interval: estimate the population mean with the sample mean and check the CI for a given confidence level. We assume many samples, so we neglect the bias of the variance estimator.
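The bias in question is that of the variance estimator: `np.std` with `ddof=0` divides by $K$ instead of $K-1$. A minimal sketch, using synthetic normal data as a stand-in for the batch means (an assumption for illustration only), shows that the two standard-deviation estimates differ by the data-independent factor $\sqrt{K/(K-1)}$:

```python
import numpy as np

# Synthetic stand-in for K batch means (assumption for illustration only)
rng = np.random.default_rng(0)
K = 1000
batch_means = rng.normal(5.0, 0.1, K)

s0 = np.std(batch_means, ddof=0)   # square root of the 1/K (biased) variance
s1 = np.std(batch_means, ddof=1)   # square root of the 1/(K-1) (unbiased) variance

# The ratio depends only on K, not on the data
ratio = s1 / s0
print(f"ddof=1 / ddof=0: {ratio:.6f}")
print(f"sqrt(K/(K-1)):   {np.sqrt(K / (K - 1)):.6f}")
```

For $K = 1000$ the factor is about 1.0005, so using `ddof=0` narrows the interval by only about 0.05%, far below the sampling noise of the coverage test.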

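Note that this is the correct way to check a confidence interval: the whole procedure (sampling, estimation, interval construction) is repeated many times, and we count how often the resulting interval contains the true mean. The idea is that in $100\gamma$% of the repetitions, the true mean lies within the bounds computed by the same procedure.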
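The repetition idea can be sketched as a small function. The name `coverage` and its parameters are hypothetical; beyond the original setup, it uses NumPy's `default_rng`, forms the batch means with `reshape`, uses `ddof=1`, and optionally replaces the normal quantile by a Student t quantile with $K-1$ degrees of freedom. With only a few batches the normal quantile is too small and the interval under-covers, which is why the test relies on having many batches.

```python
import numpy as np
from scipy.stats import norm, t

def coverage(K, M, W, confidence=0.95, use_t=False, mu=5.0, sigma=2.5, seed=0):
    """Fraction of W repetitions whose CI for the mean contains the true mu.

    Each repetition draws K*M uniform samples with mean mu and standard
    deviation sigma, forms K batch means of size M and builds the CI from them.
    """
    rng = np.random.default_rng(seed)
    if use_t:
        crit = t.ppf((1 + confidence) / 2, df=K - 1)  # Student t quantile
    else:
        crit = norm.ppf((1 + confidence) / 2)         # normal quantile
    hits = 0
    for _ in range(W):
        s = mu + np.sqrt(12) * sigma * (rng.random(K * M) - 0.5)
        batch_means = s.reshape(K, M).mean(axis=1)    # one mean per batch
        # ddof=1 matters for small K and matches the t distribution's assumptions
        half_width = crit * np.std(batch_means, ddof=1) / np.sqrt(K)
        if abs(batch_means.mean() - mu) < half_width:
            hits += 1
    return hits / W

# Only 4 batches: compare the normal and the Student t critical value
print("normal quantile:", coverage(K=4, M=50, W=2000))
print("t quantile:     ", coverage(K=4, M=50, W=2000, use_t=True))
```

With four batches the normal quantile gives a coverage clearly below 95%, while the t quantile stays close to it; for many batches the two quantiles practically coincide.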

In the test below, we estimate the probability

$$P[ \hat{\mu}-C < \mu < \hat{\mu}+C  ] = \gamma$$

with the confidence level $\gamma$. The interval from $\hat{\mu}-C$ to $\hat{\mu}+C$ is then the $100\gamma$% confidence interval for the population mean $\mu$.
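For reference, the quantities computed in the code cell can be written out explicitly. With $\bar{x}_i$ the mean of batch $i$ (of size $M$), $K$ the number of batches and $\Phi^{-1}$ the inverse CDF of the standard normal distribution (`norm.ppf` in SciPy), the estimate, the standard deviation of the batch means and the half-width are

$$\hat\mu = \frac{1}{K}\sum_{i=1}^{K}\bar{x}_i, \qquad \hat\sigma_b = \sqrt{\frac{1}{K}\sum_{i=1}^{K}\left(\bar{x}_i-\hat\mu\right)^2}, \qquad C = z_\gamma\,\frac{\hat\sigma_b}{\sqrt{K}}, \qquad z_\gamma = \Phi^{-1}\!\left(\frac{1+\gamma}{2}\right).$$

The factor $1/K$ in $\hat\sigma_b$ corresponds to `ddof=0`. By symmetry of the normal distribution, $z_\gamma = \left|\Phi^{-1}\!\left(\frac{1-\gamma}{2}\right)\right|$, which is how `t_crit` is computed; for $\gamma = 0.95$ this gives $z_\gamma \approx 1.96$.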


It is important to understand that:
- A 95% confidence level does not mean that 95% of the sample data lie within the confidence interval.
- A 95% confidence level does not mean that, for a given realised interval, there is a 95% probability that the population parameter lies within the interval.
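The first point can be checked numerically. The sketch below assumes normally distributed data with the same `mu` and `sigma` as the test, an arbitrary sample size `n` and an arbitrary seed. It builds a single 95% interval for the mean and counts how many individual data points fall inside it:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu, sigma, n = 5.0, 2.5, 10_000
x = rng.normal(mu, sigma, n)          # one sample of n data points

z = norm.ppf(0.975)                   # about 1.96 for a 95% interval
C = z * np.std(x, ddof=1) / np.sqrt(n)
lo, hi = x.mean() - C, x.mean() + C   # 95% CI for the population mean

# Fraction of the individual data points that fall inside the CI
frac_inside = np.mean((x > lo) & (x < hi))
print(f"CI = [{lo:.3f}, {hi:.3f}], data inside: {100 * frac_inside:.1f}%")
```

Only a small percentage of the data lies inside the interval, because its width scales with $\sigma/\sqrt{n}$, not with $\sigma$.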


```python
import numpy as np
from scipy.stats import norm
from tqdm.auto import tqdm   # progress bar (notebook widget or console)

# set the confidence level
confidence = 0.95

# do a total of W repetitions to check the CI
W = 200

# Divide N samples into K batches, each of size M.
K, M = 1000, 500
N = K * M

mu, sigma = 5, 2.5   # true mean and standard deviation

# Critical value for the given confidence level. With K = 1000 batch means
# (far more than ~30) the Student t distribution is practically normal,
# so we take the normal quantile directly: |Phi^-1((1-confidence)/2)| ~ 1.96.
t_crit = np.abs(norm.ppf((1 - confidence) / 2))

Wok = 0   # number of repetitions whose CI contains the true mean
for j in tqdm(range(W)):

    # create samples, either normal or uniform (both with mean mu, std sigma)
    # s = np.random.normal(mu, sigma, N)
    s = mu + np.sqrt(12) * sigma * (np.random.random_sample((N,)) - 0.5)

    # compute batch means: row i of the reshaped array is batch i
    smean = s.reshape(K, M).mean(axis=1)

    # mean and standard deviation over all batches. With K >> 30 batches the
    # bias of the ddof=0 estimator (a factor sqrt((K-1)/K)) is negligible.
    bmu = np.mean(smean)
    bsigma = np.std(smean, ddof=0)

    # half-width of the CI: standard error of bmu times the critical value
    CI = bsigma / np.sqrt(K) * t_crit

    # check whether the true mean lies inside the interval
    if bmu - CI < mu < bmu + CI:
        Wok += 1

print('True mean within %2.1f%%-CI: %2.1f%%' % (confidence * 100, Wok / W * 100))
```


True mean within 95.0%-CI: 93.5%
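Because only W = 200 repetitions are used (and no seed is set), the observed coverage is itself a random estimate that changes from run to run. Treating each repetition as an independent Bernoulli trial with success probability $\gamma$, its standard error is $\sqrt{\gamma(1-\gamma)/W}$. The sketch below evaluates this for the reported 93.5% and uses `scipy.stats.binom` to get the central 95% range of hit counts expected from a correctly calibrated interval:

```python
import numpy as np
from scipy.stats import binom

confidence, W = 0.95, 200
observed = 0.935          # coverage reported above (187 of 200 repetitions)

# Standard error of a coverage fraction estimated from W independent repetitions
se = np.sqrt(confidence * (1 - confidence) / W)
print(f"standard error: {100 * se:.2f} percentage points")
print(f"deviation: {(confidence - observed) / se:.2f} standard errors")

# Central 95% range of the number of hits if the true coverage is exactly 95%
lo, hi = binom.ppf([0.025, 0.975], W, confidence)
print(f"95% of runs give between {lo:.0f} and {hi:.0f} hits out of {W}")
```

With a standard error of about 1.5 percentage points, 93.5% lies within one standard error of 95%, so the result is consistent with the nominal confidence level; the spread shrinks like $1/\sqrt{W}$ as more repetitions are used.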
