# Confidence intervals

**Confidence intervals (CIs)** come from a _confidence procedure_ that defines how many CIs will include the true parameter value $\theta$ if we repeatedly sampled from a population indefinitely. Below, we generate 400 samples of 30 observations from a single normal distribution with known parameters ($\mu = 5$; $\sigma=2.5$). We then estimate the parameters of the sampling distribution using the statistics of the samples. Our confidence procedure begins on `line 26` where we define what proportion of CIs that we want to include the true population parameter. We then use scipy's `stats.norm.ppf()` function to get the critical $z$ values that symmetrically cut off 5% of the probability from the normal distribution with mean of 0 and SD of 1. We then construct the CI by converting these $z$ values into the standard error of the sampling distribution.

Next, we empirically verify that our CIs work as the CI procedure intended. We create a CI for each individual sample mean, then plot it as a horizontal bar centered at the corresponding sample mean. We color CIs that include the population mean green and others red. We also count the number of green bars and print their proportion in the total number of bars.

The CI procedure defines the width of the generated CIs. Broader CIs are less informative because they include more of the uniformly distributed parameter space. Note also, that each individual CI comes from a single experiment, a single random data collection event. An individual CI does not necessarily mean that there is a $X\%$ chance that the true population mean is captured in it, because the **frequentist probability** is defined as a proportion of an occurrence of a random event over multiple repetitions. If we are interested in being right 95% of the time on average, then this CI procedure makes sense. If, however, we are interested in parameter values and precision, we cannot rely on CIs, because CIs do not inform us about those things. As a verification of this reasoning we can think about the fact that given the same confidence coefficient, different CI procedures can give CIs of different sizes. Moreover, sometimes you can be 100% certain that a (e.g. 50%) CI includes the true parameter value (when the likelihood interval is completely nested within a wider arbitrary confidence interval).

The **fundamental confidence fallacy** comes from confusing the term _confidence_ in a frequentist sense with _probability_ in a Bayesian sense (one could argue then that the Bayesian notion of probability is more intuitive). _Confidence_ stems from a frequentist repeated-sampling definition of probability. We can be confident about an outcome of a random event, _before_ it occurs. This confidence expresses our belief about the proportion of an occurrence of that random even had we repeated the experiment multiple times (but we rarely do). After we've sampled our data, the confidence is irrelevant to the actual result, because the result is either positive or negative and there is no more (frequentist) randomness. The distinction between confidence and probability is technical and slight, but confusing the two may lead to contradiction.

If we only care about the outcomes of the decisions based on the results of our experiments, we care about the _error rates_, and confidence intervals correspond to error rates quite well. Again, these error rates are proportions of errors over a repeated number of experiments. I think, in a world where we do not repeat the same experiment multiple number of times, it is difficult to go with frequentists' view of probability.

In [1]:
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt

plt.close()
fig = plt.figure('CIs', figsize=[4, 12])
ax = fig.add_subplot(111)

N = 50 # Sample size
K = 50 # Nnumber of samples

# 400 samples of size N
samples = np.random.normal(5, 2.5, N*K).reshape([K, N])

# Estimate the sampling distribution
x_bars = samples.mean(axis=1)
mu = x_bars.mean()
se = x_bars.std()

# Plot the sampling distribution
ax.hist(x_bars, bins=30, alpha=.7)
ax.axvline(mu, color='red', lw=3, alpha=.8)
ax.axvline(mu+se, color='red', ls='--', lw=1.5, alpha=.8)
ax.axvline(mu-se, color='red', ls='--', lw=1.5, alpha=.8)

# Construct confidence intervals
ci_size = .05
z_crit = stats.norm.ppf(1-(1-ci_size)/2, 0, 1)
t_crit = stats.t.ppf(1-(1-ci_size)/2, df=N, loc=0, scale=1)
ci = t_crit*se

# Plot mean and CI for each sample and color green if it includes the "true" mean
includes = []
for i, sample in enumerate(samples):
    L = x_bars[i]-ci
    U = x_bars[i]+ci
    mean_in_ci = int((mu>L)&(mu<U))
    ax.plot([L, U], [i+1, i+1], color=['red','green'][mean_in_ci], alpha=.5)
    includes.append(mean_in_ci)
    np.random.randint(0, 1, size=None, dtype='l')

# How many CIs include the mean?
print(np.mean(includes))

# What are the type I and type II error rates?
includes = np.array(includes).astype(bool)
decisions = np.random.binomial(size=includes.size, n=1, p=1).astype(bool) # If we are 95% confident that we found a result, we should claim that we did in 95% of the studies

N_pos = np.sum(decisions)
N_false_pos = np.sum(decisions & ~includes)
type1_rate = N_false_pos / N_pos
print('Type I error =  {} / {} = {}'.format(N_false_pos, N_pos, type1_rate))

# N_neg = np.sum(~decisions)
# N_false_neg = np.sum(~decisions & includes)
# type2_rate = N_false_neg / N_neg
# print('Type II error =  {} / {} = {}'.format(N_false_neg, N_neg, type2_rate))


0.04
Type I error =  48 / 50 = 0.96


# Credible intervals

Bayesian credible intervals also express our confidence in finding a true parameter, but in a very different way. It is much more subjective. Probability in a Bayesian sense corresponds to the strength in a belief, relative to other beliefs. The probability that some interval contains the true parameter thus corresponds to how sure (or unsure) we are that this interval contains the true parameter. Sounds much more intuitive. The way Bayesian credible intervals (BCIs) are constructed demonstrates why we can interpret it that way. 

We begin with a set of prior beliefs with a known distribution of strengths in the (the prior probability distribution). We then collect the data, assess each of the prior belief's likelihood given the collected data and get an updated posterior distribution (beliefs and their strengths updated by likelihood). The posterior probability in a set of belief (a range of parameter values) is our Bayesian (not frequentist) confidence in that belief after collecting the data. If we take any interval that contains $X\%$ of probability of that posterior distribution, we can say that we are $X\%$ confident that this interval contains the true value of the parameter. 

# Cookie jar example

From Keith Weinstein's answer here: https://stats.stackexchange.com/questions/2272/whats-the-difference-between-a-confidence-interval-and-a-credible-interval

In [98]:
xk = np.arange(5)

A_pk = (0.01, 0.01, 0.70, 0.28, 0.00)
A = stats.rv_discrete(name='A', values=(xk, A_pk))

B_pk = (0.12, 0.19, 0.24, 0.20, 0.25)
B = stats.rv_discrete(name='B', values=(xk, B_pk))

C_pk = (0.13, 0.20, 0.00, 0.00, 0.67)
C = stats.rv_discrete(name='C', values=(xk, C_pk))

D_pk = (0.27, 0.70, 0.01, 0.01, 0.01)
D = stats.rv_discrete(name='D', values=(xk, D_pk))

h_space = [A, B, C, D]
prior = stats.rv_discrete(name='prior', values=([0,1,2,3], [1/4, 1/4, 1/4, 1/4]))

K = 1000
CP = {0: 'BC', 1: 'BD', 2: 'AB', 3: 'B', 4: 'C'} # Likelihood-based confidence procedure
CP = {0: 'BD', 1: 'BD', 2: 'A', 3: 'AB', 4: 'C'} # 
hits = []
for i in range(K):
    rand_jar = h_space[prior.rvs(1)-1]
    nb_chips = rand_jar.rvs(1)-1
    CI = CP[nb_chips]
    hits.append(rand_jar.name in CI)

print(sum(hits)/len(hits))


0.796
