In statistics, a confidence interval (CI) is a type of interval estimate (of a population parameter) that is computed from the observed data. The confidence level is the frequency (i.e., the proportion) of possible confidence intervals that contain the true value of their corresponding parameter. In other words, if confidence intervals are constructed using a given confidence level in an infinite number of independent experiments, the proportion of those intervals that contain the true value of the parameter will match the confidence level.
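
The long-run interpretation above can be checked with a small simulation. The following is a minimal sketch, assuming normally distributed data and a t-based interval for the mean; the function name `coverage_rate` and all parameter values are illustrative choices, not part of the original notebook.

```python
import numpy as np
from scipy import stats

def coverage_rate(true_mean=10.0, true_sd=2.0, n=20, confidence=0.95,
                  n_experiments=5000, seed=0):
    """Fraction of simulated t-intervals that contain the true mean."""
    rng = np.random.default_rng(seed)
    # One row per simulated experiment, n observations each
    samples = rng.normal(true_mean, true_sd, size=(n_experiments, n))
    means = samples.mean(axis=1)
    sems = samples.std(axis=1, ddof=1) / np.sqrt(n)
    # Half-width of each interval: standard error times the t critical value
    h = sems * stats.t.ppf((1 + confidence) / 2, n - 1)
    covered = (means - h <= true_mean) & (true_mean <= means + h)
    return covered.mean()

rate = coverage_rate()
print(f"Observed coverage: {rate:.3f}")
```

With a finite number of experiments the observed proportion only approximates the nominal level; it should get closer as `n_experiments` grows.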

Confidence intervals consist of a range of values (interval) that act as good estimates of the unknown population parameter. However, the interval computed from a particular sample does not necessarily include the true value of the parameter. Since the observed data are random samples from the true population, the confidence interval obtained from the data is also random. If a corresponding hypothesis test is performed, the confidence level is the complement of the level of significance; for example, a 95% confidence interval reflects a significance level of 0.05. If it is hypothesized that a true parameter value is 0 but the 95% confidence interval does not contain 0, then the estimate is significantly different from zero at the 5% significance level.
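
The correspondence between a confidence interval and a hypothesis test can be illustrated for the mean of a sample. This sketch assumes a two-sided one-sample t-test and a t-based interval; the helper names and the two small data sets are made up for illustration.

```python
import numpy as np
from scipy import stats

def ci_excludes_value(data, value=0.0, confidence=0.95):
    """True if the t-based confidence interval for the mean excludes `value`."""
    data = np.asarray(data, dtype=float)
    low, high = stats.t.interval(confidence, data.size - 1,
                                 loc=data.mean(), scale=stats.sem(data))
    return not (low <= value <= high)

def t_test_rejects(data, value=0.0, alpha=0.05):
    """True if a two-sided one-sample t-test rejects mean == `value`."""
    return bool(stats.ttest_1samp(data, popmean=value).pvalue < alpha)

shifted = [0.8, 1.4, -0.2, 1.1, 0.9, 1.6]     # hypothetical data, mean well above 0
centered = [0.5, -0.7, 0.2, -0.1, 0.4, -0.3]  # hypothetical data, mean near 0

for name, data in [("shifted", shifted), ("centered", centered)]:
    print(name, ci_excludes_value(data), t_test_rejects(data))
```

For each data set the two functions should agree: the 95% interval excludes 0 exactly when the test rejects at the 5% level.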

The desired level of confidence is set by the researcher (not determined by data). Most commonly, the 95% confidence level is used. However, other confidence levels can be used, for example, 90% and 99%.

Factors affecting the width of the confidence interval include the size of the sample, the confidence level, and the variability in the sample. A larger sample size normally leads to a narrower interval, and hence a more precise estimate of the population parameter, while a higher confidence level or greater variability widens it.
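
To make these three factors concrete, here is a small sketch for the specific case of a t-based interval for a mean, whose full width is 2 · t · s / √n. The function name `ci_width` and the numbers passed to it are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

def ci_width(sample_sd, n, confidence=0.95):
    """Full width of a t-based confidence interval for the mean."""
    t_crit = stats.t.ppf((1 + confidence) / 2, n - 1)
    return 2 * t_crit * sample_sd / np.sqrt(n)

base = ci_width(sample_sd=2.0, n=25)                          # reference case
larger_n = ci_width(sample_sd=2.0, n=100)                     # more data
higher_conf = ci_width(sample_sd=2.0, n=25, confidence=0.99)  # stricter level
more_var = ci_width(sample_sd=4.0, n=25)                      # noisier sample

print(f"base: {base:.3f}")
print(f"larger n: {larger_n:.3f}")
print(f"higher confidence: {higher_conf:.3f}")
print(f"more variability: {more_var:.3f}")
```

Under these assumptions, quadrupling the sample size slightly more than halves the width (the critical value also shrinks), while doubling the standard deviation doubles it.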

Confidence intervals were introduced to statistics by Jerzy Neyman in a paper published in 1937.

In [1]:

# Install pip packages into the environment of the current Jupyter kernel
import sys

!{sys.executable} -m pip install --upgrade numpy
!{sys.executable} -m pip install --upgrade scipy
!{sys.executable} -m pip install --upgrade statsmodels

Requirement already up-to-date: numpy in c:\users\dave\anaconda3\lib\site-packages
Requirement already up-to-date: scipy in c:\users\dave\anaconda3\lib\site-packages
Requirement already up-to-date: numpy>=1.8.2 in c:\users\dave\anaconda3\lib\site-packages (from scipy)
Requirement already up-to-date: statsmodels in c:\users\dave\anaconda3\lib\site-packages


In [2]:

import numpy as np
from scipy.stats import sem, t
import statsmodels.stats.api as sms

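def mean_confidence_interval(data, confidence=0.95):
    """Return the sample mean and the bounds of its t-based confidence interval."""
    a = np.asarray(data, dtype=float)
    n = len(a)
    m, se = np.mean(a), sem(a)  # sample mean and its standard error
    # Half-width: standard error times the two-sided t critical value
    h = se * t.ppf((1 + confidence) / 2., n - 1)
    return m, m - h, m + h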

a = range(10, 14)


The underlying assumption for the t-based intervals computed below is that the sample (array a) was drawn independently from a normal distribution with unknown standard deviation (see MathWorld or Wikipedia).

For large sample size n, the sample mean is approximately normally distributed, and one can calculate its confidence interval using st.norm.interval() (where st refers to scipy.stats). The t-based solutions are also correct for small n, where st.norm.interval() gives confidence intervals that are too narrow (i.e., "fake confidence").
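
The difference for small n can be seen by computing both intervals on the same four-value sample used in this notebook. This is a sketch only; it calls `scipy.stats.norm.interval` and `scipy.stats.t.interval` with the sample mean and standard error as location and scale.

```python
import numpy as np
from scipy import stats

a = np.arange(10, 14, dtype=float)  # same four values as range(10, 14)
n = a.size
mean, se = a.mean(), stats.sem(a)

# Normal-based interval: ignores the extra uncertainty from estimating the SD
z_low, z_high = stats.norm.interval(0.95, loc=mean, scale=se)
# t-based interval with n - 1 degrees of freedom
t_low, t_high = stats.t.interval(0.95, n - 1, loc=mean, scale=se)

print(f"normal interval: ({z_low:.3f}, {z_high:.3f})")
print(f"t interval:      ({t_low:.3f}, {t_high:.3f})")
print(f"width ratio t/normal: {(t_high - t_low) / (z_high - z_low):.2f}")
```

With only four observations the t-based interval is noticeably wider; the two converge as n grows because the t distribution approaches the normal.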

Here is an example where the correct options give (essentially) identical confidence intervals:

In [3]:

t.interval(0.95, len(a)-1, loc=np.mean(a), scale=sem(a))

(9.4457397432391215, 13.554260256760879)

In [4]:

sms.DescrStatsW(a).tconfint_mean()

(9.4457397432391197, 13.55426025676088)

In [5]:

mean_confidence_interval(a)

(11.5, 9.4457397432391215, 13.554260256760879)

In [15]:

%pprint
from scipy import stats

[d for d in dir(stats) if isinstance(getattr(stats, d), stats.rv_continuous)]

Pretty printing has been turned OFF


['alpha', 'anglit', 'arcsine', 'argus', 'beta', 'betaprime', 'bradford', 'burr', 'burr12', 'cauchy', 'chi', 'chi2', 'cosine', 'crystalball', 'dgamma', 'dweibull', 'erlang', 'expon', 'exponnorm', 'exponpow', 'exponweib', 'f', 'fatiguelife', 'fisk', 'foldcauchy', 'foldnorm', 'frechet_l', 'frechet_r', 'gamma', 'gausshyper', 'genexpon', 'genextreme', 'gengamma', 'genhalflogistic', 'genlogistic', 'gennorm', 'genpareto', 'gilbrat', 'gompertz', 'gumbel_l', 'gumbel_r', 'halfcauchy', 'halfgennorm', 'halflogistic', 'halfnorm', 'hypsecant', 'invgamma', 'invgauss', 'invweibull', 'johnsonsb', 'johnsonsu', 'kappa3', 'kappa4', 'ksone', 'kstwobign', 'laplace', 'levy', 'levy_l', 'levy_stable', 'loggamma', 'logistic', 'loglaplace', 'lognorm', 'lomax', 'maxwell', 'mielke', 'nakagami', 'ncf', 'nct', 'ncx2', 'norm', 'pareto', 'pearson3', 'powerlaw', 'powerlognorm', 'powernorm', 'rayleigh', 'rdist', 'recipinvgauss', 'reciprocal', 'rice', 'semicircular', 'skewnorm', 't', 'trapz', 'triang', 'truncexpon', 'tru

In [2]:

%pprint
from scipy import stats

[d for d in dir(stats) if isinstance(getattr(stats, d), stats.rv_discrete)]

Pretty printing has been turned OFF


['bernoulli', 'binom', 'boltzmann', 'dlaplace', 'geom', 'hypergeom', 'logser', 'nbinom', 'planck', 'poisson', 'randint', 'skellam', 'zipf']