### Credible Intervals

Once you have computed a posterior distribution, it is often useful to summarize the results with a single point estimate or an interval. For point estimates it is common to use the mean, the median, or the most probable value (the one with the highest posterior probability, often loosely called the maximum likelihood value).
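As a rough illustration of the three point estimates, here is a minimal sketch that uses a plain dictionary instead of a thinkbayes `Pmf`. The toy `posterior` and the helper names `mean`, `median`, and `mode` are assumptions made up for this example; they are not part of the locomotive problem or of thinkbayes.

```python
# Hypothetical toy posterior: maps each value to its probability.
posterior = {10: 0.4, 20: 0.3, 30: 0.2, 40: 0.1}

def mean(pmf):
    """Probability-weighted average of the values."""
    return sum(val * prob for val, prob in pmf.items())

def median(pmf):
    """Smallest value whose cumulative probability reaches 0.5."""
    total = 0.0
    for val, prob in sorted(pmf.items()):
        total += prob
        if total >= 0.5:
            return val

def mode(pmf):
    """Value with the highest probability."""
    return max(pmf, key=pmf.get)

print(mean(posterior), median(posterior), mode(posterior))
```

For a skewed distribution like this one, the most probable value is lower than the mean and the median, which is one reason the choice of point estimate matters.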

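A **credible interval** is a range of values that contains the unknown quantity with a stated probability. For a 90% credible interval, there is a 90% chance that the unknown value falls between the two endpoints.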

To compute a 90% **credible interval**, add up the probabilities in the posterior distribution in order of increasing value, and record the values that correspond to the 5th and 95th percentiles.

We can write a ThinkBayes-style function that computes a percentile from a Pmf:

In [6]:
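def Percentile(pmf, percentage):
    """Return the smallest value whose cumulative probability reaches `percentage` (0-100)."""
    p = percentage / 100.0
    total = 0
    # Sort by value so the running total is a true cumulative probability,
    # regardless of the order in which items were added to the Pmf.
    for val, prob in sorted(pmf.Items()):
        total += prob
        if total >= p:
            return val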

Now import `Pmf` and `Suite` from thinkbayes and rebuild the locomotive suite of hypotheses so we can apply the `Percentile` function to it.

In [11]:
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
    
from thinkbayes import Pmf, Suite

In [12]:
# Taken from the first "Estimation" tutorial
class Dice(Suite): 
    def Likelihood(self, data, hypo):
        if hypo < data:
            return 0 
        else:
            return 1.0/hypo

# The likelihood function is the same in the Train as the Dice
class Train(Dice):
    def __init__(self, hypos, alpha = 1.0):  # Adding alpha to the arguments
        Pmf.__init__(self)
        for hypo in hypos:
            self.Set(hypo, hypo**(-alpha))  # adding in the power law here to alter the prior
        self.Normalize()

In [13]:
hypos = range(1, 1001) # PRIOR p(H)
suite = Train(hypos)

for data in [60, 30, 90]:
    suite.Update(data)

Now we can use the Percentile function we defined above.

In [10]:
# 90% credible interval: the 5th and 95th percentiles of the posterior
interval = Percentile(suite, 5), Percentile(suite, 95)
print(interval)

(91, 242)


For the locomotive problem, using a power law prior and three observed trains, the 90% credible interval (5th to 95th percentile) is (91, 242). The width of this range correctly reflects how uncertain we still are about the total number of trains.
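The same percentile idea extends to other interval widths. The sketch below is self-contained and uses a plain dictionary rather than a thinkbayes `Pmf`; the names `percentile` and `credible_interval` and the uniform toy posterior over 1 to 64 are assumptions for illustration only, not the locomotive posterior.

```python
def percentile(pmf, percentage):
    """Smallest value whose cumulative probability reaches `percentage` (0-100)."""
    p = percentage / 100.0
    total = 0.0
    for val, prob in sorted(pmf.items()):
        total += prob
        if total >= p:
            return val

def credible_interval(pmf, percentage=90):
    """Central interval: cut off equal probability in each tail."""
    tail = (100 - percentage) / 2.0
    return percentile(pmf, tail), percentile(pmf, 100 - tail)

# Hypothetical toy posterior: uniform over 1..64.
posterior = {n: 1 / 64 for n in range(1, 65)}

print(credible_interval(posterior, 90))
print(credible_interval(posterior, 50))
```

A narrower interval (for example 50%) covers fewer values, so there is a trade-off between how precise the interval looks and how much probability it captures.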

### Cumulative distribution functions

In the previous section we computed percentiles by iterating through the values and probabilities in a Pmf. If we need to compute more than a few percentiles, it is more efficient to use a cumulative distribution function (Cdf).

Cdfs and Pmfs are equivalent in the sense that they contain the same information about the distribution, and you can convert one to the other. The advantage of the Cdf is that you can compute percentiles more efficiently.

thinkbayes provides a `Cdf` class that represents a cumulative distribution function, and `Pmf` provides a method, `MakeCdf`, that builds the corresponding Cdf:

In [15]:
cdf = suite.MakeCdf()

# Cdf provides a method named Percentile
interval = cdf.Percentile(5), cdf.Percentile(95)

print(interval)

(91, 242)
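To make the Pmf/Cdf equivalence concrete, here is a minimal sketch built on plain Python containers. It illustrates the idea only and is not how thinkbayes implements its `Cdf` class; the helpers `make_cdf`, `make_pmf`, and `cdf_percentile` and the toy distribution are hypothetical. The Cdf is stored as sorted values plus running totals, so each percentile lookup is a binary search instead of a full pass over the distribution.

```python
import bisect
from itertools import accumulate

def make_cdf(pmf):
    """Convert {value: prob} into (sorted values, cumulative probabilities)."""
    vals = sorted(pmf)
    cum = list(accumulate(pmf[v] for v in vals))
    return vals, cum

def make_pmf(cdf):
    """Recover {value: prob} by differencing the cumulative probabilities."""
    vals, cum = cdf
    pmf, prev = {}, 0.0
    for val, c in zip(vals, cum):
        pmf[val] = c - prev
        prev = c
    return pmf

def cdf_percentile(cdf, percentage):
    """Binary search for the first value whose cumulative probability reaches the target."""
    vals, cum = cdf
    idx = bisect.bisect_left(cum, percentage / 100.0)
    return vals[min(idx, len(vals) - 1)]  # clamp to guard against rounding at 100%

# Hypothetical toy distribution.
pmf = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}
cdf = make_cdf(pmf)

print(cdf_percentile(cdf, 5), cdf_percentile(cdf, 95))
print(make_pmf(cdf) == pmf)
```

Building the cumulative totals costs one pass over the distribution, after which every percentile query takes logarithmic time, which is why a Cdf pays off when many percentiles are needed.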
