In [8]:
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
    
from thinkbayes import Pmf, Suite

#### "A railroad numbers its locomotives in order 1..N. One day you see a locomotive with the number 60. Estimate how many locomotives the railroad has."

From this statement we know the railroad has at least 60 locomotives. But how many more?    
Apply Bayesian reasoning:  

1) What did we know about N before we saw the data? - Use the **Prior**

2) For any given value of N, what is the likelihood of seeing the data (a locomotive with the number 60)? - Use the **Likelihood**

We don't have much on which to base a prior, but let's start with something simple and work from there.


In [9]:
# Assume N is equally likely to be any value from 1 to 1000.
# range() excludes its stop value, so the stop must be 1001.

hypos = range(1, 1001)

# Now the likelihood. If they have N locomotives and we are equally likely to see any of them,
# then the chance of seeing any particular one is 1 / N.

class Train(Suite):
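    def Likelihood(self, data, hypo):
        # hypo is a candidate N; data is the locomotive number we saw
        if hypo < data:
            # a railroad with fewer than `data` locomotives cannot have one numbered `data`
            return 0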
        else:
            return 1.0/hypo
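
To make the mechanics of the update explicit, here is a minimal sketch of the same calculation without `thinkbayes`. It assumes the update simply multiplies each prior probability by the likelihood and renormalizes; `update` and `train_likelihood` are hypothetical helper names, not part of the library.

```python
def update(prior, data, likelihood):
    """Return the normalized posterior as a dict {hypothesis: probability}."""
    unnormalized = {hypo: p * likelihood(data, hypo) for hypo, p in prior.items()}
    total = sum(unnormalized.values())
    return {hypo: p / total for hypo, p in unnormalized.items()}


def train_likelihood(data, hypo):
    """Chance of seeing locomotive number `data` if the railroad has `hypo` locomotives."""
    return 0.0 if hypo < data else 1.0 / hypo


# Uniform prior over N = 1..1000 (range excludes its stop value, hence 1001).
hypos = range(1, 1001)
prior = {n: 1.0 / len(hypos) for n in hypos}

posterior = update(prior, 60, train_likelihood)
posterior_mean = sum(n * p for n, p in posterior.items())
print(posterior_mean)
```

Under these assumptions the printed posterior mean comes out at about 333.4, and every hypothesis below 60 ends up with probability zero.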



In [10]:
# Now to update:
suite = Train(hypos)
b = suite.Update(60)

<img src="thinkbayesLoco.png">

There are too many hypotheses to list the posterior value by value, so it is easier to read from a plot (see the graph above, taken from the book).

As you can see, the most likely value is 60. But, what are the chances that you just happened to see the train with the highest number? Not very good. 
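
To put a number on "not very good", the sketch below rebuilds the posterior in plain Python and reads off the probability of the most likely value. It assumes the uniform prior on 1..1000 and the single observation of 60; `UPPER` and `SEEN` are names introduced here for illustration.

```python
UPPER = 1000  # assumed upper bound of the uniform prior
SEEN = 60     # the locomotive number we observed

# Unnormalized posterior: uniform prior times likelihood 1/N for N >= SEEN.
weights = {n: 1.0 / n for n in range(SEEN, UPPER + 1)}
total = sum(weights.values())
posterior = {n: w / total for n, w in weights.items()}

most_likely = max(posterior, key=posterior.get)
print(most_likely)                   # hypothesis with the highest posterior probability
print(posterior[most_likely])        # its probability
print(1.0 - posterior[most_likely])  # probability that N is larger than that
```

With these assumptions the peak at 60 carries well under 1% of the probability, so guessing 60 would almost always be too low.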

An alternative is to compute the mean of the posterior distribution:

In [11]:
print(suite.Mean())

333.41989326371095


In [12]:
# or the long way ....
def Mean(suite):
    # Posterior mean: each hypothesis weighted by its probability
    total = 0
    for hypo, prob in suite.Items():
        total += hypo * prob
    return total
print(Mean(suite))

333.41989326371095


### Maybe we should rethink that PRIOR

#### We need to: *Get more data* and/or *Get more background information*

First, check how much the posterior mean depends on the upper bound of the prior. Then suppose we also see trains 30 and 90, and update the distribution with all three observations:

In [19]:
# Use a prior with upper bound 500 (range(1, 500) covers N = 1..499)
hypos = range(1, 500)
suite = Train(hypos)
b = suite.Update(60)
print(suite.Mean())

# Perhaps 1000 was an unrealistic starting point. A lower upper bound gives quite a different posterior mean.

206.8038772952583


In [20]:
# Use a prior with upper bound 2000 (range(1, 2000) covers N = 1..1999)
hypos = range(1, 2000)
suite = Train(hypos)
b = suite.Update(60)
print(suite.Mean())

551.9730485662769


In [14]:
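# Re-create the suite with the original prior (N = 1..1000), then update with all three observations
hypos = range(1, 1001)
suite = Train(hypos)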

for data in [60, 30, 90]:
    suite.Update(data)
    print(suite.Mean())

333.41989326371095
178.5473531797158
164.3055864227336
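
To see why more data helps, the sketch below repeats the calculation for several upper bounds, once with the single observation and once with all three. It is a plain-Python approximation rather than the `thinkbayes` code: `posterior_mean` is a hypothetical helper, it assumes a uniform prior, and its bounds are inclusive (1..500, 1..1000, 1..2000), so its single-observation values differ slightly from cells that use `range(1, 500)` or `range(1, 2000)`.

```python
def posterior_mean(upper, observations):
    """Posterior mean of N for a uniform prior on 1..upper (inclusive)."""
    largest = max(observations)
    k = len(observations)
    # Each observation contributes a factor 1/N, so k observations give N**-k.
    # Hypotheses below the largest number seen have zero likelihood and are skipped.
    weights = {n: n ** -k for n in range(largest, upper + 1)}
    total = sum(weights.values())
    return sum(n * w for n, w in weights.items()) / total


for upper in (500, 1000, 2000):
    one = posterior_mean(upper, [60])
    three = posterior_mean(upper, [60, 30, 90])
    print(upper, one, three)
```

With one observation the means spread from roughly 207 to 552 across these bounds; with three observations they stay between about 150 and 175, so the choice of prior matters much less.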
