# Chapter 5. Estimating Counts

[Link to chapter online](https://allendowney.github.io/ThinkBayes2/chap05.html)


In this chapter, we’ll work on problems related to counting, or estimating the size of a population.

A reminder of Bayes’s Theorem:

$P(A|B) = \frac{P(A)P(B|A)}{P(B)}$

or

$P(H|D) = \frac{P(H)P(D|H)}{P(D)}$
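Here is a minimal Julia sketch of the theorem for a discrete set of hypotheses. The two hypotheses and their prior and likelihood values are made up for illustration. The point is that $P(D)$ is the sum of $P(H)P(D|H)$ over all hypotheses, so dividing by it makes the posterior probabilities sum to 1.

```julia
# Illustrative two-hypothesis example (values are hypothetical)
prior      = [0.5, 0.5]    # P(H) for each hypothesis
likelihood = [0.75, 0.5]   # P(D|H) for each hypothesis

# P(D) by the law of total probability: sum over hypotheses of P(H) P(D|H)
p_data = sum(prior .* likelihood)

# Bayes's Theorem, applied element-wise to every hypothesis at once
posterior = prior .* likelihood ./ p_data

println(posterior)
```

Here $P(D) = 0.375 + 0.25 = 0.625$, so the posterior is $[0.6, 0.4]$.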

## Warning

The content of this file may be incorrect, erroneous and/or harmful. Use it at your own risk.

## Imports

In [None]:
include("pmf.jl")                       # local helper module used throughout this notebook
import .ProbabilityMassFunction as Pmf  # short alias for the module defined in pmf.jl

## The Train Problem

I found the train problem in Frederick Mosteller’s [Fifty Challenging Problems
in Probability with
Solutions](https://store.doverpublications.com/0486653552.html):

> “A railroad numbers its locomotives in order 1…N. One day you see a locomotive
with the number 60. Estimate how many locomotives the railroad has.”

In [None]:
# names: hypothesized fleet sizes N = 1..1000; each value appears once in the
# sequence, so every N starts with the same prior probability
train = Pmf.getPmfFromSeq(1:1000 |> collect)

In [None]:
"""
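    updateCounts!(pmf, data)

Update `pmf` in place after observing locomotive number `data`.
The names of `pmf` are the hypothesized fleet sizes N.
"""
function updateCounts!(pmf::Pmf.Pmf{Int}, data::Int)

    # if the fleet has N locomotives, each number 1..N is equally likely
    # to be seen, so the likelihood of the observed number is 1/N
    likelihood::Vector{Float64} = 1 ./ pmf.names
    # a fleet smaller than the observed number cannot have produced it
    impossible::BitVector = data .> pmf.names
    likelihood[impossible] .= 0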
    Pmf.setLikelihoods!(pmf, likelihood)
    Pmf.updatePosteriors!(pmf, true)
    
    return nothing
end
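For readers without `pmf.jl`, the same update can be sketched with plain Julia vectors. The helper `update_train` is hypothetical (it is not part of `pmf.jl`). The sketch assumes, as the prior cell above appears to, a uniform prior over fleet sizes 1 to 1000.

```julia
"""
    update_train(hypos, prior, data)

Hypothetical helper (not part of pmf.jl): return the posterior over the
fleet sizes `hypos` after seeing locomotive number `data`.
"""
function update_train(hypos::AbstractVector{<:Integer},
                      prior::AbstractVector{<:Real},
                      data::Integer)
    # If the fleet has N locomotives, each number 1..N is seen with
    # probability 1/N; a number larger than N is impossible.
    likelihood = [data <= n ? 1 / n : 0.0 for n in hypos]
    unnorm = prior .* likelihood
    return unnorm ./ sum(unnorm)   # dividing by P(D) normalizes
end

hypos = collect(1:1000)                          # candidate fleet sizes N
prior = fill(1 / length(hypos), length(hypos))   # assumed uniform prior
posterior = update_train(hypos, prior, 60)
```

After seeing locomotive 60, every fleet size below 60 has probability zero, and the largest remaining probability is at N = 60.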

In [None]:
data = 60
updateCounts!(train, data)

In [None]:
Pmf.drawLinesPosteriors(
    train,
    "Posterior distribution\n(after observing train 60)",
    "Number of trains",
    "PMF"
)

In [None]:
Pmf.getNameMaxPosterior(train)

The most likely value is 60. That might not seem like a very good guess; after
all, what are the chances that you just happened to see the train with the
highest number? Nevertheless, if you want to maximize the chance of getting the
answer exactly right, you should guess 60.
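Why 60? With a uniform prior, $P(N)$ is the same for every hypothesis, so the posterior is proportional to the likelihood alone:

$P(N \mid D) \propto P(N)\,P(D \mid N) \propto \begin{cases} 1/N, & 60 \le N \le 1000 \\ 0, & \text{otherwise} \end{cases}$

Because $1/N$ shrinks as $N$ grows, the smallest feasible fleet size, $N = 60$, has the highest posterior probability.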

An alternative is to compute the mean of the posterior distribution. Given a
set of possible quantities, $q_i$, and their probabilities, $p_i$, the mean of
the distribution is:

$\text{mean} = \sum_{i=1}^{n} p_i q_i$
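As a worked case, apply the formula to the train posterior. There $p_N = \frac{1/N}{Z}$ for $60 \le N \le 1000$, with normalizing constant $Z = \sum_{k=60}^{1000} 1/k$, so the factors of $N$ cancel:

$\text{mean} = \sum_{N=60}^{1000} N \cdot \frac{1/N}{Z} = \frac{941}{\sum_{k=60}^{1000} 1/k}$

The numerator is 941 because there are $1000 - 60 + 1 = 941$ terms, each equal to 1. This is only a check on the arithmetic, not a different estimator; the cells below compute the same sum numerically.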

In [None]:
sum(train.posteriors .* train.names)

In [None]:
Pmf.getMeanPosterior(train)

The mean of the posterior is about 333, so that might be a good guess if you want to
minimize error. If you played this guessing game over and over, using the mean
of the posterior as your estimate would minimize the [mean squared
error](https://allendowney.github.io/ThinkBayes2/chap05.html) over the long run.
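The long-run claim can be checked numerically. For each candidate guess $g$, compute the expected squared error $\sum_N p_N (N - g)^2$ under the posterior, then see which guess makes it smallest. This sketch rebuilds the posterior with plain vectors (it does not call `pmf.jl`) and again assumes a uniform prior on 1 to 1000.

```julia
# Rebuild the train posterior: uniform prior on 1..1000 (a constant that
# cancels on normalization), likelihood 1/N for N >= 60 and 0 otherwise.
hypos = collect(1:1000)
unnorm = [n >= 60 ? 1 / n : 0.0 for n in hypos]
posterior = unnorm ./ sum(unnorm)

post_mean = sum(posterior .* hypos)

# Expected squared error of guessing g, averaged over the posterior
expected_sq_error(g) = sum(posterior .* (hypos .- g) .^ 2)

guesses = 1:1000
errors = [expected_sq_error(g) for g in guesses]
best_guess = guesses[argmin(errors)]

println(round(post_mean, digits=2), "  ", best_guess)
```

The best integer guess is the one closest to the posterior mean. This follows because the expected squared error equals the posterior variance plus $(\text{mean} - g)^2$, and only the second term depends on $g$.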