# Chapter 3. Distributions
[Link to chapter online](https://allendowney.github.io/ThinkBayes2/chap03.html)

A reminder of Bayes’s Theorem:

$P(A|B) = \frac{P(A)P(B|A)}{P(B)}$

or

$P(H|D) = \frac{P(H)P(D|H)}{P(D)}$
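The code in this chapter never computes $P(D)$ directly. It multiplies priors by likelihoods and then normalizes. That works when the hypotheses $H_1, \dots, H_n$ are mutually exclusive and collectively exhaustive (an assumption that holds for every example below). In that case the law of total probability gives the denominator:

$$P(D) = \sum_{i=1}^{n} P(H_i)\,P(D|H_i), \qquad P(H_j|D) = \frac{P(H_j)\,P(D|H_j)}{\sum_{i=1}^{n} P(H_i)\,P(D|H_i)}$$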

## Warning

The content of this file may be incorrect, erroneous and/or harmful. Use it at your own risk.

## Imports

In [None]:
import CairoMakie as Cmk
import StatsBase as Sb

## Functionality developed in this chapter

In [None]:
mutable struct Pmf{T}
    names::Vector{T} # names of hypotheses
    priors::Vector{Float64}
    likelihoods::Vector{Float64}
    posteriors::Vector{Float64}

    # a single generic constructor (instead of one per name type: Int, Float64, String);
    # likelihoods and posteriors start as zeros of the same length as names
    function Pmf(ns::Vector{T}, prs::AbstractVector{<:Real}) where T
        length(ns) == length(prs) ||
            error("length(names) must equal length(priors)")
        return new{T}(ns, prs, zeros(length(ns)), zeros(length(ns)))
    end
end

function Base.show(io::IO, pmf::Pmf)
    result = "names: $(join(pmf.names, ", "))\n"
    result = result * "priors: $(join(map(x -> round(x, digits=3) |> string, pmf.priors), ", "))\n"
    result = result * "likelihoods: $(join(map(x -> round(x, digits=3) |> string, pmf.likelihoods), ", "))\n"
    result = result * "posteriors: $(join(map(x -> round(x, digits=3) |> string, pmf.posteriors), ", "))\n"
    print(io, result)
end

In [None]:
function getPmfFromSeq(seq::Vector{T})::Pmf{T} where T
    probs::Dict{T, Float64} = Sb.proportionmap(seq)
    sortedKeys::Vector{T} = keys(probs) |> collect |> sort
    sortedVals::Vector{Float64} = [probs[k] for k in sortedKeys]
    return Pmf(sortedKeys, sortedVals)
end
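`getPmfFromSeq` relies on `StatsBase.proportionmap` to turn a sequence into probabilities. As a sketch of what that step amounts to, here is a Base-only version that counts each distinct value and divides by the length of the sequence. The helper name `proportionsSorted` is hypothetical; it returns sorted keys and their proportions rather than a `Pmf`, so the block stays self-contained.

```julia
# Hypothetical helper (not part of StatsBase or of this notebook):
# count each distinct value, then divide by the total number of items.
function proportionsSorted(seq::AbstractVector{T}) where T
    counts = Dict{T, Int}()
    for x in seq
        counts[x] = get(counts, x, 0) + 1
    end
    ks = sort(collect(keys(counts)))   # same ordering as getPmfFromSeq
    n = length(seq)
    return ks, [counts[k] / n for k in ks]
end

vals, probs = proportionsSorted(split("Mississippi", ""))
```

For "Mississippi" this gives the letters `M`, `i`, `p`, `s` with proportions 1/11, 4/11, 2/11 and 4/11.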

In [None]:
function getFieldValsEqName(pmf::Pmf{T}, name::T, fieldName::String, default) where T
    # index of the first hypothesis called `name`, or nothing if it is absent
    ind = findfirst(==(name), pmf.names)
    return isnothing(ind) ? default : getproperty(pmf, Symbol(fieldName))[ind]
end

In [None]:
function getPriorByName(pmf::Pmf{T}, name::T)::Float64 where T
    return getFieldValsEqName(pmf, name, "priors", 0.0)
end

function getPriorsByNames(pmf::Pmf{T}, names::Vector{T})::Vector{Float64} where T
    return map(n -> getPriorByName(pmf, n), names)
end

In [None]:
function updatePosteriors!(pmf::Pmf{T}, normalize::Bool=true) where T
    unnorms::Vector{Float64} = pmf.priors .* pmf.likelihoods
    if normalize
        pmf.posteriors = unnorms ./ sum(unnorms)
    else
        pmf.posteriors = unnorms
    end
end

function normalizePosteriors!(pmf::Pmf{T}) where T
    pmf.posteriors = pmf.posteriors ./ sum(pmf.posteriors)
end

function setLikelihoods!(pmf::Pmf{T}, likelihoods::Vector{Float64}) where T
    pmf.likelihoods = likelihoods
end
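`updatePosteriors!` multiplies each prior by its likelihood and then normalizes. A non-mutating sketch of the same step follows. The name `bayesUpdate` is hypothetical. Unlike `updatePosteriors!`, it also refuses data that have zero probability under every hypothesis; without that check the division by a zero sum produces `NaN`s.

```julia
# Hypothetical, non-mutating version of the update done by updatePosteriors!
function bayesUpdate(priors::AbstractVector{<:Real}, likelihoods::AbstractVector{<:Real})
    length(priors) == length(likelihoods) ||
        throw(ArgumentError("priors and likelihoods must have the same length"))
    unnorm = priors .* likelihoods
    total = sum(unnorm)   # P(D), the normalizing constant
    total > 0 ||
        throw(ArgumentError("the data have zero probability under every hypothesis"))
    return unnorm ./ total
end

post = bayesUpdate([0.5, 0.5], [0.75, 0.5])   # cookie problem, one vanilla cookie
```

Returning a new vector leaves the priors untouched. That can make successive updates easier to reason about than mutating fields in place.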

In [None]:
function drawLinesPosteriors(pmf::Pmf{T},
    title::String,
    xlabel::String,
    ylabel::String)::Cmk.Figure where T
    fig = Cmk.Figure()
    # plot the posteriors of the `pmf` argument (not of a global variable)
    ax1, l1 = Cmk.lines(fig[1, 1],
        pmf.names, pmf.posteriors, color="navy",
        axis=(;
            title=title,
            xlabel=xlabel,
            ylabel=ylabel,
        ))
    Cmk.axislegend(ax1,
        [l1],
        ["posterior"],
        position=:lt
        )
    return fig
end

In [None]:
function getIdMaxPosterior(pmf::Pmf)::Int
    # argmax returns the index of the first maximal element
    return argmax(pmf.posteriors)
end

function getNameMaxPosterior(pmf::Pmf{T})::T where T
    return pmf.names[getIdMaxPosterior(pmf)]
end

## Probability Mass Function

In [None]:
coin = getPmfFromSeq(["heads", "tails"])

In [None]:
die = getPmfFromSeq(collect(1:6))

In [None]:
letters = getPmfFromSeq(Vector{String}(split("Mississippi", "")))

In [None]:
getPriorByName(letters, "s")

In [None]:
getPriorByName(letters, "t")

In [None]:
getPriorsByNames(die, [1, 4, 7])
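`getPriorByName` returns 0.0 for an outcome that is not in the distribution. That is why the probability of `"t"` and of rolling a 7 come out as zero. The same lookup-with-default behaviour can be sketched with a plain `Dict` and Base's `get(dict, key, default)`. The dictionaries below are written out by hand: "Mississippi" has 11 letters (1 M, 4 i, 2 p, 4 s), and the die is assumed fair.

```julia
# letter probabilities for "Mississippi", written out by hand (11 letters in total)
letterPmf = Dict("M" => 1/11, "i" => 4/11, "p" => 2/11, "s" => 4/11)

# get returns the default when the key is missing, like getPriorByName's 0.0
probS = get(letterPmf, "s", 0.0)
probT = get(letterPmf, "t", 0.0)

# a fair six-sided die; looking up several outcomes mirrors getPriorsByNames
diePmf = Dict(k => 1/6 for k in 1:6)
dieProbs = [get(diePmf, k, 0.0) for k in [1, 4, 7]]
```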

## The Cookie Problem Revisited

> Suppose there are two bowls of cookies.
>
> Bowl 1 contains 30 vanilla cookies and 10 chocolate cookies.
>
> Bowl 2 contains 20 vanilla cookies and 20 chocolate cookies.
>
> Now suppose you choose one of the bowls at random and, without looking, choose a cookie at random. If the cookie is vanilla, what is the probability that it came from Bowl 1?

In [None]:
bowls2 = getPmfFromSeq(["bowl1", "bowl2"])

In [None]:
setLikelihoods!(bowls2, [30/40, 20/40]) # P(D|H)
updatePosteriors!(bowls2)
bowls2

And the answer is 0.6.
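The same number follows from Bayes's Theorem by hand. $P(D)$ is computed with the law of total probability. The variable names below are only labels for the terms of the formula.

```julia
pH = [0.5, 0.5]             # P(H): each bowl is equally likely
pDgivenH = [30/40, 20/40]   # P(D|H): chance of a vanilla cookie from each bowl
pD = sum(pH .* pDgivenH)    # P(D) = 0.5*0.75 + 0.5*0.5
pHgivenD = pH .* pDgivenH ./ pD
```

Here `pD` is 0.625, and the posterior for Bowl 1 is 0.375 / 0.625 = 0.6.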

One benefit of using `Pmf` objects is that it is easy to do successive updates with more data. For example, suppose you put the first cookie back (so the contents of the bowls don’t change) and draw again from the same bowl. If the second cookie is also vanilla, we can do a second update like this:

In [None]:
bowls2.posteriors .*= bowls2.likelihoods
normalizePosteriors!(bowls2)
bowls2
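Normalizing between the two updates does not change the result. Because the cookie is put back, the two draws are independent given the bowl. So two sequential updates should match a single update with the squared likelihoods. A small sketch of that check, using a hypothetical helper `normalizeProbs`:

```julia
normalizeProbs(v) = v ./ sum(v)   # hypothetical helper: rescale so the entries sum to 1

prior = [0.5, 0.5]
vanilla = [30/40, 20/40]

# two updates, one cookie at a time
sequential = normalizeProbs(normalizeProbs(prior .* vanilla) .* vanilla)

# one update with the joint likelihood of two vanilla cookies
batch = normalizeProbs(prior .* vanilla .^ 2)
```

Both give 9/13 ≈ 0.692 for Bowl 1.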

Now the posterior probability for Bowl 1 is almost 70%. But suppose we do the same thing again and get a chocolate cookie.

In [None]:
bowls2.posteriors .*= [10/40, 20/40] # P(D|H)
normalizePosteriors!(bowls2)
bowls2

Now the posterior probability for Bowl 1 is about 53%. After two vanilla cookies and one chocolate, the posterior probabilities are close to 50/50.
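The order of the cookies does not matter either. A chocolate cookie followed by two vanilla ones leads to the same posterior. The sketch below folds a hypothetical one-step update over both orders. For Bowl 1 the posterior odds are $(3/2)^2 \cdot (1/2) = 9/8$, so the probability is exactly $9/17 \approx 0.529$.

```julia
# one Bayesian update: multiply by the likelihoods, then normalize (hypothetical helper)
updateOnce(prior, lik) = (u = prior .* lik; u ./ sum(u))

prior = [0.5, 0.5]
vanilla = [30/40, 20/40]     # P(vanilla | bowl 1), P(vanilla | bowl 2)
chocolate = [10/40, 20/40]   # P(chocolate | bowl 1), P(chocolate | bowl 2)

vvc = foldl(updateOnce, [vanilla, vanilla, chocolate]; init=prior)
cvv = foldl(updateOnce, [chocolate, vanilla, vanilla]; init=prior)
```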

## 101 Bowls

Next let’s solve a cookie problem with 101 bowls:
- Bowl 0 contains 0% vanilla cookies,
- Bowl 1 contains 1% vanilla cookies,
- Bowl 2 contains 2% vanilla cookies,
- ...
- Bowl 99 contains 99% vanilla cookies, and
- Bowl 100 contains all vanilla cookies.

As in the previous version, there are only two kinds of cookies, vanilla and chocolate. So Bowl 0 is all chocolate cookies, Bowl 1 is 99% chocolate, and so on.

Suppose we choose a bowl at random, choose a cookie at random, and it turns out to be vanilla. What is the probability that the cookie came from Bowl $x$, for each value of $x$?

In [None]:
# or collect(range(0, 100, 101))
bowls101 = getPmfFromSeq(collect(0:1:100))

In [None]:
bowls101.likelihoods = bowls101.names ./ 100
updatePosteriors!(bowls101)
bowls101

In [None]:
first(bowls101.posteriors, 5)
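With a uniform prior, the posterior after one vanilla cookie is proportional to the likelihood $x/100$. Normalizing gives $x / \sum_{k=0}^{100} k = x/5050$. The first five posteriors above should therefore be 0, 1/5050, 2/5050, and so on. A standalone check of that closed form:

```julia
xs = 0:100
prior = fill(1/101, 101)
likelihood = xs ./ 100            # P(vanilla | bowl x) = x/100
posterior = prior .* likelihood
posterior ./= sum(posterior)

# with a uniform prior the posterior is simply x / sum(0:100) = x / 5050
closedForm = xs ./ 5050
```

Note that bowl $x$ sits at position $x + 1$ in these 1-based vectors.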

In [None]:
fig = Cmk.Figure()
ax1, l1 = Cmk.lines(fig[1, 1],
    bowls101.names, bowls101.priors, color="gray",
    axis=(;
        title="Posterior after one vanilla cookie",
        xlabel="Bowl #",
        ylabel="PMF",
        xticks=0:20:100,
        yticks=0:0.0025:0.02
    ))
l2 = Cmk.lines!(ax1, bowls101.names, bowls101.posteriors, color="navy")
Cmk.axislegend(ax1,
    [l1, l2],
    ["prior", "posterior"],
    position=:lt
    )
fig

Now suppose we put the cookie back, draw again from the same bowl, and get another vanilla cookie. Here’s the update after the second cookie:

In [None]:
bowls101.posteriors .*= bowls101.likelihoods
normalizePosteriors!(bowls101)
bowls101
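After two vanilla cookies the posterior is proportional to $x^2$. That is why the curve is now a parabola instead of a straight line. The normalizing constant is $\sum_{k=0}^{100} k^2 = 338350$. A self-contained sketch that repeats the two updates and compares them with this closed form:

```julia
xs = 0:100
likelihood = xs ./ 100
posterior = fill(1/101, 101)
for _ in 1:2   # two vanilla cookies, drawn with replacement
    posterior .*= likelihood
    posterior ./= sum(posterior)
end

closedForm = xs .^ 2 ./ sum(xs .^ 2)
```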

In [None]:
drawLinesPosteriors(
    bowls101,
    "Posterior after two vanilla cookies",
    "Bowl #",
    "PMF"
    )

But suppose we draw again and get a chocolate cookie. Here’s the update:

In [None]:
bowls101.posteriors .*= (1 .- bowls101.likelihoods)
normalizePosteriors!(bowls101)
bowls101

In [None]:
drawLinesPosteriors(
    bowls101,
    "Posterior after 2 vanilla, 1 chocolate cookie",
    "Bowl #",
    "PMF"
    )

In [None]:
getIdMaxPosterior(bowls101)

In [None]:
getNameMaxPosterior(bowls101)
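The two results differ by one. `getIdMaxPosterior` returns a 1-based position, and the names start at bowl 0, so the bowl number is the index minus one. The location of the peak can also be derived. After vanilla, vanilla, chocolate, the posterior is proportional to $x^2(1 - x)$ with $x$ the vanilla fraction. Its derivative $2x - 3x^2 = x(2 - 3x)$ vanishes at $x = 2/3$, so the most likely bowl should be 67. The sketch checks this over the integer bowls. It uses `argmax(f, domain)`, which requires Julia 1.7 or later.

```julia
# unnormalized posterior for bowl x after vanilla, vanilla, chocolate
# (the uniform prior only changes the scale)
unnorm(x) = x^2 * (100 - x)

mapBowl = argmax(unnorm, 0:100)   # bowl number with the highest posterior
mapIndex = mapBowl + 1            # its position in a 1-based vector starting at bowl 0
```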

## The Dice Problem

In the previous chapter we solved the dice problem using a Bayes table. Here’s the statement of the problem:

> Suppose I have a box with a 6-sided die, an 8-sided die, and a 12-sided die. I choose one of the dice at random, roll it, and report that the outcome is a 1. 
>
> What is the probability that I chose the 6-sided die?

In [None]:
dice = getPmfFromSeq([6, 8, 12])

In [None]:
setLikelihoods!(dice, [1/6, 1/8, 1/12])
updatePosteriors!(dice)
dice
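Because all the priors and likelihoods here are simple fractions, the update can also be done exactly with Julia's `Rational` numbers. This confirms that the rounded posteriors are 4/9, 1/3 and 2/9.

```julia
priors = fill(1//3, 3)              # 6-, 8- and 12-sided dice, equally likely
likelihoods = [1//6, 1//8, 1//12]   # P(roll a 1 | die)
unnorm = priors .* likelihoods      # 1//18, 1//24, 1//36
posteriors = unnorm ./ sum(unnorm)  # sum(unnorm) == 1//8
```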

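Now suppose I roll the same die again and get a 7. The 6-sided die cannot produce a 7, so its likelihood is 0; for the 8-sided and 12-sided dice the likelihoods are 1/8 and 1/12. Here is the update: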

In [None]:
dice.posteriors .*= [0/6, 1/8, 1/12]
normalizePosteriors!(dice)
dice

After rolling a 1 and a 7, the posterior probability of the 8-sided die is about 69%.
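The two dice updates generalize to any sequence of rolls once the likelihood is written as a function. The likelihood is $1/n$ for an outcome between 1 and $n$ on an $n$-sided die, and 0 otherwise. The sketch below assumes fair dice and a uniform prior. Both function names are hypothetical.

```julia
# Hypothetical helper: probability of `outcome` on a fair die with `sides` faces
dieLikelihood(sides::Int, outcome::Int) = 1 <= outcome <= sides ? 1 / sides : 0.0

# Hypothetical helper: posterior over the dice after observing `rolls`
function posteriorAfterRolls(sides::Vector{Int}, rolls::Vector{Int})
    post = fill(1 / length(sides), length(sides))   # uniform prior
    for r in rolls
        post .*= dieLikelihood.(sides, r)           # likelihood of this roll for each die
        post ./= sum(post)                          # normalize after every roll
    end
    return post
end

post = posteriorAfterRolls([6, 8, 12], [1, 7])
```

The result is 0, 9/13 ≈ 0.692 and 4/13 ≈ 0.308, matching the table above. A roll that no die can produce (say 13) would make the sum zero and the result `NaN`, so a real implementation would need to guard against that.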