# Chapter 8. Poisson Processes

[Link to chapter online](https://allendowney.github.io/ThinkBayes2/chap08.html)

## Warning

The content of this file may be incorrect, erroneous and/or harmful. Use it at your own risk.

## Imports

In [None]:
include("./pmfAndCdf.jl")
#import .ProbabilityMassFunction as Pmf # if defined with module (and export) in pmfAndCdf.jl
#using .ProbabilityMassFunction # if defined with module and export in pmfAndCdf.jl

include("./simplestat.jl")
#import .SimpleStatistics as Ss # if defined with module (and export) in simplestat.jl
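#using .SimpleStatistics # if defined with module and export in simplestat.jl
# Dsts (Distributions.jl) and Cmk (CairoMakie.jl) are assumed to be module aliases set up in the included files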

## The World Cup Problem

In the 2018 FIFA World Cup final, France defeated Croatia 4 goals to 2. Based on this outcome:
- How confident should we be that France is the better team?
- If the same teams played again, what is the chance France would win again?

To answer these questions, we have to make some modeling decisions.
- First, I’ll (A.B.D.) assume that for any team against another team there is some unknown goal-scoring rate, measured in goals per game, which I’ll denote with the variable `lam` or the Greek letter $\lambda$, pronounced “lambda”.
- Second, I’ll (A.B.D.) assume that a goal is equally likely during any minute of a game. So, in a 90-minute game, the probability of scoring during any minute is $\lambda / 90$.
- Third, I’ll (A.B.D.) assume that a team never scores twice during the same minute.

## The Poisson Distribution

If the number of goals scored in a game follows a Poisson distribution with a goal-scoring rate $\lambda$, the probability of scoring $k$ goals is

$P(k \mid \lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!}$

for any non-negative integer $k$.
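To see the formula at work without `Distributions`, here is a minimal sketch in plain Julia (Base only) that evaluates it directly. `poissonPmf` is a hypothetical helper, not part of `pmfAndCdf.jl`; for $\lambda = 1.4$ and $k = 4$ it should agree with `Dsts.pdf(Dsts.Poisson(1.4), 4)`.

```julia
# Poisson PMF evaluated straight from the formula (sketch, Base Julia only).
# k: number of goals (non-negative integer); lam: goal-scoring rate per game.
# Note: factorial(k) overflows Int64 for k > 20, which is fine for goal counts.
function poissonPmf(k::Integer, lam::Real)
    k >= 0 || throw(DomainError(k, "k must be non-negative"))
    return lam^k * exp(-lam) / factorial(k)
end

poissonPmf(4, 1.4)                      # ≈ 0.0395
sum(poissonPmf(k, 1.4) for k in 0:10)   # ≈ 1; counts above 10 are negligible for lam = 1.4
```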

In [None]:
lam = 1.4
distPois = Dsts.Poisson(lam)

In [None]:
# k goals
k = 4
Dsts.pdf(distPois, k)

This result implies that if the average goal-scoring rate is 1.4 goals per game, then the probability of scoring 4 goals in a game is about 4%.

In [None]:
# type alias: the numeric argument types accepted by the functions below
Num = Union{Int, Float64}

In [None]:
# Pmf of a Poisson distribution with rate `lam`, evaluated at the goal counts `qs`
function getPoisPmf(
        lam::A,
        qs::Vector{B}
        )::Pmf{B} where {A<:Num, B<:Num}
    ps::Vector{Float64} = Dsts.pdf.(Dsts.Poisson(lam), qs)
    pmf::Pmf{B} = Pmf(qs, ps)
    return pmf
end

In [None]:
lam = 1.4
goals = 0:10 |> collect
pmfGoals = getPoisPmf(lam, goals)

In [None]:
fig = Cmk.Figure()
ax, bp = Cmk.barplot(fig[1, 1],
    pmfGoals.names, pmfGoals.priors,
    axis=(;title="Distribution of goals scored",
        xlabel="Number of goals",
        ylabel="PMF",
        xticks=0:10
    )
)
Cmk.axislegend(
    ax,
    [bp],
    ["λ = 1.4"],
    "Poisson distribution"
)
fig

[...]

Now let’s turn it around: given a number of goals, what can we say about the goal-scoring rate?

To answer that, we need to think about the prior distribution of `lam`, which represents the range of possible values and their probabilities before we see the score.

## The Gamma Distribution

Using [data](https://www.statista.com/statistics/269031/goals-scored-per-game-at-the-fifa-world-cup-since-1930/) [...] each team scores about 1.4 goals per game, on average (so $\lambda = 1.4$).

For a good team against a bad one, $\lambda$ will be higher; for a bad team against a good one, it will be lower.

We will model the prior with a [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution). Among other uses, it describes the waiting time until the $\alpha$-th occurrence of an event that happens at random according to a Poisson process (for integer $\alpha$).

In [None]:
alpha = 1.4
qs = range(0, 10, 101)
ps = Dsts.pdf.(Dsts.Gamma(alpha), qs);

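The parameter `alpha` is the mean of the distribution: `Dsts.Gamma(alpha)` uses the default scale $\theta = 1$, and the mean of a gamma distribution is $\alpha \theta$. The `qs` are possible values of `lam` between 0 and 10. The `ps` are probability densities, which we can think of as unnormalized probabilities.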
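Because the densities are only used as unnormalized probabilities, the constant $1/\Gamma(\alpha)$ in the gamma PDF can be dropped. A minimal sketch, assuming scale 1 (as above) and $\alpha \ge 1$, that builds the same grid prior with Base Julia only; `gammaPriorGrid`, `gridQs`, and `gridPs` are hypothetical names. Normalizing should give the same probabilities as normalizing `ps`.

```julia
# Discretized gamma prior with shape alpha and scale 1 (sketch, Base Julia only).
# q^(alpha - 1) * exp(-q) is proportional to the gamma PDF, so the normalized
# grid values match the normalized Dsts.pdf values. Assumes alpha >= 1, so the
# density is finite at q = 0.
function gammaPriorGrid(alpha::Real, qs::AbstractVector{<:Real})
    ps = [q^(alpha - 1) * exp(-q) for q in qs]
    return ps ./ sum(ps)
end

gridQs = collect(range(0, 10, 101))   # candidate values of lam
gridPs = gammaPriorGrid(1.4, gridQs)
gridMean = sum(gridQs .* gridPs)      # roughly 1.4; the grid and the cutoff at 10 add small errors
```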

In [None]:
lambdasPmf = Pmf(qs |> collect, ps)

In [None]:
fig = Cmk.Figure()
Cmk.lines(fig[1, 1],
    lambdasPmf.names, lambdasPmf.priors,
    color="lightgray", linestyle=:dash, linewidth=3,
    axis=(;title="Prior distribution of λ",
        xlabel="Goal scoring rate (λ)", ylabel="PMF",
        xticks=0:2:10, yticks=0:0.01:0.06)
)
fig

This distribution represents our prior knowledge about goal scoring: `lam` is usually less than 2, occasionally as high as 6, and seldom higher than that.

We can confirm that the mean is about 1.4.

In [None]:
getMean(lambdasPmf, true),
getQuantile(lambdasPmf, 0.5, true),
getCredibleInterval(lambdasPmf, 0.5, true)

(one) [...] could disagree about [...] the prior, but this is good enough to get started.

## The Update

Given the goal-scoring rate, $\lambda$, what is the probability of scoring `k` goals?

In [None]:
lam = 1.4
k = 4
Dsts.pdf(Dsts.Poisson(lam), k)

Now suppose we have an array of possible values for $\lambda$; we can compute the likelihood of the data for each hypothetical $\lambda$:

In [None]:
likelihoods = Dsts.pdf.(Dsts.Poisson.(lambdasPmf.names), k)

We can wrap this computation in a function that performs the whole update:

In [None]:
# pmfPoisson.names: candidate lambdas (goal-scoring rates) of the Poisson distribution
# pmfPoisson.priors: prior probabilities of those lambdas
# data: the observed number of goals, k
function updatePoisson!(pmfPoisson::Pmf{Float64}, data::Num)
    k::Num = data
    lams::Vector{Float64} = pmfPoisson.names
    likelihoods::Vector{Float64} = Dsts.pdf.(Dsts.Poisson.(lams), k)
    setLikelihoods!(pmfPoisson, likelihoods)
    updatePosteriors!(pmfPoisson, true)
    return nothing
end
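Stripped of the `Pmf` bookkeeping, the update is "multiply the prior by the likelihood, then normalize". A minimal sketch in Base Julia, with hypothetical names (`poissonGridUpdate`, `lams`, `prior`). As a sanity check it uses a standard fact: a gamma prior is conjugate to a Poisson likelihood, so with a gamma($\alpha$, rate 1) prior and an observed count $k$, the posterior is gamma($\alpha + k$, rate 2), and the grid posterior mean should be close to $(\alpha + k)/2$.

```julia
# Grid Bayesian update with a Poisson likelihood (sketch, Base Julia only).
# prior: probability of each candidate rate in lams; k: observed number of goals
function poissonGridUpdate(prior::AbstractVector{<:Real},
                           lams::AbstractVector{<:Real}, k::Integer)
    likelihood = [lam^k * exp(-lam) / factorial(k) for lam in lams]
    unnorm = prior .* likelihood     # Bayes's rule, up to a normalizing constant
    return unnorm ./ sum(unnorm)     # normalize so the posterior sums to 1
end

lams = collect(range(0, 10, 101))
prior = [q^0.4 * exp(-q) for q in lams]   # gamma(alpha = 1.4) prior, unnormalized
prior ./= sum(prior)

postFrance = poissonGridUpdate(prior, lams, 4)
postMean = sum(lams .* postFrance)
# Conjugacy check: the gamma(1.4 + 4, rate 2) posterior has mean (1.4 + 4) / 2 = 2.7
```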

In [None]:
# in the example, France scored 4 goals
france = Pmf(lambdasPmf.names |> copy, lambdasPmf.priors |> copy)
updatePoisson!(france, 4)

In [None]:
fig = Cmk.Figure()
ax1, l1 = Cmk.lines(fig[1, 1],
    france.names, france.priors,
    color="lightgray", linestyle=:dash, linewidth=3,
    axis=(;title="Prior and posterior distributions of λ",
        xlabel="Goal scoring rate (λ)", ylabel="PMF",
        xticks=0:2:10, yticks=0:0.01:0.06)
)
l2 = Cmk.lines!(fig[1, 1],
    france.names, france.posteriors,
    color="orange", linestyle=:solid, linewidth=3
)
Cmk.axislegend(
    ax1,
    [l1, l2],
    ["prior (average team)", "France posterior"],
    "Distribution of λ"
)
fig

In [None]:
# Croatia scored 2 goals, so
croatia = Pmf(lambdasPmf.names |> copy, lambdasPmf.priors |> copy)
updatePoisson!(croatia, 2)

In [None]:
fig = Cmk.Figure()
ax1, l1 = Cmk.lines(fig[1, 1],
    croatia.names, croatia.priors,
    color="lightgray", linestyle=:dash, linewidth=3,
    axis=(;title="Prior and posterior distributions of λ",
        xlabel="Goal scoring rate (λ)", ylabel="PMF",
        xticks=0:2:10, yticks=0:0.01:0.06)
)
l2 = Cmk.lines!(fig[1, 1],
    croatia.names, croatia.posteriors,
    color="blue", linestyle=:solid, linewidth=3
)
Cmk.axislegend(
    ax1,
    [l1, l2],
    ["prior (average team)", "Croatia posterior"],
    "Distribution of λ"
)
fig

In [None]:
getMean(france),
getMean(croatia)

## Probability of Superiority

How confident should we be that France is the better team?

In the model, “better” means having a higher goal-scoring rate against the opponent. We can use the posterior distributions to compute the probability that a random value drawn from France’s distribution exceeds a value drawn from Croatia’s.

One way to do that is to enumerate all pairs of values from the two distributions, adding up the total probability that one value exceeds the other.
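The same enumeration can be written without explicit loops by forming the joint distribution as an outer product. A minimal sketch on plain vectors rather than the `Pmf` type, assuming the two draws are independent; `probGreater` is a hypothetical name.

```julia
# Probability that a draw from the first distribution exceeds a draw from the second,
# assuming the two draws are independent (sketch on plain vectors, not the Pmf type).
# qs1, qs2: quantities (e.g. goal-scoring rates); ps1, ps2: their probabilities
function probGreater(qs1::AbstractVector, ps1::AbstractVector,
                     qs2::AbstractVector, ps2::AbstractVector)
    joint = ps1 .* ps2'                  # joint[i, j] = P(X = qs1[i]) * P(Y = qs2[j])
    return sum(joint .* (qs1 .> qs2'))   # keep only the pairs where X > Y
end

probGreater([1, 2], [0.5, 0.5], [1, 2], [0.5, 0.5])   # only the pair (2, 1) counts: 0.25
```

Replacing `.>` with `.<` gives the reverse comparison with the same amount of code.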

In [None]:
# prob pmf1 > pmf2
# prob a random value (pmf1.names) is greater than (pmf2.names)
# where pmf1.names/pmf2.names is qs (e.g. goal scoring rate)
function getProbGt(pmf1::Pmf{A}, pmf2::Pmf{B},
        usePriors::Bool = false)::Float64 where {A<:Num, B<:Num} 
    total::Float64 = 0
    for (n1, p1) in zip(pmf1.names, 
            usePriors ? pmf1.priors : pmf1.posteriors)
        for (n2, p2) in zip(pmf2.names,
                usePriors ? pmf2.priors : pmf2.posteriors)
            if n1 > n2
                total += p1 * p2
            end
        end
    end
    return total
end

In [None]:
getProbGt(france, croatia)

Of course, we should remember that this result is based on the assumption that the goal-scoring rate is constant. In reality, if a team is down by one goal, they might play more aggressively toward the end of the game, making them more likely to score, but also more likely to give up an additional goal.

## Predicting the Rematch

If the same teams played again, what is the chance Croatia would win? To answer this question, we’ll generate the “posterior predictive distribution”, which is the distribution of the number of goals we expect a team to score.

If we knew the goal-scoring rate, `lam`, the distribution of goals would be a Poisson distribution with parameter `lam`. Since we don’t know `lam`, the distribution of goals is a mixture of Poisson distributions with different values of `lam`, each weighted by its posterior probability.
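The mixture itself is a weighted sum: for each goal count $k$, add up $P(\lambda \mid \text{data}) \cdot P(k \mid \lambda)$ over the candidate rates. A minimal sketch in Base Julia, independent of the notebook's `getMixture` (whose internals are not shown here); `poissonMixture` is a hypothetical name, and the result is renormalized because goal counts above the chosen maximum are dropped.

```julia
# Posterior predictive distribution of goals as a mixture of Poisson PMFs (sketch).
# lams: candidate rates; post: posterior probability of each rate; goals: counts to predict
function poissonMixture(lams::AbstractVector, post::AbstractVector,
                        goals::AbstractVector{<:Integer})
    pred = zeros(length(goals))
    for (lam, w) in zip(lams, post)
        # add this rate's Poisson PMF, weighted by the posterior probability of the rate
        pred .+= w .* [lam^k * exp(-lam) / factorial(k) for k in goals]
    end
    return pred ./ sum(pred)   # renormalize, since counts above maximum(goals) are dropped
end

poissonMixture([1.0, 3.0], [0.5, 0.5], 0:10)   # an equal mixture of two Poisson PMFs
```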

In [None]:
goals = 0:10 |> collect
pmfSeq = [getPoisPmf(lam, goals)
    for lam in lambdasPmf.names];

In [None]:
fig = Cmk.Figure()
inds = 11:10:41 |> collect # indices of λ = 1.0, 2.0, 3.0, 4.0 in lambdasPmf.names
for (i, ind) in enumerate(inds)
    r, c = fldmod1(i, 2) # 2x2 grid: (1,1), (1,2), (2,1), (2,2); no global counter needed
    ax, bp = Cmk.barplot(fig[r, c],
        pmfSeq[ind].names,
        pmfSeq[ind].priors,
        axis=(;
            title="Goals distribution",
            xlabel="Number of goals",
            ylabel="PMF",
            xticks=0:2:10
        )
    )
    Cmk.axislegend(ax,
        [bp],
        ["λ = $(lambdasPmf.names[ind])"]
    )
end
fig

In [None]:
francePmfsMixture = getMixture(
    france, pmfSeq,
    map(x -> string("λ = ", x), lambdasPmf.names),
    0:10 |> collect,
    false
)

In [None]:
fig = Cmk.Figure()
ax, bp = Cmk.barplot(fig[1, 1],
    francePmfsMixture.outcome, francePmfsMixture.posteriors,
    axis=(;title="Posterior predictive distribution",
        xlabel = "Number of goals",
        ylabel = "PMF",
        xticks=0:10
    )
)
Cmk.axislegend(ax,
    [bp],
    ["France"]
)
fig

This distribution represents two sources of uncertainty: we don’t know the actual value of `lam`, and even if we did, we would not know the number of goals in the next game.

Here’s the predictive distribution for Croatia.

In [None]:
croatiaPmfsMixture = getMixture(
    croatia, pmfSeq,
    map(x -> string("λ = ", x), lambdasPmf.names),
    0:10 |> collect,
    false
)

In [None]:
fig = Cmk.Figure()
ax, bp = Cmk.barplot(fig[1, 1],
    croatiaPmfsMixture.outcome, croatiaPmfsMixture.posteriors,
    axis=(;title="Posterior predictive distribution",
        xlabel = "Number of goals",
        ylabel = "PMF",
        xticks=0:10
    )
)
Cmk.axislegend(ax,
    [bp],
    ["Croatia"]
)
fig

We can use these distributions to compute the probability that France wins, loses, or ties the rematch.

In [None]:
predFrance = Pmf(francePmfsMixture.outcome, francePmfsMixture.posteriors)
predCroatia = Pmf(croatiaPmfsMixture.outcome, croatiaPmfsMixture.posteriors);

In [None]:
pFranceWin = getProbGt(predFrance, predCroatia, true)

In [None]:
# prob pmf1 < pmf2
# prob a random value (pmf1.names) is smaller than (pmf2.names)
# where pmf1.names/pmf2.names is qs (e.g. goal scoring rate)
function getProbLt(pmf1::Pmf{A}, pmf2::Pmf{B},
        usePriors::Bool = false)::Float64 where {A<:Num, B<:Num} 
    total::Float64 = 0
    for (n1, p1) in zip(pmf1.names, 
            usePriors ? pmf1.priors : pmf1.posteriors)
        for (n2, p2) in zip(pmf2.names,
                usePriors ? pmf2.priors : pmf2.posteriors)
            if n1 < n2
                total += p1 * p2
            end
        end
    end
    return total
end

In [None]:
pFranceLose = getProbLt(predFrance, predCroatia, true)

In [None]:
# prob pmf1 == pmf2
# prob a random value (pmf1.names) is equal to (pmf2.names)
# where pmf1.names/pmf2.names is qs (e.g. goal scoring rate)
function getProbEq(pmf1::Pmf{A}, pmf2::Pmf{B},
        usePriors::Bool = false)::Float64 where {A<:Num, B<:Num} 
    total::Float64 = 0
    for (n1, p1) in zip(pmf1.names, 
            usePriors ? pmf1.priors : pmf1.posteriors)
        for (n2, p2) in zip(pmf2.names,
                usePriors ? pmf2.priors : pmf2.posteriors)
            if isapprox(n1, n2)
                total += p1 * p2
            end
        end
    end
    return total
end

In [None]:
pFranceTie = getProbEq(predFrance, predCroatia, true)

In [None]:
pFranceWin + pFranceLose + pFranceTie # should add up to 1

Assuming that France wins half of the ties, their chance of winning the rematch is about

In [None]:
pFranceWin + pFranceTie/2

This is a bit lower than their probability of superiority, which is 75%. And that makes sense, because we are less certain about the outcome of a single game than we are about the goal-scoring rates. Even if France is the better team, they might lose the game.

## The Exponential Distribution

> In the 2014 FIFA World Cup, Germany played Brazil in a semifinal match. Germany scored after 11 minutes and again at the 23 minute mark. At that point in the match, how many goals would you expect Germany to score after 90 minutes? What was the probability that they would score 5 more goals (as, in fact, they did)?

In this version, notice that the data is not the number of goals in a fixed period of time, but the time between goals.

To compute the likelihood of these data, we will use the [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution). (Note: the exponential distribution is the special case of the gamma distribution with shape $\alpha = 1$; see The Gamma Distribution section above.)

If the goal-scoring rate is $\lambda$, the probability of seeing an interval of `t` between goals is proportional to the PDF of the exponential distribution:

$\lambda e^{-\lambda t}$

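`Distributions` provides an `Exponential` distribution, but it is parameterized by the scale $\theta = 1/\lambda$ rather than the rate $\lambda$ (the equivalent call would be `Dsts.pdf(Dsts.Exponential(1/lam), t)`). To keep the code close to the formula, we define our own function: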

In [None]:
# t: time between two events of the Poisson process (in games)
# lam: event rate (average number of events per game)
# returns the probability density of an interval t between two events
function getExpoPdf(t::A, lam::B)::Float64 where {A<:Num, B<:Num}
    return lam * exp(-lam * t)
end

To see what the exponential distribution looks like, let’s assume again that `lam` is 1.4.

In [None]:
lam = 1.4
qs = range(0, 4, 101) |> collect
ps = getExpoPdf.(qs, lam)
pmfTime = Pmf(qs, ps);

In [None]:
fig = Cmk.Figure()
ax, ln = Cmk.lines(fig[1, 1],
    pmfTime.names, pmfTime.priors,
    axis=(;title="Distribution of time between goals",
        xlabel="Time between goals (games)",
        ylabel="PMF",
        xticks=0:0.5:4,
        yticks=0:0.01:0.06
    )
)
Cmk.axislegend(ax,
    [ln],
    ["exponential with λ = 1.4"]
)
fig

It is counterintuitive, but true, that the most likely time to score a goal is immediately. After that, the probability of each successive interval is a little lower.

With a goal-scoring rate of 1.4, it is possible that a team will take more than one game to score a goal, but it is unlikely that they will take more than two games.
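These statements can be made quantitative with the exponential survival function: the probability of waiting more than $t$ games is $e^{-\lambda t}$. A small sketch in Base Julia (`survival` is a hypothetical helper) evaluates it for $\lambda = 1.4$ and cross-checks one value by integrating the density `lam * exp(-lam * t)` numerically with the midpoint rule; the step size is an arbitrary choice.

```julia
# Probability of waiting more than t games for a goal when the rate is lam
survival(t, lam) = exp(-lam * t)

rate = 1.4
pMoreThanOne = survival(1, rate)   # e^-1.4 ≈ 0.25
pMoreThanTwo = survival(2, rate)   # e^-2.8 ≈ 0.06

# Cross-check: midpoint-rule integral of the density from t = 2 to t = 50
# (the tail beyond 50 games is negligible)
h = 0.001
pNumeric = sum(rate * exp(-rate * t) * h for t in (2 + h / 2):h:50)
```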

## Summary

This chapter introduces three new distributions [...]. Let’s review:
- If a system satisfies the assumptions of a Poisson model, the number of events in a period of time follows a Poisson distribution, which is a discrete distribution with integer quantities from 0 to infinity. In practice, we can usually ignore low-probability quantities above a finite limit.

- Also under the Poisson model, the interval between events follows an exponential distribution, which is a continuous distribution with quantities from 0 to infinity. Because it is continuous, it is described by a probability density function (PDF) rather than a probability mass function (PMF). But when we use an exponential distribution to compute the likelihood of the data, we can treat densities as unnormalized probabilities.

- The Poisson and exponential distributions are parameterized by an event rate, denoted $\lambda$ or `lam`.

- For the prior distribution of $\lambda$, I (A.B.D.) used a gamma distribution, which is a continuous distribution with quantities from 0 to infinity, but I (A.B.D.) approximated it with a discrete, bounded PMF. In the form used here (scale 1), the gamma distribution has one parameter, denoted $\alpha$ or `alpha`, which is also its mean.

## Exercises

### Exercise 1

> In the 2014 FIFA World Cup, Germany played Brazil in a semifinal match. Germany scored after 11 minutes and again at the 23 minute mark. At that point in the match, how many goals would you expect Germany to score after 90 minutes? What was the probability that they would score 5 more goals (as, in fact, they did)?

Here are the steps I (A.B.D.) recommend:

1. Starting with the same gamma prior we used in the previous problem, compute the likelihood of scoring a goal after 11 minutes for each possible value of `lam`. Don’t forget to convert all times into games rather than minutes.

2. Compute the posterior distribution of `lam` for Germany after the first goal.

3. Compute the likelihood of scoring another goal after 12 more minutes and do another update. Plot the prior, posterior after one goal, and posterior after two goals.

4. Compute the posterior predictive distribution of goals Germany might score during the remaining time in the game, 90 − 23 = 67 minutes. Note: You will have to think about how to generate predicted goals for a fraction of a game.

5. Compute the probability of scoring 5 or more goals during the remaining time.
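Because likelihoods of independent waiting times multiply, the two exponential updates in steps 1 to 3 should give the same posterior as a single update with likelihood $\lambda^2 e^{-\lambda \cdot 23/90}$. A minimal sketch of that check in Base Julia, with hypothetical names (`expoLik`, `priorG`, `post1`, ...). The posterior mean can also be compared with the conjugate gamma result for exponential data, $(\alpha + 2)/(1 + 23/90) \approx 2.71$.

```julia
# Goals at minutes 11 and 23; times are converted to games (1 game = 90 minutes).
lamGrid = collect(range(0, 10, 101))          # candidate goal-scoring rates
priorG = [q^0.4 * exp(-q) for q in lamGrid]   # gamma(alpha = 1.4) prior, unnormalized
priorG ./= sum(priorG)

# Density of waiting t games for a goal when the rate is lam
expoLik(t, lam) = lam * exp(-lam * t)

# Two sequential updates: 11 minutes until the first goal, then 12 more minutes
post1 = priorG .* expoLik.(11 / 90, lamGrid)
post1 ./= sum(post1)
post2 = post1 .* expoLik.(12 / 90, lamGrid)
post2 ./= sum(post2)

# One combined update: the product of the two likelihoods is lam^2 * exp(-lam * 23/90)
postBoth = priorG .* lamGrid .^ 2 .* exp.(-lamGrid .* (23 / 90))
postBoth ./= sum(postBoth)

postMeanG = sum(lamGrid .* post2)   # compare with (1.4 + 2) / (1 + 23/90) ≈ 2.71
```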

#### Ex 1.1

In [None]:
alpha = 1.4 # the average is 1.4 per game
qs = range(0, 10, 101) |> collect # possible real values of lambda for germany
ps = Dsts.pdf.(Dsts.Gamma(alpha), qs); # probs of those values (qs)

In [None]:
germany = Pmf(qs, ps)

In [None]:
t = 11 / 90 # time until the first goal, converted from minutes to games (1 game = 90 min)
germany.likelihoods = getExpoPdf.(t, germany.names)
germany.likelihoods = germany.likelihoods ./ sum(germany.likelihoods);

In [None]:
fig = Cmk.Figure()
ax1, l1 = Cmk.lines(fig[1, 1],
    germany.names, germany.priors,
    color="blue",
    axis=(;title="Germany goal scoring distribution",
        ylabel="PMF",
        xlabel="Goal scoring rate (λ)",
        xticks=0:10
    )
)
l2 = Cmk.lines!(fig[1, 1],
    germany.names, germany.likelihoods,
    color="orange", linestyle=:dash
)
Cmk.axislegend(ax1,
    [l1, l2],
    ["Priors", "Likelihoods (goal in 11 min)"]
)
fig

#### Ex 1.2

In [None]:
updatePosteriors!(germany, true);

In [None]:
fig = Cmk.Figure()
ax1, l1 = Cmk.lines(fig[1, 1],
    germany.names, germany.priors,
    color="blue",
    axis=(;title="Germany goal scoring distribution",
        ylabel="PMF",
        xlabel="Goal scoring rate (λ)",
        xticks=0:10
    )
)
l2 = Cmk.lines!(fig[1, 1],
    germany.names, germany.posteriors,
    color="orange", linestyle=:dash
)
Cmk.axislegend(ax1,
    [l1, l2],
    ["Priors", "Posteriors (goal in 11 min)"]
)
fig

#### Ex 1.3

In [None]:
function updateExponential!(pmf::Pmf{A}, data::B) where {A<:Num, B<:Num}
    setLikelihoods!(pmf, getExpoPdf.(data, pmf.names))
    updatePosteriors!(pmf, true)
    return nothing
end

In [None]:
germany2 = Pmf(germany.names |> copy, germany.posteriors |> copy)
# 1st goal scored at the 11-minute mark, 2nd goal scored at the 23-minute mark
updateExponential!(germany2, (23-11)/90)

In [None]:
fig = Cmk.Figure()
ax1, l1 = Cmk.lines(fig[1, 1],
    germany.names, germany.priors,
    color="blue",
    axis=(;title="Germany goal scoring distribution",
        ylabel="PMF",
        xlabel="Goal scoring rate (λ)",
        xticks=0:10
    )
)
l2 = Cmk.lines!(fig[1, 1],
    germany.names, germany.posteriors,
    color="orange", linestyle=:dash, linewidth=4,
)
l3 = Cmk.lines!(fig[1, 1],
    germany2.names, germany2.posteriors,
    color="lightblue", linestyle=:dashdot, linewidth=4,
)
Cmk.axislegend(ax1,
    [l1, l2, l3],
    ["Priors", "Posteriors (goal in 11 min)", "Posteriors (goal in 11, 23 min)"]
)
fig

#### Ex 1.4

In [None]:
alpha = 1.4 # avg num goals per game, based on games from world cup
t = (90 - 23) / 90 # goals after the prev. goal at 23 min as fraction of whole game
qs = germany.names |> copy # possible 'real' values of lambda for germany
goals = 0:10 |> collect; # goals scored by a team during a game

In [None]:
# q*t (goal scoring rate in remaining part of game)
pmfSeq = [getPoisPmf(q*t, goals) for q in qs];

In [None]:
germany3 = getMixture(
    germany2, # best estimate of germany real goal scoring rate
    pmfSeq, # possible goal scoring rates for the remaining game fraction
    map(x -> string("λ = ", x), qs), # prepend "λ ="
    0:10 |> collect, # number of possible goals to score in remaining time
    false) # use posteriors from germany2

In [None]:
fig = Cmk.Figure()
Cmk.barplot(fig[1, 1],
        germany3.outcome, germany3.posteriors,
        axis=(;
              title="Posterior predictive distribution for Germany",
              xlabel="Number of goals in remaining part of the game",
              ylabel="PMF",
              xticks=0:10
              )
)
fig

#### Ex 1.5

In [None]:
# probability of scoring exactly 5 goals in the remaining fraction of the game
# row 6 holds 5 goals, because the outcomes run 0:10 and Julia indexing starts at 1
germany3[6, ["outcome", "posteriors"]]

In [None]:
# probability of scoring 5 or more goals in the remaining fraction of the game
# rows 6:end hold 5 to 10 goals, because the outcomes run 0:10 and Julia indexing starts at 1
germany3[6:end, ["outcome", "posteriors"]]

In [None]:
germany3[6:end, "posteriors"] |> sum

### Exercise 2
Return to the first version of the World Cup Problem. Suppose France and Croatia play a rematch. What is the probability that France scores first?

*Hint*: Compute the posterior predictive distribution for the time until the first goal by making a mixture of exponential distributions. You can use the following function to make a PMF that approximates an exponential distribution.

In [None]:
# Make a PMF of an exponential distribution.
# lam: event rate
# high: upper bound on the interval `t`
# returns: Pmf of the interval (names) between events
function getExpoPmf(lam::A, high::B)::Pmf{Float64} where {A<:Num, B<:Num}
    qs::Vector{Float64} = range(0, high, 101)
    ps::Vector{Float64} = getExpoPdf.(qs, lam)
    return Pmf(qs, ps)
end

In [None]:
qs = france.names |> copy
# previously, the posterior predictive distribution was the number of goals a team is expected to score;
# here it is the time until the first goal, for each candidate (unknown) value of lambda
# france.names: possible values of lambda
# 4: upper bound (in games) on the time until the first goal
pmfSeq = [
    getExpoPmf(lam, 4)
    for lam in france.names
]
# for lam = 0 the density is 0 everywhere, so normalizing the priors
# in the Pmf constructor (ps ./ sum(ps)) produces NaN; replace them with 0s
pmfSeq[1].priors .= 0.0

In [None]:
predFrance = getMixture(
    france,
    pmfSeq,
    map(x -> string("λ = ", x), qs), # postulated true lambdas
    range(0, 4, 101) |> collect
);
predCroatia = getMixture(
    croatia,
    pmfSeq,
    map(x -> string("λ = ", x), qs), # postulated true lambdas
    range(0, 4, 101) |> collect
);

In [None]:
fig = Cmk.Figure()
ax, l1 = Cmk.lines(fig[1, 1],
    predFrance.outcome, predFrance.posteriors,
    axis=(;title="Posterior predictive distributions",
        xlabel="Time until first goal (games)",
        ylabel="PMF"
    )
)
l2 = Cmk.lines!(fig[1, 1],
    predCroatia.outcome, predCroatia.posteriors,
)
Cmk.axislegend(ax,
    [l1, l2],
    ["France", "Croatia"],
    "Team"
)
fig

In [None]:
# probability that pmf1.names < pmf2.names, i.e. that France's time until the
# first goal is shorter than Croatia's, so France scores first
getProbLt(
    Pmf(predFrance.outcome, predFrance.posteriors), # posteriors as priors in a new Pmf
    Pmf(predCroatia.outcome, predCroatia.posteriors), # posteriors as priors in a new Pmf
    true
)
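As a cross-check that does not depend on the time grid: for known rates, the two waiting times are independent exponentials, and France scores first with probability $\lambda_F / (\lambda_F + \lambda_C)$. Averaging that over the two posteriors gives the sketch below (Base Julia; `probScoresFirst` is a hypothetical name). Because the notebook's version truncates the time at 4 games on a 101-point grid, the two numbers should be close but need not match exactly. With the notebook's objects it could be called as `probScoresFirst(france.names, france.posteriors, croatia.names, croatia.posteriors)`.

```julia
# P(first team scores first), averaged over independent grid posteriors for the two rates.
# For known rates l1 and l2, the waiting times are independent exponentials and
# P(T1 < T2) = l1 / (l1 + l2).
function probScoresFirst(lams1::AbstractVector, post1::AbstractVector,
                         lams2::AbstractVector, post2::AbstractVector)
    total = 0.0
    for (l1, p1) in zip(lams1, post1), (l2, p2) in zip(lams2, post2)
        # skip l1 = l2 = 0, where neither team ever scores
        if l1 + l2 > 0
            total += p1 * p2 * l1 / (l1 + l2)
        end
    end
    return total
end

probScoresFirst([2.0], [1.0], [1.0], [1.0])   # known rates 2 vs 1: 2/3
```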

### Exercise 3

In the 2010-11 National Hockey League (NHL) Finals [...] Boston Bruins played a best-of-seven championship against [...] Vancouver Canucks. Boston lost the first two games 0-1 and 2-3, then won the next two games 8-1 and 4-0. At this point in the series, what is the probability that Boston will win the next game, and what is their probability of winning the championship?

To choose a prior distribution, I got some statistics from http://www.nhl.com, specifically the average goals per game for each team in the 2010-11 season. The distribution is well modeled by a gamma distribution with mean 2.8.

In what ways do you think the outcome of these games might violate the assumptions of the Poisson model? How would these violations affect your predictions?

In [None]:
goalAvg = 2.8
qs = range(0, 15, 101) |> collect
# gamma prior on the goal-scoring rate; with the default scale of 1
# its mean equals goalAvg (compare the gamma prior at the beginning of the chapter)
ps = Dsts.pdf.(Dsts.Gamma(goalAvg), qs) 
ex3 = Pmf(qs, ps)

In [None]:
boston = Pmf(ex3.names |> copy, ex3.priors |> copy)
for bostonGoal in [0, 2, 8, 4] # goals scored by boston in first 4 games
    updatePoisson!(boston, bostonGoal)
end

In [None]:
vancouver = Pmf(ex3.names |> copy, ex3.priors |> copy)
for vancouverGoal in [1, 3, 1, 0] # goals scored by vancouver in first 4 games
    updatePoisson!(vancouver, vancouverGoal)
end

In [None]:
fig = Cmk.Figure()
ax1, l1 = Cmk.lines(fig[1, 1], ex3.names, ex3.priors,
    axis=(;title="Goal scoring rate distribution for hockey teams",
    xlabel="Goal scoring rate (λ)",
    ylabel="PMF"),
    linestyle=:dashdot
)
l2 = Cmk.lines!(fig[1, 1], boston.names, boston.posteriors)
l3 = Cmk.lines!(fig[1, 1], vancouver.names, vancouver.posteriors)
Cmk.axislegend(
    ax1,
    [l1, l2, l3],
    ["priors", "posteriors (Boston)", "posteriors (Vancouver)"]
)
fig

In [None]:
goals = 0:14 |> collect # postulated range of goals possible to score in a game
# for each possible goal scoring average (lam) probs of scoring given nums of goals
pmfSeqBoston = [getPoisPmf(lam, goals) for lam in boston.names];

In [None]:
predBoston = getMixture(
    boston, # probs of getting a dist in pmfSeqBoston
    pmfSeqBoston,
    map(x -> string("λ = ", x), boston.names),
    0:14 |> collect,
    false 
)

predBoston = Pmf(predBoston.outcome, predBoston.posteriors)

In [None]:
Cmk.barplot(predBoston.names, predBoston.priors,
            axis=(;title="Posterior predictive distribution for Boston team",
                  xlabel="Number of goals",
                  ylabel="PMF", xticks=0:14))

In [None]:
goals = 0:14 |> collect # postulated range of goals possible to score in a game
# for each possible goal scoring average (lam) probs of scoring given nums of goals
pmfSeqVancouver = [getPoisPmf(lam, goals) for lam in vancouver.names];

In [None]:
predVancouver = getMixture(
    vancouver, # probs of getting a dist in pmfSeqVancouver
    pmfSeqVancouver,
    map(x -> string("λ = ", x), vancouver.names),
    0:14 |> collect,
    false 
)

predVancouver = Pmf(predVancouver.outcome, predVancouver.posteriors)

In [None]:
Cmk.barplot(predVancouver.names, predVancouver.priors,
            axis=(;title="Posterior predictive distribution for Vancouver team",
                  xlabel="Number of goals",
                  ylabel="PMF", xticks=0:14))

In [None]:
pBostonWin = getProbGt(predBoston, predVancouver, true)
pBostonDraw = getProbEq(predBoston, predVancouver, true)
pBostonLose = getProbLt(predBoston, predVancouver, true)

pBostonWin, pBostonDraw, pBostonLose

In [None]:
# prob of Boston team winning the next game
pBostonWin + pBostonDraw / 2 # assume Boston wins half of the ties (playoff games cannot end tied; they go to overtime)

In [None]:
# prob of Boston team winning the championship
# best-of-seven, 4 games played, 3 games remain
# the series is tied 2-2, so Boston must win at least 2 of the 3 remaining games
# Dsts.Binomial(n, p) with Dsts.pdf(binomDist, numSuccesses)
Dsts.pdf.(Dsts.Binomial(3, pBostonWin + pBostonDraw / 2), 2:3) |> sum
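The binomial shortcut works because the series winner is the same whether or not all remaining games are played: Boston takes the series exactly when it would win at least 2 of 3 hypothetical games. Written out as arithmetic, under the same assumption that games are independent with a fixed per-game win probability `p` (`probWinSeries` is a hypothetical helper name):

```julia
# P(winning at least 2 of 3 games), assuming independent games with win probability p.
# Playing all 3 games hypothetically does not change who wins the series.
probWinSeries(p::Real) = 3 * p^2 * (1 - p) + p^3

probWinSeries(0.6)   # 3 * 0.36 * 0.4 + 0.216 = 0.648
```

Passing `pBostonWin + pBostonDraw / 2` should reproduce the `Dsts.Binomial` result above.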