# Chapter 8. Poisson Processes

[Link to chapter online](https://allendowney.github.io/ThinkBayes2/chap08.html)

## Warning

The content of this file may be incorrect, erroneous and/or harmful. Use it at your own risk.

## Imports

In [None]:
include("./pmfAndCdf.jl")
#import .ProbabilityMassFunction as Pmf # if defined with module (and export) in pmfAndCdf.jl
#using .ProbabilityMassFunction # if defined with module and export in pmfAndCdf.jl

include("./simplestat.jl")
#import .SimpleStatistics as Ss # if defined with module (and export) in simplestat.jl
#using .SimpleStatistics # if defined with module and export in simplestat.jl

# Package aliases used below; uncomment if the included files do not already define them.
# The package names are an assumption based on the function calls in this chapter.
#import CairoMakie as Cmk
#import Distributions as Dsts

### The World Cup Problem

In the 2018 FIFA World Cup final, France defeated Croatia 4 goals to 2. Based on this outcome:
- How confident should we be that France is the better team?
- If the same teams played again, what is the chance France would win again?

To answer these questions, we have to make some modeling decisions.
- First, I’ll assume that for any team against another team there is some unknown goal-scoring rate, measured in goals per game, which I’ll denote with the variable `lam` or the Greek letter $\lambda$, pronounced “lambda”.
- Second, I’ll assume that a goal is equally likely during any minute of a game. So, in a 90 minute game, the probability of scoring during any minute is $\lambda / 90$.
- Third, I’ll assume that a team never scores twice during the same minute.
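
Taken together, the three assumptions describe a game as 90 independent one-minute trials, each producing at most one goal with probability $\lambda / 90$. The sketch below simulates that model directly; it is only an illustration, and the helper `simulateGame`, the seed, and the number of games are hypothetical choices that are not part of the original notebook.

```julia
using Random

# Hypothetical helper: simulate one game minute by minute.
# Each minute yields at most one goal, with probability lam / minutes.
function simulateGame(rng::AbstractRNG, lam::Float64, minutes::Int=90)::Int
    p = lam / minutes
    return count(_ -> rand(rng) < p, 1:minutes)
end

rng = MersenneTwister(42)   # fixed seed, chosen arbitrarily for reproducibility
lam = 1.4                   # assumed goal-scoring rate, goals per game
nGames = 100_000
goalsPerGame = [simulateGame(rng, lam) for _ in 1:nGames]

avgGoals = sum(goalsPerGame) / nGames                 # should be close to lam
fracFourGoals = count(==(4), goalsPerGame) / nGames   # share of games with exactly 4 goals
println("average goals per game: ", avgGoals)
println("share of games with 4 goals: ", fracFourGoals)
```

With many simulated games the average should settle near `lam`, and the share of four-goal games should land close to the Poisson probability for the same rate.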

## The Poisson Distribution

If the number of goals scored in a game follows a Poisson distribution with a goal-scoring rate $\lambda$, the probability of scoring $k$ goals is

$\lambda^{k} \exp(-\lambda) / k!$

or

$\frac{\lambda^{k} \exp(-\lambda)}{k!}$

or

$\frac{\lambda^{k}}{k!}e^{-\lambda}$

for any non-negative integer $k$.

In [None]:
lam = 1.4
distPois = Dsts.Poisson(lam)

In [None]:
# k goals
k = 4
Dsts.pdf(distPois, k)

This result implies that if the average goal-scoring rate is 1.4 goals per game, then the probability of scoring 4 goals in a game is about 4%.
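
As a cross-check, the Poisson formula can be evaluated directly, without a distributions package. This is a minimal sketch; `poisProb` is a hypothetical helper name introduced only for this illustration.

```julia
# Hypothetical helper: Poisson PMF evaluated straight from the formula.
# factorial(k) overflows Int64 for k > 20, so factorial(big(k)) would be needed beyond that.
poisProb(lam::Real, k::Integer) = lam^k * exp(-lam) / factorial(k)

probFour = poisProb(1.4, 4)                        # roughly 0.0395
probTotal = sum(poisProb(1.4, k) for k in 0:20)    # very close to 1
println("P(4 goals | lam = 1.4) = ", probFour)
println("sum over k = 0..20     = ", probTotal)
```

The probabilities for $k$ from 0 to 20 already sum to almost exactly 1, because higher goal counts are vanishingly unlikely at this rate.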

In [None]:
# custom type
Num = Union{Int, Float64}

In [None]:
# Build a Pmf over the quantities `qs`, using Poisson(lam) probabilities evaluated at each q
function getPoisPmf(
        lam::A,
        qs::Vector{B}
        )::Pmf{B} where {A<:Num, B<:Num}
    ps::Vector{Float64} = Dsts.pdf.(Dsts.Poisson(lam), qs)
    pmf::Pmf{B} = Pmf(qs, ps)
    return pmf
end

In [None]:
lam = 1.4
goals = 0:10 |> collect
pmfGoals = getPoisPmf(lam, goals)

In [None]:
fig = Cmk.Figure()
ax, bp = Cmk.barplot(fig[1, 1],
    pmfGoals.names, pmfGoals.priors,
    axis=(;title="Distribution of goals scored",
        xlabel="Number of goals",
        ylabel="PMF",
        xticks=0:10
    )
)
Cmk.axislegend(
    ax,
    [bp],
    ["λ = 1.4"],
    "Poisson distribution"
)
fig

[...]

Now let’s turn it around: given a number of goals, what can we say about the goal-scoring rate?

To answer that, we need to think about the prior distribution of `lam`, which represents the range of possible values and their probabilities before we see the score.

## The Gamma Distribution

Using [data](https://www.statista.com/statistics/269031/goals-scored-per-game-at-the-fifa-world-cup-since-1930/) [...] each team scores about 1.4 goals per game, on average (so $\lambda = 1.4$).

For a good team against a bad one, $\lambda$ will be higher; for a bad team against a good one, it will be lower.

We will model it with a [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution). It is usually used to model the time that passes before $\alpha$ occurrences of a randomly occurring event (an event that occurs via a Poisson process).

In [None]:
alpha = 1.4
qs = range(0, 10, 101)
ps = Dsts.pdf.(Dsts.Gamma(alpha), qs);

Because `Dsts.Gamma(alpha)` uses the default scale of 1, the shape parameter `alpha` is also the mean of the distribution. The `qs` are possible values of `lam` between 0 and 10. The `ps` are probability densities, which we can think of as unnormalized probabilities.
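
To illustrate what "unnormalized probabilities" means here, the sketch below evaluates the gamma density only up to a constant and then normalizes it over the grid. It assumes a scale of 1; `gammaKernel`, `grid`, `weights`, and `probs` are hypothetical names, and this is not necessarily how `Pmf` treats its inputs internally.

```julia
# Unnormalized gamma density with scale 1. The constant 1/Γ(alpha) is dropped on purpose,
# because it cancels when the values are normalized over the grid.
gammaKernel(x::Real, alpha::Real) = x^(alpha - 1) * exp(-x)

alpha = 1.4
grid = range(0, 10, length=101)        # candidate values of the goal-scoring rate
weights = gammaKernel.(grid, alpha)    # proportional to the density at each grid point
probs = weights ./ sum(weights)        # now sums to 1, like a PMF

peak = grid[argmax(probs)]             # grid point with the highest probability
println("total probability: ", sum(probs))
println("most probable rate on the grid: ", peak)
```

Any constant factor in the density disappears in the division by `sum(weights)`, which is why densities can stand in for probabilities on a fixed grid.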

In [None]:
lambdasPmf = Pmf(qs |> collect, ps)

In [None]:
fig = Cmk.Figure()
Cmk.lines(fig[1, 1],
    lambdasPmf.names, lambdasPmf.priors,
    color="lightgray", linestyle=:dash, linewidth=3,
    axis=(;title="Prior distribution of λ",
        xlabel="Goal scoring rate (λ)", ylabel="PMF",
        xticks=0:2:10, yticks=0:0.01:0.06)
)
fig

This distribution represents our prior knowledge about goal scoring: `lam` is usually less than 2, occasionally as high as 6, and seldom higher than that.

We can confirm that the mean is about 1.4.

In [None]:
getMean(lambdasPmf, true),
getQuantile(lambdasPmf, 0.5, true),
getCredibleInterval(lambdasPmf, 0.5, true)
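
The helpers used above come from `simplestat.jl`, whose implementation is not shown here. As a rough standalone cross-check, the same kind of summaries can be computed from a normalized grid. The sketch assumes a gamma prior with scale 1 on the same 101-point grid; all variable names are hypothetical.

```julia
alpha = 1.4
grid = range(0, 10, length=101)
weights = grid .^ (alpha - 1) .* exp.(-grid)   # gamma density up to a constant (scale 1)
probs = weights ./ sum(weights)                # normalized grid probabilities

gridMean = sum(grid .* probs)                          # probability-weighted average
cumProbs = cumsum(probs)                               # cumulative probabilities
gridMedian = grid[findfirst(>=(0.5), cumProbs)]        # first grid point reaching 50%
massBelowTwo = sum(probs[grid .< 2])                   # share of prior mass with lam < 2
massAboveSix = sum(probs[grid .> 6])                   # share of prior mass with lam > 6
println("mean: ", gridMean, ", median: ", gridMedian)
println("P(lam < 2): ", massBelowTwo, ", P(lam > 6): ", massAboveSix)
```

The grid mean should come out near 1.4, most of the mass should sit below 2, and only a small fraction should lie above 6, in line with the description of the prior.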

[...] could disagree about [...] the prior, but this is good enough to get started.