# Poisson Processes

In [None]:
from utils import set_pyplot_params
set_pyplot_params()

Here we will introduce the [Poisson process](https://en.wikipedia.org/wiki/Poisson_point_process), which is a model used to describe events that occur at random intervals.
As an example of a Poisson process, we'll model goal-scoring in football (not American football).
We'll use goals scored in a game to estimate the parameter of a Poisson process; then we'll use the posterior distribution to make predictions.

And we'll solve The World Cup Problem.

## The World Cup Problem

In the 2018 FIFA World Cup final, France defeated Croatia 4 goals to 2.  Based on this outcome:

1. How confident should we be that France is the better team?

2. If the same teams played again, what is the chance France would win again?

To answer these questions, we have to make some modeling decisions.

* First, we'll assume that for any team against another team there is some unknown goal-scoring rate, measured in goals per game, which we'll denote with the Python variable `lam` or the Greek letter $\lambda$.

* Second, we'll assume that a goal is equally likely during any minute of a game.  So, in a 90-minute game, the probability of scoring during any minute is $\lambda/90$.

* Third, we'll assume that a team never scores twice during the same minute.

Of course, none of these assumptions is completely true in the real world, but they are reasonable simplifications.
As [George Box said](https://en.wikipedia.org/wiki/All_models_are_wrong), "All models are wrong; some are useful."

In this case, the model is useful because if these assumptions are at least roughly true, the number of goals scored in a game follows a Poisson distribution, at least approximately.
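The step from these assumptions to a Poisson distribution can be checked numerically. With 90 one-minute chances, each scoring with probability $\lambda/90$ and at most one goal per minute, the number of goals is binomial, provided the minutes are independent (an assumption added here for the sketch). For small per-minute probabilities, that binomial distribution is close to a Poisson distribution with the same mean. This sketch assumes $\lambda=1.4$:

```python
import numpy as np
from scipy.stats import binom, poisson

lam = 1.4
n_minutes = 90
ks = np.arange(10)

# At most one goal per minute, each minute scoring with probability lam/90;
# assuming independent minutes, the total number of goals is binomial
binomial_ps = binom(n_minutes, lam / n_minutes).pmf(ks)

# Poisson distribution with the same mean, lam
poisson_ps = poisson(lam).pmf(ks)

max_diff = np.abs(binomial_ps - poisson_ps).max()
print(max_diff)   # small: the two distributions nearly agree
```

As the number of intervals grows and the per-interval probability shrinks, with the mean held fixed, the binomial distribution converges to the Poisson distribution.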

## The Poisson Distribution

If the number of goals scored in a game follows a [Poisson distribution](https://en.wikipedia.org/wiki/Poisson_distribution) with a goal-scoring rate, $\lambda$, the probability of scoring $k$ goals is

$$\lambda^k \exp(-\lambda) ~/~ k!$$

for any non-negative value of $k$.

SciPy provides a `poisson` object that represents a Poisson distribution.
We can create one with $\lambda=1.4$ like this:
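Here is a minimal sketch of this step, assuming SciPy's `scipy.stats.poisson`. It creates the frozen distribution and evaluates its PMF at 4 goals, which is the value discussed below:

```python
from scipy.stats import poisson

lam = 1.4
dist = poisson(lam)   # "frozen" Poisson distribution with lambda fixed at 1.4

# Probability of scoring exactly 4 goals in a game
prob_4 = dist.pmf(4)
print(prob_4)         # roughly 0.039, i.e. about 4%
```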

- The result is an object that represents a "frozen" random variable and provides `pmf`, which evaluates the probability mass function of the Poisson distribution.


- `frozen` means that the parameter $\lambda$ is fixed at 1.4 and we can use methods like `.pmf()`, `.cdf()`, `.rvs()` without needing to specify $\lambda$ again.

- This result implies that if the average goal-scoring rate is 1.4 goals per game, the probability of scoring 4 goals in a game is about 4%.


- We'll use the following function to make a `Pmf` that represents a Poisson distribution.

In [None]:
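from empiricaldist import Pmf
from scipy.stats import poisson

def make_poisson_pmf(lam, qs):
    """Make a Pmf of a Poisson distribution."""
    ps = poisson(lam).pmf(qs)   # Poisson probabilities for each quantity in qs
    pmf = Pmf(ps, qs)
    pmf.normalize()             # renormalize, since qs is a finite range
    return pmf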

- `make_poisson_pmf` takes as parameters the goal-scoring rate, `lam`, and an array of quantities, `qs`, where it should evaluate the Poisson PMF.  It returns a `Pmf` object.

- For example, here's the distribution of goals scored for `lam=1.4`, computed for values of `k` from 0 to 9.

And here's what it looks like.

In [None]:
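import numpy as np
from utils import decorate

def decorate_goals(title=''):
    decorate(xlabel='Number of goals',
        ylabel='PMF',
        title=title)

goals = np.arange(10)                      # k = 0, 1, ..., 9
pmf_goals = make_poisson_pmf(1.4, goals)   # lam = 1.4
pmf_goals.bar(label=r'Poisson distribution with $\lambda=1.4$')
decorate_goals('Distribution of goals scored')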

- The most likely outcomes are 0, 1, and 2; higher values are possible but increasingly unlikely.
Values above 7 are negligible.

- This distribution shows that if we know the goal scoring rate, we can predict the number of goals.


- Now let's turn it around: given a number of goals, what can we say about the goal-scoring rate?


- To answer that, we need to think about the prior distribution of `lam`, which represents the range of possible values and their probabilities before we see the score.

## The Gamma Distribution

- If you have ever seen a soccer game, you have some information about `lam`.  

- In most games, teams score a few goals each.  In rare cases, a team might score more than 5 goals, but they almost never score more than 10.


- Using [data from previous World Cups](https://www.statista.com/statistics/269031/goals-scored-per-game-at-the-fifa-world-cup-since-1930/), we estimate that ***each team*** scores about 1.4 goals per game, on average.  So we'll set the mean of `lam` to be 1.4.


- For a good team against a bad one, we expect `lam` to be higher; for a bad team against a good one, we expect it to be lower.

- To model the distribution of goal-scoring rates, $\lambda$, we'll use a [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution), which we chose because:


1. The goal scoring rate is continuous and non-negative, and the gamma distribution is appropriate for this kind of quantity.

2. In its one-parameter form, the gamma distribution has a single parameter, `alpha`, which is also its mean.  So it's easy to construct a gamma distribution with the mean we want.

3. As we'll see, the shape of the gamma distribution is a reasonable choice, given what we know about soccer.


- And there's one more reason, which I will reveal in <<_ConjugatePriors>>.


- SciPy provides `gamma`, which creates an object that represents a gamma distribution.
And the `gamma` object provides `pdf`, which evaluates the **probability density function** (PDF) of the gamma distribution.


- Here's how we use it.
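Here is a minimal sketch of that step. It assumes SciPy's `gamma` and a grid of 101 evenly spaced values of `lam` from 0 to 10, which is the same grid the conjugate-prior section uses:

```python
import numpy as np
from scipy.stats import gamma

alpha = 1.4                    # mean of the one-parameter gamma distribution
qs = np.linspace(0, 10, 101)   # possible values of lam, from 0 to 10
ps = gamma(alpha).pdf(qs)      # probability densities, not probabilities

print(ps[:5])
```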

The parameter, `alpha`, is the mean of the distribution.
The `qs` are possible values of `lam` between 0 and 10.
The `ps` are **probability densities**, which we can think of as unnormalized probabilities.

To normalize them, we can put them in a `Pmf` and call `normalize`:
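The same normalization can be written in plain NumPy. This self-contained sketch rebuilds the grid of densities, then divides by their total so that the probabilities sum to 1:

```python
import numpy as np
from scipy.stats import gamma

qs = np.linspace(0, 10, 101)
ps = gamma(1.4).pdf(qs)

total = ps.sum()        # sum of the unnormalized "probabilities"
probs = ps / total      # normalized: the probabilities now sum to 1
print(total)            # a little less than 10
```

The total is close to 10 because the grid spacing is 0.1. Multiplying the sum of the densities by 0.1 approximates the integral of the PDF, which is 1.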

- The value returned by `normalize`, 9.88936, is the sum of the unnormalized "probabilities" before normalization.

- After normalization, the sum is 1.

The result is a discrete approximation of a gamma distribution.
Here's what it looks like.

In [None]:
def decorate_rate(title=''):
    decorate(xlabel='Goal scoring rate (lam)',
        ylabel='PMF',
        title=title)

In [None]:
prior.plot(ls='--', label='prior', color='C5')
decorate_rate(r'Prior distribution of $\lambda$')

This distribution represents our ***prior knowledge*** about goal scoring: `lam` is usually less than 2, occasionally as high as 6, and seldom higher than that.  

And we can confirm that the mean is about 1.4.
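Here is a self-contained sketch of that check in NumPy, assuming the same 101-point grid. The mean of a discrete distribution is the sum of each quantity times its probability:

```python
import numpy as np
from scipy.stats import gamma

qs = np.linspace(0, 10, 101)
probs = gamma(1.4).pdf(qs)
probs /= probs.sum()

prior_mean = np.sum(qs * probs)   # mean of the discrete approximation
print(prior_mean)                 # close to alpha = 1.4
```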

As usual, reasonable people could disagree about the details of the prior, but this is good enough to get started.  Let's do an update.

## The Update

Suppose you are given the goal-scoring rate, $\lambda$, and asked to compute the probability of scoring a number of goals, $k$.  That is precisely the question we answered by computing the Poisson PMF.

For example, if $\lambda$ is 1.4, the probability of scoring 4 goals in a game is:
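Here is that calculation written out from the PMF formula given earlier, $\lambda^k \exp(-\lambda) / k!$, and checked against SciPy:

```python
import math
from scipy.stats import poisson

lam, k = 1.4, 4

# Poisson PMF written out directly: lam**k * exp(-lam) / k!
by_formula = lam**k * math.exp(-lam) / math.factorial(k)

# The same value from SciPy's frozen distribution
by_scipy = poisson(lam).pmf(k)

print(by_formula, by_scipy)   # both about 0.039
```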

Now suppose we have an array of possible values for $\lambda$; we can compute the likelihood of the data for each hypothetical value of `lam`, like this:
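Here is a sketch of that vectorized calculation, assuming the 101-point grid from 0 to 10. Passing an array of rates to `poisson` produces one likelihood for each hypothetical value of `lam`:

```python
import numpy as np
from scipy.stats import poisson

lams = np.linspace(0, 10, 101)   # hypothetical goal-scoring rates
k = 4                            # observed goals

# One likelihood per value of lam: P(k goals | lam)
likelihood = poisson(lams).pmf(k)

# The likelihood of 4 goals is highest when lam is 4
print(lams[np.argmax(likelihood)])
```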

- And that's all we need to do the update.

- To get the posterior distribution, we multiply the prior by the likelihoods we just computed and normalize the result.


- The following function encapsulates these steps.

In [None]:
def update_poisson(pmf, data):
    """Update Pmf with a Poisson likelihood."""
    k = data
    lams = pmf.qs
    likelihood = poisson(lams).pmf(k)
    pmf *= likelihood
    pmf.normalize()

The first parameter is the prior; the second is the number of goals.

In the example, France scored 4 goals, so I'll make a copy of the prior and update it with the data.
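The same update can also be sketched in NumPy without `empiricaldist`: multiply the prior probabilities by the likelihoods, then divide by the total. This assumes the gamma prior on the 101-point grid described above. `grid_update` is a hypothetical helper, not part of any library:

```python
import numpy as np
from scipy.stats import gamma, poisson

lams = np.linspace(0, 10, 101)
prior = gamma(1.4).pdf(lams)
prior /= prior.sum()

def grid_update(prior_ps, lams, k):
    """Hypothetical helper: posterior probabilities after observing k goals."""
    posterior = prior_ps * poisson(lams).pmf(k)   # prior times likelihood
    return posterior / posterior.sum()            # normalize

france = grid_update(prior, lams, 4)
print(np.sum(lams * france))   # posterior mean, about 2.7
```

`update_poisson` modifies its argument in place, which is why the prior is copied first. `grid_update` returns a new array instead, so the prior can be reused without copying.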

Here's what the posterior distribution looks like, along with the prior.

In [None]:
prior.plot(ls='--', label='prior', color='C5')
france.plot(label='France posterior', color='C3')

decorate_rate(r'Posterior distribution of $\lambda$ for France')

The data, `k=4`, makes us think higher values of `lam` are more likely and lower values are less likely.  So the posterior distribution is shifted to the right.

Let's do the same for Croatia, which scored 2 goals in the final:

And here are the results.

In [None]:
prior.plot(ls='--', label='prior', color='C5')
croatia.plot(label='Croatia posterior', color='C0')

decorate_rate(r'Posterior distribution of $\lambda$ for Croatia')

Here are the posterior means for these distributions.
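Here is a self-contained NumPy sketch that reproduces the three means, under the same grid assumptions as before. `posterior_after` is a hypothetical helper:

```python
import numpy as np
from scipy.stats import gamma, poisson

lams = np.linspace(0, 10, 101)
prior = gamma(1.4).pdf(lams)
prior /= prior.sum()

def posterior_after(k):
    """Hypothetical helper: grid posterior for lam after k goals in one game."""
    post = prior * poisson(lams).pmf(k)
    return post / post.sum()

dists = {'prior': prior, 'Croatia': posterior_after(2), 'France': posterior_after(4)}
means = {name: np.sum(lams * ps) for name, ps in dists.items()}
print(means)   # roughly 1.4, 1.7, and 2.7
```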

- The mean of the prior distribution is about 1.4.

- After Croatia scores 2 goals, their posterior mean is 1.7, which is near the midpoint of the prior and the data.

- Likewise, after France scores 4 goals, their posterior mean is 2.7, halfway between the prior mean, 1.4, and the data, 4.


- These results are typical of a Bayesian update: ***the location of the posterior distribution is a compromise between the prior and the data***.

## Probability of Superiority

- Now that we have a posterior distribution for each team, we can answer the first question: How confident should we be that France is the better team?


- In the model, "better" means having a higher goal-scoring rate against the opponent.  We can use the posterior distributions to compute the probability that a random value drawn from France's distribution exceeds a value drawn from Croatia's.


- One way to do that is to enumerate all pairs of values from the two distributions, adding up the total probability that one value exceeds the other.

In [None]:
def prob_gt(pmf1, pmf2):
    """Compute the probability of superiority."""
    total = 0
    for q1, p1 in pmf1.items():
        for q2, p2 in pmf2.items():
            if q1 > q2:
                total += p1 * p2
    return total

Here's how we use it:
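Here is a self-contained sketch of that usage, with NumPy arrays in place of `Pmf` objects. The double loop mirrors `prob_gt` above. The grid posteriors are rebuilt under the same assumptions as before, using the hypothetical helper `posterior_after`:

```python
import numpy as np
from scipy.stats import gamma, poisson

lams = np.linspace(0, 10, 101)
prior = gamma(1.4).pdf(lams)
prior /= prior.sum()

def posterior_after(k):
    """Hypothetical helper: grid posterior for lam after k goals in one game."""
    post = prior * poisson(lams).pmf(k)
    return post / post.sum()

france, croatia = posterior_after(4), posterior_after(2)

# Same double loop as prob_gt, over (quantity, probability) pairs
total = 0.0
for q1, p1 in zip(lams, france):
    for q2, p2 in zip(lams, croatia):
        if q1 > q2:
            total += p1 * p2

print(total)   # close to 0.75
```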

`Pmf` provides a function that does the same thing.

- The results differ only in the last few decimal places. This is because `Pmf.prob_gt` uses array operators rather than `for` loops, so the floating-point additions happen in a different order.


- Either way, the result is close to 75%.  So, on the basis of one game, we have moderate confidence that France is actually the better team.


- Of course, we should remember that this result is based on the assumption that the goal-scoring rate is constant.
In reality, if a team is down by one goal, they might play more aggressively toward the end of the game, making them more likely to score, but also more likely to give up an additional goal.


- As always, the results are only as good as the model.

## Predicting the Rematch

- Now we can take on the second question: If the same teams played again, what is the chance France would win?

- To answer this question, we'll generate the "posterior predictive distribution", which is the distribution of the number of goals we expect a team to score.


- If we knew the goal scoring rate, `lam`, the distribution of goals would be a Poisson distribution with parameter `lam`.

- Since we don't know `lam`, the distribution of goals is a mixture of Poisson distributions with different values of `lam`.


- First we'll generate a sequence of `Pmf` objects, one for each value of `lam`.

In [None]:
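import numpy as np

goals = np.arange(10)   # possible numbers of goals, 0 to 9

# One Poisson Pmf for each hypothetical value of lam
pmf_seq = [make_poisson_pmf(lam, goals) 
           for lam in prior.qs]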

The following figure shows what these distributions look like for a few values of `lam`.

In [None]:
import matplotlib.pyplot as plt

for i, index in enumerate([10, 20, 30, 40]):
    plt.subplot(2, 2, i+1)
    lam = prior.qs[index]
    pmf = pmf_seq[index]
    pmf.bar(label=rf'$\lambda$ = {lam:.1f}', color='C3')   # raw f-string avoids the invalid "\l" escape
    decorate_goals()

- The predictive distribution is a mixture of these `Pmf` objects, weighted with the posterior probabilities.


- We can use `make_mixture` from <<_GeneralMixtures>> to compute this mixture.

Here's the posterior predictive distribution for the number of goals France would score in a rematch.

In [None]:
import numpy as np
from scipy.stats import poisson

# Possible numbers of goals (0 to 9)
goal_counts = np.arange(10)

# Initialize the posterior predictive distribution
pred_france = np.zeros_like(goal_counts, dtype=float)

# Loop over each lambda value in the posterior
for lam, p_lambda in zip(france.qs, france.ps):  # Iterate over posterior (lambda values and probabilities)
    pred_france += p_lambda * poisson.pmf(goal_counts, lam)  # Weighted sum of Poisson distributions

# Normalize: the sum is slightly below 1 because counts above 9 are cut off
pred_france /= pred_france.sum()


Here's what it looks like.

In [None]:
plt.bar(goal_counts, pred_france, color='C3', label="France")
plt.xlabel("Number of goals")
plt.ylabel("PMF")
plt.title("Posterior Predictive Distribution")
plt.legend()
plt.show()

This distribution represents two sources of uncertainty: we don't know the actual value of `lam`, and even if we did, we would not know the number of goals in the next game.

Here's the predictive distribution for Croatia.

In [None]:
# Initialize the posterior predictive distribution
pred_croatia = np.zeros_like(goal_counts, dtype=float)

# Loop over each lambda value in the posterior
for lam, p_lambda in zip(croatia.qs, croatia.ps):  # Iterate over posterior (lambda values and probabilities)
    pred_croatia += p_lambda * poisson.pmf(goal_counts, lam)  # Weighted sum of Poisson distributions

# Normalize: the sum is slightly below 1 because counts above 9 are cut off
pred_croatia /= pred_croatia.sum()

In [None]:
plt.bar(goal_counts, pred_croatia, color='C0', label="Croatia")
plt.xlabel("Number of goals")
plt.ylabel("PMF")
plt.title("Posterior Predictive Distribution")
plt.legend()
plt.show()

### What is the difference between a Posterior distribution and a Posterior Predictive distribution?

#### 1. Posterior Distribution $P(\lambda \mid D)$

The **posterior distribution** represents our **updated belief** about the **parameter** $\lambda$ (e.g., a team's goal-scoring rate) after observing data $D$.

It is computed using **Bayes' Rule**:

$$
P(\lambda \mid D) = \frac{P(D \mid \lambda) P(\lambda)}{P(D)}
$$

where:

- $P(\lambda)$ is the **prior** (belief before data).
- $P(D \mid \lambda)$ is the **likelihood** (how likely the data is given $\lambda$).
- $P(D)$ is the **marginal likelihood** (normalizing factor).

The **posterior** gives us a **distribution over possible values of** $\lambda$ **based on observed data**.

#### **Example: Posterior for France’s Goal-Scoring Rate $\lambda$**
- Before observing data, we may assume $\lambda \sim \text{Gamma}(1.4)$.
- After observing that **France scored 4 goals**, we update our belief to obtain a **posterior distribution over** $\lambda$.

---

#### 2. Posterior Predictive Distribution $P(D_{\text{new}} \mid D)$

The **posterior predictive distribution** describes **what we expect future data to look like**, given:

- The **posterior distribution** of $\lambda$.
- The **likelihood function**.

It is computed by **integrating out** $\lambda$:

$$
P(D_{\text{new}} \mid D) = \int P(D_{\text{new}} \mid \lambda) P(\lambda \mid D) d\lambda
$$

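One way to approximate this integral by simulation is to:

1. **Sample** $\lambda$ from the **posterior** $P(\lambda \mid D)$.
2. **Use** each sampled $\lambda$ to generate a **new prediction** from the likelihood $P(D_{\text{new}} \mid \lambda)$.

The loops above compute the same quantity without sampling, as a weighted sum of Poisson PMFs over the grid of $\lambda$ values.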
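Here is a sketch of this sampling procedure for France. It assumes the 101-point grid posterior from earlier and NumPy's random generator, and compares the simulated frequencies with the exact weighted sum of Poisson PMFs that the notebook's loop computes:

```python
import numpy as np
from scipy.stats import gamma, poisson

rng = np.random.default_rng(0)

# Grid posterior for France's rate after 4 goals (gamma prior, alpha = 1.4)
lams = np.linspace(0, 10, 101)
posterior = gamma(1.4).pdf(lams) * poisson(lams).pmf(4)
posterior /= posterior.sum()

# Step 1: sample lam from the posterior
lam_samples = rng.choice(lams, size=100_000, p=posterior)
# Step 2: sample a goal count for each sampled lam
goal_samples = rng.poisson(lam_samples)

# Exact mixture: weighted sum of Poisson PMFs over the grid
goal_counts = np.arange(10)
exact = poisson.pmf(goal_counts[:, None], lams).dot(posterior)

simulated = np.bincount(goal_samples, minlength=10)[:10] / len(goal_samples)
print(np.abs(simulated - exact).max())   # small: sampling error only
```

The simulation is more flexible, but the exact weighted sum avoids sampling noise. On a small grid like this one, the exact sum is also cheaper.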

#### **Example: Posterior Predictive for Future Goals**
- Suppose we now want to **predict how many goals France will score in a rematch**.
- Since **$\lambda$ is uncertain**, we **don’t use a single Poisson distribution** $\text{Poisson}(\lambda)$.
- Instead, we **average over all possible $\lambda$ values** weighted by their **posterior probability**.


We can use these distributions to compute the **probability that France wins, loses, or ties** the rematch.

In [None]:
# Convert to Pmf objects for comparison
pred_france_pmf = Pmf(pred_france, goal_counts)
pred_croatia_pmf = Pmf(pred_croatia, goal_counts)

# Compute probabilities
win = Pmf.prob_gt(pred_france_pmf, pred_croatia_pmf)
lose = Pmf.prob_lt(pred_france_pmf, pred_croatia_pmf)
tie = Pmf.prob_eq(pred_france_pmf, pred_croatia_pmf)

# Compute final win probability
win_probability = win + tie / 2

# Print the results
print("France's Probability of Winning:", win_probability)
print("Probability France loses:", lose)
print("Probability of a tie:", tie)

Assuming that France wins half of the ties, their chance of winning the rematch is about 65%.

This is a bit lower than their probability of superiority, which is 75%. And that makes sense, because we are less certain about the outcome of a single game than we are about the goal-scoring rates.
Even if France is the better team, they might lose the game.
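These figures can be reproduced without `empiricaldist` by building both predictive distributions on the same grid, then summing the joint probabilities of each outcome. The sketch below keeps the same assumptions as before: a gamma prior with `alpha=1.4`, the 101-point grid, and goals truncated at 9. `predictive` is a hypothetical helper:

```python
import numpy as np
from scipy.stats import gamma, poisson

lams = np.linspace(0, 10, 101)
prior = gamma(1.4).pdf(lams)
prior /= prior.sum()

def predictive(k, goal_counts):
    """Hypothetical helper: posterior predictive PMF after k goals in one game."""
    post = prior * poisson(lams).pmf(k)
    post /= post.sum()
    pred = poisson.pmf(goal_counts[:, None], lams).dot(post)
    return pred / pred.sum()   # renormalize after truncating at 9 goals

goal_counts = np.arange(10)
pred_france = predictive(4, goal_counts)
pred_croatia = predictive(2, goal_counts)

joint = np.outer(pred_france, pred_croatia)   # P(France = i, Croatia = j)
win = np.tril(joint, k=-1).sum()   # i > j: France scores more
lose = np.triu(joint, k=1).sum()   # i < j: Croatia scores more
tie = np.trace(joint)              # i == j

print(win, lose, tie, win + tie / 2)
```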

## Summary

* If a system satisfies the assumptions of a Poisson model, the number of events in a period of time follows a Poisson distribution, which is a discrete distribution with integer quantities from 0 to infinity. In practice, we can usually ignore low-probability quantities above a finite limit.

* For the prior distribution of $\lambda$, we used a gamma distribution, which is a continuous distribution with quantities from 0 to infinity, but we approximated it with a discrete, bounded PMF. The gamma distribution has one parameter, denoted $\alpha$ or `alpha`, which is also its mean.

- We chose the gamma distribution because the shape is consistent with our background knowledge about goal-scoring rates.

- There are other distributions we could have used; however, we will see in <<_ConjugatePriors>> that the gamma distribution can be a particularly good choice.


# Conjugate Priors

In [None]:
from utils import set_pyplot_params
set_pyplot_params()

- Earlier, we used grid approximations to solve several problems.


- One of the goals has been to show that this approach is sufficient to solve many real-world problems.

- And it's a good place to start because it shows clearly how the methods work.


- However, as we increase the number of parameters, the number of points in the grid grows (literally) exponentially.

- With more than 3-4 parameters, grid methods become impractical.


- Here, we will discuss an alternative: **conjugate priors**.


- We'll start with the World Cup problem.

## The World Cup Problem Revisited

- Previously, we solved the World Cup problem using a Poisson process to model goals in a soccer game as random events that are equally likely to occur at any point during a game.


- We used a gamma distribution to represent the prior distribution of $\lambda$, the goal-scoring rate.  And we used a Poisson distribution to compute the probability of $k$, the number of goals scored.


- Here's a gamma object that represents the prior distribution.

In [None]:
from scipy.stats import gamma

alpha = 1.4
dist = gamma(alpha)

And here's a grid approximation.

In [None]:
import numpy as np
from utils import pmf_from_dist

lams = np.linspace(0, 10, 101)
prior = pmf_from_dist(dist, lams)

Here's the likelihood of scoring 4 goals for each possible value of `lam`.

And here's the update.

- So far, this should be familiar.

- Now we'll solve the same problem using the **conjugate prior**.

### The Conjugate Prior

- One reason we chose the gamma distribution earlier is that it is the "conjugate prior" of the Poisson distribution. The name comes from the fact that the two distributions are connected or coupled, which is what "conjugate" means.


- In the next section we'll show *how* they are connected, but first we'll look at the consequence of this connection, which is that there is a remarkably simple way to compute the posterior distribution.


- However, in order to demonstrate it, we have to switch from the one-parameter version of the gamma distribution to the two-parameter version.  Since the first parameter is called `alpha`, you might guess that the second parameter is called `beta`.


- The following function takes `alpha` and `beta` and makes an object that represents a gamma distribution with those parameters.

In [None]:
def make_gamma_dist(alpha, beta):
    """Makes a gamma object."""
    dist = gamma(alpha, scale=1/beta)
    dist.alpha = alpha
    dist.beta = beta
    return dist

Here's the prior distribution with `alpha=1.4` again and `beta=1`. 

Now I claim without proof that if a team scores `k` goals in `t` games, we can do a Bayesian update just by making a gamma distribution with parameters `alpha+k` and `beta+t`. For a single game, those are `alpha+k` and `beta+1`.

In [None]:
def update_gamma(prior, data):
    """Update a gamma prior."""
    k, t = data
    alpha = prior.alpha + k
    beta = prior.beta + t
    return make_gamma_dist(alpha, beta)

Here's how we update it with `k=4` goals in `t=1` game.
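Here is a self-contained sketch of the update. It repeats `make_gamma_dist` and `update_gamma` from above so it can run alone, and it starts from the prior with `alpha=1.4` and `beta=1`. `prior_gamma` and `posterior_gamma` are illustrative names:

```python
from scipy.stats import gamma

def make_gamma_dist(alpha, beta):
    """Makes a gamma object (same as the function above)."""
    dist = gamma(alpha, scale=1/beta)
    dist.alpha, dist.beta = alpha, beta
    return dist

def update_gamma(prior, data):
    """Update a gamma prior with k events observed in t units of time."""
    k, t = data
    return make_gamma_dist(prior.alpha + k, prior.beta + t)

prior_gamma = make_gamma_dist(1.4, 1)               # mean alpha / beta = 1.4
posterior_gamma = update_gamma(prior_gamma, (4, 1))  # 4 goals in 1 game

print(posterior_gamma.alpha, posterior_gamma.beta)   # about 5.4 and 2
print(posterior_gamma.mean())                        # (1.4 + 4) / (1 + 1) = 2.7
```

The posterior mean, 2.7, matches the grid result for France.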

- After all the work we did with the grid, it might seem absurd that we can do a Bayesian update by adding two pairs of numbers.


- So let's confirm that it works.


- We'll make a `Pmf` with a discrete approximation of the posterior distribution.
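Here is a sketch of the comparison in NumPy, assuming the 101-point grid from above. It discretizes the conjugate posterior, gamma with `alpha=5.4` and `beta=2` (SciPy's `scale=1/beta`), and compares it with the grid posterior using `np.allclose`:

```python
import numpy as np
from scipy.stats import gamma, poisson

lams = np.linspace(0, 10, 101)

# Grid algorithm: prior times likelihood, then normalize
grid = gamma(1.4).pdf(lams) * poisson(lams).pmf(4)
grid /= grid.sum()

# Conjugate shortcut: gamma(alpha + k, beta + t), evaluated on the same grid
conj = gamma(1.4 + 4, scale=1 / (1 + 1)).pdf(lams)
conj /= conj.sum()

print(np.allclose(grid, conj))   # True: same distribution up to rounding
```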

The following figure shows the result along with the posterior we computed using the grid algorithm.

In [None]:

from utils import decorate

def decorate_rate(title=''):
    decorate(xlabel='Goal scoring rate (lam)',
             ylabel='PMF',
             title=title)

In [None]:
posterior.plot(label='grid posterior', color='C1')
posterior_conjugate.plot(label='conjugate posterior', 
                         color='C4', ls=':')

decorate_rate('Posterior distribution')

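#### What does `np.allclose()` do?


- It checks, element by element, whether two arrays are equal within a tolerance. By default, `a` and `b` count as close when `|a - b| <= atol + rtol * |b|`, with `rtol=1e-05` and `atol=1e-08`.


- It returns `True` only if every pair of corresponding elements is close. Both tolerances can be passed as keyword arguments.


- It is useful for comparing floating-point results, such as two posterior distributions computed by different methods. Such results can differ by rounding error even when they are mathematically identical.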
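A small example shows the difference between exact equality and `np.allclose`, along with the effect of the relative tolerance:

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])

print(np.all(a == b))        # False: 0.1 + 0.2 is not exactly 0.3 in floating point
print(np.allclose(a, b))     # True: the difference is far below the tolerance

# |1.0 - 1.001| = 0.001 exceeds 1e-08 + 1e-05 * 1.001
print(np.allclose([1.0], [1.001]))              # False
print(np.allclose([1.0], [1.001], rtol=1e-2))   # True with a looser relative tolerance
```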

### How does that work?

To understand how that works, we'll write the PDF of the gamma prior and the PMF of the Poisson likelihood, then multiply them together, because that's what the Bayesian update does.
We'll see that the result is a gamma distribution, and we'll derive its parameters.

Here's the PDF of the gamma prior, which is the probability density for each value of $\lambda$, given parameters $\alpha$ and $\beta$:

$$\lambda^{\alpha-1} e^{-\lambda \beta}$$

We have omitted the normalizing factor; since we are planning to normalize the posterior distribution anyway, we don't really need it.

Now suppose a team scores $k$ goals in $t$ games.
The probability of this data is given by the PMF of the Poisson distribution, which is a function of $k$ with $\lambda$ and $t$ as parameters.

$$\lambda^k e^{-\lambda t}$$

Again, we have omitted the normalizing factor, which makes it clearer that the gamma and Poisson distributions have the same functional form.
When we multiply them together, we can pair up the factors and add up the exponents.
The result is the unnormalized posterior distribution,

$$\lambda^{\alpha-1+k} e^{-\lambda(\beta + t)}$$

which we can recognize as an unnormalized gamma distribution with parameters $\alpha + k$ and $\beta + t$.

This derivation provides insight into what the parameters of the posterior distribution mean: $\alpha$ reflects the number of events that have occurred; $\beta$ reflects the elapsed time.
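The derivation can also be checked numerically for data spread over more than one game. In this sketch the data (5 goals in 2 games) are hypothetical and were chosen only to exercise the `beta + t` term. The likelihood uses a Poisson distribution with mean $\lambda t$:

```python
import numpy as np
from scipy.stats import gamma, poisson

alpha, beta = 1.4, 1.0
k, t = 5, 2          # hypothetical data: 5 goals in 2 games

lams = np.linspace(0, 10, 101)

# Grid posterior: gamma prior times Poisson likelihood with mean lam * t
grid = gamma(alpha, scale=1/beta).pdf(lams) * poisson(lams * t).pmf(k)
grid /= grid.sum()

# Conjugate posterior from the derivation: gamma(alpha + k, beta + t)
conj = gamma(alpha + k, scale=1/(beta + t)).pdf(lams)
conj /= conj.sum()

print(np.allclose(grid, conj))   # True
```

After normalization the two agree. The factor $t^k / k!$ in the Poisson PMF does not depend on $\lambda$, so it cancels, which is why the derivation could drop it.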

## Summary


- Unfortunately, there are only a few problems we can solve with conjugate priors.


- For the vast majority of problems, there is no conjugate prior and no shortcut to compute the posterior distribution.


- That's why we need grid algorithms, along with more general methods that we don't have time to cover in this course, such as Approximate Bayesian Computation (ABC) and Markov chain Monte Carlo (MCMC).

# The End