# Poisson counting

Fall 2022: Peter Ralph

https://uodsci.github.io/dsci345

```python
import matplotlib
import matplotlib.pyplot as plt
matplotlib.rcParams['figure.figsize'] = (15, 8)
import numpy as np
import pandas as pd

rng = np.random.default_rng()
```

$$\renewcommand{\P}{\mathbb{P}} \newcommand{\E}{\mathbb{E}} \newcommand{\var}{\text{var}} \newcommand{\sd}{\text{sd}}$$
This is here so we can use `\P` and `\E` and `\var` and `\sd` in LaTeX below.

# Motivation, and the Poisson

Suppose that we're running a solar panel manufacturing plant.
Each panel is made up of many modules, each of which may be defective.
Usually only a few are:
the average number of defects per panel is 1.
A few defects are okay, but if a panel has more than 2 defects,
the panel will not work.
*What proportion of panels will work?*

*What do we need to know?*
Well, let's suppose that a panel has $N$ modules,
and each module is broken independently of the others.
If the probability of a module being broken is $p$, then we must have $p = 1/N$.
(*Why?*)

## Poisson approximation

Let's call the number of defects $X$. So, $X$ is Binomial($N$, $\lambda/N$) with $\lambda=1$.
Binomial probabilities say that
$$\begin{aligned}
    \P\{ X = k \}
    &=
    \frac{N (N-1) \cdots (N-k+1)}{k!} \left(\frac{\lambda}{N}\right)^k \left(1-\frac{\lambda}{N}\right)^{N-k} .
\end{aligned}$$
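However, $N$ was arbitrary. As $N \to \infty$ with $k$ fixed, $N(N-1)\cdots(N-k+1)/N^k \to 1$ and $(1-\lambda/N)^{N-k} \to e^{-\lambda}$, so for large $N$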
$$\begin{aligned}
    \P\{ X = k \}
    &\approx
    \frac{1}{k!} \lambda^k e^{-\lambda} ,
\end{aligned}$$
i.e., $X$ is [Poisson with mean $\lambda$](https://en.wikipedia.org/wiki/Poisson_distribution).
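To see the approximation numerically, here is a minimal sketch that compares the Binomial($N$, $\lambda/N$) probabilities above with the Poisson($\lambda$) limit for increasing $N$. It uses only Python's standard `math` module; the helper names `binom_pmf` and `poisson_pmf` are our own, not from any library.

```python
import math

def binom_pmf(k, n, p):
    """P{X = k} for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P{X = k} for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 1.0
for N in [10, 100, 1000, 10000]:
    # largest absolute difference between the two pmfs over k = 0, ..., 10
    gap = max(abs(binom_pmf(k, N, lam / N) - poisson_pmf(k, lam)) for k in range(11))
    print(f"N = {N:6d}: max |binomial - Poisson| = {gap:.2e}")
```

The largest gap should shrink roughly like $1/N$, so for a panel with thousands of modules the Poisson probabilities are essentially exact.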

# Example: defects

Suppose that we're running a solar panel manufacturing plant.
Each panel has a Poisson number of defects with mean 1.
A few defects are okay, but if a panel has more than 2 defects,
the panel will not work.
*What proportion of panels will work?*

*Solution:*
Let $X$ denote the number of defective modules on a randomly chosen panel.
Then $X$ has a Poisson($\lambda = 1$) distribution, so
$$\begin{aligned}
    \P\{X \le 2\} &= \P\{X = 0\} + \P\{X = 1\} + \P\{X = 2\} \\
        &= e^{-\lambda} \left( 1 + \lambda + \lambda^2 / 2 \right) ,
\end{aligned}$$
which, for $\lambda = 1$, is

```python
lam = 1
p_good = np.exp(-lam) * (1 + lam + lam**2 / 2)
print(f"Proportion that are good: {p_good:.3f}")
```

```
Proportion that are good: 0.920
```
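As a sanity check on the formula, a minimal sketch can estimate the same proportion by simulating Poisson(1) defect counts for many panels. The seed and the number of panels here are arbitrary choices, not values from the original analysis.

```python
import numpy as np

rng = np.random.default_rng(seed=2022)  # arbitrary seed, for reproducibility
lam = 1
npanels = 100_000

# simulated number of defects on each panel under the Poisson(lam) model
defects = rng.poisson(lam, size=npanels)
p_good_sim = np.mean(defects <= 2)

# exact value from the formula above
p_good = np.exp(-lam) * (1 + lam + lam**2 / 2)
print(f"simulated: {p_good_sim:.3f}, exact: {p_good:.3f}")
```

With this many panels the simulated proportion should agree with the exact value to about two decimal places.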


# Complications

*However:* this turns out *not* to match observation:
the distribution of the number of defects per panel actually looks like this:

```python
import math

npanels = 10000
# defect counts where each panel's Poisson mean is lam times an Exponential(1) draw
nd = rng.poisson(lam * rng.exponential(size=npanels), size=npanels)
print(f"proportion good: {np.mean(nd <= 2)}")
print("\t".join(['value', 'expected', 'observed']))
for k in range(12):
    # expected count of panels with k defects under the Poisson(lam) model
    expected = npanels * np.exp(-lam) * lam**k / math.factorial(k)
    observed = np.sum(nd == k)
    print("\t".join([str(k), f"{expected:.2f}", str(observed)]))
```

```
proportion good: 0.8717
value	expected	observed
0	3678.79	4990
1	3678.79	2487
2	1839.40	1240
3	613.13	604
4	153.28	354
5	30.66	173
6	5.11	81
7	0.73	31
8	0.09	25
9	0.01	5
10	0.00	5
11	0.00	3
```

## Overdispersion

What's going on? With some more investigation,
we find that some panels are more error-prone than others:
a better model for the number of defects per panel is
that the "quality" of a panel (really, its defect rate), $R$, is drawn from an Exponential distribution,
and given this quality, the number of defects is Poisson with mean $R$:
$$\begin{aligned}
\text{error rate: } R &\sim \text{Exponential}(1) \\
\text{number of defects: } X &\sim \text{Poisson}(R) .
\end{aligned}$$
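Overdispersion means the counts are more variable than a Poisson distribution with the same mean would be. A minimal simulation sketch of the two-stage model above compares it with plain Poisson(1) counts; the seed and sample size are arbitrary choices. It relies on NumPy's `Generator.poisson` accepting an array of means, one per draw.

```python
import numpy as np

rng = np.random.default_rng(seed=345)  # arbitrary seed, for reproducibility
n = 200_000

# Two-stage model: draw each panel's rate R, then its defect count given R
R = rng.exponential(scale=1.0, size=n)
X_mixed = rng.poisson(R)

# For comparison: every panel has the same rate, 1
X_plain = rng.poisson(1.0, size=n)

print(f"two-stage: mean {X_mixed.mean():.3f}, variance {X_mixed.var():.3f}")
print(f"Poisson:   mean {X_plain.mean():.3f}, variance {X_plain.var():.3f}")
```

The two samples should have nearly the same mean, but the two-stage counts should show a clearly larger variance.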

*Question:* What is $\E[X]$?

Well, given $R$, the mean is, well, $R$, i.e., $\E[X|R] = R$.

So, it would make sense if $\E[X] = \E[R] = 1$.

This is true; here is the "proof" from first principles, written as though $R$ were discrete:
$$\begin{aligned}
 \E[X]
 &=
 \sum_x x \P\{X = x\} \\
 &=
 \sum_x x \sum_r \P\{X = x, R = r\} \\
 &=
 \sum_x x \sum_r \P\{R = r\} \P\{X = x \;|\; R = r\} \\
 &=
 \sum_r \P\{R = r\} \sum_x x \P\{X = x \;|\; R = r\} \\
 &=
 \sum_r \P\{R = r\} \E[X \;|\; R = r] \\
 &=
 \sum_r \P\{R = r\} r \\
 &=
 \E[R] .
\end{aligned}$$
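Since $R \sim \text{Exponential}(1)$ is continuous with density $e^{-r}$ for $r > 0$, the sums over $r$ should strictly be integrals against that density. Here is the same calculation in that form; the second line also computes the variance, using the fact that for a Poisson($r$) variable, $\E[X^2 \;|\; R = r] = \var(X \;|\; R = r) + \E[X \;|\; R = r]^2 = r + r^2$:
$$\begin{aligned}
 \E[X]
 &= \int_0^\infty e^{-r} \, \E[X \;|\; R = r] \, dr
 = \int_0^\infty r e^{-r} \, dr = 1 , \\
 \E[X^2]
 &= \int_0^\infty e^{-r} \left( r + r^2 \right) dr = 1 + 2 = 3 ,
 \quad\text{so}\quad
 \var(X) = \E[X^2] - \E[X]^2 = 2 .
\end{aligned}$$
This variance is twice that of a Poisson(1) random variable, even though the means agree: that excess variability is the overdispersion.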

# Challenge

Estimate the proportion of broken panels from this model by simulation.