### Poisson Distribution

[Poisson Definition and Notation](#definition-and-notation)<br>
<br>

[Requirements Of A Poisson Distribution](#the-requirements-of-a-poisson-distribution)<br>
<br>

[The Poisson Distribution Equation](#poisson-distribution-probability-equation)<br>
<br>

[Calculate Specific Probability With PMF](#calculate-the-probability-of-observing-a-specific-value-given-the-ex-of-a-poisson-distro-with-pmf)<br>
<br>

[Use CDF To Calculate A Range Of A Poisson Distribution](#calculating-probabilities-of-a-range-using-the-cumulative-density-function)<br>
<br>

[Use CDF To Calculate A Specific Value Or More  Of Poisson Distribution](#calculate-the-probability-of-observing-a-specific-value-or-more-given-ex-with-cdf)<br>
<br>

[Use CDF To Calculate A Specific Value Of A Poisson Distribution](#calculating-the-probability-of-observing-a-specific-value-given-ex-with-cdf)<br>
<br>

[Expected Value Of A Poisson Distribution](#the-expected-value-ex-of-a-poisson-distribution)<br>
<br>

[The Spread Of A Poisson Distribution](#spread-of-a-poisson-distribution)<br>
<br>

#### Definition And Notation
The Poisson distribution is another common distribution. It describes the number of times a certain event occurs within a fixed interval of time or space. For example, it can describe the number of cars that pass through a specific intersection between 4 PM and 5 PM on a given day, or the number of calls received in an office between 1 PM and 3 PM on a certain day. It can also be used to find the probability of finding a given number of four-leaf clovers in 3 square meters of grass.<br><br>
In short, it can be used to find the probability of a given number of events occurring in an interval of space or time.<br>
<br>
<b>Poisson Distribution Notation</b><br>
<br>
&emsp;$Po(\lambda)$ denotes a Poisson Distribution<br>
<br>
&emsp;$Y \sim Po(\lambda)$<br>
<br>
&emsp;&emsp;$\lambda$ denotes expected value/mean value<br>
<br>
&emsp;Example:<br>
&emsp;&emsp;$Y \sim Po(4)  \leftarrow \text{Variable Y follows a Poisson Distribution with lambda equal to four}$<br>
<br>

#### The Requirements Of A Poisson Distribution 

1. The variable is a count of occurrences of an event over a measure of space or time
   1. Number of coins found in a mile of travel
   2. The number of customers entering a store in an hour
2. Every interval must have the same mean
   1. The mean number of coins over any given mile must be the same (can be assumed)
      1. Over 10 miles, the mean number of coins in mile 1 is the same as in mile 10, etc.
      2. Suppose you know that Rise sells 90% of their biscuits before 10 AM. The mean number of biscuits sold at 8 AM will therefore not be the same as at noon, so your intervals should only cover 4 hours rather than 8, for example.
3. Events counted in one interval are independent of events in another interval
   1. Say a store can only accept 40 customers per hour. In hour 2, 55 customers tried to enter, so 15 customers were rolled over to hour 3. Hour 3 therefore depends on hour 2, and this scenario does not allow a Poisson distribution to be used.
4. No overlapping intervals
   1. Interval 1 starts at noon and ends at 1:10 PM. Interval 2 starts at 1:00 PM. This is an overlapping scenario, which is not allowed.
   2. Interval 1 runs from 0 to 1 mile, interval 2 runs from 1 mile to 2 miles, and interval 3 runs from 0.5 miles to 1.5 miles. Interval 3 is not allowed because it overlaps the other two.
<br>

#### Poisson Distribution Probability Equation 
<br>

&emsp;$ P(X = x) = \dfrac{\lambda^x e^{- \lambda}}{x!}$<br>
<br>
Where<br>

* x = number of occurrences (observations) of an event
  * Ex: seeing 5 coins in the next mile then x = 5
  
* $\lambda$ - The mean number of occurrences in an interval 
  * This is a value usually known ahead of time through observations 
  * If $\lambda = 4$ then we know that in any interval there is a mean of 4 coins observed  
<br>

&emsp;$P(5) = \dfrac{4^5 e^{- 4}}{5!} \approx 0.1563$
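
As a quick check, the formula above can be evaluated directly with Python's standard library. This is only a minimal sketch: the helper name `poisson_pmf` is an illustrative choice and is not part of the `datum` module used in this notebook.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """Evaluate lambda^x * e^(-lambda) / x! for an integer count x."""
    if x < 0:
        return 0.0  # a count cannot be negative
    return lam ** x * math.exp(-lam) / math.factorial(x)

# Coins example: a mean of 4 coins per mile, probability of seeing exactly 5
p_five = poisson_pmf(5, 4.0)
print(f"P(5) = {p_five:.4f}")  # about 0.1563
```

Multiplying the result by 100 gives the percentage form used in the printouts in this notebook.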

```text
poisson_dict = {
    "item" : 'string',
    "measure" : 'string',
    "x" : 'int',
    "lambda" : 'int or float'
}
```




In [1]:
 

import sys
sys.path.insert(0, '..')
import resources.datum as datum 

data = datum.Data()

item = 'coins'
measure = 'miles'
x = 5
mean = 4.0


poisson_dict = {
    "item" : item,
    "measure" : measure,
    "x" : x,
    "lambda" : mean
}

data.get_poission_prob(poisson_dict)


<IPython.core.display.Math object>

ℹ️NOTE: <br>
The farther away your x value is from lambda the smaller your probability. <br>
<Br>

In [3]:
import sys
sys.path.insert(0, '..')
import resources.datum as datum 

data = datum.Data()
for i in range(8):
    poisson_dict = {
        "item" : item,
        "measure" : measure,
        "x" : i,
        "lambda" : mean
    }
    print(f'P({i}) = {data.get_poission_prob(poisson_dict, std_out="Y")}')

P(0) = 1.83
P(1) = 7.33
P(2) = 14.65
P(3) = 19.54
P(4) = 19.54
P(5) = 15.63
P(6) = 10.42
P(7) = 5.95


Above is a printout of the Poisson probabilities (in percent) for x from 0 to 7.<br>
The probability peaks around lambda: here P(3) and P(4) are tied for the largest value, because lambda is a whole number.<br>
Again, note that the farther x is from lambda, the smaller the probability.<br>
Also note that, for the values shown, an x below lambda has a slightly higher probability than the x the same distance above lambda. For instance, P(3) is greater than P(5). The reason is that a count cannot be negative, while on the larger side x can go out to infinity. The probability above lambda is therefore spread over many more values, which lowers the probabilities of the values just above lambda.
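
The comparison between values below and above lambda can be checked numerically. The sketch below uses a hypothetical helper, `poisson_pmf`, written only with the standard library; it is not the `datum` implementation. It pairs each value `d` below lambda with the value `d` above it.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability of exactly x occurrences when the mean is lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

lam = 4
pairs = []
# Compare the values that sit the same distance d below and above lambda
for d in range(1, lam + 1):
    below = poisson_pmf(lam - d, lam)
    above = poisson_pmf(lam + d, lam)
    pairs.append((d, below, above))
    print(f"d = {d}: P({lam - d}) = {below:.4f}   P({lam + d}) = {above:.4f}")

# Total probability at or below lambda versus strictly above it
left_mass = sum(poisson_pmf(x, lam) for x in range(lam + 1))
print(f"P(X <= {lam}) = {left_mass:.4f}   P(X > {lam}) = {1 - left_mass:.4f}")
```

For distances 1 through 3 the lower value is the more probable one. At distance 4 the order flips: P(0) is smaller than P(8), because the left side stops at zero while the right tail keeps going.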




#### The Graph Of A Poisson Distribution

* The x axis plots the number of times the event occurs in a specified interval of time
  * The x value is passed to scipy.stats.poisson as the parameter k
  * The scipy.stats.poisson mu parameter takes the mean $\mu$, which as stated earlier is also $\lambda$ and $E(X)$
  * scipy.stats.poisson takes the parameter loc, which can be used to move the starting point (the default is 0)
* The y axis plots the probability of each number of occurrences
* A Poisson distribution graph normally starts at zero, since no event can happen a negative number of times
* There is no limit to the number of times the event can occur in the time interval
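
To make the roles of `k`, `mu`, and `loc` concrete, here is a pure-Python stand-in for the PMF. It is only a sketch that mimics how a `loc` shift behaves (the probability at `k` with shift `loc` equals the unshifted probability at `k - loc`). `shifted_poisson_pmf` is a hypothetical helper, not a SciPy function.

```python
import math

def shifted_poisson_pmf(k: int, mu: float, loc: int = 0) -> float:
    """Poisson PMF at k after moving the starting point from 0 to loc."""
    x = k - loc                      # undo the shift
    if x < 0:
        return 0.0                   # nothing can occur before the starting point
    return mu ** x * math.exp(-mu) / math.factorial(x)

mu = 10
ks = list(range(21))                                 # x axis: 0 through 20 occurrences
probs = [shifted_poisson_pmf(k, mu) for k in ks]     # y axis: probability of each count
peak = max(probs)

# Text version of the bar chart: one row per x value
for k, p in zip(ks, probs):
    print(f"{k:2d} {p:.4f} {'#' * round(40 * p / peak)}")
```

With `loc = 0` the bars start at zero. Passing, say, `loc = 1` would slide every bar one position to the right without changing its height.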

In [10]:
import sys
from bokeh.io import curdoc, output_notebook, show
from bokeh.models import ColumnDataSource, Grid, LinearAxis, Plot, VBar
sys.path.insert(0, '')
import resources.datum as datum 

output_notebook(hide_banner=True)
curdoc().theme = 'dark_minimal'

N = 21  # number of x values to plot (assumed: 0 through 20)
mu = 10  # mean (lambda) of the distribution
loc = 0
new_poi = datum.Data(N = N, mu = mu)
x, y = new_poi.get_poisson_pmf_distro(loc = loc)

source = ColumnDataSource(dict(x=x,top=y,))

plot = Plot(
    title=None, width=500, height=400,
    min_border=0, toolbar_location=None)

# One vertical bar per x value; the bar height is the probability
glyph = VBar(x="x", top="top", bottom=0, width=0.95, fill_color="dodgerblue")
plot.add_glyph(source, glyph)

xaxis = LinearAxis()
plot.add_layout(xaxis, 'below')

yaxis = LinearAxis()
plot.add_layout(yaxis, 'left')

plot.add_layout(Grid(dimension=0, ticker=xaxis.ticker))
plot.add_layout(Grid(dimension=1, ticker=yaxis.ticker))

curdoc().add_root(plot)

show(plot)

#### Calculate The Probability Of Observing A Specific Value Given The E[X] Of A Poisson Distro With PMF

The Poisson distribution is a discrete probability distribution, so it can be described by a probability mass function and a cumulative distribution function.<br>
<br>
We can use the poisson.pmf() method in the scipy.stats library to evaluate the probability of observing a specific number given the parameter (expected value) of a distribution. For example, suppose that we expect it to rain 10 times in the next 30 days. The number of times it rains in the next 30 days is “Poisson distributed” with lambda = 10. We can calculate the probability of exactly 6 rain events as follows:

In [11]:
from scipy.stats import poisson

ans = poisson.pmf(k = 6, mu = 10)
print(ans)

0.06305545800345125


The pmf function from scipy.stats.poisson shows that the probability of it raining exactly 6 times in the next 30 days is 0.06305, or approximately 6.31 percent.<br>
<br>
Now I will use my function to make sure it is the same. 

In [5]:
 

import sys
sys.path.insert(0, '..')
import resources.datum as datum 

data = datum.Data()

item = 'rainy days'
measure = 'days'
x = 6
mean = 10.0


poisson_dict = {
    "item" : item,
    "measure" : measure,
    "x" : x,
    "lambda" : mean
}

data.get_poission_prob(poisson_dict)


<IPython.core.display.Math object>

Example:<br>
A teacher knows from experience that his class asks 4 questions per day on average. Today, however, 7 questions were asked. Surprised by the sudden interest, he wants to know the probability of a class asking exactly 7 questions in a day.

In [7]:


item = 'questions'
measure = 'per day'
x = 7
mean = 4.0


poisson_dict = {
    "item" : item,
    "measure" : measure,
    "x" : x,
    "lambda" : mean
}

data.get_poission_prob(poisson_dict)

<IPython.core.display.Math object>

Validate that my manual calculation is correct:

In [1]:
import datum 

poi_data = datum.Data(mu = 4)

ans = poi_data.get_poisson_pmf_value(k = 7)

print(round((100 * ans), 5))

5.95404


#### Calculating Exact Values Of A Poisson Distribution With The Probability Mass Function For Discrete Values

<b>The Poisson Probability Mass Function</b><br>
<br>

&emsp;$P(X = x) = \dfrac{\lambda^x e^{- \lambda}}{x !} \quad \text{Think of } P(X = x) \text{ as } P(x)$<br>
<br>
Example<br>
Plutonium-239 is an isotope of plutonium that is used in nuclear weapons and reactors. <br>
One nanogram of Plutonium-239 will have on average 2.3 radioactive decays per second, and the number of decays will follow a Poisson Distribution.<br>
<br>
<span style = "color:dodgerblue;font-size:101%">
What is the probability that in a 2 second period there are exactly 3 radioactive decays? 
</span><br>

* $X = \text{the number of decays in a 2 second period}$
* $\lambda = \text{lambda is the mean number of decays in the period} = 2 \cdot 2.3 = 4.6$
* $Pu \text{ is the chemical symbol for Plutonium}$
<br>
$P(X = x) = \dfrac{\lambda^x e^{- \lambda}}{x !} \Rightarrow P(X = 3) = \dfrac{4.6^3 \cdot e^{- 4.6}}{3 !}$<br>
<br>

In [5]:
from IPython.display import display, Math
import datum as datum


lam = 4.6   # mean decays in the period: 2 seconds * 2.3 decays per second
k = 3       # number of decays we want the probability of
period = 2  # length of the period in seconds

poi_data = datum.Data(mu=lam)

ans = poi_data.get_poisson_pmf_value(k)

ans_pct = round((100 * ans), 3) 

msg = '\\textbf P(X = x) = \\dfrac{\\lambda^x e^{- \\lambda}}{x !} \\Rightarrow \
    P(X = %s) = \\dfrac{%s^%s \\cdot e^{- %s}}{%s !} = %s\\\\~\\\\'
msg = msg + '\\color{dodgerblue}\\textbf{There is a %s percent probability of getting exactly %s decays in a %s second period}'
display(Math(msg%(k, lam, k, lam, k, ans, ans_pct, k, period)))


<IPython.core.display.Math object>

The Poisson Distribution by its nature of starting at zero typically has some right skewness as depicted in the plot above. The amount of right skewness depends on the value of $\lambda$ . When $\lambda$ is large, the plot is close to symmetric. As $\lambda$ gets close to zero the skewness increases.<br>
<br>
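
The link between $\lambda$ and skewness can be checked numerically. A standard result is that the skewness of a Poisson distribution is $1 / \sqrt{\lambda}$. The sketch below estimates it from a truncated sum of the PMF and compares the two. The function name `poisson_moments` and the cutoff of 400 terms are assumptions made for this illustration.

```python
import math

def poisson_moments(lam: float, terms: int = 400):
    """Return (mean, variance, skewness) from a sum over x = 0 .. terms - 1."""
    p = math.exp(-lam)               # P(X = 0)
    probs = []
    for x in range(terms):
        probs.append(p)
        p *= lam / (x + 1)           # P(X = x + 1) = P(X = x) * lambda / (x + 1)
    mean = sum(x * q for x, q in enumerate(probs))
    var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
    skew = sum((x - mean) ** 3 * q for x, q in enumerate(probs)) / var ** 1.5
    return mean, var, skew

for lam in (0.5, 2, 4.6, 10, 50):
    _, _, skew = poisson_moments(lam)
    print(f"lambda = {lam:>4}: skewness = {skew:.3f}   1/sqrt(lambda) = {1 / math.sqrt(lam):.3f}")
```

The skewness is always positive (a longer right tail) and shrinks toward zero as $\lambda$ grows, which matches the near-symmetric shape seen for large $\lambda$.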
<b>The Plutonium-239 Example 2</b><br>
<br>
<span style = "color:dodgerblue;font-size:101%">
What is the probability there are no more than 3 radioactive decays ? $(\lambda = 4.6)$
</span><br>
<br>
Consider the plot above. We need to consider bins 0, 1, 2, and 3 and add them together using the Probability Mass Function (PMF)<br>
<br>

$$\begin{aligned}
&\text{Poisson Probability Mass Function} \\[1ex]
P(X \leq 3) &= P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) \\[1ex]
&= \dfrac{4.6^0 e^{- 4.6}}{0 !} + \dfrac{4.6^1 e^{- 4.6}}{1 !} + \dfrac{4.6^2 e^{- 4.6}}{2 !} + \dfrac{4.6^3 e^{- 4.6}}{3 !} \\[1ex]
&\approx 0.010 + 0.046 + 0.106 + 0.163 \\[1ex]
&\approx 0.326
\end{aligned}$$
<br>
<br>
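
The sum for $P(X \leq 3)$ can be reproduced in a few lines. This is a minimal standard-library sketch rather than the `datum` helper used elsewhere in this notebook.

```python
import math

lam = 4.6  # mean number of decays in a 2 second period (2 * 2.3)

# One PMF term for each of the bins 0, 1, 2, and 3
terms = [lam ** x * math.exp(-lam) / math.factorial(x) for x in range(4)]
for x, p in enumerate(terms):
    print(f"P(X = {x}) = {p:.3f}")

total = sum(terms)
print(f"P(X <= 3) = {total:.3f}")  # about 0.326
```

Summing the unrounded terms is what gives 0.326. Adding the four rounded values by hand gives 0.325, so the last digit depends on when the rounding happens.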
<b>The Relationship between Binomial and Poisson Distribution</b><br>
<br>

The binomial distribution tends toward the Poisson distribution as $n \rightarrow \infty$, $p \rightarrow 0$, and $np$ stays constant.<br>
<br>
The Poisson distribution with $\lambda = np$ closely approximates the binomial distribution if n is large and p is small 
<br>
<br>
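
A short numerical experiment illustrates this limit. The sketch below holds $np$ fixed at 4 and compares the binomial and Poisson probabilities of exactly 5 events as $n$ grows. The helper names are illustrative, and `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the mean is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 4.0  # np is held fixed at this value
k = 5
gaps = []
for n in (10, 100, 1_000, 10_000):
    p = lam / n                      # p shrinks as n grows
    gap = abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
    gaps.append(gap)
    print(f"n = {n:>6}, p = {p:.4f}: |binomial - Poisson| = {gap:.6f}")
```

The gap shrinks each time $n$ increases, which is why the Poisson distribution is a convenient approximation when $n$ is large and $p$ is small.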

Like previous probability mass functions of discrete random variables, individual probabilities can be summed together to find the probability of observing a value in a range<br>
<br>
For example, if we expect it to rain 10 times in the next 30 days, the number of times it rains in the next 30 days is “Poisson distributed” with lambda = 10. We can calculate the probability of 12 to 14 rain events as follows:


In [25]:
import datum 
from IPython.display import display, Math

def get_poisson_range(ran: tuple, mu: float, freq: int):
    """Return the counts in ran (inclusive), their probabilities, and a LaTeX summary."""
    poi_data = datum.Data(mu = mu)
    arr = []
    cnt = []
    msg = ""
    for i in range(ran[0], (ran[-1] + 1)):
        prob = poi_data.get_poisson_pmf_value(i)
        # One line per count, formatted immediately so any range length works
        msg = msg + "\\text{The probability of it raining %s times in %s days is %s pct}\\\\~\\\\"%(i, freq, round((prob * 100), 4))
        arr.append(prob)
        cnt.append(i)
    ans = round((sum(arr) * 100), 5)
    msg = msg + "\\text{The probability of it raining %s to %s times in %s days is %s pct}"%(cnt[0], cnt[-1], freq, ans)
    return cnt, arr, msg

new_range = (12, 14)
mu = 10 
freq = 30
cnt, arr, msg = get_poisson_range(new_range, mu, freq)

display(Math(msg))



<IPython.core.display.Math object>

#### Calculating Probabilities Of A Range Using The Cumulative Density Function 

We can use the poisson.cdf() method in the scipy.stats library to evaluate the probability of observing a specific number or less given the expected value of a distribution.<br>
<br>
For example, if we wanted to calculate the probability of observing 6 or fewer rain events in the next 30 days when we expected 10, we could do the following<br>
<br>

In [11]:
import datum

x = 6
mu = 10
t = 30

poi_data = datum.Data(mu = mu)

ans = round((100 * poi_data.get_poisson_cdf_value(x)), 3)

msg = '\\text{The probability of observing %s or fewer rain events in %s days when %s are expected is %s pct.}'
display(Math(msg%(x, t, mu, ans)))

<IPython.core.display.Math object>

#### Calculate The Probability Of Observing A Specific Value Or More Given E(X) With CDF

We can also use this method to evaluate the probability of observing a specific number or more given the expected value of the distribution<br>
<br>
For example, if we wanted to calculate the probability of observing 12 or more rain events in the next 30 days when we expected 10, we could do the following:


In [14]:
import datum

x = 11
mu = 10
t = 30

poi_data = datum.Data(mu = mu)

ans = round((100 * (1 - poi_data.get_poisson_cdf_value(x))), 3)

msg = '\\text{The probability of observing %s or more rain events in %s days when %s are expected is %s pct.}'
# Report x + 1 (12): 1 - CDF(11) is the probability of 12 or more
display(Math(msg%(x + 1, t, mu, ans)))


<IPython.core.display.Math object>

<span style = "color:yellowgreen;font-size:101%">
Note that we used 11 instead of 12 for the value k because we wanted to include 12 in the calculation. We wanted to calculate the probability of observing 12 or more rains, which includes 12. stats.poisson.cdf(11, 10) evaluates the probability of observing 11 or fewer rains, so 1 - stats.poisson.cdf(11, 10) would equal the probability of observing 12 or more rains.
</span><br>
<br>
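
The off-by-one reasoning in the note can be sketched with a hand-rolled CDF. `poisson_cdf` here is a hypothetical standard-library helper, not the SciPy or `datum` function. It shows what happens when 12 is used instead of 11.

```python
import math

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k): the PMF summed over 0 through k."""
    return sum(lam ** x * math.exp(-lam) / math.factorial(x) for x in range(k + 1))

lam = 10
k = 12                                      # we want P(X >= 12)
at_least_k = 1 - poisson_cdf(k - 1, lam)    # removes 0 through 11, keeps 12
off_by_one = 1 - poisson_cdf(k, lam)        # removes 12 as well: this is P(X >= 13)

print(f"P(X >= {k})     = {at_least_k:.4f}")
print(f"P(X >= {k + 1}) = {off_by_one:.4f}")
```

The two results differ by exactly P(X = 12), which is the probability that would be lost by passing 12 to the CDF.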

#### Calculating The Probability Of Observing A Specific Value Given E[X] With CDF
Summing individual probabilities over a wide range can be cumbersome. It is often easier to calculate the probability of a range using the cumulative distribution function instead of the probability mass function.<br>
<br>
You can do this by taking the difference between two CDF values.<br>
For example, while still expecting 10 rainfalls in the next 30 days, we could use the following code to calculate the probability of observing between 12 and 18 rainfall events (inclusive):<Br>


In [16]:
import datum

x1 = 18
x2 = 11
mu = 10
t = 30

poi_data = datum.Data(mu = mu)

ans1 = poi_data.get_poisson_cdf_value(x1)
ans2 = poi_data.get_poisson_cdf_value(x2)
prob_12_to_18 = round((100 * (ans1 - ans2)), 3)


msg = '\\text{The probability of observing %s to %s rain events in %s days when %s are expected is %s pct.}'
display(Math(msg%((x2 + 1), x1, t, mu, prob_12_to_18)))

<IPython.core.display.Math object>
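
As a sanity check on the CDF difference, the same range can be computed by summing the PMF term by term. This is a sketch with hypothetical standard-library helpers (`poisson_pmf`, `poisson_cdf`), and the two routes should agree up to floating-point rounding.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x events when the mean is lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k): the PMF summed over 0 through k."""
    return sum(poisson_pmf(x, lam) for x in range(k + 1))

lam = 10
low, high = 12, 18

# Route 1: difference of two CDF values (subtract CDF(11) so that 12 is kept)
by_cdf = poisson_cdf(high, lam) - poisson_cdf(low - 1, lam)
# Route 2: add up the individual probabilities for 12, 13, ..., 18
by_pmf = sum(poisson_pmf(x, lam) for x in range(low, high + 1))

print(f"CDF difference: {by_cdf:.5f}")
print(f"PMF sum:        {by_pmf:.5f}")
```

The CDF route needs only two evaluations no matter how wide the range is, which is the practical reason to prefer it.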

#### The Expected Value E[X] Of A Poisson Distribution

In the sample generated below, the most frequent values (the modes) are 9 and 10.<br>
When we talk about the expected value, we mean the average over many observations. This relates to the Law of Large Numbers: the more samples we have, the more closely the samples will resemble the true population, and the closer the sample mean will get to the expected value. For example, a salesperson who averages 10 sales per week may make 3 sales one week, 16 the next, and 11 the week after. In the long run, after many weeks, the average would still be close to 10.
<br>
<br>


In [1]:
from scipy.stats import poisson
import datum

new_datum = datum.Data()

rvs = poisson.rvs(10, size = 1000)

print(f'count: {len(rvs)}\nmode of rvs: {new_datum.get_mode(rvs)}\nmedian of rvs: {new_datum.get_median(rvs)}\nmean of rvs: {rvs.mean()}')


count: 1000
mode of rvs: [10, 9]
median of rvs: 8.0
mean of rvs: 10.083


&emsp;$p(y) \Rightarrow E(y)$<br>
<br>
The expected value of y is equal to the sum of the products of each <b>distinct</b> value and its probability in the sample space<br>
<br>

&emsp;$E(y) = y_0 \cdot p(y_0) + y_1 \cdot p(y_1) + y_2 \cdot p(y_2) + \cdots$<br>
<br>
By plugging in the distinct values and their probabilities we get this complicated expression:<br>
<br>

&emsp;$E(y) = y_0 \dfrac{\lambda^{y_0} e^{- \lambda}}{y_0 !} + y_1 \dfrac{\lambda^{y_1} e^{- \lambda}}{y_1 !} + \cdots$<br>
<br>
The result is $E(y) = \lambda$<Br>
<br>
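
For readers who want the missing steps, one standard way to reach $E(y) = \lambda$ is sketched here. It writes the distinct values as $y = 0, 1, 2, \ldots$ and uses the series $e^{\lambda} = \sum_{m=0}^{\infty} \lambda^m / m!$. The index $m = y - 1$ is introduced only for this derivation.

$$\begin{aligned}
E(y) &= \sum_{y=0}^{\infty} y \, \dfrac{\lambda^{y} e^{- \lambda}}{y !} \\[1ex]
&= \sum_{y=1}^{\infty} \dfrac{\lambda^{y} e^{- \lambda}}{(y - 1) !} \quad \text{(the } y = 0 \text{ term is zero, and } y / y ! = 1 / (y - 1) ! \text{)} \\[1ex]
&= \lambda e^{- \lambda} \sum_{y=1}^{\infty} \dfrac{\lambda^{y - 1}}{(y - 1) !} \\[1ex]
&= \lambda e^{- \lambda} \sum_{m=0}^{\infty} \dfrac{\lambda^{m}}{m !} \\[1ex]
&= \lambda e^{- \lambda} e^{\lambda} = \lambda
\end{aligned}$$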

#### Spread Of A Poisson Distribution

Probability distributions also have calculable variances. Variances are a way of measuring the spread or dispersion of values and probabilities in the distribution. <b>For the Poisson distribution, the variance is simply the value of lambda (λ), meaning that the expected value and variance are equivalent in Poisson distributions.</b>
<br>
We know that the Poisson distribution has a discrete random variable that cannot be less than 0 (think, a salesperson cannot have fewer than 0 sales, a shop cannot have fewer than 0 customers), so as the expected value increases, the range of values the distribution is likely to take on also increases.<br>
<br>
The first plot below shows a Poisson distribution with lambda equal to three, and the second plot shows a Poisson distribution with lambda equal to fifteen. Notice that in the second plot, the spread of the distribution increases. Also, take note that the height of the bars in the second plot decreases, since the probability is spread over more values.

<b>Poisson Distribution Variance</b><br>
<br>
The variance $\sigma^2 = (y_0 - \mu)^2 \cdot p(y_0)~+~(y_1 - \mu)^2 \cdot p(y_1) + \cdots = \lambda $<br>
<br>
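
The claim that the variance equals lambda can be verified numerically by weighting each squared deviation by its probability. The sketch below does this for the two lambda values used in the plots. `poisson_mean_and_variance` and its 200-term cutoff are assumptions made for illustration only.

```python
import math

def poisson_mean_and_variance(lam: float, terms: int = 200):
    """Probability-weighted mean and variance over y = 0 .. terms - 1."""
    p = math.exp(-lam)               # p(0)
    probs = []
    for y in range(terms):
        probs.append(p)
        p *= lam / (y + 1)           # p(y + 1) = p(y) * lambda / (y + 1)
    mean = sum(y * q for y, q in enumerate(probs))
    variance = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
    return mean, variance

for lam in (3, 15):
    mean, variance = poisson_mean_and_variance(lam)
    print(f"lambda = {lam:>2}: mean = {mean:.4f}, variance = {variance:.4f}, "
          f"standard deviation = {math.sqrt(variance):.4f}")
```

Because the variance equals lambda, the standard deviation is $\sqrt{\lambda}$. That gives a theoretical value to compare against the sample $\sigma$ reported under each histogram.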


In [27]:

import numpy as np
from bokeh.layouts import column, row 
from bokeh.plotting import figure, show, output_notebook, curdoc
from bokeh.models import Div, TeX
import datum

output_notebook(hide_banner=True)
curdoc().theme = 'dark_minimal'


N = 1000

poisson_data = datum.Data(N = N)

# histogram 1
cnts1 = sorted(poisson_data.get_poisson_distro(lam = 3))

cnt1, mu1, sigma1, min_val1, max_val1, unique1, frequency1 = poisson_data.get_central_tendency(cnts1)

fig1 = figure(width=500, height=400, toolbar_location=None,
           title="Poisson Distribution lambda = 3")

# Histogram
bins1 = np.linspace(min_val1, max_val1, 10)
hist1, edges1 = np.histogram(cnts1, density=True, bins=bins1)
fig1.quad(top = hist1
       ,bottom=0 
       ,left = edges1[:-1] 
       ,right = edges1[1:]
       ,fill_color="dodgerblue" 
       ,alpha = 0.5
       ,line_color="white"
       ,legend_label=f"{len(cnts1)} observations")

div1 = Div(text=r"""
<p>
$$min~value: %s$$<br>
</p>
<p>
$$max~value: %s$$<br>
</p>
<p>
$$\mu: %s$$<br>
</p>
<p>
$$\sigma: %s$$<br>
</p>
"""%(min_val1, max_val1, round(mu1, 3), round(sigma1, 3)))

# histogram 2
cnts2 = sorted(poisson_data.get_poisson_distro(lam = 15))

cnt2, mu2, sigma2, min_val2, max_val2, unique2, frequency2 = poisson_data.get_central_tendency(cnts2)

fig2 = figure(width=500, height=400, toolbar_location=None,
           title="Poisson Distribution lambda = 15")

# Histogram
bins2 = np.linspace(min_val2, max_val2, 12)
hist2, edges2 = np.histogram(cnts2, density=True, bins=bins2)
fig2.quad(top = hist2
       ,bottom=0 
       ,left = edges2[:-1] 
       ,right = edges2[1:]
       ,fill_color="dodgerblue" 
       ,alpha = 0.5
       ,line_color="white"
       ,legend_label=f"{len(cnts2)} observations")

div2 = Div(text=r"""
<p>
$$min~value: %s$$<br>
</p>
<p>
$$max~value: %s$$<br>
</p>
<p>
$$\mu: %s$$<br>
</p>
<p>
$$\sigma: %s$$<br>
</p>
"""%(min_val2, max_val2, round(mu2, 3), round(sigma2, 3)))

show(row(column(fig1, div1), column(fig2, div2)))

As we can see,<br>
<span style = "color:yellowgreen;font-size:101%">as the parameter lambda increases, the variance — or spread — of possible values increases as well.</span><br>
<br>

