# Standard probability mass functions

In [3]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson, bernoulli, binom
import ipywidgets as widgets
from IPython.display import display  # used explicitly by PMFwidget

## Bernoulli distribution 

The Bernoulli distribution corresponds to the case where the random variable
$X$ can only take two values, for instance 0 and 1, and
we wish to describe the probability of both outcomes.
This family of distributions depends on a single parameter
$p\in [0, 1]$, and is defined according to,
\begin{equation}
    \begin{cases}
        p_X(1) &= p\\
        p_X(0) &= 1 - p.
    \end{cases}
\end{equation}

It is common to say that $p$ is the probability of
success (probability of $X$ taking the value 1), and
that $1-p$ is the probability of failure.
For instance, a coin throw follows a Bernoulli distribution,
where $p=0.5$ if the coin is well-balanced.
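
The definition above can be checked with a few lines of plain Python. This is only a minimal sketch: the helper names `bernoulli_pmf` and `bernoulli_sample` are made up for this illustration (they are not part of `scipy.stats`), and the value $p=0.25$ is an arbitrary choice. It evaluates the probability mass function and compares $p$ with the proportion of successes observed in a simulated sample.

```python
import random

def bernoulli_pmf(x, p):
    """Probability that a Bernoulli(p) variable equals x."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0  # values other than 0 and 1 are impossible

def bernoulli_sample(p, size, rng):
    """Draw `size` independent Bernoulli(p) outcomes from uniform draws."""
    return [1 if rng.random() < p else 0 for _ in range(size)]

rng = random.Random(0)  # fixed seed so that the run is reproducible
p = 0.25                # arbitrary success probability for the illustration
sample = bernoulli_sample(p, 10_000, rng)
freq_success = sum(sample) / len(sample)

print("p_X(1) =", bernoulli_pmf(1, p), " p_X(0) =", bernoulli_pmf(0, p))
print("observed proportion of successes:", freq_success)
```

With a large sample, the observed proportion of successes should be close to $p$, which is what the widget shows graphically.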

The widget on the next subslide shows the probability mass function of the Bernoulli distribution, and allows you to change its parameter $p$. You can also show the distribution of a random sample, and alter the size of the sample.

In the case of a sample size of 1, the random value is shown as a circle where it occurs rather than as a distribution.

In [1]:
class PMFwidget:
    def __init__(self, distribution, value_range):
        self.distribution = distribution
        self.value_range = value_range
        self.out = widgets.Output(layout=dict(height='550px'))
        self.fig = plt.figure()
        self.ax = self.fig.add_subplot()
        plt.close()
        self._build()
        self.param_sliders = []

    def _build(self):
        self.button = widgets.Button(description='Draw random sample', layout=dict(width='25%'))
        self.size = widgets.Dropdown(
            options=[('1', 1), 
                     ('10', 10),
                     ('100', 100),
                     ('1000', 1000),
                     ('10000', 10000)],
            value=1,
            description='Sample size:'
        )
        self.plot_pdf = widgets.Checkbox(
            value=True,
            description='Show pmf',
            disabled=False,
            indent=False
        )
        self.plot_sample = widgets.Checkbox(
            value=False,
            description='Show random sample',
            disabled=False,
            indent=False
        )
        self.button.on_click(self.on_button_clicked)
        self.plot_pdf.observe(self.on_button_clicked, names='value')
        self.plot_sample.observe(self.on_button_clicked, names='value')

    def add_param_slider(self, slider):
        self.param_sliders.append(slider)
        slider.observe(self.on_button_clicked, names='value')

    def on_button_clicked(self, b):
        with self.out:
            self.out.clear_output(wait=True)
            self.ax.clear()
            dist = self.distribution(*[slider.value for slider in self.param_sliders])
            values = np.arange(self.value_range[0], self.value_range[1]).reshape(1, -1)
            if self.plot_sample.value:
                if self.size.value > 1:
                    # Proportion of the sample equal to each possible value
                    sample = dist.rvs(size=self.size.value).reshape(-1, 1)
                    proportions = np.mean(sample == values, axis=0)
                    self.ax.vlines(values + 0.05, 0, proportions, linewidth=3)
                else:
                    # A single draw is marked by a circle on the x-axis
                    self.ax.scatter(dist.rvs() + 0.05, 0, marker='o')
            if self.plot_pdf.value:
                # Theoretical pmf, shifted slightly so both plots stay visible
                xs = values
                ys = dist.pmf(xs)
                self.ax.vlines(xs + 0.2, 0, ys, color='orange', linewidth=3)
            self.ax.set_xlim(0, self.value_range[1])
            self.ax.set_ylim(0, 1.1)
            self.ax.set_xticks(np.arange(self.value_range[1]))
            display(self.fig)
            plt.close()

    def __call__(self):
        self.button.click()
        display(widgets.VBox([self.button, self.size, self.plot_pdf, self.plot_sample]
                             + self.param_sliders
                             + [self.out,]))

In [4]:
p_sel = widgets.FloatSlider(
    value=0.25,
    min=0,
    max=1,
    step=.05,
    description=r'p',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
)

widget = PMFwidget(bernoulli, (0, 2))
widget.add_param_slider(p_sel)
widget()


Can you think of real-world scenarios that could be modelled as Bernoulli random variables?

Here are just a few examples:
1. Coin throw: the most standard example
2. Rainy day vs sunny day
3. Right answer vs wrong answer to a quiz question

## Binomial distribution

Another common family of discrete probability distributions
is the Binomial distribution. The Binomial distribution occurs when counting
the positive outcomes in a fixed number of independent 
Bernoulli experiments.

For instance, we saw that the outcome of a coin throw follows
a Bernoulli distribution. If instead of a single coin throw 
we repeatedly throw the same coin $n\geq 1$ times,
then the total number of Heads observed follows a Binomial
distribution $\mathcal{B}(n, p)$ with parameters $(n, p)$, where $p$ is the
probability of obtaining Heads. The probability mass function of the
Binomial distribution with parameters $n, p$ is given by,
\begin{equation}
    p_X(k) = {n \choose k} p^k (1 - p)^{n - k}, \quad k\in\mathbb{N}.
\end{equation}
Note that it has finite support $\{0, \ldots, n\}$. Indeed, if you 
throw a coin 10 times the probability of observing 11 Heads is zero.
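
As a quick numerical check of the formula above, the following sketch evaluates the probability mass function directly and compares it with the number of Heads counted in repeated simulated series of throws. The helper name `binomial_pmf`, the seed, and the number of repetitions are assumptions made for this illustration only.

```python
import math
import random

def binomial_pmf(k, n, p):
    """Binomial(n, p) probability of observing exactly k successes."""
    if k < 0 or k > n:
        return 0.0  # outside the support {0, ..., n}
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5  # 10 throws of a well-balanced coin
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Repeat the experiment many times: each repetition counts the Heads
# among n independent Bernoulli(p) throws.
rng = random.Random(1)  # fixed seed so that the run is reproducible
n_repeats = 20_000
counts = [0] * (n + 1)
for _ in range(n_repeats):
    heads = sum(1 for _ in range(n) if rng.random() < p)
    counts[heads] += 1
freqs = [c / n_repeats for c in counts]

for k in range(n + 1):
    print(f"k={k:2d}  pmf={pmf[k]:.4f}  observed={freqs[k]:.4f}")
print("P(11 Heads in 10 throws) =", binomial_pmf(11, n, p))
```

The observed frequencies should track the theoretical values, and the probabilities over $\{0, \ldots, n\}$ sum to one.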

In [7]:
p_sel = widgets.FloatSlider(
    value=0.25,
    min=0,
    max=1,
    step=.05,
    description=r'p',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
)

n_sel = widgets.IntSlider(
    value=5,
    min=0,
    max=10,
    step=1,
    description=r'n',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
)

widget = PMFwidget(binom, (0, n_sel.max + 1))
widget.add_param_slider(n_sel)
widget.add_param_slider(p_sel)
widget()


## Poisson distribution

The Poisson distribution is often used for count data. By count data,
we mean that we count the number of occurrences of a random event
within a fixed time window (for instance, the number of connections
made to a website per day) or the number of locations where an event
occurs within a fixed region (for instance, the number of flaws in
a piece of cloth on a production site).
The Poisson distribution has a single parameter, often denoted $\mu$,
and its probability mass function is given by,
\begin{equation}
    p_X(k) = \frac{\mu^k}{k!} \exp(-\mu), \quad k\in\mathbb{N}.
\end{equation}
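
The formula can be evaluated directly, as in the minimal sketch below. The helper name `poisson_pmf` and the value $\mu=5$ are assumptions made for this illustration. Since the support is infinite, the sum is truncated at a point beyond which the remaining probabilities are negligible for this $\mu$.

```python
import math

def poisson_pmf(k, mu):
    """Poisson(mu) probability of observing exactly k occurrences."""
    if k < 0:
        return 0.0
    return mu**k / math.factorial(k) * math.exp(-mu)

mu = 5.0  # arbitrary parameter for the illustration
poisson_probs = [poisson_pmf(k, mu) for k in range(60)]

for k in range(11):
    print(f"k={k:2d}  p_X(k)={poisson_probs[k]:.4f}")
print("sum over k < 60:", sum(poisson_probs))
```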

A Binomial distribution $\mathcal{B}(n, p)$ can be shown to converge to a Poisson distribution when $n$ goes to infinity and $p$ goes to zero in such a way that the product $np$ stays fixed. More specifically, it converges to a Poisson distribution with parameter $\mu=np$.
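
This convergence can be observed numerically. The sketch below holds $\mu = np$ fixed while $n$ grows (so that $p = \mu / n$ shrinks), and reports the largest gap between the two probability mass functions over the first few values of $k$. The helper names, the choice $\mu = 3$, and the range of $k$ are assumptions made for this illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Binomial(n, p) probability of observing exactly k successes."""
    if k < 0 or k > n:
        return 0.0  # outside the support {0, ..., n}
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, mu):
    """Poisson(mu) probability of observing exactly k occurrences."""
    if k < 0:
        return 0.0
    return mu**k / math.factorial(k) * math.exp(-mu)

mu = 3.0        # the product n * p is held at this value
ks = range(21)  # values of k over which the two pmfs are compared
max_gaps = {}
for n in (10, 100, 1000, 10_000):
    p = mu / n  # p shrinks as n grows
    max_gaps[n] = max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, mu)) for k in ks
    )

for n, gap in max_gaps.items():
    print(f"n={n:6d}  p={mu / n:.4f}  largest gap={gap:.6f}")
```

The largest gap shrinks as $n$ increases, which is why the Poisson distribution is a convenient approximation for counts of rare events over many trials.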

In [8]:
mu_sel = widgets.FloatSlider(
    value=5,
    min=1,
    max=10.0,
    step=.5,
    description=r'$\mu$',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
)

widget = PMFwidget(poisson, (0, mu_sel.max * 2))
widget.add_param_slider(mu_sel)
widget()


Can you think of real-world scenarios that could be modelled as Poisson random variables?

Here are just a few examples:
1. Number of car accidents in London in a year
2. Number of fraudulent credit card payments made to a company in a month