<img src="../../shared/img/slides_banner.svg" width=2560></img>

# Models and Random Variables 02

In [None]:
%matplotlib notebook

In [None]:
import sys

sys.path.append("../../")

from shared.src import quiet
from shared.src import seed
from shared.src import style

In [None]:
import math
import random

import daft
from IPython.display import Image
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm
import scipy.special
import scipy.stats
import seaborn as sns

In [None]:
def sample_from(model, n, filt=10):
    with model:
        samples = pm.sample(chains=1, draws=n * filt)[::filt]
    [samples.remove_values(name) for name in samples.varnames if "_log__" in name]
    
    return samples

In [None]:
def samples_to_dataframe(samples):
    return pd.DataFrame([sample for sample in samples])

In [None]:
mean = lambda xs: np.mean(xs, axis=1)
var = lambda xs: np.var(xs, axis=1)

def setup_clt_plot(sample_cts):
    nrows = 2
    ncols = len(sample_cts) // nrows
    aspect = 1 / (ncols / 2)
    f, axs = plt.subplots(figsize=(10, 10 * aspect), nrows=nrows, ncols=ncols,
                          sharex=True, sharey=True)
    return f, axs

def clt_plot(samples, sample_cts, axs, stat_func=mean, lims=(0, 5)):

    for ax, sample_ct in zip(axs.flatten(), sample_cts):
        ax.hist(stat_func(samples[:, :sample_ct]), #bins=np.linspace(*lims, 10),
                 histtype="step", align="mid", lw=2, density=True,
                 label="Sampling Distribution");

    for ax, sample_ct in zip(axs.flatten(), sample_cts):
        xs = np.linspace(*lims, num=100)
        ax.plot(xs,
                scipy.stats.norm(np.mean(stat_func(samples[:, :sample_ct])),
                                 np.std(stat_func(samples[:, :sample_ct]))).pdf(xs),
                lw=2, label="Normal Approximation")
        ax.set_title("Sample Size: {0}".format(sample_ct))
        
    axs[1, 0].legend(loc="upper right")

## There are many kinds of random variables available in `pymc`

The traditional way of learning them goes like this:

I tell you the name, then I tell you a formula for the distribution of the random variable.

Remember that, for all kinds of random variable, the distribution is what the histogram is approximating.

For example:

> A random variable is said to have the _Laplace_ distribution with mean $\mu$ and rate (inverse scale) $\lambda$ if the distribution of the variable is given by

$$
p(x ; \lambda, \mu) = \frac{\lambda}{2}\mathrm{e}^{-\lambda|x-\mu|}
$$

and then we would look at a graph of the formula for some choices of the parameter $\lambda$, then start deriving properties of this random variable.
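
As a quick sanity check on the formula above, the following sketch evaluates it directly and compares it with `scipy.stats.laplace`. The function name `laplace_pdf` is ours, and the comparison assumes that SciPy's `scale` argument equals $1 / \lambda$ in the notation used here.

```python
import numpy as np
import scipy.stats


def laplace_pdf(x, lam, mu):
    # direct transcription of the formula: (lam / 2) * exp(-lam * |x - mu|)
    return lam / 2 * np.exp(-lam * np.abs(x - mu))


xs = np.linspace(-5, 5, 201)
lam, mu = 2.0, 0.5

ours = laplace_pdf(xs, lam, mu)
# assumption: SciPy parameterizes the Laplace by loc and scale, with scale = 1 / lam
theirs = scipy.stats.laplace(loc=mu, scale=1 / lam).pdf(xs)

print(np.allclose(ours, theirs))
```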

### In this lecture, we will learn about them by building them up from smaller parts

We'll start with simple random variables.

By combining them into more complicated models, we'll discover new random variables.

## Let's start where probability theory began: dice games.

The first mathematical works on probability were called, in Latin, _Liber de Ludo Aleae_, or _Book on Games of Chance_, by Gerolamo Cardano (written c. 1564, published 1663) and _De Ratiociniis in Ludo Aleae_ or _Reasoning in Games of Chance_, by Christiaan Huygens (1657).

In [None]:
Image("./img/cardano.jpg", width=350)

Cardano was something of a mathematical bad boy: a proponent of such scandalous ideas as negative, even imaginary numbers, and constantly in dire financial straits. He used his knowledge of probability to earn money at games of dice. The publication of his book was delayed because it included a section on ways to cheat, which he did not wish to reveal.

### So we begin with a humble random variable: the `DiscreteUniform`.

`Discrete` means you can count the number of possible values, as when drawing from a deck of cards or guessing the number of avocados produced in a year in California.

`Uniform` means that each outcome is equally likely, as when drawing from a shuffled deck of cards or looking at the position of an air molecule in a room.

The traditional six-sided die is a `DiscreteUniform` with six outcomes, numbered `1` through `6`.

In [None]:
d6_model = pm.Model()

with d6_model:
    pm.DiscreteUniform(name="U", lower=1, upper=6)

In [None]:
data = samples_to_dataframe(sample_from(d6_model, 1000, filt=25))

data.head(n=10)

The resulting frequencies are roughly even, or `Uniform`, as expected:

In [None]:
print(data["U"].value_counts() / len(data))

## We can simulate collections of random variables with `shape`

Just before Cardano's and Huygens's books were published, two prominent French mathematicians,
Blaise Pascal and Pierre de Fermat, exchanged some letters comparing their notes on games of chance.

Pascal is famous for [Pascal's wager](https://plato.stanford.edu/entries/pascal-wager/) that God exists; betting was popular then.

Their letters were occasioned by a question from another libertine-mathematician, alias Chevalier de Méré.

His (incorrect) rules for reasoning with probability told him that if he bet even money that he could roll two sixes (aka "ringing the bell" or _sonnez_) at least once in 24 rolls of a pair of dice, he would come out ahead.
Experience had shown that he lost money unless he rolled 25 times.

Let's see if we can simulate this _sonnez_ game and verify the Chevalier's claims.

There's a similar rule of thumb still in use today:
if you want roughly a 50% chance of seeing an event that happens with chance 1 / N,
you should try it about `2 / 3 * N` times (more precisely, `ln(2) * N`, or about `0.69 * N`, when N is large).
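
The rule of thumb can be checked against the exact calculation. The chance of seeing the event at least once in `n` independent tries is `1 - (1 - 1 / N) ** n`; the sketch below (the helper name `chance_at_least_once` is ours) evaluates it for the _sonnez_ game, where `N = 36`.

```python
import math


def chance_at_least_once(n_tries, N):
    # complement of "the event fails on every one of n_tries independent tries"
    return 1 - (1 - 1 / N) ** n_tries


N = 36  # one way out of 36 to roll two sixes with a pair of dice
p_24 = chance_at_least_once(24, N)
p_25 = chance_at_least_once(25, N)

# number of tries at which the chance of at least one success is exactly 1 / 2
break_even_tries = math.log(2) / -math.log(1 - 1 / N)

print(p_24, p_25, break_even_tries)
```

With these numbers, 24 rolls falls just short of an even chance and 25 rolls just exceeds it, which agrees with the Chevalier's experience.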

We will use `DiscreteUniform` again.

This time, however, we want to simulate the process of rolling multiple pairs of dice.

To achieve this, we'll add a `shape` argument to our definition of the variable. The result will be outputs that are arrays whose shape is equal to the value of the `shape` argument. This is more efficient than creating `num_rolls * 2` separate variables.

In [None]:
sonnez_model = pm.Model()
num_rolls = 24

with sonnez_model:
    pm.DiscreteUniform(name="rolls", lower=1, upper=6, shape=(num_rolls, 2))

Two `pyMC` tricks are happening here:

1. You might object that, in a real game of _sonnez_, we'd stop whenever a _sonnez_ was rolled, so this model doesn't really simulate our process. For technical reasons, it's very hard for `pyMC` to simulate situations where _the existence of certain variables is determined by chance_. For example, the 24th roll only happens if the first 23 rolls were all failures, which is determined by chance. Instead, what we typically do is simulate all variables that might be present, then ignore values as needed. This will be important for the lab.

For example, if we were simulating our winnings from _sonnez_, we would ignore every roll after the first _sonnez_ in a set of 24 rolls.

2. Second, you might wonder why we bother with `shape` here. Couldn't we just take the `d6_model` above, take `num_rolls * 2 * num_samples` samples, and use those as our rolls? Unfortunately, the samples from `pyMC` aren't _independent_ -- if you look closely, you'll see that streaks are more common than you'd expect by chance. This is especially bad for a model of _sonnez_. We'll talk more about what sample dependence means for using `pyMC` correctly later on in the course.
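
The "simulate everything, then ignore values as needed" idea from the first point can be sketched outside of `pyMC`. The version below uses NumPy's random generator as a stand-in sampler, which is an assumption made for illustration and not the sampling method used in this notebook; the names `n_batches`, `is_sonnez`, and `first_sonnez` are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n_batches, num_rolls = 10_000, 24

# simulate every roll that might happen: n_batches sets of 24 rolls of 2 dice
rolls = rng.integers(1, 7, size=(n_batches, num_rolls, 2))

# a roll is a sonnez when both dice show a six
is_sonnez = (rolls == 6).all(axis=2)

# index of the first sonnez in each batch, or -1 if the batch has none;
# rolls after that index are the ones a real game would never have made
any_sonnez = is_sonnez.any(axis=1)
first_sonnez = np.where(any_sonnez, is_sonnez.argmax(axis=1), -1)

print(any_sonnez.mean())
```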

In [None]:
raw_roll_data = samples_to_dataframe(sample_from(sonnez_model, 25000))

In [None]:
for ii in range(num_rolls):
    raw_roll_data["roll_" + str(ii)] = raw_roll_data["rolls"].apply(lambda lst: lst[ii])
    
roll_data = raw_roll_data.drop("rolls", axis=1)

In [None]:
roll_data.sample(5)

With our rolls in hand, we first move through each roll, determining whether it was a _sonnez_.
We use `apply` to apply the same function, `is_success`, to each roll.

In [None]:
def is_success(roll):
    return roll[0] == roll[1] == 6


success_data = pd.DataFrame()
for ii in range(num_rolls):
    success_data["success_on_roll_" + str(ii)] = roll_data["roll_" + str(ii)].apply(is_success)

success_data.sample(5)

Then, we can figure out which batches of 24 rolls included a _sonnez_ by `apply`ing the `max` function to each row:

In [None]:
success_in_batch = success_data.apply(max, axis=1)

In [None]:
success_in_batch.describe()

In [None]:
chance_failure = success_in_batch.value_counts()[False] / len(success_in_batch)
chance_failure

At least once when I ran this, I got an incorrect answer: it looked like betting you can get a _sonnez_ in 24 rolls was a money-making, rather than money-losing, bet!
Remember never to trust a single point value calculated from data (i.e., one without error bars), and that bootstrapping is still a valid way to do inference on data from `pyMC` models!

To really draw an inference, we could bootstrap our samples many times:

In [None]:
success_in_batch.sample(frac=1, replace=True).describe()
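
Repeating that resampling step many times gives an interval rather than a single number. The sketch below is one way to do it; so that it runs on its own, it uses a synthetic stand-in for `success_in_batch` (an assumption), and the names `n_boot` and `boot_means` are ours.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# synthetic stand-in for success_in_batch: True when a batch contained a sonnez
success_in_batch = pd.Series(rng.random(25_000) < 0.49)

n_boot = 200
boot_means = np.array([
    success_in_batch.sample(frac=1, replace=True, random_state=boot_seed).mean()
    for boot_seed in range(n_boot)
])

# central 95% of the bootstrap estimates of the chance of success
low, high = np.percentile(boot_means, [2.5, 97.5])
print(low, high)
```

If the whole interval sits below `0.5`, the data support calling the bet a money-loser; if it straddles `0.5`, more samples are needed.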

Note the similarity to the "unknown digits of pi" example.
One can derive, as Pascal and Fermat did,
that getting a _sonnez_ in 24 rolls is a money-losing bet,
so, from a certain perspective, it is not something we can use probability to describe our uncertainty about.

However, this example demonstrates the utility of using probability for this sort of uncertainty.
So long as we can build a model of a claim, we can use sampling from the model to determine
a measure of our certainty and to think inferentially.

## Counting successes

In another game, we might win more money the more times we roll a _sonnez_.

If we want to know our odds of success or average winnings in that game, we want to add up our _sonnez_ rolls in each batch (we use `sum` instead of `max`):

In [None]:
number_success = success_data.apply(sum, axis=1)
number_success.sample(10)

In [None]:
plt.figure();

In [None]:
plt.hist(number_success, bins=range(max(number_success) + 2), align="left", density=True);

We might be happy to stop here, but when calculating by hand, it's good to have shortcuts for these sorts of things.

So Jakob Bernoulli, building off the work of Pascal and Cardano, was able to derive the general shape of this distribution, as a function of both the number of rolls and the chance of the outcome of interest.

In Python:

In [None]:
def binomial_pmf(k, N, q):
    chance_of_k_successes_in_a_row = (q ** k)
    chance_of_all_failures_after = (1 - q) ** (N - k)
    number_of_ways_to_scramble = scipy.special.binom(N, k)
    return number_of_ways_to_scramble * chance_of_k_successes_in_a_row * chance_of_all_failures_after

and math:

#### The Binomial($N$, $q$) Distribution: $$p(k) = \color{ForestGreen}{\binom{N}{k}} q^{k}(1-q)^{N-k}$$

where $N$ is the number of attempts and $q$ is the chance of success on any given attempt.

The quantity in $\color{ForestGreen}{\text{green}}$ is the _binomial coefficient_ (discovered by Cardano in a different area of mathematics!), implemented in Python as `scipy.special.binom`.

Loosely, it counts the number of ways we can succeed `k` times in `N` tries: `k` in a row at the start, one failure then `k` successes, and so on.
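
To make the counting concrete, the sketch below lists every way to place `k` successes among `N` tries for a small case and compares the count with the binomial coefficient. `itertools.combinations` does the enumeration; the variable names are ours.

```python
import itertools

import scipy.special

N, k = 4, 2

# each arrangement is the set of try indices on which we succeed
arrangements = list(itertools.combinations(range(N), k))

for successes in arrangements:
    # S marks a success, F a failure
    print("".join("S" if ii in successes else "F" for ii in range(N)))

print(len(arrangements), scipy.special.binom(N, k))
```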

We can estimate `q` from the `success_data`. Can you guess how?

In [None]:
q = success_data.mean().mean()

In [None]:
plt.figure();

In [None]:
ks = np.arange(max(number_success) + 2)
plt.hist(number_success, bins=ks, align="left", density=True,
         histtype="step", lw=4, color="C0");

plt.plot(ks, binomial_pmf(ks, num_rolls, q),
         lw=4, marker='.', markersize=24, color="C1");

And, as promised, this formula gives a good match to our data histogram.

Go back and change the definition of `is_success` to something else -- maybe check if any number rolled is even, or if the sum of the rolls is `7`.

The shape of the distribution will change, but the shape given by `binomial_pmf` will track it!

The match won't be perfect, but if you apply bootstrapping to the samples,
you can get a sense for the variability in our estimate of the histogram,
and it should be clear that the differences between our estimate and the curve given
by the binomial pmf are about the same size as the differences between bootstrap estimates of the histogram.

To do so, change the histogram plotting call to
```python
plt.hist(number_success.sample(frac=1, replace=True),
         bins=ks, align="left", density=True,
         histtype="step", lw=4, color="C0");

```
and run it multiple times.

## We can count successes with `pm.Binomial`

Rather than modeling our _sonnez_ game with `DiscreteUniform`s representing dice rolls, we could instead model it with a `Binomial` representing the number of _sonnez_s directly:

A `pm.Binomial` random variable has the distribution given by the binomial distribution formula above,
aka `binomial_pmf` or `scipy.stats.binom.pmf`, with the same parameters.
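
The equivalence between the hand-written formula and `scipy.stats.binom.pmf` can be checked numerically. The sketch repeats `binomial_pmf` in compact form so that it runs on its own, and uses the exact _sonnez_ chance `1 / 36` in place of the estimated `q`; `possible_counts` is our own name.

```python
import numpy as np
import scipy.special
import scipy.stats


def binomial_pmf(k, N, q):
    return scipy.special.binom(N, k) * q ** k * (1 - q) ** (N - k)


N, q = 24, 1 / 36
possible_counts = np.arange(N + 1)

by_hand = binomial_pmf(possible_counts, N, q)
from_scipy = scipy.stats.binom.pmf(possible_counts, N, q)

# the two should agree, and the probabilities should add up to 1
print(np.allclose(by_hand, from_scipy), by_hand.sum())
```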

In [None]:
binomial_sonnez_model = pm.Model()

with binomial_sonnez_model:
    pm.Binomial(name="num_sonnez", n=24, p=q)  # some use p, some use q

In [None]:
binomial_sonnez_data = samples_to_dataframe(sample_from(binomial_sonnez_model, 10000))

In [None]:
binomial_sonnez_frequencies = binomial_sonnez_data["num_sonnez"]\
    .value_counts() / len(binomial_sonnez_data)
binomial_sonnez_frequencies

In [None]:
plt.figure();

In [None]:
plt.hist(number_success, bins=ks, align="left", density=True,
         histtype="step", lw=4);
plt.hist(binomial_sonnez_data["num_sonnez"], bins=ks,
         density=True, align='left',
         histtype="step", lw=4);

### Modeling always requires trade-offs, like between flexibility and efficiency

By making an assumption about the phenomenon we are modeling, we were able to simplify our model: instead of simulating each die roll, we can just simulate how many were _sonnez_s.

In this case, this buys us

1. A smaller dataset. In `sonnez_model`, each row has 48 numbers. In `binomial_sonnez_model`, there's just 1. Imagine the savings if we were playing a game with 1,000,000 rolls!

2. Less work. Our data is already in the form we want to work with, so the analysis is simpler: just `value_counts`.

but at a price: we no longer have a detailed model. So if the context changes and we are instead interested in how many streaks of three `3`s there are, or how often an odd number is rolled, we'd have to build a new model, rather than re-analyzing the output of an old model.

When we're in an exploratory phase or another rapidly-changing environment, keeping a flexible model is preferred. Once we've nailed down exactly what we'd like to model and understand, we can switch over to something more efficient and limited.

## We can model the rain

In [None]:
Image('./img/umbrella.png')

Say it is raining. How many drops are hitting your umbrella each second? How much does that count vary?

With a bit of cleverness, we can answer this question using the `Binomial`.

Divide the area of your umbrella into some number of small pieces and, for each piece, record whether at least one raindrop hit there in a given second.

In [None]:
Image('./img/umbrella_top.png')

The result in each area is like a `Categorical` on `0` (no drop) and `1` (at least 1 drop). The raindrops don't have much effect on each other, so they're like our separate die rolls, or independent samples from `Categorical`.

A `Categorical` with two outcomes is so common that it has a special name:
a `Bernoulli` variable, after Jakob Bernoulli, who came across it on the way to the `Binomial`.

The sum of these `Categorical`s is a `Binomial`. The smaller we make the pieces, the less likely any individual piece is to have more than 1 drop hit it, and so the closer we get to counting the raindrops.

Let's say that when we split the umbrella into `16` pieces, the chance any individual piece gets at least one rain drop in a second is `1 / 2` (it's a very light drizzle).

In [None]:
raindrops_model = pm.Model()

num_pieces = 16
q = 1 / 2

with raindrops_model:
    pm.Binomial(name="num_drops", n=num_pieces, p=q)

raindrops_data = samples_to_dataframe(sample_from(raindrops_model, 1000))

In [None]:
plt.figure();

In [None]:
plt.hist(raindrops_data["num_drops"],
         bins=range(max(raindrops_data["num_drops"]) + 2),
         histtype="step", linewidth=4, align="left", density=True);

If we want more accuracy, we can split into smaller pieces (say, split each piece in half or in quarters).

Then the number of trials would go up (doubling or quadrupling) and the chance of a drop hitting any one piece would go down (to roughly 1/2 or 1/4 of its previous value).
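
The arithmetic behind this refinement is short: multiplying the number of pieces by a factor while dividing the chance by the same factor keeps the average count `n * p` fixed, while the variance `n * p * (1 - p)` creeps up toward the average. A sketch, with `factor` and `refinements` as our own names:

```python
num_pieces, q = 16, 1 / 2

refinements = []
for factor in [1, 2, 4, 100]:
    n, p = num_pieces * factor, q / factor
    mean_drops = n * p           # stays at 8 for every factor
    var_drops = n * p * (1 - p)  # equals 8 * (1 - p), so it approaches 8 as p shrinks
    refinements.append((factor, mean_drops, var_drops))
    print(factor, mean_drops, var_drops)
```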

Let's simulate what happens to the number of drops we observe.

In [None]:
precise_raindrops_model = pm.Model()

with precise_raindrops_model:
    pm.Binomial(name="num_drops", n=num_pieces * 100, p=q / 100)  # 100 times finer grid

precise_raindrops_data = samples_to_dataframe(sample_from(precise_raindrops_model, 10000))

In [None]:
f = plt.figure();

In [None]:
plt.hist(raindrops_data["num_drops"],
         bins=range(max(raindrops_data["num_drops"]) + 2),
         histtype="step", linewidth=4, align="left", density=True, label="less precise estimate");

In [None]:
plt.hist(precise_raindrops_data["num_drops"],
         bins=range(max(precise_raindrops_data["num_drops"]) + 2),
         histtype="step", linewidth=4, density=True, align="left",
         alpha=0.85, color="C3", label="more precise estimate")
plt.legend()
plt.tight_layout();

## The Poisson distribution lets us model counts of events

Another French mathematician, [Siméon Denis Poisson](https://en.wikipedia.org/wiki/Sim%C3%A9on_Denis_Poisson), studied this sort of problem in 1837. In his case, he was trying to model the rate and occurrence of wrongful convictions (somewhat more noble than gambling).

With some clever mathematics, he was able to demonstrate that the shape of the distribution is given by the following Python function:

In [None]:
def poisson_pmf(k, mu):
    # scipy.special.factorial works elementwise, so k can be an array of counts
    return np.exp(-mu) * mu ** k / scipy.special.factorial(k)

or, in traditional math terms:

#### The Poisson Distribution: $$p(k; \mu) = \mathrm{e}^{-\mu} \cdot \frac{\mu^k}{k!}$$

where `mu` aka $\mu$ is the average value.

We can verify this by comparing the distribution to our sample histograms:

In [None]:
average_number_drops = np.mean(precise_raindrops_data["num_drops"])
poisson_pmf = scipy.stats.poisson(mu=average_number_drops).pmf
# scipy.stats has lots of distributions implemented, so we don't have to write our own!

ks = range(max(precise_raindrops_data["num_drops"]) + 2)
plt.plot(ks, poisson_pmf(ks), linewidth=4, marker=".", markersize=24, label="poisson model"); plt.legend()
f

This model of the rain has at least one distinct advantage over the binomial model: it's easier to match it to data.

The parameter, `mu`, is just the average number of raindrops in a second.
We could count raindrops for 60 seconds, divide by 60, and then be done.

Compare that to the parameters of the binomial model, `n` and `p`.
For the model to be accurate, `n` must be large and `p` must be small.
So to actually measure that `p` accurately,
we'd need to determine the chance of a raindrop falling on a very small area.
Since that chance is small, we'd need many trials to determine the chance accurately.
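
How large is "large" can be gauged by comparing the binomial probabilities with the Poisson ones as the grid gets finer. The sketch below holds the average at 8 drops, as in the models above, and reports the largest gap between the two probability mass functions; `factor` and `largest_gap` are our own names.

```python
import numpy as np
import scipy.stats

num_pieces, q = 16, 1 / 2
mu = num_pieces * q  # average number of drops, the same for every grid
drop_counts = np.arange(0, 41)

largest_gap = {}
for factor in [1, 10, 100, 1000]:
    binomial_probs = scipy.stats.binom.pmf(drop_counts, num_pieces * factor, q / factor)
    poisson_probs = scipy.stats.poisson.pmf(drop_counts, mu)
    largest_gap[factor] = np.max(np.abs(binomial_probs - poisson_probs))

print(largest_gap)
```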

But again, we lose the ability to model what's happening on the individual parts of the umbrella, so if we later wanted to know, say, how big a difference in raindrop counts to expect on one half of the umbrella versus the other, a `Poisson` model wouldn't cut it.

## Raindrops another way

Let's consider another, more detailed model of the rain.

Instead of just counting raindrops, let's time them.
Every time a drop hits, we start a stopwatch.
When another drop hits, we stop the stopwatch and log the time.

That is, we measure the time in between rain drops.
In general, the amount of time between events is called an _inter-event interval_,
so we'll call the value we're measuring the _inter-drop interval_.

But what should the distribution of the times between rain drops be?
Creating a more detailed model requires us to use more sophisticated tools and make additional assumptions.

In building our binomial model, we already assumed that the number of raindrops in one area had no effect on the raindrops in another area.
Let's also assume that the raindrops at one point in time have no effect on raindrops at another point in time.

This is different from the behavior of, say, a bus.
If the Number 6 bus arrives, on average, at the corner of Telegraph and Haste every 10 minutes, and no bus has arrived for 9 minutes, you're more likely to see one in the next minute than in the minute right after a bus arrives.

It seems plausible, on the other hand,
that raindrops are not like buses.
They are, unfortunately for many weddings, not scheduled
and seem to fall relatively independently of one another.

There are many phenomena that behave (approximately) this way:
the time in between customer orders on Amazon (or orders of a particular product),
the time between radioactive decay events,
and the time for a Bitcoin transaction to be completed.

They are known as _memoryless_ phenomena.
It can be shown that there is only one continuous distribution
for the waiting times of memoryless processes:
the `Exponential` distribution.

It has one parameter, the rate, `lambda` aka $\lambda$, aka `lam`.
It is the average number of events per unit time, which we computed for our data above.
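
Memorylessness can be stated as $P(T > s + t \mid T > s) = P(T > t)$: having already waited `s` units of time tells us nothing about the remaining wait. The sketch below checks this for the exponential distribution using `scipy.stats.expon`, which takes `scale = 1 / lam`; the particular values of `lam`, `s`, and `t` are arbitrary choices of ours.

```python
import scipy.stats

lam = 8.0  # events per unit time (illustrative value)
waiting_time = scipy.stats.expon(scale=1 / lam)

s, t = 0.3, 0.1

# sf is the survival function, P(T > x)
chance_fresh = waiting_time.sf(t)
chance_after_waiting = waiting_time.sf(s + t) / waiting_time.sf(s)

print(chance_fresh, chance_after_waiting)
```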

In [None]:
raindrop_time_model = pm.Model()

with raindrop_time_model:
    pm.Exponential(name="interdrop_intervals", lam=average_number_drops, shape=5000)

In [None]:
raindrop_time_data = samples_to_dataframe(sample_from(raindrop_time_model, 10))

In [None]:
plt.figure();

In [None]:
plt.hist(raindrop_time_data.sample()["interdrop_intervals"].iloc[0], alpha=0.1, color="k", density=True);

Notice the exponential shape of the histograms -- that's where this random variable type gets its name.

Does this model of the rain match up with our previous models, which used the `Binomial` and the `Poisson` to count rain drops?

Because the `Poisson` model was less detailed, we can't probe it for raindrop arrival times -- that information isn't a part of that model.
But we can count how many rain drops occurred in each second of our `Exponential` model and see whether the distribution matches.

Let's first visualize the raindrop arrivals as a sequence.

In [None]:
raindrop_times = np.cumsum(raindrop_time_data.sample()["interdrop_intervals"].iloc[0])

In [None]:
plt.figure();

In [None]:
plt.vlines(raindrop_times[:100], 0, 0.1, alpha=0.5, color="#3b7ea1");

We can use the `hist` function to compute the number of raindrops in each one-second bin.

In [None]:
plt.figure();

In [None]:
counts, _, _ = plt.hist(raindrop_times, bins=range(math.ceil(max(raindrop_times) + 2)))

Note: the `np.histogram` function will compute the same values without producing a plot.
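
A minimal sketch of that alternative follows. It generates its own arrival times with NumPy (an assumption made so the example runs on its own, in place of `raindrop_times` from the model) and bins them into one-second intervals.

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-in for raindrop_times: cumulative sums of exponential inter-drop intervals
intervals = rng.exponential(scale=1 / 8, size=5000)
arrival_times = np.cumsum(intervals)

# one-second bins covering every arrival
bin_edges = np.arange(0, np.ceil(arrival_times.max()) + 1)
counts, _ = np.histogram(arrival_times, bins=bin_edges)

print(counts[:10], counts.mean())
```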

And then apply the histogram to those counts to see the distribution of numbers of raindrops per second.

In [None]:
plt.figure();

In [None]:
plt.hist(counts,
         bins=range(math.ceil(max(counts) + 2)), density=True, align="left", histtype="step", lw=4);

The result is well approximated by our `Poisson` probability mass function.

In [None]:
plt.plot(scipy.stats.poisson.pmf(range(math.ceil(max(counts) + 2)), mu=np.mean(counts)),
         color="C1", lw=4, marker='.', markersize=24);

Try adjusting the original `q` parameter for the `Binomial` model of the rain and re-executing all of the code.

Reducing it to something like `1 / 20` will let you see the other extreme of how a `Poisson`-distributed random variable can be distributed.

## Ask not for whom the bell curves

One of the powerful ideas examined in this course is to _apply our modeling methods to our modeling methods_.

As an example, consider what happens when we take the mean of some data generated by measuring a process. We would like to infer that the true mean of the process is close to the mean we measured. We can use our modeling tools to understand this inference: how close it likely is, the chance we are off by at least some amount, etc.

Every time we take a sample, we get a different value for the mean, and so the process of sampling the values of a random variable, or measuring the values of some phenomenon, is _also a phenomenon we can model with random variables_, where one of the random variables of interest is the _mean of the other random variables in the model_.

The distribution of the mean, measured across repeated measurements, is the _sampling distribution of the mean_, and statistics is, in one sentence, the mathematical study of sampling distributions.
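
The sampling distribution of the mean can be approximated by brute force: repeat the measurement many times and record the mean each time. The sketch uses NumPy's generator to draw `Poisson` values directly (an assumption made for brevity, in place of the `pyMC` models in this notebook); `n_repeats`, `sample_size`, and `spreads` are our own names.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n_repeats = 1.0, 10_000

spreads = {}
for sample_size in [1, 10, 100]:
    # each row is one repetition of the whole measurement
    measurements = rng.poisson(lam=mu, size=(n_repeats, sample_size))
    sample_means = measurements.mean(axis=1)
    spreads[sample_size] = sample_means.std()
    # observed spread of the means next to the sqrt(mu / sample_size) prediction
    print(sample_size, spreads[sample_size], np.sqrt(mu / sample_size))
```

The spread of the means shrinks as the sample size grows, which is the quantitative version of "the true mean is close to the mean we measured".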

### Normal distributions make statistics easy

This presents a problem: we've spent some time learning all of these distributions for our random variables, and now I'm telling you we also need to learn the sampling distributions for all of the statistics we care about (mean, standard deviation, etc.). Sounds like a tall order!

So statistics would be even harder than it is if it weren't for a cool fact:
many of those distributions have the same shape: the _bell curve_.

We'll demonstrate this below.

In [None]:
def build_mean_model(random_variable, parameters, sample_size=30):

    means_model = pm.Model()

    with means_model:
        for ii in range(sample_size):
            random_variable(name="sample_" + str(ii), **parameters)
            
    return means_model

In [None]:
# poisson
poisson_rv, poisson_params = pm.Poisson, {"mu": 1.}

# exponential
exponential_rv, exponential_params = pm.Exponential, {"lam": 0.5}

# binomial
binomial_rv, binomial_params = pm.Binomial, {"n": 2, "p": 0.1}

poisson_mean_model = build_mean_model(poisson_rv, poisson_params)
exponential_mean_model = build_mean_model(exponential_rv, exponential_params)
binomial_mean_model = build_mean_model(binomial_rv, binomial_params)

In [None]:
means_data = samples_to_dataframe(sample_from(poisson_mean_model, 500))

samples = np.asarray(means_data)

In [None]:
sample_cts = [1, 2, 5, 10, 20, 30]

f, axs = setup_clt_plot(sample_cts)

In [None]:
clt_plot(samples, sample_cts, axs, lims=[0, 5], stat_func=mean)

Try this again with `stat_func=var` and define `sample_cts = [5, 10, 20, 30]`.

You might also try larger sample counts to see how the approximation improves.
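
One way to put a number on how the approximation improves is the skewness of the sampling distribution, which is zero for a bell curve. The sketch below tracks it for means of `Exponential` samples, drawn with NumPy as a stand-in for the `pyMC` models (an assumption); `skews` is our own name.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
n_repeats = 20_000

skews = {}
for sample_ct in [1, 5, 30, 100]:
    # scale = 1 / lam, so this matches lam = 0.5 from exponential_params
    draws = rng.exponential(scale=2.0, size=(n_repeats, sample_ct))
    skews[sample_ct] = scipy.stats.skew(draws.mean(axis=1))

print(skews)
```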

## The Normal distribution shows up frequently

The fact that the normal distribution shows up for sampling distributions is very important.

So important that that fact has a name: the _Central Limit Theorem_. Here "central" means "essential" or "important", as in "the central point of my essay". The normal distribution itself was described by mathematician [J.F.C. Gauss](https://en.wikipedia.org/wiki/Carl_Friedrich_Gauss) in [1809](https://en.wikipedia.org/wiki/Normal_distribution#History).
For that reason, it's sometimes called the _Gaussian distribution_.

Gauss and his distribution are important enough to have once graced the 10 DM bill, before the advent of the euro:

In [None]:
Image("./img/ten_deutsche_marks.jpg")

source: https://www.kleinbottle.com/gauss.htm

The normal distribution has a somewhat mysterious-looking form when you first look at it:

#### The Normal Distribution: $$p(x; \mu, \sigma) = \frac{1}{\sqrt{2 \pi}\sigma} \mathrm{e}^{\frac{-(x -\mu)^2}{2\sigma^2}}$$

To make it more comprehensible, let's write it out in Python, giving the variables useful English names:

In [None]:
def normal_pdf(x, mu, sigma):
    normalizer = 1 / (np.sqrt(2 * np.pi) * sigma)
    distance_from_mean = np.square(x - mu)
    scale_of_distances = 2 * sigma ** 2
    return normalizer * np.exp(-distance_from_mean / scale_of_distances)
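
As a check on the transcription, the hand-written density can be compared with `scipy.stats.norm.pdf`. The sketch repeats the function (using `np.pi` for $\pi$) so that it runs on its own; the test values of `mu` and `sigma` are arbitrary.

```python
import numpy as np
import scipy.stats


def normal_pdf(x, mu, sigma):
    normalizer = 1 / (np.sqrt(2 * np.pi) * sigma)
    distance_from_mean = np.square(x - mu)
    scale_of_distances = 2 * sigma ** 2
    return normalizer * np.exp(-distance_from_mean / scale_of_distances)


xs = np.linspace(-4, 6, 101)
mu, sigma = 1.0, 1.5

by_hand = normal_pdf(xs, mu, sigma)
from_scipy = scipy.stats.norm(loc=mu, scale=sigma).pdf(xs)

print(np.allclose(by_hand, from_scipy))
```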

## Let's review our random variables.

- `pm.Binomial`: `N`, `q`. Total number of successes in `N` attempts if each attempt has a chance `q` of success. Examples: number of A's in a semester, number of grants that will be approved.

- `pm.Poisson`: `mu`. Total number of successes when individual chance is low, but there are many attempts. Examples: number of fish (_poisson_) that jump above the water in a second, number of Prussian soldiers killed by horse kicks on a given day. Result of taking `N` very large and `q` very small for a `Binomial`.

- `pm.Exponential`: `lam`. Elapsed time between events for a memoryless process. Time between Prussian soldiers killed by horse kicks. Time between radioactive decay events. If we count the events, we get a `Poisson`.

- `pm.Normal`: `mu`, `sigma`. Anything affected by a large number of uncontrolled, unrelated factors of around the same size. Human heights within a population. Many scientific measurements. `Binomial` with a large enough `N`. `Poisson` with a large `mu`.