# A Quick Primer on Probability and Distributions

This article gives a quick primer on probability and distributions as a lead into the introduction to copulas.

## The audience

This is an article for beginners. If you're an undergrad who took some basic courses in math and statistics, then you're in the right place.

## Definitions

We begin with a short primer on probability and statistics. Not too much, just enough. If you're already comfortable with these ideas, skip ahead to the next section.

### Probability Distribution Function 

A probability function is a mathematical function that gives the probability of each possible outcome of a random event. If the outcomes are discrete, as is the case when rolling a die (the result can only be an integer between 1 and 6 inclusive), it is known as a **probability mass function (pmf)**. If the outcomes are continuous, as is the case when measuring heights in a population, it is known as a **probability density function (pdf)**, and its value at a point is a density rather than a probability. We are mainly concerned with the **pdf** in this article.

Another thing to note is that, in general, a **pdf** is denoted by a lower-case letter. For example, mathematically, we write the Normal (Gaussian) probability density function as

$$
  f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left( \frac{x-\mu}{\sigma} \right)^2}
$$

This is the (in)famous (dreaded) bell curve, which is unfortunately how most students are "graded". If it helps solidify your understanding, it looks like this:

In [1]:
from bokeh.io import output_notebook
from bokeh.models import HoverTool
from bokeh.plotting import figure, show

output_notebook()

In [2]:
import numpy as np
from scipy.stats import norm

def plot_normal_pdf():
    x = np.linspace(-5, 5, 10000)
    y = norm.pdf(x)
    
    h = HoverTool()
    h.tooltips = [
        ("Density", "$y"),
        ("X", "$x")
    ]
    h.mode = 'vline'
    
    p = figure(width=800, height=400, title="Normal Distribution PDF", tools='')
    p.line(x, y, line_color="#ff8888", line_width=4, alpha=0.7)
    p.y_range.start = 0
    p.add_tools(h)
    show(p)



plot_normal_pdf()
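The formula for $f(x)$ above can also be checked numerically. The sketch below is only an illustration: `normal_pdf` is a hypothetical helper (not part of SciPy) that evaluates the formula term by term, and the result is compared against `scipy.stats.norm.pdf` for the standard Normal ($\mu = 0$, $\sigma = 1$).

```python
import numpy as np
from scipy.stats import norm


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate the Normal density directly from the formula (hypothetical helper)."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi))


xs = np.linspace(-5, 5, 11)
manual = normal_pdf(xs)      # hand-written formula
reference = norm.pdf(xs)     # SciPy's standard Normal density

# The largest gap between the two should be at floating-point noise level
max_abs_diff = np.max(np.abs(manual - reference))
print(max_abs_diff)
```

If the two agree, the difference printed is tiny, which suggests the formula and the library call describe the same curve.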

### Cumulative Distribution Function

The cumulative distribution function (**cdf**) gives the probability that the random variable $X$ takes a value less than or equal to $x$. For example, what is the probability that a person's height is less than or equal to 180cm? Another way to think of it is that the **cdf** at $x$ is the area under the **pdf** curve to the left of $x$.

One thing to note is that the **cdf** is denoted by the upper-case version of the letter used for the **pdf**, because it is the integral of the **pdf**. For example, we write the **cdf** of the normal distribution as

$$
F(x) = \frac{1}{\sigma \sqrt{2 \pi}} \int^x_{-\infty} e^{-\frac{1}{2}\left( \frac{t - \mu}{\sigma} \right)^2} dt
$$

Let's use the normal distribution and height as an example. In the town of Rinseln, North Rhine-Westphalia, Germany, the height of males follows a normal distribution with mean 175cm and standard deviation 5cm. What is the probability that a randomly chosen male is shorter than 184cm?

Graphically, we are trying to find the area under the **pdf** curve to the left of 184cm, as seen in the figure below

In [3]:
def plot_aoc_pdf_height():
    x = np.linspace(150, 200, 10000)
    y = norm.pdf(x, 175, 5)
    
    p = figure(width=800, height=400, title="Height PDF", tools='')
    p.line(x, y, line_color="#ff8888", line_width=4, alpha=0.7)
    p.y_range.start = 0

    # shade the region between the x-axis (y1) and the pdf curve (y2) up to 184cm
    mask = x <= 184
    p.varea(x=x[mask], y1=np.zeros_like(y[mask]), y2=y[mask], color="#ff8888")
    show(p)

plot_aoc_pdf_height()

If we were to sum this area up, we'd arrive at the answer. If we plotted the accumulated area under the curve against the height, we would arrive at the following graph, which is the **cdf**.

In [4]:
def plot_cdf_height():
    x = np.linspace(150, 200, 10000)
    y = norm.cdf(x, 175, 5)
    
    p = figure(width=800, height=400, title="Height CDF", tools='')
    p.line(x, y, line_color="#ff8888", line_width=4, alpha=0.7)
    p.y_range.start = 0

    y_ = norm.cdf(184, 175, 5)
    p.line([184, 184], [0, y_])
    p.line([150, 184], [y_, y_])
    show(p)

plot_cdf_height()

Reading off the guide lines at 184cm, we see that the probability is ~0.9641.
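The same number can be obtained without a plot. The sketch below computes it two ways: by evaluating the **cdf** at 184cm, and by numerically integrating the **pdf** with `scipy.integrate.quad`. The lower integration bound of ten standard deviations below the mean is an assumption made here for numerical convenience; the probability mass below it is negligible.

```python
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 175, 5  # mean and standard deviation of height in cm, from the example

# Route 1: evaluate the cdf directly at 184cm
p_cdf = norm.cdf(184, loc=mu, scale=sigma)

# Route 2: integrate the pdf up to 184cm.
# Assumption: start at mu - 10 * sigma instead of -infinity (the mass below is negligible).
lower = mu - 10 * sigma
p_area, abs_err = quad(norm.pdf, lower, 184, args=(mu, sigma))

print(round(p_cdf, 4), round(p_area, 4))
```

Both routes should agree to several decimal places, which is the "area under the **pdf**" interpretation of the **cdf** in action.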

### Quantile functions and Inverse Transforms

The quantile function, also called the percent point function (**ppf**), is essentially the inverse of the **cdf**. Mathematically, if the **cdf** is written as $F(x) = u$, mapping a value $x$ to a probability $u$ between 0 and 1, then the quantile function is written as $F^{-1}(u) = x$, mapping the probability back to the value.
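As a quick illustration of this inverse relationship, the sketch below reuses the height example (mean 175cm, standard deviation 5cm): it maps a height to a probability with `norm.cdf` and then maps that probability back to a height with `norm.ppf`.

```python
from scipy.stats import norm

mu, sigma = 175, 5  # height example: mean and standard deviation in cm

p = norm.cdf(184, loc=mu, scale=sigma)  # height -> probability
x = norm.ppf(p, loc=mu, scale=sigma)    # probability -> height (round trip)

# The 0.5 quantile is the median, which equals the mean for a normal distribution
median = norm.ppf(0.5, loc=mu, scale=sigma)

print(x, median)
```

Up to floating-point error, `x` comes back as 184 and `median` as 175.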

Quantile functions and inverse transforms are possibly the most important concepts for understanding copulas. Basically, given a uniform random variable on $[0, 1]$, we can generate random variables from any other distribution by passing it through that distribution's quantile function. From the snippet below, we see that we can "recover" the normal distribution even though we generated random variates from a uniform distribution.

In [5]:
from scipy.stats import uniform, gaussian_kde


def plot_ppf_normal_graph():
    u = uniform.rvs(0, 1, size=10000)
    x = norm.ppf(u)
    kernel = gaussian_kde(x)
    
    xx = np.linspace(-4, 4, 10000)
    yy = kernel(xx)  # evaluates the estimated density (curve height) at each xx
    
    h = HoverTool()
    h.tooltips = [
        ("Density", "$y"),
        ("X", "$x")
    ]
    h.mode = 'vline'
    
    p = figure(width=800, height=400, tools='',
               title="Normal Distribution PDF derived from random uniform variables")
    p.line(xx, yy, line_color="#ff8888", line_width=4, alpha=0.7)
    p.y_range.start = 0
    p.add_tools(h)
    show(p)


plot_ppf_normal_graph()

Based on the density plot above, we can see that we generated normal random variables from the uniform random variables.
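The same trick works for other distributions. As a minimal sketch, the example below targets an exponential distribution, chosen here only because its quantile function has the simple closed form $F^{-1}(u) = -\ln(1 - u) / \lambda$. The rate `rate = 2.0` and the seed are arbitrary values picked for illustration.

```python
import numpy as np
from scipy.stats import expon

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only
rate = 2.0                           # hypothetical rate parameter (lambda)

u = rng.uniform(0.0, 1.0, size=100_000)  # uniform variates on [0, 1)

# Inverse transform: push the uniforms through the closed-form quantile function
x = -np.log1p(-u) / rate

# SciPy's ppf performs the same mapping (expon is parameterised by scale = 1 / lambda)
x_scipy = expon.ppf(u, scale=1.0 / rate)

print(x.mean())  # sample mean, which should land close to 1 / rate
```

The hand-written transform and `expon.ppf` should produce matching samples, and the sample mean should sit near the theoretical mean $1 / \lambda$.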

### Joint distribution functions

The final thing you need to know before embarking on our copula journey is the concept of joint distributions. Simply put, a joint distribution describes the probability of 2 or more random variables taking particular values at the same time. Let's say we have 2 loaded 6-sided dice where each face comes up with the probability given in the table below.

| Value | Die 1 | Die 2 |
| :---- | :---: | :---: |
| 1     | 0.1   | 0.25  |
| 2     | 0.15  | 0.15  |
| 3     | 0.25  | 0.1   |
| 4     | 0.25  | 0.1   |
| 5     | 0.15  | 0.15  |
| 6     | 0.1   | 0.25  |

We say $f_1(2) = 0.15$ and $f_2(2) = 0.15$. $f_1(x)$ is known as the marginal distribution (**marginal pmf**) of Die 1, and we will usually denote it with a lowercase letter and a subscript ($f_1$). The same goes for Die 2 ($f_2$). The joint probability distribution is then a function that describes the 2 events together.

For example, if we are interested in the probability of each pair of values when rolling the dice together, we would have the joint probability distribution function shown in the table below, where A is the value of Die 1 (rows) and B is the value of Die 2 (columns). The table treats the two dice as independent, so each entry is the product of the two marginal probabilities, for example $0.25 \times 0.1 = 0.025$ for A = 3 and B = 4.

| A / B | 1      | 2      | 3      | 4      | 5      | 6      |
| :---: | :----: | :----: | :----: | :----: | :----: | :----: |
|  1    | 0.0250 | 0.0150 | 0.0100 | 0.0100 | 0.0150 | 0.0250 |
|  2    | 0.0375 | 0.0225 | 0.0150 | 0.0150 | 0.0225 | 0.0375 |
|  3    | 0.0625 | 0.0375 | 0.0250 | 0.0250 | 0.0375 | 0.0625 |
|  4    | 0.0625 | 0.0375 | 0.0250 | 0.0250 | 0.0375 | 0.0625 |
|  5    | 0.0375 | 0.0225 | 0.0150 | 0.0150 | 0.0225 | 0.0375 |
|  6    | 0.0250 | 0.0150 | 0.0100 | 0.0100 | 0.0150 | 0.0250 |

So we see that the chance of rolling 3 for A and 4 for B is 0.025.

Alternatively, if we wanted to know the total value T of the 2 dice given that we roll Die A first and already know its value, the probability distribution would look like this, with one row per value of A
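The joint table can be reproduced in a few lines. The sketch below assumes the two dice are independent, which is consistent with the table's entries (each one is the product of the two marginals), and builds the joint **pmf** as an outer product with NumPy. The variable names are illustrative.

```python
import numpy as np

# Marginal pmfs of the two loaded dice, taken from the first table
die_1 = np.array([0.10, 0.15, 0.25, 0.25, 0.15, 0.10])
die_2 = np.array([0.25, 0.15, 0.10, 0.10, 0.15, 0.25])

# Assuming independence: joint[i, j] = P(A = i + 1) * P(B = j + 1)
joint = np.outer(die_1, die_2)

p_3_and_4 = joint[2, 3]          # P(A = 3, B = 4), zero-based indexing
row_sums = joint.sum(axis=1)     # summing over B recovers the marginal of A
col_sums = joint.sum(axis=0)     # summing over A recovers the marginal of B

print(p_3_and_4)
print(row_sums, col_sums)
```

Summing the joint table along a row or a column gives back the corresponding marginal, which is where the name "marginal distribution" comes from.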

| A / T | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10   | 11   | 12   |
| :---: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| 1     | 0.25 | 0.15 | 0.1  | 0.1  | 0.15 | 0.25 | 0    | 0    | 0    | 0    | 0    |
| 2     | 0    | 0.25 | 0.15 | 0.1  | 0.1  | 0.15 | 0.25 | 0    | 0    | 0    | 0    |
| 3     | 0    | 0    | 0.25 | 0.15 | 0.1  | 0.1  | 0.15 | 0.25 | 0    | 0    | 0    |
| 4     | 0    | 0    | 0    | 0.25 | 0.15 | 0.1  | 0.1  | 0.15 | 0.25 | 0    | 0    |
| 5     | 0    | 0    | 0    | 0    | 0.25 | 0.15 | 0.1  | 0.1  | 0.15 | 0.25 | 0    |
| 6     | 0    | 0    | 0    | 0    | 0    | 0.25 | 0.15 | 0.1  | 0.1  | 0.15 | 0.25 |

So once Die A's value is known (cast in stone) after the roll, the distribution of the total is simply the marginal distribution of Die B shifted by A's value.

If we were talking about continuous variables, then instead of tables we would have a joint **pdf**, which is a surface over the plane for 2 variables, and probabilities would be volumes under that surface over the region of interest.
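The table of totals can be built the same way. The sketch below is an illustration under the same independence assumption: each row of `cond` holds the distribution of the total T given A, and weighting the rows by the marginal of A gives the overall distribution of T. The names `cond` and `total_pmf` are illustrative.

```python
import numpy as np

die_1 = np.array([0.10, 0.15, 0.25, 0.25, 0.15, 0.10])  # marginal pmf of Die A
die_2 = np.array([0.25, 0.15, 0.10, 0.10, 0.15, 0.25])  # marginal pmf of Die B
totals = np.arange(2, 13)                                # possible totals 2..12

# cond[a - 1, t - 2] = P(T = t | A = a): Die B's pmf shifted right by a
cond = np.zeros((6, totals.size))
for a in range(1, 7):
    for b in range(1, 7):
        cond[a - 1, (a + b) - 2] = die_2[b - 1]

# Weight each row by P(A = a) and add them up to get the unconditional pmf of T
total_pmf = die_1 @ cond

print(cond[0])
print(total_pmf)
```

Each row of `cond` sums to 1, and `total_pmf` matches what `np.convolve(die_1, die_2)` would return, since the sum of two independent discrete variables has a distribution given by the convolution of their **pmf**s.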

The joint distribution concept is important because a copula is essentially a way of combining individual marginal distributions into a joint distribution. It is usually easy to measure univariate data (for example, the risk characteristics of a bond like the 10-year Treasury, or Microsoft's stock price), but how do we measure their relationship together? As an example, if you only held 10-year Treasuries and Microsoft stock, how would you determine the chances of your portfolio losing more than 30% in a single month? You could certainly say that the 10-year Treasury's returns follow a normal distribution and Microsoft's returns follow a Student's t distribution. However, how would you combine them to form their joint distribution?

Using a copula is but one of the myriad ways to do so.