# Distributions

In [1]:
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [3, 3]
import numpy as np

All distributions share the following common attributes:
* They have a notation $X \sim \text{Name(statistical parameters)}$
* They have a function $f(x|\text{sp})$ which takes you from a value $x$ to the probability of that value, i.e. $P(X=x)$ (for continuous distributions it gives a probability density rather than a probability)
* They have an expected value $E[X]$, aka **mean**. It's the weighted average of all values $X$ can take, with each value weighted by its probability.
* They have a variance $\text{Var}[X]$. This is a measure of how spread out the values are. In words, it is the expected squared distance of $X$ from the mean $\mu = E[X]$, or $E[(X-\mu)^2]$.

$X$ here is a **random variable**, and $x$ denotes a particular value that $X$ can take.

Put another way, $X$ is an event or measurement whose outcome or value we aren't certain of.

So you can say the probability that $X$ is a particular $x$ is based on the probability function of $X$ and the various statistical parameters which describe the distribution.

$$P(X=x|\text{sp}) = f(x|\text{sp})$$
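As a small illustration of these attributes, the sketch below uses a fair six-sided die, which is an assumed example and not one taken from the text. The names `values` and `probs` are chosen here for illustration; the mean and variance are computed straight from the definitions in the list above.

```python
import numpy as np

# Assumed example: a fair six-sided die.
values = np.arange(1, 7)      # the possible values x: 1, 2, ..., 6
probs = np.full(6, 1 / 6)     # f(x) = P(X = x), the same for every face

# E[X]: each value weighted by its probability.
mean = np.sum(values * probs)

# Var[X] = E[(X - mu)^2]: expected squared distance from the mean.
variance = np.sum((values - mean) ** 2 * probs)

print(mean, variance)
```

The probabilities sum to 1, the mean comes out as 3.5 and the variance as 35/12.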

## Mathematical vs model approximation distributions

You can think of a distribution in two ways. First, there are the mathematical distributions. These are like platonic ideals, such as the straight line or the perfect circle: they exist only in our heads, not in the real world.

Second, when we bring things into the real world we have to deal in approximations: a circle drawn on a piece of paper only approximates the platonic form of a circle. Likewise, a random event in the real world can be similar to one of the mathematical distributions, which then serves as a model for it.
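One way to see the gap between the ideal and the approximation is to simulate it. The sketch below assumes a fair die as the ideal distribution and uses simulated rolls as a stand-in for real-world observations; the seed and the number of rolls are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # arbitrary seed, for reproducibility
n_rolls = 60_000                       # arbitrary sample size

# "Real world" stand-in: simulated rolls of a die (integers 1 to 6).
rolls = rng.integers(1, 7, size=n_rolls)

# Observed relative frequency of each face.
observed = np.bincount(rolls, minlength=7)[1:] / n_rolls

# Ideal mathematical distribution: every face has probability 1/6.
ideal = np.full(6, 1 / 6)

print(np.round(observed, 4))
print(np.round(ideal, 4))
```

The observed frequencies are close to the ideal probabilities but never match them exactly, which is the sense in which the mathematical distribution is only a model of the data.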

## Discrete and Continuous Distributions
Distributions can be divided into discrete or continuous distributions. This describes the nature of the input $X$. 

If the input is a count, such as the number of successes or the number of occurrences of an event, or more generally an outcome restricted to a set of integers, then it might be approximated with a discrete distribution. Discrete distributions include:

* Bernoulli (for two possible outcomes of a particular event, success or failure)
* Binomial (number of successes in n repeated Bernoulli trials)
* Geometric (number of Bernoulli trials needed until you get the first success)
* Multinomial (like a binomial where there are more than two outcomes)
* Poisson (use it when you are counting the occurrence of events, either in time or space)
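To get a feel for the discrete distributions listed above, you can draw samples from each of them with NumPy's random `Generator`. This is a minimal sketch; the parameter values (`p=0.3`, `n=10`, `lam=4.0`, the `pvals` list) and the seed are arbitrary assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed, for reproducibility
size = 10_000

# Bernoulli: a binomial with a single trial, so each draw is 0 or 1.
bernoulli = rng.binomial(n=1, p=0.3, size=size)

# Binomial: number of successes in 10 Bernoulli trials.
binomial = rng.binomial(n=10, p=0.3, size=size)

# Geometric: number of trials up to and including the first success (>= 1).
geometric = rng.geometric(p=0.3, size=size)

# Multinomial: counts for each of three outcomes over 10 trials.
multinomial = rng.multinomial(n=10, pvals=[0.2, 0.3, 0.5], size=size)

# Poisson: number of events in an interval with an average of 4 events.
poisson = rng.poisson(lam=4.0, size=size)

print(bernoulli.mean(), binomial.mean(), geometric.mean(), poisson.mean())
```

Every sample is an integer, which is what makes these distributions discrete.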

If the input is continuous (can be anything on a real number scale: heights etc.) then a continuous distribution is usually a better approximation. Note that the *result* of the probability function isn't necessarily continuous: see uniform distribution below.

An important difference when calculating probabilities with continuous distributions is that you do it *between two values*, i.e. over an interval: the probability that $x$ is any one specific real number is 0. Continuous distributions include:

* Uniform - the probability is evenly distributed over an interval (i.e. is equally likely to take any value in that interval) but is nil outside it.
* Exponential - describes the waiting time between events occurring at a given rate.
* Normal / Gaussian
* T Distribution
* Beta 
* Gamma
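The uniform distribution is the simplest of these to write down by hand. The sketch below assumes an interval from 0 to 2 and uses hypothetical helper names (`uniform_pdf`, `uniform_interval_prob`); it shows the density being constant inside the interval and zero outside, and the probability of a single point being 0.

```python
import numpy as np

def uniform_pdf(x, low=0.0, high=2.0):
    """Density of Uniform(low, high): constant inside the interval, zero outside."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= low) & (x <= high), 1.0 / (high - low), 0.0)

def uniform_interval_prob(a, b, low=0.0, high=2.0):
    """P(a <= X <= b): length of the overlap of [a, b] with [low, high], rescaled."""
    overlap = max(0.0, min(b, high) - max(a, low))
    return overlap / (high - low)

print(uniform_pdf([-1.0, 0.5, 1.5, 3.0]))   # zero outside [0, 2], 0.5 inside
print(uniform_interval_prob(0.5, 1.5))      # half of the interval
print(uniform_interval_prob(1.0, 1.0))      # a single point has probability 0
```

Note that the density jumps from 0 to 0.5 at the edges of the interval, so the function itself is not continuous even though the input is.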

## Probability Density or Mass Function (PDF/PMF)

Each distribution has (or more accurately is defined by) a function which maps each possible observation $x$ of an uncertain variable $X$ to the probability of getting that observation. It is a function of $x$ and the statistical parameters of the distribution.

$$P(X=x) = f(x \mid \text{statistical parameters})$$

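For discrete distributions, where $x$ takes integer values, this function is referred to as the Probability Mass Function (PMF), and for continuous distributions as the Probability Density Function (PDF).

For a discrete distribution, evaluating the PMF at $x$ gives $P(X=x)$ directly. For a continuous distribution, the PDF gives a density rather than a probability: probabilities come from the area under it over an interval.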
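As a concrete sketch, here are the standard formulas for one PMF (binomial) and one PDF (normal) written as plain Python functions. The function names and default parameters are choices made for this example; each function takes $x$ plus the statistical parameters, matching $f(x \mid \text{statistical parameters})$.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density at x for X ~ Normal(mu, sigma^2). This is a density, not a probability."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

# Probability of exactly 2 heads in 4 fair coin flips.
print(binomial_pmf(2, n=4, p=0.5))

# Height of the standard normal curve at its peak.
print(normal_pdf(0.0))
```

The PMF values over all possible `k` sum to 1, whereas the PDF value at a point only becomes a probability once it is integrated over an interval.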

## Cumulative Distribution Function

If the PMF is the probability that $X=x$, the CDF $F(x)$ is the probability that $X$ is at most $x$, i.e. $F(x) = P(X \le x)$. The difference between the CDF at two values gives the probability that $X$ falls between them.

For discrete distributions this is simple to conceptualise: it's the sum of the probabilities of each value in the range you are looking at:

$$P(a \le X \le b) = F(b) - F(a-1) = \sum_{i=a}^{b} f(i)$$

For continuous distributions it's a bit trickier. The best way to think about it is to picture the PDF as a graph of $x$ against $f(x)$. The probability that $X$ falls between two values is the area under the PDF curve between those two points, or in other words, the integral:

$$P(a \le X \le b) = F(b) - F(a) = \int_{a}^{b} f(x) \, dx$$

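The probability that $X$ takes some value in its full range is always 1. For a discrete distribution whose possible values are $1, \dots, n$:

$$P(1 \le X \le n) = \sum_{i=1}^{n} f(i) = 1$$

and for a continuous distribution:

$$P(-\infty < X < \infty) = \int_{-\infty}^{\infty} f(x) \, dx = 1$$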
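Both cases can be checked numerically. The sketch below assumes a Binomial(10, 0.3) for the discrete case and an exponential distribution with rate 2 for the continuous case; the helper names and the simple trapezoidal-rule `integrate` are written for this example rather than taken from a library.

```python
from math import comb, exp

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_interval_prob(a, b, n, p):
    """P(a <= X <= b): sum the PMF over every integer in the range."""
    return sum(binomial_pmf(i, n, p) for i in range(a, b + 1))

def exponential_pdf(x, rate):
    """Density of an exponential distribution with the given rate, for x >= 0."""
    return rate * exp(-rate * x)

def integrate(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Discrete: probability of 2, 3 or 4 successes, and of any outcome at all.
discrete_prob = binomial_interval_prob(2, 4, n=10, p=0.3)
total_discrete = binomial_interval_prob(0, 10, n=10, p=0.3)

# Continuous: area under the PDF between 0.5 and 2.0.
continuous_prob = integrate(lambda x: exponential_pdf(x, rate=2.0), 0.5, 2.0)

# Closed form for comparison: the exponential CDF is 1 - exp(-rate * x).
exact_continuous = exp(-2.0 * 0.5) - exp(-2.0 * 2.0)

print(discrete_prob, total_discrete)
print(continuous_prob, exact_continuous)
```

The numerical integral agrees with the closed-form difference of CDF values, and summing the PMF over every possible value gives 1.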

## Expectation

The expected value, $E[X]$, or **mean**, of a distribution is the probability-weighted average. For a discrete distribution you take all the possible values $x_i$ of $X$, multiply each by its probability $P(X = x_i)$, and sum up the results.

$$E[X] = \sum_{i=1}^n x_i P(X = x_i) = \sum_{i=1}^n x_i f(x_i)$$

For a continuous distribution the sum becomes an integral over the density:

$$E[X] = \int_{-\infty}^\infty x f(x) \, dx$$
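The two formulas can be evaluated numerically as a sanity check. This sketch assumes a Binomial(10, 0.3) for the sum and an exponential distribution with rate 2 for the integral, whose known means are $np = 3$ and $1/\text{rate} = 0.5$. The infinite integral is truncated at an arbitrary upper limit of 40, where the density is negligible.

```python
import numpy as np
from math import comb

# Discrete case: E[X] = sum of x * f(x) over all possible values.
n, p = 10, 0.3
k = np.arange(n + 1)
pmf = np.array([comb(n, int(i)) * p ** int(i) * (1 - p) ** (n - int(i)) for i in k])
discrete_mean = np.sum(k * pmf)

# Continuous case: E[X] = integral of x * f(x) dx, approximated on a fine grid.
rate = 2.0
x = np.linspace(0.0, 40.0, 400_001)        # upper limit truncates the integral
integrand = x * rate * np.exp(-rate * x)   # x * f(x) for the exponential PDF
# Trapezoidal rule: average of neighbouring heights times the grid spacing.
continuous_mean = np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(x))

print(discrete_mean)     # close to n * p
print(continuous_mean)   # close to 1 / rate
```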