# Basic Distributions

### A. Taylan Cemgil
### Boğaziçi University, Dept. of Computer Engineering


### Notebook Summary
* We review the notation and parametrization of densities of some basic distributions that are often encountered
* We show how random numbers are generated using Python libraries
* We show some basic visualization methods such as displaying histograms 

# Sampling From Basic Distributions

Sampling from basic distributions is easy using the NumPy library.
 
Formally we will write

$x \sim p(X|\theta)$

where $\theta$ is the _parameter vector_, $p(X| \theta)$ denotes the _density_ of the random variable $X$ and $x$ is a _realization_, a particular draw from the density $p$.

The following distributions are building blocks from which more complicated processes may be constructed. It is important to have a basic understanding of these distributions.


### Continuous Univariate 
* Uniform $\mathcal{U}$
* Univariate Gaussian $\mathcal{N}$
* Gamma $\mathcal{G}$
* Inverse Gamma $\mathcal{IG}$
* Beta $\mathcal{B}$

### Discrete 
* Poisson $\mathcal{P}$
* Bernoulli $\mathcal{BE}$
* Binomial $\mathcal{BI}$
* Categorical $\mathcal{M}$
* Multinomial $\mathcal{M}$

### Continuous Multivariate (todo)
* Multivariate Gaussian $\mathcal{N}$
* Dirichlet $\mathcal{D}$

### Continuous Matrix-variate (todo)
* Wishart $\mathcal{W}$
* Inverse Wishart $\mathcal{IW}$
* Matrix Gaussian $\mathcal{N}$


## Sampling from the standard uniform $\mathcal{U}(0,1)$ 

For generating a single random number in the interval $[0, 1)$ we use the notation
$$
x_1 \sim \mathcal{U}(x; 0,1)
$$

In Python, this is implemented as

In [None]:
import numpy as np

x_1 = np.random.rand()

print(x_1)

We can also generate an array of realizations $x_i$ for $i=1 \dots N$,
$$
x_i \sim \mathcal{U}(x; 0,1)
$$

In [None]:
import numpy as np

N = 5
x = np.random.rand(N)

print(x)


For large $N$, it is more informative to display a histogram of the generated data:

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

# Number of realizations
N = 50000
x = np.random.rand(N)

plt.hist(x, bins=20)
plt.xlabel('x')
plt.ylabel('Count')
plt.show()

$\newcommand{\indi}[1]{\left[{#1}\right]}$
$\newcommand{\E}[1]{\left\langle{#1}\right\rangle}$


We know that the density of the uniform distribution $\mathcal{U}(0,1)$ is

$$
\mathcal{U}(x; 0,1) = \left\{ \begin{array}{cc}  1 & 0 \leq x < 1 \\ 0 & \text{otherwise} \end{array}  \right.
$$
or using the indicator notation
$$
\mathcal{U}(x; 0,1) = \left[ x \in [0,1) \right]
$$

#### Indicator function

To write and manipulate discrete probability distributions in algebraic expressions, the *indicator* function is useful:

$$ \left[x\right] = \left\{ \begin{array}{cc}
                                  1 & x\;\;\text{is true} \\
                                  0 & x\;\;\text{is false}
                                \end{array}
 \right.$$
This notation is also known as Iverson's convention.
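
As a small illustration, the indicator $\left[ x \in [0,1) \right]$ can be sketched in NumPy as a boolean comparison converted to 0/1 values. The test points in `x` below are arbitrary and chosen only for this example.

```python
import numpy as np

# Arbitrary test points, some inside and some outside [0, 1)
x = np.array([-0.5, 0.0, 0.3, 0.99, 1.0, 2.0])

# [x in [0, 1)]: the logical condition is True/False, cast to 1/0
indicator = ((0 <= x) & (x < 1)).astype(float)

print(indicator)
```

Evaluated this way, the indicator coincides with the density $\mathcal{U}(x; 0,1)$ at each test point.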

#### Aside: How to plot the density and the histogram onto the same plot? 

In one dimension, the histogram is simply the count of the data points that fall into a given interval. Mathematically, we have
$j = 1\dots J$ intervals where $B_j = [b_{j-1}, b_j)$ and $b_j$ are bin boundaries such that $b_0 < b_1 < \dots < b_J$. The intervals are taken half-open so that each point belongs to exactly one bin.
$$
h(x) =  \sum_{j=1}^J \sum_{i=1}^N \indi{x \in B_j} \indi{x_i \in B_j}
$$
This expression looks somewhat more complicated at first sight than it really is. The indicator product just encodes the logical condition $x \in B_j$ __and__ $x_i \in B_j$. The sum over $j$ is just a convenient way of writing the result instead of specifying the histogram on a case-by-case basis for each bin. It is important to get used to such nested sums.

When the density $p(x)$ is given, the probability that a single realization is in bin $B_j$ is given by
$$
\Pr\left\{x \in B_j\right\} = \int_{B_j} dx p(x) = \int_{-\infty}^{\infty} dx \indi{x\in B_j} p(x) = \E{\indi{x\in B_j}}
$$
In other words, the probability is just the expectation of the indicator.
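
This identity suggests a simple numerical check: averaging the indicator over many draws should approximate the bin probability. The following is a minimal sketch for $\mathcal{U}(0,1)$; the bin limits `a` and `b` and the seed are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

N = 100000
x = rng.random(N)               # draws from U(0, 1)

# Hypothetical bin B = [a, b)
a, b = 0.2, 0.5

# The sample mean of the indicator [x_i in B] estimates Pr{x in B}
in_bin = (a <= x) & (x < b)
estimate = in_bin.mean()

print(estimate)                 # should be close to b - a = 0.3
```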

Since the first indicator does not depend on $i$, the histogram can be rewritten as
$$
h(x) =  \sum_{j=1}^J \indi{x \in B_j} \sum_{i=1}^N \indi{x_i \in B_j}
$$

We define the counts at each bin as 
$$
c_j \equiv \sum_{i=1}^N \indi{x_i \in B_j}
$$
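
The counts can be computed directly from this definition. The sketch below does so for uniform samples, treating each bin as half-open, $[b_{j-1}, b_j)$, and compares the result with `np.histogram`; the values of `N`, `J` and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

N, J = 1000, 10
x = rng.random(N)                   # draws from U(0, 1)
b = np.linspace(0.0, 1.0, J + 1)    # bin boundaries b_0, ..., b_J

# c_j = sum_i [x_i in B_j], with B_j = [b_{j-1}, b_j)
c = np.array([np.sum((b[j - 1] <= x) & (x < b[j])) for j in range(1, J + 1)])

# The same counts from NumPy's histogram routine
c_np, _ = np.histogram(x, bins=b)

print(c)
print(np.array_equal(c, c_np))
```

Note that `np.histogram` closes the last bin on the right; this makes no difference here because the samples lie in $[0, 1)$.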

If all bins have the same width, i.e., $b_j - b_{j-1} = \Delta$ for all $j$, and if $\Delta$ is sufficiently small, we have

$$
\E{\indi{x\in B_j}} \approx p(b_{j-1}+\Delta/2) \Delta
$$

i.e., the probability is roughly the interval width times the density evaluated at the middle point of the bin. The expected value of the counts is

$$
\E{c_j} = \sum_{i=1}^N \E{\indi{x_i \in B_j}} \approx N \Delta p(b_{j-1}+\Delta/2) 
$$

Hence, the density should be roughly 

$$
p(b_{j-1}+\Delta/2) \approx \frac{\E{c_j} }{N \Delta}
$$

The $N$ term is intuitive but the $\Delta$ term is easily forgotten. When plotting a histogram on top of the corresponding density, we should scale the normalized histogram ${ c_j }/{N}$ by dividing by $\Delta$.
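
A minimal sketch of this scaling for samples from $\mathcal{U}(0,1)$ is given below. The number of bins `J`, the seed, and the plotting style are arbitrary choices made for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

N = 50000   # number of realizations
J = 20      # number of equal-width bins on [0, 1)
x = rng.random(N)

# Counts c_j and bin boundaries b_0 < b_1 < ... < b_J
c, b = np.histogram(x, bins=J, range=(0.0, 1.0))
Delta = b[1] - b[0]             # common bin width
centers = b[:-1] + Delta / 2    # bin midpoints b_{j-1} + Delta/2

# Scaled histogram: c_j / (N * Delta) approximates the density at the midpoints
p_hat = c / (N * Delta)

plt.bar(centers, p_hat, width=Delta, alpha=0.5, label='scaled histogram')
plt.plot([0, 1], [1, 1], 'r', label='density of U(0,1)')
plt.xlabel('x')
plt.ylabel('Density')
plt.legend()
plt.show()
```

Passing `density=True` to `plt.hist` or `np.histogram` applies the same normalization by $N \Delta$, so the manual computation here is mainly useful for seeing where each factor enters.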