# Empirical Distribution
The value of the empirical distribution function at a point is the proportion of observations in the sample that are less than or equal to that point.
## Formal Definition:
If $\xi_n = [x_1, x_2, x_3, ..., x_n]$ is a sample of size $n$, then the empirical distribution function of the sample $\xi_n$ is given by

$$\mathbb{F}_n(x)=\frac{1}{n} \sum_{i=1} ^{n} 1_{x_i \leq x}$$

where $1_{x_i \leq x}$ is the indicator function, which takes the value $1$ if $x_i \leq x$ and $0$ otherwise.
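
As a small worked instance of the formula (the sample values here are hypothetical, chosen only for illustration), take $\xi_4 = [1, 2, 2, 5]$ and evaluate at $x = 2$:

$$\mathbb{F}_4(2)=\frac{1}{4}\left(1_{1 \leq 2}+1_{2 \leq 2}+1_{2 \leq 2}+1_{5 \leq 2}\right)=\frac{1+1+1+0}{4}=\frac{3}{4}$$

The repeated observation $2$ is counted once for each time it appears in the sample.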

### Note:
1) The empirical distribution function is the distribution function of a discrete random variable, namely one that takes each of the sample values $x_1, ..., x_n$ with probability $\frac{1}{n}$.
The distribution function of such a variable is constant between sample points and jumps by $\frac{1}{n}$ at each sample point (by $\frac{k}{n}$ where $k$ observations coincide), which is exactly how the empirical distribution function behaves.

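2) The empirical distribution function can itself be treated as a random variable, by replacing the observed values $x_i$ with the random variables $X_i$ that generated them. Under the hypothesis that all the random variables $X_1,...,X_n$ have the same distribution function $\mathbb{F}_X (x)$, each indicator $1_{X_i \leq x}$ is a Bernoulli variable with success probability $\mathbb{F}_X (x)$, so the expected value of $\mathbb{F}_n (x)$ is $\mathbb{F}_X (x)$.
Furthermore, if $X_1, X_2, ..., X_n$ are mutually independent, then $\mathrm{Var}(\mathbb{F}_n (x))=\frac{1}{n} \mathbb{F}_X (x) [1-\mathbb{F}_X (x)]$.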

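3) As a consequence of Note 2, for every fixed $x$ the empirical distribution function $\mathbb{F}_n (x)$ converges in mean square to the true distribution function $\mathbb{F}_X (x)$ as $n \to \infty$: the estimator is unbiased, so its mean squared error equals its variance, which is at most $\frac{1}{4n}$.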
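
The mean and variance stated in the notes above can be checked numerically. The following is a minimal simulation sketch, not a proof. It assumes independent standard normal observations evaluated at $x = 0$, where the true value is $\mathbb{F}_X(0) = 0.5$. The seed, the number of replicates `n_reps`, and the sample sizes are arbitrary choices made for illustration.

```python
import numpy as np

# Assumption: fixed seed so the run is reproducible; the value is arbitrary.
rng = np.random.default_rng(0)

x = 0.0        # evaluation point
true_F = 0.5   # F_X(0) for the standard normal distribution
n_reps = 2000  # number of independent samples drawn per sample size

results = {}
for n in (10, 100, 1000):
    # Each row is one sample of size n drawn from N(0, 1).
    samples = rng.normal(0.0, 1.0, size=(n_reps, n))
    # F_n(x) for each replicate: the fraction of the row that is <= x.
    estimates = (samples <= x).mean(axis=1)
    theoretical_var = true_F * (1 - true_F) / n
    results[n] = (estimates.mean(), estimates.var(), theoretical_var)
    print(n, results[n])
```

For each sample size, the first number printed should be close to $0.5$, and the second should be close to the third and shrink roughly like $\frac{1}{n}$, which is the mean-square convergence described in Note 3.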

Next we show an easy way to visualize the empirical distribution function.

In [4]:
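import numpy as np
# Draw 20 observations from the standard normal distribution (mean 0, standard deviation 1).
dataset1 = np.random.normal(0, 1, 20)
# Sort in place so the sample is easier to read.
dataset1.sort()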

In [11]:
dataset1

array([-0.98534689, -0.81616941, -0.61393596, -0.57443876, -0.51972836,
       -0.43983862, -0.40597508, -0.27891672, -0.21495022, -0.06602983,
       -0.0176446 ,  0.08299753,  0.20757269,  0.23551045,  0.3417133 ,
        0.84613348,  0.89003914,  1.18655989,  1.68436845,  1.95998853])

In [15]:
def empirical_distribution(x, dataset):
    """Return the proportion of observations in dataset that are <= x."""
    # Count the observations that do not exceed x.
    counter = sum(1 for value in dataset if value <= x)
    return counter / len(dataset)

In [17]:
empirical_distribution(-0.5, dataset1)

0.25
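
When the function has to be evaluated at many points, for example to trace out the step shape, a vectorized version may be more convenient than the loop. The sketch below is one possible approach and assumes the sample is already sorted in ascending order. The name `ecdf_vectorized` and the evaluation grid are hypothetical, and the sample values are copied from the output above so that the block is self-contained.

```python
import numpy as np

# The sorted sample printed earlier, copied here so the block runs on its own.
dataset1 = np.array([
    -0.98534689, -0.81616941, -0.61393596, -0.57443876, -0.51972836,
    -0.43983862, -0.40597508, -0.27891672, -0.21495022, -0.06602983,
    -0.0176446,   0.08299753,  0.20757269,  0.23551045,  0.3417133,
     0.84613348,  0.89003914,  1.18655989,  1.68436845,  1.95998853,
])

def ecdf_vectorized(points, sorted_sample):
    """Evaluate the empirical distribution function at each entry of points.

    Assumes sorted_sample is sorted in ascending order.
    """
    # side="right" returns, for each point, the number of observations <= point.
    counts = np.searchsorted(sorted_sample, points, side="right")
    return counts / len(sorted_sample)

grid = np.array([-1.0, -0.5, 0.0, 0.5, 2.0])  # illustrative evaluation points
values = ecdf_vectorized(grid, dataset1)
print(values)
```

At $-0.5$ this agrees with the value $0.25$ returned by the loop-based function. The values never decrease along the grid and reach $1$ once the point exceeds the largest observation.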