# Measures of Distance Between Distributions

## The questions

1. We have a master distribution, e.g. the daily values of a stationary time series over the last two years. We get a new time series for the most recent quarter.<br>**How similar is the distribution of the new values to that of the past values?**<br>**Are the number and nature of the outliers suspicious?**

2. We know that the _shapes_ of certain time series (e.g. imports) are very similar to each other. We get the current update on those.<br>**Is there any "outlier" regarding the number of occurrences of values?**

## "The" Answer: Kullback-Leibler Divergence

[Wikipedia](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence):
> Is a measure of how one probability distribution is different from a second, reference probability distribution.

A KL divergence of 0 means that the two distributions are identical. The more dissimilar they are, the higher the value. It can also be used to measure the randomness of a time series.
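
For reference, this is the standard definition for two discrete distributions over the same support. The symbols $P$ and $Q$ are chosen here for illustration and correspond to the arrays `p` and `q` in the code cells:

$$
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_i P(i)\,\log\frac{P(i)}{Q(i)}
$$

`scipy.special.rel_entr(p, q)` returns the individual terms of this sum elementwise, using the natural logarithm, so summing them gives the divergence in nats.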

In [None]:
import numpy as np
from scipy.special import rel_entr

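def plot(p, q):
    # Plot both curves over the global grid `x` (defined in a later cell)
    plt.plot(x, p)
    plt.plot(x, q)
    # Shade the area between the curves: first where q is above p, then where q is below p
    plt.fill_between(x, p, q, where=q>=p, interpolate=True)
    plt.fill_between(x, p, q, where=q<=p, interpolate=True)
    plt.show()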

In [None]:
import matplotlib.pyplot as plt
from scipy import stats
x = np.arange(0, 20, 0.001)

p = stats.norm.pdf(x,10,2)
q = stats.norm.pdf(x,11,2)

plot(p,q)

In [None]:
# Normalize the sampled densities so that each sums to 1 on the grid
p = p/p.sum()
q = q/q.sum()

In [None]:
rel_entr(p, q).sum()
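
As a sanity check, the grid-based value can be compared with the known closed-form KL divergence between two univariate Gaussians. This is a minimal sketch: the names `mu_p`, `sigma_p`, `mu_q`, `sigma_q`, `kl_numeric` and `kl_closed_form` are introduced here for illustration, and the grid and parameters are copied from the cells above.

```python
import numpy as np
from scipy import stats
from scipy.special import rel_entr

# Same grid and parameters as in the cells above
x = np.arange(0, 20, 0.001)
mu_p, sigma_p = 10, 2
mu_q, sigma_q = 11, 2

p = stats.norm.pdf(x, mu_p, sigma_p)
q = stats.norm.pdf(x, mu_q, sigma_q)
p = p / p.sum()
q = q / q.sum()

# KL divergence of the discretized distributions on the grid
kl_numeric = rel_entr(p, q).sum()

# Closed-form KL divergence between two univariate Gaussians
kl_closed_form = (
    np.log(sigma_q / sigma_p)
    + (sigma_p**2 + (mu_p - mu_q)**2) / (2 * sigma_q**2)
    - 0.5
)

print(kl_numeric, kl_closed_form)
```

For these parameters the closed form evaluates to 0.125. The grid-based value should be very close to it, with small deviations coming from truncating the grid to the interval from 0 to 20.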

Let's make the shapes diverge a little bit:

In [None]:
q = stats.norm.pdf(x,13,5)
q = q / q.sum()

plot(p,q)

Observe that the KL divergence rises, but only to a value that is still low, since the shapes are still similar:

In [None]:
rel_entr(p, q).sum()

### What about an exponential distribution?

In [None]:
q = stats.expon.pdf(x)
q = q/q.sum()
plot(p,q)

Now the divergence rises significantly:

In [None]:
rel_entr(p, q).sum()

## KL Divergence is not symmetrical!

In [None]:
rel_entr(p, q).sum()

In [None]:
rel_entr(q, p).sum()

## A symmetrical extension: Jensen-Shannon Divergence

In [None]:
from scipy.spatial.distance import jensenshannon

In [None]:
jensenshannon(p, q)

In [None]:
jensenshannon(q, p)
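
Note that `scipy.spatial.distance.jensenshannon` returns the Jensen-Shannon *distance*, which is the square root of the Jensen-Shannon divergence. The following sketch rebuilds it from `rel_entr` to show how it relates to the KL divergence. The helper name `js_distance` is hypothetical and introduced only for this illustration. The distributions are the Gaussian and the exponential from the cells above.

```python
import numpy as np
from scipy import stats
from scipy.special import rel_entr
from scipy.spatial.distance import jensenshannon

# Same grid and distributions as in the cells above
x = np.arange(0, 20, 0.001)
p = stats.norm.pdf(x, 10, 2)
q = stats.expon.pdf(x)
p = p / p.sum()
q = q / q.sum()

def js_distance(p, q):
    # Mixture distribution halfway between p and q
    m = 0.5 * (p + q)
    # Jensen-Shannon divergence: average KL divergence of p and q to the mixture
    js_div = 0.5 * rel_entr(p, m).sum() + 0.5 * rel_entr(q, m).sum()
    # The distance is the square root of the divergence
    return np.sqrt(js_div)

print(js_distance(p, q), jensenshannon(p, q), jensenshannon(q, p))
```

All three printed values should agree, since swapping `p` and `q` leaves the mixture `m` unchanged. With the natural logarithm, the divergence is bounded above by `log(2)`, which makes the result easier to interpret than the unbounded KL divergence.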