# `qp` Demo

In this notebook we use the `qp` module to approximate some standard 1-D PDFs using sets of quantiles, and assess the accuracy of the quantile parametrization(TM) in comparison to other popular parametrizations.

### Requirements

To run `qp`, you will need to first install the module. 

In [None]:
import numpy as np
import scipy.stats as sps
import scipy.interpolate as spi

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
%matplotlib inline

import qp

## The `qp.PDF` Class

The basic element of `qp` is the `PDF` class, an object representing a probability density function. The class is defined in the module `pdf.py`.

In [None]:
# ! cat qp/pdf.py

## Approximating a Gaussian

Let's summon a PDF object, and initialize it with a standard function - a Gaussian.

In [None]:
dist = sps.norm(loc=0, scale=1)
P = qp.PDF(truth=dist)

Let's sample the PDF to see how it looks.  When we plot the `PDF` object, both the true and sampled distributions are displayed.

In [None]:
samples = P.sample(1000, using='truth', vb=False)
S = qp.PDF(samples=samples)
P.plot()
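
As a rough illustration of what a sample-based representation involves, the sketch below draws samples with `scipy.stats` directly and smooths them with a Gaussian kernel density estimate. It does not use `qp`; the choice of `gaussian_kde` and the fixed seed are assumptions made for this example, and `qp` may reconstruct a density from samples differently.

```python
import numpy as np
import scipy.stats as sps

rng = np.random.default_rng(42)      # fixed seed (assumption) so the sketch is repeatable
dist = sps.norm(loc=0, scale=1)      # the same standard Gaussian as above

# Draw 1000 samples from the true distribution.
samples = dist.rvs(size=1000, random_state=rng)

# Assumption: smooth the samples with a Gaussian KDE to get a continuous density.
kde = sps.gaussian_kde(samples)

grid = np.linspace(-3., 3., 100)
kde_pdf = kde(grid)                  # density estimate evaluated on the grid
true_pdf = dist.pdf(grid)            # true density on the same grid

# Largest pointwise disagreement between the estimate and the truth.
max_abs_err = np.max(np.abs(kde_pdf - true_pdf))
print(max_abs_err)
```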

Now, let's compute a set of quantiles, spaced more densely in the tails than in the core of the distribution. These will be carried by the `PDF` object as `P.quantiles`.  We also demonstrate the initialization of a `PDF` object with quantiles and no truth function.

In [None]:
quants = np.array([0.01,0.02,0.03,0.04,0.05,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.95,0.96,0.97,0.98,0.99])
quantiles = P.quantize(quants=quants)  # (percent=10.)
Q = qp.PDF(quantiles=quantiles)
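
To make the idea of a quantile parametrization concrete, here is a minimal sketch that uses only `scipy.stats`. It assumes that a quantile representation amounts to pairs of CDF values and the locations at which the CDF reaches them; the shortened `quants` array is illustrative. Note that this uses the analytic inverse CDF, whereas `qp` works numerically.

```python
import numpy as np
import scipy.stats as sps

dist = sps.norm(loc=0, scale=1)
# Illustrative CDF values (a shortened version of the array used above).
quants = np.array([0.01, 0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95, 0.99])

# The percent-point function (inverse CDF) maps each CDF value to its location.
locs = dist.ppf(quants)

# Round trip: evaluating the CDF at those locations recovers the CDF values.
recovered = dist.cdf(locs)

# Each row is a (CDF value, location) pair.
print(np.column_stack([quants, locs]))
```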

Let's also compute a histogram representation that will be carried by the `PDF` object as `T.histogram`.  We can similarly initialize a `PDF` object with a histogram and no truth function.

In [None]:
T = qp.PDF(truth=dist)
histogram = T.histogramize(nbins=10, binrange=[-2., 2.])
R = qp.PDF(histogram=histogram)
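
The histogram parametrization can be sketched without `qp` as follows. This assumes each bin height is the average density in that bin, obtained from differences of the CDF at the bin edges; the names `bin_ends`, `bin_mass` and `bin_heights` are chosen for this example, and `qp` may normalize its histograms differently.

```python
import numpy as np
import scipy.stats as sps

dist = sps.norm(loc=0, scale=1)
nbins, binrange = 10, (-2., 2.)      # same settings as in the cell above

# nbins equally spaced bins need nbins + 1 edges.
bin_ends = np.linspace(binrange[0], binrange[1], nbins + 1)

# The probability mass in each bin is the difference of the CDF at its edges.
bin_mass = np.diff(dist.cdf(bin_ends))

# Dividing by the bin width gives the average density in each bin.
bin_heights = bin_mass / np.diff(bin_ends)

print(bin_heights)
```

The heights integrate to the probability contained in `binrange`, not to 1, because the tails beyond the outermost edges are not covered.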

Let's test interpolation at a single point for the quantile parametrization.

In [None]:
print(P.approximate(0.314, using='quantiles'))

Now, let's interpolate the function over an evenly spaced grid that includes points both inside and outside the quantile range.

In [None]:
grid = np.linspace(-3., 3., 100)
P.approximate(grid, using='quantiles')
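
One simple way to turn quantiles back into a density is sketched below. It is an illustrative stand-in, not the scheme `qp` uses: it assumes a piecewise-constant density between adjacent quantiles and zero density outside the outermost ones. The function name `approx_pdf` and the evenly spaced `quants` are hypothetical choices for this example.

```python
import numpy as np
import scipy.stats as sps

dist = sps.norm(loc=0, scale=1)
quants = np.linspace(0.01, 0.99, 50)   # illustrative CDF values
locs = dist.ppf(quants)                # locations of those quantiles

# Between adjacent quantiles, the mean density is (CDF step) / (location step).
heights = np.diff(quants) / np.diff(locs)

def approx_pdf(x):
    """Piecewise-constant density; zero outside the outermost quantiles."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Index of the quantile interval that contains each point.
    idx = np.searchsorted(locs, x, side='right') - 1
    inside = (idx >= 0) & (idx < len(heights))
    out = np.zeros_like(x)
    out[inside] = heights[idx[inside]]
    return out

grid = np.linspace(-3., 3., 100)
approx = approx_pdf(grid)              # zero wherever the grid leaves the quantile range
print(approx_pdf(0.314), dist.pdf(0.314))
```

Grid points beyond the first and last quantile get zero density here, which is one reason an approximation and the truth can disagree in the tails.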

Let's visualize the PDF object in order to compare the truth and the approximations.  The solid, black line shows the true PDF evaluated between the bounds.  The green rugplot shows the locations of the 1000 samples we took.  The vertical, dotted, blue lines show the quantiles we asked for, and the horizontal, dotted, red lines show the 10 equally spaced bins we asked for.  Note that the quantiles refer to the probability distribution *between the bounds*, because we are not able to integrate numerically over an infinite range.  Interpolations of each parametrization are given as dashed lines in their corresponding colors.  Note that the interpolations of the quantile and histogram parametrizations are so close to each other that the difference is almost imperceptible.

In [None]:
P.plot()
T.plot()

Next, let's compare the different parametrizations to the truth using the Kullback-Leibler Divergence (KLD).  The KLD is a measure of how close two probability distributions are to one another; a smaller value indicates closer agreement.  It is measured in bits of information: the information lost when the second distribution (here, the approximation) is used in place of the first (here, the truth).  The KLD calculator takes a shared grid on which to evaluate the true distribution and the interpolated approximation of that distribution, and returns the KLD of the approximation relative to the truth, which is not in general the same as the KLD of the truth relative to the approximation.  Below, we calculate the KLD of the approximation relative to the truth over different ranges, showing that it increases as the range grows to include areas where the true and interpolated distributions diverge.

In [None]:
qD1 = qp.utils.calculate_kl_divergence(P, Q, limits=(-1.,1.))
qD2 = qp.utils.calculate_kl_divergence(P, Q, limits=(-2.,2.))
qD3 = qp.utils.calculate_kl_divergence(P, Q, limits=(-3.,3.))

hD1 = qp.utils.calculate_kl_divergence(T, R, limits=(-1.,1.))
hD2 = qp.utils.calculate_kl_divergence(T, R, limits=(-2.,2.))
hD3 = qp.utils.calculate_kl_divergence(T, R, limits=(-3.,3.))

sD1 = qp.utils.calculate_kl_divergence(P, S, limits=(-1.,1.))
sD2 = qp.utils.calculate_kl_divergence(P, S, limits=(-2.,2.))
sD3 = qp.utils.calculate_kl_divergence(P, S, limits=(-3.,3.))

print(qD1, qD2, qD3)
print(hD1, hD2, hD3)
print(sD1, sD2, sD3)
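
For reference, a grid-based KLD can be sketched with NumPy alone. The helper `kld_on_grid` is hypothetical and is not the `qp.utils` implementation; it assumes both densities are renormalized on the shared grid and that a base-2 logarithm is used so the result is in bits.

```python
import numpy as np
import scipy.stats as sps

def kld_on_grid(p, q, limits, n=1000):
    """Approximate D(p || q) in bits on an evenly spaced grid (a sketch)."""
    grid = np.linspace(limits[0], limits[1], n)
    dx = grid[1] - grid[0]
    pe = p.pdf(grid)
    qe = q.pdf(grid)
    # Renormalize both densities so that they integrate to 1 within the limits.
    pe = pe / np.sum(pe * dx)
    qe = qe / np.sum(qe * dx)
    # Riemann sum of p * log2(p / q).
    return np.sum(pe * np.log2(pe / qe) * dx)

narrow = sps.norm(loc=0, scale=1)
wide = sps.norm(loc=0, scale=2)

d_same = kld_on_grid(narrow, narrow, limits=(-10., 10.))
d_nw = kld_on_grid(narrow, wide, limits=(-10., 10.))
d_wn = kld_on_grid(wide, narrow, limits=(-10., 10.))
print(d_same, d_nw, d_wn)
```

Swapping the arguments changes the result, which is the asymmetry described above.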

The progression of KLD values should follow that of the root mean square (RMS), another measure of how close two functions are to one another.  Like the KLD, the RMS increases as it includes areas where the true distribution and interpolated distribution diverge.  Unlike the KLD, the RMS is symmetric: it does not measure the distance of one distribution from the other, so swapping the two arguments gives the same value.

In [None]:
qRMS1 = qp.utils.calculate_rms(P, Q, limits=(-1.,1.))
qRMS2 = qp.utils.calculate_rms(P, Q, limits=(-2.,2.))
qRMS3 = qp.utils.calculate_rms(P, Q, limits=(-3.,3.))

hRMS1 = qp.utils.calculate_rms(T, R, limits=(-1.,1.))
hRMS2 = qp.utils.calculate_rms(T, R, limits=(-2.,2.))
hRMS3 = qp.utils.calculate_rms(T, R, limits=(-3.,3.))

sRMS1 = qp.utils.calculate_rms(P, S, limits=(-1.,1.))
sRMS2 = qp.utils.calculate_rms(P, S, limits=(-2.,2.))
sRMS3 = qp.utils.calculate_rms(P, S, limits=(-3.,3.))

print(qRMS1, qRMS2, qRMS3)
print(hRMS1, hRMS2, hRMS3)
print(sRMS1, sRMS2, sRMS3)
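
A grid-based RMS can be sketched in the same spirit. The helper `rms_on_grid` is hypothetical and is not the `qp.utils` implementation; it assumes the RMS is the square root of the mean squared pointwise difference between the two densities on a shared grid.

```python
import numpy as np
import scipy.stats as sps

def rms_on_grid(p, q, limits, n=1000):
    """Root mean square difference of two densities on a shared grid (a sketch)."""
    grid = np.linspace(limits[0], limits[1], n)
    diff = p.pdf(grid) - q.pdf(grid)
    return np.sqrt(np.mean(diff ** 2))

narrow = sps.norm(loc=0, scale=1)
wide = sps.norm(loc=0, scale=2)

rms_nw = rms_on_grid(narrow, wide, limits=(-3., 3.))
rms_wn = rms_on_grid(wide, narrow, limits=(-3., 3.))
rms_same = rms_on_grid(narrow, narrow, limits=(-3., 3.))
print(rms_nw, rms_wn, rms_same)
```

Because the difference is squared, the order of the arguments does not matter.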