# Sampling
In this notebook we investigate different sampling methods, starting with inverse transform sampling.
Assume we have a pdf $f(x)$ with a corresponding cdf $F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\, dy$. The goal is to draw samples $X \sim f$ from this
distribution in order to compute sample statistics. In this way we can approximate properties of the distribution $f(x)$.

## Motivation
Why do we want to draw samples from distributions?

1. The mathematical expression of $f(x)$ is too complicated (e.g. a complex integral definition) to compute statistics analytically, hence we have to rely on sample statistics.
2. Using sampling techniques we can compute integrals, e.g. with MCMC.
3. In Bayesian inference, sampling gives us an approximation of the denominator in Bayes' theorem (which is often intractable).
4. Sampling is also important for stochastic gradient descent methods (see https://arxiv.org/abs/1506.03431).

### Sampling from Uniform distribution
We assume that we can already do this. It essentially comes down to (pseudo)random number generators.
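
As a minimal sketch of this assumption, the snippet below uses NumPy's `default_rng` generator as the source of uniform numbers. The seed, the sample size, and the number of bins are arbitrary choices made for illustration only.

```python
import numpy as np

# Seeded generator so the draws are reproducible (the seed value is arbitrary).
rng = np.random.default_rng(0)

# Draw uniform samples on the half-open interval [0, 1).
u = rng.random(10_000)

# For a uniform distribution, each of the 10 equal-width bins should
# receive roughly 10% of the samples.
counts, _ = np.histogram(u, bins=10, range=(0.0, 1.0))
print(counts / u.size)
```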

### Inverse Transform Sampling

#### Procedure:

1. Draw random samples $u$ from a uniform distribution $\mathcal{U}(0,1)$
2. Interpret these samples $u$ as probabilities
3. Return the largest number $x$ in the domain of the pdf such that ${P(-\infty < X < x )} \le u$
4. $x$ is a sample from your pdf

For point 3 you need the inverse of the cdf in order to solve for x.
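
The steps above can be sketched generically as follows. The helper name `inverse_transform_sample` and the target density $f(x) = 2x$ on $[0, 1]$ (with $F(x) = x^2$ and $F^{-1}(u) = \sqrt{u}$) are assumptions chosen purely for illustration; they are not part of the notebook.

```python
import numpy as np

def inverse_transform_sample(inv_cdf, n, rng):
    """Draw n samples by pushing uniform draws through the inverse cdf."""
    u = rng.random(n)    # step 1: uniform samples in [0, 1)
    return inv_cdf(u)    # steps 2-4: read u as a probability and solve F(x) = u

# Illustrative target: f(x) = 2x on [0, 1], so F(x) = x**2 and F^{-1}(u) = sqrt(u).
rng = np.random.default_rng(42)
samples = inverse_transform_sample(np.sqrt, 100_000, rng)

# The exact mean of this distribution is 2/3; the sample mean should be close.
print(samples.mean())
```

Passing the inverse cdf as a function keeps the sampler independent of any particular distribution, provided a closed-form inverse is available.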

#### Intuitions
Assume $F(x) = P(X < x)$ is a strictly increasing function, so that it is invertible (this is not always true, e.g. for discrete distributions).
We start with a uniform random variate $U \sim \mathcal{U}(0,1)$. Let us define
$X = T(U)$, where $T$ is some function which maps $[0,1] \rightarrow \mathbb{R}$.
In some sense you can interpret $T$ as a function which maps probabilities to the real line.
The random variate $X$ then has a non-uniform distribution. We want it to follow the distribution $f(x)$, hence
we want $P(T(U)<x) = F(x)$. It turns out that $T(U) = F^{-1}(U)$ does the trick:

$$P(F^{-1}(U)<x) = P(U<F(x)) = F(x),$$

where we used $P(U < y) = y$ for $y \in [0,1]$ if $U$ is uniform: *The probability of drawing a number smaller than y from a uniform distribution is y*. (Nice explanations are here: https://www.quora.com/What-is-an-intuitive-explanation-of-inverse-transform-sampling-method-in-statistics-and-how-does-it-relate-to-cumulative-distribution-function)
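
The identity $P(F^{-1}(U) < x) = F(x)$ can be checked empirically. The sketch below assumes the standard logistic distribution as the test case (cdf $F(x) = 1/(1+e^{-x})$, inverse $F^{-1}(u) = \ln(u/(1-u))$); this choice, the helper names, and the test points are illustrative assumptions only.

```python
import numpy as np

def logistic_cdf(x):
    """cdf of the standard logistic distribution."""
    return 1.0 / (1.0 + np.exp(-x))

def logistic_inv_cdf(u):
    """Inverse cdf of the standard logistic distribution, for u in (0, 1)."""
    return np.log(u / (1.0 - u))

rng = np.random.default_rng(1)
u = rng.random(200_000)
# Keep u strictly inside (0, 1) so the logarithm stays finite.
u = u[u > 0]
x_samples = logistic_inv_cdf(u)

# Compare the fraction of samples below each test point with F(x).
test_points = np.array([-2.0, 0.0, 1.5])
empirical = np.array([(x_samples < x).mean() for x in test_points])
print(empirical)
print(logistic_cdf(test_points))
```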



In [57]:
import numpy as np
import pandas as pd
from scipy import stats
import hvplot.pandas  # noqa
import math
from matplotlib import pyplot as plt
import panel as pn

#### 1.) Inverse Sampling: Example

$$f(x) = \lambda \exp(-\lambda x), \quad x \ge 0$$

This leads to

$$F(x) = \int_{0}^x \lambda \exp(-\lambda y)\, dy  = \left[\frac{-1}{\lambda}f(y)\right]_{0}^{x} = 1- \exp(-\lambda x).$$

The lower integration limit is 0 because the density is only supported on $[0, \infty)$.

And hence the inverse will be:

$$1- \exp(-\lambda x) = u$$
$$1-u=\exp(-\lambda x)$$
$$\ln(1-u) = -\lambda x$$
$$\frac{\ln(1-u)}{-\lambda} = x$$

To summarize, we get:

$$F^{-1}(u) = -\frac{\ln(1-u)}{\lambda}$$
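
As a hedged sketch of how this inverse can be used to draw samples, the snippet below pushes uniform draws through $-\ln(1-u)/\lambda$ and compares the result with the exact exponential distribution. The function name `sample_exponential`, the rate $\lambda = 2$, the seed, and the sample size are assumptions for illustration.

```python
import numpy as np
from scipy import stats

def sample_exponential(n, lda, rng):
    """Inverse transform sampling for the exponential distribution."""
    u = rng.random(n)              # u in [0, 1), so 1 - u is in (0, 1]
    return -np.log(1.0 - u) / lda  # F^{-1}(u) = -ln(1 - u) / lambda

lda = 2.0
rng = np.random.default_rng(123)
samples = sample_exponential(100_000, lda, rng)

# The exponential distribution has mean 1 / lambda.
print(samples.mean(), 1.0 / lda)

# Kolmogorov-Smirnov distance to the exact cdf (scipy uses scale = 1 / lambda).
ks = stats.kstest(samples, "expon", args=(0.0, 1.0 / lda))
print(ks.statistic)
```

A small Kolmogorov-Smirnov statistic indicates that the empirical cdf of the samples stays close to $1 - \exp(-\lambda x)$.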

In [29]:
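# function definitions for the exponential distribution with rate lda
def exp_pdf(x, lda=1):
    # density f(x) = lda * exp(-lda * x), valid for x >= 0
    return lda*np.exp(-lda*x)

def exp_cdf(x, lda=1):
    # cdf F(x) = 1 - exp(-lda * x), valid for x >= 0
    return 1-np.exp(-lda*x)

def inv_exp_cdf(u, lda=1):
    # inverse cdf F^{-1}(u) = -ln(1 - u) / lda, valid for 0 <= u < 1
    return (np.log(1-u)/(-lda))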

In [30]:
# small test
x=1
print(x)
print(exp_pdf(x))
u = exp_cdf(x)
print(u)
print(inv_exp_cdf(u))

1
0.36787944117144233
0.6321205588285577
1.0


In [43]:
### plot
x = np.linspace(0,5,100)
pdf = exp_pdf(x) 
cdf = exp_cdf(x) 

df = pd.DataFrame({"x": x, "pdf": pdf, "cdf": cdf})

In [44]:
df.hvplot(x="x")

In [56]:
u = np.random.rand(1000)
hvplot.plot(pd.Series(u), kind="hist")

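In [82]:
# Use a single-valued slider: an IntRangeSlider yields a (start, end) tuple,
# which np.random.rand cannot use as a sample size.
win_size = pn.widgets.IntSlider(name="Sample size", start=1, end=100, value=100)

@pn.depends(win_size)
def plot_uniform(size):
    u = np.random.rand(size)
    return hvplot.plot(pd.Series(u), kind="hist")

In [83]:
# Let Panel supply the slider value; calling plot_uniform() directly fails
# because `size` is not filled in automatically.
pn.Column(win_size, plot_uniform)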

In [89]:
import bokeh

In [91]:
#bokeh.sampledata.download()

In [92]:
from bokeh.sampledata import stocks

pn.extension()

In [93]:
title = '## Stock Explorer hvPlot'

tickers = ['AAPL', 'FB', 'GOOG', 'IBM', 'MSFT']

def get_df(ticker, window_size):
    df = pd.DataFrame(getattr(stocks, ticker))
    df['date'] = pd.to_datetime(df.date)
    return df.set_index('date').rolling(window=window_size).mean().reset_index()

def get_plot(ticker, window_size):
    df = get_df(ticker, window_size)
    return df.hvplot.line('date', 'close', grid=True)

In [94]:
ticker = pn.widgets.Select(name='Ticker', options=tickers)
window = pn.widgets.IntSlider(name='Window Size', value=6, start=1, end=21)

@pn.depends(ticker, window)
def get_plot(ticker, window_size):
    df = get_df(ticker, window_size)
    return df.hvplot.line('date', 'close', grid=True)

pn.Row(
    pn.Column(title, ticker, window),
    get_plot
)