# Inference with samples

In this notebook we demonstrate some of the basic ideas of how to do inference with samples from a probability distribution. We don't concern ourselves here with how to draw samples in general, only with what to do with the samples once we have them.

The starting point is the standard Monte Carlo estimator. We can approximate integrals over some probability density $p(\theta)$ given samples $\theta^{(1)}, \dots, \theta^{(N)}$ as follows.

$$
    \mathbb{E}(f(\theta)) = \int f(\theta) p(\theta) d\theta \approx \frac{1}{N}\sum_{n=1}^N f(\theta^{(n)})
$$

Most quantities of interest can be characterised in this way. For example, the mean can be computed by setting $f(\theta) = \theta$, in which case the Monte Carlo estimator for the mean is simply the sample mean.

$$
    \mathbb{E}(\theta) = \int \theta p(\theta) d\theta \approx \frac{1}{N} \sum_{n=1}^N \theta^{(n)}
$$

Probabilities can be characterised by integrating an appropriately chosen indicator function against the density. For example, the probability that $\theta \geq 0$ is estimated by the fraction of samples that are non-negative.

$$
    \mathrm{Pr}(\theta \geq 0) = \int I_{\theta \geq 0}(\theta) p(\theta) d\theta \approx \frac{1}{N}\sum_{n=1}^N I_{\theta \geq 0}(\theta^{(n)}) = \frac{\#\{n : \theta^{(n)} \geq 0\}}{N}
$$

We can also characterise quantiles such as the median as integrals, in which case the Monte Carlo estimator is simply the corresponding sample quantile.
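As a minimal sketch, the three kinds of estimator described above (mean, probability, quantile) each reduce to a single NumPy call on an array of samples. The standard normal target, the sample size and the fixed seed are assumptions made only for illustration.

```python
import numpy as np

# Assumption: a seeded generator and a standard normal target, for illustration only.
rng = np.random.default_rng(0)
theta = rng.normal(loc=0.0, scale=1.0, size=10_000)

# Mean: f(theta) = theta, so the estimator is the sample mean.
mean_est = theta.mean()

# Probability: f is the indicator of theta >= 0, so the estimator is
# the fraction of samples satisfying the condition.
prob_est = (theta >= 0).mean()

# Quantile: the estimator is the corresponding sample quantile (here the median).
median_est = np.quantile(theta, 0.5)

print(mean_est, prob_est, median_est)
```

Taking the mean of a boolean array gives the same number as counting the `True` entries and dividing by the array size.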

Let's see some examples.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

To start we'll keep things simple, and suppose we were doing inference on a random variable $X \sim \mathcal{N}(10, 1)$. We first generate some samples.

In [None]:
x = np.random.normal(loc=10, scale=1, size=500)

The estimator for the mean is simply the sample mean. If the estimator is good, it should be close to 10.

In [None]:
x.mean()

What about the probability that $X \geq 10$? The true answer is 0.5 because of symmetry. Our estimate from the sample is

In [None]:
(x >= 10).sum() / x.size

Similarly, what about the probability that $X \geq 11$? The true answer is about 0.159. Our estimate is

In [None]:
(x >= 11).sum() / x.size

All of these estimates converge to the true values as we add samples.

In [None]:
x = np.random.normal(loc=10, scale=1, size=1000000)
(x >= 11).sum() / x.size
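To get a feel for how quickly the estimate improves, one option is to report a standard error alongside it. For independent samples, the standard error of a Monte Carlo estimate is approximately the sample standard deviation of $f(\theta^{(n)})$ divided by $\sqrt{N}$. The sketch below applies this to the indicator of $X \geq 11$; the helper name `tail_prob_with_se`, the seed and the sample sizes are hypothetical choices for illustration.

```python
import numpy as np

def tail_prob_with_se(samples, threshold):
    """Estimate Pr(X >= threshold) and its approximate standard error.

    Assumes the samples are independent draws from the target distribution.
    """
    indicator = (samples >= threshold).astype(float)
    p_hat = indicator.mean()
    se = indicator.std(ddof=1) / np.sqrt(indicator.size)
    return p_hat, se

# Assumption: seeded generator so that the run is reproducible.
rng = np.random.default_rng(1)

results = {}
for n in (500, 50_000, 1_000_000):
    samples = rng.normal(loc=10, scale=1, size=n)
    results[n] = tail_prob_with_se(samples, 11)
    print(n, results[n])
```

The standard error shrinks in proportion to $1/\sqrt{N}$, so gaining one more decimal digit of accuracy takes roughly 100 times as many samples.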

The same kinds of estimates hold if we sample from multivariate distributions. Let's suppose $X_i \sim \mathcal{N}(\mu_i, 1)$, $i = 1, 2$, are independent with $\mu_1 = 0$ and $\mu_2 = 1$. What's the probability that $X_1 \geq X_2$? Given samples, we would estimate it as follows.

In [None]:
x = np.random.multivariate_normal(np.array([0, 1]), np.identity(2), size=500)

(x[:, 0] >= x[:, 1]).sum() / x.shape[0]

In this case, the answer can also be computed analytically. Because $X_1$ and $X_2$ are independent, the means subtract and the variances add, so $X_1 - X_2 \sim \mathcal{N}(-1, 2)$, and hence $\mathrm{Pr}(X_1 \geq X_2) = \mathrm{Pr}(X_1 - X_2 \geq 0) \approx 0.240$.
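As a quick check of the analytic value, the normal upper-tail probability can be written in terms of the complementary error function, $\mathrm{Pr}(Z \geq z) = \tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})$ for a standard normal $Z$. The sketch below uses only the standard library for the exact value and compares it with a larger sample; the helper name `normal_upper_tail`, the seed and the sample size are assumptions for illustration.

```python
import math
import numpy as np

def normal_upper_tail(x, mean, var):
    """Pr(X >= x) for X ~ N(mean, var), with var the variance."""
    z = (x - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2))

# Exact value: X1 - X2 ~ N(-1, 2), and we want Pr(X1 - X2 >= 0).
exact = normal_upper_tail(0.0, mean=-1.0, var=2.0)

# Monte Carlo estimate from a larger sample (seed chosen arbitrarily).
rng = np.random.default_rng(2)
x = rng.multivariate_normal(np.array([0, 1]), np.identity(2), size=100_000)
estimate = (x[:, 0] >= x[:, 1]).mean()

print(exact, estimate)
```

The same helper reproduces the univariate value quoted earlier: `normal_upper_tail(11, mean=10, var=1)` is about 0.159.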

The same approach generalises to arbitrarily complicated distributions, where it may not be possible to write down the distribution of combinations of variables like this.

## Exercises

For each of the following exercises, draw 1000 samples from the appropriate distribution to produce estimates.

Note that we parametrise the normal distribution in terms of its variance, $\mathcal{N}(\mu, \sigma^2)$, whereas NumPy's `np.random.normal` takes the standard deviation $\sigma$ as its `scale` argument.
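The difference in parametrisation is easy to get wrong, so here is a small sketch with made-up numbers (not those of the exercises). The univariate sampler takes a standard deviation, while the multivariate sampler takes a covariance matrix, whose diagonal entries are variances. The seed, the variance of 9 and the covariance matrix are hypothetical values chosen for illustration.

```python
import numpy as np

# Assumption: seeded generator and illustrative parameter values.
rng = np.random.default_rng(3)

# Univariate: X ~ N(0, 9) has variance 9, so pass scale = sqrt(9) = 3.
variance = 9.0
x = rng.normal(loc=0.0, scale=np.sqrt(variance), size=100_000)
print(x.var(), x.std())

# Multivariate: the second argument is the covariance matrix itself,
# so variances (not standard deviations) go on the diagonal.
cov = np.array([[4.0, 1.0],
                [1.0, 2.0]])
y = rng.multivariate_normal(mean=np.zeros(2), cov=cov, size=100_000)
sample_cov = np.cov(y, rowvar=False)
print(sample_cov)
```

Comparing the sample variance and sample covariance against the intended values is a cheap way to confirm that the parameters were passed in the right form.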

1. Suppose that $X \sim \mathcal{N}(12, 4^2)$. Estimate $\mathrm{Pr}(X \geq 8)$.

In [None]:
# your code here

2. Suppose that $X \sim \mathrm{Beta}(4, 17)$. Estimate the median. Estimate the 2.5th, 25th, 75th and 97.5th percentiles.

In [None]:
# your code here

3. Suppose that $X_1 \sim \mathcal{N}(2, 3^2)$, $X_2 \sim \mathcal{N}(1, 4^2)$ are **not** independent and that $\mathrm{cov}(X_1, X_2) = -8$.

   a. Plot a scatter plot of samples of $X_1$ and $X_2$.

In [None]:
# your code here

   b. Estimate the probability that $3X_1 \leq 2X_2$.

In [None]:
# your code here

4. *Challenge*. Suppose that $A \sim \mathrm{Poisson}(45)$, $B \sim \mathrm{Poisson}(30)$, and $X | A, B \sim \mathrm{Beta}(A, B)$. Estimate $\mathbb{E}\left(\tan\left(\frac{\pi X}{2}\right)\right)$ and $\mathrm{var}\left(\tan\left(\frac{\pi X}{2}\right)\right)$.

In [None]:
# your code here