# Normal likelihood examples

Assume
$$
X_1, \ldots, X_n \overset{\text{iid}}{\sim} \text{Normal}(m, 1 / \tau),
$$
where $m$ is the mean and $\tau$ is the precision (the inverse of the variance).

The normal likelihood for one data point is
$$
f(x_i \mid m, \tau) = \sqrt{\frac{\tau}{2 \pi}} \exp\left[ - \frac{\tau}{2}(x_i - m)^2 \right]
$$

Because the data points are independent, the likelihood for all the data is the product of the one-point likelihoods:
$$
f(x_1, \ldots, x_n \mid m, \tau) = \left(\frac{\tau}{2 \pi}\right)^{n/2} \exp\left[ - \frac{\tau}{2}\sum_i (x_i - m)^2 \right]
$$


NB: for large $n$, evaluating this function can give a number so close to $0$ that the computer rounds it down to exactly $0$. This is called numerical underflow. To avoid it, we work with the natural log of the likelihood, which turns the product into a sum:

$$
\log f(x_1, \ldots, x_n \mid m, \tau) = (n/2)\log(\tau) - (n/2) \log (2 \pi) - \frac{\tau}{2}\sum_i (x_i - m)^2
$$

This isn't really an issue when we use conjugate priors, but it will be later, when we don't.
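A quick sketch of the underflow, assuming SciPy is installed (the seed and the standard-normal data are made up for illustration): with 1000 points, the product of the densities rounds to `0.0`, while the sum of the log-densities is an ordinary negative number.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)      # illustrative seed
x = rng.normal(loc=0.0, scale=1.0, size=1000)

m, tau = 0.0, 1.0                    # mean and precision
sd = 1 / np.sqrt(tau)                # scipy's norm is parameterized by the standard deviation

# Raw likelihood: a product of 1000 numbers below 0.4, which underflows
raw_likelihood = np.prod(norm.pdf(x, loc=m, scale=sd))

# Log-likelihood: a sum of log-densities, which stays finite
log_likelihood = np.sum(norm.logpdf(x, loc=m, scale=sd))

print(raw_likelihood)   # 0.0
print(log_likelihood)
```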


In [1]:
import numpy as np
n = 1000
# pretend this data comes from real life and we don't know the parameters
x_data = np.random.normal(loc=0.0, scale=2.0, size=n)


In [2]:
def normal_log_like(data, mean, precision):
    """Log-likelihood of iid Normal(mean, 1/precision) data, using the formula above."""
    n = data.shape[0]
    return .5*n*np.log(precision) - .5*n*np.log(2*np.pi) - .5*precision*np.sum((data-mean)**2)

normal_log_like(x_data, 0, 1)

### case 1: known precision/variance

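Suppose we know the precision of the data: $\tau = 3$. (This is only for illustration; the simulated `x_data` above actually has standard deviation 2, i.e. precision $1/4$.)

Then we only have to put a prior on $m$. The conjugate prior is normal: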

$$
p(m) = \text{Normal}(\mu_0, 1/\tau_0)
$$

We pick the hyperparameters $\mu_0$ and $\tau_0$.

#### Bayes' rule:

$$
p(m \mid x_1, \ldots, x_n) \propto p(x_1, \ldots, x_n \mid m)p(m) 
$$

Completing the square in $m$ (see the example in the slides and the screenshotted derivation for the proof) gives
$$
p(m \mid x_1, \ldots, x_n) = \text{Normal}(\mu_0 \left(\frac{\tau_0}{\tau_0 + n\tau} \right) + \bar{x} \left(\frac{n \tau}{\tau_0 + n\tau} \right), [\tau_0 + n\tau]^{-1})
$$

In [3]:
# when we use conjugate priors, we only need to use Python like a simple calculator
xbar = np.average(x_data)
tau0 = 0.001 # chosen hyperparameter
mu0 = 0  # chosen hyperparameter
tau = 3 # assumed known
# "getting" the posterior is simply invoking the formula
posterior_mean = xbar*(n*tau/(tau0 + n*tau )) + mu0*(tau0/(tau0 + n*tau ))
posterior_precision = tau0 + n*tau 
posterior_variance = 1/posterior_precision
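As a check on the conjugate formula, here is a sketch that computes the same posterior by brute force: evaluate the log prior plus the log likelihood on a fine grid of $m$ values, normalize with `scipy.special.logsumexp` (working on the log scale sidesteps the underflow discussed earlier), and compare the grid's mean and variance with the formula. The data and seed are made up; the hyperparameters mirror the cell above.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

rng = np.random.default_rng(2)            # illustrative data and seed
n = 1000
x = rng.normal(loc=0.0, scale=2.0, size=n)

tau = 3.0                                 # data precision, assumed known
mu0, tau0 = 0.0, 0.001                    # prior hyperparameters

# Conjugate answer from the formula above
xbar = x.mean()
post_prec = tau0 + n * tau
post_mean = mu0 * tau0 / post_prec + xbar * n * tau / post_prec

# Brute-force answer: log prior + log likelihood evaluated on a fine grid of m values
grid = np.linspace(post_mean - 0.2, post_mean + 0.2, 2001)
log_prior = norm.logpdf(grid, loc=mu0, scale=1 / np.sqrt(tau0))
log_like = norm.logpdf(x[:, None], loc=grid[None, :], scale=1 / np.sqrt(tau)).sum(axis=0)
log_post = log_prior + log_like
weights = np.exp(log_post - logsumexp(log_post))  # normalize on the log scale

grid_mean = np.sum(weights * grid)
grid_var = np.sum(weights * (grid - grid_mean) ** 2)
print(post_mean, grid_mean)
print(1 / post_prec, grid_var)
```

The grid only needs to cover a few posterior standard deviations ($1/\sqrt{3000} \approx 0.018$ here); once we leave conjugate priors behind, this kind of numerical approach is what remains.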

### case 2: unknown precision/variance

Now suppose we don't know the precision/variance of the data. Then we need a prior for both $m$ and $\tau$.

(Carol: in the second screenshot you posted, $\tau$ is called $w$. Let's stick with $\tau$ for continuity.)

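The conjugate prior is a normal for $m$ given $\tau$ times a gamma for $\tau$ (with shape $\alpha_0/2$ and rate $\beta_0/2$):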

$$
p(m \mid \tau)p(\tau) = \text{Normal}(\mu_0, \frac{1}{\tau\tau_0})\text{Gamma}(\alpha_0/2, \beta_0/2)
$$

We have to pick $\mu_0$, $\tau_0$, $\alpha_0$, and $\beta_0$. In Python, this is just assigning four `float` variables.

#### Bayes' rule:

The posterior factors into a product: the first factor is a normal (for $m$ given $\tau$) and the second is a gamma (for $\tau$).
$$
p(m, \tau \mid x_1, \ldots, x_n) = p(m \mid \tau, x_1, \ldots, x_n)p( \tau \mid x_1, \ldots, x_n)
$$

Specifically
$$
p(m, \tau \mid x_1, \ldots, x_n) = \text{Normal}(m ; \mu_0\frac{\tau_0}{n + \tau_0} + \bar{x}\frac{n}{n + \tau_0 }, \tau^{-1}(n + \tau_0)^{-1}) \times 
\text{Gamma}(\tau ; (n+\alpha_0)/2, \frac{\beta_0 + \sum_i (x_i - \bar{x})^2 + \frac{ n \tau_0}{(n + \tau_0)} (\bar{x} - \mu_0)^2 }{2})
$$



#### things to notice:

The joint posterior $p(m, \tau \mid x_1, \ldots, x_n)$ is not normal. It can't be: $\tau$ must be positive, and a normal distribution would put probability on negative values.

The *conditional* posterior $p(m \mid \tau, x_1, \ldots, x_n)$ is normal, but it depends on $\tau$, which we don't actually know.

The *conditional* posterior $p(m \mid \tau, x_1, \ldots, x_n)$ has the same form as the posterior in case 1 (known variance/precision), with the prior precision $\tau_0$ there replaced by $\tau\tau_0$.

The *marginal* posterior $p(\tau \mid x_1, \ldots, x_n)$ is Gamma.

The four *prior* hyperparameters were $\mu_0$, $\tau_0$, $\alpha_0$, and $\beta_0$.

These turn into four *posterior* hyperparameters:
$$
\begin{aligned}
\mu_0 &\to \mu_0\frac{\tau_0}{n + \tau_0} + \bar{x}\frac{n}{n + \tau_0} \\
\tau_0 &\to n + \tau_0 \\
\alpha_0 &\to n + \alpha_0 \\
\beta_0 &\to \beta_0 + \sum_i (x_i - \bar{x})^2 + \frac{n \tau_0}{n + \tau_0} (\bar{x} - \mu_0)^2
\end{aligned}
$$

The posterior hyperparameters are often written $\mu_n$, $\tau_n$, $\alpha_n$, and $\beta_n$. This is shorter, but it hides the update formulas.
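The claim that the conditional posterior has the same form as case 1 can be checked numerically. The two helper functions below are hypothetical names written just for this check, and the inputs are made-up summary statistics: plugging $\tau\tau_0$ in as the case 1 prior precision reproduces the case 2 conditional mean and precision.

```python
import numpy as np

def known_precision_posterior(xbar, n, tau, mu0, tau0):
    """Case 1: posterior mean and precision of m when the data precision tau is known."""
    prec = tau0 + n * tau
    mean = mu0 * tau0 / prec + xbar * n * tau / prec
    return mean, prec

def conditional_posterior(xbar, n, tau, mu_0, tau_0):
    """Case 2: mean and precision of p(m | tau, data) under the normal-gamma prior."""
    mean = mu_0 * tau_0 / (n + tau_0) + xbar * n / (n + tau_0)
    prec = tau * (n + tau_0)
    return mean, prec

# Made-up summary statistics and hyperparameters
xbar, n, tau, mu_0, tau_0 = 0.7, 50, 2.5, 1.0, 3.0

# Case 1 with prior precision tau * tau_0 instead of tau_0
case1 = known_precision_posterior(xbar, n, tau, mu0=mu_0, tau0=tau * tau_0)
case2 = conditional_posterior(xbar, n, tau, mu_0, tau_0)
print(np.allclose(case1, case2))  # True
```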




In [4]:
# pick prior hyperparameters encoding prior knowledge about m, tau
mu_0 = 0
tau_0 = .01
alpha_0 = 1
beta_0 = 1
# calculate posterior hyperparameters (same update formulas as above)
# I copy/pasted a lot of this from above
xbar = np.mean(x_data)
mu_n = xbar*(n/(tau_0 + n)) + mu_0*(tau_0/(tau_0 + n))
tau_n = n + tau_0
alpha_n = n + alpha_0
# np.var uses ddof=0 by default, so n*np.var(x_data) equals sum((x_data - xbar)**2)
beta_n = beta_0 + n*np.var(x_data) + (n*tau_0)/(n + tau_0)*(xbar - mu_0)**2
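Once we have the four posterior hyperparameters, we can summarize the posterior. A minimal sketch, assuming SciPy and simulated stand-in data: it draws from the joint posterior by sampling $\tau$ from its gamma marginal and then $m$ from its conditional normal, and compares the draws with two closed forms. $\mathbb{E}[\tau \mid x_1, \ldots, x_n] = \alpha_n / \beta_n$ is the mean of the Gamma$(\alpha_n/2, \beta_n/2)$ marginal, and integrating $\tau$ out gives the standard result that $m$ is marginally Student-$t$ with $\alpha_n$ degrees of freedom, location $\mu_n$, and scale $\sqrt{\beta_n / (\alpha_n \tau_n)}$.

```python
import numpy as np
from scipy.stats import gamma, norm, t

rng = np.random.default_rng(3)                    # illustrative data and seed
x_data = rng.normal(loc=0.0, scale=2.0, size=1000)
n = x_data.shape[0]

# Prior and posterior hyperparameters, same formulas as the cell above
mu_0, tau_0, alpha_0, beta_0 = 0.0, 0.01, 1.0, 1.0
xbar = x_data.mean()
mu_n = xbar * n / (tau_0 + n) + mu_0 * tau_0 / (tau_0 + n)
tau_n = n + tau_0
alpha_n = n + alpha_0
beta_n = beta_0 + np.sum((x_data - xbar) ** 2) + n * tau_0 / (n + tau_0) * (xbar - mu_0) ** 2

# Sample the joint posterior: tau ~ Gamma(shape alpha_n/2, rate beta_n/2),
# then m | tau ~ Normal(mu_n, 1/(tau * tau_n)). SciPy's gamma takes scale = 1/rate.
num_samples = 100_000
taus = gamma.rvs(a=alpha_n / 2, scale=2 / beta_n, size=num_samples, random_state=rng)
ms = norm.rvs(loc=mu_n, scale=1 / np.sqrt(taus * tau_n), size=num_samples, random_state=rng)

# Closed forms to compare against
tau_mean = alpha_n / beta_n                       # E[tau | data]
m_interval = t.ppf([0.025, 0.975], df=alpha_n, loc=mu_n,
                   scale=np.sqrt(beta_n / (alpha_n * tau_n)))  # 95% interval for m

print(taus.mean(), tau_mean)
print(np.quantile(ms, [0.025, 0.975]), m_interval)
```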

#### How to pick prior

When you don't know your variance, you need to pick $\mu_0$, $\tau_0$, $\alpha_0$, and $\beta_0$. 

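If you want the prior expectation of $m$ to be a certain number, set $\mu_0$ equal to it, because

$$
\mathbb{E}[m] = \mathbb{E}[ \mathbb{E}[ m \mid \tau ]] = \mu_0 .
$$

(This needs $\alpha_0 > 1$. Integrating $\tau$ out gives a Student-$t$ prior on $m$ with $\alpha_0$ degrees of freedom, which has no mean when $\alpha_0 \le 1$; $\mu_0$ is still its center and median.)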

One down, three to go.

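$\tau_0$ can be thought of as an equivalent number of prior observations: in the posterior mean, $\mu_0$ gets weight $\tau_0$ and $\bar{x}$ gets weight $n$. So if your prior knowledge is worth about three data points for free, set $\tau_0 = 3$.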
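To see why $\tau_0$ behaves like a count of observations, here is a sketch with made-up data: with $\tau_0 = 3$, the posterior mean of $m$ equals the plain average of the real data together with three pseudo-observations equal to $\mu_0$.

```python
import numpy as np

rng = np.random.default_rng(4)                  # illustrative data and seed
x = rng.normal(loc=5.0, scale=2.0, size=20)
n = x.shape[0]

mu_0, tau_0 = 3.0, 3                            # "three data points for free" at mu_0

# Posterior mean of m from the update formula
mu_n = mu_0 * tau_0 / (n + tau_0) + x.mean() * n / (n + tau_0)

# Same number: average the real data together with tau_0 copies of mu_0
augmented = np.concatenate([x, np.full(tau_0, mu_0)])
print(np.isclose(mu_n, augmented.mean()))       # True
```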

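Finally, we have to pick the last two, $\alpha_0$ and $\beta_0$. Suppose your prior knowledge is a statement about a tail probability such as $P(m > 9)$. Many $(\alpha_0, \beta_0)$ pairs give the same value of $P(m > 9)$, so the easiest approach is to guess and check by simulation: sample $\tau$ from its gamma prior, then $m$ given $\tau$, and count how often $m > 9$.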


In [19]:
from scipy.stats import gamma, norm

def get_approx_prob_greater_than_nine(mu0, tau0, alpha0, beta0, num_samples):
    
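    # tau ~ Gamma(shape alpha0/2, rate beta0/2); scipy's gamma takes scale = 1/rate
    random_taus = gamma.rvs(a=alpha0/2, scale=2/beta0, size=num_samples)
    # m | tau ~ Normal(mu0, 1/(tau*tau0)); scipy's norm takes the standard deviation
    random_mus = norm.rvs(loc=mu0, scale=1/np.sqrt(random_taus*tau0), size=num_samples)
    return np.mean(random_mus > 9)

get_approx_prob_greater_than_nine(3, 3, .1, .1, 1000)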
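The guess-and-check step can be automated with a small grid search. This is a sketch, not a recommended procedure: the target value $P(m > 9) \approx 0.05$, the candidate grid, and the helper name `approx_prob_greater_than` are all hypothetical, and the sampler uses the Gamma$(\alpha_0/2, \beta_0/2)$ shape/rate prior stated above.

```python
import itertools
import numpy as np
from scipy.stats import gamma, norm

def approx_prob_greater_than(threshold, mu0, tau0, alpha0, beta0, num_samples, rng):
    """Monte Carlo estimate of the prior probability P(m > threshold).

    Prior: tau ~ Gamma(shape alpha0/2, rate beta0/2), m | tau ~ Normal(mu0, 1/(tau*tau0)).
    """
    taus = gamma.rvs(a=alpha0 / 2, scale=2 / beta0, size=num_samples, random_state=rng)
    ms = norm.rvs(loc=mu0, scale=1 / np.sqrt(taus * tau0), size=num_samples, random_state=rng)
    return np.mean(ms > threshold)

target = 0.05                          # hypothetical prior belief: P(m > 9) is about 5%
mu0, tau0 = 3, 3                       # already chosen
candidates = [0.1, 0.5, 1, 2, 5, 10]   # hypothetical values to try for alpha0 and beta0

rng = np.random.default_rng(5)
results = {}
for alpha0, beta0 in itertools.product(candidates, candidates):
    results[(alpha0, beta0)] = approx_prob_greater_than(9, mu0, tau0, alpha0, beta0, 20_000, rng)

# Keep the pair whose estimated tail probability is closest to the target
best = min(results, key=lambda pair: abs(results[pair] - target))
print(best, results[best])
```

Because the estimates are Monte Carlo, pairs with similar tail probabilities can swap places between runs; more samples or a finer grid sharpen the choice.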

In [5]:
# proofs below 

Proof (feel free to ignore on a first reading)

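First, find the first factor, $p(m \mid \tau, x_1, \ldots, x_n)$. Since $\tau$ is held fixed, any factor that doesn't involve $m$ (including $p(\tau)$ and the powers of $\tau$) can be absorbed into the proportionality constant.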

\begin{align*}
p(m \mid \tau, x_1, \ldots, x_n)
&\propto 
f(x_1, \ldots, x_n \mid m, \tau) p(m \mid \tau)p(\tau) \\
&\propto
\exp\left[ - \frac{\tau}{2}\sum_i (x_i - m)^2 \right] 
\exp\left[ - \frac{\tau \tau_0}{2}(m - \mu_0)^2 \right]\\
&\propto 
\exp\left[ - \frac{\tau}{2}\sum_i (x_i - m)^2  - \frac{\tau \tau_0}{2}(m - \mu_0)^2 \right] \\
&\propto 
\exp\left[ - \frac{\tau}{2}\left\{ \sum_i (x_i - m)^2  + \tau_0(m - \mu_0)^2\right\} \right] \\
&\propto 
\exp\left[ - \frac{\tau}{2}\left\{ nm^2 - 2m n \bar{x}  + \tau_0(m^2 - 2m \mu_0)\right\} \right] \\
&\propto 
\exp\left[ - \frac{\tau}{2}\left\{ m^2 (n + \tau_0) - 2m (\mu_0\tau_0 + n\bar{x}) \right\} \right]
\end{align*}
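Going from the last line to a normal distribution is a matter of completing the square in $m$. Writing $\mu_n = (\mu_0\tau_0 + n\bar{x})/(n + \tau_0)$ for the weighted average of $\mu_0$ and $\bar{x}$,

$$
m^2 (n + \tau_0) - 2m (\mu_0\tau_0 + n\bar{x}) = (n + \tau_0)(m - \mu_n)^2 - (n + \tau_0)\mu_n^2 ,
$$

and the last term does not involve $m$, so

$$
p(m \mid \tau, x_1, \ldots, x_n) \propto \exp\left[ - \frac{\tau (n + \tau_0)}{2}(m - \mu_n)^2 \right],
$$

a normal kernel with mean $\mu_n$ and precision $\tau(n + \tau_0)$.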

So
$$
m \mid \tau, x_1, \ldots, x_n \sim \text{Normal}(\mu_0\frac{\tau_0}{n + \tau_0} + \bar{x}\frac{n}{n + \tau_0 }, \tau^{-1}(n + \tau_0)^{-1})
$$

This matches the conditional posterior stated earlier. It is allowed to depend on $\tau$ because it is a *conditional* posterior. The mean is a weighted average of $\mu_0$ and $\bar{x}$, and the precisions add: $\tau(n + \tau_0) = n\tau + \tau\tau_0$.



Now let's find the second factor, the marginal posterior $p( \tau \mid x_1, \ldots, x_n)$, by integrating $m$ out of the joint.


\begin{align*}
p(\tau \mid x_1, \ldots, x_n) 
&\propto
\int
p( x_1, \ldots, x_n \mid m, \tau)p(m \mid \tau) p(\tau) \, dm
\\
&\propto
\tau^{n/2}\tau^{1/2}\tau^{\alpha_0/2-1}\exp[-\tau \beta_0/2]
\int
\exp\left[ - \frac{\tau}{2}\sum_i (x_i - m)^2 \right] 
\exp\left[ - \frac{\tau \tau_0}{2}(m - \mu_0)^2 \right]
dm \\
&=
\tau^{n/2}\tau^{1/2}\tau^{\alpha_0/2-1}\exp[-\tau \beta_0/2]
\exp\left[ - \frac{\tau}{2}\sum_i (x_i - \bar{x})^2 \right] 
\int
\exp\left[ - \frac{\tau}{2}n (m - \bar{x})^2 \right] 
\exp\left[ - \frac{\tau \tau_0}{2}(m - \mu_0)^2 \right]
dm \\
&=
\tau^{n/2}\tau^{1/2}\tau^{\alpha_0/2-1}\exp[-\tau \beta_0/2]
\exp\left[ - \frac{\tau}{2}\sum_i (x_i - \bar{x})^2 \right] 
\int
\exp\left[ - \frac{\tau}{2}\left(n (m - \bar{x})^2 + \tau_0(m - \mu_0)^2 \right)  \right] 
dm \\
&\propto
\tau^{n/2}\tau^{\alpha_0/2-1}\exp[-\tau \beta_0/2]
\exp\left[ - \frac{\tau}{2}\sum_i (x_i - \bar{x})^2 \right] 
\exp\left[ -\frac{ \tau n \tau_0}{2(n + \tau_0)} (\bar{x} - \mu_0)^2 \right] \\
&=
\tau^{(n+\alpha_0)/2-1}
\exp\left[ -\frac{\tau}{2}\left\{ \beta_0 + \sum_i (x_i - \bar{x})^2 + \frac{n \tau_0}{n + \tau_0} (\bar{x} - \mu_0)^2 \right\} \right]
\end{align*}
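The integral over $m$ above uses the same completing-the-square trick. With $\mu_n = (\mu_0\tau_0 + n\bar{x})/(n + \tau_0)$, the two quadratics combine as

$$
n (m - \bar{x})^2 + \tau_0(m - \mu_0)^2 = (n + \tau_0)(m - \mu_n)^2 + \frac{n\tau_0}{n + \tau_0}(\bar{x} - \mu_0)^2 ,
$$

and the remaining Gaussian integral is

$$
\int \exp\left[ - \frac{\tau (n + \tau_0)}{2}(m - \mu_n)^2 \right] dm = \sqrt{\frac{2\pi}{\tau (n + \tau_0)}} \propto \tau^{-1/2},
$$

which cancels the $\tau^{1/2}$ and leaves the factor $\exp\left[ -\frac{\tau n \tau_0}{2(n + \tau_0)} (\bar{x} - \mu_0)^2 \right]$.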

So the marginal posterior for $\tau$ is 

$$
\text{Gamma}\left(
\frac{n+\alpha_0}{2}, 
\frac{\beta_0 + \sum_i (x_i - \bar{x})^2 + \frac{ n \tau_0}{(n + \tau_0)} (\bar{x} - \mu_0)^2 }{2}
\right)
$$
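A numerical sanity check of this result, as a sketch with made-up data and arbitrary hyperparameters: integrate $m$ out of the unnormalized joint with `scipy.integrate.quad` at several values of $\tau$ and divide by the claimed gamma density. If the derivation is right, every ratio equals the same constant (the marginal likelihood $p(x_1, \ldots, x_n)$). A small $n$ keeps the raw likelihood away from underflow, and the helper names `joint` and `integrate_out_m` are hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma, norm

rng = np.random.default_rng(6)                    # illustrative data and seed
x = rng.normal(loc=1.0, scale=2.0, size=10)       # small n so the raw likelihood does not underflow
n, xbar = x.shape[0], x.mean()
mu_0, tau_0, alpha_0, beta_0 = 0.0, 2.0, 3.0, 4.0  # arbitrary prior hyperparameters

def joint(m, tau):
    """Unnormalized p(x | m, tau) p(m | tau) p(tau); the gamma has shape alpha_0/2, rate beta_0/2."""
    like = np.prod(norm.pdf(x, loc=m, scale=1 / np.sqrt(tau)))
    prior_m = norm.pdf(m, loc=mu_0, scale=1 / np.sqrt(tau * tau_0))
    prior_tau = gamma.pdf(tau, a=alpha_0 / 2, scale=2 / beta_0)
    return like * prior_m * prior_tau

def integrate_out_m(tau):
    """Integrate m out numerically, over +/- 15 conditional sds around the conditional mean."""
    center = (mu_0 * tau_0 + n * xbar) / (n + tau_0)
    half_width = 15 / np.sqrt(tau * (n + tau_0))
    value, _ = quad(joint, center - half_width, center + half_width,
                    args=(tau,), epsabs=0, epsrel=1e-10)
    return value

# Claimed marginal posterior: Gamma(shape (n + alpha_0)/2, rate beta_n/2)
beta_n = beta_0 + np.sum((x - xbar) ** 2) + n * tau_0 / (n + tau_0) * (xbar - mu_0) ** 2
claimed = gamma(a=(n + alpha_0) / 2, scale=2 / beta_n)

# If the derivation is right, integral / claimed density is the same for every tau
taus = [0.1, 0.25, 0.5, 1.0]
ratios = np.array([integrate_out_m(tau) / claimed.pdf(tau) for tau in taus])
print(ratios / ratios[0])
```

`epsabs=0` matters here: the integrals are tiny numbers, and the default absolute tolerance in `quad` would otherwise let it stop before the relative error is small.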

The joint posterior is the product of these two functions.
