# LSE ST451: Bayesian Machine Learning
## Author: Kostas Kalogeropoulos

## Week 5: Variational Bayes

Topics covered 
 - Mean field approximation
 - Automatic Differentiation Variational Inference (ADVI) in RStan

The mean field approximation will be illustrated using basic Python libraries. For ADVI we will use RStan via RStudio. PyStan can also be used, but it has not been tested as much.

In [26]:
import pandas as pd
import numpy as np

### Mean Field approximation 

We will first simulate $100$ independent observations from the model 

$$
y\sim N(\mu, \tau^{-1})
$$

with $\mu=3$ and $\tau^{-1}=0.5$.

Then we will treat $\mu$ and $\tau^{-1}$ as unknown and use the mean field approximation algorithm presented in the lecture to estimate them.
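
For reference, the coordinate updates used in the code that follows can be written out as below. This is only a sketch: it assumes the conjugate prior suggested by the hyperparameters `mu0`, `lam0`, `a0` and `b0` in the code, namely $\mu|\tau\sim N(\mu_0,(\lambda_0\tau)^{-1})$ and $\tau\sim$ Gamma($a_0,b_0$), together with $q(\mu)=N(\mu_f,\tau_f^{-1})$ and $q(\tau)=$ Gamma($a_f,b_f$). The lecture notes remain the authoritative derivation.

$$
\begin{aligned}
\mu_f &= \frac{\lambda_0\mu_0+\sum_i y_i}{\lambda_0+n}, \qquad \tau_f = (\lambda_0+n)\,\frac{a_f}{b_f},\\
a_f &= a_0+\frac{n+1}{2},\\
b_f &= b_0+\frac{1}{2}\Big(\sum_i y_i^2-2\mu_f\sum_i y_i+n\,\mathbb{E}[\mu^2]\Big)+\frac{\lambda_0}{2}\Big(\mu_0^2-2\mu_f\mu_0+\mathbb{E}[\mu^2]\Big),
\end{aligned}
$$

where $\mathbb{E}[\mu^2]=\mu_f^2+1/\tau_f$ and $\mathbb{E}[\tau]=a_f/b_f$.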

#### Simulate Data

In [27]:
#Set parameters and simulate data
n = 100
mu = 3
tau = 2
std = np.sqrt(1/tau)
y = mu + std*np.random.randn(n)

#Set prior hyperparameters
mu0 = 0
lam0 = 1 #unit information prior
a0 = 0.001
b0 = 0.001


# get sufficient stats
Sy = np.sum(y)
Sy2 = np.sum(y**2)
m02 = mu0**2

#### Run the algorithm

In [28]:
#initialise parameters
muf = 0
tauf = 1
af = 1
bf = 1

#algorithmic parameters
maxiter = 1000
tol = 0.0000001

#objects to store the values of the parameters to be optimised.
Thetas = np.ones((maxiter+1,4))   #one extra row: row 0 holds the initial values
Thetas[0,] = np.array([muf,tauf,af,bf])

#main while loop
i = 0
diff = 1 
while (i<maxiter) and (diff>tol):
    i = i+1
    af = a0+(n+1)/2
    Emu = muf   #E(mu)
    Emu2 = (1/tauf)+muf**2   #E(mu^2)
    bf = b0+0.5*(Sy2-2*muf*Sy+n*Emu2)+0.5*lam0*(m02 - 2*muf*mu0 + Emu2)
    tauf = (lam0+n)*af/bf   #precision of q(mu), i.e. (lam0+n)*E(tau) with E(tau)=af/bf
    muf = (lam0*mu0+Sy)/(lam0+n)
    Thetas[i,] = np.array([muf,tauf,af,bf])
    dThetas = (Thetas[i,]-Thetas[i-1,])**2
    diff = np.max(dThetas)

#summarise output
muf = Thetas[i,0]
tauf = Thetas[i,1]
af = Thetas[i,2]
bf = Thetas[i,3]
print('Converged at ',i,' iterations')
#columns: mean of mu, mean of tau (af/bf), precision of mu; second row holds the MLE counterparts
results = np.array([[muf,af/bf,tauf],[np.mean(y),1/np.var(y),n/np.var(y)]])
col = ['muf','tauf','tau muf']
ind = ['VB','MLE']
results = pd.DataFrame(results,columns = col,index=ind)
results


Converged at  6  iterations


|     | muf | tauf | tau muf |
|-----|-----|------|---------|
| VB  | 3.008337 | 1.779627 | 179.742336 |
| MLE | 3.038421 | 2.125395 | 212.5395 |
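
As a sanity check on the VB output, the approximation can be compared with the exact posterior, which is available in closed form for this model. The sketch below assumes the conjugate Normal-Gamma prior $\mu|\tau\sim N(\mu_0,(\lambda_0\tau)^{-1})$, $\tau\sim$ Gamma($a_0,b_0$) implied by the hyperparameters above. The helper names `mean_field_normal` and `exact_normal_gamma`, the tolerance and the seed are illustrative choices and are not part of the original notebook.

```python
import numpy as np

def mean_field_normal(y, mu0=0.0, lam0=1.0, a0=0.001, b0=0.001,
                      maxiter=1000, tol=1e-10):
    """Mean field updates for y ~ N(mu, 1/tau); returns (muf, tauf, af, bf)."""
    n = len(y)
    Sy, Sy2 = np.sum(y), np.sum(y**2)
    muf = (lam0*mu0 + Sy)/(lam0 + n)   # mean of q(mu), does not change across iterations
    af = a0 + (n + 1)/2                # shape of q(tau), does not change across iterations
    bf = 1.0                           # arbitrary starting value for the rate of q(tau)
    for _ in range(maxiter):
        tauf = (lam0 + n)*af/bf        # precision of q(mu)
        Emu2 = muf**2 + 1/tauf         # E(mu^2) under q(mu)
        bf_new = (b0 + 0.5*(Sy2 - 2*muf*Sy + n*Emu2)
                  + 0.5*lam0*(mu0**2 - 2*muf*mu0 + Emu2))
        converged = abs(bf_new - bf) < tol
        bf = bf_new
        if converged:
            break
    tauf = (lam0 + n)*af/bf            # precision of q(mu) at the final bf
    return muf, tauf, af, bf

def exact_normal_gamma(y, mu0=0.0, lam0=1.0, a0=0.001, b0=0.001):
    """Exact Normal-Gamma posterior parameters (mu_n, lam_n, a_n, b_n)."""
    n = len(y)
    ybar = np.mean(y)
    mu_n = (lam0*mu0 + n*ybar)/(lam0 + n)
    lam_n = lam0 + n
    a_n = a0 + n/2
    b_n = (b0 + 0.5*np.sum((y - ybar)**2)
           + 0.5*lam0*n*(ybar - mu0)**2/(lam0 + n))
    return mu_n, lam_n, a_n, b_n

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
y_check = 3 + np.sqrt(1/2)*rng.standard_normal(100)

muf, tauf, af, bf = mean_field_normal(y_check)
mu_n, lam_n, a_n, b_n = exact_normal_gamma(y_check)

print('E(mu):  VB', muf, ' exact', mu_n)
print('E(tau): VB', af/bf, ' exact', a_n/b_n)
```

Under these assumptions the posterior means of $\mu$ and $\tau$ should agree closely between the two. The approximation differs from the exact posterior in shape: $q(\mu)$ is Normal, whereas the exact marginal posterior of $\mu$ is a Student-t.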


### Activity 1

Let $y=(y_1, \dots, y_n)$ be independent Poisson($\lambda$) observations. Assume that $\lambda$ follows the Gamma($2,\beta$) distribution (shape $2$, rate $\beta$), where $\beta$ follows the Exponential($1$) distribution. The aim is to draw inference from the posterior $\pi(\theta|y)$, where $\theta=(\lambda, \beta)$. 

#### Simulate data

In [36]:
#Set parameters and simulate data
n = 100
np.random.seed(1)
beta_true = np.random.exponential(1,1)
lambda_true  = np.random.gamma(2,1/beta_true,1)
print('beta value: ',beta_true,' lambda value: ',lambda_true)
y = np.random.poisson(lambda_true,n)

# get sufficient stats
sy = np.sum(y)
print('ybar: ',sy/n)

beta value:  [0.53960584]  lambda value:  [1.53955169]
ybar:  1.41


The variational Bayes algorithm **approximates** $\pi(\theta|y)$ using the mean field approximation 

$$
q(\theta|y, \phi)=q(\lambda|y, \phi)q(\beta|y, \phi)
$$

It can be shown (see exam paper of 2019, question 2a) that such an algorithm may consist of the following steps:

 1. Initialise $q(\lambda)$ to be the Gamma($a_{\lambda},b_{\lambda}$) distribution and $q(\beta)$ to be the Gamma($a_{\beta},b_{\beta}$) distribution, setting $$a_{\lambda}=2+\sum_iy_i,\;\; b_{\lambda}=b_{\lambda}^0,\;\;\;a_{\beta}=3,\;\text{ and }\;\;b_{\beta}=b_{\beta}^0. $$
 2. Iteratively update $b_{\lambda}$ and $b_{\beta}$ until the parameters or the ELBO converge. At iteration $i$:
 
    a. Set $$b_{\lambda}^{i}=n+\mathbb{E}_{q(\beta)}[\beta]=n+3/b_{\beta}^{i-1}$$
    
    b. Set $$b_{\beta}^{i}=1+\mathbb{E}_{q(\lambda)}[\lambda]=1+(2+\sum_iy_i)/b_{\lambda}^{i}$$
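
To see where these updates come from, a brief sketch of the standard mean field argument is given here: each factor is proportional to the exponentiated expected log joint density, with the expectation taken over the other factor. It is a summary under the model stated above, not the full exam solution.

$$
\begin{aligned}
\log \pi(y,\lambda,\beta) &= \Big(1+\sum_i y_i\Big)\log\lambda-(n+\beta)\lambda+2\log\beta-\beta+\text{const},\\
\log q(\lambda) &= \Big(1+\sum_i y_i\Big)\log\lambda-\big(n+\mathbb{E}_{q(\beta)}[\beta]\big)\lambda+\text{const},\\
\log q(\beta) &= 2\log\beta-\big(1+\mathbb{E}_{q(\lambda)}[\lambda]\big)\beta+\text{const}.
\end{aligned}
$$

These are the log kernels of a Gamma($2+\sum_iy_i,\; n+\mathbb{E}_{q(\beta)}[\beta]$) and a Gamma($3,\; 1+\mathbb{E}_{q(\lambda)}[\lambda]$) density respectively, which gives the shape parameters in step 1 and the rate updates in step 2.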


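**Task:** Code the above algorithm and fit it to the simulated data. Check your answer by comparing the resulting estimate of $\lambda$, for example the mean of $q(\lambda)$, with the sample mean and the value of $\lambda$ used in the simulation.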

Put your code below
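
One possible solution sketch is given below. It re-simulates the data with the same seed as above so that it runs on its own. The starting values $b_{\lambda}^0=b_{\beta}^0=1$, the tolerance, and the function name `vb_poisson_gamma` are arbitrary illustrative choices, and convergence is monitored on the parameters, not on the ELBO.

```python
import numpy as np

def vb_poisson_gamma(y, b_lambda0=1.0, b_beta0=1.0, maxiter=1000, tol=1e-10):
    """Mean field VB for y_i ~ Poisson(lambda), lambda ~ Gamma(2, beta), beta ~ Exp(1).

    Returns (a_lambda, b_lambda, a_beta, b_beta, number of iterations used).
    """
    n = len(y)
    a_lambda = 2 + np.sum(y)   # shape of q(lambda), fixed by step 1
    a_beta = 3                 # shape of q(beta), fixed by step 1
    b_lambda, b_beta = b_lambda0, b_beta0
    for it in range(1, maxiter + 1):
        b_lambda_new = n + a_beta/b_beta          # step 2a: n + E[beta]
        b_beta_new = 1 + a_lambda/b_lambda_new    # step 2b: 1 + E[lambda]
        # largest absolute change in the two rate parameters
        diff = max(abs(b_lambda_new - b_lambda), abs(b_beta_new - b_beta))
        b_lambda, b_beta = b_lambda_new, b_beta_new
        if diff < tol:
            break
    return a_lambda, b_lambda, a_beta, b_beta, it

# Re-simulate the data as in the cell above (same seed and order of draws)
np.random.seed(1)
n = 100
beta_true = np.random.exponential(1, 1)
lambda_true = np.random.gamma(2, 1/beta_true, 1)
y = np.random.poisson(lambda_true, n)

a_lam, b_lam, a_bet, b_bet, iters = vb_poisson_gamma(y)
print('Converged at', iters, 'iterations')
print('VB E(lambda):', a_lam/b_lam, ' ybar:', np.mean(y))
print('VB E(beta):  ', a_bet/b_bet)
```

The mean of $q(\lambda)$ is $a_{\lambda}/b_{\lambda}$, so it can be compared directly with the sample mean printed alongside it.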