# Metropolis-Hastings 04b

## A mixture model, with JAGS

Example 04 is not a complete Bayesian analysis, since it only provides a target function to simulate from. Here we
study a model whose posterior pdf is analogous.
We consider the example in [John Kruschke's blog (Doing Bayesian Data Analysis): Mixture of Normal Distributions](http://doingbayesiandataanalysis.blogspot.com/2012/06/mixture-of-normal-distributions.html).

In [1]:
#install.packages("R2jags", repos= "https://cloud.r-project.org")
require(R2jags)

Loading required package: R2jags
Loading required package: rjags
Loading required package: coda
Linked to JAGS 4.3.0
Loaded modules: basemod,bugs

Attaching package: 'R2jags'

The following object is masked from 'package:coda':

    traceplot



## Data

Samples from two normal pdfs with equal standard deviation and different means. The model itself is written for a more general setting: a mixture of `Nclust` normal pdfs.

In [2]:
# Given parameters for data
trueM1<-100
N1<-200
trueM2<-145 # 145 for first example below; 130 for second example
N2<-200
trueSD<-15
effsz<-abs( trueM2 - trueM1 ) / trueSD
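The effect size `effsz` measures how far apart the two clusters are in units of the common standard deviation. As a quick sketch of why the two settings in the comment (145 and 130) differ, the code below relies on a standard fact about equal-weight mixtures of two normals with equal sd: the mixture density is bimodal only when the effect size exceeds 2. The helper `midCurvature` is hypothetical; it evaluates the second derivative of the mixture density at the midpoint between the means, which is positive (a dip) exactly in the bimodal case.

```r
# Effect sizes for the two settings mentioned above (trueM2 = 145 or 130)
trueM1 <- 100
trueSD <- 15
effsz <- abs(c(145, 130) - trueM1) / trueSD

# Hypothetical helper: curvature of the equal-weight, equal-sd two-normal mixture
# density at the midpoint between the means, with d = effect size:
#   f''(midpoint) = (d^2/4 - 1) * dnorm(d/2) / sigma^3
# Positive curvature means a dip at the midpoint, i.e. a bimodal density.
midCurvature <- function(d, sigma) (d^2 / 4 - 1) * dnorm(d / 2) / sigma^3
curv <- midCurvature(effsz, trueSD)
print(data.frame(trueM2 = c(145, 130), effsz = effsz, curvature = curv))
```

With `trueM2 = 130` the effect size is exactly 2, the boundary case: the mixture density has no dip between the clusters, which makes the second example harder to separate.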

In [3]:
# Generate random data from known parameter values:
set.seed(47405)
y1<-rnorm( N1 )
y1<-(y1-mean(y1))/sd(y1) * trueSD + trueM1
y2<-rnorm( N2 )
y2<-(y2-mean(y2))/sd(y2) * trueSD + trueM2
y<-c(y1,y2)
N<-length(y)
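The cell above does not use the raw normal draws directly: it standardizes them and then rescales, so each group's *sample* mean and sd equal `trueM1`/`trueM2` and `trueSD` exactly (up to floating point), not just in expectation. A small sketch comparing the two approaches for the first group (the name `raw` is introduced only for the comparison):

```r
# Standardize-then-rescale: the sample mean and sd become exactly the targets
set.seed(47405)
trueM1 <- 100; trueSD <- 15; N1 <- 200
z <- rnorm(N1)
raw <- z * trueSD + trueM1                      # plain draws: mean/sd only close to 100/15
y1 <- (z - mean(z)) / sd(z) * trueSD + trueM1   # same rescaling as in the cell above
print(c(raw_mean = mean(raw), raw_sd = sd(raw), y1_mean = mean(y1), y1_sd = sd(y1)))
```

This is why the posterior means in the summary below can be compared directly with 100, 145 and 15.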

In [4]:
# clust is a length-N vector holding the assignment of each sample to one of the clusters. In the general setting,
# each entry of clust is an integer between 1 and Nclust.
Nclust<-2

In [5]:
# Initial values in clust
clust<-rep(NA,N)
# Must have at least one data point with fixed assignment to each cluster,
# otherwise some clusters will end up empty:
clust[which.min(y)]<-1 # smallest value assigned to cluster 1
clust[which.max(y)]<-2 # highest value assigned to cluster 2

### Mixture implementation

The mixture is implemented as a convex combination:

$$
    Y=I\cdot Y_{1} + (1-I)\cdot Y_{2},
$$

where $Y_{1}\sim\operatorname{N}(\mu_{1},\sigma^{2})$, $Y_{2}\sim\operatorname{N}(\mu_{2},\sigma^{2})$, and
$I\sim\operatorname{Bernoulli}(p)$. 
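More generally, the indicator is replaced by a categorical (multivariate Bernoulli) assignment variable with values in $1,\dots,$`Nclust`, whose probabilities form the vector `pClust` of length `Nclust`; in the two-cluster case `pClust` $=(p,1-p)$.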
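To see that the convex combination and the categorical formulation describe the same distribution, here is a simulation sketch with illustrative parameter values (not the notebook data). $I=1$ selects $Y_1$ with probability $p$, which corresponds to cluster 1 with probability `pClust[1]` $=p$.

```r
# Simulate the mixture two ways and check they agree (illustrative values)
set.seed(1)
n <- 1e5; p <- 0.5; mu1 <- 100; mu2 <- 145; sigma <- 15

# 1. Convex combination Y = I*Y1 + (1-I)*Y2 with I ~ Bernoulli(p)
I <- rbinom(n, size = 1, prob = p)
Y1 <- rnorm(n, mu1, sigma)
Y2 <- rnorm(n, mu2, sigma)
Y <- I * Y1 + (1 - I) * Y2

# 2. Categorical form used in the JAGS model: clust ~ Categorical(pClust), pClust = (p, 1-p)
pClust <- c(p, 1 - p)
clust <- sample.int(2, n, replace = TRUE, prob = pClust)
Ycat <- rnorm(n, mean = c(mu1, mu2)[clust], sd = sigma)

# Both should have mean p*mu1 + (1-p)*mu2 = 122.5
# and variance sigma^2 + p*(1-p)*(mu1-mu2)^2 = 731.25
print(c(convex = mean(Y), categorical = mean(Ycat), exact = p * mu1 + (1 - p) * mu2))
print(c(convex = var(Y), categorical = var(Ycat)))
```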

The parameter $p$ (more generally, the vector `pClust`) is unknown, hence it must be given a prior. Here `pClust` is given its conjugate prior, a Dirichlet distribution whose parameter is a vector of ones of length `Nclust`: the natural non-informative prior, uniform on the probability simplex.

This vector of ones, `onesRepNclust`, must be passed to JAGS as part of the data list.
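A sketch of the Dirichlet prior and its conjugate update, assuming the standard construction of a Dirichlet draw from normalized Gamma variables (base R has no Dirichlet sampler; `rdirichlet` here is a hypothetical helper). With `Nclust = 2`, Dirichlet(1, 1) makes `pClust[1]` uniform on $(0,1)$. If all cluster labels were known, with counts $n_k$, the posterior would be Dirichlet$(1+n_k)$.

```r
# Hypothetical helper: n Dirichlet(alpha) draws via normalized Gamma(alpha_k, 1) variables
rdirichlet <- function(n, alpha) {
  g <- matrix(rgamma(n * length(alpha), shape = alpha, rate = 1),
              nrow = n, byrow = TRUE)
  g / rowSums(g)                            # each row sums to 1
}

set.seed(2)
Nclust <- 2
prior <- rdirichlet(1e5, rep(1, Nclust))    # parameter = onesRepNclust
# Uniform(0, 1) has mean 1/2 and variance 1/12
print(c(mean = mean(prior[, 1]), var = var(prior[, 1])))

# Conjugacy: with known counts n_k the posterior is Dirichlet(1 + n_k)
counts <- c(200, 200)                       # hypothetical: all 400 labels known
post <- rdirichlet(1e5, 1 + counts)
print(c(mean = mean(post[, 1]), sd = sd(post[, 1]), exact_sd = sqrt(0.25 / 403)))
```

With known labels the posterior sd of `pClust[1]` would be about 0.025; the larger value reported by JAGS in the summary below presumably reflects the uncertainty in the assignments themselves.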

In [6]:
mixture.data<-list(
    y = y,
    N = N,
    Nclust = Nclust,
    clust = clust,
    onesRepNclust=rep(1,Nclust)
    )

### Likelihood

The $i$-th observation, $1\leq i\leq N$, is assigned to the cluster $\alpha(i)$ and follows a normal distribution
with mean $\mu_{i}$ and common precision parameter $\tau$ (so that $\sigma=1/\sqrt{\tau}$).
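A minimal sketch of this likelihood in base R, for given assignments. Note the parameterization difference: JAGS `dnorm(mu, tau)` takes a precision, while R's `dnorm()` takes a standard deviation, so `sd = 1/sqrt(tau)`. The function name `loglik` and the toy data are hypothetical.

```r
# Complete-data log-likelihood of y given cluster labels, cluster means and precision tau
loglik <- function(y, clust, muOfClust, tau) {
  sum(dnorm(y, mean = muOfClust[clust], sd = 1 / sqrt(tau), log = TRUE))
}

y <- c(95, 104, 140, 151)        # toy data, not the notebook sample
muOfClust <- c(100, 145)
tau <- 1 / 15^2                  # precision corresponding to sd = 15
llGood <- loglik(y, c(1, 1, 2, 2), muOfClust, tau)
llSwap <- loglik(y, c(2, 2, 1, 1), muOfClust, tau)
print(c(good = llGood, swapped = llSwap))
```

Sensible assignments give a much higher log-likelihood than swapped ones, which is what drives the sampling of `clust`.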

The mean of each observation equals the mean of its cluster, $\mu_{i}=\mu_{\alpha(i)}$ (see the deterministic node `mu[i]` below).
The assignment $\alpha(i)$ (the value `clust[i]`) follows a categorical (multivariate Bernoulli) distribution, implemented with
the JAGS `dcat` distribution, which takes integer values from 1 to `Nclust` and has as parameter a length-`Nclust` vector `pClust` of _probabilities_, that is, non-negative values that need not be normalized: they are normalized internally.
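Conditionally on the other parameters, the probability that observation $i$ belongs to cluster $k$ is proportional to `pClust[k]` times the normal density of $y_i$ under cluster $k$; these are exactly the kind of unnormalized weights described above. A sketch with illustrative parameter values (close to the true ones, not taken from the fit); base R's `sample.int()` likewise accepts `prob` weights that do not sum to 1.

```r
# Full conditional of clust[i] given pClust, muOfClust and sigma (illustrative values)
pClust <- c(0.5, 0.5)
muOfClust <- c(100, 145)
sigma <- 15
yi <- 110
w <- pClust * dnorm(yi, mean = muOfClust, sd = sigma)   # unnormalized weights
probs <- w / sum(w)
print(probs)   # probabilities that y = 110 belongs to cluster 1 or 2

set.seed(3)
draws <- sample.int(length(w), 1e5, replace = TRUE, prob = w)   # prob need not sum to 1
print(mean(draws == 1))
```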

### Prior

The common precision parameter $\tau$ is given a $\operatorname{Gamma}(0.01,0.01)$ prior (shape 0.01, rate 0.01, in the JAGS `dgamma` parameterization).

Both cluster means $\mu_{1},\mu_{2}$ (`muOfClust`) are given normal priors with mean 0 and a very small precision, $10^{-10}$ (standard deviation $10^{5}$), essentially a non-informative prior.

The probability vector `pClust` of the assignment distribution is given the non-informative Dirichlet prior described above.
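Some arithmetic on these priors, as a sketch. JAGS `dgamma(r, lambda)` uses shape $r$ and rate $\lambda$, so $\operatorname{Gamma}(0.01,0.01)$ has mean $r/\lambda=1$ and variance $r/\lambda^2=100$. The prior on each cluster mean has standard deviation $1/\sqrt{10^{-10}}=10^{5}$, so over the range of the data it is practically flat.

```r
# Gamma(shape = a, rate = b) prior on tau: mean a/b, variance a/b^2
a <- 0.01; b <- 0.01
tauPrior <- c(mean = a / b, var = a / b^2)
print(tauPrior)

# dnorm(0, 1.0E-10) in JAGS: precision 1e-10, i.e. standard deviation 1e5
muPriorSD <- 1 / sqrt(1.0E-10)
print(muPriorSD)

# Ratio of prior densities at 100 and 145: essentially 1, i.e. a flat prior there
ratio <- dnorm(100, 0, muPriorSD) / dnorm(145, 0, muPriorSD)
print(ratio)
```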

In [15]:
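cat("model {
    # Likelihood:
    for (i in 1:N) {
        y[i] ~ dnorm(mu[i], tau)           # JAGS dnorm uses the precision tau
        mu[i] <- muOfClust[clust[i]]       # deterministic node: mean of the assigned cluster
        clust[i] ~ dcat(pClust[1:Nclust])  # cluster assignment
    }
    sigma <- 1/sqrt(tau)                   # common standard deviation, monitored below
    # Prior:
    tau ~ dgamma(0.01, 0.01)
    for (clustIdx in 1:Nclust) {
        muOfClust[clustIdx] ~ dnorm(0, 1.0E-10)
    }
    pClust[1:Nclust] ~ ddirch(onesRepNclust)
}", file="mixture.jags")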

In [16]:
mixture.m1<-jags(data=mixture.data, n.chains=3,
        parameters.to.save=c("muOfClust", "sigma","pClust"), 
        model.file="mixture.jags",n.iter=10000,n.burnin=2000)

Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 402
   Unobserved stochastic nodes: 402
   Total graph size: 1213

Initializing model
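The counts in the compilation messages, and the iteration labels in the summary below, can be reproduced with simple bookkeeping. One reading that matches the reported node counts: the observed nodes are the 400 `y[i]` plus the 2 fixed `clust[i]`; the unobserved ones are the 398 free `clust[i]`, the 2 `muOfClust[k]`, `tau`, and `pClust` (a single vector-valued node). For thinning, no `n.thin` was passed; assuming the R2jags default works out to `floor((n.iter - n.burnin)/1000)`, which matches the interval of 8 reported below, the kept iterations follow directly.

```r
# Bookkeeping behind "Graph information" (one reading that reproduces the counts)
N <- 400; Nclust <- 2
nFixed <- 2                                   # clust entries fixed by which.min / which.max
observed <- N + nFixed                        # y[1:N] plus the two fixed clust[i]
unobserved <- (N - nFixed) + Nclust + 1 + 1   # free clust[i], muOfClust[k], tau, pClust

# Thinning and kept iterations (assumed default for n.thin)
n.iter <- 10000; n.burnin <- 2000
n.thin <- max(1, floor((n.iter - n.burnin) / 1000))
kept <- seq(n.burnin + 1, n.iter, by = n.thin)
print(c(observed = observed, unobserved = unobserved, n.thin = n.thin,
        first = kept[1], last = kept[length(kept)], per_chain = length(kept)))
```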



In [17]:
mixture.m1.mcmc<-as.mcmc(mixture.m1)

In [18]:
summary(mixture.m1.mcmc)


Iterations = 2001:9993
Thinning interval = 8 
Number of chains = 3 
Sample size per chain = 1000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                  Mean       SD  Naive SE Time-series SE
deviance     3332.1419 62.81562 1.1468510       7.694330
muOfClust[1]  100.7004  2.92192 0.0533467       0.285230
muOfClust[2]  144.6245  2.71700 0.0496054       0.165274
pClust[1]       0.5052  0.04446 0.0008117       0.002618
pClust[2]       0.4948  0.04446 0.0008117       0.002618
sigma          15.5705  1.63905 0.0299249       0.136092

2. Quantiles for each variable:

                  2.5%       25%      50%       75%     97.5%
deviance     3278.8548 3306.1200 3322.866 3341.7785 3401.6707
muOfClust[1]   97.5479   99.4521  100.416  101.3689  103.7181
muOfClust[2]  141.3512  143.8692  144.874  145.8702  147.7786
pClust[1]       0.4375    0.4818    0.503    0.5248    0.5775
pClust[2]       0.4225    0.4752    0.497    0.5182    0.562
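For comparison, here is a minimal hand-written Gibbs sampler for the same model in base R. It is an illustrative sketch, not what JAGS does internally (JAGS selects its own sampler for each node), and the names `gibbs_mixture` and `rdirichlet1` are hypothetical. It cycles through the conjugate full conditionals: categorical for each `clust[i]`, Dirichlet$(1+n_k)$ for `pClust`, normal for each cluster mean, and $\operatorname{Gamma}(0.01+N/2,\ 0.01+SS/2)$ for $\tau$, where $SS$ is the within-cluster sum of squares. The two fixed labels guarantee $n_k\geq 1$, so no cluster mean is ever drawn from its extremely flat prior.

```r
set.seed(47405)                              # regenerate the notebook data
trueM1 <- 100; trueM2 <- 145; trueSD <- 15; N1 <- 200; N2 <- 200
y1 <- rnorm(N1); y1 <- (y1 - mean(y1)) / sd(y1) * trueSD + trueM1
y2 <- rnorm(N2); y2 <- (y2 - mean(y2)) / sd(y2) * trueSD + trueM2
y <- c(y1, y2)
Nclust <- 2
fixedIdx <- c(which.min(y), which.max(y))    # same anchoring as in the notebook
fixedClust <- c(1L, 2L)

rdirichlet1 <- function(alpha) {             # one Dirichlet draw via normalized Gammas
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

gibbs_mixture <- function(y, Nclust, fixedIdx, fixedClust, n.iter = 3000,
                          n.burnin = 1000, prec0 = 1e-10, a0 = 0.01, b0 = 0.01) {
  N <- length(y)
  mu <- as.numeric(quantile(y, probs = seq_len(Nclust) / (Nclust + 1)))
  tau <- 1 / var(y)
  p <- rep(1 / Nclust, Nclust)
  U <- upper.tri(diag(Nclust), diag = TRUE) * 1   # w %*% U = row-wise cumulative sums
  out <- matrix(NA_real_, n.iter - n.burnin, 2 * Nclust + 1)
  colnames(out) <- c(paste0("muOfClust[", 1:Nclust, "]"), "sigma",
                     paste0("pClust[", 1:Nclust, "]"))
  for (it in seq_len(n.iter)) {
    # clust[i] | rest: categorical with weights pClust[k] * N(y[i]; mu[k], 1/tau)
    w <- sapply(seq_len(Nclust), function(k) p[k] * dnorm(y, mu[k], 1 / sqrt(tau)))
    cw <- w %*% U
    clust <- 1L + rowSums(cw < runif(N) * cw[, Nclust])   # inverse-CDF draw per row
    clust[fixedIdx] <- fixedClust
    nk <- tabulate(clust, nbins = Nclust)
    # pClust | clust ~ Dirichlet(1 + n_k)
    p <- rdirichlet1(1 + nk)
    # muOfClust[k] | rest ~ Normal (prior mean 0, prior precision prec0)
    sk <- vapply(seq_len(Nclust), function(k) sum(y[clust == k]), numeric(1))
    postPrec <- prec0 + tau * nk
    mu <- rnorm(Nclust, mean = tau * sk / postPrec, sd = 1 / sqrt(postPrec))
    # tau | rest ~ Gamma(a0 + N/2, b0 + SS/2)
    ss <- sum((y - mu[clust])^2)
    tau <- rgamma(1, shape = a0 + N / 2, rate = b0 + ss / 2)
    if (it > n.burnin) out[it - n.burnin, ] <- c(mu, 1 / sqrt(tau), p)
  }
  out
}

draws <- gibbs_mixture(y, Nclust, fixedIdx, fixedClust)
print(round(colMeans(draws), 3))
```

The posterior means from this sketch should land close to the JAGS summary above; exact agreement is not expected, since the random streams and samplers differ.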