This project describes a Monte Carlo EM (MCEM) method for obtaining maximum likelihood estimates (MLEs) of the model parameters.
In the E-step, draw K = 500 Gibbs samples with an embedded Metropolis-Hastings step, and discard the first 100 draws as burn-in.
Read article: Maximum Likelihood Algorithms for Generalized Linear Mixed Models (McCulloch 1997)
See Project Summary.pdf.
In this project, we consider a clustering problem. Suppose we have observed $n$ subjects, where each observation is a binary process, i.e. the response is $y_i = (y_{i1}, \dots, y_{iT})$ with $y_{it} \in \{0, 1\}$. Here $n$ is the number of subjects and $T$ is the length of observation. In general, $T$ might vary across subjects, and the time points may also differ; in this project, however, we simply assume that all subjects share a common time length and common time points. We also assume that these subjects belong to two clusters. For each cluster, the conditional expectation of the response variable is
$$E(y_{it} \mid \delta_i = k, b_i) = g^{-1}(\beta_k + b_i), \qquad k = 1, 2,$$
where $\delta_i$ is the cluster membership, and $\beta_k$ and $b_i$ are the fixed and random effects, respectively. The link function $g$ is given. In a typical clustering problem, $\delta_i$ is usually unknown, and hence we treat it as another random effect.
For the random effects, we assume that $b_i \mid \delta_i = k \sim N(0, \sigma_k^2)$ and $P(\delta_i = 1) = p$ (then $P(\delta_i = 2) = 1 - p$). The parameter to be estimated is $\theta = (\beta_1, \beta_2, \sigma_1^2, \sigma_2^2, p)$. Treating the random effects as missing data, one can write the complete-data likelihood function as
$$L(\theta; y, b, \delta) = \prod_{i=1}^{n} \left[ \prod_{t=1}^{T} f(y_{it} \mid b_i, \delta_i) \right] \phi\big(b_i; 0, \sigma_{\delta_i}^2\big)\, p^{\,\mathbb{1}\{\delta_i = 1\}} (1 - p)^{\,\mathbb{1}\{\delta_i = 2\}},$$
where $\phi(\cdot\,; 0, \sigma_k^2)$ is the density function of the normal distribution $N(0, \sigma_k^2)$, and $\mathbb{1}\{\delta_i = k\}$ is the dummy variable of the cluster membership. The random effects $b = (b_1, \dots, b_n)$ and $\delta = (\delta_1, \dots, \delta_n)$ are called the latent variables, and $(y, b, \delta)$ is called the complete data. The distribution of $\delta_i$ depends on $p$, and the distribution of $b_i$ depends on $\delta_i$, $\sigma_1^2$, and $\sigma_2^2$.
Generate 100 simulations. In each simulation, fix the number of subjects $n$ and the series length $T$. The true values of the parameters are given in the results table below.
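The simulation setup can be sketched as follows. This is a minimal sketch assuming a logit link and cluster-specific random-effect variances; the function name is our own, clusters are indexed 0/1 in code (corresponding to clusters 1/2 in the text), and the default parameter values are taken from the true values in the results table.

```python
import numpy as np

def simulate_data(n, T, beta=(1.0, 1.0), sigma2=(2.0, 10.0), p=0.6, seed=0):
    """Simulate one data set from the assumed two-cluster model:
    cluster indicator delta_i (0/1 in code), b_i | delta_i ~ N(0, sigma2[delta_i]),
    and P(y_it = 1 | delta_i, b_i) = logistic(beta[delta_i] + b_i)."""
    rng = np.random.default_rng(seed)
    delta = rng.binomial(1, p, size=n)                    # cluster memberships
    b = rng.normal(0.0, np.sqrt(np.take(sigma2, delta)))  # cluster-specific random effects
    eta = np.take(beta, delta) + b                        # linear predictor (constant in t)
    prob = 1.0 / (1.0 + np.exp(-eta))                     # assumed logit link
    y = rng.binomial(1, prob[:, None], size=(n, T))       # n x T binary responses
    return y, delta, b
```

Each simulated subject keeps a single random effect across all $T$ time points, matching the common-time-points assumption above.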
We estimated the augmented posterior likelihood and computed the expected log-augmented posterior at each iteration.
Given the current guess of the posterior mode, $\theta^{(t)}$, we use Monte Carlo to calculate the $Q$ function. In particular, the Monte Carlo E-step is given as
$$Q_K\big(\theta \mid \theta^{(t)}\big) = \frac{1}{K} \sum_{k=1}^{K} \log L\big(\theta;\, y, b^{(k)}, \delta^{(k)}\big),$$
where $(b^{(k)}, \delta^{(k)})$, $k = 1, \dots, K$, are draws from the conditional distribution of the latent variables given $y$ and $\theta^{(t)}$.
Since it is difficult to sample directly from the multivariate posterior distribution $f(b, \delta \mid y, \theta^{(t)})$, we use Gibbs sampling, a Markov chain Monte Carlo (MCMC) algorithm, to obtain a sequence of draws that approximates this multivariate distribution.
Then, supposing $(\delta_i^{(k-1)}, b_i^{(k-1)})$ is the $i$th component of the $(k-1)$th sample, we draw the $i$th component of the $k$th sample from the two full conditional distributions:
$$\delta_i^{(k)} \sim f\big(\delta_i \mid y_i,\, b_i^{(k-1)},\, \theta\big), \qquad b_i^{(k)} \sim f\big(b_i \mid y_i,\, \delta_i^{(k)},\, \theta\big).$$
Since $\delta_i$ follows a Bernoulli distribution a priori, its full conditional is again Bernoulli and can be sampled directly. However, for the variable $b_i$, it is still hard to sample directly from the posterior distribution because of the intractable integral in its denominator. We therefore use a Metropolis-Hastings step to generate these draws; more details are given in the following algorithm charts.
where the acceptance probability is
$$\alpha\big(b_i^{*}, b_i\big) = \min\left\{1,\; \frac{f\big(b_i^{*} \mid y, \delta, \theta\big)\, q\big(b_i \mid b_i^{*}\big)}{f\big(b_i \mid y, \delta, \theta\big)\, q\big(b_i^{*} \mid b_i\big)}\right\}.$$
If we choose the marginal (prior) distribution of $b_i$ as the proposal density $q$, the prior and proposal terms cancel, and the acceptance probability simplifies to
$$\alpha\big(b_i^{*}, b_i\big) = \min\left\{1,\; \prod_{t=1}^{T} \frac{f\big(y_{it} \mid b_i^{*}, \delta_i\big)}{f\big(y_{it} \mid b_i, \delta_i\big)}\right\}.$$
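The sampling scheme — an exact Bernoulli draw for each cluster membership, then a Metropolis-Hastings step for each random effect with the prior as proposal, so the acceptance ratio reduces to a likelihood ratio — can be sketched as one Gibbs sweep. This is a sketch under an assumed logit link, with clusters indexed 0/1 in code; `gibbs_sweep` and its signature are our own:

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def bernoulli_loglik(y_i, eta):
    """log prod_t f(y_it | b_i, delta_i) under an assumed logit link."""
    prob = logistic(eta)
    return np.sum(y_i * np.log(prob) + (1 - y_i) * np.log1p(-prob))

def gibbs_sweep(y, delta, b, beta, sigma2, p, rng):
    """One Gibbs sweep: delta_i from its Bernoulli full conditional,
    then b_i by Metropolis-Hastings with the prior N(0, sigma2[delta_i])
    as proposal, so the acceptance ratio is a likelihood ratio.
    Here p = P(delta_i = 1) for the 0/1 coding used in code."""
    n, _ = y.shape
    for i in range(n):
        # --- full conditional of delta_i (Bernoulli, sampled exactly) ---
        log_w = np.empty(2)
        for k in (0, 1):
            prior = np.log(p if k == 1 else 1.0 - p)
            b_dens = -0.5 * (np.log(2 * np.pi * sigma2[k]) + b[i] ** 2 / sigma2[k])
            log_w[k] = prior + b_dens + bernoulli_loglik(y[i], beta[k] + b[i])
        w1 = 1.0 / (1.0 + np.exp(log_w[0] - log_w[1]))
        delta[i] = rng.random() < w1
        # --- MH step for b_i, proposing from the prior ---
        k = delta[i]
        b_prop = rng.normal(0.0, np.sqrt(sigma2[k]))
        log_alpha = (bernoulli_loglik(y[i], beta[k] + b_prop)
                     - bernoulli_loglik(y[i], beta[k] + b[i]))
        if np.log(rng.random()) < log_alpha:
            b[i] = b_prop
    return delta, b
```

Running this sweep K times and discarding the burn-in draws yields the Monte Carlo sample used in the E-step.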
Then we use Monte Carlo integration to approximate the expectation of the complete-data log-likelihood.
Taking partial derivatives of the expected complete-data log-likelihood with respect to the parameters and setting them to zero, we obtain the maximum likelihood estimators
$$\hat{p} = \frac{1}{n} \sum_{i=1}^{n} E\big[\mathbb{1}\{\delta_i = 1\} \mid y\big], \qquad \hat{\sigma}_k^2 = \frac{\sum_{i} E\big[\mathbb{1}\{\delta_i = k\}\, b_i^2 \mid y\big]}{\sum_{i} E\big[\mathbb{1}\{\delta_i = k\} \mid y\big]},$$
with the expectations approximated by their Monte Carlo averages.
However, the MLE of $\beta_k$ is hard to obtain in closed form. We can use the Newton-Raphson algorithm to approximate it iteratively.
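A minimal Newton-Raphson sketch for a single cluster intercept, holding the sampled random effects at their drawn values (assuming a logit link, for which the score and Hessian are available in closed form; the function name is illustrative):

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def newton_raphson_beta(y, b, beta0=0.0, tol=1e-8, max_iter=50):
    """Newton-Raphson for a cluster intercept beta in the assumed logit
    model P(y_it = 1) = logistic(beta + b_i), with the random effects b
    held fixed at their sampled values."""
    n, T = y.shape
    beta = beta0
    for _ in range(max_iter):
        mu = logistic(beta + b)                 # one fitted probability per subject
        score = np.sum(y - mu[:, None])         # d/dbeta of the log-likelihood
        hess = -T * np.sum(mu * (1.0 - mu))     # second derivative (always < 0)
        step = score / hess
        beta -= step                            # Newton update
        if abs(step) < tol:
            break
    return beta
```

Because the log-likelihood is concave in the intercept, the iteration converges quickly from almost any starting value.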
Since the posterior predictions are expensive to estimate, we can apply acceleration methods to estimate them better; this helps calibrate the estimation process and saves time.
The convergence of EM algorithm is governed by the fraction of missing information. Thus, when the proportion of missing data is high, convergence can be quite slow. In this regard, Louis (1982) has proposed a device for accelerating the convergence of the EM algorithm.
The iterative maximization of the posterior predictive function can be written as the fixed-point map
$$\theta^{(t+1)} = M\big(\theta^{(t)}\big),$$
where the fixed point $\hat{\theta}$ is the mode of the augmented posterior predictive function. Expanding $M$ to first order about $\hat{\theta}$ gives
$$\theta^{(t+1)} - \hat{\theta} \approx J\,\big(\theta^{(t)} - \hat{\theta}\big),$$
where $J$ is the Jacobian of the map $M$. Louis also notes that this approximation is most useful in a neighborhood of the mode $\hat{\theta}$. So we have the accelerated update
$$\theta^{(t+1)} = \theta^{(t)} + (I - J)^{-1}\big(M(\theta^{(t)}) - \theta^{(t)}\big).$$
Combining Newton-Raphson with this acceleration result, we obtain the final update for the parameters.
- The EM algorithm is sensitive to the initial values of the parameters.
- For each parameter, we calculate its relative rate of change
$$r_j^{(t)} = \frac{\big|\theta_j^{(t+1)} - \theta_j^{(t)}\big|}{\big|\theta_j^{(t)}\big| + \epsilon},$$
where $\epsilon > 0$ ensures that the denominator is positive. Setting a threshold $\tau$, if $\max_j r_j^{(t)} < \tau$ then we consider the simulation to have converged.
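This stopping rule can be written as a small helper; the threshold and epsilon defaults below are placeholders, not the values used in the project:

```python
def converged(theta_new, theta_old, tau=1e-3, eps=1e-8):
    """Declare convergence when the largest relative parameter change
    max_j |new_j - old_j| / (|old_j| + eps) falls below the threshold tau."""
    rates = [abs(new - old) / (abs(old) + eps)
             for new, old in zip(theta_new, theta_old)]
    return max(rates) < tau
```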
Convergence is good: all parameters converge in fewer than 50 iterations, which takes about one minute.
We monitor the convergence of the algorithm by plotting the estimates against the iteration number $i$; the plot reveals random fluctuation about a horizontal line. We may then continue with a larger Monte Carlo sample size $m$ to decrease the system variability.
| Variables | True Value | Initial Value | Converged Value |
|---|---|---|---|
| $\beta_1$ | 1 | 0 | 0.9953680 |
| $\beta_2$ | 1 | 0 | 1.4076125 |
| $\sigma_1^2$ | 2 | 1 | 1.387342 |
| $\sigma_2^2$ | 10 | 5 | 9.132040 |
| $p$ | 0.6 | 0.8 | 0.480500 |
We conducted different numbers of simulations and evaluated the corresponding MSE. From the results, we conclude that MCEM obtains fair estimates under the initialization described above.
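The per-parameter MSE over repeated simulations can be computed with a generic helper (a sketch; the function name is our own):

```python
import numpy as np

def mse_by_parameter(estimates, truth):
    """Mean squared error of each parameter across repeated simulations.
    `estimates` is an (n_sims, n_params) array; `truth` has length n_params."""
    estimates = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return np.mean((estimates - truth) ** 2, axis=0)
```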
We show the parameter estimates from the first 100 experiments below.
- The MSEs of some parameters under MCEM are much larger than those of the others. This may be a result of the differences in the parameters' magnitudes.
- Louis Turbo EM (LTEM) accelerates the EM algorithm: LTEM attains better results given a good fixed initialization.