Topic Modeling using Latent Dirichlet Allocation
A parallel implementation of Stochastic Collapsed Variational Bayesian inference for LDA (SCVB0) in C++ using OpenMP.
This is a parallel implementation of the SCVB0 algorithm proposed by James Foulds et al. in the paper "Stochastic collapsed variational Bayesian inference for latent Dirichlet allocation". All notation used here follows that paper.
We used the New York Times dataset available from the UCI Machine Learning Repository.
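For reference, the UCI bag-of-words docword files start with three header lines (number of documents D, vocabulary size W, number of non-zero entries NNZ), followed by one "docID wordID count" triple per line with 1-based IDs. The sketch below shows one way such a file could be read. The names Entry, Corpus and readDocword are hypothetical and are not taken from this repository.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical types for illustration only.
struct Entry {
    std::size_t doc;    // 0-based document index
    std::size_t word;   // 0-based vocabulary index
    std::size_t count;  // occurrences of the word in the document
};

struct Corpus {
    std::size_t numDocs = 0;
    std::size_t vocabSize = 0;
    std::vector<Entry> entries;
};

// Reads the D / W / NNZ header and then NNZ "docID wordID count" triples.
// Returns false if the header is missing, an ID is 0, or fewer than NNZ
// triples could be read.
bool readDocword(std::istream& in, Corpus& corpus) {
    std::size_t nnz = 0;
    if (!(in >> corpus.numDocs >> corpus.vocabSize >> nnz)) return false;
    corpus.entries.clear();
    corpus.entries.reserve(nnz);
    Entry e{};
    while (corpus.entries.size() < nnz && (in >> e.doc >> e.word >> e.count)) {
        if (e.doc == 0 || e.word == 0) return false;  // IDs in the file are 1-based
        --e.doc;   // convert to 0-based indices
        --e.word;
        corpus.entries.push_back(e);
    }
    return corpus.entries.size() == nnz;
}
```

The function takes a std::istream, so it can be fed from a std::ifstream opened on docword.txt or from a std::istringstream when experimenting.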
The dataset is divided into minibatches of 100 documents each.
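The minibatch split can be sketched as a list of half-open document index ranges. This is a minimal illustration, assuming documents are numbered contiguously from 0. The helper name makeMinibatches is hypothetical, and the last minibatch may be smaller than 100 when the number of documents is not a multiple of the batch size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns [begin, end) document index ranges,
// each covering at most batchSize documents.
std::vector<std::pair<std::size_t, std::size_t>>
makeMinibatches(std::size_t numDocs, std::size_t batchSize = 100) {
    assert(batchSize > 0);
    std::vector<std::pair<std::size_t, std::size_t>> batches;
    for (std::size_t begin = 0; begin < numDocs; begin += batchSize) {
        // Clamp the end so the final batch does not run past the corpus.
        batches.emplace_back(begin, std::min(begin + batchSize, numDocs));
    }
    return batches;
}
```

For example, 250 documents yield the ranges [0, 100), [100, 200) and [200, 250).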
We used OpenMP to parallelize the algorithm. The minibatches are divided among the available processors and processed in parallel. The results are accumulated in the global matrices nPhi, nTheta and nZ.
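The general pattern of distributing minibatches across threads and merging the results into a shared structure can be sketched as follows. This is not the code from this repository: processMinibatch is a hypothetical stand-in for the per-minibatch SCVB0 update, only a one-dimensional nZ is shown, and merging inside an omp critical section is an assumption about how the shared updates could be synchronized.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-minibatch update. Here every
// minibatch simply contributes (batchIndex + 1) to each topic count.
std::vector<double> processMinibatch(std::size_t batchIndex, std::size_t numTopics) {
    return std::vector<double>(numTopics, static_cast<double>(batchIndex + 1));
}

// Processes all minibatches in parallel and accumulates into a shared nZ.
std::vector<double> runParallel(std::size_t numBatches, std::size_t numTopics) {
    std::vector<double> nZ(numTopics, 0.0);  // shared topic counts
    const long n = static_cast<long>(numBatches);

    // Loop iterations (minibatches) are split among the OpenMP threads.
    #pragma omp parallel for schedule(static)
    for (long b = 0; b < n; ++b) {
        // Thread-local result for this minibatch.
        std::vector<double> local =
            processMinibatch(static_cast<std::size_t>(b), numTopics);

        // Only one thread at a time merges into the shared vector.
        #pragma omp critical
        {
            for (std::size_t k = 0; k < numTopics; ++k) {
                nZ[k] += local[k];
            }
        }
    }
    return nZ;
}
```

With GCC or Clang the pragmas take effect when compiling with -fopenmp. Without that flag they are ignored and the loop runs serially with the same result.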
We analyzed perplexity convergence on the KOS and NIPS datasets, which are available on the same page as the NYT dataset.
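As a reminder of what is being measured, perplexity is commonly defined as exp(-L / N), where L is the total log-likelihood of the tokens and N is the total token count. For LDA, the probability of word w in document d is the sum over topics k of theta[d][k] * phi[k][w]. The sketch below uses that standard definition. It assumes theta and phi are already normalized probability estimates (for instance derived from nTheta and nPhi), and the names Token and perplexity are hypothetical. It may differ from how this repository computes perplexity.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical record: word `word` occurs `count` times in document `doc`.
struct Token {
    std::size_t doc;
    std::size_t word;
    double count;
};

// theta: D x K document-topic probabilities (each row sums to 1).
// phi:   K x W topic-word probabilities (each row sums to 1).
// Assumes `tokens` is non-empty and every token has non-zero probability.
double perplexity(const std::vector<Token>& tokens,
                  const std::vector<std::vector<double>>& theta,
                  const std::vector<std::vector<double>>& phi) {
    double logLik = 0.0;
    double total = 0.0;
    for (const Token& t : tokens) {
        // p(w | d) = sum_k theta[d][k] * phi[k][w]
        double p = 0.0;
        for (std::size_t k = 0; k < phi.size(); ++k) {
            p += theta[t.doc][k] * phi[k][t.word];
        }
        logLik += t.count * std::log(p);
        total += t.count;
    }
    return std::exp(-logLik / total);
}
```

A quick sanity check: with uniform phi over a vocabulary of W words, every token has probability 1/W and the perplexity equals W.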
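Use the following command to run the code, where docword.txt is the input data file, iterations is the number of iterations, and NumOfTopics is the number of topics: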
$ ./fastLDA docword.txt iterations NumOfTopics