Fugue 1.0 Roadmap
The first version of Fugue will support a full-fledged implementation of latent Dirichlet allocation (LDA) [1], using collapsed Gibbs sampling [2] as the inference algorithm.
Here, we list the major functionalities:
- Train and test using collapsed Gibbs sampling
- Use the "estimate theta" method to compute perplexity on test documents (FG-M-6) [Section 5 of 6]
- Average results over multiple MCMC chains (FG-M-6) [7]
- Slice sampling for hyperparameter tuning (FG-M-7) [4, 5]
- Optimization for hyperparameter tuning (FG-M-1) [3]
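As a rough illustration of the train/test item above, here is a minimal collapsed Gibbs sampler for LDA using the standard Griffiths–Steyvers full conditional. All names and defaults are illustrative sketches, not Fugue's actual API:

```python
import random

def gibbs_lda(docs, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of word-id lists."""
    rng = random.Random(seed)
    V = max(w for doc in docs for w in doc) + 1   # vocabulary size
    ndk = [[0] * K for _ in docs]                 # document-topic counts
    nkw = [[0] * V for _ in range(K)]             # topic-word counts
    nk = [0] * K                                  # topic totals
    z = []                                        # topic assignment per token
    for d, doc in enumerate(docs):                # random initialisation
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                       # remove current assignment
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # full conditional p(z_i = j | z_-i, w), up to a constant
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta)
                           / (nk[j] + V * beta) for j in range(K)]
                r = rng.random() * sum(weights)
                k, acc = 0, weights[0]
                while acc < r:
                    k += 1; acc += weights[k]
                z[d][i] = k                       # add the new assignment
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return ndk, nkw, nk
```

The three count tables are all the sampler needs to keep in memory; the smoothed topic-word distribution phi and document-topic distribution theta can be read off them after the burn-in.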
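For the "estimate theta" perplexity item, a common approach is to hold the trained topic-word distribution phi fixed, Gibbs-sample topic assignments for each held-out document, read theta off the resulting counts, and exponentiate the negative average per-token log-likelihood. A minimal sketch, assuming phi is already beta-smoothed so no entry is zero:

```python
import math
import random

def estimate_theta_perplexity(test_docs, phi, alpha=0.1, iters=100, seed=0):
    """Perplexity via the "estimate theta" method: phi stays fixed,
    theta is estimated per document by Gibbs sampling its tokens."""
    rng = random.Random(seed)
    K = len(phi)
    log_lik, n_tokens = 0.0, 0
    for doc in test_docs:
        ndk = [0] * K                             # topic counts for this doc
        z = []
        for w in doc:                             # random initialisation
            k = rng.randrange(K)
            z.append(k); ndk[k] += 1
        for _ in range(iters):
            for i, w in enumerate(doc):
                k = z[i]; ndk[k] -= 1
                # conditional uses the fixed, trained phi
                weights = [(ndk[j] + alpha) * phi[j][w] for j in range(K)]
                r = rng.random() * sum(weights)
                k, acc = 0, weights[0]
                while acc < r:
                    k += 1; acc += weights[k]
                z[i] = k; ndk[k] += 1
        theta = [(ndk[j] + alpha) / (len(doc) + K * alpha) for j in range(K)]
        for w in doc:
            log_lik += math.log(sum(theta[j] * phi[j][w] for j in range(K)))
        n_tokens += len(doc)
    return math.exp(-log_lik / n_tokens)
```

Running this with several seeds and averaging the per-token probabilities before taking logs gives the chain-averaging behaviour the Nguyen et al. reference argues for.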
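For slice-sampling hyperparameters, the usual building block is a univariate slice sampler with Neal-style stepping-out and shrinkage. The sketch below is generic and illustrative; for a positive hyperparameter such as alpha, one would typically sample log(alpha) and add the Jacobian term to the log density:

```python
import math
import random

def slice_sample(log_f, x0, width=1.0, iters=50, seed=0):
    """Univariate slice sampling with stepping-out and shrinkage;
    log_f is an unnormalised log density over the real line."""
    rng = random.Random(seed)
    x = x0
    for _ in range(iters):
        log_y = log_f(x) + math.log(rng.random())   # vertical slice level
        left = x - rng.random() * width             # step out an interval
        right = left + width
        while log_f(left) > log_y:
            left -= width
        while log_f(right) > log_y:
            right += width
        while True:                                 # shrink until accepted
            x1 = left + rng.random() * (right - left)
            if log_f(x1) > log_y:
                x = x1
                break
            if x1 < x:
                left = x1
            else:
                right = x1
    return x
```

Unlike Metropolis-Hastings, this needs no step-size tuning beyond a rough `width`, which is why it is popular for one-dimensional hyperparameter updates.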
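For optimization-based tuning, the standard choice is Minka's fixed-point update for a symmetric Dirichlet concentration, driven by the per-document topic counts. This is an illustrative sketch (including a hand-rolled digamma, to stay dependency-free), not Fugue's actual code:

```python
import math

def digamma(x):
    """Psi(x) for x > 0, via the recurrence psi(x) = psi(x+1) - 1/x
    and an asymptotic series once x is large enough."""
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1/12.0 - f * (1/120.0 - f / 252.0))

def minka_alpha_update(ndk, alpha=1.0, iters=100):
    """Fixed-point iteration for a symmetric Dirichlet concentration
    alpha, given per-document topic counts ndk."""
    K = len(ndk[0])
    nd = [sum(row) for row in ndk]                # document lengths
    for _ in range(iters):
        num = sum(digamma(ndk[d][k] + alpha) - digamma(alpha)
                  for d in range(len(ndk)) for k in range(K))
        den = K * sum(digamma(nd[d] + K * alpha) - digamma(K * alpha)
                      for d in range(len(ndk)))
        alpha *= num / den
    return alpha
```

The same update, applied to topic-word counts, tunes beta; both numerator and denominator are positive, so alpha stays in its valid range throughout the iteration.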
References:
1. David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
2. Thomas L. Griffiths and Mark Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 101(Suppl. 1):5228–5235, 2004.
3. Thomas P. Minka. Estimating a Dirichlet distribution. Technical report, 2000.
4. Hanna Wallach's course notes.
5. Chapter 2 of Hanna Wallach's dissertation.
6. Hanna M. Wallach, Iain Murray, Ruslan Salakhutdinov, and David Mimno. Evaluation methods for topic models. ICML 2009.
7. Viet-An Nguyen, Jordan L. Boyd-Graber, and Philip Resnik. Sometimes average is best: The importance of averaging for prediction using MCMC inference in topic modeling. EMNLP 2014: 1752–1757.