Fugue 1.0 Roadmap

Overview

The first version of Fugue will provide a full-fledged implementation of latent Dirichlet allocation (LDA) [1], using collapsed Gibbs sampling [2] as the inference algorithm.
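
For reference, collapsed Gibbs sampling integrates out the document-topic and topic-word distributions and resamples each token's topic assignment z_i from its full conditional (following [2]; the superscript \neg i marks counts that exclude the current token):

    P(z_i = k \mid \mathbf{z}_{\neg i}, \mathbf{w})
      \;\propto\;
      \left( n_{d,k}^{\neg i} + \alpha \right)
      \frac{ n_{k,w_i}^{\neg i} + \beta }{ n_{k}^{\neg i} + V\beta }

Here n_{d,k} is the number of tokens in document d assigned to topic k, n_{k,w} the number of times word w is assigned to topic k, n_k = \sum_w n_{k,w}, V the vocabulary size, and \alpha, \beta the symmetric Dirichlet hyper-parameters.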

Release Date

June 2016.

Features

Here we list the major features:

Models

LDA

  • Training and testing with collapsed Gibbs sampling
  • Computing perplexity on test documents with the "estimated theta" method (FG-M-6) [6, Section 5]; see the sketch after this list
  • Averaging predictions over multiple MCMC chains (FG-M-6) [7]
  • Slice sampling for hyper-parameter tuning (FG-M-7) [4, 5]; see the sketch after this list
  • Optimization-based hyper-parameter tuning (FG-M-1) [3]
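
The "estimated theta" evaluation ([6], Section 5) holds the trained topic-word matrix fixed, estimates each test document's topic proportions by Gibbs sampling, and computes perplexity from the resulting per-token likelihoods. Below is a minimal illustrative sketch in Python; the function name and arguments (phi, alpha) are hypothetical, not Fugue's actual API:

    import numpy as np

    def perplexity_estimated_theta(test_docs, phi, alpha, n_iters=200, seed=0):
        # phi: K x V matrix of topic-word probabilities learned on training data.
        # alpha: symmetric Dirichlet hyper-parameter over document-topic proportions.
        K, V = phi.shape
        rng = np.random.default_rng(seed)
        log_lik, n_tokens = 0.0, 0
        for doc in test_docs:                      # doc is a list of word ids
            z = rng.integers(K, size=len(doc))     # random initial assignments
            n_k = np.bincount(z, minlength=K).astype(float)
            for _ in range(n_iters):               # Gibbs sweeps with phi held fixed
                for i, w in enumerate(doc):
                    n_k[z[i]] -= 1
                    p = (n_k + alpha) * phi[:, w]  # full conditional for z_i
                    z[i] = rng.choice(K, p=p / p.sum())
                    n_k[z[i]] += 1
            theta = (n_k + alpha) / (len(doc) + K * alpha)  # point estimate of theta
            log_lik += sum(np.log(theta @ phi[:, w]) for w in doc)
            n_tokens += len(doc)
        return np.exp(-log_lik / n_tokens)

Per [7], when several independent chains are run, the per-token probabilities (theta @ phi[:, w]) should be averaged across chains before taking logs, rather than averaging per-chain perplexity scores.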
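
For slice-sampling-based hyper-parameter tuning [4, 5], the core operation is a univariate slice-sampling update over the hyper-parameter's unnormalized log posterior given the current topic assignments. A sketch, assuming Neal's stepping-out and shrinkage procedures and a hypothetical step width w; in practice the concentration parameter is often sampled on the log scale so its support is unbounded:

    import math, random

    def slice_sample_step(log_f, x, w=1.0, max_steps=50):
        # One slice-sampling update for a scalar whose unnormalized log
        # density is log_f, using stepping-out followed by shrinkage.
        log_y = log_f(x) + math.log(random.random())  # log height of the slice
        left = x - w * random.random()                # randomly position the interval
        right = left + w
        for _ in range(max_steps):                    # step out to the left
            if log_f(left) <= log_y:
                break
            left -= w
        for _ in range(max_steps):                    # step out to the right
            if log_f(right) <= log_y:
                break
            right += w
        while True:                                   # shrink until a point is accepted
            x_new = left + (right - left) * random.random()
            if log_f(x_new) > log_y:
                return x_new
            if x_new < x:
                left = x_new
            else:
                right = x_new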

Utilities

  • Log-space sampling for the multinomial distribution (FG-M-2); see the sketch after this list
  • Multinomial tests (FG-M-4)
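
The roadmap does not say which log-space method FG-M-2 will use; one standard option, sketched below, is the Gumbel-max trick: perturb each unnormalized log weight with independent Gumbel(0, 1) noise and take the argmax, which is distributed exactly as a draw from the normalized multinomial, so the weights never need to be exponentiated:

    import math, random

    def sample_log_multinomial(log_weights):
        # Draw an index from a categorical distribution given unnormalized log
        # weights, without leaving log space (avoids underflow when weights are
        # products of many small probabilities, as in collapsed Gibbs updates).
        best, best_k = -math.inf, -1
        for k, lw in enumerate(log_weights):
            gumbel = -math.log(-math.log(random.random()))  # Gumbel(0, 1) draw
            if lw + gumbel > best:
                best, best_k = lw + gumbel, k
        return best_k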

References

  1. Blei, David M.; Ng, Andrew Y.; Jordan, Michael I. (2003). "Latent Dirichlet Allocation". Journal of Machine Learning Research 3: 993–1022.
  2. Griffiths, Thomas L.; Steyvers, Mark (2004). "Finding Scientific Topics". Proceedings of the National Academy of Sciences 101 (Suppl. 1): 5228–5235.
  3. Minka, Thomas P. (2000). "Estimating a Dirichlet Distribution". Technical report.
  4. Hanna Wallach's course notes.
  5. Chapter 2 of Hanna Wallach's Ph.D. dissertation.
  6. Wallach, Hanna M.; Murray, Iain; Salakhutdinov, Ruslan; Mimno, David (2009). "Evaluation Methods for Topic Models". Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09).
  7. Nguyen, Viet-An; Boyd-Graber, Jordan L.; Resnik, Philip (2014). "Sometimes Average is Best: The Importance of Averaging for Prediction using MCMC Inference in Topic Modeling". EMNLP 2014: 1752–1757.