Latent Dirichlet allocation (LDA) for datamicroscopes

microscopes-lda

A Python package for finding unobserved structure in unstructured data.

This package contains an implementation of the nonparametric latent Dirichlet allocation (LDA) model, based on the hierarchical Dirichlet process (HDP), described by Teh et al. in "Hierarchical Dirichlet Processes" (Journal of the American Statistical Association 101: pp. 1566–1581). Unlike the original LDA model, nonparametric LDA does not require the user to choose the number of topics in advance. Instead, the number of topics is inferred from the data using a hierarchical Dirichlet process prior.
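
To give some intuition for how a Dirichlet process prior leaves the number of clusters open, here is a minimal sketch of a single Chinese restaurant process in plain Python. It is not this package's API or implementation; the function name `crp_table_sizes` and the parameter `alpha` (the concentration parameter) are illustrative assumptions.

```python
import random


def crp_table_sizes(n_customers, alpha, rng):
    """Seat customers one at a time; return the size of each table."""
    tables = []  # tables[k] = number of customers seated at table k
    for n in range(n_customers):
        # With n customers already seated, open a new table with
        # probability alpha / (n + alpha); otherwise join an existing
        # table with probability proportional to its current size.
        r = rng.uniform(0, n + alpha)
        if not tables or r < alpha:
            tables.append(1)
            continue
        r -= alpha
        for k, size in enumerate(tables):
            if r < size:
                tables[k] += 1
                break
            r -= size
        else:
            tables[-1] += 1  # floating-point edge case
    return tables


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100, 1000):
        sizes = crp_table_sizes(n, alpha=1.0, rng=rng)
        print(n, "customers ->", len(sizes), "tables")
```

The number of occupied tables is not fixed beforehand and tends to grow slowly as more customers arrive, which is the property nonparametric LDA relies on when it infers the number of topics.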

The current kernel follows the sampling scheme described in Section 5.1 of that paper, "Posterior sampling in the Chinese restaurant franchise". In the future, we may support the other kernels described in the paper.
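
As a rough illustration of the Chinese restaurant franchise metaphor, the sketch below draws topic assignments from the franchise prior only. It is not the package's kernel: posterior sampling additionally weights each choice by the likelihood of the observed word, which is omitted here. The names `sample_franchise` and `weighted_choice` are hypothetical, and `alpha` and `gamma` stand for the document-level and corpus-level concentration parameters.

```python
import random


def weighted_choice(weights, rng):
    """Return index i with probability weights[i] / sum(weights)."""
    r = rng.uniform(0, sum(weights))
    for i, w in enumerate(weights):
        if r < w:
            return i
        r -= w
    return len(weights) - 1  # floating-point edge case


def sample_franchise(doc_lengths, alpha, gamma, rng):
    """Draw a topic index for every word of every document from the prior."""
    dish_tables = []  # dish_tables[k] = tables (over all docs) serving topic k
    assignments = []  # assignments[j][i] = topic of word i in document j
    for length in doc_lengths:
        table_sizes = []  # words seated at each table of this document
        table_dish = []   # topic served at each table of this document
        topics = []
        for _ in range(length):
            # Existing table with weight equal to its size, new table
            # with weight alpha.
            t = weighted_choice(table_sizes + [alpha], rng)
            if t == len(table_sizes):
                # A new table picks a topic: existing topic k with weight
                # dish_tables[k], brand-new topic with weight gamma.
                k = weighted_choice(dish_tables + [gamma], rng)
                if k == len(dish_tables):
                    dish_tables.append(0)
                dish_tables[k] += 1
                table_sizes.append(0)
                table_dish.append(k)
            table_sizes[t] += 1
            topics.append(table_dish[t])
        assignments.append(topics)
    return assignments, dish_tables


if __name__ == "__main__":
    rng = random.Random(0)
    docs, dish_tables = sample_franchise([50, 50, 50], 1.0, 1.0, rng)
    print("topics used across the corpus:", len(dish_tables))
```

Because tables in every document choose from the same shared list of topics, topics are reused across documents while new ones can still appear at any time.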

Numerical computation is implemented in C++ for efficiency.

Installation

OS X and Linux builds of microscopes-lda are published on Anaconda.org. Installing them requires Conda. To install the current release, run:

$ conda install -c datamicroscopes -c distributions microscopes-lda