Latent Dirichlet allocation (LDA) for datamicroscopes

microscopes-lda

A Python package for finding unobserved structure in unstructured data.

This package implements the nonparametric latent Dirichlet allocation (LDA) model based on the hierarchical Dirichlet process (HDP), as described by Teh et al. in "Hierarchical Dirichlet Processes" (Journal of the American Statistical Association 101: pp. 1566–1581). Unlike the original LDA model, nonparametric LDA does not require the user to choose the number of topics. Instead, the number of topics is inferred from the data using a hierarchical Dirichlet process prior.
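To illustrate how a hierarchical Dirichlet process prior lets the number of topics grow with the data instead of being fixed up front, here is a minimal pure-Python sketch that draws topic assignments from the Chinese restaurant franchise representation of the HDP (documents are restaurants, words are customers, topics are dishes). It is not part of the microscopes-lda API: the function `sample_crf_prior` and its parameters `alpha0` (document-level concentration) and `gamma` (corpus-level concentration) are hypothetical names chosen for this example.

```python
import random


def sample_crf_prior(doc_lengths, alpha0, gamma, seed=0):
    """Hypothetical sketch: draw topic assignments from a Chinese restaurant franchise prior.

    doc_lengths: number of words in each document (restaurant).
    alpha0: document-level concentration (> 0); gamma: corpus-level concentration (> 0).
    Returns (topics, num_topics), where topics[j][i] is the topic of word i in doc j.
    """
    rng = random.Random(seed)
    dish_table_counts = []  # m_.k: number of tables, across all documents, serving dish k
    topics = []
    for n_words in doc_lengths:
        table_sizes = []  # n_jt: number of words seated at each table of this document
        table_dish = []   # k_jt: dish (topic) served at each table of this document
        doc_topics = []
        for _ in range(n_words):
            # Existing table t is chosen with weight n_jt; a new table with weight alpha0.
            weights = table_sizes + [alpha0]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(table_sizes):
                # New table: pick existing dish k with weight m_.k, or a new dish with weight gamma.
                dish_weights = dish_table_counts + [gamma]
                k = rng.choices(range(len(dish_weights)), weights=dish_weights)[0]
                if k == len(dish_table_counts):
                    dish_table_counts.append(0)  # a brand-new topic enters the corpus
                dish_table_counts[k] += 1
                table_sizes.append(0)
                table_dish.append(k)
            table_sizes[t] += 1
            doc_topics.append(table_dish[t])
        topics.append(doc_topics)
    return topics, len(dish_table_counts)


topics, num_topics = sample_crf_prior([50, 80, 30], alpha0=1.0, gamma=1.0)
print("topics drawn:", num_topics)
```

The number of topics is an output of the draw, not an input. Larger `gamma` tends to create more distinct topics across the corpus, while `alpha0` controls how many tables (and therefore how many topic draws) each document opens.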

The current kernel follows the sampling scheme described in Section 5.1 of that paper, "Posterior sampling in the Chinese restaurant franchise." In the future, we may support the other kernels described by Teh et al.
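One step of that scheme reseats a single word at a table in its document: an existing table t is weighted by n_jt times the probability of the word under that table's topic, and a new table is weighted by alpha0 times the word's probability when its topic is drawn from the corpus-level restaurant. The sketch below computes this conditional, assuming each topic has a symmetric Dirichlet(`beta`) prior over a vocabulary of `vocab_size` words, so a topic's predictive probability for word w is (count of w + beta) / (topic total + vocab_size * beta). It is a hypothetical pure-Python illustration with made-up parameter names, not the package's C++ kernel, and it omits the companion step that resamples the topic served at each table. All counts passed in must already exclude the word being resampled, and any table left empty by that removal should be dropped first.

```python
def table_probs(word, doc_tables, dish_table_counts, topic_word_counts,
                topic_totals, alpha0, gamma, beta, vocab_size):
    """Hypothetical sketch of p(t_ji = t | rest) for one word (Teh et al., Sec. 5.1).

    word: vocabulary index of the word being reseated.
    doc_tables: (n_jt, k_jt) pairs for the non-empty tables in this document.
    dish_table_counts: m_.k, number of tables serving topic k across the corpus.
    topic_word_counts[k][w], topic_totals[k]: word counts per topic and their row sums.
    All counts exclude the word being reseated (assumption of this sketch).
    Returns normalized probabilities: one per existing table, then one for a new table.
    """
    def f(k, w):
        # Dirichlet-multinomial predictive probability of word w under topic k.
        return (topic_word_counts[k][w] + beta) / (topic_totals[k] + vocab_size * beta)

    m_total = sum(dish_table_counts)
    # Word likelihood at a new table: mix existing topics by m_.k, plus a new topic
    # weighted by gamma, whose predictive under the prior alone is 1 / vocab_size.
    p_new_table = sum(m_k * f(k, word) for k, m_k in enumerate(dish_table_counts))
    p_new_table += gamma / vocab_size
    p_new_table /= m_total + gamma

    weights = [n_jt * f(k_jt, word) for n_jt, k_jt in doc_tables]
    weights.append(alpha0 * p_new_table)
    total = sum(weights)
    return [w / total for w in weights]


probs = table_probs(word=0, doc_tables=[(2, 0), (1, 1)], dish_table_counts=[2, 1],
                    topic_word_counts=[[2, 0, 1], [0, 3, 0]], topic_totals=[3, 3],
                    alpha0=1.0, gamma=1.0, beta=0.5, vocab_size=3)
print(probs)
```

In this toy call, the word is most likely to join the first table, whose topic already contains it twice; a Gibbs sampler would then draw the new table index from `probs`.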

Numerical computation is implemented in C++ for efficiency.

Installation

OS X and Linux builds of microscopes-lda are released to Anaconda.org. Installing them requires Conda. To install the current release, run:

$ conda install -c datamicroscopes -c distributions microscopes-lda