Topic models extension for Mallet & scikit-learn

1. Mallet Extension

The Mallet package contains only two topic models: LDA and Hierarchical LDA. So I tried to implement some other useful topic modeling methods for it.


  • Hierarchical Dirichlet Process with Gibbs Sampling. (in HDP folder)
  • Inference part for hLDA. (in hLDA folder)


  1. This is an extension for Mallet, so you need to have Mallet's source code first.
  2. Put the Java source files from this repository in Mallet's src/cc/mallet/topics folder.
  3. If you are going to run HDP, make sure you include the knowceans package in your project.
  4. Running either model's entry point gives you a demo on a small dataset in the data folder.


2. Scikit-learn Extension

Scikit-learn doesn't have any topic models yet, so I modified Matthew D. Hoffman's onlineldavb into scikit-learn format.


  • Online LDA with variational EM. (in LDA folder)


  1. Make sure numpy, scipy, and scikit-learn are installed.
  2. Run the test script with python in the lda folder to execute the unit tests.
  3. The onlineLDA model is in
  4. For a quick example, running the example script with python and the online argument fits a 10-topic model on the 20 Newsgroups dataset. Here online means the model uses online updates (the partial_fit method). Changing online to batch fits the model with batch updates (the fit method).
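
To make the online/batch distinction more concrete, here is a minimal pure-Python sketch of the blending step used in Hoffman's online variational Bayes for LDA: after each mini-batch, the current topic parameters are mixed with the estimate from that mini-batch, using a weight that decays over time. This is not the code in this repository. The function names, the toy numbers, and the tau0 and kappa values are assumptions chosen only for illustration.

```python
def step_size(t, tau0=1.0, kappa=0.7):
    """Weight given to mini-batch number t (0-based); it decays as t grows.

    tau0 and kappa are illustrative values, not this repository's defaults.
    """
    return (tau0 + t) ** (-kappa)


def online_update(lam, lam_hat, rho):
    """Blend the current topic parameters with one mini-batch estimate."""
    return [(1.0 - rho) * old + rho * new for old, new in zip(lam, lam_hat)]


# Toy run: one topic over a 3-word vocabulary and three mini-batch estimates.
lam = [1.0, 1.0, 1.0]
batch_estimates = [[4.0, 1.0, 1.0], [3.0, 2.0, 1.0], [5.0, 1.0, 0.0]]
for t, lam_hat in enumerate(batch_estimates):
    lam = online_update(lam, lam_hat, step_size(t))
    print(t, [round(v, 3) for v in lam])
```

In this sketch each call to online_update plays the role of one partial_fit call on a mini-batch. A batch update would instead recompute the parameters from the whole corpus on every pass.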

I updated the current code based on the current scikit-learn development branch. Older scikit-learn versions will throw errors, but I am pretty sure anyone can fix this with minor changes. (2014/09/15)