Topic model extensions for Mallet & scikit-learn
README.md

1. Mallet Extension

The Mallet package contains only two topic models: LDA and Hierarchical LDA. So I implemented some additional useful topic modeling methods on top of it.

Model:

  • Hierarchical Dirichlet Process with Gibbs Sampling. (in HDP folder)
  • Inference part for hLDA. (in hLDA folder)

Usage:

  1. This is an extension for Mallet, so you need to have Mallet's source code first.
  2. Put HDP.java, HDPInferencer.java, and HierarchicalLDAInferencer.java in the src/cc/mallet/topics folder.
  3. If you are going to run HDP, make sure the knowceans package is included in your project.
  4. Run HDPTest.java or hLDATest.java for a demo on a small dataset in the data folder.
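
Step 2 above can be scripted. The following is a minimal sketch, not part of this repository: the function name `copy_extension_files` and both directory arguments are hypothetical, and it assumes the three Java files can be found somewhere under your checkout of this repository and that your Mallet source tree already contains src/cc/mallet/topics.

```python
import shutil
from pathlib import Path

# File names taken from the usage steps above.
EXTENSION_FILES = (
    "HDP.java",
    "HDPInferencer.java",
    "HierarchicalLDAInferencer.java",
)


def copy_extension_files(repo_dir, mallet_dir):
    """Copy the extension sources into Mallet's topics package.

    repo_dir   -- hypothetical path to a checkout of this repository
    mallet_dir -- hypothetical path to the root of the Mallet source tree
    """
    target = Path(mallet_dir) / "src" / "cc" / "mallet" / "topics"
    if not target.is_dir():
        raise FileNotFoundError(f"Mallet topics folder not found: {target}")

    copied = []
    for name in EXTENSION_FILES:
        # Search the checkout recursively, since the files live in
        # different subfolders (HDP, hLDA).
        matches = sorted(Path(repo_dir).rglob(name))
        if not matches:
            raise FileNotFoundError(f"{name} not found under {repo_dir}")
        shutil.copy2(matches[0], target / name)
        copied.append(target / name)
    return copied
```

After copying, rebuild Mallet with your usual build setup so the new classes are compiled along with the rest of the cc.mallet.topics package.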

References:

2. Scikit-learn Extension

Scikit-learn doesn't have any topic models yet, so I modified Matthew D. Hoffman's onlineldavb into scikit-learn format.

Model:

  • online LDA with variational EM. (In LDA folder)

Usage:

  1. Make sure numpy, scipy, and scikit-learn are installed.
  2. Run python test in the lda folder to run the unit tests.
  3. The onlineLDA model is in lda.py.
  4. For a quick example, run python lda_example.py online to fit a 10-topic model on the 20 Newsgroups dataset. online means the model is fit with online updates (the partial_fit method). Changing online to batch fits the model with batch updates (the fit method).
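
To illustrate the difference between the online and batch modes, here is a minimal sketch. It does not use the lda.py from this repository: as a stand-in it assumes a recent scikit-learn release that ships `sklearn.decomposition.LatentDirichletAllocation`, and it replaces 20 Newsgroups with a tiny in-memory corpus so nothing is downloaded. The helper name `fit_lda` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Tiny in-memory corpus standing in for 20 Newsgroups (assumption for the demo).
docs = [
    "the team won the hockey game last night",
    "the pitcher threw a fast ball in the baseball game",
    "the graphics card renders the image on screen",
    "install the driver for the new graphics card",
    "the goalie saved the puck in the hockey match",
    "the computer screen shows a blurry image",
]

# Bag-of-words counts, shape (n_documents, n_vocabulary_terms).
X = CountVectorizer(stop_words="english").fit_transform(docs)


def fit_lda(X, mode, n_topics=2, batch_size=2, seed=0):
    """Fit LDA with online (partial_fit) or batch (fit) updates."""
    lda = LatentDirichletAllocation(
        n_components=n_topics,
        total_samples=X.shape[0],  # corpus size, used by the online update
        random_state=seed,
    )
    if mode == "online":
        # Feed the corpus in mini-batches; each call updates the topics.
        for start in range(0, X.shape[0], batch_size):
            lda.partial_fit(X[start:start + batch_size])
    elif mode == "batch":
        # Use the whole corpus in every update.
        lda.set_params(learning_method="batch")
        lda.fit(X)
    else:
        raise ValueError("mode must be 'online' or 'batch'")
    return lda


for mode in ("online", "batch"):
    model = fit_lda(X, mode)
    # Each row is a document's topic distribution.
    print(mode, np.round(model.transform(X), 2))
```

The online path sees only a few documents per update, which is what makes it suitable for corpora that do not fit in memory. The batch path revisits the full corpus on every iteration.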

Note:
I updated the current code based on the current scikit-learn development branch. Older scikit-learn versions will throw errors, but I am fairly sure these can be fixed with minor alterations. (2014/09/15)

Reference: