Topic models extension for Mallet & scikit-learn

The current Mallet package contains only two topic models: LDA and hierarchical LDA (hLDA).
This project implements some additional topic modeling methods on top of it:

  • Hierarchical Dirichlet Process
  • inference part for hLDA


  1. This is an extension for Mallet, so you need Mallet's source code first.
  2. Put this project's source files in Mallet's src/cc/mallet/topics folder.
  3. If you are going to run HDP, make sure you have the knowceans package.
  4. Run the included demo to try the models on the small dataset in the data folder.


Update History:

2012/10/01 Version 0.1

  • bug fix: print the correct topic number during training
  • add cross-validation to HDP
  • add inferencer class
  • add perplexity calculation to the inferencer
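
For reference, perplexity is usually defined as the exponential of the negative average log-likelihood per token. The snippet below is a minimal sketch of that formula only. It is not the inferencer's actual code, and the class name `PerplexitySketch` and its method are hypothetical, chosen for illustration.

```java
final class PerplexitySketch {
    // Perplexity = exp(-(sum of per-document log-likelihoods) / total token count).
    // Assumes natural-log likelihoods; lower perplexity means a better fit.
    static double perplexity(double[] docLogLikelihoods, long totalTokens) {
        if (totalTokens <= 0) {
            throw new IllegalArgumentException("totalTokens must be positive");
        }
        double sum = 0.0;
        for (double ll : docLogLikelihoods) {
            sum += ll;
        }
        return Math.exp(-sum / totalTokens);
    }

    public static void main(String[] args) {
        // Made-up example: two one-token documents, each token with probability 0.25.
        double[] logLikelihoods = { Math.log(0.25), Math.log(0.25) };
        System.out.println(perplexity(logLikelihoods, 2));
    }
}
```

With the made-up numbers above, every token has probability 0.25, so the perplexity should come out close to 4.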

2012/09/29 Version 0.1

  • bug fix: printed results are correct now
  • bug fix: empty topics were caused by an initial topic number > 0
  • change the initial topic assignment to a uniform distribution and remove empty topics
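
The last item can be pictured roughly as follows. This is only a sketch under assumptions, not the project's implementation: the class `TopicInitSketch`, its methods, and the flat `int[]` of per-token topic assignments are all hypothetical.

```java
import java.util.Random;

final class TopicInitSketch {
    // Assign each token a topic drawn uniformly from [0, numTopics).
    static int[] uniformAssign(int numTokens, int numTopics, Random rng) {
        int[] assignments = new int[numTokens];
        for (int i = 0; i < numTokens; i++) {
            assignments[i] = rng.nextInt(numTopics);
        }
        return assignments;
    }

    // Drop topics that have no tokens and renumber the remaining topics
    // contiguously (in place). Returns the number of topics kept.
    static int compactTopics(int[] assignments, int numTopics) {
        int[] counts = new int[numTopics];
        for (int z : assignments) {
            counts[z]++;
        }
        int[] newId = new int[numTopics];
        int next = 0;
        for (int k = 0; k < numTopics; k++) {
            newId[k] = counts[k] > 0 ? next++ : -1; // -1 marks an empty topic
        }
        for (int i = 0; i < assignments.length; i++) {
            assignments[i] = newId[assignments[i]];
        }
        return next;
    }

    public static void main(String[] args) {
        int[] z = uniformAssign(20, 10, new Random(42));
        int kept = compactTopics(z, 10);
        System.out.println("topics kept: " + kept);
    }
}
```

Renumbering after removal keeps topic ids dense, so count arrays indexed by topic never contain unused slots.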

2012/09/28 Version 0.1

  • bug: topic number and total word count do not match in the printed results
  • bug: some topics are empty but not removed

2012/09/27 Version 0.1

  • main algorithm works; not all functions are finished
  • bug: automatic hyper-parameter updating doesn't work well, so it is disabled for now