Topic Models Extension for Mallet & scikit-learn
The current Mallet package contains only two topic models: LDA and Hierarchical LDA.
This project adds implementations of some other useful topic modeling methods:

  • Hierarchical Dirichlet Process
  • Inference for hLDA


  1. This is an extension for Mallet, so you need to have Mallet's source code first.
  2. Put this extension's source files in Mallet's src/cc/mallet/topics folder.
  3. If you are going to run HDP, make sure you have the knowceans package.
  4. Run either of the included demo programs to try the models on a small dataset in the data folder.


Update History:

2012/10/01 Version 0.1

  • bug fix: print correct topic number in training
  • add cross validation in HDP
  • add inferencer class
  • add perplexity calculation in inferencer
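
For reference, perplexity on held-out documents is commonly defined as the exponential of the negative average per-token log-likelihood. The sketch below illustrates that standard formula only. The class name `PerplexitySketch` and the method `perplexity` are hypothetical and are not taken from this project's inferencer, and the per-document log-likelihoods are assumed to be already computed by whatever model is being evaluated.

```java
// Hypothetical helper, not part of Mallet or of this extension.
class PerplexitySketch {

    // perplexity = exp( -(sum of document log-likelihoods) / (total token count) )
    static double perplexity(double[] docLogLikelihoods, int[] docLengths) {
        if (docLogLikelihoods.length != docLengths.length) {
            throw new IllegalArgumentException("one length is required per document");
        }
        double totalLogLikelihood = 0.0;
        long totalTokens = 0;
        for (int d = 0; d < docLengths.length; d++) {
            totalLogLikelihood += docLogLikelihoods[d];
            totalTokens += docLengths[d];
        }
        if (totalTokens == 0) {
            throw new IllegalArgumentException("no tokens to evaluate");
        }
        return Math.exp(-totalLogLikelihood / totalTokens);
    }

    public static void main(String[] args) {
        // Toy data: two documents (3 and 5 tokens) where every token
        // was assigned probability 1/4 by the model.
        double[] logLikelihoods = { 3 * Math.log(0.25), 5 * Math.log(0.25) };
        int[] lengths = { 3, 5 };
        System.out.println(perplexity(logLikelihoods, lengths)); // approximately 4
    }
}
```

Lower values mean the model assigns higher probability to the held-out text. In the toy data every token has probability 1/4, so the perplexity comes out at about 4.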

2012/09/29 Version 0.1

  • bug fix: printed results are correct now
  • bug fix: empty topics were caused by an initial topic number > 0
  • change initial topic assignment to uniform distribution and remove empty topics.

2012/09/28 Version 0.1

  • bug: topic number and total word count do not match in the printed result
  • bug: some topics are empty but are not removed

2012/09/27 Version 0.1

  • main algorithm works; not all functions are finished
  • bug: automatic hyper-parameter updating doesn't work well, so it is disabled for now
