Topic models extension for Mallet & scikit-learn
The current Mallet package contains only two topic models: LDA and Hierarchical LDA (hLDA).
So I implemented some additional topic modeling methods on top of it:

  • Hierarchical Dirichlet Process
  • inference part for hLDA

Usage:

  1. This is an extension for Mallet, so you need to have Mallet's source code first.
  2. Put HDP.java, HDPInferencer.java, and HierarchicalLDAInferencer.java in the src/cc/mallet/topics folder.
  3. If you are going to run HDP, make sure you have the knowceans package.
  4. Run HDPTest.java or hLDATest.java to get a demo on a small dataset in the data folder.
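
Step 2 can also be scripted. The following is only a minimal sketch, not part of this extension: the class name InstallExtension and both command-line arguments are hypothetical, and it assumes the three source files have first been collected into a single directory. It uses only the standard java.nio.file API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not shipped with this repository).
public class InstallExtension {
    // File names taken from step 2 of the usage list.
    public static final String[] SOURCES = {
        "HDP.java", "HDPInferencer.java", "HierarchicalLDAInferencer.java"
    };

    /**
     * Copies the extension sources from extensionDir (assumed to hold all
     * three files) into malletRoot/src/cc/mallet/topics.
     */
    public static List<Path> copySources(Path extensionDir, Path malletRoot)
            throws IOException {
        Path target = malletRoot.resolve(Paths.get("src", "cc", "mallet", "topics"));
        Files.createDirectories(target);
        List<Path> copied = new ArrayList<>();
        for (String name : SOURCES) {
            Path from = extensionDir.resolve(name);
            if (!Files.isRegularFile(from)) {
                throw new IOException("Missing source file: " + from);
            }
            Path to = target.resolve(name);
            // Overwrite an older copy so re-running the helper is harmless.
            Files.copy(from, to, StandardCopyOption.REPLACE_EXISTING);
            copied.add(to);
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println(
                "usage: java InstallExtension <extension-dir> <mallet-root>");
            return;
        }
        for (Path p : copySources(Paths.get(args[0]), Paths.get(args[1]))) {
            System.out.println("copied " + p);
        }
    }
}
```

After copying, rebuild Mallet as usual so that the new classes are compiled together with the rest of cc.mallet.topics.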

Update History:

2012/10/01 Version 0.1

  • bug fix: print correct topic number in training
  • add cross validation in HDP
  • add inferencer class
  • add perplexity calculation in inferencer

2012/09/29 Version 0.1

  • bug fix: printed results are correct now
  • bug fix: empty topics were caused by an initial topic number > 0
  • change initial topic assignment to a uniform distribution and remove empty topics

2012/09/28 Version 0.1

  • bug: topic number and total word count do not match in the printed result
  • bug: some topics are empty but are not removed

2012/09/27 Version 0.1

  • main algorithm works; not all functions are finished
  • bug: automatic hyper-parameter update doesn't work well; disabled for now