Stochastic Variational Inference for Latent Dirichlet Allocation

The code structure follows the OnlineVB code provided by Matthew D. Hoffman (mdhoffma@cs.princeton.edu), and the algorithm is as described in Hoffman's paper below.
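As a rough illustration of the global update used in online variational Bayes for LDA, here is a minimal sketch of the step-size schedule and the blending of the topic-word parameter `lambda` with a minibatch estimate. The function names, argument names, and default values are hypothetical and are not taken from `stochastic_lda.py`; the per-document E-step that produces the sufficient statistics is omitted.

```python
def step_size(t, tau0=1.0, kappa=0.7):
    """Learning rate rho_t = (tau0 + t) ** (-kappa).

    tau0 >= 0 down-weights early iterations; kappa in (0.5, 1] controls
    how quickly old information is forgotten. Defaults are illustrative.
    """
    return (tau0 + t) ** (-kappa)


def update_lambda(lam, sstats, rho, eta, num_docs, batch_size):
    """Blend the current lambda with the minibatch estimate.

    lam and sstats are K x V nested lists (topics x vocabulary).
    The minibatch estimate treats the batch as if it were replicated
    num_docs / batch_size times to stand in for the whole corpus.
    """
    scale = num_docs / batch_size
    return [
        [(1.0 - rho) * l + rho * (eta + scale * s) for l, s in zip(l_row, s_row)]
        for l_row, s_row in zip(lam, sstats)
    ]


# Hypothetical usage: one topic, two vocabulary words.
lam = [[1.0, 1.0]]
sstats = [[2.0, 0.0]]  # expected word counts from an assumed E-step
rho = step_size(t=0)
lam = update_lambda(lam, sstats, rho, eta=0.1, num_docs=10, batch_size=5)
```

Because `rho` shrinks as `t` grows, later minibatches move `lambda` less than earlier ones.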

Based on the following papers:

### Also aiming to implement SVI for HDP as described in the second paper above (work in progress)

### How to Use

For help, run: python stochastic_lda.py -h

You will need:

  • A file [dictionary.csv] containing your vocabulary
  • A file [doclist.txt] containing the list of documents in the directory that you want to sample from
  • For now, each document can be a plain .txt file; no pre-processing is required
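The exact layout of these input files is not specified here, so the following is only a sketch under stated assumptions: one vocabulary word per line in the dictionary file, one document path per line in the document list, and lowercase alphabetic tokenization. The helper names are hypothetical and are not part of `stochastic_lda.py`.

```python
import re


def load_vocabulary(path):
    """Map each word to an integer id (assumes one word per line)."""
    vocab = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word = line.strip().lower()
            if word and word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def load_doclist(path):
    """Return the document paths (assumes one path per line)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def count_words(text, vocab):
    """Turn raw text into {word_id: count}, ignoring out-of-vocabulary tokens."""
    counts = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        idx = vocab.get(token)
        if idx is not None:
            counts[idx] = counts.get(idx, 0) + 1
    return counts
```

A document read from a path in the list could then be passed to `count_words` to get the sparse word counts that an LDA E-step typically consumes.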

Written for classwork and still a work in progress. Tasks:

  • Basic initial implementation
  • Debug for common corpus
  • Support Command-Line Usage for user-defined test mode and normal mode
  • Run on own data
  • Implement HDP