Code structure from the OnlineVB code provided by Matthew D. Hoffman (mdhoffma@cs.princeton.edu) and the algorithm is as described in Hoffman's paper below
Based on the following papers:
- Latent Dirichlet Allocation by David M. Blei, Andrew Y. Ng and Michael I. Jordan
- Stochastic Variational Inference by Matthew D. Hoffman, David M. Blei, Chong Wang and John Paisley
###Also aiming to implement SVI for HDP as described in the second paper above, work in progress
###How to Use
See 'Help' using
python stochastic_lda.py -h
You will need:
- A file [dictionary.csv] containing your vocabular
- A file [doclist.txt] containing the list of documents in the directory that you want to sample from
- At the moment your documents can be just a normal txt file, no pre-processing required
For classwork, work in progress...
- Basic initial implementation
- Debug for common corpus
- Support Command-Line Usage for user-defined test mode and normal mode
- Run on own data
- Implement HDP