Latent Dirichlet Allocation – Ruby Wrapper
What is LDA-Ruby?
This wrapper is based on C code by David M. Blei. In a nutshell, it can be used to automatically cluster documents into topics. The number of topics is chosen beforehand, and the topics found are usually fairly intuitive. Details of the method can be found in the paper by Blei, Ng, and Jordan.
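LDA treats each document as a mixture of topics and each topic as a distribution over words. lda-c, and therefore this wrapper, fits the model with variational EM. The sketch below instead uses collapsed Gibbs sampling, a different but common inference method, purely to illustrate how a fixed number of topics gets assigned to the words of each document. `gibbs_lda`, its keyword parameters and the default `alpha`/`eta` values are hypothetical choices, not part of lda-ruby.

```ruby
# Minimal collapsed Gibbs sampler for LDA (illustration only; lda-ruby itself
# uses the variational EM from lda-c). Each document is an array of integer
# word ids in 0...vocab_size.
def gibbs_lda(docs, num_topics, vocab_size, iterations: 200, alpha: 0.1, eta: 0.01, seed: 42)
  rng = Random.new(seed)
  doc_topic   = Array.new(docs.size) { Array.new(num_topics, 0) }  # tokens of topic k in doc d
  topic_word  = Array.new(num_topics) { Array.new(vocab_size, 0) } # times word w has topic k
  topic_total = Array.new(num_topics, 0)                           # tokens assigned to topic k

  # Start from a random topic for every token and record it in the count tables.
  assignments = docs.each_with_index.map do |words, d|
    words.map do |w|
      k = rng.rand(num_topics)
      doc_topic[d][k] += 1
      topic_word[k][w] += 1
      topic_total[k] += 1
      k
    end
  end

  iterations.times do
    docs.each_with_index do |words, d|
      words.each_with_index do |w, i|
        k = assignments[d][i]
        # Take the current token out of the counts before resampling it.
        doc_topic[d][k] -= 1
        topic_word[k][w] -= 1
        topic_total[k] -= 1

        # Unnormalized conditional p(z = j | all other assignments).
        weights = Array.new(num_topics) do |j|
          (doc_topic[d][j] + alpha) *
            (topic_word[j][w] + eta) / (topic_total[j] + vocab_size * eta)
        end

        # Draw a topic with probability proportional to its weight.
        r = rng.rand * weights.sum
        k = weights.index { |p| (r -= p) <= 0 } || num_topics - 1

        assignments[d][i] = k
        doc_topic[d][k] += 1
        topic_word[k][w] += 1
        topic_total[k] += 1
      end
    end
  end

  { assignments: assignments, doc_topic: doc_topic, topic_word: topic_word }
end

# Two "themes": words 0/1 and words 2/3.
docs = [[0, 1, 0, 1, 0], [2, 3, 2, 3, 3], [0, 1, 1, 0], [3, 2, 2, 3]]
result = gibbs_lda(docs, 2, 4)
dominant = result[:doc_topic].map { |row| row.index(row.max) }
p dominant
```

The number of topics (`num_topics`) is fixed up front, just as with lda-ruby; only the per-token assignments are learned.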
The original C code relied on files for input and output. We felt it was necessary to depart from that model and use Ruby objects for these steps instead. The only required file is the data file (in a format similar to that used by SVMlight). Optionally, you may also need a vocabulary file to extract the words belonging to each topic.
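In lda-c's data format, each line describes one document as `[M] [term_1]:[count] [term_2]:[count] ...`, where `M` is the number of distinct terms in the document and each term is an integer id. The sketch below builds such lines and a matching vocabulary from already-tokenized documents. It assumes term ids start at 0 and that the vocabulary file lists one word per line in id order; `to_ldac` is a hypothetical helper, not part of lda-ruby.

```ruby
# Convert tokenized documents into lda-c style data lines plus a vocabulary.
# Returns [lines, vocab] where vocab[id] is the word for term id.
def to_ldac(tokenized_docs)
  vocab = {} # word => integer id, assigned in first-seen order
  lines = tokenized_docs.map do |tokens|
    counts = Hash.new(0)
    tokens.each do |t|
      id = (vocab[t] ||= vocab.size) # new words get the next free id
      counts[id] += 1
    end
    pairs = counts.sort.map { |id, c| "#{id}:#{c}" }
    "#{counts.size} #{pairs.join(' ')}"
  end
  [lines, vocab.keys]
end

lines, vocab = to_ldac([%w[apple banana apple], %w[banana cherry]])
p lines # => ["2 0:2 1:1", "2 1:1 2:1"]
p vocab # => ["apple", "banana", "cherry"]
```

Writing `lines` (one per line) to the data file and `vocab` (one word per line) to the vocabulary file gives inputs of the kind `Lda::DataCorpus.new` and `load_vocabulary` expect.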
Example usage:
```ruby
require 'lda-ruby'

corpus = Lda::DataCorpus.new("data/data_file.dat")
lda = Lda::Lda.new(corpus)             # create an Lda object for training
lda.em("random")                       # run the EM algorithm from random starting points
lda.load_vocabulary("data/vocab.txt")  # map term ids back to words
lda.print_topics(20)                   # print the top 20 words for each topic
```
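`print_topics` relies on the loaded vocabulary to turn each topic's most probable term ids into words. The sketch below shows that idea in isolation, assuming the topic-word distribution is available as log probabilities (as lda-c writes it) in a plain array of arrays. `top_words` and the sample matrix are hypothetical and are not lda-ruby's API.

```ruby
# Return the n most probable words for each topic.
# log_beta: one array per topic of log p(word | topic), indexed by term id.
# vocab:    vocab[id] is the word for term id.
def top_words(log_beta, vocab, n)
  log_beta.map do |row|
    row.each_with_index
       .max_by(n) { |logp, _id| logp }     # n largest, in descending order
       .map { |_logp, id| vocab[id] }
  end
end

log_beta = [
  [Math.log(0.6), Math.log(0.3), Math.log(0.05), Math.log(0.05)],
  [Math.log(0.1), Math.log(0.1), Math.log(0.2), Math.log(0.6)]
]
vocab = %w[apple banana cherry durian]
p top_words(log_beta, vocab, 2) # => [["apple", "banana"], ["durian", "cherry"]]
```

Ranking by log probability gives the same order as ranking by probability, so no exponentiation is needed.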
If you have general questions about Latent Dirichlet Allocation, please use the topic models mailing list, since the people who monitor it are very knowledgeable. If you encounter bugs specific to lda-ruby, please open an issue on the GitHub project.
Resources
References
Blei, David M., Ng, Andrew Y., and Jordan, Michael I. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3 (Mar. 2003), 993-1022.







