Latent Dirichlet Allocation – Ruby Wrapper

What is LDA-Ruby?

This wrapper is based on C code by David M. Blei. In a nutshell, it can be used to automatically cluster documents into topics. The number of topics is chosen beforehand, and the topics found are usually fairly intuitive. Details of the implementation can be found in the paper by Blei, Ng, and Jordan.

The original C code relied on files for the input and output. We felt it was necessary to depart from that model and use Ruby objects for these steps instead. The only required file is the data file (in a format similar to that used by SVMlight). A vocabulary file is optional, but you will need one to extract the words belonging to each topic.
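As a rough illustration of the data file format, here is a minimal sketch that turns tokenized documents into data lines and a vocabulary list. It assumes the layout used by Blei's original C code: one document per line, starting with the number of unique terms, followed by `index:count` pairs with zero-based term indices. The helper `build_corpus_lines` is hypothetical and not part of lda-ruby.

```ruby
# Hypothetical helper, not part of lda-ruby.
# Assumes the LDA-C style layout: "<unique terms> <index>:<count> ..."
def build_corpus_lines(documents)
  vocab = {} # word => zero-based term index, in order of first appearance
  lines = documents.map do |tokens|
    counts = Hash.new(0)
    tokens.each do |word|
      index = (vocab[word] ||= vocab.size)
      counts[index] += 1
    end
    # Sort by term index so each line is stable and easy to read.
    pairs = counts.sort.map { |index, count| "#{index}:#{count}" }
    ([counts.size] + pairs).join(" ")
  end
  [vocab.keys, lines]
end

docs = [
  %w[apple banana apple],
  %w[banana cherry]
]
vocab, lines = build_corpus_lines(docs)
puts lines
puts vocab.join(", ")
```

Writing `lines` to a data file (one per line) and `vocab` to a vocabulary file (one word per line, which is an assumption about the expected layout) would give the two inputs described above.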

Example usage:

require 'lda-ruby'
corpus = Lda::DataCorpus.new("data/data_file.dat")
lda = Lda::Lda.new(corpus)    # create an Lda object for training
lda.em("random")              # run EM algorithm using random starting points
lda.load_vocabulary("data/vocab.txt")
lda.print_topics(20)          # print all topics with up to 20 words per topic
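To make "up to 20 words per topic" concrete, the sketch below picks the highest-weighted words for each topic from a topic-by-word weight matrix. It is plain Ruby and does not use lda-ruby's internals; the `top_words` helper, the weights, and the vocabulary are made up for illustration.

```ruby
# Hypothetical helper, independent of lda-ruby.
# topic_weights: one array of per-word weights (e.g. log probabilities) per topic.
# vocabulary:    array of words, where position i is the word for term index i.
def top_words(topic_weights, vocabulary, limit)
  topic_weights.map do |weights|
    weights.each_with_index
           .max_by(limit) { |weight, _index| weight } # highest weights first
           .map { |_weight, index| vocabulary[index] }
  end
end

vocabulary = %w[apple banana cherry date]
topic_weights = [
  [-0.5, -2.0, -3.0, -1.0],
  [-3.0, -0.7, -0.9, -4.0]
]

top_words(topic_weights, vocabulary, 2).each_with_index do |words, topic|
  puts "Topic #{topic}: #{words.join(', ')}"
end
```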

If you have general questions about Latent Dirichlet Allocation, I urge you to use the topic models mailing list, since the people who monitor it are very knowledgeable. If you encounter bugs specific to lda-ruby, please post an issue on the GitHub project.

Resources

References

Blei, David M., Ng, Andrew Y., and Jordan, Michael I. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3 (Mar. 2003), 993-1022.
