ealdent / lda-ruby

  • Admin
  • Watch Unwatch
  • Fork
  • Your Fork
  • Pull Request
  • Download Source
    • 12
    • 2
  • Source
  • Commits
  • Network (2)
  • Issues (0)
  • Downloads (0)
  • Wiki (2)
  • Graphs
  • Branch: master

click here to add a description

click here to add a homepage

  • Branches (3)
    • TRY-refactor1
    • experimental
    • master ✓
  • Tags (0)
Sending Request…
Enable Donations

Pledgie Donations

Once activated, we'll place the following badge in your repository's detail box:
Pledgie_example
This service is courtesy of Pledgie.

A Ruby wrapper for Latent Dirichlet Allocation (LDA).

  cancel

lda-ruby

  cancel
  • Private
  • Read-Only
  • HTTP Read-Only

This URL has Read+Write access

remove mention of defunct mailing list 
ealdent (author)
Wed Nov 18 16:13:04 -0800 2009
commit  eb49089226ce33ded148221b4b986af87db2323a
tree    a1523f763c8ed2ed45e37efa9f4045db988b6452
parent  3259c928a26b83e218f41d8ccf29f8a07f0ca938
lda-ruby /
name  age  message
file .gitignore Fri Jul 24 10:30:07 -0700 2009 update gitignore [ealdent]
file CHANGELOG Fri Aug 14 13:13:16 -0700 2009 add to changelog [ealdent]
file README Thu Sep 24 14:19:29 -0700 2009 - added a sample file that shows how you would ... [ealdent]
file README.markdown Wed Nov 18 16:13:04 -0800 2009 remove mention of defunct mailing list [ealdent]
file Rakefile Tue Jul 21 08:12:55 -0700 2009 - debug gem [ealdent]
file VERSION.yml Mon Aug 10 23:08:30 -0700 2009 - add reverse lookup for vocabulary - change to... [ealdent]
directory ext/ Fri Jul 24 07:48:25 -0700 2009 - add spec coverage for corpora, lda, and vocab... [ealdent]
file lda-ruby.gemspec Mon Aug 10 23:08:30 -0700 2009 - add reverse lookup for vocabulary - change to... [ealdent]
directory lib/ Fri Aug 14 18:11:26 -0700 2009 - when building a text document from file, keep... [ealdent]
file license.txt Fri Nov 14 13:33:23 -0800 2008 importing code [ealdent]
directory test/ Tue Oct 13 11:43:55 -0700 2009 - fix specs so that "rake test" works [ealdent]
README.markdown

Latent Dirichlet Allocation – Ruby Wrapper

What is LDA-Ruby?

This wrapper is based on C code by David M. Blei. In a nutshell, it can be used to automatically cluster documents into topics. The number of topics is chosen beforehand, and the topics found are usually fairly intuitive. Details of the implementation can be found in the paper by Blei, Ng, and Jordan (see References).

The original C code relied on files for both input and output. We felt it was necessary to depart from that model and use Ruby objects for these steps instead. The only file required is the data file (in a format similar to that used by SVMlight). You may also need a vocabulary file if you want to extract the words belonging to each topic.
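As a rough illustration of what such a data file might contain, here is a minimal sketch. It assumes the layout used by Blei's lda-c, where each line describes one document as the number of unique terms followed by `term_id:count` pairs; check that against your version of lda-ruby before relying on it. The helper `build_lda_input` is hypothetical and not part of lda-ruby.

```ruby
# Hypothetical helper (not part of lda-ruby): turn pre-tokenized documents
# into a vocabulary list and one data line per document.
# Assumed line layout, following lda-c: "<unique terms> <id>:<count> ..."
def build_lda_input(documents)
  vocab = {}                          # word => 0-based term id, in first-seen order
  lines = documents.map do |tokens|
    counts = Hash.new(0)              # term id => occurrences in this document
    tokens.each do |word|
      id = (vocab[word] ||= vocab.size)
      counts[id] += 1
    end
    pairs = counts.sort.map { |id, n| "#{id}:#{n}" }
    ([counts.size] + pairs).join(" ")
  end
  [vocab.keys, lines]
end

docs = [%w[apple banana apple], %w[banana cherry]]
vocab, lines = build_lda_input(docs)
puts lines   # one line per document
puts vocab   # one word per line, position = term id
```

Under these assumptions, the `lines` array would be written to the data file and the `vocab` array to the vocabulary file, one entry per line, so that a term id in the data file matches a line position in the vocabulary file.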

Example usage:

require 'lda-ruby'
corpus = Lda::DataCorpus.new("data/data_file.dat")
lda = Lda::Lda.new(corpus)    # create an Lda object for training
lda.em("random")              # run EM algorithm using random starting points
lda.load_vocabulary("data/vocab.txt")
lda.print_topics(20)          # print the top 20 words for each topic

If you have general questions about Latent Dirichlet Allocation, I urge you to use the topic models mailing list, since the people who monitor it are very knowledgeable. If you encounter bugs specific to lda-ruby, please post an issue on the GitHub project.

Resources

  • Blog post about LDA-Ruby
  • David Blei's lda-c code
  • Wikipedia article on LDA
  • Sample AP data

References

Blei, David M., Ng, Andrew Y., and Jordan, Michael I. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3 (Mar. 2003), 993-1022.
