A multi-step feed parser using JRuby, MRI, beanstalk and MongoDB
Ruby
Switch branches/tags
Nothing to show
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
bin
lib
.gitignore
README.textile
Rakefile
VERSION
feed-processor.gemspec

README.textile

Feed Processor is a multi-stage feed processor built with JRuby, MRI, beanstalk and MongoDB.

There are two steps to the feed processing:

  1. Step 1: Download feed content using non-blocking IO and insert the raw data into MongoDB. A message is sent via Beanstalk notifying the parser stage that the feed data is ready for a specific feed.
  2. Step 2: A multi-processor feed parser pulls the raw data from MongoDB, parses it and inserts the resulting parsed record into MongoDB.

Dependencies

  • MongoDB
  • beanstalkd
  • JRuby
  • MRI

Gems (for JRuby):

  • jruby-http-reactor
  • threadify
  • beanstalk-client
  • mongo_mapper

Gems (for MRI):

  • beanstalk-client
  • mongo_mapper
  • feedzirra

Executing

Each of the following commands should be executed in a separate console or executed to run as a background process.

Start MongoDB and Beanstalk:

mongod beanstalkd

Run the fetch processor:

jruby -rubygems -Ilib bin/fetch urls.txt

Run the parse processor:

ruby -rubygems -Ilib bin/parse