This repository was archived by the owner on May 10, 2018. It is now read-only.

aeden/feed-processor


Feed Processor is a multi-stage feed processor built with JRuby, MRI, beanstalk and MongoDB.

Feed processing has two steps:

  1. Fetch: download feed content using non-blocking IO and insert the raw data into MongoDB. A message is then sent via Beanstalk to notify the parser stage that raw data is ready for a specific feed.
  2. Parse: a multi-processor feed parser pulls the raw data from MongoDB, parses it, and inserts the resulting parsed record into MongoDB.
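
The handoff between the two steps can be sketched roughly as follows. This is not the project's code. It is a minimal, self-contained illustration that assumes in-memory stand-ins: the hashes `RAW_STORE` and `PARSED_STORE` take the place of MongoDB collections, `READY_QUEUE` takes the place of a Beanstalk tube, and the JSON message shape (`feed_url`) and the method names `fetch_stage` and `parse_stage` are hypothetical.

```ruby
require "json"
require "thread"

# Hypothetical in-memory stand-ins so the sketch runs without any services:
# RAW_STORE and PARSED_STORE play the role of MongoDB collections, and
# READY_QUEUE plays the role of a Beanstalk tube.
RAW_STORE    = {}
PARSED_STORE = {}
READY_QUEUE  = Queue.new

# Step 1 (fetch): store the raw body, then announce which feed is ready.
# The body is passed in here; a real fetcher would download it.
def fetch_stage(feed_url, raw_body)
  RAW_STORE[feed_url] = raw_body
  READY_QUEUE << JSON.generate("feed_url" => feed_url)
end

# Step 2 (parse): take one message, load the raw data, store the parsed record.
# Extracting the title with a regex is a placeholder for a real feed parser.
def parse_stage
  message  = JSON.parse(READY_QUEUE.pop)
  feed_url = message["feed_url"]
  raw_body = RAW_STORE.fetch(feed_url)
  title    = raw_body[%r{<title>(.*?)</title>}m, 1]
  PARSED_STORE[feed_url] = { "title" => title }
end

fetch_stage("http://example.com/feed.xml",
            "<rss><channel><title>Example</title></channel></rss>")
parse_stage
```

Because the queue message carries only a feed identifier, the raw payload stays in the data store, and any number of parser workers could consume messages independently of the fetcher.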

Dependencies

  • MongoDB
  • beanstalkd
  • JRuby
  • MRI

Gems (for JRuby):

  • jruby-http-reactor
  • threadify
  • beanstalk-client
  • mongo_mapper

Gems (for MRI):

  • beanstalk-client
  • mongo_mapper
  • feedzirra

Executing

Run each of the following commands in a separate console, or run it as a background process.

Start MongoDB and Beanstalk:

mongod
beanstalkd

Run the fetch processor:

jruby -rubygems -Ilib bin/fetch urls.txt

Run the parse processor:

ruby -rubygems -Ilib bin/parse
