A distributed data scraping and processing engine for Node.JS

To install, use npm:

$ npm install

For usage details, run

$ --help


  • Create modular and extensible jobs for scraping and processing data
  • Seamlessly distribute work among child processes and other servers (soon)
  • Written in Node.JS, so I/O is non-blocking and fast
  • Handles a variety of input / output situations
    • Reading / writing lines to and from files
    • Reading all files in a directory (and recursing if specified)
    • To / from a database
    • Piping between jobs
    • Custom IO / any combination of the above
  • Includes a robust framework for scraping and selecting web data
  • Support for a variety of proxies when making requests
  • Includes a data validation and sanitization framework
  • Provides support for retries, timeouts, dynamically adding input, etc.
  • Create a MapReduce cluster
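
To make the idea of modular jobs, piping, and validation / sanitization a little more concrete, here is a minimal sketch in plain Node.JS. It does not use this project's API: `sanitize`, `validate`, and `pipe` are hypothetical names invented for illustration, and a "job" is assumed to be nothing more than a function that maps one input item to an array of outputs.

```javascript
// Hypothetical sketch: none of these names come from the library itself.
// A "job" here is just a function that maps one input item to an array
// of zero or more output items.

// Sanitize: trim whitespace and lowercase each line.
const sanitize = (line) => [line.trim().toLowerCase()];

// Validate: keep only lines that look like a bare domain name.
const validate = (line) =>
  /^[a-z0-9-]+(\.[a-z0-9-]+)+$/.test(line) ? [line] : [];

// Pipe: feed every output of one job into the next job, in order.
function pipe(...jobs) {
  return (inputs) =>
    jobs.reduce((items, job) => items.flatMap((item) => job(item)), inputs);
}

const cleanDomains = pipe(sanitize, validate);
const result = cleanDomains(['  Example.COM ', 'not a domain', '', 'nodejs.org']);
console.log(result);
```

Returning an array from every job lets a single stage drop an input (empty array) or emit several outputs, which is roughly what dynamically adding input would need. See ./examples for how real jobs are written.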


See ./examples


Coming soon. See for updates


  • Automatically handle HTTP codes, e.g. redirect on 3xx or call fail() on 4xx/5xx
  • Nested requests inherit referrer / cookies if to the same domain
  • Add more DOM selector / traversal methods
  • Test proxy callbacks
  • Add distributed processing
  • Installation without NPM (
  • Refactoring
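
The first roadmap item (redirect on 3xx, fail on 4xx/5xx) could look something like the sketch below. This is only a guess at the shape of such a feature, not the planned implementation: `nextAction` and its return values are hypothetical, and it assumes response headers arrive as an object with lowercase keys, as they do on Node's `http.IncomingMessage`.

```javascript
// Hypothetical helper: decide what to do with a response based on its
// HTTP status code. Not part of the library; names are illustrative only.
function nextAction(statusCode, headers = {}) {
  // 3xx with a Location header: follow the redirect.
  // (A 3xx without Location, e.g. 304 Not Modified, falls through.)
  if (statusCode >= 300 && statusCode < 400 && headers.location) {
    return { action: 'redirect', url: headers.location };
  }
  // 4xx / 5xx: treat as a failure so the caller can retry or call fail().
  if (statusCode >= 400 && statusCode < 600) {
    return { action: 'fail', reason: `HTTP ${statusCode}` };
  }
  // Anything else: hand the response on to the job unchanged.
  return { action: 'continue' };
}

console.log(nextAction(301, { location: 'http://example.com/new' }));
console.log(nextAction(404));
console.log(nextAction(200));
```

Keeping the decision in a pure function separate from the request code would make it easy to test and to override per job.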

Credits

Uses the following awesome libraries:


MIT License
