
To install, use npm:

$ npm install

For usage details, run

$ --help    

What is it?

A framework for scraping and processing data. A job typically consists of (a) taking some input, (b) using or transforming it, and (c) outputting something. It can simplify the process of:

  • Filtering / sanitizing a list
  • MapReduce
  • Loading a list of URLs and scraping and saving some data from each
  • Parsing log files
  • Transforming data from one format to another, e.g. from CSV to a database
  • Recursively loading all files in a directory and its subdirectories and executing a command on each file
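
The input / transform / output shape described above can be sketched in plain JavaScript. This is only an illustration of the idea, not the framework's API: `runJob` and its three arguments are hypothetical names made up for this example.

```javascript
// Hypothetical helper (an assumption, not part of the framework):
// feeds each input item through `transform` and hands the result to `output`.
function runJob(input, transform, output) {
  for (const item of input) {
    const result = transform(item);
    // Returning undefined from `transform` means "skip this item".
    if (result !== undefined) output(result);
  }
}

// Example job: sanitize a list of email addresses and drop invalid entries.
const results = [];
runJob(
  ['  Alice@Example.com ', 'not-an-email', 'bob@example.org'],
  (line) => {
    const email = line.trim().toLowerCase();
    // Deliberately loose check, good enough for a sketch.
    return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(email) ? email : undefined;
  },
  (email) => results.push(email)
);

console.log(results);
```

Keeping the three stages as separate functions is what makes a job easy to re-point at a different source or destination without touching the transform step.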


  • Create modular and extensible jobs for scraping and processing data
  • Written in Node.js and JavaScript - jobs are concise, asynchronous and FAST
  • Speed up execution by distributing work among child processes and other servers (soon)
  • Easily handle a variety of input / output situations
    • Reading / writing lines to and from files
    • Reading all files in a directory (and optionally recursing)
    • Reading / writing rows to and from a database
    • Piping between other jobs
    • Any combination of the above, or completely custom IO
  • Includes a robust framework for scraping and selecting web data
  • Support for a variety of proxies when making requests
  • Includes a data validation and sanitization framework
  • Provides support for retries, timeouts, dynamically adding input, etc.
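
As a rough picture of the "reading all files in a directory (and optionally recursing)" input mode listed above, here is a minimal sketch using only Node's built-in `fs` and `path` modules. The `listFiles` helper and its `recurse` flag are hypothetical names for this example and do not reflect how the framework implements directory input.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (an assumption, not the framework's API):
// returns the sorted paths of all regular files in `dir`,
// descending into subdirectories only when `recurse` is true.
function listFiles(dir, recurse = false) {
  const files = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      if (recurse) files.push(...listFiles(fullPath, true));
    } else if (entry.isFile()) {
      files.push(fullPath);
    }
  }
  return files.sort();
}
```

Calling `listFiles('.', true)` would return every file under the current directory, and each path could then be treated as one input item for a job.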


Initial documentation is available here.

Better documentation will be available once I have time to write it. See for updates.


See ./examples


  • Automatically handle HTTP codes, e.g. redirect on 3xx or call fail() on 4xx/5xx
  • Nested requests inherit referrer / cookies if to the same domain
  • Add more DOM selector / traversal methods
  • Test proxy callbacks
  • Add distributed processing
  • Installation without NPM (
  • Refactoring

Credits uses the following awesome libraries:


MIT License
