node.io

A distributed data scraping and processing engine for Node.js

To install node.io, use npm:

$ npm install node.io

For usage details, run

$ node.io --help

Why node.io?

  • Create modular and extensible jobs for scraping and processing data
  • Seamlessly distribute work among child processes and other servers (soon)
  • Written in Node.js, so I/O is non-blocking and fast
  • Handles a variety of input / output situations
    • Reading / writing lines to and from files
    • Reading all files in a directory (and recursing if specified)
    • Reading from / writing to a database
    • STDIN / STDOUT
    • Piping between node.io jobs
    • Custom IO / any combination of the above
  • Includes a robust framework for scraping and selecting web data
  • Support for a variety of proxies when making requests
  • Includes a data validation and sanitization framework
  • Provides support for retries, timeouts, dynamically adding input, etc.
  • Create a MapReduce cluster
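
The input, processing, and retry ideas in the list above can be illustrated with a minimal sketch. This is not node.io's API: `runJob`, the `job` object shape, and the `maxRetries` option are all hypothetical names chosen for illustration, and the sketch runs synchronously in a single process.

```javascript
// Hypothetical job runner, NOT node.io's API: it only illustrates the
// input -> run -> output flow with a bounded number of retries per item.
function runJob(job, options) {
  const maxRetries = (options && options.maxRetries) || 0;
  const results = [];
  const failures = [];

  for (const item of job.input) {
    let attempts = 0;
    let done = false;
    while (!done) {
      attempts += 1;
      try {
        // run() returns the processed value, or throws to signal failure.
        results.push(job.run(item, attempts));
        done = true;
      } catch (err) {
        // Give up once the initial attempt plus maxRetries retries are used.
        if (attempts > maxRetries) {
          failures.push({ item: item, error: err.message });
          done = true;
        }
      }
    }
  }
  return { results: results, failures: failures };
}

// Example job: double numeric strings. The "flaky" item fails on its first
// attempt and succeeds on the second; "oops" never succeeds.
const job = {
  input: ['1', '2', 'flaky', 'oops'],
  run: function (item, attempt) {
    if (item === 'flaky') {
      if (attempt < 2) throw new Error('temporary failure');
      return 0;
    }
    const n = Number(item);
    if (Number.isNaN(n)) throw new Error('not a number: ' + item);
    return n * 2;
  }
};

const outcome = runJob(job, { maxRetries: 2 });
console.log(outcome);
```

Keeping failures in a separate list, rather than throwing, lets one bad input item be reported without stopping the rest of the job.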

Examples

See ./examples

Documentation

Coming soon. See http://node.io/ for updates

Roadmap

  • Automatically handle HTTP status codes, e.g. redirect on 3xx or call fail() on 4xx / 5xx
  • Nested requests inherit the referrer and cookies when made to the same domain
  • Add more DOM selector / traversal methods
  • Test proxy callbacks
  • Add distributed processing
  • Refactor
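
As a rough sketch of what the first roadmap item could look like, the function below maps a status code to an action. It is an assumption for illustration only, not code from node.io: `handleStatus` and the returned `action` values are hypothetical, and header names are assumed to be lower-cased as Node's HTTP client delivers them.

```javascript
// Hypothetical helper, NOT part of node.io: decide what to do with a response
// based on its HTTP status code.
function handleStatus(statusCode, headers) {
  // 3xx with a Location header: follow the redirect.
  if (statusCode >= 300 && statusCode < 400 && headers && headers.location) {
    return { action: 'redirect', location: headers.location };
  }
  // 4xx and 5xx: the caller would invoke its failure handler.
  if (statusCode >= 400 && statusCode < 600) {
    return { action: 'fail', reason: 'HTTP ' + statusCode };
  }
  // Everything else (including 3xx without a Location) is passed through.
  return { action: 'continue' };
}

const redirect = handleStatus(301, { location: 'http://example.com/new' });
const failure = handleStatus(404, {});
const ok = handleStatus(200, {});
console.log(redirect, failure, ok);
```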

Credits

node.io uses the following awesome libraries:

License

MIT License
