To install node.io, use npm:

$ npm install node.io

For usage details, run

$ node.io --help

To get started, see the documentation, examples, or API.

What is node.io?

node.io is a framework for scraping and processing data. A node.io job typically consists of a) taking some input, b) using or transforming it, and c) outputting something.
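The three stages can be sketched in plain JavaScript. This is only an illustration of the input / run / output shape and does not use node.io itself; `runJob` and the property names on the job object are hypothetical, chosen for this example.

```javascript
// Hypothetical sketch of the input -> run -> output pattern.
// runJob and the job's property names are illustrative, not node.io's API.
function runJob(job) {
  const results = [];
  for (const item of job.input) {
    // run() receives one input item and an emit callback for its result(s)
    job.run(item, (value) => results.push(value));
  }
  job.output(results);
  return results;
}

const doubled = runJob({
  input: [1, 2, 3],                               // a) take some input
  run: (n, emit) => emit(n * 2),                  // b) use or transform it
  output: (rows) => console.log(rows.join('\n')), // c) output something
});
```

Keeping each stage as a separate function is what makes a job modular in this sketch: the same `run` step could be paired with a different input or output.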

node.io can simplify the process of:

  • Filtering / sanitizing a list
  • MapReduce
  • Loading a list of URLs and scraping some data from each
  • Parsing log files
  • Transforming data from one format to another, e.g. from CSV to a database
  • Recursively loading all files in a directory and executing a command on each
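As a rough picture of the first two items, the sketch below sanitizes a list and then counts occurrences with a small map / reduce step. It is plain JavaScript written for illustration, not node.io code; `sanitize` and `countOccurrences` are hypothetical names.

```javascript
// Illustrative only: sanitize a list of raw lines, then count occurrences
// (a tiny map/reduce). Function names are hypothetical.
function sanitize(lines) {
  return lines
    .map((line) => line.trim().toLowerCase()) // normalize whitespace and case
    .filter((line) => line.length > 0);       // drop empty entries
}

function countOccurrences(items) {
  // reduce step: fold the list into a Map of value -> count
  return items.reduce((counts, item) => {
    counts.set(item, (counts.get(item) || 0) + 1);
    return counts;
  }, new Map());
}

const cleaned = sanitize(['  Example.com ', '', 'example.com', 'Node.io']);
const counts = countOccurrences(cleaned);
```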

Why node.io?

  • Create modular and extensible jobs for scraping and processing data
  • Written in Node.js and JavaScript - jobs are concise, asynchronous and FAST
  • Speed up execution by distributing work among child processes and other servers (soon)
  • Easily handle a variety of input / output situations
    • Reading / writing lines to and from files
    • Reading all files in a directory (and optionally recursing)
    • Reading / writing rows to and from a database
    • STDIN / STDOUT
    • Piping between other node.io jobs
    • Any combination of the above, or your own IO
  • Includes a robust framework for scraping and selecting web data
  • Support for a variety of proxies when making requests
  • Includes a data validation and sanitization framework
  • Provides support for retries, timeouts, dynamically adding input, etc.
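To make the retry idea concrete, here is a minimal sketch in plain JavaScript. It is a synchronous simplification that leaves out timeouts, and it does not show how node.io implements retries; `withRetries` is a hypothetical helper.

```javascript
// Hypothetical helper: call fn until it succeeds or maxAttempts is reached.
function withRetries(fn, maxAttempts) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return fn(attempt);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError; // every attempt failed
}

// Simulated flaky operation: fails twice, then succeeds.
let calls = 0;
const result = withRetries(() => {
  calls++;
  if (calls < 3) throw new Error('temporary failure');
  return 'ok';
}, 5);
```

Capping the number of attempts keeps a permanently failing input from stalling the rest of a job.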

Documentation

Initial documentation is available here.

Better documentation will be available once I have time to write it.

Roadmap

  • Fix up the http://node.io/ site
  • Automatically handle HTTP codes, e.g. redirect on 3xx or call fail() on 4xx/5xx
  • Nested requests inherit referrer / cookies when made to the same domain
  • Add more DOM selector / traversal methods
  • Test proxy callbacks
  • Add distributed processing
  • Installation without npm (install.sh)
  • Refactoring
  • More test coverage

Credits

node.io uses the following libraries

License

MIT License
