To install node.io, use npm:
$ npm install node.io
For usage details, run:
$ node.io --help
node.io is a framework for scraping and processing data. A node.io job typically consists of a) taking some input, b) using or transforming it, and c) outputting something.
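The three stages above can be sketched in plain Node.js. This is only an illustration of the input, transform, output pattern and does not use node.io's actual API; `runJob` and the `input` / `run` / `output` property names are hypothetical and chosen for this example.

```javascript
// Hypothetical helper, NOT part of node.io: passes each input item to `run`
// and hands everything that was emitted to `output`.
function runJob(job) {
  const results = [];
  for (const item of job.input) {
    // `emit` collects a result; if `run` never calls it, the item is dropped.
    job.run(item, (value) => results.push(value));
  }
  job.output(results);
  return results;
}

// Example: (a) take a list of strings, (b) trim them and drop empty ones,
// (c) print whatever is left.
const cleaned = runJob({
  input: ['  foo ', '', 'bar  '],
  run: (line, emit) => {
    const trimmed = line.trim();
    if (trimmed.length > 0) emit(trimmed);
  },
  output: (lines) => lines.forEach((line) => console.log(line)),
});
```

Keeping the three stages as separate functions means the same `run` step could be reused with a different source or destination, which is the kind of modularity the job model is aiming for.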
node.io can simplify the process of:
- Filtering / sanitizing a list
- Loading a list of URLs and scraping and saving some data from each
- Parsing log files
- Transforming data from one format to another, e.g. from CSV to a database
- Recursively loading all files in a directory and its subdirectories and executing a command on each file
Features:
- Create modular and extensible jobs for scraping and processing data
- Speed up execution by distributing work among child processes and other servers (soon)
- Easily handle a variety of input / output situations
  - Reading / writing lines to and from files
  - Reading all files in a directory (and optionally recursing)
  - Reading / writing rows to and from a database
  - STDIN / STDOUT
  - Piping between other node.io jobs
  - Any combination of the above, or completely custom IO
- Includes a robust framework for scraping and selecting web data
- Support for a variety of proxies when making requests
- Includes a data validation and sanitization framework
- Provides support for retries, timeouts, dynamically adding input, etc.
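To make the last item more concrete, here is a minimal sketch of what retries and dynamically added input might look like in a simple synchronous work queue. It is not node.io's implementation or API: `processQueue`, the `add` callback, and `maxRetries` are hypothetical names, and timeouts are left out to keep the example short.

```javascript
// Hypothetical sketch, NOT node.io's API: process a queue of inputs,
// retrying failures and letting `run` add new input while it works.
function processQueue(initial, run, maxRetries = 2) {
  const queue = initial.map((item) => ({ item, attempts: 0 }));
  const done = [];
  const failed = [];
  while (queue.length > 0) {
    const entry = queue.shift();
    try {
      // `add` lets the handler enqueue further input dynamically.
      const add = (item) => queue.push({ item, attempts: 0 });
      done.push(run(entry.item, add));
    } catch (err) {
      entry.attempts += 1;
      if (entry.attempts <= maxRetries) {
        queue.push(entry); // put it back at the end and retry later
      } else {
        failed.push(entry.item); // give up once the retries are used up
      }
    }
  }
  return { done, failed };
}

// Example: "a" discovers "b"; "flaky" fails once before succeeding;
// "bad" fails on every attempt.
let flakyCalls = 0;
const result = processQueue(['a', 'flaky', 'bad'], (item, add) => {
  if (item === 'a') add('b');
  if (item === 'flaky' && flakyCalls++ === 0) throw new Error('try again');
  if (item === 'bad') throw new Error('always fails');
  return item.toUpperCase();
});
```

Failed items go to the back of the queue rather than being retried immediately, so one misbehaving input does not hold up the rest.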
Initial documentation is available here.
Better documentation will be available once I have time to write it. See http://node.io/ for updates.
- Automatically handle HTTP codes, e.g. redirect on 3xx or call fail() on 4xx/5xx
- Nested requests inherit referrer / cookies if made to the same domain
- Add more DOM selector / traversal methods
- Test proxy callbacks
- Add distributed processing
- Installation without NPM (install.sh)
node.io uses the following awesome libraries: