Updated README and API

commit b3309bedc70946cd99c8ebb280f18bed49f2041e 1 parent 05a8d90
@chriso authored
Showing with 17 additions and 18 deletions.
  1. +11 −12 README.md
  2. +6 −6 docs/api.md
README.md
@@ -2,32 +2,31 @@
node.io is a data scraping and processing framework for [node.js](http://nodejs.org/).
-A node.io job typically consists of 1) taking some input, 2) using or transforming it, and 3) outputting something.
-
node.io can simplify the process of:
- Filtering / sanitizing a list
- MapReduce
-- Loading a list of URLs and scraping some data from each
+- Scraping data from the web using familiar CSS selectors / traversal methods
+- Scraping web data through a proxy
- Parsing log files
- Transforming data from one format to another, e.g. from CSV to a database
-- Recursively load all files in a directory and execute a command on each
-- etc. etc.
+- Recursively loading all files in a directory and its subdirectories and executing a command on each
+- etc.
## Why node.io?
- Create modular and extensible jobs for scraping and processing data
-- Written in Node.js and Javascript - jobs are concise, asynchronous and FAST
+- Jobs are written in Javascript or Coffeescript and run in Node.js - they are concise, asynchronous and FAST
- Speed up execution by distributing work among child processes and other servers (soon)
- Easily handle a variety of input / output situations
* Reading / writing lines to and from files
- * Reading all files in a directory (and optionally recursing)
+ * Traversing files in a directory
* Reading / writing rows to and from a database
- * STDIN / STDOUT
- * Piping between other node.io jobs
+ * STDIN / STDOUT / Custom streams
+ * Piping between other node.io jobs
* Any combination of the above, or your own IO
- Includes a robust framework for scraping and selecting web data
-- Support for a variety of proxies when making requests
+- Support for a variety of proxies when scraping web data
- Includes a data validation and sanitization framework
- Provides support for retries, timeouts, dynamically adding input, etc.
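
A minimal sketch of the scraping workflow the list above describes, assuming the `Job` constructor, `this.getHtml`, and `this.emit` behave as in the project's bundled examples (the URL, the selector, and the timeout value are placeholders, not part of this commit):

```javascript
// Minimal node.io job sketch: fetch a page, select elements with a CSS
// selector, and emit the results.
var nodeio = require('node.io');

exports.job = new nodeio.Job({ timeout: 10 }, {
    input: false, // run once, with no external input
    run: function () {
        var self = this;
        this.getHtml('http://www.reddit.com/', function (err, $, data, headers) {
            if (err) return self.fail(err); // fail() is referenced elsewhere in the README
            var titles = [];
            $('a.title').each(function (a) {
                titles.push(a.text); // .each / .text are assumed from the bundled examples
            });
            self.emit(titles);
        });
    }
});
```

A job like this is saved to a file and run with the `node.io` command line tool covered in the usage section.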
@@ -43,7 +42,7 @@ For usage details, run
## Documentation
-To get started, see the [documentation](https://github.com/chriso/node.io/blob/master/docs/README.md), [examples](https://github.com/chriso/node.io/tree/master/examples/), or [API](https://github.com/chriso/node.io/blob/master/docs/api.md).
+To get started, see the [documentation](https://github.com/chriso/node.io/blob/master/docs/README.md), [API](https://github.com/chriso/node.io/blob/master/docs/api.md), and [examples](https://github.com/chriso/node.io/tree/master/examples/).
Better documentation will be available once I have time to write it.
@@ -53,7 +52,7 @@ Better documentation will be available once I have time to write it.
- Automatically handle HTTP codes, e.g. redirect on 3xx or call fail() on 4xx/5xx
- Nested requests inherit referrer / cookies if they go to the same domain
- Add more DOM selector / traversal methods
-- Test proxy callbacks
+- Test proxy callbacks and write proxy documentation
- Add distributed processing
- Installation without NPM (install.sh)
- Refactoring
docs/api.md
@@ -217,7 +217,7 @@ To read or write to a file inside a job, use the following methods. Both methods
To make a request, use the following methods.
-**this.get(url, [headers], callback, [parse])** _headers and parse are optional_
+**this.get(url, _[headers]_, callback, _[parse]_)** _headers and parse are optional_
Makes a GET request to the URL and returns the result - callback takes `err, data, headers`
@@ -229,17 +229,17 @@ Example
console.log(data);
});
-**this.getHtml(url, [headers], callback, [parse])
+**this.getHtml(url, _[headers]_, callback, _[parse]_)**
The same as above, except callback takes `err, $, data, headers` where `$` is the dom selector / traversal object (see DOM selection / traversal below)
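
A hedged usage sketch (the URL is a placeholder, and the `.text` property on selected elements is assumed from the bundled examples):

```javascript
this.getHtml('http://www.example.com/', function (err, $, data, headers) {
    // $ wraps the parsed page; select the <title> element and print its text
    console.log($('title').text);
});
```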
-**this.post(url, body, [headers], callback, [parse])**
+**this.post(url, body, _[headers]_, callback, _[parse]_)**
-***this.postHtml(url, body, [headers], callback, [parse])**
+**this.postHtml(url, body, _[headers]_, callback, _[parse]_)**
Makes a POST request. If body is an object, it is encoded using the builtin querystring module. postHtml returns the `$` object.
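
A sketch of a POST with an object body (the URL and field names are placeholders):

```javascript
// The object body is encoded with the builtin querystring module, per the note above
this.post('http://www.example.com/login', { user: 'foo', pass: 'secret' },
    function (err, data, headers) {
        console.log(data);
    });
```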
-**this.doRequest(method, url, body, [headers], callback, [parse])**
+**this.doRequest(method, url, body, _[headers]_, callback, _[parse]_)**
Makes a general request with the specified options.
@@ -249,7 +249,7 @@ _Documentation coming soon. For now, see [./lib/node.io/request.js](https://gith
## DOM selection / traversal
-`getHtml` and `postHtml` return a special object `$` that wraps [node-soupselect](https://github.com/harryf/node-soupselect) and provides methods to aid in traversing the DOM.
+`getHtml` and `postHtml` return a special object `$` that wraps [node-soupselect](https://github.com/harryf/node-soupselect) and provides methods to aid in traversing the returned DOM.
`$(selector)` returns an element or collection of elements.
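
For example, inside a `getHtml` callback one might iterate over a collection like this (the `each` iterator and the `attribs` property come from the underlying htmlparser elements and are assumptions, not documented above):

```javascript
// Collect the href attribute of every link on the page
var urls = [];
$('a').each(function (a) {
    urls.push(a.attribs.href);
});
```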