**[node.io](http://node.io/) is a distributed data scraping and processing framework**

- Jobs are written in JavaScript or [CoffeeScript](http://jashkenas.github.com/coffee-script/) and run in [Node.js](http://nodejs.org/); jobs are concise, asynchronous and _FAST_
- Includes a robust framework for scraping, selecting and traversing data from the web
- Includes a data validation and sanitization framework
- Easily handle a variety of input / output: files, databases, streams, stdin/stdout, etc.
- Speed up execution by distributing work across multiple processes and (soon) other servers
- Manage & run jobs through a web interface

Follow [@nodeio](http://twitter.com/nodeio) or visit [http://node.io/](http://node.io/) for updates.

## Scrape example

Let's pull the front page stories from reddit using the high-level `scrape()` method.

    require('node.io').scrape(function() {
        var self = this;
        // Fetch the page; $ selects elements from the parsed HTML
        this.getHtml('http://www.reddit.com/', function(err, $) {
            if (err) {
                self.exit(err);
            } else {
                // Print the text of each story title link
                $('a.title').each(function(title) {
                    console.log(title.text);
                });
                self.skip();
            }
        });
    });

If you want to incorporate timeouts, retries, batch-type jobs, etc., head over to [the wiki](https://github.com/chriso/node.io/wiki) for documentation.

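node.io's own options for timeouts and retries are documented on the wiki. Purely to illustrate the idea, here is a minimal sketch in plain JavaScript with no node.io dependency. The names `withRetries`, `maxAttempts` and `timeoutMs` are hypothetical and invented for this example; they are not part of the node.io API.

```javascript
// Hypothetical helper, NOT part of node.io: run `task` up to `maxAttempts`
// times, treating any attempt that takes longer than `timeoutMs` as failed.
function withRetries(task, maxAttempts, timeoutMs, callback) {
    var attempt = 0;

    function tryOnce() {
        attempt++;
        var finished = false;

        // Fail this attempt if the task has not called back in time.
        var timer = setTimeout(function () {
            settle(new Error('attempt ' + attempt + ' timed out'));
        }, timeoutMs);

        function settle(err, result) {
            if (finished) return; // ignore late or duplicate callbacks
            finished = true;
            clearTimeout(timer);
            if (!err) return callback(null, result);
            if (attempt < maxAttempts) return tryOnce();
            callback(err); // out of attempts: report the last error
        }

        task(settle);
    }

    tryOnce();
}

// Example: a task that fails twice before succeeding.
var calls = 0;
withRetries(function (done) {
    calls++;
    if (calls < 3) return done(new Error('temporary failure'));
    done(null, 'ok after ' + calls + ' attempts');
}, 5, 1000, function (err, result) {
    console.log(err ? err.message : result); // prints "ok after 3 attempts"
});
```

In a real job, the task would wrap a request such as the `getHtml` call above, and the final callback would decide whether to emit a result or exit with the error.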
## Built-in module example

Find the pagerank of a domain

    $ echo "mastercard.com" | node.io pagerank
    => mastercard.com,7

...or a list of domains

    $ node.io pagerank < domains.txt

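The commands above read one domain per line and write one `domain,value` line per input. The sketch below mimics only that line-in, CSV-out shape in plain JavaScript; it does not use node.io and does not look up any pagerank. `processLines` and `lookup` are hypothetical names, and the string length is a stand-in value.

```javascript
// Hypothetical sketch, NOT node.io code: turn newline-separated input into
// "input,value" rows. `lookup` stands in for whatever a real module computes.
function processLines(text, lookup) {
    return text.split('\n')
        .map(function (line) { return line.trim(); })          // drop stray whitespace and \r
        .filter(function (line) { return line.length > 0; })   // skip blank lines
        .map(function (domain) { return domain + ',' + lookup(domain); });
}

// Stand-in lookup: the length of the domain string, not a real pagerank.
var rows = processLines('example.com\nexample.org\n', function (domain) {
    return domain.length;
});
console.log(rows.join('\n'));
// example.com,11
// example.org,11
```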
## Installation

To install node.io, use [npm](http://github.com/isaacs/npm)

    $ npm install node.io

If you do not have npm or Node.js, [see this page](https://github.com/chriso/node.io/wiki/Installation).

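To confirm that the install worked, one option is to check whether Node can resolve the module. This is a small sketch using Node's standard `require.resolve`; the helper name `isInstalled` is made up for this example.

```javascript
// Returns true if `name` can be resolved by require() from this script.
function isInstalled(name) {
    try {
        require.resolve(name);
        return true;
    } catch (err) {
        return false;
    }
}

console.log(isInstalled('node.io')
    ? 'node.io is installed'
    : 'node.io not found; try: npm install node.io');
```

Run it from the directory where you ran `npm install`, since module resolution depends on the location of the script.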
## Getting started

If you want to create your own scraping / processing jobs, head over to [the wiki](https://github.com/chriso/node.io/wiki) for documentation, examples and the API.

node.io comes bundled with several modules (including the pagerank example from above). See [this page](https://github.com/chriso/node.io/blob/master/builtin/README.md) for usage details.

## Roadmap

- Finish writing up the wiki
- Fix up the [http://node.io/](http://node.io/) page
- Add more tests and improve coverage
- Add more DOM [selector](http://api.jquery.com/category/selectors/) / [traversal](http://api.jquery.com/category/traversing/) methods
- Add distributed processing
- Cookie jar for persistent cookies
- Generic proxy manager
- Speed improvements

[history.md](https://github.com/chriso/node.io/blob/master/HISTORY.md) lists recent changes.

If you want to contribute, please [fork/pull](https://github.com/chriso/node.io/fork).

If you find a bug, please report the issue [here](https://github.com/chriso/node.io/issues).

## Credits

node.io wouldn't be possible without

- [ry's](https://github.com/ry) [node.js](http://nodejs.org/)
- [tautologistics'](https://github.com/tautologistics) [node-htmlparser](https://github.com/tautologistics/node-htmlparser)
- [harryf's](https://github.com/harryf) [soupselect](https://github.com/harryf/node-soupselect)
- [kriszyp's](https://github.com/kriszyp) [multi-node](https://github.com/kriszyp/multi-node)

## License

(MIT License)

Copyright (c) 2010 Chris O'Hara <cohara87@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.