**[node.io](http://node.io/) is a distributed data scraping and processing framework**

- Jobs are written in JavaScript or [CoffeeScript](http://jashkenas.github.com/coffee-script/) and run in [Node.js](http://nodejs.org/); jobs are concise, asynchronous and _FAST_
- Includes a robust framework for scraping, selecting and traversing data from the web (choose between jQuery and SoupSelect)
- Includes a data validation and sanitization framework
- Easily handle a variety of input/output sources: files, databases, streams, stdin/stdout, etc.
- Speed up execution by distributing work across multiple processes and (soon) other servers
- Manage and run jobs through a web interface

Follow [@nodeio](http://twitter.com/nodeio) or visit [http://node.io/](http://node.io/) for updates.

## Scrape example

Let's pull the front page stories from reddit using the high-level `scrape()` method:

    require('node.io').scrape(function() {
        var self = this;
        // Fetch and parse the page; $ selects elements from the parsed HTML
        this.getHtml('http://www.reddit.com/', function(err, $) {
            if (err) {
                // Stop the job with the error
                self.exit(err);
            } else {
                // Print the text of each story link (<a class="title">)
                $('a.title').each(function(title) {
                    console.log(title.text);
                });
                // The titles were printed directly, so skip emitting output
                self.skip();
            }
        });
    });

If you want to incorporate timeouts, retries, batch-type jobs, etc., head over to [the wiki](https://github.com/chriso/node.io/wiki) for documentation.
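The wiki documents how node.io itself supports timeouts and retries. Purely to illustrate the pattern, here is a minimal plain Node.js sketch (not node.io's API) that gives each attempt of an async task a timeout and retries a fixed number of times. `withTimeout`, `withRetries` and their options are hypothetical names.

```javascript
// Hypothetical helpers, not part of node.io.

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run `task(attempt)` up to `retries + 1` times, each attempt with its own timeout.
// Resolves with the first successful result, or rejects with the last error.
async function withRetries(task, { retries = 2, timeoutMs = 10000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await withTimeout(task(attempt), timeoutMs);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}
```

Note that `Promise.race` only stops waiting: a timed-out attempt keeps running in the background, so a real job would also need to abort the underlying request (for example with an `AbortController`).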

## Built-in modules

node.io comes with some [built-in scraping modules](https://github.com/chriso/node.io/tree/master/builtin).

Find the PageRank of a domain:

    $ echo "mastercard.com" | node.io pagerank
    => mastercard.com,7

...or of a list of URLs:

    $ node.io pagerank < urls.txt
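If you want to post-process the results, the example output above suggests one `domain,rank` line per input. Assuming every line follows that format (an assumption based only on the single example; the built-in modules' README is the authority), a minimal sketch that parses the output and sorts it by rank could look like this. `parsePagerankOutput` is a hypothetical helper. Because the README does not say whether `=>` is printed literally or only marks output, the sketch tolerates an optional leading `=>`.

```javascript
// Hypothetical helper: parse "domain,rank" lines and sort by rank, highest first.
// Lines that don't match the assumed format are skipped rather than guessed at.
function parsePagerankOutput(text) {
  const results = [];
  for (const rawLine of text.split(/\r?\n/)) {
    // Tolerate an optional "=>" output marker in front of the result
    const line = rawLine.trim().replace(/^=>\s*/, '');
    const comma = line.lastIndexOf(',');
    if (comma <= 0) continue; // blank line or no "domain,rank" separator
    const domain = line.slice(0, comma);
    const rank = Number.parseInt(line.slice(comma + 1), 10);
    if (Number.isNaN(rank)) continue;
    results.push({ domain, rank });
  }
  // Highest rank first; ties keep input order (Array.prototype.sort is stable since ES2019)
  return results.sort((a, b) => b.rank - a.rank);
}

// Hypothetical sample input, e.g. captured from `node.io pagerank < urls.txt > ranks.txt`
const sample = 'mastercard.com,7\nexample.org,5\n\nnot a result line\n';
console.log(parsePagerankOutput(sample));
```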

Quickly check the HTTP status code for each URL in a list:

    $ node.io statuscode < urls.txt
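The built-in `statuscode` module's own code lives in the repository linked above. As a rough, independent sketch of the same idea in plain Node.js (not node.io code), the function below reads one URL per line from a stream such as stdin and collects each URL's HTTP status. It assumes Node 18+ for the global `fetch`. `checkStatusCodes` and `fetchStatus` are hypothetical names, and the status lookup is passed in as a parameter so it can be stubbed or swapped.

```javascript
const readline = require('readline');

// Read one URL per line from `input` and resolve to [url, status] pairs.
// Requests run one at a time; any failure is recorded as "error".
async function checkStatusCodes(input, getStatus) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  const results = [];
  for await (const line of rl) {
    const url = line.trim();
    if (!url) continue; // skip blank lines
    try {
      results.push([url, await getStatus(url)]);
    } catch {
      results.push([url, 'error']);
    }
  }
  return results;
}

// Default lookup using the global fetch (Node 18+). HEAD avoids downloading the
// body, though some servers answer HEAD differently from GET. fetch follows
// redirects by default, so this reports the final status.
async function fetchStatus(url) {
  const res = await fetch(url, { method: 'HEAD' });
  return res.status;
}

// Example use (run as `node check.js < urls.txt`):
// checkStatusCodes(process.stdin, fetchStatus)
//   .then((rows) => rows.forEach(([url, status]) => console.log(`${url},${status}`)));
```

This sketch runs requests sequentially and does not attempt the multi-process distribution that node.io offers.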

Grab the front page stories from [reddit](http://www.reddit.com):

    $ node.io query "http://www.reddit.com/" a.title

## Installation

To install node.io, use [npm](http://github.com/isaacs/npm):

    $ npm install node.io

If you do not have npm or Node.js, [see this page](https://github.com/chriso/node.io/wiki/Installation).

## Getting started

If you want to create your own scraping/processing jobs, head over to [the wiki](https://github.com/chriso/node.io/wiki) for documentation, examples and the API.

node.io comes bundled with several modules (including the PageRank example above). See [this page](https://github.com/chriso/node.io/blob/master/builtin/README.md) for usage details.

## Roadmap

- Finish writing up the wiki
- More tests and improved coverage
- Add distributed processing
- Fix up the [http://node.io/](http://node.io/) page
- Cookie jar for persistent cookies
- Speed improvements

[HISTORY.md](https://github.com/chriso/node.io/blob/master/HISTORY.md) lists recent changes.

If you want to contribute, please [fork/pull](https://github.com/chriso/node.io/fork).

If you find a bug, please report the issue [here](https://github.com/chriso/node.io/issues).

## Credits

node.io wouldn't be possible without:

- [ry's](https://github.com/ry) [node.js](http://nodejs.org/)
- [tautologistics'](https://github.com/tautologistics) [node-htmlparser](https://github.com/tautologistics/node-htmlparser)
- [harryf's](https://github.com/harryf) [soupselect](https://github.com/harryf/node-soupselect)
- [kriszyp's](https://github.com/kriszyp) [multi-node](https://github.com/kriszyp/multi-node)

## License

(MIT License)

Copyright (c) 2010 Chris O'Hara <cohara87@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.