**[node.io](http://node.io/) is a distributed data scraping and processing framework**

- Jobs are written in JavaScript or [CoffeeScript](http://jashkenas.github.com/coffee-script/) and run in [Node.js](http://nodejs.org/) - jobs are concise, asynchronous and _FAST_
- Includes a robust framework for scraping, selecting and traversing data from the web (choose between jQuery or SoupSelect)
- Includes a data validation and sanitization framework
- Easily handle a variety of input / output - files, databases, streams, stdin/stdout, etc.
- Speed up execution by distributing work across multiple processes and (soon) other servers
- Manage & run jobs through a web interface

Follow [@nodeio](http://twitter.com/nodeio) or visit [http://node.io/](http://node.io/) for updates.

## Scrape example

Let's pull the front page stories from reddit

    require('node.io').scrape(function() {
        // Fetch the page; $ selects elements from the parsed HTML
        this.getHtml('http://www.reddit.com/', function(err, $) {
            var stories = [];
            // Collect the text of each story link
            $('a.title').each(function(title) {
                stories.push(title.text);
            });
            // Output the collected titles
            this.emit(stories);
        });
    });

If you want to incorporate timeouts, retries, batch-type jobs, etc., head over to [the wiki](https://github.com/chriso/node.io/wiki) for documentation.

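As a rough illustration of what batch-type processing means in general, the sketch below splits a list of inputs into fixed-size groups. It is not node.io's implementation: `toBatches` is a hypothetical helper and the input values are placeholders. The wiki remains the reference for how node.io itself handles batches.

```javascript
// Hypothetical helper, not part of node.io: split a list of inputs
// into batches of at most `size` items each.
function toBatches(items, size) {
    if (!Number.isInteger(size) || size < 1) {
        throw new RangeError('size must be a positive integer');
    }
    var batches = [];
    for (var i = 0; i < items.length; i += size) {
        // slice() stops at the end of the array, so the last batch may be shorter
        batches.push(items.slice(i, i + size));
    }
    return batches;
}

// Placeholder inputs: three items in batches of two gives
// one full batch followed by one batch holding the remainder.
var urls = ['a.example', 'b.example', 'c.example'];
console.log(toBatches(urls, 2));
```

Each batch could then be handed to a separate worker or processed in turn.
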
## Built-in modules

node.io comes with some [built-in scraping modules](https://github.com/chriso/node.io/tree/master/builtin).

Find the pagerank of a domain

    $ echo "mastercard.com" | node.io pagerank
       => mastercard.com,7

...or a list of URLs

    $ node.io pagerank < urls.txt

Quickly check the HTTP status code for each URL in a list

    $ node.io statuscode < urls.txt

Grab the front page stories from [reddit](http://www.reddit.com)

    $ node.io query "http://www.reddit.com/" a.title

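The `pagerank` example above prints its result as `domain,rank`. If you want to consume that output from another script, a minimal sketch of parsing one line could look like the following. It assumes the `=> ` marker is only a display prefix and that each line holds exactly one `domain,rank` pair; `parsePagerankLine` is a hypothetical helper, not part of node.io.

```javascript
// Hypothetical helper, not part of node.io: parse one line of
// pagerank output, assumed to have the form "domain,rank".
function parsePagerankLine(line) {
    // Tolerate the "=> " marker shown in the example output.
    var cleaned = line.replace(/^\s*=>\s*/, '').trim();
    var comma = cleaned.lastIndexOf(',');
    if (comma === -1) {
        return null; // not a "domain,rank" line
    }
    var rank = parseInt(cleaned.slice(comma + 1), 10);
    return {
        domain: cleaned.slice(0, comma),
        rank: isNaN(rank) ? null : rank // null when no numeric rank is present
    };
}

var result = parsePagerankLine('=> mastercard.com,7');
console.log(result.domain, result.rank); // prints "mastercard.com 7"
```
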
## Installation

To install node.io, use [npm](http://github.com/isaacs/npm)

    $ npm install -g node.io

If you do not have npm or Node.js, [see this page](https://github.com/chriso/node.io/wiki/Installation).

## Getting started

If you want to create your own scraping / processing jobs, head over to [the wiki](https://github.com/chriso/node.io/wiki) for documentation, examples and the API.

node.io comes bundled with several modules (including the pagerank example from above). See [this page](https://github.com/chriso/node.io/blob/master/builtin/README.md) for usage details.

## Roadmap

- Finish writing up the wiki
- More tests & improve coverage
- Add distributed processing
- Fix up the [http://node.io/](http://node.io/) page
- Cookie jar for persistent cookies
- Speed improvements

[history.md](https://github.com/chriso/node.io/blob/master/HISTORY.md) lists recent changes.

If you want to contribute, please [fork/pull](https://github.com/chriso/node.io/fork).

If you find a bug, please report the issue [here](https://github.com/chriso/node.io/issues).

## Credits

node.io wouldn't be possible without

- [ry's](https://github.com/ry) [node.js](http://nodejs.org/)
- [tautologistics'](https://github.com/tautologistics) [node-htmlparser](https://github.com/tautologistics/node-htmlparser)
- [harryf's](https://github.com/harryf) [soupselect](https://github.com/harryf/node-soupselect)
- [kriszyp's](https://github.com/kriszyp) [multi-node](https://github.com/kriszyp/multi-node)

## License

(MIT License)

Copyright (c) 2010 Chris O'Hara <cohara87@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.