# What is [node.io](http://node.io/)?

node.io is a data scraping and processing framework for [Node.js](http://nodejs.org/) inspired by [Google's MapReduce](http://labs.google.com/papers/mapreduce.html).

node.io can streamline the process of:

- Parsing / filtering / sanitizing large amounts of data
- Scraping data from the web using familiar CSS selectors and traversal methods
- MapReduce
- Transforming data from one format to another, e.g. from CSV to a database
- Distributing work across multiple processes, and multiple servers (soon)
- Recursively traversing a directory and using each file as input

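
To give a feel for the input, run, reduce shape that a MapReduce-inspired job takes, here is a minimal sketch in plain JavaScript. It does not use node.io's API: the `runJob` helper and the `input` / `run` / `reduce` option names are hypothetical and exist only for this illustration.

```javascript
// Hypothetical helper, NOT part of node.io: runs a tiny map/reduce-style job.
function runJob(job) {
  const emitted = [];
  // "Map" step: each input item may emit zero or more values.
  for (const item of job.input) {
    job.run(item, (value) => emitted.push(value));
  }
  // "Reduce" step: fold all emitted values into a single result.
  return job.reduce(emitted);
}

// Example job: count how often each word appears across lines of text.
const counts = runJob({
  input: ['a b a', 'b c'],
  run(line, emit) {
    for (const word of line.split(' ')) emit(word);
  },
  reduce(words) {
    const totals = {};
    for (const word of words) totals[word] = (totals[word] || 0) + 1;
    return totals;
  },
});

console.log(counts); // { a: 2, b: 2, c: 1 }
```

A real job would read its input from a file, database or the web rather than an in-memory array; see the documentation linked below for the actual job format.
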
## Why node.io?

- Create modular and extensible jobs for scraping and processing data
- Jobs are written in JavaScript or CoffeeScript and run in Node.js - jobs are concise, asynchronous and _FAST_
- Seamlessly speed up execution by distributing work among child processes and other servers (soon)
- Easily handle a variety of input / output situations - node.io does the heavy lifting
    * Reading / writing lines to and from files
    * Traversing files in a directory
    * Reading / writing rows to and from a database
    * STDIN / STDOUT / Custom streams
    * Piping data between multiple node.io jobs
    * Any combination of the above, or your own IO
- Includes a robust framework for scraping, selecting and traversing web data
- Support for a variety of proxies when scraping web data
- Includes a data validation and sanitization framework
- Provides support for retries, timeouts, dynamically adding input, etc.

## Installation

To install node.io, use [npm](http://github.com/isaacs/npm):

    $ npm install node.io

For usage details, run

    $ node.io --help

## Documentation

To get started, see the [documentation](https://github.com/chriso/node.io/blob/master/docs/README.md), [API](https://github.com/chriso/node.io/blob/master/docs/api.md), and [examples](https://github.com/chriso/node.io/tree/master/examples/).

Better documentation will be available once I have time to write it.

node.io is an _ALPHA_ release. There will no doubt be some bugs and oddities.

## Roadmap

- Fix up the [http://node.io/](http://node.io/) site
- Handle HTTP codes, e.g. automatically redirect on 3xx or call `fail()` on 4xx/5xx
- Nested requests inherit referrer / cookies if to the same domain
- Add more DOM [selector](http://api.jquery.com/category/selectors/) / [traversal](http://api.jquery.com/category/traversing/) methods
- Or attempt a full port of jQuery that's compatible with [htmlparser](https://github.com/tautologistics/node-htmlparser) (I know a port already exists, but it uses the far less forgiving [JSDOM](https://github.com/tmpvar/jsdom))
- Test proxy callbacks and write proxy documentation
- Add distributed processing
- Installation without npm (install.sh)
- Refactoring
- More tests / better test coverage

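
The HTTP status handling item in the list above could look roughly like the following. This is only a sketch of the idea, written in plain JavaScript: `handleStatus` and the `redirect` / `fail` / `ok` handler names are hypothetical, and nothing here reflects how node.io will actually implement it.

```javascript
// Hypothetical sketch of the planned behaviour; none of these names are node.io's API.
function handleStatus(status, headers, handlers) {
  // 3xx with a Location header: follow the redirect.
  if (status >= 300 && status < 400 && headers.location) {
    return handlers.redirect(headers.location);
  }
  // 4xx / 5xx: report the failure.
  if (status >= 400 && status < 600) {
    return handlers.fail(status);
  }
  // Everything else (including a 3xx without a Location header) is passed through.
  return handlers.ok();
}

// Example handlers that only describe what they would do.
const handlers = {
  redirect: (url) => 'redirect to ' + url,
  fail: (status) => 'fail with ' + status,
  ok: () => 'ok',
};

console.log(handleStatus(302, { location: 'http://example.com/' }, handlers));
console.log(handleStatus(404, {}, handlers));
console.log(handleStatus(200, {}, handlers));
```

Keeping the decision in one small function makes it easy to test without issuing real requests.
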
## Credits

node.io wouldn't be possible without

- [ry's](https://github.com/ry) [node.js](http://nodejs.org/)
- [tautologistics'](https://github.com/tautologistics) [node-htmlparser](https://github.com/tautologistics/node-htmlparser)
- [harryf's](https://github.com/harryf) [soupselect](https://github.com/harryf/node-soupselect)
- [kriszyp's](https://github.com/kriszyp) [multi-node](https://github.com/kriszyp/multi-node)

## Contributing

[Fork / pull](https://github.com/chriso/node.io/fork).

## License

(MIT License)

Copyright (c) 2010 Chris O'Hara <cohara87@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.