Getting Started
Data scraping and processing code is organised into modular, extensible jobs written in JavaScript or CoffeeScript. A typical node.io job takes some input, processes or reduces it in some way, and then outputs the emitted results. No step is compulsory; some jobs, for example, do not require any input.
Jobs can be run from the command line or through a web interface. To run a job from the command line (the file extension can be omitted), run
$ node.io myjob
To run jobs through the web interface, copy your jobs to ~/.node_modules and run
$ node.io-web -p 8080
The web interface can be accessed at http://localhost:8080/
Each example includes a JavaScript and CoffeeScript version and omits the required var nodeio = require('node.io');
Example 1: Hello World!
hello.js
exports.job = new nodeio.Job({
    input: false,
    run: function () {
        this.emit('Hello World!');
    }
});
hello.coffee
class Hello extends nodeio.JobClass
    input: false
    run: (num) -> @emit 'Hello World!'

@class = Hello
@job = new Hello()
Example 2: Double each element of input
double.js
exports.job = new nodeio.Job({
    input: [0,1,2],
    run: function (num) {
        this.emit(num * 2);
    }
});
double.coffee
class Double extends nodeio.JobClass
    input: [0,1,2]
    run: (num) -> @emit num * 2

@class = Double
@job = new Double()
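To make the data flow in Example 2 concrete, the following is a minimal plain-JavaScript simulation of it. It does not use node.io: `runJobSketch` is a hypothetical helper written only for this illustration, it models nothing but `emit()`, and it assumes that `input: false` results in a single call to `run()`.

```javascript
// Plain-JavaScript simulation of the input -> run -> emit flow.
// runJobSketch is a hypothetical helper, not part of node.io.
function runJobSketch(job) {
    var output = [];
    // The context stands in for `this` inside run(); only emit() is modelled.
    var context = {
        emit: function (result) {
            output.push(result);
        }
    };
    // Assumption: input: false means run() is called once with no argument.
    var input = job.input === false ? [undefined] : job.input;
    input.forEach(function (item) {
        job.run.call(context, item);
    });
    return output;
}

var doubled = runJobSketch({
    input: [0, 1, 2],
    run: function (num) {
        this.emit(num * 2);
    }
});

console.log(doubled); // [ 0, 2, 4 ]
```

Each element of the input becomes one call to `run()`, and whatever is emitted is collected as the job's output.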
Example 3: Inheritance
quad.js
var double = require('./double').job;

exports.job = double.extend({
    run: function (num) {
        this.__super__.run(num * 2);
        //Same as: this.emit(num * 4)
    }
});
quad.coffee
# double.coffee exports the class as `class` (lowercase), so require it by that name
Double = require('./double').class

class Quad extends Double
    run: (num) -> super num * 2

@class = Quad
@job = new Quad()
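The comment "Same as: this.emit(num * 4)" in quad.js follows from delegating to the parent's `run()` with an already doubled argument. A rough plain-JavaScript analogue is sketched below. It uses ordinary prototype inheritance rather than node.io's `extend()`, and `doubleSketch`, `quadSketch` and the `emitted` array are hypothetical names introduced only for this illustration.

```javascript
// Plain-object stand-ins for the Double and Quad jobs; not node.io objects.
var doubleSketch = {
    emitted: [],
    emit: function (result) {
        this.emitted.push(result);
    },
    run: function (num) {
        this.emit(num * 2);
    }
};

// quadSketch inherits emit() and overrides run().
var quadSketch = Object.create(doubleSketch);
quadSketch.emitted = [];
quadSketch.run = function (num) {
    // Delegate to the parent run() with a doubled argument,
    // so the parent emits (num * 2) * 2 = num * 4.
    doubleSketch.run.call(this, num * 2);
};

[0, 1, 2].forEach(function (num) {
    quadSketch.run(num);
});

console.log(quadSketch.emitted); // [ 0, 4, 8 ]
```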
Job options
Options allow you to easily incorporate common or complex behavior. A full list of options can be found in the API.
Options are specified as an object containing key:value pairs, which is passed to the job constructor ahead of the job's methods:
var options = {
    timeout: 10,    //Timeout after 10 seconds
    max: 20,        //Run 20 threads concurrently (when run() is async)
    retries: 3      //Threads can retry 3 times before failing
};

//methods is the object holding the job's input, run, etc., as in the examples above
exports.job = new nodeio.Job(options, methods);
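As a rough illustration of what a retry budget such as `retries: 3` means, the sketch below runs a task until it succeeds or the budget is used up. It is not node.io's implementation: `runWithRetriesSketch` is a hypothetical helper, it is synchronous for simplicity, and it assumes the initial attempt does not count against the budget.

```javascript
// Hypothetical sketch of the `retries` option; not node.io's implementation.
// attempt(n) returns true when the n-th try succeeds.
function runWithRetriesSketch(attempt, retries) {
    var tries = 0;
    while (tries <= retries) {
        tries += 1;
        if (attempt(tries)) {
            return { status: 'emitted', tries: tries };
        }
    }
    // The first try plus `retries` retries have all been used up.
    return { status: 'failed', tries: tries };
}

// Succeeds on the third try, within the retry budget of 3.
var flaky = runWithRetriesSketch(function (n) { return n === 3; }, 3);
// Never succeeds: 1 initial try + 3 retries = 4 tries, then failure.
var broken = runWithRetriesSketch(function () { return false; }, 3);

console.log(flaky);  // { status: 'emitted', tries: 3 }
console.log(broken); // { status: 'failed', tries: 4 }
```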
Determining when a job is complete
Being asynchronous, node.io needs to be able to determine when each thread (a call to run()) is complete, and when the entire job is complete.
A thread is complete after:
- emit(), fail(), retry() or skip() has been called; any subsequent calls in the same thread are ignored
- An option, such as timeout, causes the thread to automatically call one of the methods above
The job is complete when:
- All of the input has been consumed, or in the case of input:false, when one thread has completed
- exit() is called
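The rule that only the first call to emit(), fail(), retry() or skip() counts within a thread can be sketched in plain JavaScript as follows. This is a simplified model and not node.io's code: `createThreadSketch` and its `outcome()` accessor are hypothetical names, and the boolean return values exist only to make the behaviour visible.

```javascript
// Hypothetical model of a single thread; createThreadSketch is not a node.io API.
function createThreadSketch() {
    var outcome = null; // stays null until the thread completes

    // Builds one of the four completing methods.
    function settle(type) {
        return function (value) {
            if (outcome !== null) {
                return false; // thread already complete: this call is ignored
            }
            outcome = { type: type, value: value };
            return true;
        };
    }

    return {
        emit: settle('emit'),
        fail: settle('fail'),
        retry: settle('retry'),
        skip: settle('skip'),
        outcome: function () { return outcome; }
    };
}

var thread = createThreadSketch();
var first = thread.emit('Hello World!'); // completes the thread
var second = thread.fail('too late');    // ignored, the thread is already complete

console.log(first, second, thread.outcome().type); // true false emit
```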
Passing arguments to jobs
Arguments can be passed to a job on the command line after the job name, e.g.
$ node.io myjob arg1 arg2 arg3
Arguments can be accessed through this.options.args, e.g.
run: function() {
    console.log(this.options.args[0]); //"arg1"
}
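The mapping from the command line to `this.options.args` can be pictured with the short sketch below. It only models positional arguments and is not how node.io itself parses its command line: `splitJobCommand` is a hypothetical helper, and the `options` object is a stand-in for `this.options` inside `run()`.

```javascript
// Hypothetical helper: splits the words after `node.io` into a job name and args.
function splitJobCommand(words) {
    return {
        job: words[0],
        args: words.slice(1)
    };
}

// Models: $ node.io myjob arg1 arg2 arg3
var parsed = splitJobCommand(['myjob', 'arg1', 'arg2', 'arg3']);

// Stand-in for the `this.options` object seen inside run().
var options = { args: parsed.args };

console.log(options.args[0]);     // prints arg1
console.log(options.args.length); // prints 3
```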