chriso edited this page Dec 20, 2010 · 32 revisions

Data scraping and processing code is organised into modular and extendable jobs written in JavaScript or CoffeeScript. A typical node.io job takes some input, processes or reduces it in some way, and then outputs the emitted results. No step is compulsory; for example, some jobs require no input at all.
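To make the input, process/reduce and output stages concrete, here is a minimal plain-JavaScript sketch of how data could flow through them. It is not node.io's implementation: `runPipeline` is a hypothetical helper, and the array-in/array-out signatures used for `reduce()` and `output()` are simplifying assumptions. Only the idea of `run()` calling `this.emit()` comes from the examples on this page.

```javascript
// Hypothetical sketch of the input -> run -> reduce -> output flow; NOT node.io's code.
// runPipeline, and the array-in/array-out reduce() and output() signatures, are assumptions.
function runPipeline(job) {
  const emitted = [];
  // A minimal stand-in for a thread: whatever run() passes to emit() is collected.
  const thread = { emit: (value) => emitted.push(value) };

  // Input stage: hand each element of the input array to run().
  for (const item of job.input || []) {
    job.run.call(thread, item);
  }

  // Reduce stage (optional): transform the full list of emitted values.
  const reduced = job.reduce ? job.reduce(emitted) : emitted;

  // Output stage (optional): hand the final results to output().
  if (job.output) job.output(reduced);
  return reduced;
}

// Usage: double each number, keep only results greater than 1, then print them.
const results = runPipeline({
  input: [0, 1, 2],
  run: function (num) { this.emit(num * 2); },
  reduce: (values) => values.filter((v) => v > 1),
  output: (values) => console.log(values), // prints [ 2, 4 ]
});
```

Because `reduce` and `output` are optional in this sketch, a job that only defines `input` and `run` simply returns everything it emitted.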

Running a job

Jobs can be run from the command line or through a web interface. To run a job from the command line, run the following (the file extension can be omitted):

$ node.io myjob

To run jobs through the web interface, copy your jobs to ~/.node_modules and run

$ node.io-web -p 8080

The web interface can be accessed at http://localhost:8080/

The anatomy of a job

Each example includes both a JavaScript and a CoffeeScript version. For brevity, the examples omit the required line var nodeio = require('node.io');

Example 1: Hello World!

hello.js

exports.job = new nodeio.Job({
    input: false,
    run: function () {
        this.emit('Hello World!');
    }
});

hello.coffee

class Hello extends nodeio.JobClass
    input: false
    run: -> @emit 'Hello World!'
    
@class = Hello
@job = new Hello()

Example 2: Double each element of input

double.js

exports.job = new nodeio.Job({
    input: [0,1,2],
    run: function (num) {
        this.emit(num * 2);
    }
});

double.coffee

class Double extends nodeio.JobClass
    input: [0,1,2]
    run: (num) -> @emit num * 2
    
@class = Double
@job = new Double()

Example 3: Inheritance

quad.js

var double = require('./double').job;

exports.job = double.extend({
    run: function (num) {
        this.__super__.run(num * 2);
        //Same as: this.emit(num * 4)
    }
});

quad.coffee

Double = require('./double').class

class Quad extends Double
    run: (num) -> super num * 2
    
@class = Quad
@job = new Quad()
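The CoffeeScript version relies on ordinary class inheritance and super. The sketch below shows the same pattern with plain ES classes so it can run without node.io. `DoubleJob` and `QuadJob` are hypothetical stand-ins, not node.io's JobClass, and their `emit()` just collects values into an array instead of passing them on to node.io's output handling.

```javascript
// Hypothetical stand-in for a job class; NOT node.io's JobClass.
class DoubleJob {
  constructor() {
    this.input = [0, 1, 2];
    this.emitted = []; // collects emitted values in place of node.io's output step
  }
  emit(value) { this.emitted.push(value); }
  run(num) { this.emit(num * 2); }
}

// Like quad.coffee: override run() and delegate to the parent with a transformed argument.
class QuadJob extends DoubleJob {
  run(num) { super.run(num * 2); } // same as this.emit(num * 4)
}

const quad = new QuadJob();
quad.input.forEach((num) => quad.run(num));
console.log(quad.emitted); // [ 0, 4, 8 ]
```

Because `super.run()` is invoked with the child instance as `this`, the parent's `this.emit()` call lands on the child, which is what makes the "same as this.emit(num * 4)" equivalence hold.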

Other basic concepts

Job options

Options make it easy to add common or complex behavior to a job. A full list of options can be found in the API documentation.

Options are specified as an object of key/value pairs:

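var options = {
    timeout: 10,    //Timeout after 10 seconds
    max: 20,        //Run up to 20 threads concurrently (when run() is async)
    retries: 3      //Each thread can be retried up to 3 times before failing
};
//The methods object holds input, run(), etc., as in Example 2
var methods = {
    input: [0, 1, 2],
    run: function (num) { this.emit(num * 2); }
};
exports.job = new nodeio.Job(options, methods);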
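As an illustration of what a retry limit such as retries: 3 means, here is a small synchronous sketch: a failing thread is attempted again until it succeeds or the retry budget is exhausted, so three retries allow at most four attempts in total. `runWithRetries` is a hypothetical helper, not part of node.io, and the real scheduler also has to deal with timeouts and concurrent threads, which this sketch ignores.

```javascript
// Hypothetical sketch of a retry limit; NOT node.io's scheduler.
// `attempt` receives the zero-based attempt number and returns true on success.
function runWithRetries(attempt, retries) {
  // One initial attempt plus up to `retries` further attempts.
  for (let tries = 0; tries <= retries; tries++) {
    if (attempt(tries)) return { ok: true, attempts: tries + 1 };
  }
  return { ok: false, attempts: retries + 1 };
}

// Usage: a flaky thread that only succeeds on its third attempt.
const outcome = runWithRetries((tries) => tries === 2, 3);
console.log(outcome); // { ok: true, attempts: 3 }
```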

Determining when a job is complete

Because node.io is asynchronous, it needs a way to determine when each thread (a single call to run()) is complete, and when the entire job is complete.

A thread is complete after:

  • emit(), fail(), retry() or skip() has been called; any subsequent calls in the same thread are ignored
  • An option such as timeout causes the thread to call one of the methods above automatically
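The "first call wins" rule for threads can be sketched as a tiny state machine: whichever of emit(), fail(), retry() or skip() is called first completes the thread, and every later call is ignored. The `Thread` class and its `finish()` helper are hypothetical and only model this rule; they are not node.io's internals, and the automatic timeout behavior is left out.

```javascript
// Hypothetical sketch of thread completion; NOT node.io's code.
// The first call to emit(), fail(), retry() or skip() completes the thread.
class Thread {
  constructor() {
    this.done = false;
    this.result = null;
  }
  finish(kind, value) {
    if (this.done) return false; // already complete: ignore the call
    this.done = true;
    this.result = { kind, value };
    return true;
  }
  emit(value) { return this.finish('emit', value); }
  fail(err) { return this.finish('fail', err); }
  retry() { return this.finish('retry'); }
  skip() { return this.finish('skip'); }
}

const thread = new Thread();
thread.emit('first');
thread.fail('ignored'); // no effect, the thread already completed
console.log(thread.result); // { kind: 'emit', value: 'first' }
```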

The job is complete when:

  • All of the input has been consumed or, when input is false, a single thread has completed
  • exit() is called
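The two job-completion conditions can be modelled with a pending-thread counter: the job ends when every thread has finished (a single thread when input is false), or immediately when any thread calls exit(). In the sketch below, `runJob` is a hypothetical helper that only models emit() and exit(), with setTimeout standing in for asynchronous threads; it is not node.io's scheduler.

```javascript
// Hypothetical sketch of job completion; NOT node.io's scheduler.
// Only emit() and exit() are modelled; setTimeout simulates asynchronous threads.
function runJob(input, run, onComplete) {
  const items = input === false ? [undefined] : input; // input:false => exactly one thread
  const results = [];
  let pending = items.length;
  let finished = false;

  const complete = () => {
    if (finished) return; // exit() and the last thread may both try to finish the job
    finished = true;
    onComplete(results);
  };

  if (pending === 0) complete(); // nothing to consume
  for (const item of items) {
    let threadDone = false;
    const thread = {
      emit(value) {
        if (threadDone || finished) return; // later calls in this thread are ignored
        threadDone = true;
        results.push(value);
        if (--pending === 0) complete(); // all input consumed and all threads done
      },
      exit: complete, // ends the whole job immediately
    };
    setTimeout(() => run.call(thread, item), 0);
  }
}

runJob([0, 1, 2], function (num) { this.emit(num * 2); }, (results) => {
  console.log(results); // [ 0, 2, 4 ]
});
```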

Passing arguments to jobs

It is sometimes useful to pass arguments to a job on the command line, e.g.

$ node.io myjob arg1 arg2 arg3

Arguments can be accessed through this.options.args, e.g.

exports.job = new nodeio.Job({
    input: false,
    run: function () {
        this.emit(this.options.args[0]); //Emits "arg1"
    }
});
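For intuition about where this.options.args comes from, the sketch below splits a process.argv-style array for node.io myjob arg1 arg2 arg3 into the job name and its positional arguments. `splitJobArgs` is hypothetical and is not node.io's parser; in particular, it ignores any command-line options node.io itself accepts before the job name.

```javascript
// Hypothetical sketch; NOT node.io's argument parser, and it ignores node.io's own flags.
function splitJobArgs(argv) {
  // argv mirrors process.argv: [node binary, script path, job name, ...job arguments]
  const [, , jobName, ...args] = argv;
  return { jobName, args };
}

const parsed = splitJobArgs(['node', '/usr/local/bin/node.io', 'myjob', 'arg1', 'arg2', 'arg3']);
console.log(parsed.jobName); // myjob
console.log(parsed.args[0]); // arg1
```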

Go to the next tutorial: Working with input / output