Getting Started

chriso edited this page · 32 revisions

Data scraping and processing code is organised into modular and extendable jobs written in JavaScript or CoffeeScript. A typical job takes some input, processes or reduces it in some way, and then outputs the emitted results. No step is compulsory; some jobs, for example, do not require any input.
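
The input, run, emit and output flow described above can be pictured with a small self-contained sketch. It does not use the library itself: the runJob helper and the shape of the job object are assumptions made for illustration, not the actual implementation.

```javascript
// Hypothetical helper: drives a job description through input -> run -> output.
// This is NOT the library's implementation, only an illustration of the data flow.
function runJob(job) {
    var emitted = [];
    var context = {
        // run() reports results by calling this.emit(value)
        emit: function (value) { emitted.push(value); }
    };
    // input: false means the job needs no input, so run() is called once
    var elements = job.input === false ? [undefined] : job.input;
    elements.forEach(function (element) {
        job.run.call(context, element);
    });
    // Hand all emitted results to the output step, if one is defined
    if (typeof job.output === 'function') {
        job.output(emitted);
    }
    return emitted;
}

var results = runJob({
    input: [0, 1, 2],
    run: function (num) { this.emit(num * 2); }
});
console.log(results); // [ 0, 2, 4 ]
```

Each step is optional in this sketch too: a job with input set to false runs once, and a job without an output function simply returns what was emitted.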

Jobs can be run from the command line or through a web interface. To run a job from the command line (extension can be omitted), run

$ node.io myjob

To run jobs through the web interface, copy your jobs to ~/.node_modules and run

$ node.io-web

Basic examples

Let's run through some simple examples highlighting the anatomy of a job. Each example includes a JavaScript and a CoffeeScript version and omits the required line var nodeio = require('node.io');

Example 1: Hello World!


exports.job = new nodeio.Job({
    input: false,
    run: function () {
        this.emit('Hello World!');
    }
});

class Hello extends nodeio.JobClass
    input: false
    run: (num) -> @emit 'Hello World!'

@class = Hello
@job = new Hello()

Example 2: Double each element of input


exports.job = new nodeio.Job({
    input: [0,1,2],
    run: function (num) {
        this.emit(num * 2);
    }
});

class Double extends nodeio.JobClass
    input: [0,1,2]
    run: (num) -> @emit num * 2

@class = Double
@job = new Double()

Example 3: Extend the previous example to quadruple elements


var double = require('./double').job;

exports.job = double.extend({
    run: function (num) {
        this.__super__.run(num * 2);
        //Same as: this.emit(num * 4)
    }
});

Double = require('./double').class

class Quad extends Double
    run: (num) -> super num * 2

@class = Quad
@job = new Quad()
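
The idea of extending a job and delegating to the parent's run method can be pictured with plain prototypal inheritance. The sketch below is self-contained and does not use the library; the extend helper and the double and quad objects are hypothetical stand-ins, so it illustrates the concept rather than the actual implementation.

```javascript
// Hypothetical helper: create a child object that inherits from `parent`
// and overrides the given methods. Not the library's own extend().
function extend(parent, overrides) {
    var child = Object.create(parent);
    Object.keys(overrides).forEach(function (key) {
        child[key] = overrides[key];
    });
    child.results = []; // each job collects its own emitted values
    return child;
}

// Hypothetical base job: doubles its input
var double = {
    results: [],
    emit: function (value) { this.results.push(value); },
    run: function (num) { this.emit(num * 2); }
};

// Child job: doubles the value first, then lets the parent double it again
var quad = extend(double, {
    run: function (num) {
        // Call the parent's run with `this` still bound to the child
        double.run.call(this, num * 2);
    }
});

[0, 1, 2].forEach(function (num) { quad.run(num); });
console.log(quad.results); // [ 0, 4, 8 ]
```

Only run is overridden here; emit is inherited unchanged from the parent through the prototype chain.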

Working with different IO

Example 1: Files

Files can be read and written in two ways: (1) they can be specified inside the job, or (2) they can be specified at the command line, since node.io reads input from stdin (elements are separated by \n or \r\n) and writes output to stdout.


exports.job = new nodeio.Job({
    input: 'input.csv',
    run: function (row) {
        //Use a global regex so that every comma is replaced, not just the first
        this.emit(row.replace(/,/g, '\t'));
    },
    output: 'output.tsv'
});

CoffeeScript version (input / output must be specified at the command line):

class CsvTsv extends nodeio.JobClass
    run: (row) -> @emit row.replace(/,/g, '\t')

@class = CsvTsv
@job = new CsvTsv()

The following two commands are equivalent:

$ node.io csv_to_tsv.js
$ node.io csv_to_tsv.coffee < input.csv > output.tsv
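
Two details of this example are easy to check in plain JavaScript: input elements are separated by \n or \r\n, and String.prototype.replace with a string pattern replaces only the first match, so converting every comma in a row needs a global regular expression. The sketch below is self-contained and does not use the library; the helper names splitElements and csvRowToTsv are hypothetical, and it assumes unquoted fields with no embedded commas or tabs.

```javascript
// Split raw input text into elements on \n or \r\n,
// skipping empty lines such as the one left by a trailing newline.
function splitElements(text) {
    return text.split(/\r?\n/).filter(function (line) {
        return line.length > 0;
    });
}

// Convert one CSV row to TSV.
// Assumption: fields are unquoted and contain no embedded commas or tabs.
function csvRowToTsv(row) {
    // A string pattern (',') would replace only the first comma,
    // so a global regular expression is used instead.
    return row.replace(/,/g, '\t');
}

var input = 'a,b,c\r\n1,2,3\n';
var output = splitElements(input).map(csvRowToTsv).join('\n');
console.log(output);
```

Real CSV files with quoted fields need a proper parser; this sketch only covers the simple case used in the example.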

Example 2: Databases & custom IO

To read rows from a database, use the following template. start begins at 0 and num is the number of rows to return. When there are no more rows, return false.


exports.job = new nodeio.Job({
    input: function (start, num, callback) {
        //Fetch num rows, beginning at offset start
    },
    run: function (row) {
    },
    output: function (rows) {
        //Note: this method always receives multiple rows as an array
    }
});
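
The start / num paging contract can be tried out without a database. The sketch below is self-contained and does not use the library: the in-memory table and the readAll driver are hypothetical, and it assumes that rows are delivered through the callback and that the end of input is signalled by passing false to it. The text above only says to return false, so check the library's documentation for the exact convention.

```javascript
// Hypothetical in-memory "table" standing in for a real database
var table = ['alice', 'bob', 'carol', 'dave', 'erin'];

// Input function with the template's signature: `start` is the offset of the
// first row wanted, `num` is the maximum number of rows to return.
function input(start, num, callback) {
    var rows = table.slice(start, start + num);
    // Assumption: the end of input is signalled by passing false
    callback(rows.length > 0 ? rows : false);
}

// Hypothetical driver (not the library's scheduler): requests pages until the
// input function reports false. It relies on the callback being invoked
// synchronously; a real database call would be asynchronous.
function readAll(inputFn, pageSize) {
    var all = [];
    var start = 0;
    var done = false;
    while (!done) {
        inputFn(start, pageSize, function (rows) {
            if (rows === false) {
                done = true;
                return;
            }
            all = all.concat(rows);
            start += rows.length;
        });
    }
    return all;
}

console.log(readAll(input, 2));
```

With a page size of 2 the driver asks for offsets 0, 2 and 4, and stops when the request at offset 5 yields false.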

Example 3: Stream

To read from read_stream and write to write_stream, use the following example


exports.job = new nodeio.Job({
    input: function () {
        //Attach read_stream as the input source here
        this.input.apply(this, arguments);
    },
    run: function (line) {
    },
    output: function (lines) {
        //Write lines to write_stream here
    }
});