
Getting Started

chriso edited this page · 32 revisions

Data scraping and processing code is organised into modular and extendable jobs written in JavaScript or CoffeeScript. A typical job consists of taking some input, processing / reducing it in some way, and then outputting the emitted results, although no step is compulsory; some scraping jobs don't require input, for example.
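The input / run / output flow can be sketched in plain JavaScript without node.io installed. The `makeJob` and `runJob` helpers below are illustrative stand-ins for what the library does internally, not node.io's actual API:

```javascript
// Hypothetical stand-ins for node.io's input -> run -> output flow.
// makeJob() and runJob() are illustrative helpers, not part of node.io.
function makeJob(options) {
    return {
        input: options.input,
        run: options.run,
        output: options.output || function (results) { console.log(results); }
    };
}

function runJob(job) {
    var results = [];
    // Each input element is passed to run(); emit() collects the result.
    job.input.forEach(function (item) {
        job.run.call({ emit: function (out) { results.push(out); } }, item);
    });
    job.output(results);
    return results;
}

var double = makeJob({
    input: [0, 1, 2],
    run: function (num) { this.emit(num * 2); }
});

runJob(double); // emits 0, 2 and 4
```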

Running a job

Jobs can be run from the command line or through a web interface. To run a job from the command line (the file extension can be omitted), run

$ node.io myjob

To run jobs through the web interface, copy your jobs to ~/.node_modules and run

$ node.io-web -p 8080

The web interface can be accessed at http://localhost:8080/

The anatomy of a job

Each example includes a JavaScript and CoffeeScript version and omits the required var nodeio = require('node.io');

Example 1: Hello World!


exports.job = new nodeio.Job({
    input: false,
    run: function () {
        this.emit('Hello World!');
    }
});

class Hello extends nodeio.JobClass
    input: false
    run: -> @emit 'Hello World!'

@class = Hello
@job = new Hello()

To run the example

$ node.io -s hello
     => Hello World!

Note: the -s switch omits status messages from output

Example 2: Double each element of input


exports.job = new nodeio.Job({
    input: [0,1,2],
    run: function (num) {
        this.emit(num * 2);
    }
});

class Double extends nodeio.JobClass
    input: [0,1,2]
    run: (num) -> @emit num * 2

@class = Double
@job = new Double()

Example 3: Inheritance


var double = require('./double').job;

exports.job = double.extend({
    run: function (num) {
        this.__super__.run(num * 2);
        //Same as: this.emit(num * 4)
    }
});

Double = require('./double').Class

class Quad extends Double
    run: (num) -> super num * 2

@class = Quad
@job = new Quad()
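Why the child job emits num * 4: its run() calls the parent's run() with num * 2, and the parent doubles its argument again. The chain can be sketched with plain prototypal inheritance; `DoubleJob` and `QuadJob` below are illustrative classes, not node.io's actual Job implementation:

```javascript
// Illustrative sketch of the inheritance chain (not node.io's internals).
function DoubleJob() { this.results = []; }
DoubleJob.prototype.emit = function (out) { this.results.push(out); };
DoubleJob.prototype.run = function (num) { this.emit(num * 2); };

function QuadJob() { DoubleJob.call(this); }
QuadJob.prototype = Object.create(DoubleJob.prototype);
// Override run() to forward num * 2 to the parent; the parent's own
// doubling then yields num * 4 overall.
QuadJob.prototype.run = function (num) {
    DoubleJob.prototype.run.call(this, num * 2);
};

var quad = new QuadJob();
quad.run(3);
// quad.results now holds [12], the same as calling this.emit(num * 4)
```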

Go to part 2: Basic concepts

Go to part 3: Working with input / output

Go to part 4: Scraping data from the web
