chriso edited this page · 21 revisions

This library includes a robust framework for scraping data from the web. The primary methods for scraping data are get and getHtml, although there are also methods for making any type of request, modifying headers, and so on. See the API for a full list of methods.

Note that var nodeio = require(''); is omitted in each example.

Example 1: Save a web page to disk

The following script is equivalent to

$ curl "" > google.html


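exports.job = new nodeio.Job({
    run: function (url) {
        var self = this;
        this.get(url, function(err, data) {
            if (err) {
                // Stop the job and report the error
                self.exit(err);
            } else {
                // Pass the page body on to the output
                self.emit(data);
            }
        });
    }
});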

class SavePage extends nodeio.JobClass
    run: (url) ->
        @get url, (err, data) =>
            if err? then @exit err else @emit data

To save a page to disk, run

$ echo "" | -s save > google.html

Example 2: Get the number of Google results for a list of keywords

To use the framework effectively, try to encapsulate common scraping code in run() so that the resulting job is as generic and versatile as possible.


var options = {timeout: 10};

exports.job = new nodeio.Job(options, {
    input: ['hello', 'foobar', 'weather'],
    run: function (keyword) {
        var self = this, results;
        this.getHtml('' + encodeURIComponent(keyword), function (err, $) {
            // Stop the job if the request or parse failed
            if (err) {
                self.exit(err);
                return;
            }
            // Read the result-count text from the page
            results = $('#resultStats').text.toLowerCase();
            self.emit(keyword + ' has ' + results);
        });
    }
});
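
One way to keep run() generic, in the spirit of the advice above, is to build the job's methods from parameters. The following is only a sketch under stated assumptions: makeCountMethods, makeStubJob, baseUrl, and selector are hypothetical names, the search URL is a placeholder, and the stub stands in for a real job so the code runs without the framework. It assumes, as in the example above, that getHtml passes a $ selector function whose result exposes a text property.

```javascript
// Hypothetical factory: returns a methods object of the kind passed to
// nodeio.Job in the examples. baseUrl and selector are illustrative.
function makeCountMethods(baseUrl, selector) {
    return {
        run: function (keyword) {
            var self = this;
            var url = baseUrl + encodeURIComponent(keyword);
            this.getHtml(url, function (err, $) {
                if (err) {
                    self.exit(err);
                    return;
                }
                var results = $(selector).text.toLowerCase();
                self.emit(keyword + ' has ' + results);
            });
        }
    };
}

// Stand-in for a job instance (an assumption, not the framework's class).
// It records requested URLs and emitted values instead of doing real I/O.
function makeStubJob(methods, fakeText) {
    var job = Object.create(methods);
    job.emitted = [];
    job.requested = [];
    job.getHtml = function (url, callback) {
        job.requested.push(url);
        callback(null, function () { return { text: fakeText }; });
    };
    job.emit = function (value) { job.emitted.push(value); };
    job.exit = function (err) { throw err; };
    return job;
}

var stub = makeStubJob(
    makeCountMethods('http://search.example/?q=', '#resultStats'),
    'About 1,000 Results'
);
stub.run('hello world');
console.log(stub.requested[0]); // http://search.example/?q=hello%20world
console.log(stub.emitted[0]);   // hello world has about 1,000 results
```

Because the site and selector are arguments, the same run() logic can be reused for a different page by changing only the two parameters.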

Note: you could also comment out input: ['hello', 'foobar','weather'], and specify a list of keywords through the web interface or at the command line, e.g.

$ keywords < list_of_words.js
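
When keywords come from a word list instead of the input array, it helps to know what each row will look like. As a rough illustration, assuming one keyword per line (parseKeywords is a hypothetical helper, not part of the framework), a raw list could be normalized like this:

```javascript
// Hypothetical helper: turn raw text (e.g. the contents of a word list)
// into an array of keywords, assuming one keyword per line.
function parseKeywords(text) {
    return text
        .split(/\r?\n/)                                   // handle LF and CRLF
        .map(function (line) { return line.trim(); })     // drop stray spaces
        .filter(function (line) { return line.length > 0; }); // skip blanks
}

var keywords = parseKeywords('hello\nfoobar\r\n\n  weather  \n');
console.log(keywords); // [ 'hello', 'foobar', 'weather' ]
```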

Example 3: Scraping one page


var options = {timeout: 10}; // Timeout after 10s

exports.job = new nodeio.Job(options, {
    input: false,
    run: function (row) {
        // Replace every comma with a tab
        this.emit(row.replace(/,/g, '\t'));
    },
    output: 'output.tsv'
});
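
One detail of String.prototype.replace matters for the comma-to-tab conversion: with a string pattern only the first match is replaced, so a row with more than two columns needs a global regular expression. The sketch below shows the difference; csvRowToTsv is a hypothetical helper, and it assumes fields contain no quoted or embedded commas.

```javascript
// Hypothetical helper: convert one comma-separated row to tab-separated.
// Assumes fields contain no quoted or embedded commas.
function csvRowToTsv(row) {
    return row.replace(/,/g, '\t');
}

var firstOnly = 'a,b,c'.replace(',', '\t'); // string pattern: first comma only
var allCommas = csvRowToTsv('a,b,c');       // global regex: every comma

console.log(JSON.stringify(firstOnly)); // "a\tb,c"
console.log(JSON.stringify(allCommas)); // "a\tb\tc"
```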