JavaScript
Latest commit 19b697b Apr 15, 2016 @MauriceButler 0.0.7

Gretel

Follows and collects breadcrumbs across the web.

Relies heavily on Christopher Giffard's node-simplecrawler.

Usage

CLI

gretel [options]

Options:

  -h, --help                  output usage information
  -V, --version               output the version number
  -s, --startUri [uri]        Uri to start crawling from
  -q, --queuePath [filePath]  File path to load / save queue from

Module

var gretel = require('gretel')('www.example.com');

gretel.start();
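gretel.start() kicks off the crawl. To show the general idea, here is a rough sketch of a breadth-first crawl queue. It is not gretel's or simplecrawler's code. It walks a hypothetical in-memory link map, visits each URL once, and queues any new links it finds. The real crawler fetches pages over HTTP, discovers links in the response body, and applies its own fetch conditions and concurrency limits. `crawl` and `web` are illustrative names only.

```javascript
// Hypothetical "web": URL -> outgoing links, so the sketch runs without network access.
const web = {
  'http://www.example.com/': ['http://www.example.com/a', 'http://www.example.com/b'],
  'http://www.example.com/a': ['http://www.example.com/b', 'http://www.example.com/'],
  'http://www.example.com/b': ['http://www.example.com/c'],
  'http://www.example.com/c': []
};

// Breadth-first crawl: visit each queued URL once and queue any links not seen before.
function crawl(startUrl, getLinks) {
  const queue = [startUrl];
  const seen = new Set(queue);
  const visited = [];
  while (queue.length > 0) {
    const url = queue.shift();
    visited.push(url); // record the breadcrumb for this page
    for (const link of getLinks(url)) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return visited;
}

const breadcrumbs = crawl('http://www.example.com/', (url) => web[url] || []);
console.log(breadcrumbs);
// [ 'http://www.example.com/', 'http://www.example.com/a',
//   'http://www.example.com/b', 'http://www.example.com/c' ]
```

The `seen` set is what stops cycles such as `/a` linking back to `/` from being crawled twice.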

Optionally, load or save the breadcrumb queue state:

gretel.load('./breadcrumbs.json', function(error){
    if(error){
        return console.log(error.stack || error);
    }

    gretel.start();
});

gretel.queue.freeze("./breadcrumbs.json", function(error){
    if(error){
        console.log(error.stack || error);
    }
});
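gretel.load and gretel.queue.freeze handle persistence for you. The sketch below only illustrates the underlying idea: write the queue to a JSON file, then read it back after a restart. `saveQueue` and `loadQueue` are hypothetical helpers, not part of gretel. The item shape `{ url, fetched }` is an assumption and does not reflect simplecrawler's actual serialized format. Synchronous `fs` calls keep the example short, whereas the real methods take callbacks.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helpers: a queue is modelled as a plain array of { url, fetched } objects.
function saveQueue(queue, filePath) {
  fs.writeFileSync(filePath, JSON.stringify(queue));
}

function loadQueue(filePath) {
  // Throws if the file is missing or does not contain valid JSON.
  return JSON.parse(fs.readFileSync(filePath, 'utf8'));
}

const queueFile = path.join(os.tmpdir(), 'breadcrumbs-example.json');
const savedQueue = [
  { url: 'http://www.example.com/', fetched: true },
  { url: 'http://www.example.com/a', fetched: false }
];

saveQueue(savedQueue, queueFile);
const restoredQueue = loadQueue(queueFile);
fs.unlinkSync(queueFile); // tidy up the temporary file

// After a restart, only the unfetched items still need crawling.
const remaining = restoredQueue.filter((item) => !item.fetched);
console.log(remaining.map((item) => item.url)); // [ 'http://www.example.com/a' ]
```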

Apart from the above, gretel behaves exactly like node-simplecrawler (she is actually an instance of its Crawler), so all of its settings and events apply. For more information and examples, see the node-simplecrawler README.

// sync processing
gretel.on('fetchcomplete', function(queueItem, data, response) {
    console.log(queueItem.url);
});

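// async processing: this.wait() returns a callback to call once the async work has finished
gretel.on('fetchcomplete', function(queueItem, data, response) {
    var next = this.wait();
    // doSomethingAsync stands in for your own asynchronous function
    doSomethingAsync(data, function(){
        console.log(queueItem.url);
        next();
    });
});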
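To make the wait() mechanism concrete, here is a minimal sketch of one way such a pause could be built on Node's `EventEmitter`. It is an assumption for illustration, not simplecrawler's implementation. `PausableEmitter` and `processAll` are hypothetical names. A listener calls `this.wait()` to get a resume callback, and the emitter does not emit the next item until every callback handed out for the current item has been called.

```javascript
const EventEmitter = require('events');

// Hypothetical emitter (not simplecrawler's code) supporting this.wait() in listeners.
class PausableEmitter extends EventEmitter {
  constructor() {
    super();
    this.pending = 0;     // resume callbacks handed out but not yet called
    this.onResume = null; // what to run once pending drops back to zero
  }

  wait() {
    this.pending += 1;
    let called = false;
    return () => {
      if (called) return; // ignore duplicate calls
      called = true;
      this.pending -= 1;
      if (this.pending === 0 && this.onResume) {
        const resume = this.onResume;
        this.onResume = null;
        resume();
      }
    };
  }

  // Emit one 'fetchcomplete' per item, waiting for paused listeners between items.
  processAll(items, done) {
    const step = (index) => {
      if (index >= items.length) return done();
      const proceed = () => step(index + 1);
      this.emit('fetchcomplete', items[index]);
      if (this.pending === 0) {
        proceed(); // no listener asked to wait
      } else {
        this.onResume = proceed;
      }
    };
    step(0);
  }
}

// Usage: an async listener written in the same style as the gretel example above.
const emitter = new PausableEmitter();
const log = [];
let active = 0;
let maxActive = 0;
let finished = false;

emitter.on('fetchcomplete', function(url) {
  const resume = this.wait(); // `this` is the emitter for regular function listeners
  active += 1;
  maxActive = Math.max(maxActive, active);
  setTimeout(function() {
    active -= 1;
    log.push(url);
    resume();
  }, 10);
});

emitter.processAll(['/a', '/b', '/c'], function() {
  finished = true;
  console.log(log, maxActive); // [ '/a', '/b', '/c' ] 1
});
```

Because `EventEmitter` invokes ordinary `function` listeners with `this` bound to the emitter, `this.wait()` works there. It would not work in an arrow-function listener, which is also why the gretel examples use `function`.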