This repository has been archived by the owner on Apr 6, 2018. It is now read-only.



Sengi Web Crawler

A web crawler using Ruby and Redis.


First, run:

gem install rake bundler nokogiri hiredis


Redis is used to store everything, so a running Redis server is always required to use Sengi.

Start Redis:


Start Resque (scheduler and worker):


To get a Resque web dashboard at http://localhost:8282, run:


Initialize Sengi. This writes the default settings and a deep-web blacklist to Redis:

RUBYOPT=-rbundler/setup ruby ./bin/config --init
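To illustrate what "default settings plus a blacklist" can look like, here is a minimal sketch. It is not Sengi's code: a plain Hash stands in for Redis, and the key names (`url_delay`, `blacklist`), the default value, and the helpers `init_store` and `blacklisted?` are all hypothetical. It shows one reasonable way to check a URL's host against a blacklist, matching the host itself and every parent domain.

```ruby
require "set"
require "uri"

# HYPOTHETICAL sketch: a Hash stands in for Redis, and the key names
# ("url_delay", "blacklist") and the default value are assumptions,
# not the keys that ./bin/config --init actually writes.
def init_store(store, blacklist_hosts)
  store["url_delay"] ||= 10 # assumed default: seconds between scheduled URLs
  store["blacklist"] = Set.new(blacklist_hosts.map(&:downcase))
  store
end

# Returns true if the URL's host, or any parent domain of it, is blacklisted.
def blacklisted?(store, url)
  host = URI.parse(url).host
  return false if host.nil?

  labels = host.downcase.split(".")
  # For "a.b.example" this checks "a.b.example", "b.example" and "example".
  labels.each_index.any? do |i|
    store["blacklist"].include?(labels[i..-1].join("."))
  end
end
```

For example, with "blocked.example" on the list, both http://blocked.example/ and http://a.blocked.example/x are rejected, while http://notblocked.example/ is not. Matching whole domain labels rather than raw string suffixes is what avoids that last false positive.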



To queue a URL to be crawled, run:

RUBYOPT=-rbundler/setup ruby ./bin/crawler -q
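The idea behind queueing a URL can be sketched as a first-in, first-out queue that accepts each URL only once. This is an illustrative assumption, not Sengi's actual Resque/Redis job format: the `CrawlQueue` class is hypothetical and keeps everything in memory. It normalizes URLs with Ruby's standard `URI` library (lowercase scheme and host, drop the `#fragment`) so trivially different spellings of the same page are not crawled twice.

```ruby
require "set"
require "uri"

# HYPOTHETICAL in-memory crawl queue; Sengi itself stores its state in Redis.
class CrawlQueue
  def initialize
    @pending = []   # URLs waiting to be crawled, in FIFO order
    @seen = Set.new # every URL ever queued, to avoid duplicates
  end

  # Normalizes the URL and queues it unless it was queued before.
  # Returns true if the URL was added, false otherwise.
  def push(url)
    uri = URI.parse(url.strip)
    return false unless uri.is_a?(URI::HTTP) # also true for URI::HTTPS
    uri.fragment = nil                       # "#section" is the same page
    normalized = uri.normalize.to_s
    return false unless @seen.add?(normalized)
    @pending << normalized
    true
  end

  # Removes and returns the next URL to crawl, or nil if the queue is empty.
  def pop
    @pending.shift
  end

  def size
    @pending.size
  end
end
```

With this sketch, pushing "http://Example.com#top" and then "http://example.com/" queues the page only once, and non-HTTP URLs such as ftp:// links are ignored.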

Relative Links Only

To crawl only the relative links on a given URL, run:

RUBYOPT=-rbundler/setup ruby ./bin/crawler -r
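One way to implement a "relative links only" filter is sketched below. It is an assumption about the technique, not Sengi's implementation, and `relative_links` is a hypothetical helper. It takes the page URL and the raw `href` values found on it (extracting them from HTML, e.g. with Nokogiri, is left out), keeps only references without a scheme or host, and resolves them against the page URL with `URI::Generic#+`.

```ruby
require "uri"

# Illustrative sketch, not Sengi's implementation: keep only the relative
# hrefs found on a page and resolve them against the page URL.
def relative_links(page_url, hrefs)
  base = URI.parse(page_url)
  hrefs.map do |href|
    ref = href.strip
    # Skip empty hrefs and same-page anchors such as "#top".
    next nil if ref.empty? || ref.start_with?("#")

    begin
      link = URI.parse(ref)
    rescue URI::InvalidURIError
      next nil # skip malformed hrefs
    end

    # "/about" or "img/a.png" have no scheme and no host. A protocol-relative
    # "//cdn.example/x.js" has no scheme but does name a host, so skip it.
    next nil unless link.relative? && link.host.nil?

    (base + link).to_s
  end.compact.uniq
end
```

For the page http://example.com/blog/post, the hrefs "/about" and "img/a.png" resolve to http://example.com/about and http://example.com/blog/img/a.png, while absolute and protocol-relative links to other hosts are dropped.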


Crawl only one URL at a time. The most recently scheduled datetime is stored in the Redis key urls:schedule:last. Each new URL is scheduled for urls:schedule:last + url_delay, where url_delay is the number of seconds between two scheduled URLs.

RUBYOPT=-rbundler/setup ruby ./bin/crawler -s
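The scheduling rule described above (next time = urls:schedule:last + url_delay, then store it as the new urls:schedule:last) can be sketched in a few lines. This is a minimal sketch under stated assumptions: a Hash stands in for Redis, the datetime is stored as an ISO 8601 string, and when no previous time exists the current time is used as the starting point. `schedule_next` is a hypothetical helper, not part of Sengi.

```ruby
require "time"

# HYPOTHETICAL sketch: a Hash stands in for Redis. Storing the datetime as
# ISO 8601 and starting from "now" when the key is missing are assumptions.
def schedule_next(store, url_delay, now: Time.now)
  last = store["urls:schedule:last"]
  last_time = last ? Time.iso8601(last) : now

  # The rule from the text: urls:schedule:last + url_delay.
  next_time = last_time + url_delay

  # Remember the new slot so the following URL is scheduled after it.
  store["urls:schedule:last"] = next_time.iso8601
  next_time
end
```

With url_delay = 30 and an empty store, two calls at 12:00:00 UTC return 12:00:30 and then 12:01:00, so URLs are spaced evenly no matter how quickly they are queued. A real Redis-backed version shared by several processes would likely need to make this read-modify-write atomic, for example with WATCH/MULTI.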


Copyright (C) 2016 Christian Mayer

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see