Skip to content

ssoroka/spider_bot

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Spider bot



a non-threaded spider bot that spiders a site with response time stats. easily extendable

Installation

sudo gem install ssoroka-spider_bot

Usage



Usable in code or from terminal:

$ spider_bot http://www.example.com
$ spider_bot http://0.0.0.0:3000

Code example of a custom spider script in script/spider:

#!/usr/bin/env ruby
require 'rubygems'
require 'spider_bot'

class MySpider < SpiderBot
  # override these for handling events
  def on_page(page)
  end

  def on_404(link)
  end

  def on_500(link)
  end

  # override these for changing how urls are classified as links
  def off_site?(url)
    url !~ /^\// # urls not starting with a /
  end

  def ignorable?(url)
    url =~ /\/.*\..+/ && # files with extensions
      url !~ /\.html$/ # but not html files
  end
end

spider = MySpider.new(:quiet => false)
spider.start(ARGV[1])

Gem Requirements

  • ssoroka-ansi

  • mechanize

About

a non-threaded spider bot that spiders a site with response time stats. easily extendable.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages