A simple tool for checking references on your website

### Site Checker

Site Checker is a simple Ruby gem that helps you check the integrity of your website by recursively visiting the referenced pages and images. I use it in my test environments to make sure that my websites don't have any dead links.

Install

gem install site_checker

Usage

In Test Code

First, load site_checker by adding this line to the file where you want to use it:

require 'site_checker'

If you want to use it for testing, the line should go in test_helper.rb.

The usage is quite simple:

check_site("http://localhost:3000/app", "http://localhost:3000")
puts collected_remote_pages.inspect
puts collected_local_pages.inspect
puts collected_remote_images.inspect
puts collected_local_images.inspect
puts collected_problems.inspect

The snippet above opens http://localhost:3000/app and looks for links and images. If it finds a link to a local page, it recursively checks that page, too. The second argument, http://localhost:3000, defines the root URL of your website.
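
To make the recursive visiting concrete, here is a minimal sketch of the idea using only the Ruby standard library. It is not site_checker's implementation: the in-memory `SITE` hash, the `local?` rule, and the `crawl` method are hypothetical names invented for this illustration, and the sketch assumes a link is local when it is a relative path or starts with the root URL.

```ruby
require 'set'

# Hypothetical in-memory site: each local path maps to the links found on it.
# This stands in for real HTTP requests and is NOT site_checker's data model.
SITE = {
  "/app"       => ["/app/about", "/app/missing", "http://example.com/"],
  "/app/about" => ["http://localhost:3000/app"]
}.freeze

ROOT = "http://localhost:3000"

# Assumption of this sketch: a link is local when it is a relative path
# or starts with the root URL.
def local?(link, root)
  link.start_with?("/") || link.start_with?(root)
end

# Visits `path` once and recurses into every local link found on it.
# Returns the list of local paths that could not be found.
def crawl(path, root, seen = Set.new, problems = [])
  return problems unless seen.add?(path) # add? returns nil if already visited

  links = SITE[path]
  if links.nil?
    problems << path # dead local link
    return problems
  end

  links.each do |link|
    next unless local?(link, root) # remote links are skipped here
    crawl(link.delete_prefix(root), root, seen, problems)
  end
  problems
end

puts crawl("/app", ROOT).inspect
```

The `seen` set is what keeps the recursion from looping forever when two pages link to each other, as `/app` and `/app/about` do here.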

If you don't want to use the DSL-like API, you can call the module methods directly:

SiteChecker.check("http://localhost:3000/app", "http://localhost:3000")
puts SiteChecker.remote_pages.inspect
puts SiteChecker.local_pages.inspect
puts SiteChecker.remote_images.inspect
puts SiteChecker.local_images.inspect
puts SiteChecker.problems.inspect

Using on Generated Content

If you have a static website (e.g. generated by Octopress), you can tell site_checker to use folders from the file system. With this approach, you don't need a web server to verify your website:

check_site("./public", "./public")
puts collected_problems.inspect
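
As a rough illustration of what checking a link against a folder involves, the sketch below maps a site-relative link to a file under a root directory. It is not taken from site_checker: the `resolve_local` method is a hypothetical helper, and the rule that a directory is served by its `index.html` is an assumption of this sketch.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: maps a site-relative link to a file under `root`.
# Assumption: a link that points to a directory is served by its index.html.
# Returns the file path, or nil when nothing on disk matches the link.
def resolve_local(root, link)
  path = File.join(root, link)
  path = File.join(path, "index.html") if File.directory?(path)
  File.file?(path) ? path : nil
end

# Build a throwaway "public" folder so the example runs anywhere.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "blog"))
  File.write(File.join(root, "blog", "index.html"), "<html></html>")

  puts resolve_local(root, "/blog/").inspect        # a path ending in blog/index.html
  puts resolve_local(root, "/missing.html").inspect # nil, i.e. a dead link
end
```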

Configuration

You can instruct site_checker to ignore certain links:

SiteChecker.configure do |config|
  config.ignore_list = ["/", "/atom.xml"]
end

By default, site_checker does not check the status of remote links and images (e.g. whether they return 404 or 500), but you can enable this:

SiteChecker.configure do |config|
  config.visit_references = true
end

Deep recursion can be expensive, so you can configure the maximum recursion depth with the following attribute:

SiteChecker.configure do |config|
  config.max_recursion_depth = 3
end
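
The effect of a depth limit can be sketched with a small breadth-first traversal. This is only an illustration, not site_checker's code: the `LINKS` hash and the `reachable_within` method are hypothetical, and the sketch assumes the start page counts as depth 0, which may differ from how the gem counts.

```ruby
require 'set'

# Hypothetical link graph: a chain of pages, each linking to the next.
LINKS = {
  "/"  => ["/a"],
  "/a" => ["/b"],
  "/b" => ["/c"],
  "/c" => []
}.freeze

# Returns every page reachable from `start` in at most `max_depth` steps.
# Assumption: the start page itself is at depth 0.
def reachable_within(start, max_depth)
  seen = Set[start]
  frontier = [start]
  max_depth.times do
    # Follow one more level of links, keeping only pages not seen before.
    frontier = frontier.flat_map { |page| LINKS.fetch(page, []) }
                       .select { |link| seen.add?(link) }
  end
  seen
end

puts reachable_within("/", 2).to_a.inspect
```

With a limit of 2, the page `/c` is never visited because it is three steps away from the start page. A lower limit makes the check cheaper but leaves deeper pages unchecked.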

Examples

Make sure that there are no dead local links on the website (the examples use RSpec syntax):

before(:each) do
  SiteChecker.configure do |config|
    config.ignore_list = ["/atom.xml", "/rss"]
  end
end

it "should not have dead local links" do
  check_site("http://localhost:3000", "http://localhost:3000")
  # on failure, RSpec prints the collected problems, so there is no need to re-run with extra print statements
  collected_problems.should be_empty
end

Check that all the local pages can be reached in at most two steps:

before(:each) do
  SiteChecker.configure do |config|
    config.ignore_list = ["/atom.xml", "/rss"]
    config.max_recursion_depth = 2
  end

  @number_of_local_pages = 100
end

it "all the local pages have to be visited" do
  check_site("http://localhost:3000", "http://localhost:3000")
  collected_local_pages.size.should eq @number_of_local_pages
end

Command line

Since version 0.3.0, site_checker can also be used from the command line. The available options are:

~ % site_checker -h
Visits the <site_url> and prints out the list of those URLs which cannot be found

Usage: site_checker [options] <site_url>
-e, --visit-external-references  Visit external references (may take a bit longer)
-m, --max-recursion-depth N      Set the depth of the recursion
-r, --root URL                   The root URL of the path
-i, --ignore URL                 Ignore the provided URL (can be applied several times)
-p, --print-local-pages          Prints the list of the URLs of the collected local pages
-x, --print-remote-pages         Prints the list of the URLs of the collected remote pages
-y, --print-local-images         Prints the list of the URLs of the collected local images
-z, --print-remote-images        Prints the list of the URLs of the collected remote images
-h, --help                       Show a short description and this message
-v, --version                    Show version
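
If you want to drive the command line tool from a Ruby script, one option is to build the argument list programmatically. The sketch below uses only flags listed in the help text above. The `site_checker_args` method and its keyword names are hypothetical and are not part of the gem.

```ruby
# Hypothetical helper (not part of the gem): builds an argument list for the
# site_checker executable using only flags shown in its help output.
def site_checker_args(site_url, root: nil, max_depth: nil, ignore: [], external: false)
  args = []
  args << "--visit-external-references" if external
  args.push("--max-recursion-depth", max_depth.to_s) if max_depth
  args.push("--root", root) if root
  ignore.each { |url| args.push("--ignore", url) } # the flag may be repeated
  args << site_url
end

args = site_checker_args("http://localhost:3000/app",
                         root: "http://localhost:3000",
                         max_depth: 3,
                         ignore: ["/atom.xml", "/rss"])

# Print the command that would be run.
puts ["site_checker", *args].join(" ")

# To actually run it (requires the gem to be installed):
# system("site_checker", *args)
```

Passing the arguments to `system` as a list rather than as one string avoids shell quoting problems when a URL contains special characters.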

Troubleshooting

undefined method 'new' for SiteChecker:Module

This error occurs when the test code calls v0.1.1 methods, but a newer version of the gem has already been installed. Update your test code following the examples above.

Copyright

Copyright (c) 2013 Zsolt Fabok and Contributors. See LICENSE for details.