aiaio / basilisk

basil(isk): a front-end for the anemone web crawler.

name age message
file .gitignore Mon Aug 24 15:09:54 -0700 2009 Initial commit [banker]
file HISTORY Mon Aug 24 15:09:54 -0700 2009 Initial commit [banker]
file LICENSE Mon Aug 24 15:09:54 -0700 2009 Initial commit [banker]
file README.rdoc Tue Aug 25 16:07:12 -0700 2009 Added image processor [banker]
file basilisk.gemspec Tue Aug 25 18:30:24 -0700 2009 Bumped version [banker]
directory bin/ Tue Aug 25 16:07:12 -0700 2009 Added image processor [banker]
directory lib/ Tue Aug 25 16:07:12 -0700 2009 Added image processor [banker]
directory test/ Mon Aug 24 15:09:54 -0700 2009 Initial commit [banker]
README.rdoc

basilisk

A command-line front-end for the anemone web crawler (github.com/chriskite/anemone). basilisk produces reports that are useful when QA-ing websites. It also provides an extensible page processor class for writing your own page processors.

Included page processors:

  • seo: generates a CSV with the following columns: url, title, description, keywords, h1s, h2s
  • sitemap: generates an XML sitemap
  • image: generates a list of broken images and images lacking an alt attribute.
  • error: generates a CSV of URLs returning HTTP response codes other than success and redirect.
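
As a rough illustration of the kind of filtering the error processor's description implies, the sketch below reduces a list of crawl results to responses that are neither success (2xx) nor redirect (3xx) and renders them as CSV. This is not basilisk's implementation: `error_report`, the column names, and the sample data are all hypothetical.

```ruby
require "csv"

# Hypothetical helper, not part of basilisk: takes [url, status_code] pairs
# and returns CSV text listing only responses outside the 2xx/3xx ranges.
def error_report(results)
  CSV.generate do |csv|
    csv << %w[url code]                 # assumed column names
    results.each do |url, code|
      next if (200..399).cover?(code)   # skip success and redirect responses
      csv << [url, code]
    end
  end
end

# Made-up crawl results for illustration only.
sample = [
  ["http://example.com/", 200],
  ["http://example.com/old", 301],
  ["http://example.com/missing", 404],
  ["http://example.com/broken", 500]
]

puts error_report(sample)
```

In a real crawl the pairs would come from the crawler rather than a literal array; the range check is the only part that mirrors the description above.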

See the generated yml config file for even more options.

install

  sudo gem install basilisk

usage

To create a new search:

  basil create [search_name] [url]
  • Creates a search config file ([search_name].yml), which you may edit to change the default options, choose which page processors to run, add regex and CSS terms for searching across the site, and list regexes for skipping URLs.
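
To make the idea of skip regexes concrete, here is a minimal sketch of how a list of patterns might be applied to candidate URLs. It does not reflect basilisk's config keys or internals: `crawlable`, the patterns, and the URLs are assumptions chosen for illustration.

```ruby
# Hypothetical helper, not basilisk's code: drops every URL that matches
# at least one of the skip patterns and returns the rest in order.
def crawlable(urls, skip_patterns)
  urls.reject { |url| skip_patterns.any? { |pattern| pattern.match?(url) } }
end

# Example patterns: skip PDF downloads and anything under /admin/.
skip_patterns = [/\.pdf\z/i, %r{/admin/}]

urls = [
  "http://example.com/",
  "http://example.com/docs/guide.PDF",
  "http://example.com/admin/login",
  "http://example.com/about"
]

puts crawlable(urls, skip_patterns)
```

A URL is skipped as soon as any one pattern matches, so narrow patterns are safer than broad ones when only part of a site should be excluded.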

To run the search:

  basil run [search_name]
  • Runs the specified search. Note: you must create a search before running it. Files generated by the page processors will reside in a folder called [search_name].

author & license

basilisk is licensed under a modified MIT license. See the LICENSE file.

basilisk was written by Kyle Banker and depends heavily on the anemone web crawler by Chris Kite.

Copyright 2009 Alexander Interactive, Inc.