mislav / anemone forked from chriskite/anemone

Anemone web-spider framework


Chris Kite (author)
Sun Oct 18 14:19:54 -0700 2009
mislav (committer)
Tue Oct 20 00:25:38 -0700 2009
name age message
file CHANGELOG.md Tue Oct 20 00:25:24 -0700 2009 grammar in docs; write changelog since v0.2.0 [mislav]
file LICENSE.txt Tue Apr 14 12:14:47 -0700 2009 initial import [Chris Kite]
file README.rdoc Mon Oct 05 08:32:44 -0700 2009 update readme [mislav]
file anemone.gemspec Mon Sep 07 14:30:40 -0700 2009 up version to 0.2.0 [Chris Kite]
directory bin/ Thu Oct 01 07:39:13 -0700 2009 make old `anemone_*.rb` scripts available throu... [mislav]
directory lib/ Tue Oct 20 00:25:38 -0700 2009 added support for ssl (without cert verification) [Chris Kite]
directory spec/ Tue Oct 20 00:23:49 -0700 2009 add ":allowed_urls", ":skip_urls" options Used... [mislav]
README.rdoc

Anemone

Anemone is a web spider framework that can spider a domain and collect useful information about the pages it visits. It is versatile, allowing you to write your own specialized spider tasks quickly and easily.

Features:

  • Multi-threaded design for high performance
  • Tracks 301 HTTP redirects to understand a page’s aliases
  • Built-in BFS algorithm for determining page depth
  • Allows exclusion of URLs based on regular expressions
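The last two features can be illustrated without touching the network. The following is a minimal sketch, not Anemone's actual implementation or API: it runs a breadth-first traversal over a hypothetical in-memory link graph, records the depth at which each page is first reached, and skips any URL matching a regular expression. The names LINKS and crawl_depths are made up for this example.

```ruby
# Hypothetical in-memory "site": each URL maps to the links found on that page.
# A real spider would fetch and parse pages instead of reading this hash.
LINKS = {
  "/"             => ["/about", "/blog", "/admin"],
  "/about"        => ["/", "/team"],
  "/blog"         => ["/blog/post-1", "/about"],
  "/admin"        => ["/admin/secret"],
  "/team"         => [],
  "/blog/post-1"  => ["/"],
  "/admin/secret" => []
}.freeze

# Breadth-first traversal starting at `start`.
# Returns a hash of URL => depth, where depth is the number of links
# followed on the shortest path from the start page.
# URLs matching any regexp in `skip_patterns` are never queued.
def crawl_depths(links, start, skip_patterns = [])
  depths = { start => 0 }
  queue  = [start]

  until queue.empty?
    url = queue.shift
    links.fetch(url, []).each do |link|
      next if depths.key?(link)                                # already seen
      next if skip_patterns.any? { |pattern| pattern =~ link } # excluded URL
      depths[link] = depths[url] + 1
      queue << link
    end
  end

  depths
end

depths = crawl_depths(LINKS, "/", [%r{\A/admin}])
depths.each { |url, depth| puts "#{depth} #{url}" }
```

Because the queue is processed in first-in, first-out order, the first time a URL is seen is also its shortest distance from the start page, which is why breadth-first search suits page-depth calculation. Everything under /admin is left out of the result because the exclusion pattern is checked before a link is queued.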

Examples

See the scripts under the lib/anemone/cli directory for examples of several useful Anemone tasks.

Requirements

  • nokogiri