Anemone web-spider framework


Anemone is a web spider framework that can spider a domain and collect useful information about the pages it visits. It is versatile, allowing you to write your own specialized spider tasks quickly and easily.

See for more information.


Features:

  • Multi-threaded design for high performance

  • Tracks 301 HTTP redirects

  • Built-in BFS algorithm for determining page depth

  • Allows exclusion of URLs based on regular expressions

  • Choose the links to follow on each page with focus_crawl()

  • HTTPS support

  • Records response time for each page

  • CLI program can list all pages in a domain, calculate page depths, and more

  • Obeys robots.txt

  • In-memory or persistent storage of pages during crawl, using TokyoCabinet, MongoDB, or Redis
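
To make the page-depth and URL-exclusion features above concrete, here is a minimal plain-Ruby sketch of a breadth-first traversal that assigns each page a depth and skips URLs matching a regular expression. It is not Anemone's internal code: the `links` hash (a stand-in for fetched pages), the `page_depths` method, and the `skip:` parameter are all hypothetical names chosen for illustration.

```ruby
# Hypothetical in-memory link graph standing in for fetched pages:
# each key is a URL path, each value lists the links found on that page.
links = {
  '/'            => ['/about', '/blog', '/admin'],
  '/about'       => ['/', '/team'],
  '/blog'        => ['/blog/post-1', '/about'],
  '/blog/post-1' => ['/team'],
  '/team'        => [],
  '/admin'       => ['/admin/secret']
}

# Breadth-first traversal from `root`; returns a hash of { url => depth }.
# URLs matching any pattern in `skip` are never queued, so pages that are
# reachable only through them are never visited either.
def page_depths(graph, root, skip: [])
  depths = { root => 0 }
  queue = [root]
  until queue.empty?
    url = queue.shift
    graph.fetch(url, []).each do |link|
      next if depths.key?(link)                       # already discovered
      next if skip.any? { |pattern| link =~ pattern } # excluded by regex
      depths[link] = depths[url] + 1
      queue << link
    end
  end
  depths
end

# Assign depths starting from the root page, excluding everything under /admin.
depths = page_depths(links, '/', skip: [%r{\A/admin}])
depths.each { |url, depth| puts "#{depth} #{url}" }
```

Because the queue is processed in first-in, first-out order, the first time a page is discovered is along a shortest link path from the root, which is what makes the recorded depth the minimum one.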


See the scripts under the lib/anemone/cli directory for examples of several useful Anemone tasks.
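
For orientation, a small crawl might look roughly like the sketch below. It assumes the anemone gem is installed and that the target site is reachable; the `crawl_site` wrapper, the thread count, the skip patterns, and the ten-link limit are illustrative choices, not values taken from this README.

```ruby
# Sketch only: running this requires the anemone gem and network access.
# `crawl_site` is a hypothetical wrapper name used for illustration.
def crawl_site(root_url)
  require 'anemone'

  Anemone.crawl(root_url, :threads => 4, :obey_robots_txt => true) do |anemone|
    # Exclude URLs whose path matches any of these patterns.
    anemone.skip_links_like(/\.pdf$/i, %r{/login})

    # Choose which links to follow: here, at most the first ten per page.
    anemone.focus_crawl { |page| page.links.first(10) }

    # Report depth, response time, and URL for every page visited.
    anemone.on_every_page do |page|
      puts "#{page.depth}\t#{page.response_time}\t#{page.url}"
    end
  end
end
```

Calling `crawl_site("http://www.example.com/")` would start the crawl. The options and patterns are worth tuning per site, since a high thread count can put noticeable load on a small server.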


Requirements:

  • nokogiri

  • robots


To test and develop this gem, additional requirements are:

  • rspec

  • fakeweb

  • tokyocabinet

  • mongo

  • redis

To run the tests, Tokyo Cabinet, MongoDB, and Redis must be installed on your system, and the MongoDB and Redis servers must be running.
