commit e475a64e6bbf4e2c7a0bcc5e6407dfa7881b9ad3
tree e67425fb16e4548d28120658140cbdcfe4757418
parent 32153103240b1c34b8384b5eb691164c83efd1d6
| name | last commit |
|---|---|
| CHANGELOG.rdoc | |
| LICENSE.txt | Tue Apr 14 12:14:47 -0700 2009 |
| README.rdoc | |
| anemone.gemspec | |
| bin/ | Thu Oct 22 20:51:37 -0700 2009 |
| lib/ | |
| spec/ | |
README.rdoc
Anemone
Anemone is a web spider framework that can spider a domain and collect useful information about the pages it visits. It is versatile, allowing you to write your own specialized spider tasks quickly and easily.
See anemone.rubyforge.org for more information.
Features
- Multi-threaded design for high performance
- Tracks HTTP 301 redirects to record a page’s aliases
- Built-in BFS algorithm for determining page depth
- Allows exclusion of URLs based on regular expressions
- Choose the links to follow on each page with focus_crawl()
- HTTPS support
- Records response time for each page
- CLI program can list all pages in a domain, calculate page depths, and more
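The features above describe page depth as the result of a breadth-first search, with regex-based URL exclusion and focus_crawl() for choosing which links to follow. The Ruby sketch below is not Anemone's implementation or API; it runs BFS over a hypothetical in-memory link graph to show why BFS yields each page's minimum click depth. The names `SITE`, `bfs_depths`, the `skip:` keyword, and the focus block are all illustrative assumptions.

```ruby
# Hypothetical in-memory site: each path maps to the links found on that page.
# A real spider would fetch each page over HTTP(S) and extract its links
# (Anemone lists nokogiri, an HTML parser, as a requirement).
SITE = {
  '/'            => ['/about', '/blog', '/logout'],
  '/about'       => ['/', '/team'],
  '/blog'        => ['/blog/post-1', '/about'],
  '/team'        => [],
  '/blog/post-1' => ['/'],
  '/logout'      => ['/']
}.freeze

# Breadth-first traversal from +root+ that records each page's depth,
# i.e. the minimum number of links followed to reach it.
#   skip:  regexes for links to exclude (illustrative, not Anemone's API)
#   focus: optional block (url, links) -> links choosing which links to follow
#          (illustrative, similar in spirit to focus_crawl())
def bfs_depths(site, root, skip: [], &focus)
  depths = { root => 0 }
  queue = [root]
  until queue.empty?
    url = queue.shift
    links = site.fetch(url, [])
    links = focus.call(url, links) if focus
    links.each do |link|
      next if depths.key?(link)                  # first discovery is the shallowest
      next if skip.any? { |re| re.match?(link) } # excluded by pattern
      depths[link] = depths[url] + 1
      queue << link
    end
  end
  depths
end

# Exclude logout links: / is depth 0, /about and /blog are 1,
# /team and /blog/post-1 are 2, and /logout is never visited.
without_logout = bfs_depths(SITE, '/', skip: [%r{/logout}])
p without_logout

# Follow only links outside /blog from every page.
without_blog = bfs_depths(SITE, '/') { |_url, links| links.reject { |l| l.start_with?('/blog') } }
p without_blog
```

Because BFS dequeues pages in order of increasing distance from the root, the first time a URL is discovered is also its shallowest depth, so a single `depths.key?` check is enough. A depth-first traversal would not guarantee this and would need to revisit pages when a shorter path turned up later.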
Examples
See the scripts under the lib/anemone/cli directory for examples of several useful Anemone tasks.
Requirements
- nokogiri
Optional
- fizx-robots (required if obey_robots_txt is set to true)








