commit 6d94ba0f8d5f3f276a546addb526824129eea0d0
tree f0a206d6ebd265525c1b096e5f49801e9b38b1ce
parent de45d03fdbce6198ceba1af6a5e9885c99018e13
Scraper
Scraper is a cute HTML screen-scraping tool.

    require 'scraper'
    require 'open-uri'

    class BlogScraper < Scraper
      element :title

      elements 'div.hentry' => :articles do
        element 'h2' => :title
        element 'a/@href' => :url
      end
    end

    blog = BlogScraper.parse open('http://example.com')

    blog.title
    #=> "My blog title"

    blog.articles.first.title
    #=> "First article title"

    blog.articles.first.url
    #=> "http://example.com/article"

There are sample scripts in the "examples/" directory; run them with:

    ruby -rubygems examples/<script>.rb

See the wiki for more on how to use Scraper.
Requirements
None, strictly speaking. Nokogiri is required only if you pass in raw HTML content that needs to be parsed, as in the example above. Otherwise you can initialize the scraper with an Hpricot document, or with any other object that implements at(selector) and search(selector) methods.
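
To make the "anything else" case concrete, here is a minimal sketch of an object that satisfies the two-method contract described above. The class name FakeDocument and its hash-backed lookup are hypothetical and are not part of Scraper, Nokogiri, or Hpricot; plain strings stand in for real element nodes. How Scraper consumes the returned values is not shown here, so treat this only as an illustration of the interface shape.

```ruby
# Hypothetical stand-in for a parsed document. It promises only the two
# methods the README names: at(selector) and search(selector).
class FakeDocument
  # nodes_by_selector maps a selector string to an array of "nodes"
  # (plain strings here; a real document would return element objects).
  def initialize(nodes_by_selector)
    @nodes_by_selector = nodes_by_selector
  end

  # Returns every node registered under the selector, or an empty array.
  def search(selector)
    @nodes_by_selector.fetch(selector, [])
  end

  # Returns the first matching node, or nil when nothing matches.
  def at(selector)
    search(selector).first
  end
end

doc = FakeDocument.new(
  'title' => ['My blog title'],
  'h2'    => ['First article title', 'Second article title']
)

first_title  = doc.at('title')
all_headings = doc.search('h2')
missing      = doc.at('div.missing')
```

A Nokogiri or Hpricot document already responds to both methods, so a wrapper like this would only be worth writing for tests or for an unusual data source.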








