Fork of mislav/scraper
Description: A minimalistic, declarative HTML scraper written in an hour at a Ljubljana Ruby Tuesday meeting
Clone URL: git://github.com/davidhq/scraper.git

Scraper

Scraper is a cute HTML screen-scraping tool.

require 'scraper'
require 'open-uri'

class BlogScraper < Scraper
  element :title

  elements 'div.hentry' => :articles do
    element :title => 'h2'
    element :url => 'a/@href'
  end
end

blog = BlogScraper.parse open('http://example.com')

blog.title
#=> "My blog title"

blog.articles.first.title
#=> "First article title"

blog.articles.first.url
#=> "http://example.com/article"

There are sample scripts in the "examples/" directory; run them with:

ruby -rubygems examples/<script>.rb

See the wiki for more on how to use Scraper.

Requirements

None. Well, Nokogiri is required if you pass in HTML content that needs to be parsed, as in the example above. Otherwise you can initialize the scraper with an Hpricot document or anything else that implements the at(selector) and search(selector) methods.
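
To make the "anything else" part concrete, here is a minimal sketch of an object that answers at(selector) and search(selector). The class name HashDocument and its hash-backed lookup are hypothetical, made up for illustration only. What Scraper expects of the returned nodes is not stated in this README, so the sketch just returns plain strings.

```ruby
# Hypothetical stand-in for a parsed document (not part of Scraper).
# It answers the two methods named above: at(selector) and search(selector).
class HashDocument
  # nodes_by_selector maps a selector string to an array of "nodes".
  # Plain strings are used as nodes here; a real document would return
  # element objects.
  def initialize(nodes_by_selector)
    @nodes_by_selector = nodes_by_selector
  end

  # Returns every node registered under the selector, or an empty array.
  def search(selector)
    @nodes_by_selector.fetch(selector, [])
  end

  # Returns the first node matching the selector, or nil if there is none.
  def at(selector)
    search(selector).first
  end
end

doc = HashDocument.new(
  'title' => ['My blog title'],
  'h2'    => ['First article title', 'Second article title']
)

puts doc.at('title')
puts doc.search('h2').length
```

Nokogiri and Hpricot documents both respond to at and search in the same way: at gives the first match and search gives all matches, which is why either can be handed to the scraper.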