Add first scraper which scrapes the archive for test purposes

codeformuenster · Jan 28, 2015 · 4567251
1 parent 2e7b0a3
Showing 1 changed file with 12 additions and 22 deletions.
diff --git a/scraper.rb b/scraper.rb
@@ -1,24 +1,14 @@
-# This is a template for a Ruby scraper on Morph (https://morph.io)
-# including some code snippets below that you should find helpful
+require 'wombat'
+require 'scraperwiki'
 
-# require 'scraperwiki'
-# require 'mechanize'
-#
-# agent = Mechanize.new
-#
-# # Read in a page
-# page = agent.get("http://foo.com")
-#
-# # Find somehing on the page using css selectors
-# p page.at('div.content')
-#
-# # Write out to the sqlite database using scraperwiki library
-# ScraperWiki.save_sqlite(["name"], {"name" => "susan", "occupation" => "software developer"})
-#
-# # An arbitrary query against the database
-# ScraperWiki.select("* from data where 'name'='peter'")
+class BuergerbueroScraper
+  include Wombat::Crawler
 
-# You don't have to do things with the Mechanize or ScraperWiki libraries. You can use whatever gems are installed
-# on Morph for Ruby (https://github.com/openaustralia/morph-docker-ruby/blob/master/Gemfile) and all that matters
-# is that your final data is written to an Sqlite database called data.sqlite in the current working directory which
-# has at least a table called data.
+  base_url "http://web.archive.org"
+  path "/web/20131226080706/http://www.muenster.de/stadt/buergeramt/mobil-wartezeit.shtml"
+
+  wartezeit 'xpath=//*[@id="seite"]/div[2]/p[2]/strong'
+  wartende 'xpath=//*[@id="seite"]/div[2]/p[1]/strong'
+  naechste_nummer 'xpath=//*[@id="seite"]/div[2]/p[3]/strong'
+end
+ScraperWiki.save_sqlite(["naechste_nummer"], BuergerbueroScraper.new.crawl)
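
The last added line saves whatever `crawl` returns and uses `naechste_nummer` as the unique key. The following is a minimal sketch, not part of this commit, of checking that result first. It assumes `crawl` returns a hash with the string keys `wartezeit`, `wartende` and `naechste_nummer`, and that a selector which matches nothing leaves its value nil or empty. The helper name `normalize_record` and the constant `REQUIRED_KEYS` are hypothetical.

```ruby
# Hypothetical helper, not part of the commit: validates and trims the hash
# that BuergerbueroScraper.new.crawl is assumed to return.
REQUIRED_KEYS = %w[wartezeit wartende naechste_nummer].freeze

def normalize_record(raw)
  REQUIRED_KEYS.each_with_object({}) do |key, record|
    # nil.to_s is "", so a missing key and a blank value are treated alike.
    value = raw[key].to_s.strip
    raise ArgumentError, "missing value for #{key}" if value.empty?

    record[key] = value
  end
end

# Possible use in scraper.rb (left as a comment so the sketch runs without gems):
# ScraperWiki.save_sqlite(["naechste_nummer"], normalize_record(BuergerbueroScraper.new.crawl))
```

Raising instead of saving keeps a row with an empty unique key out of `data.sqlite` if the page layout changes and an XPath stops matching.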