Gardeners World 'What to do now' checklist scraper
chrismytton committed Mar 12, 2016
0 parents commit 81d0226
Showing 4 changed files with 69 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
data.sqlite
.cache
6 changes: 6 additions & 0 deletions Gemfile
@@ -0,0 +1,6 @@
source 'https://rubygems.org'

gem 'pry'
gem 'scraperwiki', git: 'https://github.com/openaustralia/scraperwiki-ruby.git', branch: 'morph_defaults'
gem 'nokogiri'
gem 'open-uri-cached', require: 'open-uri/cached'
39 changes: 39 additions & 0 deletions Gemfile.lock
@@ -0,0 +1,39 @@
GIT
  remote: https://github.com/openaustralia/scraperwiki-ruby.git
  revision: fc50176812505e463077d5c673d504a6a234aa78
  branch: morph_defaults
  specs:
    scraperwiki (3.0.1)
      httpclient
      sqlite_magic

GEM
  remote: https://rubygems.org/
  specs:
    coderay (1.1.1)
    httpclient (2.7.1)
    method_source (0.8.2)
    mini_portile2 (2.0.0)
    nokogiri (1.6.7.2)
      mini_portile2 (~> 2.0.0.rc2)
    open-uri-cached (0.0.5)
    pry (0.10.3)
      coderay (~> 1.1.0)
      method_source (~> 0.8.1)
      slop (~> 3.4)
    slop (3.6.0)
    sqlite3 (1.3.11)
    sqlite_magic (0.0.6)
      sqlite3

PLATFORMS
  ruby

DEPENDENCIES
  nokogiri
  open-uri-cached
  pry
  scraperwiki!

BUNDLED WITH
   1.11.2
22 changes: 22 additions & 0 deletions scraper.rb
@@ -0,0 +1,22 @@
require 'bundler'
Bundler.require
OpenURI::Cache.cache_path = '.cache'

require 'open-uri'

sections = %w(flowers-checklist fruit-veg-checklist greenhouse-checklist around-garden-checklist)
1.upto(52) do |week_number|
  sections.each do |section|
    url = "http://www.gardenersworld.com/what-to-do-now/week#{week_number}/#{section}/?print=true"
    warn "Fetching: #{url}"
    page = Nokogiri::HTML(open(url))
    page.css('.checklist ul:first li').each do |job|
      data = {
        week: week_number,
        section: section,
        job: job.text
      }
      ScraperWiki.save_sqlite([:week, :section, :job], data)
    end
  end
end
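
The loop in scraper.rb requests 52 weeks × 4 sections, and as committed a single HTTP error would stop the whole run. The sketch below is not part of this commit; it is one possible way to separate URL generation from fetching and to skip pages that fail. The names `SECTIONS`, `checklist_urls`, and `fetch_or_skip` are hypothetical; the URL pattern and section list are copied from scraper.rb, and `OpenURI::HTTPError` is the error class the standard `open-uri` library raises for HTTP failures.

```ruby
require 'open-uri'

# Hypothetical constant: the same section slugs that scraper.rb iterates over.
SECTIONS = %w(flowers-checklist fruit-veg-checklist greenhouse-checklist around-garden-checklist).freeze

# Build the [week, section, url] triples that the scraper loops over.
def checklist_urls(weeks = 1..52, sections = SECTIONS)
  weeks.flat_map do |week|
    sections.map do |section|
      [week, section, "http://www.gardenersworld.com/what-to-do-now/week#{week}/#{section}/?print=true"]
    end
  end
end

# Hand the URL to a caller-supplied fetcher block. If the fetcher raises an
# HTTP error, log it and return nil so one bad page does not abort the run.
def fetch_or_skip(url)
  yield url
rescue OpenURI::HTTPError => e
  warn "Skipping #{url}: #{e.message}"
  nil
end
```

In the scraper, the block passed to `fetch_or_skip` would wrap the existing `open(url)` call (or `URI.open(url)` on Ruby 2.5 and later), and a `nil` result would be skipped with `next` before parsing.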
