The ScraperWiki API Ruby Gem

A Ruby wrapper for the ScraperWiki API.


Installation

gem install scraperwiki-api
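
If you manage dependencies with Bundler (not required by this gem, just one common option), you can instead add the gem to your Gemfile and run bundle install:

# Gemfile
gem 'scraperwiki-api'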

API Examples

>> require 'scraperwiki-api'

>> api = ScraperWiki::API.new 'my-api-key' # API key is optional

>> api.datastore_sqlite 'example-scraper', 'SELECT * FROM swdata LIMIT 10'
=> [{"fieldA"=>"valueA", "fieldB"=>"valueB", "fieldC"=>"valueC"}, ...]

>> api.scraper_getinfo 'example-scraper'
=> [{"code"=>"require 'nokogiri'\n...", "datasummary"=>...}]

>> api.scraper_getruninfo 'example-scraper'
=> [{"run_ended"=>"1970-01-01T00:00:00", "first_url_scraped"=>...}]

>> api.scraper_getuserinfo 'johndoe'
=> [{"username"=>"johndoe", "profilename"=>"John Doe", "coderoles"=>...}]

>> api.scraper_search searchquery: 'search terms'
=> [{"description"=>"Scrapes websites for data.", "language"=>"ruby", ...]

>> api.scraper_usersearch searchquery: 'search terms'
=> [{"username"=>"johndoe", "profilename"=>"John Doe", "date_joined"=>...}]

More documentation at RubyDoc.info.

Scraper validations

If your project uses a lot of scrapers – for example, OpenCorporates, which scrapes company registries around the world, or Represent, which scrapes information on elected officials from government websites in Canada – you'll want to check that your scrapers behave the way you expect them to. This gem defines RSpec matchers to do just that.

You can validate a scraper's metadata (how often it runs, what fields it stores, etc.) like so:

require 'scraperwiki-api'
api = ScraperWiki::API.new

info = api.scraper_getinfo('example-scraper').first

describe 'example-scraper' do
  include ScraperWiki::API::Matchers
  subject {info}

  it {should be_protected}
  it {should be_editable_by('frabcus')}
  it {should run(:daily)}
  it {should_not be_broken}

  # Validate the properties of a SQLite table by chaining on a +on+.
  it {should have_a_row_count_of(42).on('swdata')}

  # Ensure that the scraper sets required fields.
  it {should have_at_least_the_keys(['name', 'email']).on('swdata')}

  # Ensure that the scraper doesn't set too many fields.
  it {should have_at_most_the_keys(['name', 'email', 'tel', 'fax']).on('swdata')}
end

And you can validate the scraped data like so:

require 'scraperwiki-api'
api = ScraperWiki::API.new

data = api.datastore_sqlite('example-scraper', 'SELECT * FROM `swdata`')

describe 'example-scraper' do
  include ScraperWiki::API::Matchers
  subject {data}

  # If you need at least one of a set of fields to be set:
  it {should set_any_of(['name', 'first_name', 'last_name'])}

  # Validate the values of individual fields by chaining on an +in+.
  it {should_not have_blank_values.in('name')}
  it {should have_unique_values.in('email')}
  it {should have_values_of(['M', 'F']).in('gender')}
  it {should have_values_matching(/\A[^@\s]+@[^@\s]+\z/).in('email')}
  it {should have_values_starting_with('http://').in('url')}
  it {should have_values_ending_with('Inc.').in('company_name')}
  it {should have_integer_values.in('year')}

  # If you store a hash or an array of hashes in a field as a JSON string,
  # you can validate the values of these subfields by chaining on an +at+.
  it {should have_values_of(['M', 'F']).in('extra').at('gender')}

  # Check for missing keys within subfields.
  it {should have_values_with_at_least_the_keys(['subfield1', 'subfield2']).in('fieldA')}

  # Check for extra keys within subfields.
  it {should have_values_with_at_most_the_keys(['subfield1', 'subfield2', 'subfield3', 'subfield4']).in('fieldA')}
end
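
The blocks above are ordinary RSpec examples, so they can be run like any other spec. Assuming a conventional RSpec setup and a hypothetical file name:

rspec spec/example_scraper_spec.rb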

More documentation at RubyDoc.info.

Bugs? Questions?

This gem's main repository is on GitHub: http://github.com/opennorth/scraperwiki-api-ruby, where your contributions, forks, bug reports, feature requests, and feedback are welcome.

Copyright (c) 2011 Open North Inc., released under the MIT license
