postmodern / gscraper

Require rspec >= 1.3.0. 
postmodern (author)
Mon Feb 01 14:22:09 -0800 2010
commit  f23a0f14cf61e41f81a0e3b31f84e12d7d28dca9
tree    a39e9fb43d05eb063bc45eab807332d97e8cb559
parent  a5ef99fe17c2f77ceefbbff583b1531aaa44e21b
README.txt
= GScraper

* http://gscraper.rubyforge.org/
* http://github.com/postmodern/gscraper/
* Postmodern (postmodern.mod3 at gmail.com)

== DESCRIPTION:
  
GScraper is a web-scraping interface to various Google Services.

== FEATURES/PROBLEMS:
  
* Supports the Google Search service.
  * Provides access to search results and ranks.
  * Provides access to the Sponsored Links.
* Provides HTTP access with custom User-Agent strings.
* Provides proxy settings for HTTP access.
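
The last two features can be illustrated with Ruby's standard library alone. This is a minimal sketch, not GScraper's implementation (GScraper builds on mechanize); the proxy host, port and User-Agent string are placeholder assumptions, and no request is actually sent.

```ruby
require 'net/http'
require 'uri'

# Placeholder settings, assumed for illustration only.
user_agent = 'Awesome Browser v1.2'
proxy_host = 'proxy.example.com'
proxy_port = 8080

uri = URI('http://www.google.com/search?q=ruby')

# Net::HTTP.new takes the proxy address and port after the target host and port.
http = Net::HTTP.new(uri.host, uri.port, proxy_host, proxy_port)

# Build a GET request carrying a custom User-Agent header.
# Nothing goes over the network until http.request(request) is called.
request = Net::HTTP::Get.new(uri.request_uri)
request['User-Agent'] = user_agent
```

Keeping the proxy on the connection object and the User-Agent on the request mirrors how the two settings are independent of each other.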

== REQUIREMENTS:

* mechanize >= 0.9.0

== INSTALL:

  $ sudo gem install gscraper

== EXAMPLES:

* Basic query:

    q = GScraper::Search.query(:query => 'ruby')

* Advanced query:

    q = GScraper::Search.query(:query => 'ruby') do |q|
      q.without_words = 'is'
      q.within_past_day = true
      q.numeric_range = 2..10
    end

* Queries from URLs:

    q = GScraper::Search.query_from_url('http://www.google.com/search?as_q=ruby&as_epq=&as_oq=rails&as_ft=i&as_qdr=all&as_occt=body&as_rights=%28cc_publicdomain%7Ccc_attribute%7Ccc_sharealike%7Ccc_noncommercial%29.-%28cc_nonderived%29')

    q.query # => "ruby"
    q.with_words # => "rails"
    q.occurrs_within # => :title
    q.rights # => :cc_by_nc
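
To see what a search URL carries before it is turned into a query object, the parameters can be decoded with the standard library. This is only a sketch of the URL-decoding step, using a shortened example URL; how GScraper maps each parameter onto query attributes is not shown here.

```ruby
require 'uri'

# Shortened example URL, assumed for illustration.
url = 'http://www.google.com/search?as_q=ruby&as_epq=&as_oq=rails&as_occt=body'

# Decode the query string into a Hash of parameter name => value.
params = URI.decode_www_form(URI(url).query).to_h

query      = params['as_q']    # raw value of the main search terms
with_words = params['as_oq']   # raw value of the "with words" field
occurs_in  = params['as_occt'] # raw value, still a String at this point
exact      = params['as_epq']  # present in the URL but empty
```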

* Getting the search results:

    q.first_page.select do |result|
      result.title =~ /Blog/
    end

    q.page(2).map do |result|
      result.title.reverse
    end

    q.result_at(25) # => Result

    q.top_result # => Result
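
Looking up a result by rank implies locating the page it sits on. The arithmetic can be sketched as follows, assuming 10 results per page and ranks that start at 1; both are assumptions, and +page_and_index_for+ is a hypothetical helper, not part of GScraper.

```ruby
# Assumption: 10 results per page, ranks start at 1.
RESULTS_PER_PAGE = 10

# Returns [page_number, index_within_page] for a given rank.
# page_number is 1-based, index_within_page is 0-based.
def page_and_index_for(rank, per_page = RESULTS_PER_PAGE)
  raise ArgumentError, 'rank must be >= 1' if rank < 1

  zero_based = rank - 1
  [(zero_based / per_page) + 1, zero_based % per_page]
end

page_number, index = page_and_index_for(25)
```

Under these assumptions, rank 25 is the fifth entry on page 3.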

* A Result object contains the rank, title, summary, cached URL, similar
  query URL and link URL of the search result.

    page = q.page(2)

    page.urls # => [...]
    page.summaries # => [...]
    page.ranks_of { |result| result.url =~ /^https/ } # => [...]
    page.titles_of { |result| result.summary =~ /password/ } # => [...]
    page.cached_pages # => [...]
    page.similar_queries # => [...]
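
The accessor and +*_of+ methods above follow a simple pattern: collect one field from every result, or only from results matching a block. A minimal sketch of that pattern, using the hypothetical stand-ins +SketchResult+ and +SketchPage+ rather than GScraper's own classes:

```ruby
# Hypothetical stand-ins; not GScraper's actual classes.
SketchResult = Struct.new(:rank, :title, :url, :summary)

class SketchPage
  include Enumerable

  def initialize(results)
    @results = results
  end

  # Enumerable builds map, select, etc. on top of each.
  def each(&block)
    @results.each(&block)
  end

  # Collect one field from every result.
  def urls
    map(&:url)
  end

  def summaries
    map(&:summary)
  end

  # Collect one field only from results the block accepts.
  def ranks_of(&block)
    select(&block).map(&:rank)
  end

  def titles_of(&block)
    select(&block).map(&:title)
  end
end

sketch_page = SketchPage.new([
  SketchResult.new(1, 'Ruby Programming Language', 'https://www.ruby-lang.org/',
                   'A dynamic, open source language.'),
  SketchResult.new(2, 'Ruby Blog', 'http://blog.example.com/',
                   'How to reset a password in Ruby.')
])

secure_ranks    = sketch_page.ranks_of { |r| r.url =~ /^https/ }
password_titles = sketch_page.titles_of { |r| r.summary =~ /password/ }
```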

* Iterating over the search results:

    q.each_on_page(2) do |result|
      puts result.title
    end

    page.each do |result|
      puts result.url
    end

* Iterating over the data within the search results:

    page.each_title do |title|
      puts title
    end

    page.each_summary do |text|
      puts text
    end

* Selecting search results:

    page.results_with do |result|
      ((result.rank > 2) && (result.rank < 10))
    end

    page.results_with_title(/Ruby/i) # => [...]

* Selecting data within the search results:

    page.titles # => [...]

    page.summaries # => [...]

* Selecting the data of search results based on the search result:

    page.urls_of do |result|
      result.summary.length > 10
    end

* Selecting the Sponsored Links of a Query:

    q.sponsored_links # => [...]

    q.top_sponsored_link # => SponsoredAd

* Setting the User-Agent globally:

    GScraper.user_agent # => nil
    GScraper.user_agent = 'Awesome Browser v1.2'
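
A global setting like this is typically a module-level accessor that request code consults when building headers. The sketch below shows that idea with a hypothetical +SketchScraper+ module and a hypothetical +default_headers+ helper; GScraper's internals may differ.

```ruby
# Hypothetical module, for illustration only.
module SketchScraper
  class << self
    # Global User-Agent string; nil until set.
    attr_accessor :user_agent
  end

  # Headers a request would carry, given the global setting.
  def self.default_headers
    user_agent ? { 'User-Agent' => user_agent } : {}
  end
end

headers_before = SketchScraper.default_headers
SketchScraper.user_agent = 'Awesome Browser v1.2'
headers_after = SketchScraper.default_headers
```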

== LICENSE:

GScraper - A web-scraping interface to various Google Services.

Copyright (c) 2007-2009 Hal Brodigan (postmodern.mod3 at gmail.com)

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA