gscraper /

| name | age | message |
|---|---|---|
| .gitignore | | |
| COPYING.txt | | |
| History.txt | | |
| Manifest.txt | | |
| README.txt | | |
| Rakefile | | |
| lib/ | | |
| spec/ | | |
| tasks/ | | |
README.txt
= GScraper

* http://gscraper.rubyforge.org/
* http://github.com/postmodern/gscraper/
* Postmodern (postmodern.mod3 at gmail.com)

== DESCRIPTION:

GScraper is a web-scraping interface to various Google Services.

== FEATURES/PROBLEMS:

* Supports the Google Search service.
* Provides access to search results and ranks.
* Provides access to the Sponsored Links.
* Provides HTTP access with custom User-Agent strings.
* Provides proxy settings for HTTP access.

== REQUIREMENTS:

* mechanize >= 0.9.0

== INSTALL:

  $ sudo gem install gscraper

== EXAMPLES:

* Basic query:

    q = GScraper::Search.query(:query => 'ruby')

* Advanced query:

    q = GScraper::Search.query(:query => 'ruby') do |q|
      q.without_words = 'is'
      q.within_past_day = true
      q.numeric_range = 2..10
    end

* Queries from URLs:

    q = GScraper::Search.query_from_url('http://www.google.com/search?as_q=ruby&as_epq=&as_oq=rails&as_ft=i&as_qdr=all&as_occt=body&as_rights=%28cc_publicdomain%7Ccc_attribute%7Ccc_sharealike%7Ccc_noncommercial%29.-%28cc_nonderived%29')

    q.query # => "ruby"
    q.with_words # => "rails"
    q.occurrs_within # => :title
    q.rights # => :cc_by_nc

* Getting the search results:

    q.first_page.select do |result|
      result.title =~ /Blog/
    end

    q.page(2).map do |result|
      result.title.reverse
    end

    q.result_at(25) # => Result

    q.top_result # => Result

* A Result object contains the rank, title, summary, cached URL, similar
  query URL and link URL of the search result.

    page = q.page(2)

    page.urls # => [...]
    page.summaries # => [...]
    page.ranks_of { |result| result.url =~ /^https/ } # => [...]
    page.titles_of { |result| result.summary =~ /password/ } # => [...]
    page.cached_pages # => [...]
    page.similar_queries # => [...]

* Iterating over the search results:

    q.each_on_page(2) do |result|
      puts result.title
    end

    page.each do |result|
      puts result.url
    end

* Iterating over the data within the search results:

    page.each_title do |title|
      puts title
    end

    page.each_summary do |text|
      puts text
    end

* Selecting search results:

    page.results_with do |result|
      ((result.rank > 2) && (result.rank < 10))
    end

    page.results_with_title(/Ruby/i) # => [...]

* Selecting data within the search results:

    page.titles # => [...]
    page.summaries # => [...]

* Selecting the data of search results based on the search result:

    page.urls_of do |result|
      result.description.length > 10
    end

* Selecting the Sponsored Links of a Query:

    q.sponsored_links # => [...]

    q.top_sponsored_link # => SponsoredAd

* Setting the User-Agent globally:

    GScraper.user_agent # => nil
    GScraper.user_agent = 'Awesome Browser v1.2'

== LICENSE:

GScraper - A web-scraping interface to various Google Services.

Copyright (c) 2007-2009 Hal Brodigan (postmodern.mod3 at gmail.com)

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
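
As a rough illustration of the "Queries from URLs" example in the README above, the sketch below uses only Ruby's standard library to show how the advanced-search parameters of a Google URL could be mapped to option names such as +query+ and +with_words+. It is not GScraper's implementation: the +PARAM_NAMES+ table and the +search_options_from_url+ helper are hypothetical names introduced here, and the parameter mapping is an assumption inferred from the README's example output.

```ruby
require 'uri'

# Hypothetical mapping from Google advanced-search parameters to
# readable option names. Illustrative only; this is an assumption,
# not GScraper's internal table.
PARAM_NAMES = {
  'as_q'   => :query,
  'as_oq'  => :with_words,
  'as_eq'  => :without_words,
  'as_epq' => :exact_phrase
}.freeze

# Returns a Hash of the recognized, non-empty search options found
# in the query string of the given URL.
def search_options_from_url(url)
  pairs = URI.decode_www_form(URI.parse(url).query.to_s)

  pairs.each_with_object({}) do |(name, value), options|
    key = PARAM_NAMES[name]
    # Skip parameters we do not know and parameters left blank.
    options[key] = value if key && !value.empty?
  end
end

options = search_options_from_url(
  'http://www.google.com/search?as_q=ruby&as_epq=&as_oq=rails&as_ft=i&as_qdr=all'
)

puts options[:query]       # prints "ruby"
puts options[:with_words]  # prints "rails"
```

Unknown parameters (such as +as_ft+ and +as_qdr+ here) and blank values (+as_epq+) are simply ignored by this sketch; the real library exposes many more options than this table covers.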








