Commit

Document that Mechanize::Page#search accepts an XPath or CSS expression. Fixes sparklemotion#199
drbrain committed Feb 13, 2012
1 parent 05be267 commit 30eb161
Showing 4 changed files with 55 additions and 26 deletions.
4 changes: 3 additions & 1 deletion CHANGELOG.rdoc
@@ -22,7 +22,9 @@
filename.
* Added Mechanize::DirectorySaver which saves responses in a single
directory. Issue #187 by yoshie902a.
-* Added Link#noreferrer?.
+* Added Mechanize::Page::Link#noreferrer?
+* The documentation for Mechanize::Page#search and #at now show that both
+XPath and CSS expressions are allowed. Issue #199 by Shane Becker.

* Bug fixes
* Fixed handling of a HEAD request with Accept-Encoding: gzip. Issue #198
41 changes: 23 additions & 18 deletions EXAMPLES.rdoc
@@ -1,8 +1,9 @@
= Mechanize examples

Note: Several examples show methods chained to the end of do/end blocks.
-Do...end is the same as curly braces ({...}). For example, do ... end.submit
-is the same as { ... }.submit.
+<code>do...end</code> is the same as curly braces (<code>{...}</code>). For
+example, <code>do ... end.submit</code> is the same as <code>{ ...
+}.submit</code>.
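
Review note: the chaining claim in this paragraph can be checked with a small standalone sketch. It does not use Mechanize; the array and variable names are made up for illustration, and +sum+ simply stands in for +submit+.

```ruby
numbers = [1, 2, 3]

# A method chained onto the end of a do/end block...
with_do_end = numbers.map do |n|
  n * 2
end.sum

# ...behaves the same as a method chained onto a curly-brace block.
with_braces = numbers.map { |n| n * 2 }.sum

puts with_do_end == with_braces
```

Both forms pass the block to +map+ and then call the chained method on the result.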

== Google

@@ -81,7 +82,8 @@ Upload a file to flickr.
end

== Pluggable Parsers
-Lets say you want html pages to automatically be parsed with Rubyful Soup.
+
+Lets say you want HTML pages to automatically be parsed with Rubyful Soup.
This example shows you how:

require 'rubygems'
@@ -115,10 +117,10 @@ Beautiful Soup for that page.

== The transact method

-transact runs the given block and then resets the page history. I.e. after the
-block has been executed, you're back at the original page; no need count how
-many times to call the back method at the end of a loop (while accounting for
-possible exceptions).
+Mechanize#transact runs the given block and then resets the page history. I.e.
+after the block has been executed, you're back at the original page; no need
+count how many times to call the back method at the end of a loop (while
+accounting for possible exceptions).

This example also demonstrates subclassing Mechanize.
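
Review note: the behaviour described for Mechanize#transact (run the block, then reset the page history even if something goes wrong) can be pictured with a toy model. The +TinyBrowser+ class below is hypothetical and is not Mechanize's implementation; using +ensure+ to restore a saved copy of the history is an assumption made for the sketch.

```ruby
# Hypothetical stand-in for an agent that keeps a page history.
class TinyBrowser
  attr_reader :history

  def initialize
    @history = []
  end

  # Pretend to fetch a page: just record the URL in the history.
  def get(url)
    @history.push(url)
    url
  end

  # Run the block, then put the history back the way it was,
  # whether the block finished normally or raised.
  def transact
    saved = @history.dup
    begin
      yield self
    ensure
      @history = saved
    end
  end
end

browser = TinyBrowser.new
browser.get('http://example.com/index')

browser.transact do |b|
  b.get('http://example.com/a')
  b.get('http://example.com/b')
end
after_block = browser.history.dup

begin
  browser.transact do |b|
    b.get('http://example.com/broken')
    raise 'simulated failure'
  end
rescue RuntimeError
  # The history was still restored before the error reached us.
end
after_error = browser.history.dup
```

In both cases the history ends up holding only the page that was loaded before the block ran, so no counting of back calls is needed.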

@@ -154,17 +156,12 @@ This example also demonstrates subclassing Mechanize.

== Client Certificate Authentication (Mutual Auth)

-In most cases a client certificate is created as an additional layer of security
-for certain websites. The specific case that this was initially tested on was
-for automating the download of archived images from a banks (Wachovia) lockbox
-system. Once the certificate is installed into your browser you will have to
-export it and split the certificate and private key into separate files.
-Exported files are usually in .p12 format (IE 7 & Firefox 2.0) which stands for
-PKCS #12. You can convert them from p12 to pem format by using the following
-commands:
-
-openssl.exe pkcs12 -in input_file.p12 -clcerts -out example.key -nocerts -nodes
-openssl.exe pkcs12 -in input_file.p12 -clcerts -out example.cer -nokeys
+In most cases a client certificate is created as an additional layer of
+security for certain websites. The specific case that this was initially
+tested on was for automating the download of archived images from a banks
+(Wachovia) lockbox system. Once the certificate is installed into your
+browser you will have to export it and split the certificate and private key
+into separate files.

require 'rubygems'
require 'mechanize'
@@ -185,3 +182,11 @@ openssl.exe pkcs12 -in input_file.p12 -clcerts -out example.cer -nokeys

# submit login form
agent.submit(login_form, login_form.buttons.first)

+Exported files are usually in .p12 format (IE 7 & Firefox 2.0) which stands
+for PKCS #12. You can convert them from p12 to pem format by using the
+following commands:
+
+openssl pkcs12 -in input_file.p12 -clcerts -out example.key -nocerts -nodes
+openssl pkcs12 -in input_file.p12 -clcerts -out example.cer -nokeys
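
Review note: the same split can probably be done from Ruby with the standard +openssl+ library instead of the command line. This is a sketch, not part of the commit: +split_p12+ is a hypothetical helper name, and the file names and passphrase in the usage comment are placeholders.

```ruby
require 'openssl'

# Hypothetical helper: take the raw bytes of a PKCS #12 (.p12) file and its
# passphrase, and return the certificate and the private key as PEM strings.
# The key is written unencrypted, like the -nodes option in the commands above.
def split_p12(p12_data, passphrase)
  p12 = OpenSSL::PKCS12.new(p12_data, passphrase)
  [p12.certificate.to_pem, p12.key.to_pem]
end

# Usage (placeholder file names and passphrase):
#
#   cert_pem, key_pem = split_p12(File.binread('input_file.p12'), 'passphrase')
#   File.write('example.cer', cert_pem)
#   File.write('example.key', key_pem)
```

The resulting files should be usable wherever the openssl output would be, but that has not been checked against the lockbox setup described above.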

15 changes: 10 additions & 5 deletions GUIDE.rdoc
@@ -130,7 +130,7 @@ In this section, I want to touch on using the different types in input fields
possible with a form. Password and textarea fields can be treated just like
text input fields. Select fields are very similar to text fields, but they
have many options associated with them. If you select one option, mechanize
-will deselect the other options (unless it is a multi select!).
+will de-select the other options (unless it is a multi select!).

For example, lets select an option on a list:

@@ -154,10 +154,15 @@ tell it what file name you want to upload:

== Scraping Data

-Mechanize uses nokogiri[http://nokogiri.org/] to parse
-html. What does this mean for you? You can treat a mechanize page like
-an nokogiri object. After you have used Mechanize to navigate to the page
-that you need to scrape, then scrape it using nokogiri methods:
+Mechanize uses nokogiri[http://nokogiri.org/] to parse HTML. What does this
+mean for you? You can treat a mechanize page like an nokogiri object. After
+you have used Mechanize to navigate to the page that you need to scrape, then
+scrape it using nokogiri methods:

+agent.get('http://someurl.com/').search("p.posted")
+
+The expression given to Mechanize::Page#search may be a CSS expression or an
+XPath expression:
+
agent.get('http://someurl.com/').search(".//p[@class='posted']")

21 changes: 19 additions & 2 deletions lib/mechanize/page.rb
@@ -186,9 +186,26 @@ def content_type
@meta_content_type || response['content-type']
end

-# Search through the page like HPricot
+##
+# :method: search
+#
+# Search for +paths+ in the page using Nokogiri's #search. The +paths+ can
+# be XPath or CSS and an optional Hash of namespaces may be appended.
+#
+# See Nokogiri::XML::Node#search for further details.
+
def_delegator :parser, :search, :search
-def_delegator :parser, :/, :/
+
+alias / search
+
+##
+# :method: at
+#
+# Search through the page for +path+ under +namespace+ using Nokogiri's #at.
+# The +path+ may be either a CSS or XPath expression.
+#
+# See also Nokogiri::XML::Node#at
+
def_delegator :parser, :at, :at

##
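
Review note: for readers unfamiliar with Forwardable, the delegation pattern in this hunk can be tried in isolation. +FakeParser+ and +FakePage+ below are hypothetical stand-ins, not Mechanize or Nokogiri classes; they only echo their arguments so the forwarding is visible.

```ruby
require 'forwardable'

# Hypothetical parser that just reports what it was asked to do.
class FakeParser
  def search(*paths)
    "search(#{paths.join(', ')})"
  end

  def at(path)
    "at(#{path})"
  end
end

# Hypothetical page that forwards #search and #at to its parser,
# mirroring the def_delegator and alias lines in the diff.
class FakePage
  extend Forwardable

  attr_reader :parser

  def initialize(parser)
    @parser = parser
  end

  def_delegator :parser, :search, :search
  alias / search
  def_delegator :parser, :at, :at
end

page = FakePage.new(FakeParser.new)

search_result = page.search('p.posted')
slash_result  = page / 'p.posted'
at_result     = page.at('p.posted')
```

With the alias in place, +/+ is simply another name for the delegated +search+, so a separate delegator for +/+ is not needed.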