diff --git a/History.rdoc b/History.md
similarity index 91%
rename from History.rdoc
rename to History.md
index 22bd8f84..935669e1 100644
--- a/History.rdoc
+++ b/History.md
@@ -1,4 +1,4 @@
-=== 0.2.2 / 2010-01-06
+### 0.2.2 / 2010-01-06
* Require Web Spider Obstacle Course (WSOC) >= 0.1.1.
* Integrated the new WSOC into the specs.
@@ -15,7 +15,7 @@
* Renamed Spidr::Agent#get_session to {Spidr::SessionCache#[]}.
* Renamed Spidr::Agent#kill_session to {Spidr::SessionCache#kill!}.
-=== 0.2.1 / 2009-11-25
+### 0.2.1 / 2009-11-25
* Added {Spidr::Events#every_ok_page}.
* Added {Spidr::Events#every_redirect_page}.
@@ -44,9 +44,9 @@
* Added {Spidr::Events#every_zip_page}.
* Fixed a bug where {Spidr::Agent#delay} was not being used to delay
requesting pages.
-* Spider +link+ and +script+ tags in HTML pages (thanks Nick Plante).
+* Spider `link` and `script` tags in HTML pages (thanks Nick Plante).
-=== 0.2.0 / 2009-10-10
+### 0.2.0 / 2009-10-10
* Added {URI.expand_path}.
* Added {Spidr::Page#search}.
@@ -91,7 +91,7 @@
* Made {Spidr::Agent#visit_page} public.
* Moved to YARD based documentation.
-=== 0.1.9 / 2009-06-13
+### 0.1.9 / 2009-06-13
* Upgraded to Hoe 2.0.0.
* Use Hoe.spec instead of Hoe.new.
@@ -108,7 +108,7 @@
could not be loaded.
* Removed Spidr::Agent::SCHEMES.
-=== 0.1.8 / 2009-05-27
+### 0.1.8 / 2009-05-27
* Added the Spidr::Agent#pause! and Spidr::Agent#continue! methods.
* Added the Spidr::Agent#running? and Spidr::Agent#paused? methods.
@@ -121,15 +121,15 @@
* Made {Spidr::Agent#enqueue} and {Spidr::Agent#queued?} public.
* Added more specs.
-=== 0.1.7 / 2009-04-24
+### 0.1.7 / 2009-04-24
* Added Spidr::Agent#all_headers.
-* Fixed a bug where Page#headers was always +nil+.
+* Fixed a bug where Page#headers was always `nil`.
* {Spidr::Agent} will now follow the Location header in HTTP 300,
301, 302, 303 and 307 Redirects.
* {Spidr::Agent} will now follow iframe and frame tags.
-=== 0.1.6 / 2009-04-14
+### 0.1.6 / 2009-04-14
* Added {Spidr::Agent#failures}, a list of URLs which could not be visited.
* Added {Spidr::Agent#failed?}.
@@ -143,27 +143,27 @@
* Updated the Web Spider Obstacle Course with links that always fail to be
visited.
-=== 0.1.5 / 2009-03-22
+### 0.1.5 / 2009-03-22
-* Catch malformed URIs in {Spidr::Page#to_absolute} and return +nil+.
-* Filter out +nil+ URIs in {Spidr::Page#urls}.
+* Catch malformed URIs in {Spidr::Page#to_absolute} and return `nil`.
+* Filter out `nil` URIs in {Spidr::Page#urls}.
-=== 0.1.4 / 2009-01-15
+### 0.1.4 / 2009-01-15
* Use Nokogiri for HTML and XML parsing.
-=== 0.1.3 / 2009-01-10
+### 0.1.3 / 2009-01-10
* Added the :host options to {Spidr::Agent#initialize}.
* Added the Web Spider Obstacle Course files to the Manifest.
* Aliased {Spidr::Agent#visited_urls} to {Spidr::Agent#history}.
-=== 0.1.2 / 2008-11-06
+### 0.1.2 / 2008-11-06
* Fixed a bug in {Spidr::Page#to_absolute} where URLs with no path were not
- receiving a default path of /.
+ receiving a default path of `/`.
* Fixed a bug in {Spidr::Page#to_absolute} where URL paths were not being
- expanded, in order to remove .. and . directories.
+ expanded, in order to remove `..` and `.` directories.
* Fixed a bug where absolute URLs could have a blank path, thus causing
{Spidr::Agent#get_page} to crash when it performed the HTTP request.
* Added RSpec spec tests.
@@ -171,12 +171,12 @@
(http://spidr.rubyforge.org/course/start.html) which is used in the spec
tests.
-=== 0.1.1 / 2008-10-04
+### 0.1.1 / 2008-10-04
* Added a reader method for the response instance variable in Page.
* Fixed a bug in {Spidr::Page#method_missing}.
-=== 0.1.0 / 2008-05-23
+### 0.1.0 / 2008-05-23
* Initial release.
* Black-list or white-list URLs based upon:
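The History.md hunks above all apply the same two mechanical rewrites: RDoc headings (`=== Title`) become Markdown headings (`### Title`), and RDoc inline code (`+word+`) becomes Markdown backticks. A minimal sketch of that conversion (the method name and regexes are illustrative, not part of this change set):

```ruby
# Convert the two RDoc constructs this diff rewrites into Markdown:
#   +word+      -> `word`
#   === Title   -> ### Title  (heading level = number of '=' signs)
def rdoc_to_markdown(line)
  line.gsub(/\+(\w+)\+/, '`\1`')
      .sub(/\A(=+)(?=\s)/) { '#' * $1.length }
end

rdoc_to_markdown('=== 0.2.2 / 2010-01-06')  # => "### 0.2.2 / 2010-01-06"
rdoc_to_markdown('was always +nil+')        # => "was always `nil`"
```

Note this sketch covers only the constructs seen in these hunks; the README portion of the diff also rewrites bare URLs into Markdown links, which is a separate transformation.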
diff --git a/README.rdoc b/README.md
similarity index 80%
rename from README.rdoc
rename to README.md
index fe78ca1f..9ccb3ca2 100644
--- a/README.rdoc
+++ b/README.md
@@ -1,18 +1,18 @@
-= Spidr
+# Spidr
-* http://spidr.rubyforge.org
-* http://github.com/postmodern/spidr
-* http://github.com/postmodern/spidr/issues
-* http://groups.google.com/group/spidr
+* [spidr.rubyforge.org](http://spidr.rubyforge.org/)
+* [github.com/postmodern/spidr](http://github.com/postmodern/spidr)
+* [github.com/postmodern/spidr/issues](http://github.com/postmodern/spidr/issues)
+* [groups.google.com/group/spidr](http://groups.google.com/group/spidr)
* irc.freenode.net #spidr
-== DESCRIPTION:
+## DESCRIPTION:
Spidr is a versatile Ruby web spidering library that can spider a site,
multiple domains, certain links or infinitely. Spidr is designed to be fast
and easy to use.
-== FEATURES:
+## FEATURES:
* Follows:
* a tags.
@@ -41,21 +41,21 @@ and easy to use.
* Custom proxy settings.
* HTTPS support.
-== EXAMPLES:
+## EXAMPLES:
-* Start spidering from a URL:
+Start spidering from a URL:
Spidr.start_at('http://tenderlovemaking.com/')
-* Spider a host:
+Spider a host:
Spidr.host('coderrr.wordpress.com')
-* Spider a site:
+Spider a site:
Spidr.site('http://rubyflow.com/')
-* Spider multiple hosts:
+Spider multiple hosts:
Spidr.start_at(
'http://company.com/',
@@ -65,30 +65,30 @@ and easy to use.
]
)
-* Do not spider certain links:
+Do not spider certain links:
Spidr.site('http://matasano.com/', :ignore_links => [/log/])
-* Do not spider links on certain ports:
+Do not spider links on certain ports:
Spidr.site(
'http://sketchy.content.com/',
:ignore_ports => [8000, 8010, 8080]
)
-* Print out visited URLs:
+Print out visited URLs:
Spidr.site('http://rubyinside.org/') do |spider|
spider.every_url { |url| puts url }
end
-* Print out the URLs that could not be requested:
+Print out the URLs that could not be requested:
Spidr.site('http://sketchy.content.com/') do |spider|
spider.every_failed_url { |url| puts url }
end
-* Search HTML and XML pages:
+Search HTML and XML pages:
Spidr.site('http://company.withablog.com/') do |spider|
spider.every_page do |page|
@@ -99,11 +99,11 @@ and easy to use.
value = meta.attributes['content']
puts " #{name} = #{value}"
- end
+ end
end
end
-* Print out the titles from every page:
+Print out the titles from every page:
Spidr.site('http://www.rubypulse.com/') do |spider|
spider.every_html_page do |page|
@@ -111,7 +111,7 @@ and easy to use.
end
end
-* Find what kinds of web servers a host is using, by accessing the headers:
+Find what kinds of web servers a host is using, by accessing the headers:
servers = Set[]
@@ -121,7 +121,7 @@ and easy to use.
end
end
-* Pause the spider on a forbidden page:
+Pause the spider on a forbidden page:
spider = Spidr.host('overnight.startup.com') do |spider|
spider.every_forbidden_page do |page|
@@ -129,7 +129,7 @@ and easy to use.
end
end
-* Skip the processing of a page:
+Skip the processing of a page:
Spidr.host('sketchy.content.com') do |spider|
spider.every_missing_page do |page|
@@ -137,7 +137,7 @@ and easy to use.
end
end
-* Skip the processing of links:
+Skip the processing of links:
Spidr.host('sketchy.content.com') do |spider|
spider.every_url do |url|
@@ -147,15 +147,15 @@ and easy to use.
end
end
-== REQUIREMENTS:
+## REQUIREMENTS:
-* {nokogiri}[http://nokogiri.rubyforge.org/] >= 1.2.0
+* [nokogiri](http://nokogiri.rubyforge.org/) >= 1.2.0
-== INSTALL:
+## INSTALL:
- $ sudo gem install spidr
+ $ sudo gem install spidr
-== LICENSE:
+## LICENSE:
The MIT License
diff --git a/Rakefile b/Rakefile
index b71f2185..37632686 100644
--- a/Rakefile
+++ b/Rakefile
@@ -11,7 +11,7 @@ Hoe.spec('spidr') do
self.rspec_options += ['--colour', '--format', 'specdoc']
- self.yard_options += ['--protected']
+ self.yard_options += ['--markup', 'markdown', '--protected']
self.remote_yard_dir = 'docs'
self.extra_deps = [
diff --git a/lib/spidr/agent.rb b/lib/spidr/agent.rb
index 05201d68..c201c21e 100644
--- a/lib/spidr/agent.rb
+++ b/lib/spidr/agent.rb
@@ -492,7 +492,7 @@ def enqueue(url)
# The page for the response.
#
# @return [Page, nil]
- # The page for the response, or +nil+ if the request failed.
+ # The page for the response, or `nil` if the request failed.
#
def get_page(url,&block)
url = URI(url.to_s)
@@ -525,7 +525,7 @@ def get_page(url,&block)
# The page for the response.
#
# @return [Page, nil]
- # The page for the response, or +nil+ if the request failed.
+ # The page for the response, or `nil` if the request failed.
#
# @since 0.2.2
#
@@ -557,7 +557,7 @@ def post_page(url,post_data='',&block)
# The page which was visited.
#
# @return [Page, nil]
- # The page that was visited. If +nil+ is returned, either the request
+ # The page that was visited. If `nil` is returned, either the request
# for the page failed, or the page was skipped.
#
def visit_page(url,&block)
@@ -585,8 +585,8 @@ def visit_page(url,&block)
# Converts the agent into a Hash.
#
# @return [Hash]
- # The agent represented as a Hash containing the +history+ and
- # the +queue+ of the agent.
+ # The agent represented as a Hash containing the `history` and
+ # the `queue` of the agent.
#
def to_hash
{:history => @history, :queue => @queue}
diff --git a/lib/spidr/auth_store.rb b/lib/spidr/auth_store.rb
index 7aa09df5..868aa531 100644
--- a/lib/spidr/auth_store.rb
+++ b/lib/spidr/auth_store.rb
@@ -24,7 +24,7 @@ def initialize
#
# @return [AuthCredential, nil]
# Closest matching {AuthCredential} values for the URL,
- # or +nil+ if nothing matches.
+ # or `nil` if nothing matches.
#
# @since 0.2.2
#
@@ -102,13 +102,13 @@ def add(url,username,password)
#
# Returns the base64 encoded authorization string for the URL
- # or +nil+ if no authorization exists.
+ # or `nil` if no authorization exists.
#
# @param [URI] url
# The url.
#
# @return [String, nil]
- # The base64 encoded authorizatio string or +nil+.
+ # The base64 encoded authorization string or `nil`.
#
# @since 0.2.2
#
diff --git a/lib/spidr/cookie_jar.rb b/lib/spidr/cookie_jar.rb
index 2994e8b1..2eb59190 100644
--- a/lib/spidr/cookie_jar.rb
+++ b/lib/spidr/cookie_jar.rb
@@ -47,7 +47,7 @@ def each(&block)
# Host or domain name for cookies.
#
# @return [String, nil]
- # The cookie values or +nil+ if the host does not have a cookie in the
+ # The cookie values or `nil` if the host does not have a cookie in the
# jar.
#
# @since 0.2.2
diff --git a/lib/spidr/filters.rb b/lib/spidr/filters.rb
index 5962e7c6..59ea47d1 100644
--- a/lib/spidr/filters.rb
+++ b/lib/spidr/filters.rb
@@ -17,7 +17,7 @@ def self.included(base)
#
# @option options [Array] :schemes (['http', 'https'])
# The list of acceptable URI schemes to visit.
- # The +https+ scheme will be ignored if +net/https+ cannot be loaded.
+ # The `https` scheme will be ignored if `net/https` cannot be loaded.
#
# @option options [String] :host
# The host-name to visit.
diff --git a/lib/spidr/page.rb b/lib/spidr/page.rb
index 8af257cb..42cb57b9 100644
--- a/lib/spidr/page.rb
+++ b/lib/spidr/page.rb
@@ -46,10 +46,10 @@ def code
end
#
- # Determines if the response code is +200+.
+ # Determines if the response code is `200`.
#
# @return [Boolean]
- # Specifies whether the response code is +200+.
+ # Specifies whether the response code is `200`.
#
def is_ok?
code == 200
@@ -58,10 +58,10 @@ def is_ok?
alias ok? is_ok?
#
- # Determines if the response code is +301+ or +307+.
+ # Determines if the response code is `301` or `307`.
#
# @return [Boolean]
- # Specifies whether the response code is +301+ or +307+.
+ # Specifies whether the response code is `301` or `307`.
#
def is_redirect?
(code == 301 || code == 307)
@@ -70,30 +70,30 @@ def is_redirect?
alias redirect? is_redirect?
#
- # Determines if the response code is +308+.
+ # Determines if the response code is `308`.
#
# @return [Boolean]
- # Specifies whether the response code is +308+.
+ # Specifies whether the response code is `308`.
#
def timedout?
code == 308
end
#
- # Determines if the response code is +400+.
+ # Determines if the response code is `400`.
#
# @return [Boolean]
- # Specifies whether the response code is +400+.
+ # Specifies whether the response code is `400`.
#
def bad_request?
code == 400
end
#
- # Determines if the response code is +401+.
+ # Determines if the response code is `401`.
#
# @return [Boolean]
- # Specifies whether the response code is +401+.
+ # Specifies whether the response code is `401`.
#
def is_unauthorized?
code == 401
@@ -102,10 +102,10 @@ def is_unauthorized?
alias unauthorized? is_unauthorized?
#
- # Determines if the response code is +403+.
+ # Determines if the response code is `403`.
#
# @return [Boolean]
- # Specifies whether the response code is +403+.
+ # Specifies whether the response code is `403`.
#
def is_forbidden?
code == 403
@@ -114,10 +114,10 @@ def is_forbidden?
alias forbidden? is_forbidden?
#
- # Determines if the response code is +404+.
+ # Determines if the response code is `404`.
#
# @return [Boolean]
- # Specifies whether the response code is +404+.
+ # Specifies whether the response code is `404`.
#
def is_missing?
code == 404
@@ -126,10 +126,10 @@ def is_missing?
alias missing? is_missing?
#
- # Determines if the response code is +500+.
+ # Determines if the response code is `500`.
#
# @return [Boolean]
- # Specifies whether the response code is +500+.
+ # Specifies whether the response code is `500`.
#
def had_internal_server_error?
code == 500
@@ -334,7 +334,7 @@ def body
#
# @return [Nokogiri::HTML::Document, Nokogiri::XML::Document, nil]
# The document that represents HTML or XML pages.
- # Returns +nil+ if the page is neither HTML, XML, RSS, Atom or if
+ # Returns `nil` if the page is not HTML, XML, RSS, or Atom, or if
# the page could not be parsed properly.
#
# @see http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html
@@ -382,7 +382,7 @@ def search(*paths)
# Searches for the first occurrence of an XPath or CSS Path expression.
#
# @return [Nokogiri::HTML::Node, Nokogiri::XML::Node, nil]
- # The first matched node. Returns +nil+ if no nodes could be matched,
+ # The first matched node. Returns `nil` if no nodes could be matched,
# or if the page is not an HTML or XML document.
#
# @example
@@ -418,7 +418,7 @@ def title
#
# @return [Array]
# All links within the HTML page, frame/iframe source URLs and any
- # links in the +Location+ header.
+ # links in the `Location` header.
#
def links
urls = []
@@ -504,7 +504,7 @@ def to_absolute(link)
protected
#
- # Provides transparent access to the values in +headers+.
+ # Provides transparent access to the values in `headers`.
#
def method_missing(sym,*args,&block)
if (args.empty? && block.nil?)
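The final hunk documents a `method_missing` hook that exposes response headers as reader methods. A minimal sketch of that pattern, assuming a plain headers hash (the class and header names here are illustrative, not Spidr's actual internals):

```ruby
# Sketch of transparent header access via method_missing: an unknown
# zero-argument reader like `content_type` is translated into a lookup
# of the "content-type" header; anything else falls through to super.
class HeaderProxy
  def initialize(headers)
    @headers = headers
  end

  def method_missing(sym, *args, &block)
    if args.empty? && block.nil?
      name = sym.to_s.tr('_', '-')
      return @headers[name] if @headers.key?(name)
    end
    super
  end

  # Keep respond_to? consistent with the dynamic readers above.
  def respond_to_missing?(sym, include_private = false)
    @headers.key?(sym.to_s.tr('_', '-')) || super
  end
end

proxy = HeaderProxy.new('content-type' => 'text/html', 'server' => 'nginx')
proxy.content_type  # => "text/html"
proxy.server        # => "nginx"
```

Defining `respond_to_missing?` alongside `method_missing` keeps introspection (`respond_to?`, `method`) truthful about the dynamically handled names.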