URIs can contain invalid characters if they are built by the website …

…using article titles or similar. Such links work when clicked in a browser, but they produce an invalid link when crawled by Anemone.

I added URI.escape to avoid that. Tested on about 20 websites without issues.
lpradovera authored and chriskite committed Apr 23, 2011
1 parent 217a22b commit 2495c1c0c85295dd51d22b591e26cbaebfce08fa
  1. +1 −1 lib/anemone/page.rb
@@ -62,7 +62,7 @@ def links
       doc.search("//a[@href]").each do |a|
         u = a['href']
         next if u.nil? or u.empty?
-        abs = to_absolute(URI(u)) rescue next
+        abs = to_absolute(URI(URI.escape(u))) rescue next
         @links << abs if in_domain?(abs)
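
To illustrate the effect of the change outside Anemone, here is a minimal sketch. `parse_href` and `ESCAPER` are hypothetical names and `example.com` is a placeholder, none of them from the commit. Because `URI.escape` was removed in Ruby 3.0, the sketch calls `URI::RFC2396_Parser#escape`, the method the old `URI.escape` delegated to; this is an assumption about how one might reproduce the behaviour on a current Ruby, not what the commit itself does.

```ruby
require 'uri'

# Hypothetical stand-in for the obsolete URI.escape (removed in Ruby 3.0).
ESCAPER = URI::RFC2396_Parser.new

# Hypothetical helper, not part of Anemone: escape an href, then parse it.
# Returns nil when the href still cannot be parsed, mirroring the
# `rescue next` in the diff, which skips such links.
def parse_href(href)
  URI(ESCAPER.escape(href))
rescue URI::InvalidURIError
  nil
end

raw = "http://example.com/articles/my first post"

# Without escaping, the space makes URI() raise.
begin
  URI(raw)
rescue URI::InvalidURIError => e
  puts "unescaped href rejected: #{e.class}"
end

# With escaping, the same href parses.
puts parse_href(raw)
```

One caveat with this approach: the escaper also rewrites `%`, so an href that is already percent-encoded would be encoded a second time.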

