
URIs can contain invalid characters if the website builds them from article titles or similar text.

Such a link works when clicked in a browser, but Anemone's URI parsing rejects it, so the link is skipped during a crawl.

I added URI.escape to avoid that. Tested on about 20 websites without issues.
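A minimal sketch of the failure and the fix, assuming a modern Ruby. `URI.escape` (the call the commit adds) was removed in Ruby 3.0, so the sketch swaps in `URI::RFC2396_Parser#escape`, which percent-encodes the same RFC 2396 "unsafe" characters. The hrefs and the helper names `parse_raw` and `parse_escaped` are hypothetical.

```ruby
require 'uri'

# Stand-in for URI.escape (removed in Ruby 3.0): percent-encodes every
# character outside RFC 2396's reserved and unreserved sets.
ESCAPER = URI::RFC2396_Parser.new

# Hypothetical helper: parse an href as-is, returning nil when rejected.
def parse_raw(href)
  URI(href)
rescue URI::InvalidURIError
  nil
end

# Hypothetical helper: escape first, then parse, like URI(URI.escape(u)).
def parse_escaped(href)
  URI(ESCAPER.escape(href))
rescue URI::InvalidURIError
  nil
end

# Hypothetical hrefs a site might build from article titles.
title_href  = "/articles/My First Post"
accent_href = "/articles/Café"

p parse_raw(title_href)              # => nil (a raw space is rejected)
p parse_raw(accent_href)             # => nil (non-ASCII is rejected)
p parse_escaped(title_href).path     # => "/articles/My%20First%20Post"
p parse_escaped(accent_href).path    # => "/articles/Caf%C3%A9"
```

One caveat of this style of escaping: `%` is itself treated as unsafe, so an href that is already percent-encoded gets encoded again (`a%20b` becomes `a%2520b`).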
1 parent 217a22b, commit 2495c1c0c85295dd51d22b591e26cbaebfce08fa, committed by polysics on Apr 23, 2011
Showing 1 changed file with 1 addition and 1 deletion.

lib/anemone/page.rb (+1 −1)
@@ -62,7 +62,7 @@ def links
       doc.search("//a[@href]").each do |a|
         u = a['href']
         next if u.nil? or u.empty?
-        abs = to_absolute(URI(u)) rescue next
+        abs = to_absolute(URI(URI.escape(u))) rescue next
         @links << abs if in_domain?(abs)
       end
       @links.uniq!
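The patched loop can be sketched outside Anemone as follows. This is an illustration under assumptions, not Anemone's code: the hrefs arrive as a plain array instead of from the Nokogiri `doc.search` call, `to_absolute` and `in_domain?` are hypothetical simplified stand-ins for the real `Page` methods, and `URI::RFC2396_Parser#escape` replaces `URI.escape`, which no longer exists in Ruby 3.0+.

```ruby
require 'uri'

# Stand-in for URI.escape (removed in Ruby 3.0).
ESCAPER = URI::RFC2396_Parser.new

# Hypothetical stand-in for Page#to_absolute: resolve against the page URL.
def to_absolute(base, link)
  base.merge(link)
end

# Hypothetical stand-in for Page#in_domain?: same host as the page.
def in_domain?(base, uri)
  uri.host == base.host
end

# Mirrors the patched loop: skip blanks, escape, parse, resolve, filter.
def links(base, hrefs)
  found = []
  hrefs.each do |u|
    next if u.nil? || u.empty?
    # As in the commit, anything that still fails to parse is skipped.
    abs = to_absolute(base, URI(ESCAPER.escape(u))) rescue next
    found << abs if in_domain?(base, abs)
  end
  found.uniq(&:to_s) # dedupe by string form, like @links.uniq!
end

page  = URI("http://example.com/blog/")
hrefs = ["/articles/My First Post", "post two", "", nil,
         "http://other.org/x", "/articles/My First Post"]

links(page, hrefs).each { |uri| puts uri }
# http://example.com/articles/My%20First%20Post
# http://example.com/blog/post%20two
```

Before the patch, both title-based hrefs would have been dropped by `rescue next`, leaving no in-domain links from this page.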