FIX: Unescapes hash section when present to account for url-encoded chars

Sections containing characters outside the URL unreserved set (for
example, non-ASCII characters) appear url-encoded in the hash and need
to be unescaped before use.

Wikipedia generates two different spans on the same page in this case:
one with an id that results from replacing the % symbols with ., and the
other with the decoded version of the string. For example, for
/wiki/foo#A%C3%A1A it will generate:

<span id="A.C3.A1A"></span>
<span id="AáA">AáA</span>

Unescaping the `m_url_hash_name` should work in all cases to target the
proper section span.
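
To make the two id forms concrete, here is a minimal sketch using only Ruby's standard `cgi` library. The helper name `section_span_xpath` and the local variable names are hypothetical and are not part of the onebox code. It assumes the fragment arrives still percent-encoded, as in the example URL above.

```ruby
require "cgi"

# Hypothetical helper (not from the onebox source): builds the XPath used to
# find a section span, unescaping the URL fragment first.
def section_span_xpath(url_hash_name)
  "//span[@id='#{CGI.unescape(url_hash_name)}']"
end

# Fragment taken from the example URL /wiki/foo#A%C3%A1A.
fragment = "A%C3%A1A"

# Id of the first, empty span: every "%" replaced with ".".
dotted_id = fragment.tr("%", ".")

# Id of the second span: the percent-decoded string.
decoded_id = CGI.unescape(fragment)

puts dotted_id                        # A.C3.A1A
puts decoded_id                       # AáA
puts section_span_xpath(fragment)     # //span[@id='AáA']
puts section_span_xpath("History")    # //span[@id='History']
```

The raw fragment `A%C3%A1A` matches neither id, which is why the lookup missed before the change. A plain ASCII fragment such as `History` passes through `CGI.unescape` unchanged, so the same expression serves both cases.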
jbalsas authored and eviltrout committed Aug 12, 2021
1 parent 745b99e commit d27d7c8
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion lib/onebox/engine/wikipedia_onebox.rb
@@ -24,7 +24,7 @@ def data
end

unless m_url_hash.nil?
- section_header_title = raw.xpath("//span[@id='#{m_url_hash_name}']")
+ section_header_title = raw.xpath("//span[@id='#{CGI.unescape(m_url_hash_name)}']")

if section_header_title.empty?
paras = raw.search("p") # default get all the paras