Skip to content

Commit

Permalink
DEV: Replace custom uri_encode logic with Addressable (#420)
Browse files Browse the repository at this point in the history
Since 8ff3fff introduced the addressable gem as a runtime dependency, it's now possible to replace all custom URI encoding logic with addressable's methods.

It does handle some characters differently - e.g. it now doesn't escape URLs in URI fragments (see the added test). Previous escaping behavior was reported as an issue in Discourse.
  • Loading branch information
CvX committed Dec 30, 2019
1 parent 0bf1ccd commit de1ba8c
Show file tree
Hide file tree
Showing 2 changed files with 10 additions and 42 deletions.
51 changes: 9 additions & 42 deletions lib/onebox/helpers.rb
Expand Up @@ -190,54 +190,21 @@ def self.get_absolute_image_url(src, url)
src
end

RFC_3986_URI_REGEX ||= /^(?<scheme>([^:\/?#]+):)?(?<authority>\/\/([^\/?#]*))?(?<path>[^?#]*)(\?(?<query>[^#]*))?(#(?<fragment>.*))?$/
DOUBLE_ESCAPED_REGEXP ||= /%25([0-9a-f]{2})/i

# Percent-encodes a URI query parameter per RFC3986 - https://tools.ietf.org/html/rfc3986
def self.uri_query_encode(query_string)
return "" unless query_string

# query can encode space to %20 OR +
# + MUST be encoded as %2B
# in RFC3968 both query and fragment are defined as:
# = *( pchar / "/" / "?" )
# CGI.escape turns space into + which is the most backward compatible
# however it doesn't roundtrip through URI.unescape which prefers %20
CGI.escape(query_string).gsub('%25', '%').gsub('+', '%20')
end

# Percent-encodes a URI string per RFC3986 - https://tools.ietf.org/html/rfc3986
def self.uri_encode(url)
return "" unless url

# parse uri into named matches, then reassemble properly encoded
parts = url.match(RFC_3986_URI_REGEX)

encoded = ""
encoded += parts[:scheme] unless parts[:scheme].nil?
encoded += parts[:authority] unless parts[:authority].nil?

# path requires space to be encoded as %20 (NEVER +)
# + should be left unencoded
encoded += Addressable::URI.encode(parts[:path]) unless parts[:path].nil?
encoded.gsub!(DOUBLE_ESCAPED_REGEXP, '%\1')

# each query parameter
if !parts[:query].nil?
query_string = parts[:query].split('&').map do |pair|
# can optionally be separated by an =
pair.split('=').map do |v|
uri_query_encode(v)
end.join('=')
end.join('&')
encoded += '?' + query_string
end
uri = Addressable::URI.parse(url)

unless parts[:fragment].nil?
encoded += '#' + uri_query_encode(parts[:fragment])&.gsub('%21%2F', '!/')
end
encoded_uri = Addressable::URI.new(
scheme: Addressable::URI.encode_component(uri.scheme, Addressable::URI::CharacterClasses::SCHEME),
authority: Addressable::URI.encode_component(uri.authority, Addressable::URI::CharacterClasses::AUTHORITY),
path: Addressable::URI.encode_component(uri.path, Addressable::URI::CharacterClasses::PATH + "\\%"),
query: Addressable::URI.encode_component(uri.query, "a-zA-Z0-9\\-\\.\\_\\~\\$\\&\\*\\,\\=\\:\\@\\?\\%"),
fragment: Addressable::URI.encode_component(uri.fragment, "a-zA-Z0-9\\-\\.\\_\\~\\!\\$\\&\\'\\(\\)\\*\\+\\,\\;\\=\\:\\/\\?\\%")
)

encoded
encoded_uri.to_s
end

def self.uri_unencode(url)
Expand Down
1 change: 1 addition & 0 deletions spec/lib/onebox/helpers_spec.rb
Expand Up @@ -93,5 +93,6 @@

it { expect(described_class.uri_encode("https://example.com/random%2Bpath?q=random%2Bquery")).to eq("https://example.com/random%2Bpath?q=random%2Bquery") }
it { expect(described_class.uri_encode("https://glitch.com/edit/#!/equinox-watch")).to eq("https://glitch.com/edit/#!/equinox-watch") }
it { expect(described_class.uri_encode("https://gitpod.io/#https://github.com/eclipse-theia/theia")).to eq("https://gitpod.io/#https://github.com/eclipse-theia/theia") }
end
end

0 comments on commit de1ba8c

Please sign in to comment.