Sanitize plaintext with Loofah gem.
This is necessary because Loofah, unlike Sanitize, can be told not to escape HTML entities in the sanitized text. We need that to sanitize URLs entered by the user; otherwise & characters get converted to &amp; and the URL is broken.
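
A minimal illustration of the URL problem, using only Ruby's standard library rather than Loofah or Sanitize. The URL is a made-up example, and CGI.escapeHTML merely stands in for any sanitizer that HTML-escapes its output:

```ruby
require 'cgi'

# Hypothetical URL typed in by a user; note the bare & between parameters.
url = 'http://example.com/search?q=ruby&page=2'

# HTML-escaping is correct when rendering into a page, but it corrupts a URL
# that is going to be stored or fetched as plain text.
escaped = CGI.escapeHTML(url)

puts escaped
# => http://example.com/search?q=ruby&amp;page=2
```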

However, when told not to HTML-escape entities, Loofah also unescapes any escaped entities that were present in the input text. This means a malicious user can still inject scripts, e.g. by HTML-encoding the special < and > chars. To avoid this type of attack, we sanitize the text with Loofah repeatedly until one of two things happens: the text stops changing after a pass, at which point it is safe to return, or we lose our patience after three passes that each changed the text, at which point we return an empty string, admitting we cannot sanitize the input.
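
The retry strategy can be sketched without Loofah. In the sketch below, toy_scrub is a hypothetical stand-in that strips anything tag-shaped and decodes one level of entities; it is not Loofah, it does not prune script contents, and it is not a safe sanitizer. scrub_until_stable is likewise an illustrative name; only the loop shape mirrors this commit:

```ruby
require 'cgi'

# Hypothetical stand-in for Loofah's scrub + text(encode_special_chars: false).
# It only mimics the "decodes one level of entities per pass" behaviour.
def toy_scrub(text)
  CGI.unescapeHTML(text.gsub(/<[^>]*>/, '')).strip
end

# Scrub again and again until a pass leaves the text unchanged.
# If max_passes consecutive passes each change the text, give up and return ''.
def scrub_until_stable(text, max_passes: 3)
  current = toy_scrub(text)
  max_passes.times do
    again = toy_scrub(current)
    return again if again == current
    current = again
  end
  ''
end

# A URL with a bare & is already stable and comes back untouched.
plain = scrub_until_stable('http://example.com/?a=1&b=2')

# A payload encoded two levels deep settles within the allowed passes.
double = '&amp;lt;script&amp;gt;alert(1)&amp;lt;/script&amp;gt;http://feedbunch.com'
settled = scrub_until_stable(double)

# A payload encoded five levels deep keeps changing, so the result is ''.
deep = '&amp;amp;amp;amp;lt;b&amp;amp;amp;amp;gt;'
rejected = scrub_until_stable(deep)
```

Returning an empty string when the text never settles trades a possibly legitimate, heavily encoded input for the guarantee that nothing still being decoded is ever returned.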
amatriain committed Nov 25, 2016
1 parent 6e9d825 commit eefdc22
Showing 1 changed file with 31 additions and 2 deletions.
33 changes: 31 additions & 2 deletions lib/sanitizer.rb
@@ -1,3 +1,5 @@
+require 'loofah'
+
 ##
 # Class with methods related to sanitizing user input to remove potentially malicious input.

@@ -14,8 +16,35 @@ class Sanitizer
   def self.sanitize_plaintext(unsanitized_text)
     # Check that the passed string contains something
     return '' if unsanitized_text.blank?
-    sanitized_text = ActionController::Base.helpers.strip_tags(unsanitized_text)&.strip
-    return sanitized_text
+    sanitized_text = Loofah.scrub_fragment(unsanitized_text, :prune).text(encode_special_chars: false)&.strip
+
+    # Passing encode_special_chars: false to text(), which is necessary so that e.g. & characters in URL are not
+    # HTML-escaped during sanitization, unfortunately means that any encoded HTML entities in the unsanitized text
+    # become unencoded; e.g. if the user enters:
+    #
+    # &lt;script&gt;alert("pwnd")&lt;/script&gt;http://feedbunch.com
+    #
+    # then Loofah at this point returns:
+    #
+    # <script>alert("pwnd")</script>http://feedbunch.com
+    #
+    # which obviously is not safe. Also a malicious user could HTML-encode the & characters so that the malicious script
+    # would be dangerous after a second Loofah pass, and so on.
+    #
+    # To make sure that the string is safe no matter how many levels of HTML-encoding an attacker introduces, I'm
+    # using the fact that a safe string is the same before and after passing through Loofah. If after three passes the
+    # string keeps changing every time it is scrubbed with Loofah, stop playing cat and mouse and just return an
+    # empty string.
+    re_sanitized_text = Loofah.scrub_fragment(sanitized_text, :prune).text(encode_special_chars: false)&.strip
+    passes = 0
+    while sanitized_text!=re_sanitized_text do
+      passes += 1
+      return '' if passes >= 3
+      sanitized_text = re_sanitized_text
+      re_sanitized_text = Loofah.scrub_fragment(sanitized_text, :prune).text(encode_special_chars: false)&.strip
+    end
+
+    return re_sanitized_text
   end

   ##
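
The code comment's point about an attacker HTML-encoding the & characters again can be made concrete with the standard library. encode_levels and passes_to_stabilize are hypothetical helpers written for this illustration, and CGI.unescapeHTML stands in for the decoding that Loofah performs when encode_special_chars is false:

```ruby
require 'cgi'

# Wrap a payload in the given number of HTML-escaping layers (hypothetical helper).
def encode_levels(payload, levels)
  levels.times.reduce(payload) { |text, _| CGI.escapeHTML(text) }
end

# Count how many decoding passes change the text before it stops changing.
def passes_to_stabilize(text)
  passes = 0
  loop do
    decoded = CGI.unescapeHTML(text)
    return passes if decoded == text
    text = decoded
    passes += 1
  end
end

twice = encode_levels('<b>', 2)
puts twice
# => &amp;lt;b&amp;gt;

# Each extra layer of encoding costs the defender one more pass.
puts passes_to_stabilize(encode_levels('<script>', 3))
# => 3
```

Because every added layer is peeled off by exactly one pass, a fixed number of passes can always be outlasted by a deeper encoding, which is why the method compares consecutive results and gives up rather than assuming a fixed depth is enough.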