Remove non-printable control characters when sanitizing text.
bhollis committed Dec 21, 2013
1 parent 7363179 commit 1911554
Showing 3 changed files with 12 additions and 4 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -4,6 +4,7 @@
* Removed extraneous newlines from around the output of fenced code blocks. #112
* Properly handle empty code blocks (``). #108
* No longer print a warning when headers have entities in them. #113
+ * Remove non-printable control characters when sanitizing text.

0.7.0
-----
4 changes: 2 additions & 2 deletions lib/maruku/html.rb
@@ -94,7 +94,7 @@ def process_markdown_inside_elements(doc)

# Select all text children of e
e.xpath("./text()").each do |original_text|
- s = CGI.escapeHTML(original_text.text)
+ s = MaRuKu::Out::HTML.escapeHTML(original_text.text)
unless s.strip.empty?
parsed = parse_blocks ? doc.parse_text_as_markdown(s) : doc.parse_span(s)

@@ -197,7 +197,7 @@ def process_markdown_inside_elements(doc)

# Select all text children of e
e.texts.each do |original_text|
- s = CGI.escapeHTML(original_text.value)
+ s = MaRuKu::Out::HTML.escapeHTML(original_text.value)
unless s.strip.empty?
# TODO extract common functionality
parsed = parse_blocks ? doc.parse_text_as_markdown(s) : doc.parse_span(s)
11 changes: 9 additions & 2 deletions lib/maruku/output/to_html.rb
@@ -4,6 +4,13 @@
# This module groups all functions related to HTML export.
module MaRuKu::Out::HTML

+ # Escape text for use in HTML (content or attributes) by running it through
+ # standard XML escaping (quotes and angle brackets and ampersands) then
+ # getting rid of any non-printable control characters besides whitespace.
+ def self.escapeHTML(text)
+ CGI.escapeHTML(text).gsub(/[^[:print:]\n\r\t]/, '')
+ end

# A simple class to represent an HTML element for output.
class HTMLElement
attr_accessor :name
@@ -88,7 +95,7 @@ def to_html_document(context={})

# Helper to create a text node
def xtext(text)
- CGI.escapeHTML(text)
+ MaRuKu::Out::HTML.escapeHTML(text)
end

# Helper to create an element
@@ -392,7 +399,7 @@ def html_element(name, content="", attributes={})

Array(HTML4Attributes[name]).each do |att|
if v = @attributes[att]
- attributes[CGI.escapeHTML(att.to_s)] = CGI.escapeHTML(v.to_s)
+ attributes[MaRuKu::Out::HTML.escapeHTML(att.to_s)] = MaRuKu::Out::HTML.escapeHTML(v.to_s)
end
end
content = yield if block_given?
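
To see what the new helper does in isolation, here is a minimal sketch that copies the one-line body of `MaRuKu::Out::HTML.escapeHTML` from this commit into a standalone module. The module name `SanitizeSketch` and the method name `escape_html` are hypothetical, chosen only so the snippet runs without loading Maruku; the sample strings are illustrative as well.

```ruby
require 'cgi'

# Hypothetical standalone copy of the helper added in this commit, so the
# behavior can be tried without loading Maruku itself.
module SanitizeSketch
  def self.escape_html(text)
    # Escape markup-significant characters first, then drop every character
    # that is neither printable nor a newline, carriage return, or tab.
    CGI.escapeHTML(text).gsub(/[^[:print:]\n\r\t]/, '')
  end
end

# NUL and BEL are removed; "<" and "&" are escaped.
puts SanitizeSketch.escape_html("a < b\u0000 & c\u0007").inspect
# => "a &lt; b &amp; c"

# Newlines, carriage returns, and tabs pass through unchanged.
puts SanitizeSketch.escape_html("line one\n\tline two\r\n").inspect
# => "line one\n\tline two\r\n"
```

Because the character class is negated, anything outside `[:print:]` other than the three whitelisted whitespace characters is removed. That includes the ESC byte that starts terminal color sequences, so `"\e[31mred"` comes out as `"[31mred"`.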
