Featured Code Snippet #2
I implemented RedCloth in this iteration to give me Textile markup for blog entries & comments. RedCloth has the ability to escape HTML tags, which is extremely nice. However, by “escape” it means “remove”: any tag not in the whitelist is completely removed from the page. This is fairly annoying; by “escape”, I would expect the tag to be displayed on the page as I typed it in.
To test this I removed blockquote from the list of allowed tags. Much to my surprise, all blockquotes were removed, even those specified using the Textile code bq.! WTF??? Okay, so it turns out that when RedCloth “escapes” the HTML, it doesn’t do so until after the Textile transform has taken place. Therefore, the whitelist applies both to tags typed in by the user and to tags generated from Textile codes. This makes no sense.
So given these two issues, I had some changes to make. After familiarizing myself with the code, I tried my hardest to think of a fix obtained through monkey-patching or subclassing. No such luck. The methods just weren’t written in an overriding-friendly way. Rather than just modify the gem, which would have been very, very bad of me (punishment by torture), I unpacked the gem to /vendor and made the changes there. This way, the changes aren’t systemwide, and anyone will have them when downloading the project (punishment by slap on the wrist). I hope the hardcore Rubyists out there can forgive me for this decision, and if you have a better way to implement these changes, please let me know.
Without further ado, here is the before/after code of RedCloth.rb showing my changes.
Change #1:
The to_html method calls clean_html just before returning, if necessary. This was the original code:
        text.gsub!( /<\/?notextile>/, '' )
        text.gsub!( /&/, '&' )
        text.strip!
        clean_html text if filter_html
        text
    end
Unfortunately, by this point all the Textile replacements have already been made, so the HTML that Textile generated gets cleaned too. This is completely unnecessary; we want clean_html to be called before the Textile work is performed, like so:
        incoming_entities text
        clean_white_space text
        clean_html text if filter_html
        @pre_list = []
        rip_offtags text
        no_textile text
        hard_break text
        unless @lite_mode
            refs text
            blocks text
        end
        inline text
        smooth_offtags text
        retrieve text
        text.gsub!( /<\/?notextile>/, '' )
        ....
Got it?
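To see why the order matters, here is a toy sketch. It does not use RedCloth at all: toy_textile and toy_clean_html are made-up stand-ins (a one-rule "Textile" transform and a strip-everything-not-whitelisted filter), so treat it as an illustration of the ordering issue rather than of RedCloth's real internals.

```ruby
# Toy stand-ins, NOT RedCloth's real code: a one-rule "Textile" transform
# and a whitelist filter that strips any tag not in the allowed list.
ALLOWED_TAGS = %w[p em strong].freeze # blockquote deliberately left out

# Hypothetical mini-transform: turns a "bq. text" line into a blockquote.
def toy_textile(text)
  text.gsub(/^bq\. (.*)$/) { "<blockquote>#{$1}</blockquote>" }
end

# Hypothetical filter: removes every tag whose name is not whitelisted.
def toy_clean_html(text, allowed = ALLOWED_TAGS)
  text.gsub(/<\/?(\w+)[^>]*>/) { |tag| allowed.include?($1.downcase) ? tag : "" }
end

input = "bq. A quoted line"

# Original order: transform first, then clean. The generated tag is stripped.
filter_last = toy_clean_html(toy_textile(input))

# Patched order: clean the user's input first, then transform.
filter_first = toy_textile(toy_clean_html(input))

puts filter_last   # A quoted line
puts filter_first  # <blockquote>A quoted line</blockquote>
```

With the filter running first, a blockquote the user types by hand is still stripped, but the one produced from bq. survives, which is the behaviour I was after.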
Change #2:
The second fix allows me to type a disallowed HTML tag in the editor and have it displayed to the user in its full taggy goodness. This change was made to the clean_html method, which uses a regex to go through each tag: if the tag is in the whitelist, it is left alone; otherwise, it is replaced with a single space. Here is the original snippet from clean_html:
                end
            end if tags[tag]
            "<#{raw[1]}#{pcs.join " "}>"
        else
            " "
        end
And here is my change, which allows the tag to be displayed on the page but still not rendered as HTML:
                end
            end if tags[tag]
            "<#{raw[1]}#{pcs.join " "}>"
        else
            "&lt;#{raw[1]}#{raw[2]}&gt;"
        end
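The same idea, showing a disallowed tag as text instead of dropping it, can be tried in isolation with a small sketch. The method name escape_disallowed_tags and its allowed argument are my own inventions for illustration; only the regex shape is borrowed from the approach described above, and the attribute handling that clean_html does for allowed tags is left out.

```ruby
# A simplified stand-in for the tag-handling part of clean_html; it is not
# RedCloth's actual method. Attribute filtering on allowed tags is omitted,
# so every tag comes out without its attributes.
def escape_disallowed_tags(text, allowed)
  text.gsub(/<(\/*)(\w+)([^>]*)>/) do
    slash, name = $1, $2
    if allowed.include?(name.downcase)
      "<#{slash}#{name.downcase}>"   # allowed: keep as real markup
    else
      "&lt;#{slash}#{name}&gt;"      # disallowed: show it as literal text
    end
  end
end

html = escape_disallowed_tags("<em>hi</em> <script>alert(1)</script>", %w[em strong])
puts html  # <em>hi</em> &lt;script&gt;alert(1)&lt;/script&gt;
```

As in my change above, the attributes of a disallowed tag are dropped and only the tag name is echoed back.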
Not a huge deal, but now RedCloth and HTML escaping work like I would expect them to! I’m open to suggestions for better solutions to these problems. But for now, that’s how it will stay.
