Bleach'ing input breaks code blocks #392

Closed
divad opened this issue Mar 5, 2015 · 6 comments

divad commented Mar 5, 2015

If you use Bleach, or any other HTML-safety mechanism (like Flask's escape function, http://flask.pocoo.org/docs/0.10/api/#flask.escape), on an input string before passing it into Markdown (as suggested in the docs: https://pythonhosted.org/Markdown/release-2.6.html#safe_mode-deprecated), then code blocks don't render correctly.

This is because Bleach replaces characters such as < with &lt;, and then Python-Markdown escapes the text again inside the code block, replacing the ampersand with &amp;. The code block ends up double-escaped and does not render correctly.

This is a bug, since it breaks Markdown's code-block feature when combined with the documented approach to using Markdown safely (i.e. with Bleach). There should be a way to tell Markdown not to escape HTML special characters within code blocks (i.e. by saying "I've already done that, thanks!").

To reiterate:

  1. The input Markdown text contains a code block with something like <div>
  2. Bleach, or any other escaping mechanism, converts this to &lt;div&gt; (which would display fine inside an HTML <code> element)
  3. Python-Markdown converts this to <code>&amp;lt;div&amp;gt;</code>, which a web browser renders as &lt;div&gt; rather than the expected <div>
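
The double escaping in the steps above can be reproduced with the standard library alone. This is a minimal sketch: Python-Markdown itself is not called here, and its escaping of special characters inside code blocks is assumed to behave like a second `html.escape` call.

```python
import html

raw = "<div>"

# Step 2: the input is escaped before Markdown ever sees it.
pre_escaped = html.escape(raw)  # -> "&lt;div&gt;"

# Step 3: code-block rendering escapes special characters again, so "&" becomes "&amp;".
# (Simulated with a second html.escape call; assumed to match Markdown's behaviour.)
double_escaped = html.escape(pre_escaped)  # -> "&amp;lt;div&amp;gt;"
code_html = "<code>" + double_escaped + "</code>"

# A browser decodes entities exactly once, so the reader sees "&lt;div&gt;".
displayed = html.unescape(double_escaped)

# Escaping only once (letting the renderer handle the raw text) shows "<div>" as intended.
displayed_once = html.unescape(html.escape(raw))

print(code_html)       # -> <code>&amp;lt;div&amp;gt;</code>
print(displayed)       # -> &lt;div&gt;
print(displayed_once)  # -> <div>
```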

divad (Author) commented Mar 5, 2015

Never mind, I just need to do it the other way around: run Markdown first and escape/bleach its output afterwards.

divad closed this as completed Mar 5, 2015

waylan (Member) commented Mar 6, 2015

@divad glad you figured it out. Don't be too embarrassed; you're not the first one I've seen make that mistake. What you need to remember is that even without support for raw HTML, one could inject unsafe code into the HTML using Markdown markup alone. Therefore, you want to run a sanitizer on the output of Markdown to ensure everything is safe, not just on the raw HTML passed in. This also has the added benefit of not disabling raw HTML: all users get the full feature set of Markdown.
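
To illustrate why the sanitizer belongs on Markdown's output, here is a minimal, standard-library-only sketch of an allowlist sanitizer. Everything in it (`ToySanitizer`, `sanitize`, `ALLOWED_TAGS`, `ALLOWED_ATTRS`, `ALLOWED_SCHEMES`) is hypothetical and stands in for Bleach; it is not hardened and is not how Bleach works internally. Python-Markdown is not called: the input string is assumed to be roughly what it emits for a link written in plain Markdown markup, `[click me](javascript:alert(1))`, with no raw HTML involved.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical allowlist for this sketch only (not Bleach's defaults).
ALLOWED_TAGS = {"p", "a", "em", "strong", "code", "pre"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
ALLOWED_SCHEMES = {"", "http", "https", "mailto"}  # "" permits relative URLs


class ToySanitizer(HTMLParser):
    """Re-serializes HTML, keeping only allowlisted tags and attributes.

    Illustrative only: not hardened against real-world attacks; use Bleach instead.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag; any text inside it is still emitted (escaped)
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href":
                try:
                    scheme = urlsplit(value.strip()).scheme.lower()
                except ValueError:
                    continue  # unparsable URL: drop the attribute
                if scheme not in ALLOWED_SCHEMES:
                    continue  # e.g. javascript: links
            kept.append(f' {name}="{escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives with entities decoded, so re-escape it exactly once.
        self.out.append(escape(data, quote=False))


def sanitize(fragment: str) -> str:
    """Run the toy sanitizer over an HTML fragment and return the cleaned HTML."""
    parser = ToySanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


# Assumed to be roughly what Python-Markdown emits for: [click me](javascript:alert(1))
markdown_output = '<p><a href="javascript:alert(1)">click me</a></p>'
print(sanitize(markdown_output))  # prints <p><a>click me</a></p>
```

Because the check runs on the final HTML, it catches the dangerous `href` regardless of whether it came from raw HTML or from ordinary Markdown link syntax.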

divad (Author) commented Mar 6, 2015

Do you recommend a 'Markdown' sanitizer for use with Python-Markdown, or just bleach?

waylan (Member) commented Mar 6, 2015

I'm not aware of any Markdown sanitizers.

divad (Author) commented Mar 6, 2015

Thanks. I've used https://github.com/yourcelf/bleach-whitelist/blob/master/bleach_whitelist/bleach_whitelist.py to get started on a list of the tags and attributes needed for Python-Markdown. It would be good if you could add a link to https://github.com/yourcelf/bleach-whitelist (it's not mine, of course) to your documentation, perhaps under a 'Security' heading explaining why Bleach is needed. Many thanks again.
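
A sketch of what such a tag/attribute list might look like when passed to Bleach, with Markdown run first and Bleach applied to its output. The allowlist below is hand-written for illustration and is NOT copied from bleach-whitelist; it is assumed to cover the HTML that core Python-Markdown emits with no extensions enabled, and `MARKDOWN_TAGS`, `MARKDOWN_ATTRS`, `MARKDOWN_PROTOCOLS` and `render_safely` are hypothetical names. It requires the third-party `markdown` and `bleach` packages.

```python
# Hypothetical allowlist, hand-written for this sketch (not taken from bleach-whitelist).
# Assumed to cover the HTML that core Python-Markdown emits with no extensions enabled.
MARKDOWN_TAGS = frozenset({
    "p", "br", "hr", "blockquote", "pre", "code",
    "h1", "h2", "h3", "h4", "h5", "h6",
    "ul", "ol", "li", "em", "strong", "a", "img",
})
MARKDOWN_ATTRS = {
    "a": ["href", "title"],
    "img": ["src", "alt", "title"],
}
MARKDOWN_PROTOCOLS = frozenset({"http", "https", "mailto"})


def render_safely(text: str) -> str:
    """Render Markdown first, then sanitize the resulting HTML with Bleach."""
    # Imported lazily so the allowlist above can be inspected without the packages.
    import markdown  # third-party: Python-Markdown
    import bleach    # third-party: Bleach

    html = markdown.markdown(text)
    return bleach.clean(
        html,
        tags=MARKDOWN_TAGS,
        attributes=MARKDOWN_ATTRS,
        protocols=MARKDOWN_PROTOCOLS,
        strip=True,  # remove disallowed tags instead of escaping them
    )
```

Passing the allowlist through `tags`, `attributes` and `protocols` keeps Bleach from stripping Markdown's own markup, and because Bleach only ever sees Markdown's output, the text inside code blocks is escaped once, by Markdown. Any extensions you enable may emit further tags or attributes that would need adding.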

waylan (Member) commented Mar 6, 2015

That's very cool. I hadn't seen that before. Thanks for the tip.
