Bleach'ing input breaks code blocks #392
Comments
Never mind, I just need to do it the other way around and escape/bleach after.
@divad glad you figured it out. Don't be too embarrassed; you're not the first one I've seen make that mistake. What you need to remember is that even without support for raw HTML, one could inject unsafe code into the HTML using Markdown markup alone. Therefore, you want to run a sanitizer on the output of Markdown to ensure everything is safe, not just the raw HTML passed in. This also has the added benefit of not disabling raw HTML, so all users get the full feature set of Markdown.
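For anyone reading later, the "convert first, sanitize after" order described above could look roughly like the sketch below. It assumes the `markdown` and `bleach` packages are installed. The allow-lists and the `render_untrusted` helper are hypothetical names made up for illustration, not something Python-Markdown ships or recommends.

```python
import bleach
import markdown

# Hypothetical allow-lists for illustration only; extend them to cover
# the Markdown features and extensions you actually enable.
ALLOWED_TAGS = {
    "p", "pre", "code", "em", "strong", "a",
    "ul", "ol", "li", "blockquote", "h1", "h2", "h3",
}
ALLOWED_ATTRIBUTES = {"a": ["href", "title"]}


def render_untrusted(text):
    """Convert Markdown to HTML first, then sanitize the generated HTML."""
    html = markdown.markdown(text)
    # Tags outside the allow-list are escaped by default (strip=False).
    return bleach.clean(html, tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRIBUTES)


# Untrusted input with an inline code span and a raw <script> block.
safe_html = render_untrusted("Inline code: `<div>`\n\n<script>alert(1)</script>")
print(safe_html)
```

Because the sanitizer only ever sees Markdown's output, the code span is escaped exactly once, and the raw script tag is neutralised rather than passed through.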
Do you recommend a 'Markdown' sanitizer for use with Python-Markdown, or just bleach?
I'm not aware of any Markdown sanitizers.
Thanks. I've used https://github.com/yourcelf/bleach-whitelist/blob/master/bleach_whitelist/bleach_whitelist.py to get me started in creating a list of tags and attributes needed for Python-Markdown. It would be good if you could add a link to https://github.com/yourcelf/bleach-whitelist (it's not mine, of course) in your documentation, perhaps under a 'Security' heading explaining why Bleach is needed. Many thanks again.
That's very cool. I hadn't seen that before. Thanks for the tip.
If you use Bleach, or any other HTML-safety mechanism (like Flask's escape function, http://flask.pocoo.org/docs/0.10/api/#flask.escape), on an input string before passing it into Markdown (as suggested in the docs: https://pythonhosted.org/Markdown/release-2.6.html#safe_mode-deprecated), then code blocks don't render correctly.
This is because Bleach replaces characters such as < with &lt; but then Python-Markdown tries to escape characters again and so replaces the ampersand with &amp; so the eventual code block ends up broken and does not render correctly.
This is a bug since it breaks the 'code' function of Markdown when combined with the documented approach to using Markdown safely (i.e. with Bleach). There should be a way to tell Markdown not to try to escape HTML special characters within code blocks (i.e. by saying "I've already done that, thanks!").
To re-iterate: the input <div> is turned by Bleach into &lt;div&gt; (which would work fine inside an HTML <code> block), but Python-Markdown then outputs <code>&amp;lt;div&amp;gt;</code>, which in a web browser renders as &lt;div&gt; rather than the expected <div>.
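The double escaping can be reproduced with the standard library alone. In this minimal sketch `html.escape` is only a stand-in, assumed to behave like the escaping that Bleach and then Python-Markdown apply to these characters. It does not call either library.

```python
import html

source = "<div>"

# First pass: stand-in for sanitizing the input before Markdown sees it.
escaped_once = html.escape(source)

# Second pass: stand-in for the escaping applied again inside a code block.
escaped_twice = html.escape(escaped_once)

# A browser decodes entities once when it displays the text.
displayed = html.unescape(escaped_twice)

print(escaped_once)   # &lt;div&gt;
print(escaped_twice)  # &amp;lt;div&amp;gt;
print(displayed)      # &lt;div&gt;
```

With a single escaping pass the browser would decode back to the original <div>, which is why the sanitizer has to run on Markdown's output rather than on its input.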