Extension Fenced Code Blocks breaks HTML formatting on quotes in attributes #1247
We need to consider what other implementations do in this scenario. After a quick look, though, the couple I checked (GitHub and MultiMarkdown) seem to produce odd output that doesn't really make sense (the language is …). I will also note that while our serializer properly escapes HTML attributes, fenced code blocks bypass it: we build their HTML outside of the ElementTree and store it as raw HTML, which is swapped in for a placeholder after serialization. If we wanted to make use of ElementTree here, we would need to modify or subclass the upstream library to support a "raw" type to hold the content of the code block. That would be a much larger refactor, though, and out of scope for an immediate fix. That said, I have toyed with the idea from time to time.
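To illustrate the difference between the two paths, here is a minimal sketch using the standard library's xml.etree.ElementTree serializer as a stand-in for Python-Markdown's own serializer (an assumption: the real one is a separate module, but it escapes attribute values the same way for this case). The string template is a simplified, hypothetical version of building the code block HTML by hand, not the extension's actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical hostile language name containing a quotation mark.
lang = 'python" onclick="alert(1)'

# Path 1: build the tag by string formatting (simplified). The quote is
# copied verbatim, closes the class attribute early and injects onclick.
raw = '<pre><code class="language-{}">print()</code></pre>'.format(lang)

# Path 2: build the same element as ElementTree nodes and let the
# serializer escape the attribute value on output.
pre = ET.Element("pre")
code = ET.SubElement(pre, "code", {"class": "language-" + lang})
code.text = "print()"
serialized = ET.tostring(pre, encoding="unicode")

print(raw)
print(serialized)
```

In the second output, the quotes inside the class value come out as `&quot;`, so the attribute stays intact.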
I suppose we could escape each of the attribute values where the opening tags are built, in markdown/extensions/fenced_code.py (lines 122 to 135 at commit e11cd25).
That would match the behavior of anything that goes through ElementTree, without changing how fenced code blocks are parsed. Presumably Pygments already does similar escaping (we should confirm that).
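A minimal sketch of that proposal, assuming the standard library's html.escape is an acceptable stand-in for whatever escaping helper the extension would actually use. The function build_code_block and its parameters are hypothetical: they only mirror the pieces mentioned in this thread (a prefixed language class, an id, extra classes, key/value pairs), and which attributes land on pre versus code is an assumption for illustration.

```python
from html import escape


def build_code_block(code, lang="", id_="", classes=(), kv_pairs=None, prefix="language-"):
    """Hypothetical helper: build fenced-code HTML with every attribute value escaped.

    The placement of attributes on <pre> versus <code> is assumed for illustration.
    """
    def attr(name, value):
        # quote=True turns '"' into '&quot;' so a value can never close the
        # attribute early; '&', '<', '>' and "'" are escaped as well.
        # Attribute *names* are assumed to already be safe identifiers.
        return ' {}="{}"'.format(name, escape(value, quote=True))

    id_attr = attr("id", id_) if id_ else ""
    class_attr = attr("class", " ".join(classes)) if classes else ""
    lang_attr = attr("class", prefix + lang) if lang else ""
    kv_attrs = "".join(attr(k, v) for k, v in (kv_pairs or {}).items())
    # The code body only needs text escaping, not quote escaping.
    body = escape(code, quote=False)
    return "<pre{}{}><code{}{}>{}</code></pre>".format(
        id_attr, class_attr, lang_attr, kv_attrs, body
    )


html_out = build_code_block("if x < 1: pass", lang='python" onclick="alert(1)')
print(html_out)
# <pre><code class="language-python&quot; onclick=&quot;alert(1)">if x &lt; 1: pass</code></pre>
```

Escaping at the point where each value is interpolated leaves the fenced code parsing untouched, which is the property the comment above is after.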
Yes, that seems to be the most practical solution, imo.
I took another look at how other implementations handle this, using a simpler input (the curly brackets are not supported by all implementations and were producing confusing results).
On the one hand, I am inclined to follow PHP Markdown Extra's lead, as the two implementations were originally developed simultaneously as the very first functioning fenced code block parsers. However, that only addresses this one very specific issue. By escaping the attributes, we also potentially address as-yet undiscovered issues.
Sorry, I just realized I had a typo in the Babelmark input in my last comment. Let's try again. It seems that CommonMark and Pandoc escape the attribute, while MultiMarkdown does nothing (it passes the value through unaltered, resulting in an XSS issue). Also of interest is that Python-Markdown actually matches PHP Markdown Extra and fails to recognize the input as a code block. Python-Markdown needs the curly braces to recognize it as a code block in this case, and I'm okay with that. So we just need to focus on what to do in the curly bracket case. PHP Markdown Extra also fails to recognize that case as a fenced code block. However, the PHP implementation doesn't support the advanced attribute list features we do (which include wrapping attribute values in quotes), so I think we can ignore it here. Given all of the above, I think escaping all attributes is the right way forward.
The Fenced Code Blocks extension (https://python-markdown.github.io/extensions/fenced_code_blocks/#attributes) breaks the HTML output when a language, id, or class contains a quotation mark (").
https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/fenced_code.py#L122-L127
The following snippet
will render as
Even though users of Python-Markdown are responsible for sanitizing/escaping the final output, this can lead to unintended behaviour (as seen in netbox-community/netbox#9292).
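To see why a single quotation mark is enough to break the markup, the sketch below feeds two versions of the same opening tag to the standard library's html.parser: one built by plain string formatting, one with the value passed through html.escape first. The hostile language name and the helper names (AttrCollector, code_attrs) are made up for illustration, and the template only approximates how the extension formats the class attribute.

```python
from html import escape
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record the attributes of every <code> start tag the parser sees."""

    def __init__(self):
        super().__init__()
        self.code_attrs = []

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self.code_attrs.append(dict(attrs))


def code_attrs(markup):
    """Return the attributes of the first <code> tag, as a parser reads them."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.code_attrs[0]


lang = 'python" onclick="alert(1)'  # hypothetical hostile language name
template = '<pre><code class="language-{}"></code></pre>'
naive = template.format(lang)
escaped = template.format(escape(lang, quote=True))

# The naive tag is read as two attributes: a truncated class plus an injected onclick.
print(code_attrs(naive))
# The escaped tag is read as one class attribute holding the original string.
print(code_attrs(escaped))
```

Because the parser unescapes `&quot;` when reading the attribute, escaping does not lose the original value; it only stops the value from being interpreted as markup.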