Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extension Fenced Code Blocks breaks HTML formatting on quotes in attributes #1247

Closed
magicOz opened this issue May 4, 2022 · 5 comments · Fixed by #1248
Closed

Extension Fenced Code Blocks breaks HTML formatting on quotes in attributes #1247

magicOz opened this issue May 4, 2022 · 5 comments · Fixed by #1248
Labels
bug Bug report. confirmed Confirmed bug report or approved feature request. extension Related to one or more of the included extensions.

Comments

@magicOz
Copy link

magicOz commented May 4, 2022

The extension fenced code blocks (https://python-markdown.github.io/extensions/fenced_code_blocks/#attributes) breaks the HTML formatting when a language, id or class contains a quotation-mark (").

https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/fenced_code.py#L122-L127

The following snippet

``` { .">outside}

```

will render as

<pre><code class="language-">outside">
</code></pre>

Even though the users of Python-Markdown are responsible for sanitizing / escaping the end-result, this might lead to some unintended behaviour (as seen in netbox-community/netbox#9292).

@waylan
Copy link
Member

waylan commented May 4, 2022

We need to consider what other implementations do in this scenario. Although, after a quick look, the couple I checked (GitHub and MultiMarkdown) seem to have weird output which doesn't really make any sense (the language is {). My best guess is that the quote is a disallowed character and so it causes the language to not be parsed correctly. The exact output doesn't really matter here, so long as the quote character isn't part of an HTML attribute. That would certainly be the easiest solution, if we were to do anything at all.

I will also note that while our serializer properly escapes HTML attributes, we bypass that with fenced code blocks as we build the HTML outside of the ElementTree and store it as raw HTML which gets swapped with a placeholder after serialization. If we wanted to make use of ElementTrree here, we would need to modify/subclass the upstream library to include support for a "raw" type to hold the content of the code block. That would be a much larger refactor though, and out-of-scope for an immediate fix here. That said, I have toyed with the idea from time to time.

@waylan waylan added extension Related to one or more of the included extensions. more-info-needed More information needs to be provided. needs-decision A decision needs to be made regarding request. labels May 4, 2022
@waylan
Copy link
Member

waylan commented May 4, 2022

I suppose we could escape each of the attribute values here:

if lang:
lang_attr = ' class="{}{}"'.format(self.config.get('lang_prefix', 'language-'), lang)
if classes:
class_attr = ' class="{}"'.format(' '.join(classes))
if id:
id_attr = ' id="{}"'.format(id)
if self.use_attr_list and config and not config.get('use_pygments', False):
# Only assign key/value pairs to code element if attr_list ext is enabled, key/value pairs
# were defined on the code block, and the `use_pygments` key was not set to True. The
# `use_pygments` key could be either set to False or not defined. It is omitted from output.
kv_pairs = ' ' + ' '.join(
'{k}="{v}"'.format(k=k, v=v) for k, v in config.items() if k != 'use_pygments'
)
code = '<pre{id}{cls}><code{lang}{kv}>{code}</code></pre>'.format(

That would match the behavior of anything which goes through ElementTree without changing the parsing of the fenced code blocks. Presumably, Pygments is already doing similar escaping (should probably confirm that).

@magicOz
Copy link
Author

magicOz commented May 4, 2022

I suppose we could escape each of the attribute values here:

Yes, that seems to be the most practical solution imo.

@waylan
Copy link
Member

waylan commented May 4, 2022

I took another look at how other implementations handle this with a simpler input (the curly brackets are not supported by all implementations and were resulting in confusing results).

``` "foo
code
```

Checking Babelmark it appears that no implementations address this at all (we can ignore the implementations which don't support fenced code blocks). Of particular interest is CommonMark which provides a detailed spec. I note that the spec itself makes no mention of any disallowed characters (except for the backtick) within the language name. I also find it interesting that many implementations seem to choke and error on this (although that could be specific to Babelmark, not the implementation itself). I checked PHP Markdown Extra directly (here) and it simply fails to recognize it as a code block. In other words, the quote character is presumably disallowed.

On the one hand, I am inclined to follow PHP Markdown Extra's lead as the two implementations were originally developed simultaneously as the very first ever functioning fence code code parsers. However, that only addresses this one very specific issue. By escaping the attributes, we also potentially address as-yet undiscovered issues.

@waylan
Copy link
Member

waylan commented May 4, 2022

Sorry, in my last comment I just realized I had a typo in the Babelmark input. Lets try again. It seems that CommonMark and Pandoc are escaping the attribute and MultiMarkdown does nothing (passes it through unaltered resulting in a XSS issue). Also of interest is that Python-Markdown actually does match PHP Markdown Extra and fails to recognize it as a code block. Python-Markdown needs the curly braces to ensure it is recognized as a code block in this case. I'm okay with that. So we just need to focus on what to do in the curly bracket case. PHP Markdown Extra also fails to recognize that case as a fenced code block. However, the PHP implementation doesn't support the advanced attribute list features we do (which includes wrapping attribute values in quotes), so I think we can ignore that here.

Given all of the above, I think that the escaping all attributes solution is the right way forward.

@waylan waylan added bug Bug report. confirmed Confirmed bug report or approved feature request. and removed more-info-needed More information needs to be provided. needs-decision A decision needs to be made regarding request. labels May 4, 2022
waylan added a commit to waylan/markdown that referenced this issue May 4, 2022
waylan added a commit that referenced this issue May 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug report. confirmed Confirmed bug report or approved feature request. extension Related to one or more of the included extensions.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants