Extension Fenced Code Blocks breaks HTML formatting on quotes in attributes #1247

magicOz · 2022-05-04T10:59:50Z

The Fenced Code Blocks extension (https://python-markdown.github.io/extensions/fenced_code_blocks/#attributes) breaks the HTML formatting when a language, id, or class contains a quotation mark (").

https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/fenced_code.py#L122-L127

The following snippet

``` { .">outside}

```

will render as

<pre><code class="language-">outside">
</code></pre>

Even though the users of Python-Markdown are responsible for sanitizing / escaping the end-result, this might lead to some unintended behaviour (as seen in netbox-community/netbox#9292).
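
A minimal sketch of why this happens, assuming the attribute is assembled by plain string formatting as in the linked lines. `build_code_block` is a hypothetical stand-in for illustration, not part of Python-Markdown:

```python
# Hypothetical, simplified stand-in for the string formatting in fenced_code.py.
def build_code_block(lang, code, lang_prefix="language-"):
    # The language is interpolated into the attribute without any escaping.
    lang_attr = ' class="{}{}"'.format(lang_prefix, lang) if lang else ""
    return "<pre><code{lang}>{code}</code></pre>".format(lang=lang_attr, code=code)


# The class `.">outside` from the snippet above ends up as the language.
broken = build_code_block('">outside', "\n")
print(broken)
```

The quote closes the `class` attribute early, so everything after it is interpreted as markup rather than as part of the attribute value.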


waylan · 2022-05-04T13:35:55Z

We need to consider what other implementations do in this scenario. Although, after a quick look, the couple I checked (GitHub and MultiMarkdown) seem to have weird output which doesn't really make any sense (the language is {). My best guess is that the quote is a disallowed character and so it causes the language to not be parsed correctly. The exact output doesn't really matter here, so long as the quote character isn't part of an HTML attribute. That would certainly be the easiest solution, if we were to do anything at all.

I will also note that while our serializer properly escapes HTML attributes, we bypass that with fenced code blocks, as we build the HTML outside of the ElementTree and store it as raw HTML which gets swapped with a placeholder after serialization. If we wanted to make use of ElementTree here, we would need to modify/subclass the upstream library to include support for a "raw" type to hold the content of the code block. That would be a much larger refactor though, and out of scope for an immediate fix here. That said, I have toyed with the idea from time to time.
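
To illustrate the point about serialization, here is a small sketch using the standard library's `xml.etree.ElementTree` as a stand-in. It is an assumption that this is representative: Python-Markdown ships its own serializer, which is not used here.

```python
import xml.etree.ElementTree as ET

# Build the element through the tree instead of by string formatting.
code = ET.Element("code")
code.set("class", 'language-">outside')
code.text = "example"

# Attribute values are escaped when the tree is serialized.
serialized = ET.tostring(code, encoding="unicode", method="html")
print(serialized)
```

Because the quote is escaped during serialization, it cannot terminate the attribute. Fenced code blocks miss out on this because their HTML is built as a raw string.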

waylan · 2022-05-04T13:58:26Z

I suppose we could escape each of the attribute values here:

markdown/markdown/extensions/fenced_code.py

Lines 122 to 135 in e11cd25

    
    if lang:
        lang_attr = ' class="{}{}"'.format(self.config.get('lang_prefix', 'language-'), lang)
    if classes:
        class_attr = ' class="{}"'.format(' '.join(classes))
    if id:
        id_attr = ' id="{}"'.format(id)
    if self.use_attr_list and config and not config.get('use_pygments', False):
        # Only assign key/value pairs to code element if attr_list ext is enabled, key/value pairs
        # were defined on the code block, and the `use_pygments` key was not set to True. The
        # `use_pygments` key could be either set to False or not defined. It is omitted from output.
        kv_pairs = ' ' + ' '.join(
            '{k}="{v}"'.format(k=k, v=v) for k, v in config.items() if k != 'use_pygments'
        )
    code = '<pre{id}{cls}><code{lang}{kv}>{code}</code></pre>'.format(

That would match the behavior of anything which goes through ElementTree without changing the parsing of the fenced code blocks. Presumably, Pygments is already doing similar escaping (should probably confirm that).
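
A rough sketch of what escaping each attribute value could look like. It uses `html.escape` from the standard library as an assumed stand-in for whatever escaping helper the project would actually use, and `attr` and `render_fenced_block` are hypothetical names, not Python-Markdown APIs:

```python
from html import escape


def attr(name, value):
    # Escape &, <, > and quote characters before interpolating the value.
    # Attribute names are NOT escaped in this sketch.
    return ' {}="{}"'.format(name, escape(str(value), quote=True))


def render_fenced_block(code, lang="", classes=(), id_="", kv=None,
                        lang_prefix="language-"):
    # Hypothetical helper mirroring the structure of the quoted lines.
    id_attr = attr("id", id_) if id_ else ""
    class_attr = attr("class", " ".join(classes)) if classes else ""
    lang_attr = attr("class", lang_prefix + lang) if lang else ""
    kv_pairs = "".join(
        attr(k, v) for k, v in (kv or {}).items() if k != "use_pygments"
    )
    return "<pre{}{}><code{}{}>{}</code></pre>".format(
        id_attr, class_attr, lang_attr, kv_pairs, escape(code, quote=False)
    )


fixed = render_fenced_block("\n", lang='">outside')
print(fixed)
```

With this approach the parsing of the fence is unchanged. Only the final interpolation differs, so the quote stays inside the `class` value as `&quot;`.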

magicOz · 2022-05-04T14:25:38Z

> I suppose we could escape each of the attribute values here:

Yes, that seems to be the most practical solution imo.

waylan · 2022-05-04T14:49:08Z

I took another look at how other implementations handle this with a simpler input (the curly brackets are not supported by all implementations and were resulting in confusing results).

``` "foo
code
```

Checking Babelmark it appears that no implementations address this at all (we can ignore the implementations which don't support fenced code blocks). Of particular interest is CommonMark which provides a detailed spec. I note that the spec itself makes no mention of any disallowed characters (except for the backtick) within the language name. I also find it interesting that many implementations seem to choke and error on this (although that could be specific to Babelmark, not the implementation itself). I checked PHP Markdown Extra directly (here) and it simply fails to recognize it as a code block. In other words, the quote character is presumably disallowed.

On the one hand, I am inclined to follow PHP Markdown Extra's lead, as the two implementations were originally developed simultaneously as the very first functioning fenced code block parsers. However, that only addresses this one very specific issue. By escaping the attributes, we also potentially address as-yet undiscovered issues.

waylan · 2022-05-04T15:06:29Z

Sorry, I just realized I had a typo in the Babelmark input in my last comment. Let's try again. It seems that CommonMark and Pandoc are escaping the attribute and MultiMarkdown does nothing (it passes the value through unaltered, resulting in an XSS issue). Also of interest is that Python-Markdown actually does match PHP Markdown Extra and fails to recognize it as a code block. Python-Markdown needs the curly braces to ensure it is recognized as a code block in this case. I'm okay with that. So we just need to focus on what to do in the curly bracket case. PHP Markdown Extra also fails to recognize that case as a fenced code block. However, the PHP implementation doesn't support the advanced attribute list features we do (which include wrapping attribute values in quotes), so I think we can ignore that here.

Given all of the above, I think that escaping all attributes is the right way forward.


magicOz mentioned this issue May 4, 2022

XSS in markdown filter netbox-community/netbox#9292

Closed

waylan added extension Related to one or more of the included extensions. more-info-needed More information needs to be provided. needs-decision A decision needs to be made regarding request. labels May 4, 2022

waylan added bug Bug report. confirmed Confirmed bug report or approved feature request. and removed more-info-needed More information needs to be provided. needs-decision A decision needs to be made regarding request. labels May 4, 2022

waylan added a commit to waylan/markdown that referenced this issue May 4, 2022

Ensure fenced code attributes are properly escaped.

3ce0ae7

Fixes Python-Markdown#1247.

waylan mentioned this issue May 4, 2022

Ensure fenced code attributes are properly escaped. #1248

Merged

waylan closed this as completed in #1248 May 4, 2022

waylan added a commit that referenced this issue May 4, 2022

Ensure fenced code attributes are properly escaped.

ce73b27

Fixes #1247.
