codehilite extension double escapes HTML #725

Closed
kkinder opened this issue Oct 7, 2018 · 5 comments

@kkinder

kkinder commented Oct 7, 2018

Here's an example:

import markdown

example_content = """# Test 
  
    >>> print('hi') 
    hi 

The above is valid MarkDown."""

output = markdown.markdown(example_content, extensions=['markdown.extensions.codehilite'])
print(output)

As you can see, my example file includes a Markdown example with > characters. Unfortunately, with Codehilite enabled, it produces doubly-escaped content. Here's the output of that script:

<h1>Test</h1>
<div class="codehilite"><pre><span></span>&amp;gt;&amp;gt;&amp;gt; print(&#39;hi&#39;) 
hi
</pre></div>


<p>The above is valid MarkDown.</p>

See those &amp;gt; parts? That means the HTML output includes the > characters escaped twice. It should be just &gt;&gt;&gt;, not &amp;gt;&amp;gt;&amp;gt;.
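
For illustration, the doubling can be reproduced with the standard library alone. This is a minimal sketch that uses `html.escape` as a stand-in for two escaping passes (one when the code block is found, one inside the highlighter). It is an assumption about where the two passes happen, not Python-Markdown's or Pygments' actual code path.

```python
import html

source = ">>> print('hi')"

# First pass: stand-in for the escape applied when the code block is found.
once = html.escape(source, quote=False)

# Second pass: stand-in for the highlighter escaping its input again.
twice = html.escape(once, quote=False)

print(once)   # &gt;&gt;&gt; print('hi')
print(twice)  # &amp;gt;&amp;gt;&amp;gt; print('hi')
```

The second pass turns every `&` produced by the first pass into `&amp;`, which is exactly the pattern visible in the reported output.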

@facelessuser
Collaborator

I'll take a look at this today when I get some time.

@kkinder
Author

kkinder commented Oct 7, 2018

Thanks. I created a pull request that fixes it in a quick-and-dirty way. Since extensions seem to receive markup after it has already been escaped, this seemed like the only obvious way to resolve the problem short of refactoring how Python-Markdown processes extensions.

@facelessuser
Collaborator

I'll have to take a look, because this may unescape intentionally escaped, literal syntax.

@facelessuser
Collaborator

I don't think this used to be the case. I imagine it was introduced in a recent refactoring or pull request. I'll get to the bottom of this, though.

@facelessuser
Collaborator

facelessuser commented Oct 7, 2018

Yes, I think I introduced this when adjusting escapes in the serializer. We now escape the code when it is initially found by the code block processors. There should be no risk in unescaping literals. The proper approach is probably to unescape the content before processing it. I'll confirm in a bit.
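
The idea of unescaping the already-escaped code before handing it to the highlighter could look roughly like the sketch below. `code_unescape` and `fake_highlight` are hypothetical names chosen for illustration, and `fake_highlight` merely stands in for a highlighter that escapes its input. This is a sketch under those assumptions, not the actual patch.

```python
import html


def code_unescape(text: str) -> str:
    """Undo exactly one level of escaping for &, < and >."""
    text = text.replace("&lt;", "<")
    text = text.replace("&gt;", ">")
    # Replace "&amp;" last so "&amp;gt;" becomes "&gt;" and not ">".
    return text.replace("&amp;", "&")


def fake_highlight(code: str) -> str:
    """Hypothetical stand-in for a highlighter that escapes its input."""
    return html.escape(code, quote=False)


# Text as stored after the code block processor escaped it once.
stored_prompt = "&gt;&gt;&gt; print('hi')"
stored_literal = "&amp;gt; is an entity"  # the author typed "&gt;" literally

print(fake_highlight(code_unescape(stored_prompt)))   # &gt;&gt;&gt; print('hi')
print(fake_highlight(code_unescape(stored_literal)))  # &amp;gt; is an entity
```

Both values end up escaped exactly once, including the entity that was typed literally, which is consistent with the expectation that unescaping carries no risk for literals.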
