Merged
10 changes: 9 additions & 1 deletion docs/change_log/release-3.4.md
@@ -30,10 +30,18 @@ markdown.markdown(src, extensions=[TableExtension(use_align_attribute=True)])

 In addition, tests were moved to the modern test environment.

+### Backslash unescaping moved to Treeprocessor (#1131).
+
+Unescaping backslash escapes has been moved to a Treeprocessor. However, it is
+recognized that various third-party extensions may be calling the old class at
+`postprocessors.UnescapePostprocessor`. Therefore, the old class remains in the
+code base, but has been deprecated and will be removed in a future release. The
+new class `treeprocessors.UnescapeTreeprocessor` should be used instead.
+
 ### Previously deprecated objects have been removed

 Various objects were deprecated in version 3.0 and began raising deprecation
-warnings (see the [version 3.0 release notes] for details). Any of those object
+warnings (see the [version 3.0 release notes] for details). Any of those objects
 which remained in version 3.3 have been removed from the code base in version 3.4
 and will now raise errors. A summary of the objects are provided below.

10 changes: 5 additions & 5 deletions markdown/extensions/toc.py
@@ -16,7 +16,7 @@
 from . import Extension
 from ..treeprocessors import Treeprocessor
 from ..util import code_escape, parseBoolValue, AMP_SUBSTITUTE, HTML_PLACEHOLDER_RE, AtomicString
-from ..postprocessors import UnescapePostprocessor
+from ..treeprocessors import UnescapeTreeprocessor
 import re
 import html
 import unicodedata
@@ -84,8 +84,8 @@ def _html_sub(m):

 def unescape(text):
     """ Unescape escaped text. """
-    c = UnescapePostprocessor()
-    return c.run(text)
+    c = UnescapeTreeprocessor()
+    return c.unescape(text)


 def nest_toc_tokens(toc_list):
@@ -289,10 +289,10 @@ def run(self, doc):
                 toc_tokens.append({
                     'level': int(el.tag[-1]),
                     'id': el.attrib["id"],
-                    'name': unescape(stashedHTML2text(
+                    'name': stashedHTML2text(
                         code_escape(el.attrib.get('data-toc-label', text)),
                         self.md, strip_entities=False
-                    ))
+                    )
                 })

                 # Remove the data-toc-label attribute as it is no longer needed
5 changes: 4 additions & 1 deletion markdown/postprocessors.py
@@ -37,7 +37,6 @@ def build_postprocessors(md, **kwargs):
     postprocessors = util.Registry()
     postprocessors.register(RawHtmlPostprocessor(md), 'raw_html', 30)
     postprocessors.register(AndSubstitutePostprocessor(), 'amp_substitute', 20)
-    postprocessors.register(UnescapePostprocessor(), 'unescape', 10)
     return postprocessors


@@ -122,6 +121,10 @@ def run(self, text):
         return text


+@util.deprecated(
+    "This class will be removed in the future; "
+    "use 'treeprocessors.UnescapeTreeprocessor' instead."
+)
 class UnescapePostprocessor(Postprocessor):
     """ Restore escaped chars """
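
The body of `util.deprecated` is not part of this diff. As a rough illustration of what callers of the old class could expect, here is a hypothetical stand-in decorator built only on the standard library. The names `deprecated`, `OldUnescaper`, and `NewUnescaper` are invented for this sketch and the real implementation may differ.

```python
import warnings
from functools import wraps


def deprecated(message, stacklevel=2):
    """Hypothetical stand-in for a decorator like util.deprecated."""
    def decorator(obj):
        @wraps(obj)
        def wrapper(*args, **kwargs):
            # Warn on every call, then defer to the wrapped object.
            warnings.warn(
                "'{}' is deprecated. {}".format(obj.__name__, message),
                category=DeprecationWarning,
                stacklevel=stacklevel,
            )
            return obj(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("use 'NewUnescaper' instead.")
class OldUnescaper:
    """Illustrative placeholder for a class kept only for compatibility."""

    def run(self, text):
        return text


# Instantiating the decorated class still works but emits a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = OldUnescaper().run('text')
```

With a decorator of this shape the decorated name is bound to a function rather than a class, so third-party code that subclasses it would need to switch to the new class rather than rely on the deprecated one.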

27 changes: 27 additions & 0 deletions markdown/treeprocessors.py
@@ -19,6 +19,7 @@
 License: BSD (see LICENSE.md for details).
 """

+import re
 import xml.etree.ElementTree as etree
 from . import util
 from . import inlinepatterns
@@ -29,6 +30,7 @@ def build_treeprocessors(md, **kwargs):
     treeprocessors = util.Registry()
     treeprocessors.register(InlineProcessor(md), 'inline', 20)
     treeprocessors.register(PrettifyTreeprocessor(md), 'prettify', 10)
+    treeprocessors.register(UnescapeTreeprocessor(md), 'unescape', 0)
     return treeprocessors


@@ -429,3 +431,28 @@ def run(self, root):
                 # Only prettify code containing text only
                 if not len(code) and code.text is not None:
                     code.text = util.AtomicString(code.text.rstrip() + '\n')
+
+
+class UnescapeTreeprocessor(Treeprocessor):
+    """ Restore escaped chars """
+
+    RE = re.compile(r'{}(\d+){}'.format(util.STX, util.ETX))
+
+    def _unescape(self, m):
+        return chr(int(m.group(1)))
+
+    def unescape(self, text):
+        return self.RE.sub(self._unescape, text)
+
+    def run(self, root):
+        """ Loop over all elements and unescape all text. """
+        for elem in root.iter():
+            # Unescape text content
+            if elem.text and not elem.tag == 'code':
Member Author

I'm not sure we actually need to skip code tags. If I remove the check, the tests all pass, and the previous code had no way to distinguish between code and other content. However, there is always a possibility that code could intentionally contain what looks like a placeholder, in which case the content should not be altered. Therefore, I have left the check in.
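
A minimal standalone sketch of the case described above, using only the standard library rather than the `markdown` package. The `STX`/`ETX` values are assumed to match `markdown.util`, and `unescape_tree`, `build_tree`, and the `skip_code` switch are hypothetical names introduced for illustration. It shows placeholder-like text inside a `code` element surviving when the check is kept, while that element's tail is still unescaped.

```python
import re
import xml.etree.ElementTree as etree

# Assumed to match markdown.util.STX / markdown.util.ETX.
STX = '\u0002'
ETX = '\u0003'
PLACEHOLDER_RE = re.compile(r'{}(\d+){}'.format(STX, ETX))


def unescape(text):
    """Replace each STX<codepoint>ETX placeholder with its character."""
    return PLACEHOLDER_RE.sub(lambda m: chr(int(m.group(1))), text)


def unescape_tree(root, skip_code=True):
    """Mirror the run() loop from the diff; skip_code is a hypothetical switch."""
    for elem in root.iter():
        # Text content is skipped for <code> only when skip_code is set.
        if elem.text and not (skip_code and elem.tag == 'code'):
            elem.text = unescape(elem.text)
        if elem.tail:
            elem.tail = unescape(elem.tail)
        for key, value in elem.items():
            elem.set(key, unescape(value))
    return root


def build_tree():
    """Build <div><p>text<code>text</code>tail</p></div> with placeholders."""
    root = etree.Element('div')
    p = etree.SubElement(root, 'p')
    p.text = 'star: ' + STX + '42' + ETX
    code = etree.SubElement(p, 'code')
    code.text = STX + '42' + ETX
    code.tail = ' hash: ' + STX + '35' + ETX
    return root


kept = unescape_tree(build_tree())
altered = unescape_tree(build_tree(), skip_code=False)
```

With `skip_code=False` the `code` text becomes `*`, which is the alteration the check guards against.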

+                elem.text = self.unescape(elem.text)
+            # Unescape tail content
+            if elem.tail:
+                elem.tail = self.unescape(elem.tail)
+            # Unescape attribute values
+            for key, value in elem.items():
+                elem.set(key, self.unescape(value))
2 changes: 1 addition & 1 deletion tests/basic/backlash-escapes.html
@@ -9,7 +9,7 @@
 <p>Right bracket: ]</p>
 <p>Left paren: (</p>
 <p>Right paren: )</p>
-<p>Greater-than: ></p>
+<p>Greater-than: &gt;</p>
Member Author

This is the one and only change in behavior in the existing tests. I'm okay with this, however, as technically this results in valid output. The reason for the change is that the angle bracket gets escaped during serialization. Previously, a placeholder was there during serialization, which was swapped out for the actual character later. The whole point of this change was to better ensure valid HTML output, so this is an acceptable change in behavior.

Collaborator

Having unescaped > in HTML was a bug, so good that you fixed it.
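
The ordering effect discussed in this thread can be reproduced with a small standalone sketch. It uses the standard library's `xml.etree.ElementTree` serializer as a stand-in for the library's own serializer (an assumption), assumes the `STX`/`ETX` values match `markdown.util`, and `old_order`/`new_order` are hypothetical helper names.

```python
import re
import xml.etree.ElementTree as etree

# Assumed to match markdown.util.STX / markdown.util.ETX.
STX, ETX = '\u0002', '\u0003'
PLACEHOLDER_RE = re.compile(r'{}(\d+){}'.format(STX, ETX))


def unescape(text):
    """Replace each STX<codepoint>ETX placeholder with its character."""
    return PLACEHOLDER_RE.sub(lambda m: chr(int(m.group(1))), text)


def old_order(text):
    """Serialize first, then swap placeholders in the output string."""
    p = etree.Element('p')
    p.text = text
    return unescape(etree.tostring(p, encoding='unicode'))


def new_order(text):
    """Swap placeholders in the tree, then serialize (and entity-escape)."""
    p = etree.Element('p')
    p.text = unescape(text)
    return etree.tostring(p, encoding='unicode')


escaped = 'Greater-than: ' + STX + str(ord('>')) + ETX
print(old_order(escaped))  # <p>Greater-than: ></p>
print(new_order(escaped))  # <p>Greater-than: &gt;</p>
```

In the old order the serializer only ever sees the placeholder, so the restored character bypasses entity escaping; in the new order the character is already in the tree when serialization runs.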

 <p>Hash: #</p>
 <p>Period: .</p>
 <p>Bang: !</p>
36 changes: 36 additions & 0 deletions tests/test_syntax/extensions/test_smarty.py
@@ -0,0 +1,36 @@
+# -*- coding: utf-8 -*-
+"""
+Python Markdown
+
+A Python implementation of John Gruber's Markdown.
+
+Documentation: https://python-markdown.github.io/
+GitHub: https://github.com/Python-Markdown/markdown/
+PyPI: https://pypi.org/project/Markdown/
+
+Started by Manfred Stienstra (http://www.dwerg.net/).
+Maintained for a few years by Yuri Takhteyev (http://www.freewisdom.org).
+Currently maintained by Waylan Limberg (https://github.com/waylan),
+Dmitry Shachnev (https://github.com/mitya57) and Isaac Muse (https://github.com/facelessuser).
+
+Copyright 2007-2022 The Python Markdown Project (v. 1.7 and later)
+Copyright 2004, 2005, 2006 Yuri Takhteyev (v. 0.2-1.6b)
+Copyright 2004 Manfred Stienstra (the original version)
+
+License: BSD (see LICENSE.md for details).
+"""
+
+from markdown.test_tools import TestCase
+
+
+class TestSmarty(TestCase):
+
+    default_kwargs = {'extensions': ['smarty']}
+
+    def test_escaped_attr(self):
+        self.assertMarkdownRenders(
+            '![x\"x](x)',
+            '<p><img alt="x&quot;x" src="x" /></p>'
+        )
+
+    # TODO: Move rest of smarty tests here.