Commit

Move backslash unescaping to treeprocessor
By unescaping backslash escapes in a treeprocessor, the text is properly
escaped during serialization. Fixes #1131.

Because various third-party extensions may be calling the old class at
`postprocessors.UnescapePostprocessor`, the old class remains in the
codebase, but has been deprecated and will be removed in a future
release. The new class `treeprocessors.UnescapeTreeprocessor` should be
used instead.
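
The effect of the ordering can be sketched with the standard library alone. This is an illustration, not Python-Markdown's code: `xml.etree.ElementTree.tostring` stands in for the project's own serializer, and the `STX`/`ETX` constants and the `unescape` helper are local assumptions modeled on `util.STX`, `util.ETX`, and the placeholder pattern used in this commit.

```python
import re
import xml.etree.ElementTree as etree

# Assumed to match the placeholder delimiters in markdown.util.
STX, ETX = '\u0002', '\u0003'
PLACEHOLDER_RE = re.compile(r'{}(\d+){}'.format(STX, ETX))


def unescape(text):
    """Replace each STX<codepoint>ETX placeholder with its character."""
    return PLACEHOLDER_RE.sub(lambda m: chr(int(m.group(1))), text)


# An <img> whose alt text holds a placeholder for an escaped double quote.
img = etree.Element('img')
img.set('alt', 'x{}{}{}x'.format(STX, ord('"'), ETX))

# Postprocessor-style order: serialize first, then unescape the string.
# The quote is restored after escaping has already happened.
late = unescape(etree.tostring(img, encoding='unicode'))

# Treeprocessor-style order: unescape inside the tree, then serialize.
# The serializer now sees the quote and escapes it.
img.set('alt', unescape(img.get('alt')))
early = etree.tostring(img, encoding='unicode')

print(late)
print(early)
```

In the first case the restored quote ends the attribute value early; in the second it is written as `&quot;`.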
waylan committed Jul 15, 2022
1 parent 77fb7f1 commit c0f6e5a
Showing 6 changed files with 82 additions and 8 deletions.
10 changes: 9 additions & 1 deletion docs/change_log/release-3.4.md
@@ -30,10 +30,18 @@ markdown.markdown(src, extensions=[TableExtension(use_align_attribute=True)])

 In addition, tests were moved to the modern test environment.

+### Backslash unescaping moved to Treeprocessor (#1131).
+
+Unescaping backslash escapes has been moved to a Treeprocessor. However, it is
+recognized that various third-party extensions may be calling the old class at
+`postprocessors.UnescapePostprocessor`. Therefore, the old class remains in the
+code base, but has been deprecated and will be removed in a future release. The
+new class `treeprocessors.UnescapeTreeprocessor` should be used instead.
+
 ### Previously deprecated objects have been removed

 Various objects were deprecated in version 3.0 and began raising deprecation
-warnings (see the [version 3.0 release notes] for details). Any of those object
+warnings (see the [version 3.0 release notes] for details). Any of those objects
 which remained in version 3.3 have been removed from the code base in version 3.4
 and will now raise errors. A summary of the objects are provided below.

10 changes: 5 additions & 5 deletions markdown/extensions/toc.py
@@ -16,7 +16,7 @@
 from . import Extension
 from ..treeprocessors import Treeprocessor
 from ..util import code_escape, parseBoolValue, AMP_SUBSTITUTE, HTML_PLACEHOLDER_RE, AtomicString
-from ..postprocessors import UnescapePostprocessor
+from ..treeprocessors import UnescapeTreeprocessor
 import re
 import html
 import unicodedata
@@ -84,8 +84,8 @@ def _html_sub(m):

 def unescape(text):
     """ Unescape escaped text. """
-    c = UnescapePostprocessor()
-    return c.run(text)
+    c = UnescapeTreeprocessor()
+    return c.unescape(text)


 def nest_toc_tokens(toc_list):
@@ -289,10 +289,10 @@ def run(self, doc):
                     toc_tokens.append({
                         'level': int(el.tag[-1]),
                         'id': el.attrib["id"],
-                        'name': unescape(stashedHTML2text(
+                        'name': stashedHTML2text(
                             code_escape(el.attrib.get('data-toc-label', text)),
                             self.md, strip_entities=False
-                        ))
+                        )
                     })

                 # Remove the data-toc-label attribute as it is no longer needed
5 changes: 4 additions & 1 deletion markdown/postprocessors.py
@@ -37,7 +37,6 @@ def build_postprocessors(md, **kwargs):
     postprocessors = util.Registry()
     postprocessors.register(RawHtmlPostprocessor(md), 'raw_html', 30)
     postprocessors.register(AndSubstitutePostprocessor(), 'amp_substitute', 20)
-    postprocessors.register(UnescapePostprocessor(), 'unescape', 10)
     return postprocessors


@@ -122,6 +121,10 @@ def run(self, text):
         return text


+@util.deprecated(
+    "This class will be removed in the future; "
+    "use 'treeprocessors.UnescapeTreeprocessor' instead."
+)
 class UnescapePostprocessor(Postprocessor):
     """ Restore escaped chars """

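
How a decorator of this kind can keep an old class callable while warning its users is sketched below. This is a hypothetical stand-in written for illustration, not the implementation of `util.deprecated`; the names `deprecated`, `OldThing`, and `NewThing` are invented for the example.

```python
import warnings
from functools import wraps


def deprecated(message, stacklevel=2):
    """Hypothetical stand-in: emit a DeprecationWarning on every call."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                "'{}' is deprecated. {}".format(func.__name__, message),
                category=DeprecationWarning,
                stacklevel=stacklevel,
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("Use 'NewThing' instead.")
class OldThing:
    def run(self, text):
        return text


# Instantiating the old class still works, but records a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = OldThing().run('hello')
```

One consequence of this design is that the decorated name is bound to a wrapper function rather than the class itself, so in this sketch `OldThing` can still be instantiated but can no longer be subclassed or used with `isinstance`.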
27 changes: 27 additions & 0 deletions markdown/treeprocessors.py
@@ -19,6 +19,7 @@
 License: BSD (see LICENSE.md for details).
 """

+import re
 import xml.etree.ElementTree as etree
 from . import util
 from . import inlinepatterns
@@ -29,6 +30,7 @@ def build_treeprocessors(md, **kwargs):
     treeprocessors = util.Registry()
     treeprocessors.register(InlineProcessor(md), 'inline', 20)
     treeprocessors.register(PrettifyTreeprocessor(md), 'prettify', 10)
+    treeprocessors.register(UnescapeTreeprocessor(md), 'unescape', 0)
     return treeprocessors


@@ -429,3 +431,28 @@ def run(self, root):
                 # Only prettify code containing text only
                 if not len(code) and code.text is not None:
                     code.text = util.AtomicString(code.text.rstrip() + '\n')
+
+
+class UnescapeTreeprocessor(Treeprocessor):
+    """ Restore escaped chars """
+
+    RE = re.compile(r'{}(\d+){}'.format(util.STX, util.ETX))
+
+    def _unescape(self, m):
+        return chr(int(m.group(1)))
+
+    def unescape(self, text):
+        return self.RE.sub(self._unescape, text)
+
+    def run(self, root):
+        """ Loop over all elements and unescape all text. """
+        for elem in root.iter():
+            # Unescape text content
+            if elem.text and not elem.tag == 'code':
+                elem.text = self.unescape(elem.text)
+            # Unescape tail content
+            if elem.tail:
+                elem.tail = self.unescape(elem.tail)
+            # Unescape attribute values
+            for key, value in elem.items():
+                elem.set(key, self.unescape(value))
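
The walk performed by the new class can be tried outside the library with a standalone sketch. It assumes that `util.STX` and `util.ETX` are the control characters `\u0002` and `\u0003`; the `placeholder` and `unescape_tree` helpers are hypothetical names introduced for the example.

```python
import re
import xml.etree.ElementTree as etree

# Assumed values of markdown.util.STX and markdown.util.ETX.
STX, ETX = '\u0002', '\u0003'
RE = re.compile(r'{}(\d+){}'.format(STX, ETX))


def placeholder(char):
    """Build the STX<codepoint>ETX form used for an escaped character."""
    return '{}{}{}'.format(STX, ord(char), ETX)


def unescape(text):
    return RE.sub(lambda m: chr(int(m.group(1))), text)


def unescape_tree(root):
    """Mirror of the run() loop above, written as a plain function."""
    for elem in root.iter():
        # Text inside <code> is left untouched, as in the diff.
        if elem.text and elem.tag != 'code':
            elem.text = unescape(elem.text)
        # Tail text follows the element's closing tag, so it is always handled.
        if elem.tail:
            elem.tail = unescape(elem.tail)
        for key, value in elem.items():
            elem.set(key, unescape(value))


root = etree.Element('p')
root.set('title', 'a' + placeholder('"') + 'b')
root.text = 'star ' + placeholder('*')
code = etree.SubElement(root, 'code')
code.text = placeholder('*')
code.tail = ' tail ' + placeholder('_')

unescape_tree(root)
```

After the call, the paragraph text, the tail after `<code>`, and the `title` attribute hold real characters, while the text of the `<code>` element still holds its placeholder.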
2 changes: 1 addition & 1 deletion tests/basic/backlash-escapes.html
@@ -9,7 +9,7 @@
 <p>Right bracket: ]</p>
 <p>Left paren: (</p>
 <p>Right paren: )</p>
-<p>Greater-than: ></p>
+<p>Greater-than: &gt;</p>
 <p>Hash: #</p>
 <p>Period: .</p>
 <p>Bang: !</p>
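
The changed expectation is consistent with the `>` now being present in the tree before serialization. A minimal sketch, assuming the standard library's ElementTree serializer treats text the same way as the project's own serializer does in this respect:

```python
import xml.etree.ElementTree as etree

# The escaped character is already a literal '>' in the element text.
p = etree.Element('p')
p.text = 'Greater-than: >'

# Serializing escapes it as an entity.
html = etree.tostring(p, encoding='unicode')
print(html)
```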
36 changes: 36 additions & 0 deletions tests/test_syntax/extensions/test_smarty.py
@@ -0,0 +1,36 @@
+# -*- coding: utf-8 -*-
+"""
+Python Markdown
+
+A Python implementation of John Gruber's Markdown.
+
+Documentation: https://python-markdown.github.io/
+GitHub: https://github.com/Python-Markdown/markdown/
+PyPI: https://pypi.org/project/Markdown/
+
+Started by Manfred Stienstra (http://www.dwerg.net/).
+Maintained for a few years by Yuri Takhteyev (http://www.freewisdom.org).
+Currently maintained by Waylan Limberg (https://github.com/waylan),
+Dmitry Shachnev (https://github.com/mitya57) and Isaac Muse (https://github.com/facelessuser).
+
+Copyright 2007-2022 The Python Markdown Project (v. 1.7 and later)
+Copyright 2004, 2005, 2006 Yuri Takhteyev (v. 0.2-1.6b)
+Copyright 2004 Manfred Stienstra (the original version)
+
+License: BSD (see LICENSE.md for details).
+"""
+
+from markdown.test_tools import TestCase
+
+
+class TestSmarty(TestCase):
+
+    default_kwargs = {'extensions': ['smarty']}
+
+    def test_escaped_attr(self):
+        self.assertMarkdownRenders(
+            '![x\"x](x)',
+            '<p><img alt="x&quot;x" src="x" /></p>'
+        )
+
+    # TODO: Move rest of smarty tests here.
