Skip to content

Conversation

@waylan
Copy link
Member

@waylan waylan commented Jul 13, 2022

By unescaping backslash escapes in a treeprocessor, the text is properly
escaped during serialization. Fixes #1131.

As it is recognized that varous third-party extensions may be calling the
old class at postprocessors.UnescapePostprocessor the old class remains
in the codebase, but has been deprecated and will be removed in a future
release. The new class treeprocessors.UnescapeTreeprocessor should be
used instead.

By unescaping backslash escapes in a treeprocessor, the text is properly
escaped during serialization. Fixes Python-Markdown#1131.

As it is recognized that varous third-party extensions may be calling the
old class at `postprocessors.UnescapePostprocessor` the old class remains
in the codebase, but has been deprecated and will be removed in a future
release. The new class `treeprocessors.UnescapeTreeprocessor` should be
used instead.
Copy link
Member Author

@waylan waylan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below are a few comments and concerns I have about this change. Feedback is welcome.

<p>Left paren: (</p>
<p>Right paren: )</p>
<p>Greater-than: ></p>
<p>Greater-than: &gt;</p>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the one and only change in behavior in the existing tests. I'm okay with this, however, as technically this results in valid output. The reason for the change is that the angle bracket gets escaped during serialization. Previously, a placeholder was there during serialization, which was swapped out for the actual character later. The whole point of this change was to better ensure valid HTML output, so this is an acceptable change in behavior.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having unescaped > in HTML was a bug, so good that you fixed it.

""" Loop over all elements and unescape all text. """
for elem in root.iter():
# Unescape text content
if elem.text and not elem.tag == 'code':
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we actually need to skip code tags, In fact, if I remove the check, the tests all pass. In fact, the previous code did not have a way to distinguish between code and other content. However, there is always a possibility that code could intentionally contain what looks like a placeholder. In that case, the content should not be altered. Therefore, I have left the check in.

@waylan waylan requested review from facelessuser and mitya57 July 13, 2022 20:11
@facelessuser
Copy link
Collaborator

I'll try and look at this soon. I'd like to pull it and see how it impacts some of my things.

@waylan waylan added the needs-review Needs to be reviewed and/or approved. label Jul 14, 2022
Copy link
Collaborator

@mitya57 mitya57 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As all tests pass and the bug is fixed, I am happy with this change.

Copy link
Collaborator

@facelessuser facelessuser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see nothing breaking on my end. Seems good.

@waylan waylan merged commit c0f6e5a into Python-Markdown:master Jul 15, 2022
@waylan waylan deleted the 1131 branch July 15, 2022 12:38
@waylan waylan added approved The pull request is ready to be merged. and removed needs-review Needs to be reviewed and/or approved. labels Jul 15, 2022
andersk added a commit to andersk/zulip that referenced this pull request Feb 4, 2023
See Python-Markdown/markdown#1272.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
andersk added a commit to andersk/typeshed that referenced this pull request Feb 4, 2023
This replaced the deprecated
`markdown.postprocessors.UnescapePostprocessor` in
Python-Markdown/markdown#1272.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
andersk added a commit to andersk/typeshed that referenced this pull request Feb 4, 2023
This replaced the deprecated
`markdown.postprocessors.UnescapePostprocessor` in
Python-Markdown/markdown#1272.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
andersk added a commit to andersk/zulip that referenced this pull request Feb 4, 2023
See Python-Markdown/markdown#1272.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
andersk added a commit to andersk/typeshed that referenced this pull request Feb 4, 2023
This replaced the deprecated
`markdown.postprocessors.UnescapePostprocessor` in
Python-Markdown/markdown#1272.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
andersk added a commit to andersk/typeshed that referenced this pull request Feb 4, 2023
This replaced the deprecated
`markdown.postprocessors.UnescapePostprocessor` in
Python-Markdown/markdown#1272.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
timabbott pushed a commit to zulip/zulip that referenced this pull request Feb 5, 2023
See Python-Markdown/markdown#1272.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
andersk added a commit to andersk/typeshed that referenced this pull request Feb 9, 2023
This replaced the deprecated
`markdown.postprocessors.UnescapePostprocessor` in
Python-Markdown/markdown#1272.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
AlexWaygood pushed a commit to python/typeshed that referenced this pull request Feb 9, 2023
This replaced the deprecated
`markdown.postprocessors.UnescapePostprocessor` in
Python-Markdown/markdown#1272.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
UjjwalAggarwal-1 pushed a commit to UjjwalAggarwal-1/zulip that referenced this pull request Feb 20, 2023
See Python-Markdown/markdown#1272.

Signed-off-by: Anders Kaseorg <anders@zulip.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved The pull request is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Smarty extension causes generation of invalid HTML syntax

3 participants