Fix handling of bogus comments.
As with most implementations, we now pass through bogus comments (as
defined by the HTML Spec) unaltered except that they are HTML escaped.
This deviates from the reference implementation which completely ignores
them. As the reference implementation seems to not have even contemplated
their existence, it is not being used as a reference in this instance.
Fixes #1425.
waylan committed Jan 3, 2024
1 parent a2a9c53 commit e466f38
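
For context, a bogus comment is markup such as `<!invalid>` that the HTML spec tokenizes as a comment even though it lacks the `<!--` and `-->` delimiters. The sketch below uses only the standard library's `html.parser.HTMLParser`, whose `parse_bogus_comment()` this commit overrides, to show the default behavior. `EventRecorder` and `record_events` are hypothetical helpers for illustration and are not part of Python-Markdown.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: records which handler each piece of input reaches."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_comment(self, data):
        self.events.append(("comment", data))

    def handle_data(self, data):
        self.events.append(("data", data))


def record_events(source):
    """Parse `source` and return the recorded (kind, text) events."""
    parser = EventRecorder()
    parser.feed(source)
    parser.close()
    return parser.events


# By default the stdlib parser routes a bogus comment to `handle_comment`,
# the same handler that receives well-formed `<!-- ... -->` comments.
print(record_events("<!invalid>"))
```

Because the default parser does not distinguish bogus comments from real ones, a subclass that wants to treat them differently has to intercept `parse_bogus_comment()` itself.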
Showing 3 changed files with 18 additions and 8 deletions.
1 change: 1 addition & 0 deletions docs/changelog.md
@@ -20,6 +20,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 * Fix edge-case crash in `InlineProcessor` with `AtomicString` (#1406).
 * Fix edge-case crash in `codehilite` with an empty `code` tag (#1405).
 * Improve and expand type annotations in the code base (#1401).
+* Fix handling of bogus comments (#1425).

 ## [3.5.1] -- 2023-10-31

9 changes: 9 additions & 0 deletions markdown/htmlparser.py
@@ -277,6 +277,15 @@ def parse_html_declaration(self, i: int) -> int:
         self.handle_data('<!')
         return i + 2

+    def parse_bogus_comment(self, i: int, report: int = 0) -> int:
+        # Override the default behavior so that bogus comments get passed
+        # through unaltered by setting `report` to `0` (see #1425).
+        pos = super().parse_bogus_comment(i, report)
+        if pos == -1: # pragma: no cover
+            return -1
+        self.handle_empty_tag(self.rawdata[i:pos], is_block=False)
+        return pos
+
     # The rest has been copied from base class in standard lib to address #1036.
     # As `__startag_text` is private, all references to it must be in this subclass.
     # The last few lines of `parse_starttag` are reversed so that `handle_starttag`
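
The override pattern in `parse_bogus_comment` can be tried in isolation with the standard library alone. The following is a minimal sketch, not the Python-Markdown implementation: `PassThroughParser` and `render` are hypothetical names, and the sketch appends escaped text to a list where the real method hands the raw text to `handle_empty_tag`. It also omits the `<p>` wrapper that Markdown's block processing adds in the tests.

```python
from html import escape
from html.parser import HTMLParser


class PassThroughParser(HTMLParser):
    """Hypothetical stand-in for the patched parser; collects output text."""

    def __init__(self):
        super().__init__()
        self.output = []

    def handle_comment(self, data):
        # Well-formed comments still arrive here and are re-emitted as-is.
        self.output.append(f"<!--{data}-->")

    def handle_data(self, data):
        self.output.append(data)

    def parse_bogus_comment(self, i, report=0):
        # With `report` defaulting to 0, the base class only locates the end
        # of the bogus comment and does not call `handle_comment`.
        pos = super().parse_bogus_comment(i, report)
        if pos == -1:
            return -1
        # Emit the original source text, HTML-escaped, instead of a comment.
        self.output.append(escape(self.rawdata[i:pos]))
        return pos


def render(source):
    """Parse `source` and return the collected output as one string."""
    parser = PassThroughParser()
    parser.feed(source)
    parser.close()
    return "".join(parser.output)


print(render("<!invalid>"))
print(render("<!--ok-->"))
```

Changing only the default value of `report` works because the base class calls `parse_bogus_comment(i)` without passing `report`, so the subclass default decides whether `handle_comment` fires.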
16 changes: 8 additions & 8 deletions tests/test_syntax/blocks/test_html_blocks.py
@@ -782,16 +782,16 @@ def test_raw_comment_trailing_whitespace(self):
             '<!-- *foo* -->'
         )

-    # Note: this is a change in behavior for Python-Markdown, which does *not* match the reference
-    # implementation. However, it does match the HTML5 spec. Declarations must start with either
-    # `<!DOCTYPE` or `<![`. Anything else that starts with `<!` is a comment. According to the
-    # HTML5 spec, a comment without the hyphens is a "bogus comment", but a comment nonetheless.
-    # See https://www.w3.org/TR/html52/syntax.html#markup-declaration-open-state.
-    # If we wanted to change this behavior, we could override `HTMLParser.parse_bogus_comment()`.
     def test_bogus_comment(self):
         self.assertMarkdownRenders(
-            '<!*foo*>',
-            '<!--*foo*-->'
+            '<!invalid>',
+            '<p>&lt;!invalid&gt;</p>'
         )
+
+    def test_bogus_comment_endtag(self):
+        self.assertMarkdownRenders(
+            '</#invalid>',
+            '<p>&lt;/#invalid&gt;</p>'
+        )

     def test_raw_multiline_comment(self):
