Fix handling of bogus comments.

Like most implementations, we now pass bogus comments (as defined by the HTML spec) through unaltered, except that they are HTML-escaped. This deviates from the reference implementation, which ignores them completely. Because the reference implementation does not appear to have contemplated their existence, it is not used as a reference in this instance. Fixes #1425.
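
For illustration only, the idea can be sketched with nothing but the standard library's `html.parser.HTMLParser`, which is the base class whose `parse_bogus_comment` method this commit overrides. The class names `DefaultParser` and `PassThroughParser`, the `events` list, and the `events_for` helper are hypothetical and not part of Python-Markdown. The sketch also escapes the raw text directly, whereas the real change hands it to `handle_empty_tag`.

```python
from html import escape
from html.parser import HTMLParser


class DefaultParser(HTMLParser):
    """Records what the standard-library parser reports (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_comment(self, data):
        self.events.append(("comment", data))

    def handle_data(self, data):
        self.events.append(("data", data))


class PassThroughParser(DefaultParser):
    """Hypothetical analogue of the override: keep the raw text, escaped."""

    def parse_bogus_comment(self, i, report=0):
        # With `report` set to 0 the base class only locates the closing `>`
        # and does not call `handle_comment`.
        pos = super().parse_bogus_comment(i, report)
        if pos == -1:
            return -1
        # Assumption: escape here directly; Python-Markdown instead passes
        # the raw slice on to `handle_empty_tag`.
        self.events.append(("raw", escape(self.rawdata[i:pos])))
        return pos


def events_for(parser_cls, text):
    """Feed `text` to a fresh parser and return the recorded events."""
    parser = parser_cls()
    parser.feed(text)
    parser.close()
    return parser.events


if __name__ == "__main__":
    for source in ("<!invalid>", "</#invalid>"):
        print(source, events_for(DefaultParser, source))
        print(source, events_for(PassThroughParser, source))
```

In this sketch the default parser treats both inputs as comments, while the subclass keeps the original markup as escaped text, which mirrors the expected output in the updated tests.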
Python-Markdown · Jan 3, 2024 · e466f38
1 parent a2a9c53

Showing 3 changed files with 18 additions and 8 deletions.
diff --git a/docs/changelog.md b/docs/changelog.md
@@ -20,6 +20,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 * Fix edge-case crash in `InlineProcessor` with `AtomicString` (#1406).
 * Fix edge-case crash in `codehilite` with an empty `code` tag (#1405).
 * Improve and expand type annotations in the code base (#1401).
+* Fix handling of bogus comments (#1425).
 
 ## [3.5.1] -- 2023-10-31
 

diff --git a/markdown/htmlparser.py b/markdown/htmlparser.py
@@ -277,6 +277,15 @@ def parse_html_declaration(self, i: int) -> int:
         self.handle_data('<!')
         return i + 2
 
+    def parse_bogus_comment(self, i: int, report: int = 0) -> int:
+        # Override the default behavior so that bogus comments get passed
+        # through unaltered by setting `report` to `0` (see #1425).
+        pos = super().parse_bogus_comment(i, report)
+        if pos == -1:  # pragma: no cover
+            return -1
+        self.handle_empty_tag(self.rawdata[i:pos], is_block=False)
+        return pos
+
     # The rest has been copied from base class in standard lib to address #1036.
     # As `__startag_text` is private, all references to it must be in this subclass.
     # The last few lines of `parse_starttag` are reversed so that `handle_starttag`

diff --git a/tests/test_syntax/blocks/test_html_blocks.py b/tests/test_syntax/blocks/test_html_blocks.py
@@ -782,16 +782,16 @@ def test_raw_comment_trailing_whitespace(self):
             '<!-- *foo* -->'
         )
 
-    # Note: this is a change in behavior for Python-Markdown, which does *not* match the reference
-    # implementation. However, it does match the HTML5 spec. Declarations must start with either
-    # `<!DOCTYPE` or `<![`. Anything else that starts with `<!` is a comment. According to the
-    # HTML5 spec, a comment without the hyphens is a "bogus comment", but a comment nonetheless.
-    # See https://www.w3.org/TR/html52/syntax.html#markup-declaration-open-state.
-    # If we wanted to change this behavior, we could override `HTMLParser.parse_bogus_comment()`.
     def test_bogus_comment(self):
         self.assertMarkdownRenders(
-            '<!*foo*>',
-            '<!--*foo*-->'
+            '<!invalid>',
+            '<p>&lt;!invalid&gt;</p>'
+        )
+
+    def test_bogus_comment_endtag(self):
+        self.assertMarkdownRenders(
+            '</#invalid>',
+            '<p>&lt;/#invalid&gt;</p>'
         )
 
     def test_raw_multiline_comment(self):