As a result of #109, character and entity references are unconditionally dereferenced. Consequently, html2text 2017.10.4 and later converts HTML whose character references represent HTML-like text into Markdown containing raw HTML:
$ echo "<p>Horizontal rule is &lt;hr&gt;</p>" | html2markdown
Horizontal rule is <hr>
To make the problem clearer, consider round-tripping from HTML to Markdown back to HTML:
$ echo "<p>Horizontal rule is &lt;hr&gt;</p>" | html2markdown | cmark
<p>Horizontal rule is <!-- raw HTML omitted --></p>
$ echo "<p>Horizontal rule is &lt;hr&gt;</p>" | html2markdown | cmark --unsafe
<p>Horizontal rule is <hr></p>
The conversion to Markdown changes the meaning of the content: dereferencing the character references turns what was literal text in the input into an actual hr element in the output.
To satisfy the request in #109, I suggest preserving character and entity references which would be interpreted as Raw HTML if dereferenced. That would avoid producing unnecessary character references (as requested in #109) and also avoid changing the meaning of the content when it contains HTML-like text.
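One possible reading of this suggestion, as a minimal sketch rather than html2text's actual code: decode each reference unless the decoded text would contain `<` or `&`. This is a conservative approximation (it also keeps `&lt;` in text like `a &lt; b`, which would not parse as raw HTML), and the name `decode_safe_references` is hypothetical.

```python
import html
import re

# Matches named, decimal, and hexadecimal character references.
_REFERENCE = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_safe_references(text):
    """Decode references, keeping those that could introduce raw HTML.

    Hypothetical helper: a reference is left untouched when its decoded
    form contains "<" (could open a tag) or "&" (could form a new
    reference), or when it is not a known reference at all.
    """

    def replace(match):
        reference = match.group(0)
        decoded = html.unescape(reference)
        if decoded == reference or "<" in decoded or "&" in decoded:
            return reference
        return decoded

    return _REFERENCE.sub(replace, text)


if __name__ == "__main__":
    print(decode_safe_references("Horizontal rule is &lt;hr&gt;"))
    print(decode_safe_references("&copy; 2017"))
```

With this approach, `&copy;` and `&gt;` are still dereferenced as requested in #109, while `&lt;` survives, so the HTML-like text cannot open a tag after the round trip.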
Thanks for considering,
Kevin