Parser breaks with empty attributes or unquoted attribute values #2

krautsource · 2023-03-13T16:11:16Z

Hey,

First off, thanks a bunch for making this project available. It's exactly what I needed for a project of mine.

There doesn't seem to be a lot of development going on, but maybe this helps somebody who runs into the same problems I had.
The parser seems to have issues when an element contains an attribute without a value, or an attribute with an unquoted value (both of which are valid HTML, AFAIK).

Examples:

Missing attribute value:

```python
from htmldom import htmldom

dom = htmldom.HtmlDom()
dom.createDom("<div><p foo class='bar'>hello world</p><p>bye</p></div>")

dom.find("p.bar")  # returns an empty list
```

Unquoted attribute value:

```python
dom.createDom("<div><p foo=1 class='bar'>hello world</p><p>bye</p></div>")

dom.find("p.bar")  # returns an empty list
```

For my use case, I'm currently working around this by fetching the HTML source with `requests`, replacing the known offending attribute with an empty string, and then passing the result to `createDom()`.
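A more general version of that workaround would be to rewrite the markup so that every attribute carries an explicit, double-quoted value before it reaches `createDom()`. Below is a rough sketch using only the standard library's `html.parser`. The names `AttributeNormalizer` and `normalize_attributes` are hypothetical helpers, not part of htmldom. It assumes (untested) that htmldom copes with quoted values such as `foo=""`, since the quoted `class='bar'` in the examples is not itself the problem. If it doesn't, dropping valueless attributes instead, as in the string-replace approach, would be the fallback.

```python
from html import escape
from html.parser import HTMLParser


class AttributeNormalizer(HTMLParser):
    """Hypothetical helper: re-serialize HTML so every attribute is name="value"."""

    def __init__(self):
        # convert_charrefs=False keeps entity/char references in text as-is
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _render_tag(self, tag, attrs, self_closing=False):
        rendered = []
        for name, value in attrs:
            # A bare attribute such as `foo` arrives with value None;
            # give it an explicit empty value instead.
            value = "" if value is None else value
            # html.parser unescapes attribute values, so re-escape them here.
            rendered.append(f' {name}="{escape(value, quote=True)}"')
        end = " />" if self_closing else ">"
        return f"<{tag}{''.join(rendered)}{end}"

    def handle_starttag(self, tag, attrs):
        self.parts.append(self._render_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self._render_tag(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def normalize_attributes(markup):
    """Return `markup` with all attributes quoted and valueless ones set to ""."""
    parser = AttributeNormalizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


html = "<div><p foo=1 class='bar'>hello world</p><p>bye</p></div>"
print(normalize_attributes(html))
# <div><p foo="1" class="bar">hello world</p><p>bye</p></div>
```

The output would then be passed to `dom.createDom(...)` in place of the raw source. Note that `html.parser` lowercases tag and attribute names, so this is a normalization pass rather than a byte-for-byte round trip.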
