Change from "html.HTMLParser" to "etree.XMLParser" for validation #97
`html.HTMLParser` doesn't accept "unknown" HTML tags, e.g. it raised an error because of a `<nav>` tag :( `etree.XMLParser` seems to be a better choice: it raises errors on really broken HTML documents, but accepts all tags.
Another difference: `HTMLParser` accepts whitespace in closing tags like `</ \r\n h1>` and `XMLParser` does not. This PR also doesn't activate `recover=False`, because with this option, validating the Django admin default templates doesn't work, even though they seem to be fine HTML.
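The well-formedness rules behind these differences can be tried out without lxml. The sketch below is only a stand-in: it uses the standard library's expat-based `xml.etree.ElementTree` instead of `etree.XMLParser`, and the helper name `is_well_formed` is hypothetical. It illustrates how a strict XML parser treats unknown tags, whitespace in closing tags, and mismatched tags; it does not reproduce lxml's `recover` handling.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the stand-in XML parser accepts the markup.

    Hypothetical helper for illustration; the PR itself uses lxml.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Tag names are not checked against any HTML vocabulary, so <nav> is fine.
print(is_well_formed("<nav><a href='/'>Home</a></nav>"))  # True

# Whitespace between "</" and the tag name is not well-formed XML.
print(is_well_formed("<h1>Title</ \r\n h1>"))  # False

# A really broken document (mismatched tags) is rejected as well.
print(is_well_formed("<div><p>text</div>"))  # False
```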
Another thing: `etree.XMLSyntaxError` raises a very, very big traceback without any context about the broken HTML document. This is a problem in combination with snapshot tests, because it's totally unknown what part of the HTML document contains the error.
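One way to attach context to a parse error is to take the reported line number and render the neighbouring source lines. The following is a minimal sketch, not the code from this PR: `format_error_context` and `validate` are hypothetical names, the output layout is invented for illustration, and the standard library's `xml.etree.ElementTree.ParseError` (with its `position` tuple) stands in for lxml's `etree.XMLSyntaxError`. It is assumed here that the lxml error carries comparable line information.

```python
import xml.etree.ElementTree as ET


def format_error_context(source: str, line_no: int, message: str, context: int = 2) -> str:
    """Render `message` followed by the lines around the 1-based `line_no`."""
    lines = source.splitlines()
    start = max(line_no - 1 - context, 0)     # 0-based index of first line shown
    end = min(line_no + context, len(lines))  # exclusive 0-based end index
    out = [f"{message} (line {line_no})"]
    for number in range(start + 1, end + 1):
        marker = ">" if number == line_no else " "  # flag the failing line
        out.append(f"{marker} {number:>3} | {lines[number - 1]}")
    return "\n".join(out)


def validate(source: str) -> None:
    """Raise AssertionError with context lines if `source` does not parse."""
    try:
        ET.fromstring(source)
    except ET.ParseError as err:
        line_no, _column = err.position  # (line, column) reported by the parser
        # "from None" drops the long chained traceback of the parser error.
        raise AssertionError(format_error_context(source, line_no, str(err))) from None
```

Calling `validate()` on a document with a mismatched closing tag then fails with a short message that marks the offending line, instead of a bare parser traceback.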
Now we get really helpful messages that point to the error and contain some context lines, e.g.: