XMLDom error #182

gregoriopellegrino · 2018-05-15T09:24:08Z

Ace version 1.0.0
Node version 8.7.0
macOS 10.12.6

During Ace checking on an EPUB file I'm getting "[xmldom error] entity not found:–"

rdeltour · 2018-05-15T18:37:05Z

Hi Gregorio. Would you be able to send me a sample EPUB?

gregoriopellegrino · 2018-05-21T17:14:01Z

Here you go:
error_test.epub.zip

rdeltour · 2018-05-24T23:08:31Z

Thanks for the sample file! I could reproduce the issue.

@fchasen

- Use @fchasen’s fork of xmldom to parse the HTML named character references defined in HTML, even when the document is XHTML. Note however that this is a willful violation of the HTML standard, since the entities are only declared when the document has one of the allowed public identifiers (see https://html.spec.whatwg.org/#parsing-xhtml-documents)
- Set an error handler to xmldom’s `DOMParser` to catch parsing errors (like undeclared entities) and log them with winston.
- Add tests.

Fixes #182

rdeltour · 2018-05-25T12:14:01Z

This should be fixed in the proposed PR.

Note that these named entity references are only allowed in EPUB 2, which Ace doesn’t fully support.
EPUB 3 forbids external identifiers in the doctype declaration, so the public identifiers listed in the HTML standard’s “Parsing XML documents” section can’t be used, and consequently the only allowed entities are the ones predefined in XML (quot, amp, apos, lt, gt).
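To make the restriction concrete, here is a minimal sketch that lists the named entity references in an XHTML string that fall outside XML's five predefined entities. The helper name `findUndeclaredEntities` is hypothetical and not part of Ace, and the regular expression is a simplification: it does not skip comments or CDATA sections.

```javascript
// The five entities predefined by XML; any other named entity needs a declaration.
const XML_PREDEFINED = new Set(['quot', 'amp', 'apos', 'lt', 'gt']);

// Hypothetical helper (not part of Ace): returns the distinct named entity
// references in `xhtml` that are not predefined by XML.
function findUndeclaredEntities(xhtml) {
  const found = new Set();
  // Matches named references such as &ndash; but not numeric ones
  // such as &#8211; or &#x2013;, which are always allowed.
  const pattern = /&([A-Za-z][A-Za-z0-9]*);/g;
  let match;
  while ((match = pattern.exec(xhtml)) !== null) {
    if (!XML_PREDEFINED.has(match[1])) {
      found.add(match[1]);
    }
  }
  return [...found];
}

console.log(findUndeclaredEntities('<p>Tom &amp; Jerry &ndash; 1940&#8211;1958&nbsp;</p>'));
// [ 'ndash', 'nbsp' ]
```

Running something like this over the content documents of an EPUB 3 file would flag references such as `&ndash;` that an XML parser without a DTD cannot resolve.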

The PR makes it so that all the entities defined in HTML are parsed even when the MIME type is XHTML (regardless of the doctype). It also catches xmldom warnings and errors to log them properly with Ace’s logging system.
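For readers who want to see the logging part in code, the following is a rough sketch, not the code from the PR. It assumes that xmldom's `DOMParser` constructor accepts an `errorHandler` option with `warning`, `error`, and `fatalError` callbacks, and that the logger exposes `warn()` and `error()` methods as a winston logger does. The helper name `parseWithLogging` and its parameters are hypothetical.

```javascript
// Hypothetical helper (not the actual Ace implementation).
// `DOMParserImpl` is assumed to be xmldom's DOMParser constructor;
// `logger` is any object with warn() and error() methods.
function parseWithLogging(DOMParserImpl, logger, source, mimeType) {
  const parser = new DOMParserImpl({
    // Route parser diagnostics to the logger instead of the console.
    errorHandler: {
      warning: (msg) => logger.warn(msg),
      error: (msg) => logger.error(msg),
      fatalError: (msg) => logger.error(msg),
    },
  });
  return parser.parseFromString(source, mimeType);
}

// Possible usage, assuming the xmldom package is installed:
//   const { DOMParser } = require('xmldom');
//   const doc = parseWithLogging(DOMParser, console, xhtml, 'application/xhtml+xml');
```

Passing the parser constructor and the logger as arguments keeps the sketch free of hard dependencies, so it can be exercised with stand-ins for both.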

gregoriopellegrino · 2018-05-25T12:18:28Z

Thanks

rdeltour added the bug label May 15, 2018

rdeltour self-assigned this May 15, 2018

rdeltour added waiting for feedback and removed bug labels May 21, 2018

rdeltour added bug and removed waiting for feedback labels May 24, 2018

rdeltour added this to the v1.0.1 milestone May 25, 2018

rdeltour mentioned this issue May 25, 2018

fix(parser): parse HTML named character references #188

Merged

rdeltour closed this as completed in #188 May 25, 2018
