XMLDom error #182

Closed
gregoriopellegrino opened this issue May 15, 2018 · 5 comments

@gregoriopellegrino

Ace version 1.0.0
Node version 8.7.0
macOS 10.12.6

While checking an EPUB file with Ace, I'm getting `[xmldom error] entity not found:&ndash;`

@rdeltour
Member

Hi Gregorio. Would you be able to send me a sample EPUB?

@rdeltour rdeltour added the bug label May 15, 2018
@rdeltour rdeltour self-assigned this May 15, 2018
@gregoriopellegrino
Author

Here it is:
error_test.epub.zip

@rdeltour
Member

Thanks for the sample file! I could reproduce the issue.

@rdeltour rdeltour added this to the v1.0.1 milestone May 25, 2018
rdeltour added a commit that referenced this issue May 25, 2018
- Use @fchasen’s fork of xmldom to parse the named character
  references defined in HTML, even when the document is XHTML.
  Note however that this is a willful violation of the HTML standard,
  since the entities are only declared when the document has one of the
  allowed public identifiers
  (see https://html.spec.whatwg.org/#parsing-xhtml-documents)
- Set an error handler to xmldom’s `DOMParser` to catch parsing errors
  (like undeclared entities) and log them with winston.
- Add tests.

Fixes #182
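As a rough illustration of the two behaviours in the commit message (accepting HTML named references in XHTML, and routing unknown ones to an error handler instead of failing), here is a minimal TypeScript sketch. It is not Ace's code and does not use xmldom's API: `resolveEntities` and its `onError` callback are hypothetical stand-ins for the parser and the handler set on `DOMParser`, and the HTML table holds only a handful of entries (the real list has more than 2,000).

```typescript
// Hypothetical sketch, not xmldom's API: resolves named character references
// in a string and reports unknown names through a callback.

// The five entities predefined by XML 1.0; every XML parser accepts these.
const XML_PREDEFINED = new Map<string, string>([
  ["quot", '"'],
  ["amp", "&"],
  ["apos", "'"],
  ["lt", "<"],
  ["gt", ">"],
]);

// A tiny illustrative subset of the HTML named character references.
const HTML_SUBSET = new Map<string, string>([
  ["ndash", "\u2013"],
  ["mdash", "\u2014"],
  ["nbsp", "\u00A0"],
  ["hellip", "\u2026"],
]);

// Stand-in for a parser error handler such as the one set on DOMParser.
type EntityErrorHandler = (message: string) => void;

function resolveEntities(
  text: string,
  onError: EntityErrorHandler,
  allowHtmlEntities: boolean = true,
): string {
  // Named references only; numeric ones like &#8211; are left untouched here.
  return text.replace(/&([A-Za-z][A-Za-z0-9]*);/g, (match: string, name: string): string => {
    const value =
      XML_PREDEFINED.get(name) ??
      (allowHtmlEntities ? HTML_SUBSET.get(name) : undefined);
    if (value !== undefined) {
      return value;
    }
    // Report the unknown reference (wording mirrors the reported message)
    // and keep the original text instead of aborting.
    onError(`entity not found:${match}`);
    return match;
  });
}
```

Calling it with `allowHtmlEntities` set to `false` approximates a strict XML parser: `a&ndash;b` comes back unchanged and `entity not found:&ndash;` is reported, which is the symptom in the original report.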
@rdeltour
Member

This should be fixed in the proposed PR.

Note that these named entity references are only allowed in EPUB 2, which Ace doesn’t fully support.
EPUB 3 forbids external identifiers in the doctype declaration, so the public identifiers listed in the HTML standard’s “Parsing XML documents” section can’t be used. Consequently, the only named entity references allowed are the ones predefined in XML (quot, amp, apos, lt, gt).
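To make the EPUB 3 restriction concrete, a content document can be scanned for named references other than the five predefined XML ones. The sketch below is a hypothetical helper (`findUndeclaredEntities` is not part of Ace or xmldom); it scans raw text with a regular expression, so unlike a real XML parser it does not skip comments or CDATA sections.

```typescript
// Hypothetical pre-check, not part of Ace or xmldom: lists named character
// references that an EPUB 3 content document may not use.

// The only named entities XML itself predefines.
const XML_PREDEFINED_NAMES: ReadonlySet<string> = new Set(["quot", "amp", "apos", "lt", "gt"]);

function findUndeclaredEntities(xhtml: string): string[] {
  // Matches named references such as &ndash; but not numeric ones like &#8211;.
  const pattern = /&([A-Za-z][A-Za-z0-9]*);/g;
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(xhtml)) !== null) {
    const name = m[1];
    // Record each offending name once, in order of first appearance.
    if (!XML_PREDEFINED_NAMES.has(name) && found.indexOf(name) === -1) {
      found.push(name);
    }
  }
  return found;
}
```

Each reported name can be replaced by its numeric character reference, which XML always accepts; for example `&ndash;` becomes `&#8211;` (U+2013).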

With the PR, all the entities defined in HTML are parsed even when the MIME type is XHTML, regardless of the doctype. It also catches xmldom warnings and errors and logs them through Ace’s logging system.

@gregoriopellegrino
Author

Thanks
