Fix broken image downloads and content parsing issues #51

Merged: 6 commits, Mar 8, 2018

@pedrosanta (Collaborator) commented Mar 8, 2018

As was already known, v0.0.17 was broken because images weren't being downloaded.

The main cause was a previous commit from #35 that passed the whole content.data through XML entity encoding. This (1) encoded, among other things, the less-than/greater-than characters surrounding every tag (e.g. <p> became &lt;p&gt;), so cheerio no longer loaded and parsed the content as intended, which in turn (2) rendered all the subsequent processing code (images and such) useless. Since proper XHTML5 entity encoding of the content data should be the responsibility of the lib's users, this encoding was removed, and image parsing and downloading now work again.
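A minimal TypeScript sketch of that failure mode, under stated assumptions: encodeXmlEntities and findImageSources are hypothetical helpers written for illustration (not epub-gen's or cheerio's code), and a simple regex stands in for cheerio's parser so the example runs without dependencies. Once the entire markup string is entity-encoded, the tags become plain text and there are no img elements left to find.

```typescript
// Hypothetical helper: encode the five XML special characters.
// Applied to an entire markup string (as the #35 change did to content.data),
// it also encodes the angle brackets of every tag.
function encodeXmlEntities(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first to avoid double-encoding
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Stand-in for the image-collecting step: return the src of every <img> tag.
// A real implementation would use an HTML parser such as cheerio instead.
function findImageSources(markup: string): string[] {
  const sources: string[] = [];
  const imgTag = /<img\b[^>]*\bsrc="([^"]*)"/g;
  let match: RegExpExecArray | null;
  while ((match = imgTag.exec(markup)) !== null) {
    sources.push(match[1]);
  }
  return sources;
}

const content = '<p>Intro</p><img src="https://example.com/cover.png"/>';

console.log(findImageSources(content)); // one image found
console.log(encodeXmlEntities(content)); // tags are now &lt;...&gt; text
console.log(findImageSources(encodeXmlEntities(content))); // no images found
```

Encoding only the text that actually needs it, rather than the whole document, keeps the tags intact for the parser.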

Finally, content.data was being output as HTML because cheerio's $.xml() returns proper HTML even when it loads an encoded string. (¯\_(ツ)_/¯, sense: makes none)

Moreover, several other content data parsing issues were fixed, notably by removing cheerio's ignoreWhitespace option, which fixes the issue experienced in #38 and renders the solution proposed there obsolete.

Summing up, the list of fixes is:

- The proper XHTML5 entity encoding of the content data should be the responsibility of the lib's users. Furthermore, this encoding turned every less-than/greater-than character into a named entity, which broke all the subsequent parsing code, as @rriclet correctly pointed out in 0498640#r23845706.
- The other example doesn't generate a valid EPUB: among other things, it includes invalid named entities.
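On the invalid named entities: an XHTML content document is parsed as XML, which predefines only &amp;, &lt;, &gt;, &quot; and &apos;, so HTML-only names such as &nbsp; make it ill-formed. Since entity encoding is now left to the lib's users, here is a minimal TypeScript sketch of a user-side pre-processing step. toXmlSafeEntities and its small entity table are hypothetical, not part of the library; it rewrites a few HTML-only names as numeric character references before the content is handed over.

```typescript
// The only named entities XML defines without a DTD.
const XML_PREDEFINED = new Set(["amp", "lt", "gt", "quot", "apos"]);

// A small, hypothetical subset of HTML named entities mapped to code points.
const HTML_ENTITIES = new Map<string, number>([
  ["nbsp", 160],
  ["copy", 169],
  ["mdash", 8212],
  ["hellip", 8230],
]);

// Rewrite known HTML-only named entities as numeric character references so
// the markup stays well-formed XML. XML's own entities and unknown names are
// left untouched (unknown ones would still need handling by the caller).
function toXmlSafeEntities(markup: string): string {
  return markup.replace(/&([A-Za-z][A-Za-z0-9]*);/g, (whole: string, name: string) => {
    if (XML_PREDEFINED.has(name)) return whole;
    const codePoint = HTML_ENTITIES.get(name);
    return codePoint === undefined ? whole : `&#${codePoint};`;
  });
}

console.log(toXmlSafeEntities("<p>Fish&nbsp;&amp;&nbsp;chips &mdash; &copy; 2018</p>"));
// prints: <p>Fish&#160;&amp;&#160;chips &#8212; &#169; 2018</p>
```

Numeric references are used because they are valid in any XML document without a DTD, whereas HTML's named entities are not.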
Successfully merging this pull request may close these issues.

Cheerio version uses deprecated packages