QUESTION: Should the lib stop caring about encoding/cleaning/fixing content.data? #49

pedrosanta · 2018-03-07T16:48:51Z

Hello,

Following up on @jamesporter #38, I had this question for a while: should it really be responsibility of the lib to be concerned with the data the lib users add on each content, namely on cleaning up/encoding content.data?

In the past there was an effort to make the lib produce valid version 2 and 3 EPUBs, but I feel that concern should only be present in the parts directly to do with the lib, and should be the responsibility of the lib user to provide valid EPUB 2/3 XHTML for their added contents.

This way the issue @jamesporter raised on #38 would not occur too.

So, I would vote to remove this verification/cleanup/encoding from the lib all together. I can still admit the warning about unsupported tags, but even so, I think it should warn but not alter the content user added to its EPUB.

@cyrilis @jamesporter thoughts?

The text was updated successfully, but these errors were encountered:

cyrilis · 2018-03-08T01:42:17Z

Hi, @pedrosanta
I agree with you on your opinion that lib user should be responsible for the content validation and we shouldn't alter the content just for making less warnings. I always considered it as a tool instead of a lib, which is not quite accurate, and sorry for recently change caused by this.

So I vote to remove verification/cleanup/encoding, too. lib user should make sure they passed "correct" content, which makes this lib more pure.

I guess we can even remove the cheerio dependency, right?

pedrosanta · 2018-03-08T04:53:36Z

Thanks for the reply @cyrilis, glad to see you agree. 🙂 Just merged #51 which removes the entities encoding of content.data (because it was wrong and causing all sorts of issues) which goes along this line.

I want to think a bit more about your comment and reply a bit more in detail soon. But yhea, I would say we could deepen/simplify how we're processing the content.data, like, namely on that allowed tags part and all that, I think that could be rethought – although I feel warning for possible validation issues to be a plus, I would say we shouldn't alter/try to guess what the correct contents would be and I also feel we could do it in a more robust way if that's so, probably using a more complete library or so to avoid situations where we miss certain tags/etc like #34.

pedrosanta · 2018-03-23T12:28:47Z

@cyrilis:

I agree with you on your opinion that lib user should be responsible for the content validation and we shouldn't alter the content just for making less warnings.

Yes, even the allowed attributes and allowed tags mechanism cleanup, I wonder if isn't better to simplify and remove that.

As in terms of validation, I think the lib should at most warn about invalid contents, but like we said above, not alter the contents. But in that case, I think it could be better to look out for another lib that gives us a more robust/complete validation instead of keeping a manual array of allowed tags and attributes. 🙂

I always considered it as a tool instead of a lib, which is not quite accurate, and sorry for recently change caused by this.

It's cool, we're all here to help/contribute. #praiseOSS 😎 🙌

Anyway, I don't think those two are like, mutually exclusive. Like, with a bit of an effort we could pick the idea suggested on #13 and add a CLI tool to this as well, while allowing it to work as a lib too. Just a thought.

So I vote to remove verification/cleanup/encoding, too. lib user should make sure they passed "correct" content, which makes this lib more pure.

Ok, so, I already removed the entities encoding part on #51 but there's still some code to check and clean the contents (namely the allowed attributes/tags part, etc), but yes, I agree we could simplify this – making the lib leaner/pure like you said.

I'll draft a PR removing/simplifying this as soon as I can. 🙂

I guess we can even remove the cheerio dependency, right?

We can... 🤔 (and rely on Regexes, or so)... but, it does come handy to parse all the image URLs/src's. IDK, perhaps we can keep it for now – basically for reason mentioned before – and reassess its need later on, WDYT?

Cheers, thanks for the comments @cyrilis! 🙌

cyrilis · 2018-03-26T03:25:21Z

@pedrosanta
Thanks for your reply, and thanks for your contribution. cheerio will still be in our dependency list.
Haha, we still need it to parse the HTMLs. And I think CLI tool is also a good idea, which could makes it more easy to use.

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

pedrosanta added the question label Mar 7, 2018

pedrosanta mentioned this issue Mar 23, 2018

QUESTION: More robust tests? #57

Open

zbrongyaozb pushed a commit to zbrongyaozb/epub-gen that referenced this issue Aug 2, 2022

Update all non-major dependencies (cyrilis#49)

a5fd21e

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

QUESTION: Should the lib stop caring about encoding/cleaning/fixing content.data? #49

QUESTION: Should the lib stop caring about encoding/cleaning/fixing content.data? #49

pedrosanta commented Mar 7, 2018

cyrilis commented Mar 8, 2018

pedrosanta commented Mar 8, 2018 •

edited

Loading

pedrosanta commented Mar 23, 2018

cyrilis commented Mar 26, 2018

QUESTION: Should the lib stop caring about encoding/cleaning/fixing content.data? #49

QUESTION: Should the lib stop caring about encoding/cleaning/fixing content.data? #49

Comments

pedrosanta commented Mar 7, 2018

cyrilis commented Mar 8, 2018

pedrosanta commented Mar 8, 2018 • edited Loading

pedrosanta commented Mar 23, 2018

cyrilis commented Mar 26, 2018

pedrosanta commented Mar 8, 2018 •

edited

Loading