Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML ordered list: recognize the 'type' attribute #214

Merged
merged 3 commits into from
Jan 21, 2021

Conversation

ourairquality
Copy link
Contributor

No description provided.

@poire-z
Copy link
Contributor

poire-z commented Jan 6, 2021

This can already be achieved by using html5.css:

ol[type=1], li[type=1] {
list-style-type: decimal;
}
ol[type=a], li[type=a] {
list-style-type: lower-alpha;
}
ol[type=A], li[type=A] {
list-style-type: upper-alpha;
}
ol[type=i], li[type=i] {
list-style-type: lower-roman;
}
ol[type=I], li[type=I] {
list-style-type: upper-roman;
}

(There are a lot less stuff in epub.css, for historical reasons, and to have a simpler CSS.)

When HTML attributes can map directly to CSS properties, we can do it via CSS. It might be more expensive, but it avoids adding too much such stuff in the code.
(Not fond of having that in setNodeStyle(), which is already crowded with the pure CSS pecularities.)

@Frenzie
Copy link

Frenzie commented Jan 6, 2021

It might be more expensive

Perhaps a good motivation to optimize those code paths, too. grins

@ourairquality
Copy link
Contributor Author

Ok, thanks and that works well and does look cleaner. Updated the PR to add these to all the epub.css files in case that is ok to land now.

@poire-z
Copy link
Contributor

poire-z commented Jan 6, 2021

(Dunno about CoolReader, but on our KOReader side, one can easily switch from epub.css to html5.css - epub.css is quite smaller, and does not have all the attribute matches that html5.css has - may be it could benefit from all of them, but given that it's available in html5.css, I think it's not really necessary - and the spirit of EPUB is to privilege CSS other HTML attributes - HTML3/4 had deprecated quite a few of them, even if HTML5 seems to have added new ones...)

@ourairquality
Copy link
Contributor Author

The type attribute appears to be valid for the <ol> element in the EPUB standard so why not support it. If html5.css can be included easily in coolreader then that was not obvious to me so perhaps not for users in general. It might be appropriate to not add the type attribute for the <li> element in the epub.css files as that is not valid EPUB anyway, but the main commercial readers appear to support it.

@poire-z
Copy link
Contributor

poire-z commented Jan 6, 2021

But then, there's no reason to privilege your ol[type] over the many other elem[attr=] that are present in html5.css (ul[type=circle i] thead[hidden] table[rules=cols i] caption[align=bottom i] p[align=center i] table[border] hr[align=right]...)
So, if we put them all in epub.css, we get a huge epub.css that is nearly equal to html5.css. The main difference would be the styling of P (in html5.css: no text-indent, top & bottom margins, like in web browsers - which does not look super nice with books).
Also, in crengine, there are some legacy parsing of attributes, that are done only for HTML files, but not with EPUB files (that might be why I got the feeling of EPUBs privileging CSS over attributes).

The way to go about that is a CoolReader decision (I think I'll keep our short epub.css in KOReader). But having just ol[type] added to epub.css feels a bit clumsy/too quick.

@ourairquality
Copy link
Contributor Author

The EPUB standard could be implement. The type attribute does not not appear to be valid on the ul element in EPUB so that could rule that one out, and could check if the others are valid in EPUB and only add those that are, and that process might keep the epub.css smaller. It's just a request, to try to be a little more compatible with the EPUB standard.

@poire-z
Copy link
Contributor

poire-z commented Jan 6, 2021

Might be nice if you can achieve that, and the difference is substantial.
What's your reference for what's valid in EPUB ?

@ourairquality
Copy link
Contributor Author

Have been using epubcheck as a reference but that must be based on publications: https://github.com/w3c/epubcheck

There is an online implementation here: http://validator.idpf.org/

For the lists, I was looking at the rules in:
https://github.com/w3c/epubcheck/blob/master/src/main/resources/com/adobe/epubcheck/schema/30/mod/html5/block.rnc

Also I check a range of other implementations. Many of the Android readers support these html attributes, but fwiw my kobo eink reader does not appear to support start, reversed, or type on list elements.

@poire-z
Copy link
Contributor

poire-z commented Jan 6, 2021

OK, good luck :)
(For completeness, you can find my reasoning for epub.css vs html5.css in koreader/koreader#6654 (comment) and the 2 crengine links.)

@ourairquality
Copy link
Contributor Author

Thank you for the perspective, but it still seems helpful to try to get the list labels right, to follow the standard on that specific matter. It's not clear how it helps the reader on a particular device to get the labels wrong, it does not seems like a formatting preference.

Fwiw my Kobo eink reader appears to accept the start, reversed, and type attributes when using the kepub format, and also accepts type on the li element, consistent with many other readers.

@Frenzie
Copy link

Frenzie commented Jan 6, 2021

The type attribute does not not appear to be valid on the ul element in EPUB so that could rule that one out,

Link and quote of the exact part of the spec where it says that or epubcheck is wrong.

@Frenzie
Copy link

Frenzie commented Jan 6, 2021

Note a slight distinction between EPUB 2 and 3.

http://idpf.org/epub/201

Any construct deprecated in XHTML 1.1 is either deprecated or omitted from this specification [that ; CSS-based equivalents are provided in most such cases. Style sheet constructs are also used for new presentational functionality beyond that provided in XHTML.

https://www.w3.org/publishing/epub32/epub-contentdocs.html#sec-overview-relations-html

This specification does not reference a specific version of W3C [HTML], but instead uses an undated reference that will always point to the latest recommendation. This approach ensures that EPUB will always keep pace with changes to the HTML standard. Authors and Reading System developers will need to keep track of changes to HTML, and ensure that their processes and systems are kept up to date.

@poire-z
Copy link
Contributor

poire-z commented Jan 6, 2021

So, that could explain:
epub.css = EPUB2, that limited attributes and stuff to what was defined and cleaned up in XHTML1.1.
html5.css = EPUB3, that decided to stop worrying and love HTML5 wherever it goes?

@Frenzie
Copy link

Frenzie commented Jan 6, 2021

I don't think there are many books that would even validate as XHTML 1.1. Some might by skirting around the spirit of it completely, but I think most are something closer to XHTML 1.0 Transitional or XHTML 5 (that's HTML 5 written as well-formed XML). And frankly that's how it should be. ;-)

@ourairquality
Copy link
Contributor Author

The type attribute does not not appear to be valid on the ul element in EPUB so that could rule that one out,

Link and quote of the exact part of the spec where it says that or epubcheck is wrong.

EPUB 3 references HTML, and the HTML living standard does not include type as an attribute on the ul element:
https://html.spec.whatwg.org/multipage/grouping-content.html#the-ul-element

Might there be a different issue here, that coolreader should distinguish epub2 from epub3 and then use separate css files if the css file is effectively adding in support for epub3? I see it getting the file name in LVDocFormatCssFileName(). Could the format doc_format_epub be split into doc_format_epub2 and doc_format_epub3 if people really want a restricted epub2 path, or could people be happy supporting more of epub3 and the html living standard even when loading an epub2 file?

@poire-z
Copy link
Contributor

poire-z commented Jan 6, 2021

EPUB 3 references HTML, and the HTML living standard does not include type as an attribute on the ul element

But who knows if at the time an EPUB book was produced, the HTML standard of the time did not accept it :) We're consumers, we should be the most flexible, and accept most of what may have been accepted at any time, as long as there is no incompatibility.

So then, just use html5.css, and don't touch, and consider epub.css obsolete.
Or copy html5.css to epub3.css, and report back in the little adaptations we did for better books - but then, some EPUBs expect to have the base CSS as web browsers, as I mentionned in the issue I linked above. So, use html5.css with no adaptations :/
It's not an easy decision, for me.
(In KOReader, again, users can set any as the default, so CoolReader's decision if that's not possible.)

@Frenzie
Copy link

Frenzie commented Jan 6, 2021

EPUB 3 references HTML, and the HTML living standard does not include type as an attribute on the ul element:

Interesting, so they actually deprecated that.

https://www.w3.org/TR/html4/struct/lists.html#type-values

that coolreader should distinguish epub2 from epub3

It already does where appropriate (i.e., some metadata stuff).

or could people be happy supporting more of epub3 and the html living standard even when loading an epub2 file?

HTML5 reflects how people actually use HTML (which apparently doesn't include using type disc/square/circle, and indeed I've only used CSS for that for many years ;-) ), in stark contrast to XHMTL 1.1. I can't imagine who'd want such a thing. ^_^

@Frenzie
Copy link

Frenzie commented Jan 6, 2021

To be clear, only disc/circle/square are/were technically supposed to be supported on UL. In practice I don't know if anybody's ever bothered to restrict it, but presumably some early browsers did more or less as an implementation accident.

@poire-z
Copy link
Contributor

poire-z commented Jan 6, 2021

Could the format doc_format_epub be split into doc_format_epub2 and doc_format_epub3 if people really want a restricted epub2 path, or could people be happy supporting more of epub3 and the html living standard even when loading an epub2 file?

Distinguishing is no-use, we're consumers, we might have epub2 books that contains obsolete or newer tags, better support them than be rigid for sake of specs. So, use html5.css. (And possibly have some users complain about the changes in styles for P et al.)

See top of html5.css for its origin, might be old, or might have been made compatible if it has ul[type], dunno.

@ourairquality
Copy link
Contributor Author

Could the format doc_format_epub be split into doc_format_epub2 and doc_format_epub3 if people really want a restricted epub2 path, or could people be happy supporting more of epub3 and the html living standard even when loading an epub2 file?

Distinguishing is no-use, we're consumers, we might have epub2 books that contains obsolete or newer tags, better support them than be rigid for sake of specs. So, use html5.css. (And possibly have some users complain about the changes in styles for P et al.)

Great, I think we are on the same page as I also support being permissive. Perhaps I misunderstood you, you are suggesting effectively replacing the coolreader epub.css with the html5.css, not bothering to limit HTML support in EPUB2 Could that be a separate PR, as my focus here was to just improve the list label support and I would not this request to fail to get support just because people did not want to replace epub.css with all of html5.css.

@poire-z
Copy link
Contributor

poire-z commented Jan 7, 2021

Could that be a separate PR, as my focus here was to just improve the list label support and I would not this request to fail to get support just because people did not want to replace epub.css with all of html5.css.

Not my decision, but CoolReader's. I guess it hurts nothing, it just feels like a job half done :)

Anyway, if you can (dunno if that's possible, might need an android rooted device), you could try coyping html5.css to epub.css, and see how that looks on multiple books, to get another idea if it could be a viable default (my personal opinion is that I don't like it).

@poire-z
Copy link
Contributor

poire-z commented Jan 7, 2021

it just feels like a job half done

But in any case, let this ugly other half of the job to me (not before a few months): it we're putting more attr matches in epub.css, we also need to remove these kind of handling (that are done only on single HTML files, not with EPUBs):

// Not sure this is to be done here: we get attributes as they are read,
// so possibly before or after a style=, that the attribute may override.
// Hopefully, a document use either one or the other.
// (Alternative: in lvrend.cpp when used, as fallback when there is
// none specified in node->getStyle().)
// HTML align= => CSS text-align:
// Done for all elements, except IMG and TABLE (for those, it should
// translate to float:left/right, which is ensured by epub.css)
// Should this be restricted to some specific elements?
if ( !lStr_cmp(attrname, "align") && (id != el_img) && (id != el_table) ) {
lString32 align = lString32(attrvalue).lowercase();
if ( align == U"justify")
appendStyle( U"text-align: justify" );
else if ( align == U"left")
appendStyle( U"text-align: left" );
else if ( align == U"right")
appendStyle( U"text-align: right" );
else if ( align == U"center")
appendStyle( U"text-align: center" );
return;
}
// For the table & friends elements where we do support the following styles,
// we translate these deprecated attributes to their style equivalents:
//
// HTML valign= => CSS vertical-align: only for TH & TD (as lvrend.cpp
// only uses it with table cells (erm_final or erm_block))
if (id == el_th || id == el_td) {
// Default rendering for cells is valign=baseline
if ( !lStr_cmp(attrname, "valign") ) {
lString32 valign = lString32(attrvalue).lowercase();
if ( valign == U"top" )
appendStyle( U"vertical-align: top" );
else if ( valign == U"middle" )
appendStyle( U"vertical-align: middle" );
else if ( valign == U"bottom")
appendStyle( U"vertical-align: bottom" );
return;
}
}

@Frenzie
Copy link

Frenzie commented Jan 7, 2021

Anyway, if you can (dunno if that's possible, might need an android rooted device), you could try coyping html5.css to epub.css, and see how that looks on multiple books, to get another idea if it could be a viable default (my personal opinion is that I don't like it).

What if we @import html5.css in epub.css to keep the current "us" vs "browser" default styles, and override the margins on a few elements? Or something along those lines, anyway.

@poire-z
Copy link
Contributor

poire-z commented Jan 7, 2021

(CR on Android might feed these CSS content as a string, and might not have @import processed.)

I think I would prefer something curated, and the relevant stuff handpicked.
Our current html5.css has lots of crap and possibly uneeded stuff. I took it from some place, just adapted it so it works with DocFragments and page breaks, removed some obvious unsupported stuff like bidi CSS, not much more thoughts than that. It's just there as a quick solution, but live with any side effects.

epub.css also tries to ensure compatibility with old epub.css highlights (with its -cr-ignore-if-dom-version-greater-or-equal stuff at end). Dunno how these combinations would work.

Also, having tons of such things might increase loading/rendering time (by how much, I dunno):

table[rules=groups i] > tfoot > tr > td, table[rules=groups i] > tfoot > tr > th,
table[rules=rows i] > tfoot > tr > td, table[rules=rows i] > tfoot > tr > th,
table[rules=cols i] > tfoot > tr > td, table[rules=cols i] > tfoot > tr > th,
table[rules=all i] > tfoot > tr > td, table[rules=all i] > tfoot > tr > th {
border-color: black;
}

p[align=right i], h1[align=right i], h2[align=right i], h3[align=right i],
h4[align=right i], h5[align=right i], h6[align=right i] {
text-align: right;
}

table[rules=cols i] > tfoot > tr > td, table[rules=cols i] > tfoot > tr > th {
border-width: 1px;
border-style: none solid;
}

applet[align=texttop i], embed[align=texttop i], iframe[align=texttop i],
img[align=texttop i], input[type=image i][align=texttop i], object[align=texttop i] {
vertical-align: text-top;
}

I don't think Calibre has as much stuff.

But yes, a proper html5-base.css + some overrides for epub.css look (which could be a single Style tweak with KOReader, but it would need some thoughts for the UI) , might be a solution.

Implemented in CSS, but this is a standard HTML attribute.
@ourairquality
Copy link
Contributor Author

Also updated the htm.css files to support the ordered list type attribute.

@ourairquality
Copy link
Contributor Author

Many older documents still use type=square/circle/disc, even if it's deprecated now.

Is this a request to add more cases, and is that for the 'ol' or 'ul' and/or the 'li' element?

This is no longer part of the HTML standard, but support is included
for reading documents formatted to older standards.
@Frenzie
Copy link

Frenzie commented Jan 8, 2021

It's what I was implicitly referring to in my first comment, spec here. <ul type="disc/square/circle"> is valid HTML 4; I just didn't realize at the time of writing that it wasn't valid HTML5 because HTML5 undeprecated most stuff deprecated in XHTML 1.1.

@ourairquality
Copy link
Contributor Author

Rolled in similar additions for unordered lists, which seems to be requested for this PR.

@@ -67,6 +67,17 @@ ul {
list-style-type: disc;
margin-left: 1em;
}

ul[type=disc], li[type=disc] {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, like that. 👍

Common web browsers vary the default label for unordered lists when
nested, so follow this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants