hyphenate text output #1214
MartinNowak
commented
Jan 25, 2016
- justify text on all browsers
- use htmld and hyphenate libs w/ en-US pattern
- run dpl-docs w/ hyphenation
|
Based on #1213 |
|
@CyberShadow can you easily run |
|
Is this really better than just waiting for Blink to implement hyphens? I mean, what's so bad about Chrome and Opera having left aligned text? This adds so much extra stuff for a tiny detail. |
It's not much (+129 −34, we already have the libraries, both of which are simple and stable) and Chrome is unlikely to implement it soon. |
f8d01df to
f51b124
Compare
Done |
|
But it seems to me that it is a bug in dub if it doesn't re-check online when it sees a tag it has never seen before. |
Yes, we have to fix it. |
f51b124 to
4d6369b
Compare
|
This changes the output drastically. Among other things, the output is no longer HTML5. Edit: OK, not really, but I've gotten void tags to be uniformly not self-closed during my valid HTML pass a few months ago. |
|
Since this parses the HTML, can it also validate it? If not, I'd like it to keep the original HTML (as emitted by DMD) somewhere, so I can validate it. |
|
What exactly do you want to check? The parser is fairly forgiving but could likely be adapted to strictly validate its input. Is this a useful goal when we're post-processing the output anyhow? |
Pending PR eBookingServices/htmld#8. |
Things like syntax errors (unescaped
Yes, absolutely. These errors often mask larger problems that post-processing can only make worse. |
aa0da08 to
cb25d96
Compare
|
Ready from my side. |
|
Chrome was planning to get hyphenation early this year. The last update from a few days ago for Chromium was:
So they are actively working on it now but no ETA. I think this idea is clever but, personally, I think it'd be better to just wait. Nobody seems to know what effect this will have, if any, on search engine ranking. |
They haven't done this since 2012, and full-blown hyphenation support (including Arabic, and spelling rewrites for German) is considerably more complex than just using TeX hyphenation patterns.
A small search reveals that search engines are very well capable to ignore |
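For readers unfamiliar with TeX-style hyphenation patterns, here is a minimal Python sketch of the pattern-matching idea. It is not the code of the hyphenate library used in this PR. The names `TOY_PATTERNS`, `parse_pattern`, `hyphenation_points` and `soft_hyphenate` are hypothetical, the pattern list is a tiny hand-picked set in TeX pattern notation (a real en-US pattern file has thousands of entries), and the `left_min`/`right_min` values are assumptions.

```python
import re

SOFT_HYPHEN = "\u00ad"

# Toy pattern set for illustration only; digits are scores for the gap they sit in.
TOY_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and per-gap scores."""
    letters = re.sub(r"\d", "", pattern)
    scores = [0] * (len(letters) + 1)
    pos = 0  # number of letters seen so far, i.e. the index of the current gap
    for ch in pattern:
        if ch.isdigit():
            scores[pos] = int(ch)
        else:
            pos += 1
    return letters, scores


def hyphenation_points(word, patterns, left_min=2, right_min=3):
    """Return the indices in `word` before which a break is allowed."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    levels = [0] * (len(padded) + 1)   # levels[i] scores the gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            scores = table.get(padded[start:end])
            if scores:
                for offset, score in enumerate(scores):
                    levels[start + offset] = max(levels[start + offset], score)
    # An odd winning score allows a break, an even one forbids it.
    # padded[k + 1] is word[k], so the gap before word[k] is levels[k + 1].
    return [k for k in range(left_min, len(word) - right_min + 1)
            if levels[k + 1] % 2 == 1]


def soft_hyphenate(word, patterns=TOY_PATTERNS):
    """Join the word's fragments with U+00AD soft hyphens."""
    points = hyphenation_points(word, patterns)
    pieces = [word[a:b] for a, b in zip([0] + points, points + [len(word)])]
    return SOFT_HYPHEN.join(pieces)


print(soft_hyphenate("hyphenation").replace(SOFT_HYPHEN, "-"))
```

With this toy set the word "hyphenation" is split as hy-phen-ation. Words that match none of the patterns come back unchanged, which is why a realistic pattern file matters.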
|
Thanks, Martin! I like the idea of postprocessing. Took a look at the generated docs, they look beautiful. I'm unsure about hyphenation of function names, e.g. http://dtest.thecybershadow.net/artifact/website-502ec4a93049bfa74cfaa864418a7c3c9d064b76-dc749816785e8de0a55b98a287cf060c/web/phobos-prerelease/std_algorithm.html hyphenates "commonPrefix" and "filterBidirectional" etc. These particular hyphenations look nice but in general function/class/struct/etc names are not English (contain abbreviations, initials etc) so they shouldn't be hyphenated as English words. (In my book I only hyphenated such names by hand, in a few instances when text looked really ugly without.) Other than that, cool. I'm a bit wary about making the build process depend on an external library, but I guess that's the way to go. |
posix.mak
Shouldn't a dependency here be also on html? You need the html (of the site proper) done before you do the hyphenation.
|
For me the current unjustified text looks quite a bit better, especially on the homepage. Why all this complexity in the first place? |
|
https://kaiweber.wordpress.com/2010/05/31/ragged-right-or-justified-alignment/ seems to be a good source of information. The way I look at it is, if text width is low OR hyphenation is not available, then justified text is a bad choice and should be avoided. So this PR introduces hyphenation, which clears one aspect. I'm not sure about text width - I reduced the browser window to the minimum possible and (for the few pages I looked at) justification does not seem to produce unpleasant lakes and rivers, and also greatly improves information density. I like justification but only when done well. It's like coffee - we consumed it for hundreds of years and no study managed to find anything bad with it, when prepared well. People have set words on paper for others to see for hundreds of years. Hyphenation had good economic incentive (less consumed paper) so it was worthwhile for typographers to invest in it. But there is no economic incentive for justification, yet typographers have spent considerable research and development to do it well. For hundreds of years, in virtually all interior sizes and designs. That's ample anecdotal evidence. I do agree that justification (correx per @klickverbot: hyphenation, not justification) of electronic documents has been prevalently bad in recent years (most often it's done without hyphenation). That may have trained us to reject it wholesale, which is unwarranted. Getting back to the here and now. I think: (a) things have evolved to the point browsers do a decent layout of hyphenated justified text when columns are not too narrow; (b) this is an interesting differentiating feature of our pages; (c) the framework for postprocessing generated pages is a nice additional incentive. So I'm in favor of this. |
|
Oh, I'm not arguing against justification per se. I've done quite a bit of "serious" print design and layout work, and most of the time the body copy would have been justified. It's just that to my (rather trained) eyes it does not look particularly good in the current home page design anyway – probably because there is no consistent grid for the various elements, especially the Convenience/Power/Efficiency blocks – so I'm not sure whether it's worth the added complexity in the build process. If you don't mind the added steps and dependencies, then feel free to go ahead with this – it certainly doesn't look terrible either. I know it's been a long-time desire of yours, and at least we seem to have a solution now that's technically acceptable. We should probably disable hyphenation for function names, though (as you have already pointed out), and possibly also other symbols like language grammar references. As an aside, and following the academic tradition of waging intellectually intense but utterly insignificant discussions, let me point out that your claim that
is wrong as per your own statements, at least taking "words per screen area" as the definition for information density. It's hyphenation that leads to better use of layout space, not justification (barring different hyphenation engine settings between the ragged and justified cases, of course). |
|
With hyphenate.js we had issues where text copied from the web page would contain these hidden Another issue is that this makes documentation diffs harder to review - looking at the diffs generated by the doc autotester you'll see the Honestly, considering that only Chrome doesn't support built-in hyphenation and they plan to add it, this seems to me like a solution to a non-problem. |
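One way the review concern could be handled is to normalize both sides before diffing. The Python sketch below assumes the inserted characters are U+00AD soft hyphens, written either literally or as the `&shy;` entity (the commit message speaks of adding soft hyphens to text). The function names are hypothetical, and this is not part of the doc autotester.

```python
import difflib

SOFT_HYPHEN = "\u00ad"


def strip_soft_hyphens(text):
    """Remove literal soft hyphens and the &shy; entity from a string."""
    return text.replace(SOFT_HYPHEN, "").replace("&shy;", "")


def diff_ignoring_soft_hyphens(old, new):
    """Unified diff of two documents with soft hyphens removed from both."""
    old_lines = strip_soft_hyphens(old).splitlines()
    new_lines = strip_soft_hyphens(new).splitlines()
    return list(difflib.unified_diff(old_lines, new_lines, lineterm=""))


before = "<p>documentation</p>"
after = "<p>doc\u00adumen&shy;tation</p>"
print(diff_ignoring_soft_hyphens(before, after))  # no differences remain
```

The same `strip_soft_hyphens` step would also clean up text copied out of a hyphenated page.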
@klickverbot yes, sorry I meant hyphenation |
@klickverbot @CyberShadow I think this becomes a discussion of the framework's value. (1) If the framework will have many future uses, hyphenation is just a first application, a proof of concept that we can later keep or phase out. (2) If the framework has only this one use, it counts as a liability rather than an asset on this PR's pros and cons sheet. @MartinNowak could you enlist a few more possible future uses of your framework? And thanks very much for the work! |
Of course there shouldn't be any hyphenation in code examples, this PR adds a few more dont_hyphenate classes. The dependency argument is moot, we use a pinned version of the well-written and simple htmld library which has no further dependencies other than phobos, and I hadn't updated the hyphenate library in 2 or 3 years. The times when D was so unstable that you couldn't rely on libraries are over.
Static TOC generation, automatic cross-referencing (2-pass process), spell checker, extraction of keywords. I'd like to see that we put more effort into dpl-docs which can easily do all of the above, and I think all the effort on nicer ddoc output was a success but also a waste of time. Work we put into ddox improves docs for dlang.org and many other D libraries. For the time being let's just do it, progress in Chrome is blocked atm., and they haven't been able to implement this in the past 5 years. Regarding hyphenation of function names, I already disabled hyphenation for any code blocks (and also the grammar). If you find something that's missing, let's simply add it. |
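The skip rule described above (no hyphenation inside code blocks or inside elements carrying a `dont_hyphenate` class) can be sketched as follows. This is a Python stand-in built on the standard `html.parser` module, not the htmld-based D code in this PR. `SoftHyphenFilter`, `hyphenate_html`, `toy_hyphenate` and the exact set of skipped tags are assumptions made for illustration.

```python
import re
from html.parser import HTMLParser

SOFT_HYPHEN = "\u00ad"
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
SKIP_TAGS = {"pre", "code", "script", "style"}  # assumed set of excluded tags


class SoftHyphenFilter(HTMLParser):
    """Re-emit HTML, hyphenating text except inside excluded elements."""

    def __init__(self, hyphenate_word):
        super().__init__(convert_charrefs=False)  # pass entities through untouched
        self.hyphenate_word = hyphenate_word
        self.out = []
        self.stack = []  # (tag, skip) for every open non-void element

    def skipping(self):
        return bool(self.stack) and self.stack[-1][1]

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag in VOID_TAGS:
            return  # void elements never get an end tag
        classes = (dict(attrs).get("class") or "").split()
        skip = self.skipping() or tag in SKIP_TAGS or "dont_hyphenate" in classes
        self.stack.append((tag, skip))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
        for i in range(len(self.stack) - 1, -1, -1):  # close the nearest match
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not self.skipping():
            data = re.sub(r"[A-Za-z]+",
                          lambda m: self.hyphenate_word(m.group()), data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def hyphenate_html(html, hyphenate_word):
    parser = SoftHyphenFilter(hyphenate_word)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def toy_hyphenate(word):
    """Placeholder hyphenator: one break after three letters of long words."""
    return word[:3] + SOFT_HYPHEN + word[3:] if len(word) >= 8 else word


page = '<p>documentation for <code>filterBidirectional</code></p>'
print(hyphenate_html(page, toy_hyphenate).replace(SOFT_HYPHEN, "-"))
```

In a real pipeline `toy_hyphenate` would be replaced by a pattern-based hyphenator; the point here is only that names inside `<code>` or a `dont_hyphenate` element pass through unchanged.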
I think it should actually be placed under web/ but excluded from rsync, so that it's inspectable, shows up in autotester diffs, but not actually uploaded to dlang.org. |
Showing not the actual diff might be misleading if we start to do more w/ this. |
c28f581 to
fdc6851
Compare
The idea is to show both. The diffs after running this tool are difficult to review, because of all the inserted |
If you really think it's that important, I can try to keep a copy of the original html files. |
|
I just think it makes more sense. If HTML validation results in an error, you won't be able to see the generated HTML in the doc autotester otherwise. |
|
What's left to do here? |
Can we just turn it into a 2-step process for your tester @CyberShadow? I guess that's the main reason why this is still blocked. |
- add soft hyphens to text
- justify text on all browsers
- use htmld and hyphenate libs w/ en-US pattern
- run dpl-docs w/ hyphenation
That will be much less of an issue once the initial conversion is done. |
|
Well, shall we decide on this YTD? I'm in favor. @MartinNowak, any bitrot to worry about? |
|
Chrome supports hyphens as of this month. Was there any other browser that needed this emulation?
On Sat, Dec 24, 2016, 3:18 AM Andrei Alexandrescu wrote:
> Well, shall we decide on this YTD? I'm in favor. @MartinNowak, any bitrot to worry about?
|
|
We now use a Ddoc preprocessor, so adding a post-processor won't be too hard. |
|
Is @MartinNowak's code based on TeX's hyphenation algorithm? That's awesome!! I'm literally reading right now a book (the famed "Fire and Fury" incidentally) on the Kindle. On my portable Paperwhite, there's no support for hyphenation. However the text is still justified, and looks horrible. On the Kindle laptop application, the display fits two pages at about the same pitch size, also justified, but beautifully hyphenated. Night and day difference. The net consequence is I lug my laptop with me wherever I can if I want to read the book. I can't bring myself to read on the Paperwhite anymore. I'm very much in favor of adding static hyphenation to our docs, they'll look a lot better on portables and small screens. |
andralex
left a comment
I'll approve this in hope it gets attention :)