hyphenate text output #1214

MartinNowak · 2016-01-25T16:26:01Z

justify text on all browsers
use htmld and hyphenate libs w/ en-US pattern
run dpl-docs w/ hyphenation

MartinNowak · 2016-01-25T16:26:29Z

Based on #1213

MartinNowak · 2016-01-25T16:29:36Z

@CyberShadow can you easily run dub clean-caches on the tester?
We just tagged ddox-0.12.1 and the tester doesn't yet know about it.
Otherwise we'll have to wait a day until the cache gets invalidated.

JackStouffer · 2016-01-25T17:49:02Z

Is this really better than just waiting for Blink to implement hyphens? I mean, what's so bad about Chrome and Opera having left aligned text?

This adds so much extra stuff for a tiny detail.

MartinNowak · 2016-01-25T17:59:22Z

Is this really better than just waiting for Blink to implement hyphens? I mean, what's so bad about Chrome and Opera having left aligned text?

This adds so much extra stuff for a tiny detail.

It's not much (+129 −34, we already have the libraries, both of which are simple and stable) and Chrome is unlikely to implement it soon.
Let's not derail into a pro/con justified text debate; one of the main parts here is HTML post-processing, which can be useful for other things.

CyberShadow · 2016-01-25T19:15:51Z

@CyberShadow can you easily run dub clean-caches on the tester?

Done

CyberShadow · 2016-01-25T19:16:31Z

But it seems to me that it is a bug in dub if it doesn't re-check online when it sees a tag it has never seen before.

MartinNowak · 2016-01-25T21:44:24Z

But it seems to me that it is a bug in dub if it doesn't re-check online when it sees a tag it has never seen before.

Yes, we have to fix it.
Fetch doesn't work for recently updated package · Issue #528 · D-Programming-Language/dub

CyberShadow · 2016-01-25T22:40:12Z

This changes the output drastically.

Among other things, the output is no longer HTML5. Edit: OK, not really, but I've gotten void tags to be uniformly not self-closed during my valid HTML pass a few months ago.

CyberShadow · 2016-01-25T22:53:00Z

Since this parses the HTML, can it also validate it? If not, I'd like it to keep the original HTML (as emitted by DMD) somewhere, so I can validate it.

MartinNowak · 2016-01-26T04:17:59Z

What exactly do you want to check? The parser is fairly forgiving but could likely be adapted to strictly validate its input. Is this a useful goal when we're post-processing the output anyhow?

MartinNowak · 2016-01-26T09:04:00Z

OK, not really, but I've gotten void tags to be uniformly not self-closed during my valid HTML pass a few months ago.

Pending PR eBookingServices/htmld#8.

CyberShadow · 2016-01-26T09:53:05Z

What exactly do you want to check?

Things like syntax errors (unescaped <>&) and mismatched/unclosed tags. You can look at my HTML fixes PRs, they were detected by a tool.

Is this a useful goal when we're post-processing the output anyhow?

Yes, absolutely. These errors often mask larger problems that post-processing can only make worse.

MartinNowak · 2016-01-27T14:25:54Z

Ready from my side.

brad-anderson · 2016-01-27T21:25:19Z

Chrome was planning to get hyphenation early this year. The last update from a few days ago for Chromium was:

We are currently blocked on an upstream dependency: the hyphenation library
we are planning to use in chromium, which needs to be cleaned up before we
can open-source it.

Unfortunately, I have no progress to report yet; I'm planning to sit down
with the library developer soon, and will post back here with an update
when I have it.

So they are actively working on it now but no ETA.

I think this idea is clever but, personally, I think it'd be better to just wait. Nobody seems to know what effect this will have, if any, on search engine ranking.

MartinNowak · 2016-01-27T22:06:55Z

So they are actively working on it now but no ETA.

They haven't done this since 2012, and full-blown hyphenation support (including Arabic, and spelling rewrites for German) is considerably more complex than just using TeX hyphenation patterns.
If they're progressing, nice, but I wouldn't expect anything any time soon.

Nobody seems to know what effect this will have, if any, on search engine ranking.

A quick search reveals that search engines are perfectly capable of ignoring &shy;, but a few (Google) will use it to additionally index split words.
SEO writing should not affect spelling
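
For readers unfamiliar with the TeX approach mentioned above, here is a minimal Python sketch of pattern-based hyphenation (Liang's algorithm). It is only an illustration under stated assumptions: the pattern list is a tiny set that covers just the word "hyphenation", not the en-US pattern file this PR uses, and the function names are hypothetical rather than the API of the D hyphenate library.

```python
import re

SOFT_HYPHEN = "\u00ad"

# Tiny illustrative pattern set (assumption: NOT the real en-US pattern file).
# Digits between letters are break scores: odd allows a break, even forbids it.
RAW_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split 'hy3ph' into its letters 'hyph' and per-gap scores [0, 0, 3, 0, 0]."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


PATTERNS = dict(parse_pattern(p) for p in RAW_PATTERNS)


def hyphenation_points(word, patterns=PATTERNS, left_min=2, right_min=3):
    """Return the indices k such that a break is allowed before word[k]."""
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = patterns.get(padded[start:end])
            if values is None:
                continue
            # Every matching pattern contributes; the highest score per gap wins.
            for offset, value in enumerate(values):
                scores[start + offset] = max(scores[start + offset], value)
    # scores[k + 1] is the gap before word[k]; keep a minimum fragment on each side.
    return [k for k in range(left_min, len(word) - right_min + 1)
            if scores[k + 1] % 2 == 1]


def soft_hyphenate(word, patterns=PATTERNS):
    """Insert U+00AD at every allowed break point."""
    pieces, last = [], 0
    for k in hyphenation_points(word, patterns):
        pieces.append(word[last:k])
        last = k
    pieces.append(word[last:])
    return SOFT_HYPHEN.join(pieces)


print(soft_hyphenate("hyphenation").replace(SOFT_HYPHEN, "-"))
```

With these nine patterns the word comes out as hy-phen-ation; a word that matches no pattern is returned unchanged, which is why a real pattern file contains thousands of entries.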

andralex · 2016-02-03T03:27:03Z

Thanks, Martin! I like the idea of postprocessing. Took a look at the generated docs, they look beautiful.

I'm unsure about hyphenation of function names, e.g. http://dtest.thecybershadow.net/artifact/website-502ec4a93049bfa74cfaa864418a7c3c9d064b76-dc749816785e8de0a55b98a287cf060c/web/phobos-prerelease/std_algorithm.html hyphenates "commonPrefix" and "filterBidirectional" etc. These particular hyphenations look nice but in general function/class/struct/etc names are not English (contain abbreviations, initials etc) so they shouldn't be hyphenated as English words. (In my book I only hyphenated such names by hand, in a few instances when text looked really ugly without.)

Other than that, cool. I'm a bit wary about making the build process depend on an external library, but I guess that's the way to go.

andralex · 2016-02-03T03:31:57Z

posix.mak

Shouldn't a dependency here be also on html? You need the html (of the site proper) done before you do the hyphenation.

dnadlinger · 2016-02-03T04:08:15Z

For me the current unjustified text looks quite a bit better, especially on the homepage. Why all this complexity in the first place?

andralex · 2016-02-05T18:12:40Z

https://kaiweber.wordpress.com/2010/05/31/ragged-right-or-justified-alignment/ seems to be a good source of information. The way I look at it is, if text width is low OR hyphenation is not available, then justified text is a bad choice and should be avoided.

So this PR introduces hyphenation, which clears one aspect. I'm not sure about text width - I reduced the browser window to the minimum possible and (for the few pages I looked at) justification does not seem to produce unpleasant lakes and rivers, and also greatly improves information density.

I like justification but only when done well. It's like coffee - we consumed it for hundreds of years and no study managed to find anything bad with it, when prepared well. People have set words on paper for others to see for hundreds of years. Hyphenation had good economic incentive (less consumed paper) so it was worth for typographers to invest in it. But there is no economic incentive for justification, yet typographers have spent considerable research and development to do it well. For hundreds of years, in virtually all interior sizes and designs. That's ample anecdotal evidence.

I do agree that justification (correx per @klickverbot: hyphenation, not justification) of electronic documents has been prevalently bad in recent years (most often it's done without hyphenation). That may have trained us to reject it wholesale, which is unwarranted.

Getting back to the here and now. I think: (a) things have evolved to the point browsers do a decent layout of hyphenated justified text when columns are not too narrow; (b) this is an interesting differentiating feature of our pages; (c) the framework for postprocessing generated pages is a nice additional incentive. So I'm in favor of this.

dnadlinger · 2016-02-05T20:08:37Z

Oh, I'm not arguing against justification per se. I've done quite a bit of "serious" print design and layout work, and most of the time the body copy would have been justified. It's just that to my (rather trained) eyes it does not look particularly good in the current home page design anyway – probably because there is no consistent grid for the various elements, especially the Convenience/Power/Efficiency blocks –, so I'm not sure whether it's worth the added complexity in the build process.

If you don't mind the added steps and dependencies, then feel free to go ahead with this – it certainly doesn't look terrible either. I know it's been a long-time desire of yours, and at least we seem to have a solution now that's technically acceptable. We should probably disable hyphenation for function names, though (as you have already pointed out), and possibly also other symbols like language grammar references.

As an aside, and following the academic tradition of waging intellectually intense but utterly insignificant discussions, let me point out that your claim that

justification […] also greatly improves information density

is wrong as per your own statements, at least taking "words per screen area" as the definition for information density. It's hyphenation that leads to better use of layout space, not justification (barring different hyphenation engine settings between the ragged and justified cases, of course).

CyberShadow · 2016-02-05T21:30:46Z

With hyphenate.js we had issues where text copied from the web page would contain these hidden &shy; characters, and you would get mysterious compiler errors if you tried to paste them in code files. Will this cause such issues all over again?

Another issue is that this makes documentation diffs harder to review - looking at the diffs generated by the doc autotester you'll see the &shy; noise all over. (Yeah, they could be filtered out, but then the diffs would no longer represent what's actually going to go up on dlang.org.)

Honestly, considering that only Chrome doesn't support built-in hyphenation and they plan to add it, this seems to me like a solution to a non-problem.
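
To make the filtering idea above concrete, here is a small Python sketch that strips soft hyphens before diffing two versions of a page. It is a hypothetical helper, not part of the doc autotester, and it assumes the soft hyphens appear either as the raw U+00AD character or as the `&shy;` entity. As noted above, the filtered diff no longer shows exactly what would be uploaded.

```python
import difflib

SOFT_HYPHEN = "\u00ad"


def strip_soft_hyphens(text):
    """Remove soft hyphens, both as the raw character and as the &shy; entity."""
    return text.replace(SOFT_HYPHEN, "").replace("&shy;", "")


def review_diff(old_html, new_html):
    """Unified diff of two HTML strings that ignores soft-hyphen noise."""
    old_lines = strip_soft_hyphens(old_html).splitlines()
    new_lines = strip_soft_hyphens(new_html).splitlines()
    return list(difflib.unified_diff(old_lines, new_lines, lineterm=""))


before = "<p>Returns the common prefix.</p>"
hyphenated = "<p>Re\u00adturns the com\u00admon pre&shy;fix.</p>"
edited = "<p>Re\u00adturns the longest com\u00admon pre&shy;fix.</p>"

print(review_diff(before, hyphenated))        # only hyphenation changed
print("\n".join(review_diff(before, edited)))  # a real wording change
```

The same `strip_soft_hyphens` call is what a user would need to run on text copied from a hyphenated page before pasting it into a source file.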

andralex · 2016-02-06T16:00:52Z

justification […] also greatly improves information density

is wrong as per your own statements

@klickverbot yes, sorry I meant hyphenation

andralex · 2016-02-06T16:04:56Z

I'm not sure whether it's worth the added complexity in the build process.

Honestly, considering that only Chrome doesn't support built-in hyphenation and they plan to add it, this seems to me like a solution to a non-problem.

@klickverbot @CyberShadow I think this becomes a discussion of the framework's value.

(1) If the framework will have many future uses, hyphenation is just a first application, a proof of concept that we can later keep or phase out.

(2) If the framework has only this one use, it counts as a liability rather than an asset on this PR's pros and cons sheet.

@MartinNowak could you list a few more possible future uses of your framework? And thanks very much for the work!

MartinNowak · 2016-02-13T15:30:21Z

With hyphenate.js we had issues where text copied from the web page would contain these hidden characters, and you would get mysterious compiler errors if you tried to paste them in code files. Will this cause such issues all over again?

Of course there shouldn't be any hyphenation in code examples, this PR adds a few more dont_hyphenate classes.

The dependency argument is moot: we use a pinned version of the well-written and simple htmld library, which has no dependencies other than Phobos, and I haven't updated the hyphenate library in 2 or 3 years. The times when D was so unstable that you couldn't rely on libraries are over.
There is also nothing complex or complicated about parsing html and processing text elements.

could you list a few more possible future uses of your framework?

Static TOC generation, automatic cross-referencing (2-pass process), spell checker, extraction of keywords.
A lot of things are possible w/ html post-processing, but it remains a kludge to recover structural information from the html output.

I'd like to see that we put more effort into dpl-docs which can easily do all of the above, and I think all the effort on nicer ddoc output was a success but also a waste of time. Work we put into ddox improves docs for dlang.org and many other D libraries.
See how simple hyphenation and static highlighting were in ddox.
dlang/ddox#112
dlang/ddox#104

For the time being let's just do it; progress in Chrome is blocked at the moment, and they haven't been able to implement this in the past 5 years.
At the same time if we find this to cause too many issues we can easily disable or revert it.
I have no sympathy for these endless pseudo-strategical discussions on unimportant details.

Regarding hyphenation of function names, I already disabled hyphenation for any code blocks (and also the grammar). If you find something that's missing, let's simply add it.
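
As a rough illustration of the post-processing step described in this comment, the following Python sketch walks an HTML document, inserts soft hyphens into text nodes, and leaves code blocks and anything marked `dont_hyphenate` untouched. It is not the PR's implementation (which is written in D on top of htmld); the class and function names are hypothetical, `toy_hyphenate` is a stand-in for a real pattern-based hyphenator, and the sketch assumes well-formed input with properly nested tags.

```python
import re
from html.parser import HTMLParser

SOFT_HYPHEN = "\u00ad"
SKIP_TAGS = {"pre", "code", "script", "style"}  # never hyphenate inside these
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SoftHyphenInserter(HTMLParser):
    """Re-emits the HTML it parses, hyphenating text outside skipped elements."""

    def __init__(self, hyphenate_word):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.hyphenate_word = hyphenate_word
        self.out = []
        self.skip_stack = []  # one flag per open element: is hyphenation disabled?

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag in VOID_TAGS:
            return  # void elements have no end tag, so nothing to push
        classes = (dict(attrs).get("class") or "").split()
        inherited = bool(self.skip_stack) and self.skip_stack[-1]
        self.skip_stack.append(
            inherited or tag in SKIP_TAGS or "dont_hyphenate" in classes)

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
        if tag not in VOID_TAGS and self.skip_stack:
            self.skip_stack.pop()

    def handle_data(self, data):
        if self.skip_stack and self.skip_stack[-1]:
            self.out.append(data)
        else:
            self.out.append(re.sub(
                r"[A-Za-z]+", lambda m: self.hyphenate_word(m.group()), data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def hyphenate_html(html, hyphenate_word):
    parser = SoftHyphenInserter(hyphenate_word)
    parser.feed(html)
    parser.close()  # flush any trailing text
    return "".join(parser.out)


def toy_hyphenate(word):
    """Placeholder: split long words in the middle (not real hyphenation)."""
    if len(word) <= 6:
        return word
    mid = len(word) // 2
    return word[:mid] + SOFT_HYPHEN + word[mid:]


page = ('<p>documentation <code>documentation</code> '
        '<span class="dont_hyphenate">documentation</span></p>')
print(hyphenate_html(page, toy_hyphenate).replace(SOFT_HYPHEN, "|"))
```

Only the first "documentation" receives a soft hyphen; the copies inside `<code>` and the `dont_hyphenate` span pass through byte for byte, which is the property that keeps pasted code free of hidden characters.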

CyberShadow · 2016-02-13T15:47:16Z

Since this parses the HTML, can it also validate it? If not, I'd like it to keep the original HTML (as emitted by DMD) somewhere, so I can validate it.

I think it should actually be placed under web/ but excluded from rsync, so that it's inspectable, shows up in autotester diffs, but not actually uploaded to dlang.org.

MartinNowak · 2016-02-13T16:07:06Z

I think it should actually be placed under web/ but excluded from rsync, so that it's inspectable, shows up in autotester diffs, but not actually uploaded to dlang.org.

Not showing the actual diff might be misleading if we start to do more w/ this.
Why not add validation as an intermediate step? After all you can run make html..., validate, make hyphenate.

CyberShadow · 2016-02-13T18:19:52Z

Showing not the actual diff might be misleading if we start to do more w/ this.

The idea is to show both.

The diffs after running this tool are difficult to review, because of all the inserted &shy;s.

MartinNowak · 2016-02-14T16:45:03Z

The diffs after running this tool are difficult to review, because of all the inserted &shy;s.

If you really think it's that important, I can try to keep a copy of the original html files.

CyberShadow · 2016-02-14T17:34:24Z

I just think it makes more sense.

If HTML validation results in an error, you won't be able to see the generated HTML in the doc autotester otherwise.

DmitryOlshansky · 2016-04-10T07:52:24Z

What's left to do here?

MartinNowak · 2016-08-05T12:21:01Z

I think it should actually be placed under web/ but excluded from rsync, so that it's inspectable, shows up in autotester diffs, but not actually uploaded to dlang.org.

Can we just turn it into a 2-step process for your tester @CyberShadow? I guess that's the main reason why this is still blocked.
You could run make -f posix.mak doc html, then generate the diffs, then call make -f posix.mak html-postprocess?

- add soft hyphens to text
- justify text on all browsers
- use htmld and hyphenate libs w/ en-US pattern
- run dpl-docs w/ hyphenation

MartinNowak · 2016-08-05T13:42:55Z

The diffs after running this tool are difficult to review, because of all the inserted &shy;s.

That will be much less of an issue once the initial conversion is done.

andralex · 2016-12-24T10:18:22Z

Well, shall we decide on this YTD? I'm in favor. @MartinNowak, any bitrot to worry about?

brad-anderson · 2016-12-24T17:55:52Z

Chrome supports hyphens as of this month. Was there any other browser that needed this emulation?


wilzbach · 2018-01-19T00:21:38Z

We now use a Ddoc preprocessor, so adding a post-processor won't be too hard.

andralex · 2018-01-19T00:26:19Z

Is @MartinNowak's code based on TeX's hyphenation algorithm? That's awesome!!

I'm literally reading right now a book (the famed "Fire and Fury" incidentally) on the Kindle. On my portable Paperwhite, there's no support for hyphenation. However the text is still justified, and looks horrible. On the Kindle laptop application, the display fits two pages at about the same pitch size, also justified, but beautifully hyphenated. Night and day difference. The net consequence is I lug my laptop with me wherever I can if I want to read the book. I can't bring myself to read on the Paperwhite anymore.

I'm very much in favor of adding static hyphenation to our docs, they'll look a lot better on portables and small screens.

andralex

I'll approve this in hope it gets attention :)

MartinNowak force-pushed the hyphenate branch from f8d01df to f51b124 Compare January 25, 2016 18:05

MartinNowak force-pushed the hyphenate branch from f51b124 to 4d6369b Compare January 25, 2016 21:44

MartinNowak force-pushed the hyphenate branch 5 times, most recently from aa0da08 to cb25d96 Compare January 27, 2016 00:10

MartinNowak assigned andralex Jan 27, 2016

andralex reviewed Feb 3, 2016

MartinNowak force-pushed the hyphenate branch 2 times, most recently from c28f581 to fdc6851 Compare February 13, 2016 16:37

post-process/hyphenate html output

a679a69


MartinNowak force-pushed the hyphenate branch from fdc6851 to a679a69 Compare August 5, 2016 13:42

CyberShadow mentioned this pull request Feb 23, 2017

Add assert -> writeln transformation magic for runnable unittest examples #1582

Merged

wilzbach added the Needs Work label Feb 28, 2017

dlang-bot added Needs Rebase stalled labels Jan 1, 2018

andralex approved these changes Jan 19, 2018


MartinNowak closed this Feb 28, 2021
