
@MartinNowak
Member

  • justify text on all browsers
  • use htmld and hyphenate libs w/ en-US pattern
  • run dpl-docs w/ hyphenation
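For readers unfamiliar with the "en-US pattern" approach: TeX-style hyphenation (Liang's algorithm) matches short letter patterns carrying digit weights against the word, keeps the highest weight seen at each gap between letters, and allows a break wherever that weight is odd. The Python sketch below only illustrates the idea; it is not the D `hyphenate` library's API. The nine patterns are a toy subset (the worked "hyphenation" example often used to explain the algorithm), not the real en-US pattern file, and `parse_pattern`/`hyphenate` are hypothetical names. The minimum fragment lengths of 2 and 3 mirror TeX's defaults for English.

```python
import re

# Toy subset of TeX-style patterns; the real en-US file has thousands.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split e.g. 'hy3ph' into letters 'hyph' and weights [0, 0, 3, 0, 0].

    weights[i] is the value *before* letter i; the last entry is the value
    after the final letter.
    """
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)
        else:
            pos += 1
    return letters, weights


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the pieces of `word` between allowed break points."""
    table = dict(parse_pattern(p) for p in patterns)
    work = "." + word.lower() + "."          # dots mark the word boundaries
    points = [0] * (len(work) + 1)
    # Try every substring; keep the maximum weight seen at each gap.
    for start in range(len(work)):
        for end in range(start + 1, len(work) + 1):
            weights = table.get(work[start:end])
            if weights is not None:
                for k, w in enumerate(weights):
                    points[start + k] = max(points[start + k], w)
    pieces, last = [], 0
    for j in range(1, len(word)):
        # points[j + 1] is the gap before word[j] (offset by the leading dot).
        # An odd value allows a break; the min lengths keep fragments readable.
        if points[j + 1] % 2 == 1 and left_min <= j <= len(word) - right_min:
            pieces.append(word[last:j])
            last = j
    pieces.append(word[last:])
    return pieces
```

With these patterns, `hyphenate("hyphenation", PATTERNS)` yields `["hy", "phen", "ation"]`; joining the pieces with U+00AD (`"\u00ad".join(...)`) produces soft hyphens, which the browser renders only when it actually breaks the line there.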

@MartinNowak
Member Author

Based on #1213

@MartinNowak
Member Author

@CyberShadow can you easily run dub clean-caches on the tester?
We just tagged ddox-0.12.1 and the tester doesn't yet know about it.
Otherwise we'll have to wait a day until the cache gets invalidated.

@JackStouffer
Contributor

Is this really better than just waiting for Blink to implement hyphens? I mean, what's so bad about Chrome and Opera having left aligned text?

This adds so much extra stuff for a tiny detail.

@MartinNowak
Member Author

Is this really better than just waiting for Blink to implement hyphens? I mean, what's so bad about Chrome and Opera having left aligned text?

This adds so much extra stuff for a tiny detail.

It's not much (+129 −34; we already have the libraries, both of which are simple and stable), and Chrome is unlikely to implement it soon.
Let's not derail into a pro/con justified text debate, one of the main parts here is html postprocessing which can be useful for other things.

@CyberShadow
Member

@CyberShadow can you easily run dub clean-caches on the tester?

Done

@CyberShadow
Member

But it seems to me that it is a bug in dub if it doesn't re-check online when it sees a tag it has never seen before.

@MartinNowak
Member Author

But it seems to me that it is a bug in dub if it doesn't re-check online when it sees a tag it has never seen before.

Yes, we have to fix it; see D-Programming-Language/dub issue #528, "Fetch doesn't work for recently updated package".

@CyberShadow
Member

This changes the output drastically.

Among other things, the output is no longer HTML5. Edit: OK, not really, but I've gotten void tags to be uniformly not self-closed during my valid HTML pass a few months ago.

@CyberShadow
Member

Since this parses the HTML, can it also validate it? If not, I'd like it to keep the original HTML (as emitted by DMD) somewhere, so I can validate it.

@MartinNowak
Member Author

What exactly do you want to check? The parser is fairly forgiving but could likely be adapted to strictly validate its input. Is this a useful goal when we're post-processing the output anyhow?

@MartinNowak
Member Author

OK, not really, but I've gotten void tags to be uniformly not self-closed during my valid HTML pass a few months ago.

Pending PR eBookingServices/htmld#8.

@CyberShadow
Member

What exactly do you want to check?

Things like syntax errors (unescaped <>&) and mismatched/unclosed tags. You can look at my HTML-fix PRs; those errors were detected by a tool.
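A minimal sketch of the mismatched/unclosed-tag part of such a check, using Python's standard `html.parser` rather than the htmld library this PR uses. It is deliberately strict and naive: it does not know HTML5's optional end tags (a `<p>` closed implicitly is reported as unclosed), it does not try to recover after a mismatch, and it does not detect unescaped `<>&`. `TagBalanceChecker` and `check_html` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never have an end tag in HTML5.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []    # (tag, line) of currently open elements
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        # '<br/>'-style tags open and close at once; nothing to track.
        pass

    def handle_endtag(self, tag):
        line = self.getpos()[0]
        if tag in VOID_ELEMENTS:
            self.errors.append(f"line {line}: end tag </{tag}> for a void element")
        elif self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"line {line}: mismatched </{tag}>")


def check_html(text):
    """Return a list of nesting errors; an empty list means the tags balance."""
    checker = TagBalanceChecker()
    checker.feed(text)
    checker.close()
    unclosed = [f"line {line}: unclosed <{tag}>" for tag, line in checker.stack]
    return checker.errors + unclosed
```

Run on the raw DMD output before any post-processing, an empty result means the tags at least nest correctly; a full validator would still be needed for attribute and escaping errors.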

Is this a useful goal when we're post-processing the output anyhow?

Yes, absolutely. These errors often mask larger problems that post-processing can only make worse.

@MartinNowak force-pushed the hyphenate branch 5 times, most recently from aa0da08 to cb25d96, on January 27, 2016 00:10
@MartinNowak
Member Author

Ready from my side.

@brad-anderson
Contributor

Chrome was planning to get hyphenation early this year. The last update from a few days ago for Chromium was:

We are currently blocked on an upstream dependency: the hyphenation library
we are planning to use in chromium, which needs to be cleaned up before we
can open-source it.

Unfortunately, I have no progress to report yet; I'm planning to sit down
with the library developer soon, and will post back here with an update
when I have it.

So they are actively working on it now but no ETA.

I think this idea is clever but, personally, I think it'd be better to just wait. Nobody seems to know what effect this will have, if any, on search engine ranking.

@MartinNowak
Member Author

So they are actively working on it now but no ETA.

They haven't managed to do this since 2012, and full-blown hyphenation support (including Arabic, and spelling rewrites for German) is considerably more complex than just using TeX hyphenation patterns.
If they're progressing, nice, but I wouldn't expect anything any time soon.

Nobody seems to know what effect this will have, if any, on search engine ranking.

A quick search reveals that search engines are perfectly capable of ignoring &shy;, and a few (Google) additionally use it to index the split words.
SEO writing should not affect spelling

@andralex
Member

andralex commented Feb 3, 2016

Thanks, Martin! I like the idea of postprocessing. Took a look at the generated docs, they look beautiful.

I'm unsure about hyphenation of function names, e.g. http://dtest.thecybershadow.net/artifact/website-502ec4a93049bfa74cfaa864418a7c3c9d064b76-dc749816785e8de0a55b98a287cf060c/web/phobos-prerelease/std_algorithm.html hyphenates "commonPrefix" and "filterBidirectional" etc. These particular hyphenations look nice but in general function/class/struct/etc names are not English (contain abbreviations, initials etc) so they shouldn't be hyphenated as English words. (In my book I only hyphenated such names by hand, in a few instances when text looked really ugly without.)

Other than that, cool. I'm a bit wary about making the build process depend on an external library, but I guess that's the way to go.

Review comment on posix.mak (outdated):
Member


Shouldn't there also be a dependency on html here? You need the html (of the site proper) done before you do the hyphenation.

@dnadlinger
Contributor

For me the current unjustified text looks quite a bit better, especially on the homepage. Why all this complexity in the first place?

@andralex
Member

andralex commented Feb 5, 2016

https://kaiweber.wordpress.com/2010/05/31/ragged-right-or-justified-alignment/ seems to be a good source of information. The way I look at it is, if text width is low OR hyphenation is not available, then justified text is a bad choice and should be avoided.

So this PR introduces hyphenation, which clears one aspect. I'm not sure about text width - I reduced the browser window to the minimum possible and (for the few pages I looked at) justification does not seem to produce unpleasant lakes and rivers, and also greatly improves information density.

I like justification, but only when done well. It's like coffee: we have consumed it for hundreds of years and no study has managed to find anything bad about it, when prepared well. People have set words on paper for others to see for hundreds of years. Hyphenation had a good economic incentive (less paper consumed), so it was worthwhile for typographers to invest in it. But there is no economic incentive for justification, yet typographers have spent considerable research and development to do it well. For hundreds of years, in virtually all interior sizes and designs. That's ample anecdotal evidence.

I do agree that justification (correx per @klickverbot: hyphenation, not justification) of electronic documents has been prevalently bad in recent years (most often it's done without hyphenation). That may have trained us to reject it wholesale, which is unwarranted.

Getting back to the here and now. I think: (a) things have evolved to the point browsers do a decent layout of hyphenated justified text when columns are not too narrow; (b) this is an interesting differentiating feature of our pages; (c) the framework for postprocessing generated pages is a nice additional incentive. So I'm in favor of this.

@dnadlinger
Contributor

Oh, I'm not arguing against justification per se. I've done quite a bit of "serious" print design and layout work, and most of the time the body copy would have been justified. It's just that to my (rather trained) eyes it does not look particularly good in the current home page design anyway – probably because there is no consistent grid for the various elements, especially the Convenience/Power/Efficiency blocks –, so I'm not sure whether it's worth the added complexity in the build process.

If you don't mind the added steps and dependencies, then feel free to go ahead with this – it certainly doesn't look terrible either. I know it's been a long-time desire of yours, and at least we seem to have a solution now that's technically acceptable. We should probably disable hyphenation for function names, though (as you have already pointed out), and possibly also other symbols like language grammar references.

As a note aside, and following the academic tradition of waging intellectually intense but utterly insignificant discussions, let me point out that your claim that

justification […] also greatly improves information density

is wrong as per your own statements, at least taking "words per screen area" as the definition for information density. It's hyphenation that leads to better use of layout space, not justification (barring different hyphenation engine settings between the ragged and justified cases, of course).

@CyberShadow
Member

With hyphenate.js we had issues where text copied from the web page would contain these hidden &shy; characters, and you would get mysterious compiler errors if you tried to paste them in code files. Will this cause such issues all over again?

Another issue is that this makes documentation diffs harder to review - looking at the diffs generated by the doc autotester you'll see the &#173; noise all over. (Yeah, they could be filtered out, but then the diffs would no longer represent what's actually going to go up on dlang.org.)
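One way to address both concerns, sketched in Python under the assumption that the hyphenated output is available as text: strip every spelling of the soft hyphen (the raw U+00AD character, `&shy;`, `&#173;`, `&#xAD;`) before diffing, or before pasting copied text into a source file. `strip_soft_hyphens` and `diff_ignoring_soft_hyphens` are hypothetical helpers, and, as noted above, a diff filtered this way no longer shows byte-for-byte what goes up on dlang.org.

```python
import difflib
import re

# The soft hyphen as a raw character (U+00AD) and as HTML references.
SOFT_HYPHEN = re.compile(r"\u00ad|&shy;|&#173;|&#[xX]0*[aA][dD];")


def strip_soft_hyphens(text):
    """Remove every soft hyphen, e.g. before pasting docs text into code."""
    return SOFT_HYPHEN.sub("", text)


def diff_ignoring_soft_hyphens(old_html, new_html):
    """Unified diff of two HTML strings with soft hyphens filtered out."""
    old_lines = strip_soft_hyphens(old_html).splitlines(keepends=True)
    new_lines = strip_soft_hyphens(new_html).splitlines(keepends=True)
    return "".join(difflib.unified_diff(old_lines, new_lines, "before", "after"))
```

A diff that differs only in inserted soft hyphens comes out empty, so a reviewer sees only the substantive changes.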

Honestly, considering that only Chrome doesn't support built-in hyphenation and they plan to add it, this seems to me like a solution to a non-problem.

@andralex
Member

andralex commented Feb 6, 2016

justification […] also greatly improves information density

is wrong as per your own statements

@klickverbot yes, sorry I meant hyphenation

@andralex
Member

andralex commented Feb 6, 2016

I'm not sure whether it's worth the added complexity in the build process.

Honestly, considering that only Chrome doesn't support built-in hyphenation and they plan to add it, this seems to me like a solution to a non-problem.

@klickverbot @CyberShadow I think this becomes a discussion of the framework's value.

(1) If the framework will have many future uses, hyphenation is just a first application, a proof of concept that we can later keep or phase out.

(2) If the framework has only this one use, it counts as a liability rather than an asset on this PR's pros and cons sheet.

@MartinNowak could you enlist a few more possible future uses of your framework? And thanks very much for the work!

@MartinNowak
Member Author

With hyphenate.js we had issues where text copied from the web page would contain these hidden &shy; characters, and you would get mysterious compiler errors if you tried to paste them in code files. Will this cause such issues all over again?

Of course there shouldn't be any hyphenation in code examples; this PR adds a few more dont_hyphenate classes.
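The idea of hyphenating text nodes while leaving code untouched can be sketched with Python's `html.parser`; this is an illustration, not the PR's D implementation built on htmld. Elements listed in `SKIP_TAGS`, and anything carrying the `dont_hyphenate` class mentioned above, are copied verbatim. Every other word of five or more letters is passed to a caller-supplied `split_word` function, which stands in for the real en-US pattern hyphenator. `SoftHyphenator`, `add_soft_hyphens`, `SKIP_TAGS`, and the five-letter threshold are all assumptions.

```python
import re
from html.parser import HTMLParser

SKIP_TAGS = {"pre", "code", "script", "style"}   # never hyphenate inside these
SKIP_CLASS = "dont_hyphenate"
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}
WORD = re.compile(r"[A-Za-z]{5,}")   # short words are left alone


class SoftHyphenator(HTMLParser):
    """Re-emits HTML, inserting &shy; into words outside skipped elements."""

    def __init__(self, split_word):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.split_word = split_word   # word -> list of pieces
        self.out = []
        self.stack = []                # (tag, skip?) for open elements

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag not in VOID_ELEMENTS:
            classes = (dict(attrs).get("class") or "").split()
            self.stack.append((tag, tag in SKIP_TAGS or SKIP_CLASS in classes))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
        # Pop back to the matching element, tolerating sloppy nesting.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if any(skip for _, skip in self.stack):
            self.out.append(data)
        else:
            self.out.append(WORD.sub(
                lambda m: "&shy;".join(self.split_word(m.group())), data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def add_soft_hyphens(html_text, split_word):
    parser = SoftHyphenator(split_word)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

With a toy splitter that maps "justified" to `["jus", "ti", "fied"]`, the text of a `<p>` becomes `jus&shy;ti&shy;fied`, while the same word inside `<pre>` or a `dont_hyphenate` span is left alone. Tracking a skip flag per open element, rather than a single boolean, lets nested markup inside a skipped block stay skipped.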

The dependency argument is moot: we use a pinned version of the well-written and simple htmld library, which has no dependencies other than Phobos, and I haven't updated the hyphenate library in 2 or 3 years. The times when D was so unstable that you couldn't rely on libraries are over.
There is also nothing complex or complicated about parsing html and processing text elements.

could you enlist a few more possible future uses of your framework?

Static TOC generation, automatic cross-referencing (2-pass process), spell checker, extraction of keywords.
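As an illustration of the first item, static TOC generation, here is a Python sketch that collects `<h2>`/`<h3>` headings carrying an `id` and emits a list of anchor links. It assumes headings already have `id` attributes; the `toc`/`toc-level-N` class names and the function names are hypothetical. It also shows the kludge mentioned next: the structure is recovered from rendered headings rather than taken from the documentation source.

```python
from html import escape
from html.parser import HTMLParser


class TocCollector(HTMLParser):
    LEVELS = {"h2": 2, "h3": 3}

    def __init__(self):
        super().__init__()           # entities in text are decoded for us
        self.entries = []            # (level, anchor id, heading text)
        self._current = None         # [level, id, text parts] inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in self.LEVELS:
            self._current = [self.LEVELS[tag], dict(attrs).get("id"), []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if tag in self.LEVELS and self._current is not None:
            level, anchor, parts = self._current
            if anchor:               # headings without an id cannot be linked
                self.entries.append((level, anchor, "".join(parts).strip()))
            self._current = None


def render_toc(html_text):
    collector = TocCollector()
    collector.feed(html_text)
    collector.close()
    items = [f'<li class="toc-level-{level}"><a href="#{escape(anchor)}">'
             f"{escape(text)}</a></li>"
             for level, anchor, text in collector.entries]
    return '<ul class="toc">\n' + "\n".join(items) + "\n</ul>"
```

Because the parser decodes entities in heading text, the text is escaped again before it is written back into HTML.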
A lot of things are possible w/ html post-processing, but it remains a kludge to recover structural information from the html output.

I'd like us to put more effort into dpl-docs, which can easily do all of the above, and I think all the effort on nicer ddoc output was a success but also a waste of time. Work we put into ddox improves docs for dlang.org and many other D libraries.
See how simple hyphenation and static highlighting were in ddox:
dlang/ddox#112
dlang/ddox#104

For the time being let's just do it; progress in Chrome is blocked at the moment, and they haven't been able to implement this in the past 5 years.
At the same time if we find this to cause too many issues we can easily disable or revert it.
I have no sympathy for these endless pseudo-strategical discussions on unimportant details.

Regarding hyphenation of function names, I already disabled hyphenation for any code blocks (and also the grammar). If you find something that's missing, let's simply add it.

@CyberShadow
Member

Since this parses the HTML, can it also validate it? If not, I'd like it to keep the original HTML (as emitted by DMD) somewhere, so I can validate it.

I think it should actually be placed under web/ but excluded from rsync, so that it's inspectable, shows up in autotester diffs, but not actually uploaded to dlang.org.

@MartinNowak
Member Author

I think it should actually be placed under web/ but excluded from rsync, so that it's inspectable, shows up in autotester diffs, but not actually uploaded to dlang.org.

Showing not the actual diff might be misleading if we start to do more w/ this.
Why not add validation as an intermediate step? After all, you can run make html..., validate, then make hyphenate.

@MartinNowak force-pushed the hyphenate branch 2 times, most recently from c28f581 to fdc6851, on February 13, 2016 16:37
@CyberShadow
Member

Showing not the actual diff might be misleading if we start to do more w/ this.

The idea is to show both.

The diffs after running this tool are difficult to review, because of all the inserted &shy;s.

@MartinNowak
Member Author

The diffs after running this tool are difficult to review, because of all the inserted &shy;s.

If you really think it's that important, I can try to keep a copy of the original html files.

@CyberShadow
Member

I just think it makes more sense.

If HTML validation results in an error, you won't be able to see the generated HTML in the doc autotester otherwise.

@DmitryOlshansky
Member

What's left to do here?

@MartinNowak
Member Author

I think it should actually be placed under web/ but excluded from rsync, so that it's inspectable, shows up in autotester diffs, but not actually uploaded to dlang.org.

Can we just turn it into a 2-step process for your tester @CyberShadow? I guess that's the main reason why this is still blocked.
You could run make -f posix.mak doc html, then generate the diffs, then call make -f posix.mak html-postprocess?

- add soft hyphens to text
- justify text on all browsers
- use htmld and hyphenate libs w/ en-US pattern
- run dpl-docs w/ hyphenation
@MartinNowak
Member Author

The diffs after running this tool are difficult to review, because of all the inserted &shy;s.

That will be much less of an issue once the initial conversion is done.

@andralex
Member

Well, shall we decide on this YTD? I'm in favor. @MartinNowak, any bitrot to worry about?

@brad-anderson
Contributor

brad-anderson commented Dec 24, 2016 via email

@wilzbach
Contributor

We now use a Ddoc preprocessor, so adding a post-processor won't be too hard.

@andralex
Member

andralex commented Jan 19, 2018

Is this @MartinNowak 's code based on TeX's hyphenation algorithm? That's awesome!!

I'm literally reading right now a book (the famed "Fire and Fury" incidentally) on the Kindle. On my portable Paperwhite, there's no support for hyphenation. However the text is still justified, and looks horrible. On the Kindle laptop application, the display fits two pages at about the same pitch size, also justified, but beautifully hyphenated. Night and day difference. The net consequence is I lug my laptop with me wherever I can if I want to read the book. I can't bring myself to read on the Paperwhite anymore.

I'm very much in favor of adding static hyphenation to our docs, they'll look a lot better on portables and small screens.

Member

@andralex left a comment


I'll approve this in hope it gets attention :)


9 participants