Punctuation wrongly affects character count for hyphenation #109
Short words are normally never split across lines and hyphenated. However, if they have adjacent punctuation, WeasyPrint (version 0.19.2) wrongly treats them as though they were longer words.
We've found “(LST)” being split as “(L-” at the end of one line and “ST)” at the start of the next. Evidence that the parens are being counted as word characters: setting this property avoids “(LST)” being split:
(But obviously that could still hyphenate a 4-letter word with 2 adjacent punctuation marks. And in the general case it requires setting
CSS says that punctuation between words should be stripped when counting a word's characters: http://dev.w3.org/csswg/css-text-4/#hyphenate-char-limits
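A minimal sketch of the counting rule described above, assuming "punctuation" means any Unicode character in a `P*` general category and that it is stripped only from the ends of a word. The helper names and the 5/2/2 limits (total, before, after) are illustrative assumptions, not WeasyPrint's actual code or defaults.

```python
import unicodedata


def is_punctuation(ch: str) -> bool:
    """True for any Unicode punctuation character (general category P*)."""
    return unicodedata.category(ch).startswith("P")


def strip_punctuation(word: str) -> str:
    """Remove leading and trailing punctuation, e.g. "(LST)" -> "LST"."""
    start, end = 0, len(word)
    while start < end and is_punctuation(word[start]):
        start += 1
    while end > start and is_punctuation(word[end - 1]):
        end -= 1
    return word[start:end]


def may_hyphenate(word: str, total: int = 5, before: int = 2, after: int = 2) -> bool:
    """Check hyphenate-limit-chars-style limits against the punctuation-free word.

    The 5/2/2 defaults are hypothetical values chosen for illustration.
    """
    core = strip_punctuation(word)
    return len(core) >= max(total, before + after)


# Counting the parens gives 5 characters; the stripped word "LST" has only 3.
print(len("(LST)"), may_hyphenate("(LST)"))  # prints: 5 False
```

If the parentheses are counted, "(LST)" reaches 5 characters, and a break after "(L" leaves 2 characters before and 3 after. Under limits like these, that would be consistent with the "(L-" / "ST)" split reported above.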
Pyphen says that punctuation-stripping should be done outside of Pyphen:
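One way the caller could do that stripping is sketched below, under stated assumptions: split off leading and trailing punctuation, ask the dictionary for break positions on the bare word only, apply the limits to that bare word, then shift the offsets back into the original string. The `positions_for` callable is a hypothetical hook. A real implementation might pass something like `pyphen.Pyphen(lang="en_US").positions`, but here a toy stand-in (`every_position`) keeps the example self-contained.

```python
import unicodedata
from typing import Callable, List, Tuple


def is_punctuation(ch: str) -> bool:
    """True for any Unicode punctuation character (general category P*)."""
    return unicodedata.category(ch).startswith("P")


def split_affixes(word: str) -> Tuple[str, str, str]:
    """Split a word into (leading punctuation, core, trailing punctuation)."""
    start, end = 0, len(word)
    while start < end and is_punctuation(word[start]):
        start += 1
    while end > start and is_punctuation(word[end - 1]):
        end -= 1
    return word[:start], word[start:end], word[end:]


def hyphenation_points(
    word: str,
    positions_for: Callable[[str], List[int]],
    total: int = 5,
    before: int = 2,
    after: int = 2,
) -> List[int]:
    """Return break offsets into `word`, with limits checked on the bare core.

    `positions_for` and the 5/2/2 limits are illustrative assumptions.
    """
    prefix, core, _suffix = split_affixes(word)
    if len(core) < total:
        return []
    # Only the punctuation-free core is handed to the hyphenation dictionary.
    points = [p for p in positions_for(core) if p >= before and len(core) - p >= after]
    # Shift offsets so they index into the original, punctuated word.
    return [len(prefix) + p for p in points]


def every_position(word: str) -> List[int]:
    """Hypothetical stand-in dictionary that allows a break anywhere."""
    return list(range(1, len(word)))


print(hyphenation_points("(LST)", every_position))     # prints: []
print(hyphenation_points("(hyphen)", every_position))  # prints: [3, 4, 5]
```

Keeping the punctuation out of the dictionary lookup also means the dictionary's patterns only ever see letters, which matches the division of responsibility described in the Pyphen note.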
Let me know if you'd like a sample document showing this happening.
Of course, it may also be possible to write a quick fix for this bug, but rewriting the whole module will be necessary for the other bugs.