Punctuation wrongly affects character count for hyphenation #109

Closed
Smylers opened this Issue Jul 10, 2013 · 2 comments

Smylers commented Jul 10, 2013

Normally short words aren't ever split across lines and hyphenated, but if they have adjacent punctuation then WeasyPrint (version 0.19.2) wrongly treats them as though they were a longer word.

We've found “(LST)” being split as “(L-” at the end of one line and “ST)” at the start of the next. Evidence that the parens are being counted as word characters: setting this property avoids “(LST)” being split:

-weasy-hyphenate-limit-chars: 6 3;

(But obviously that could still hyphenate a 4-letter word with 2 adjacent punctuation marks. And in the general case it requires setting hyphenate-limit-chars higher than you'd wish, which also prevents hyphenation of some unpunctuated words that you'd want to allow.)

CSS says to strip punctuation characters between words for counting their characters: http://dev.w3.org/csswg/css-text-4/#hyphenate-char-limits

Pyphen says that punctuation-stripping should be done outside of Pyphen:
Kozea/Pyphen#4
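
For illustration, the counting rule described above could be sketched in Python roughly as follows. This is not WeasyPrint's or Pyphen's code: the helper names `strip_edge_punctuation` and `may_hyphenate` are hypothetical, the default minimum of 5 characters is an assumed value, and "punctuation" is approximated here as any character whose Unicode general category starts with `P`.

```python
import unicodedata


def strip_edge_punctuation(token):
    """Return (start, end) so that token[start:end] has no punctuation at its edges.

    Hypothetical helper: punctuation is approximated as Unicode category P*.
    """
    def is_punct(ch):
        return unicodedata.category(ch).startswith("P")

    start, end = 0, len(token)
    # Skip punctuation at the start of the token, e.g. "(" or an opening quote.
    while start < end and is_punct(token[start]):
        start += 1
    # Skip punctuation at the end of the token, e.g. ")" or a trailing comma.
    while end > start and is_punct(token[end - 1]):
        end -= 1
    return start, end


def may_hyphenate(token, total=5):
    """Check the minimum word length using only the characters left after stripping.

    `total` stands in for the first value of hyphenate-limit-chars; 5 is an
    assumed default for this sketch.
    """
    start, end = strip_edge_punctuation(token)
    return end - start >= total


# Counting the parentheses makes "(LST)" look like a 5-character word,
# but only 3 characters remain once the edge punctuation is ignored.
print(len("(LST)"), may_hyphenate("(LST)"))
```

Under these assumptions "(LST)" is measured as 3 characters, so it stays below the limit without the limit having to be raised for words that have no punctuation. The stripped core, `token[start:end]`, is what would then be handed to the hyphenation dictionary.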

Let me know if you'd like a sample document showing this happening.

liZe commented Jul 10, 2013

Yes, that's a problem. We must rewrite the way WeasyPrint handles text; that's on our TODO list, and there are some annoying bugs related to that limitation (#74, #100, #106).

Of course, it may also be possible to write a quick fix for this bug, but rewriting the whole module will be necessary for the other bugs.

liZe commented Jan 2, 2019

Now that #74 and #100 are fixed, it was easier to handle this case.

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Feb 21, 2019

py-weasyprint: Update to 45.
Version 45
----------

Released on 2019-02-20.

WeasyPrint now has a `code of conduct
<https://github.com/Kozea/WeasyPrint/blob/master/CODE_OF_CONDUCT.rst>`_.

A new website has been launched, with beautiful and useful graphs about speed
and memory use across versions: check `WeasyPerf
<https://kozea.github.io/WeasyPerf/index.html>`_.

Dependencies:

* Python 3.5+ is now needed, Python 3.4 is not supported anymore

Bug fixes:

* `798 <https://github.com/Kozea/WeasyPrint/pull/798>`_:
  Prevent endless loop and index out of range in pagination
* `767 <https://github.com/Kozea/WeasyPrint/issues/767>`_:
  Add a ``--quiet`` CLI parameter
* `784 <https://github.com/Kozea/WeasyPrint/pull/784>`_:
  Fix library loading on Alpine
* `791 <https://github.com/Kozea/WeasyPrint/pull/791>`_:
  Use path2url in tests for Windows
* `789 <https://github.com/Kozea/WeasyPrint/pull/789>`_:
  Add LICENSE file to distributed sources
* `788 <https://github.com/Kozea/WeasyPrint/pull/788>`_:
  Fix pending references
* `780 <https://github.com/Kozea/WeasyPrint/issues/780>`_:
  Don't draw patterns for empty page backgrounds
* `774 <https://github.com/Kozea/WeasyPrint/issues/774>`_:
  Don't crash when links include quotes
* `637 <https://github.com/Kozea/WeasyPrint/issues/637>`_:
  Fix a problem with justified text
* `763 <https://github.com/Kozea/WeasyPrint/pull/763>`_:
  Launch tests with Python 3.7
* `704 <https://github.com/Kozea/WeasyPrint/issues/704>`_:
  Fix a corner case with tables
* `804 <https://github.com/Kozea/WeasyPrint/pull/804>`_:
  Don't logger handlers defined before importing WeasyPrint
* `109 <https://github.com/Kozea/WeasyPrint/issues/109>`_,
  `748 <https://github.com/Kozea/WeasyPrint/issues/748>`_:
  Don't include punctuation for hyphenation
* `770 <https://github.com/Kozea/WeasyPrint/issues/770>`_:
  Don't crash when people use uppercase words from old-fashioned Microsoft
  fonts in tables, especially when there's a 5th column
* Use a `separate logger
  <https://weasyprint.readthedocs.io/en/latest/tutorial.html#logging>`_ to
  report the rendering process
* Add a ``--debug`` CLI parameter and set debug level for unknown prefixed CSS
  properties
* Define minimal versions of Python and setuptools in setup.cfg

Documentation:

* `796 <https://github.com/Kozea/WeasyPrint/pull/796>`_:
  Fix a small typo in the tutorial
* `792 <https://github.com/Kozea/WeasyPrint/pull/792>`_:
  Document no alignment character support
* `773 <https://github.com/Kozea/WeasyPrint/pull/773>`_:
  Fix phrasing in Hacking section
* `402 <https://github.com/Kozea/WeasyPrint/issues/402>`_:
  Add a paragraph about fontconfig error
* `764 <https://github.com/Kozea/WeasyPrint/pull/764>`_:
  Fix list of dependencies for Alpine
* Fix API documentation of HTML and CSS classes