std.string.wrap should conform to Unicode line-breaking algorithm #9944

Open
dlangBugzillaToGithub opened this issue Dec 17, 2012 · 0 comments

hsteoh reported this on 2012-12-17T13:24:08Z

Transferred from https://issues.dlang.org/show_bug.cgi?id=9173

Description

Currently, there are some issues with std.string.wrap:

1) It uses std.uni.isWhite as the criterion for line-breaking opportunities, but isWhite also matches characters such as the non-breaking space, at which a line should *not* be broken. It also matches things like vowel mark separators, which should not be break points either.

2) It does not take zero-width characters and combining diacritics into account when counting columns, which means that it will sometimes wrap the line at the wrong place.

3) It does not wrap CJK text or Thai text correctly.
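
To make items (1) and (2) concrete, here is a small illustration in Python rather than D. It does not exercise std.string.wrap or std.uni.isWhite; it only shows, using Python's standard unicodedata module, the same two character properties that trip up a whitespace-based, code-point-counting wrapper. The variable names are made up for this example.

```python
import unicodedata

# Item (1): U+00A0 NO-BREAK SPACE is classified as whitespace,
# yet TR14 does not allow a line break at it.
nbsp = "\u00a0"
nbsp_name = unicodedata.name(nbsp)
nbsp_is_space = nbsp.isspace()

# Item (2): "e" followed by U+0301 COMBINING ACUTE ACCENT renders in one
# column, but counting code points reports two.
accented = "e\u0301"
code_points = len(accented)
mark_category = unicodedata.category("\u0301")  # "Mn" = nonspacing mark

# U+200B ZERO WIDTH SPACE occupies no column at all.
zwsp_category = unicodedata.category("\u200b")  # "Cf" = format character

print(nbsp_name, nbsp_is_space)
print(code_points, mark_category, zwsp_category)
```

A wrapper that treats every whitespace character as a break opportunity will split at the no-break space, and one that adds one column per code point will overcount the accented letter.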

For reference, here's the Unicode technical reference that describes proper line-breaking of Unicode text:

http://www.unicode.org/reports/tr14/

(After having read through TR14, I was in awe at how insanely complicated line-wrapping in Unicode is. So I'd propose that, if nothing else, we should fix items (1) and (2) above, which should be within the reach of a relatively simple-to-implement European-centric line wrapping algorithm. People who want CJK wrapping or other complicated stuff probably want to be writing their own algo anyway.)
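
As a rough sketch of what fixing only items (1) and (2) could look like, here is a greedy wrapper in Python. It is not the Phobos implementation and not a TR14 implementation. All names (NO_BREAK_SPACES, display_width, wrap, and so on) are hypothetical, the set of no-break spaces is limited to three code points that TR14 assigns to the "glue" class, and treating every Mn/Me/Cf character as zero columns wide is a simplifying assumption. CJK and Thai text (item 3) are deliberately not handled.

```python
import unicodedata

# Assumption: only these three whitespace-like code points are non-breaking.
# U+00A0 NO-BREAK SPACE, U+2007 FIGURE SPACE, U+202F NARROW NO-BREAK SPACE.
NO_BREAK_SPACES = frozenset("\u00a0\u2007\u202f")

# Assumption: nonspacing marks, enclosing marks and format characters
# take up no column; everything else takes exactly one.
ZERO_WIDTH_CATEGORIES = frozenset(("Mn", "Me", "Cf"))


def is_breakable_space(ch):
    """True for whitespace at which a line may be broken."""
    return ch.isspace() and ch not in NO_BREAK_SPACES


def display_width(text):
    """Approximate column count, ignoring zero-width characters."""
    return sum(1 for ch in text
               if unicodedata.category(ch) not in ZERO_WIDTH_CATEGORIES)


def split_words(text):
    """Split on breakable whitespace only, keeping no-break spaces in words."""
    words, current = [], []
    for ch in text:
        if is_breakable_space(ch):
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


def wrap(text, columns=80):
    """Greedy wrap: start a new line when the next word would not fit."""
    lines, line, line_width = [], [], 0
    for word in split_words(text):
        width = display_width(word)
        # One extra column for the joining space if the line is not empty.
        needed = width if not line else line_width + 1 + width
        if line and needed > columns:
            lines.append(" ".join(line))
            line, line_width = [word], width
        else:
            line.append(word)
            line_width = needed
    if line:
        lines.append(" ".join(line))
    return "\n".join(lines)
```

With this sketch, wrap("10\u00a0km from here", 10) keeps "10\u00a0km" together on the first line, and wrap("cafe\u0301 au lait", 7) fits "cafe\u0301 au" on one line because the combining accent is not counted as a column. A word longer than the column limit is left on its own line rather than split.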

@LightBender removed the P4 label on Dec 6, 2024