Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge charwidth and strwidth? #20816

Closed
TotalVerb opened this issue Feb 26, 2017 · 9 comments
Closed

Merge charwidth and strwidth? #20816

TotalVerb opened this issue Feb 26, 2017 · 9 comments
Labels
domain:strings "Strings!" kind:speculative Whether the change will be implemented is speculative

Comments

@TotalVerb
Copy link
Contributor

TotalVerb commented Feb 26, 2017

Since these functions are very similar, I wonder if it is worth merging them into a single function, maybe textwidth.

Alternatively, perhaps strwidth(_) can be spelt sum(charwidth, _).

@ararslan ararslan added kind:speculative Whether the change will be implemented is speculative domain:strings "Strings!" labels Feb 26, 2017
@oscardssmith
Copy link
Member

I think the first suggestion could work well. The problem with the second is that in arabic, for example, final characters make it so where a letter is in a string is important. This could cause the answer to be incorrect if someone sums length of characters. Note: this is not a purely academic concern, bad code like this was responsible for a bug that allowed a text to crash iphones.

@TotalVerb
Copy link
Contributor Author

Thanks, I was unaware of that. We should fix our implementation of strwidth then.

@oscardssmith
Copy link
Member

A quick bit of research suggests that doing it generally, properly, and in a performant manner is impossible. We have several options: not change algorithms, produce slightly wrong results for some languages. Change algorithms: Persian, Hebrew, Arabic, and others will be ~1000x slower. Throw an error if certain characters are in the input.

@TotalVerb
Copy link
Contributor Author

If someone wants a performant but incorrect algorithm, they can use sum(charwidth, _). If we are going to provide strwidth, we may as well make it correct.

@JeffBezanson
Copy link
Sponsor Member

strwidth(::Char) seems to make sense, since it could act as if you converted the Char to string, except possibly with a more efficient algorithm.

@nalimilan
Copy link
Member

+1 for merging but not recommending sum(charwidth, s). Correctness is what matters, in particular in the present case where displaying the result is likely to be much slower than computing its width.

@stevengj
Copy link
Member

It's not like charwidth is unambiguously "correct" either. utf8proc reports a reasonable value, but ultimately this depends on the font and terminal settings; see e.g. the discussion at JuliaStrings/utf8proc#83.

Can you give a specific example of a Unicode string for which sum(charwidth, s) gives an unambiguously wrong result?

@oscardssmith, I'm skeptical that incorrect glyph width (i.e. computation of on-screen character size) caused the iPhone crash. I can't find a lot of technical information on the iPhone crash you seem to be referring to, but what little I can find seems to indicate that it stemmed from incorrect computation of the in-memory width (i.e. the width in bytes) of a character, which has nothing to do with the on-screen width.

@JeffBezanson
Copy link
Sponsor Member

My point would be that deprecating charwidth to strwidth(::Char) or textwidth(::Char) seems much safer than deprecating strwidth to sum(charwidth, s). The latter assumes much more (even if that assumption is generally valid). textwidth is also extensible to other objects that might represent text, regardless of how many characters they contain or whether they are iterable.

@oscardssmith
Copy link
Member

@stevengj that may be right, I was basing it off a video by Tom Scott here https://www.youtube.com/watch?v=hJLMSllzoLA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:strings "Strings!" kind:speculative Whether the change will be implemented is speculative
Projects
None yet
Development

No branches or pull requests

6 participants