Correctly split three-or-more byte sequences of UTF-8 #2123

BenWiederhake · 2024-03-07T21:06:19Z

The underlying bug was the assumption that utf8.DecodeLastRuneInString returns a byte count that, when stripped from the end, leaves the string with a valid ending.

In reality, this function always returns a size of 1 (together with utf8.RuneError) if the last rune is not valid.

Therefore, if the string ends with two or more bytes of a truncated three-or-more-byte rune, stripping a single byte still leaves a partial sequence behind, so this used to give the wrong result.

Found while trying to implement a related feature.

codeclimate · 2024-03-07T21:07:01Z

Code Climate has analyzed commit d9c1df7 and detected 0 issues on this pull request.

42wim · 2024-05-23T22:02:24Z

Thanks 👍

Correctly split three-or-more byte sequences of UTF-8

d9c1df7

BenWiederhake mentioned this pull request Mar 7, 2024

Discord: Split messages if necessary #2124

Merged

42wim merged commit d055b45 into 42wim:master May 23, 2024

42wim added this to the 1.27.0 milestone May 23, 2024

BenWiederhake deleted the dev-split-utf8-correctly branch May 24, 2024 16:01
