Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing unicode whitespace support to String methods #10367

Conversation

straight-shoota
Copy link
Member

String#split, String#strip and String#to_i etc. only recognized ascii whitespace, this adds support for unicode whitespace.

Resolves #10312

String#split, String#strip and String#to_i etc. only recognized ascii
whitespace, this adds support for unicode whitespace.
@straight-shoota straight-shoota added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:text labels Feb 6, 2021
while ptr.value.unsafe_chr.ascii_whitespace?
ptr += 1
end
ptr += calc_excess_right
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right? The loop make ptr skip a bunch of whitespaces from the current position and now it will jump exactly the amount of whitespaces at the end of the string. I might be missing a reason why that would lead to the same result in this particular situation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this piece is to determine if the entire string was consumed (including trailing whitespace). If we add the size of trailing whitespace to the pointer and end up at the end of string, it's all good. But if there's some non-whitespace character between the pointer position and the end of string, adding the size of trailing white space will not move the pointer to the end of string, thus it can be detected as invalid input (if strict == true).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the entire unless ptr.value == 0 could be made conditional on strict as well because the trailing whitespace detection is unnecessary without strict.

@bcardiff bcardiff added this to the 1.0.0 milestone Feb 18, 2021
@straight-shoota straight-shoota merged commit 3f517b1 into crystal-lang:master Feb 19, 2021
@straight-shoota straight-shoota deleted the fix/string-unicode-whitespace branch February 19, 2021 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:text
Projects
None yet
Development

Successfully merging this pull request may close these issues.

String methods should be aware of unicode whitespace
2 participants