Several unicode whitespace characters are not supported #7

Closed
Henry-Sarabia opened this issue Mar 2, 2019 · 0 comments
Labels
enhancement New feature or request

Comments

@Henry-Sarabia
Owner

After some research, it seems the regexp package might not be the best fit for identifying whitespace. Although the current regular expression catches some whitespace and control characters, it misses others.

The vertical tab character '\v' is whitespace but is not currently identified as such.
The Unicode non-breaking space character U+00A0 is also not identified as whitespace.
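
The two cases above can be reproduced with a short sketch. It assumes the pattern in question relies on the `\s` class, which Go's regexp syntax defines as `[\t\n\f\r ]`. The `wsPattern` variable and `matchesWhitespace` helper are hypothetical names used only for illustration, not the package's actual code.

```go
package main

import (
	"fmt"
	"regexp"
)

// wsPattern is a hypothetical stand-in for a whitespace-only check built on \s.
// In Go's regexp syntax, \s is equivalent to [\t\n\f\r ].
var wsPattern = regexp.MustCompile(`^\s+$`)

// matchesWhitespace reports whether s is non-empty and consists solely of
// characters matched by \s.
func matchesWhitespace(s string) bool {
	return wsPattern.MatchString(s)
}

func main() {
	// The first three inputs are covered by \s; the vertical tab and the
	// non-breaking space are not.
	for _, s := range []string{" ", "\t", "\n", "\v", "\u00a0"} {
		fmt.Printf("%q matched by \\s: %v\n", s, matchesWhitespace(s))
	}
}
```

Under that assumption, the space, tab, and newline inputs report true, while '\v' and U+00A0 report false.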

There are definitely other similar cases. To solve this issue, it may be prudent to replace the regular expression with a simpler but more thorough approach: iterating over the input rune by rune and inspecting each one.
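
A minimal sketch of the rune-by-rune approach, assuming the standard library's `unicode.IsSpace` is an acceptable definition of whitespace. The `isAllSpace` helper is a hypothetical name, not the package's actual API, and treating the empty string as all-whitespace is a design choice made here for illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

// isAllSpace is a hypothetical helper that reports whether every rune in s
// is whitespace according to unicode.IsSpace. It returns true for the empty
// string, since there is no non-whitespace rune to reject.
func isAllSpace(s string) bool {
	for _, r := range s {
		if !unicode.IsSpace(r) {
			return false
		}
	}
	return true
}

func main() {
	// unicode.IsSpace covers '\v' and U+00A0, the two cases named above,
	// as well as other characters with the Unicode White_Space property.
	for _, s := range []string{" \t\n", "\v", "\u00a0", "\u2003", "a "} {
		fmt.Printf("%q all whitespace: %v\n", s, isAllSpace(s))
	}
}
```

The loop exits on the first non-whitespace rune, so the cost stays linear in the input length and no pattern has to be kept in sync with the Unicode tables.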

@Henry-Sarabia Henry-Sarabia added the enhancement New feature or request label Mar 2, 2019
This was referenced Mar 3, 2019
Merged
Merged
@Henry-Sarabia Henry-Sarabia added this to the v2.0.0 milestone Mar 5, 2019