Word Boundary detection #156

Open
Entomy opened this issue Mar 29, 2020 · 2 comments
Labels
🛠 Enhancement New feature or request 👨🏻‍🎓 Good First Issue Good for newcomers 🆘 Help Wanted Extra attention is needed

Comments

@Entomy
Owner

Entomy commented Mar 29, 2020

Methods like Words() are supposed to split text into words, but they don't. They split on spaces, which aren't the only word boundary. Words() should also remove non-word components, but it doesn't.

Doing this requires a proper implementation of word boundary detection, as described in UAX #29, section 4 (Word Boundaries).
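To make the gap concrete, here is a minimal Python sketch (not this library's code or language) that contrasts splitting on spaces with a deliberately simplified word scan. The helper names `is_word_char` and `simple_words` are hypothetical, and treating only the letter, mark, and number categories as word components is an assumption made for illustration. It is not an implementation of the Unicode word boundary rules.

```python
import unicodedata

def is_word_char(ch: str) -> bool:
    # Assumption: letters (L*), marks (M*), and numbers (N*) are word components.
    # The real Unicode rules use the Word_Break property, not the general category.
    return unicodedata.category(ch)[0] in ("L", "M", "N")

def simple_words(text: str) -> list:
    """Collect runs of word characters and drop everything between them."""
    words = []
    current = []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            # A non-word character ends the current run.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words

text = "Hello, world! It's 2020."
print(text.split(" "))     # ['Hello,', 'world!', "It's", '2020.']
print(simple_words(text))  # ['Hello', 'world', 'It', 's', '2020']
```

Splitting on spaces leaves punctuation attached to each piece, while the simplified scan drops it. The scan also shows why a full implementation is needed: it breaks "It's" into two pieces, whereas the Unicode rules keep an apostrophe between letters inside the word.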

@Entomy Entomy transferred this issue from Entomy/LibLangly Apr 1, 2020
@Entomy
Owner Author

Entomy commented Apr 3, 2020

This and this describe an issue with ZWSP (zero width space), along with the debate around it. I've settled on a solution that keeps its Cf classification instead of changing it to Zs, while also ensuring that it is detected as a word boundary. So ZWSP (U+200B) absolutely must be recognized as a word boundary.
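The decision above can be sketched in Python (again, not this library's code). The names `is_boundary` and `split_on_boundaries` are hypothetical, and treating only Zs characters plus U+200B as boundaries is a simplifying assumption. The point is that the general category stays Cf while the boundary check handles U+200B explicitly.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE

def is_boundary(ch: str) -> bool:
    # Assumption for this sketch: space separators (Zs) are boundaries,
    # and ZWSP is special-cased even though its category remains Cf.
    return ch == ZWSP or unicodedata.category(ch) == "Zs"

def split_on_boundaries(text: str) -> list:
    """Split text wherever is_boundary() is true, dropping empty pieces."""
    pieces = []
    current = []
    for ch in text:
        if is_boundary(ch):
            if current:
                pieces.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        pieces.append("".join(current))
    return pieces

sample = "foo" + ZWSP + "bar baz"
print(unicodedata.category(ZWSP))   # Cf
print(len(sample.split()))          # 2, whitespace splitting does not see the ZWSP
print(split_on_boundaries(sample))  # ['foo', 'bar', 'baz']
```

A check based only on the Zs category or on whitespace would miss U+200B, which is why the boundary test has to name it directly.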

@Entomy Entomy transferred this issue from another repository Sep 14, 2020
@Entomy Entomy transferred this issue from Entomy/LibLangly Sep 14, 2020
@Entomy
Owner Author

Entomy commented Sep 14, 2020

Apologies for the transfer spam. This definitely belongs here now.

@Entomy Entomy transferred this issue from another repository Nov 4, 2020
@Entomy Entomy added 🆘 Help Wanted Extra attention is needed 👨🏻‍🎓 Good First Issue Good for newcomers 🛠 Enhancement New feature or request labels Nov 5, 2020
Repository owner deleted a comment from issue-label-bot bot Nov 5, 2020
Repository owner deleted a comment from issue-label-bot bot Nov 5, 2020
@Entomy Entomy added this to the v6.0 milestone Feb 18, 2021
@Entomy Entomy removed this from the v6.0 milestone Aug 2, 2021