How to ignore punctuation symbols #75

Closed
glebov21 opened this issue Apr 25, 2022 · 3 comments

@glebov21

Hello. I have a text: "Hello, my name is Bob. How do you do?"
I use Split(' ') and call Check on every word. But how can I make the Check method ignore commas, periods, and question marks?

Does this library have an option for this, or must I use a regex?

@funex

funex commented Apr 26, 2022

Have a look at ICU4N or ICU.Net. I suggest using the BreakIterator class to identify the word boundaries, then spellchecking the words one by one.
Some examples in Java
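
A minimal sketch of the word-boundary idea, written in Java because that is where the thread points for examples. It uses `java.text.BreakIterator` from the Java standard library, not ICU4N or ICU.Net; whether the .NET ports expose the same method names is an assumption to verify against their documentation. The class name `WordBoundaries` and the helper `words` are hypothetical, and the spellchecker's `Check` call is left out.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordBoundaries {
    // Returns only the segments that start with a letter or digit,
    // so punctuation and whitespace segments are dropped.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Each returned word would then be passed to the spellchecker individually.
        for (String w : words("Hello, my name is Bob. How do you do?", Locale.ENGLISH)) {
            System.out.println(w);
        }
    }
}
```

The filter on the first code point is a design choice for this sketch: the iterator reports every boundary, including those around spaces and punctuation, so the caller decides which segments count as words.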

@glebov21

> Have a look at ICU4N or ICU.Net. I suggest that you use breakiterator class to identify the word boundaries and spellcheck them one by one. Some examples in Java

An additional library just to remove punctuation? No, thanks. I will use a regex or a loop for this :)

@funex

funex commented May 15, 2022

I would probably close/reject this one, since "parsing" the terms to be spellchecked isn't really what this library is supposed to do. Identifying word boundaries for different locales isn't a trivial process, so I would recommend an ICU C# wrapper instead. If you know beforehand which locale/language will be used, then I guess a regex should be sufficient for some locales.
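
For the known-locale case, a regex sketch might look like the following. It is written in Java and assumes English-like text where a word is a run of letters or digits, optionally joined by an apostrophe (as in "don't"); the class name `RegexWords` and the pattern itself are illustrative assumptions, not something this library provides, and the pattern will not segment languages that do not separate words with spaces or punctuation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexWords {
    // A run of letters/digits, optionally followed by apostrophe + letters
    // (ASCII ' or the typographic U+2019), so contractions stay in one piece.
    private static final Pattern WORD =
        Pattern.compile("[\\p{L}\\p{N}]+(?:['\u2019]\\p{L}+)*");

    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Punctuation never matches the pattern, so it is skipped implicitly.
        for (String w : words("Hello, my name is Bob. How do you do?")) {
            System.out.println(w);
        }
    }
}
```

Matching the words directly, rather than splitting on spaces and trimming punctuation afterwards, avoids having to enumerate every punctuation character up front.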
