NGrams doesn't support words with hyphens and slashes in English #656
Comments
The root cause of this behaviour is the tokenizer. Will look into adapting the tokenizer to support these characters.
There are also issues in other languages.
For instance, the Eszett symbol ß is also not supported.
It would be nice if you could check that as well.
Thanks
On Mon, May 29, 2023 at 8:11 PM Hugo ter Doest wrote:
> The root cause of this behaviour is the tokenizer. Will look into adapting the tokenizer to support these characters.
Hugo-ter-Doest added a commit that referenced this issue on Nov 26, 2023
Solved in #706
There are a few words in English that contain a hyphen or a slash.
Example:
It would be great if Natural could manage these cases.
Output: [["links"], ["text"], ["based"], ["opposed"], ["image"], ["based"], ["links"], ["CTA"], ["s"]]
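To illustrate the behaviour being asked for, here is a minimal, dependency-free sketch of a tokenizer that keeps internal hyphens and slashes inside a token, and that treats letters such as ß as word characters. The `tokenize` and `ngrams` helpers and the sample sentence are hypothetical; they are not Natural's API and not necessarily how the library resolved this issue.

```javascript
// Hypothetical helper (not part of Natural): a token is a run of Unicode
// letters/digits, optionally joined by single internal hyphens or slashes.
function tokenize(text) {
  const pattern = /[\p{L}\p{N}]+(?:[-\/][\p{L}\p{N}]+)*/gu;
  return text.match(pattern) || [];
}

// Hypothetical helper: build n-grams from an array of tokens.
function ngrams(tokens, n) {
  const result = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    result.push(tokens.slice(i, i + n));
  }
  return result;
}

// Sample input chosen for illustration only.
const sample = 'text-based and/or image-based Straße';
const unigrams = ngrams(tokenize(sample), 1);
console.log(unigrams);
// [ [ 'text-based' ], [ 'and/or' ], [ 'image-based' ], [ 'Straße' ] ]
```

Because `\p{L}` matches any Unicode letter, the same pattern also keeps ß inside a word, which is the other case raised in the comments.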