pyo3_runtime.PanicException occurs in Morpheme.surface() after calling Morpheme.split() #182
Comments
Moved the issue to Sudachi.rs repo.
Seems to be a problem with index recalculation around the Python bindings and surface normalization.
@hiroshi-matsuda-rit this should be fixed after #183 is merged. Also, doing
I'd like to refactor this logic if there is a method to identify the byte offset of the beginning of each morpheme.
To reduce allocation costs, it would be better to call the split-analysis API with a buffer instance argument, like:
Python operates on codepoint offsets though. It is possible to get those (
I'm leaning towards using
* add test infrastructure for using non-built dictionaries
* correctly resolve boundaries wrt normalization on nodes splitting

fixes #182
I found an input pattern which causes an exception in sudachipy==0.6.0. The reproducing code below is an abridged version of the Japanese tokenizer of spaCy v3.2.0.