Python Exception Types #216
Comments
We will try to use better exception types later.
Is it possible to raise the input length limit from u16 to u32 or more?
@gulldan Your question is not actually related to the main issue here, which is about types, not length. You should open a new issue. Separately, from experience, tokenizers like this aren't designed for inputs that long, and you should split yours up into multiple calls.
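The advice to split a long input into multiple calls can be sketched in plain Python. This is only an illustration under stated assumptions: `split_for_tokenizer` and its `max_bytes` parameter are hypothetical names, not part of Sudachi's API, and the choice of sentence delimiters (`。`, `！`, `？`, newline) is a guess at a reasonable boundary for Japanese text. The actual limit enforced by the library is not stated in this thread, so the caller has to supply it.

```python
import re


def split_for_tokenizer(text: str, max_bytes: int) -> list[str]:
    """Split text into pieces whose UTF-8 size is at most max_bytes.

    Hypothetical helper: pieces end at sentence boundaries where possible.
    """
    if max_bytes < 4:
        # A single UTF-8 character can take up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")
    # Each match is a run of text ending in a delimiter, or a trailing run.
    sentences = re.findall(r"[^。！？\n]*[。！？\n]|[^。！？\n]+", text)
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if len((current + sentence).encode("utf-8")) <= max_bytes:
            current += sentence
            continue
        if current:
            chunks.append(current)
            current = ""
        # Re-add the sentence character by character; a cut happens only
        # when the sentence on its own exceeds the limit.
        for ch in sentence:
            if len((current + ch).encode("utf-8")) > max_bytes:
                chunks.append(current)
                current = ""
            current += ch
    if current:
        chunks.append(current)
    return chunks
```

Each returned piece could then be passed to the tokenizer in its own call. Cutting in the middle of an overlong sentence may change the analysis at the cut point, so the byte limit is best treated as a fallback rather than the normal path.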
Adding to @polm's point: raising the max input length to u32::MAX would make it possible for Sudachi to crash with OOM, because memory usage for long sentences would be very significant. In the future it would be better to add an API for analyzing long text, as the Java version has. As for the original issue, I think I have changed all usages of the plain Python Exception; SudachiError will be used instead.
Using a single generic error feels a little too general, but it's much better than a full Exception - thanks! I'll go ahead and close this.
I noticed that if the input is too long, an exception is thrown in Python, but it's a plain `Exception`, not a `ValueError` or something similar. I see that the Rust code has a variety of specific error types. I'm not familiar with Rust, but surely it's possible to have the Python code throw something more specific, like an `InputTooLongException`?
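To illustrate why a specific exception type matters to callers, here is a minimal pure-Python sketch. Everything in it is an assumption made for illustration: the `SudachiError` class below is a local stand-in rather than the library's own class, `InputTooLongException` is the hypothetical name suggested in this issue, and `MAX_INPUT_BYTES` is an assumed u16-sized limit, not the value the library actually enforces.

```python
class SudachiError(Exception):
    """Local stand-in for a library-wide base error (illustration only)."""


class InputTooLongException(SudachiError, ValueError):
    """Hypothetical specific error; also a ValueError for generic handlers."""

    def __init__(self, length: int, limit: int):
        super().__init__(f"input is {length} bytes, limit is {limit}")
        self.length = length
        self.limit = limit


# Assumed limit for the sketch (u16::MAX); the real limit may differ.
MAX_INPUT_BYTES = 0xFFFF


def check_input(text: str) -> str:
    """Raise InputTooLongException if the UTF-8 size exceeds the limit."""
    size = len(text.encode("utf-8"))
    if size > MAX_INPUT_BYTES:
        raise InputTooLongException(size, MAX_INPUT_BYTES)
    return text


def describe(text: str) -> str:
    """Handle only the too-long case; other errors still propagate."""
    try:
        check_input(text)
    except InputTooLongException as err:
        return f"too long by {err.length - err.limit} bytes"
    return "ok"
```

With a plain `Exception`, a caller would have to catch everything and inspect the message text. A dedicated subclass lets the caller handle only the too-long case, and deriving it from both a library base class and `ValueError` keeps existing `except ValueError` handlers working.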