SVM PosTagger fails on document without recovery on error #127
Comments
Here are the first elements of my investigations.
Here is a sample XML file causing SVMTagger to crash:
Recovery on error is now handled in Release mode, thanks to the fix on WITH_DEBUG_MESSAGES in commit e8e2e11. The SVMTag crash is still not solved.
Solved in commit 876c293:
But it does not solve the underlying tokenizer error.
Dear @kleag, I have a new example that crashes the SVMPosTagger. The offending characters are a run of three dots: "...".
Describe the bug
The SVM PosTagger sometimes fails on certain documents. The origin of the errors is not clear (either inside SVMTool or in the way we use it). The failure crashes the calling process (either analyzeText or analyzeXml).
To Reproduce
This issue is linked to #95, which describes one of the errors occasionally encountered.
More examples are needed.
@benlabbe, please upload sample XML files!
Expected behavior
Whatever the cause of the errors in the SVM PosTagger, text processing should continue without side effects on the text segments that follow (e.g. the subsequent paragraphs in an XML file).
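To make the expected behavior concrete, here is a minimal sketch of per-segment isolation. It is not the project's code: `tag_segments` and `fake_tagger` are hypothetical names, and the stand-in tagger fails on "..." only to mimic the report above. It also assumes the failure surfaces as an exception; a hard crash inside native code would need a different mechanism.

```python
from typing import Callable, List, Optional, Tuple

Tagged = List[Tuple[str, str]]


def tag_segments(
    segments: List[str],
    tag: Callable[[str], Tagged],
) -> List[Optional[Tagged]]:
    """Tag each segment independently; a failure yields None for that segment only."""
    results: List[Optional[Tagged]] = []
    for index, segment in enumerate(segments):
        try:
            results.append(tag(segment))
        except Exception as error:  # assumption: tagger failures are raised as exceptions
            print(f"segment {index}: tagging failed ({error}); continuing")
            results.append(None)
    return results


def fake_tagger(text: str) -> Tagged:
    """Hypothetical stand-in for the real tagger; fails on '...' to mimic the report."""
    if "..." in text:
        raise RuntimeError("tagger error")
    return [(token, "X") for token in text.split()]
```

With this shape, a failing paragraph is reported and skipped, and the paragraphs after it are still tagged.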