-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
suggest: warning instead of error concerning ner_mitie #528
Comments
@fakesecret that sounds very reasonable, and I agree with you. Mind creating a pull request with the proposed change? |
@tmbo I correct myself concerning ner_crf. While I wanted to create warning messages in a Rasa-NLU style, I found that ner_crf also have warning messages about entity boundries. But these warning messages are not outputted by default (because of logging settings). I need to add logging.basicConfig(level=xxx) in my root python script of training to see these warnings, which is not mentionned in Rasa-NLU documentation "Using rasa NLU from python". I think warning/info messages might be better managed in the project, including basicConfig problem mentionned above, and other minor issues: warning messages managed by different modules -- warnings and logging, logging.warn method (logging.warning is the official method name). |
Yes, you are absolutely on track here, we definitely can do a better job in terms of logging. We did a couple of improvements there lately (e.g. every class should now have it's own logger) but there is still some room for improvements. Concerning the log level, which script do you mean, the one for training? |
A simple example: rasa NLU docs --> Using rasa NLU from python --> Training Time. It calls rasa_nlu.model.Trainer. Even though rasa_nlu.model defines a logger, if there is no "logging.basicConfig(level=xxx)" in the root python script ("Training time" one in this example) or any child process, warning/info messages will not printed. This is a problem of logging.handler and parent/child loggers. For example, if I set logging.basicConfig(level=xxx) directly in mitie_entity_extractor.py, not only warining messages within but also those in other modules (such as sklearn_intent_classifier) will be printed. In addition, Rasa wants xxx= "log_level" in config.py. So logging needs to be configurated more properly. |
Now I get you, I thought you are using one of the provided scripts, but I was quite sure they are all properly configuring the logging. But I agree, the code examples should mention/include that as well. Feel free to create a PR 👍 |
…asaHQ#529) fixes RasaHQ#528 Co-authored-by: Hugh Lunt <hugh.lunt@itv.com> Co-authored-by: Hugh Lunt <hugh.lunt@itv.com>
rasa NLU version : 0.10.0
Used backend / pipeline : ner_mitie
Operating system : Any
Issue: Error while Mitie encounter partial tokens
I suggest to raise a warning and skip the problematic example instead of raising an error. Because in some cases, we cannot make sure a tokenizer works as we expect for thousands of cases.
More details: While designing custom entities, Rasa-NLU suggest ner_mitie or ner_crf. While ner_crf accept a fraction of token as entity, ner_mitie does not. I choose tokenizer_spacy after some tests and comparisons with NTLK and mitie. However, we still cannot make sure that the tokenizer works always as expected (i.e. our tagged entities). For example, if a restaurant name is a little strange "XXX-XXX-XX" or "XXX.xx" and it is the last word of a sentence, the tokenizer does not separate the name and the full-stop dot. Let alone NTLK and mitie tokenizers, which have a little more problems with apostrophes and dots (ex: "2.5 dollars" and "in 2017."). But since such problems rarely happen, I don't want to change my training data (ner_crf works, and maybe other software tools, too). I don't want that ner_mitie stops because of these rare examples.
One quick solution is modifying rasa_nlu/extractors/mitie_entity_extractor.py. At line 89, I added a "try ... except" clause to give a warning and skip the problematic example.
The text was updated successfully, but these errors were encountered: