suggest: warning instead of error concerning ner_mitie #528

Jinchao-Yu · 2017-08-16T10:06:50Z

rasa NLU version : 0.10.0

Used backend / pipeline : ner_mitie

Operating system : Any

Issue: Error while Mitie encounter partial tokens

I suggest to raise a warning and skip the problematic example instead of raising an error. Because in some cases, we cannot make sure a tokenizer works as we expect for thousands of cases.

More details: While designing custom entities, Rasa-NLU suggest ner_mitie or ner_crf. While ner_crf accept a fraction of token as entity, ner_mitie does not. I choose tokenizer_spacy after some tests and comparisons with NTLK and mitie. However, we still cannot make sure that the tokenizer works always as expected (i.e. our tagged entities). For example, if a restaurant name is a little strange "XXX-XXX-XX" or "XXX.xx" and it is the last word of a sentence, the tokenizer does not separate the name and the full-stop dot. Let alone NTLK and mitie tokenizers, which have a little more problems with apostrophes and dots (ex: "2.5 dollars" and "in 2017."). But since such problems rarely happen, I don't want to change my training data (ner_crf works, and maybe other software tools, too). I don't want that ner_mitie stops because of these rare examples.

One quick solution is modifying rasa_nlu/extractors/mitie_entity_extractor.py. At line 89, I added a "try ... except" clause to give a warning and skip the problematic example.

tmbo · 2017-08-16T10:12:04Z

@fakesecret that sounds very reasonable, and I agree with you. Mind creating a pull request with the proposed change?

Jinchao-Yu · 2017-08-16T13:19:44Z

@tmbo I correct myself concerning ner_crf. While I wanted to create warning messages in a Rasa-NLU style, I found that ner_crf also have warning messages about entity boundries. But these warning messages are not outputted by default (because of logging settings). I need to add logging.basicConfig(level=xxx) in my root python script of training to see these warnings, which is not mentionned in Rasa-NLU documentation "Using rasa NLU from python".

I think warning/info messages might be better managed in the project, including basicConfig problem mentionned above, and other minor issues: warning messages managed by different modules -- warnings and logging, logging.warn method (logging.warning is the official method name).

tmbo · 2017-08-16T13:22:39Z

Yes, you are absolutely on track here, we definitely can do a better job in terms of logging. We did a couple of improvements there lately (e.g. every class should now have it's own logger) but there is still some room for improvements.

Concerning the log level, which script do you mean, the one for training?

Jinchao-Yu · 2017-08-16T13:51:36Z

A simple example: rasa NLU docs --> Using rasa NLU from python --> Training Time. It calls rasa_nlu.model.Trainer. Even though rasa_nlu.model defines a logger, if there is no "logging.basicConfig(level=xxx)" in the root python script ("Training time" one in this example) or any child process, warning/info messages will not printed. This is a problem of logging.handler and parent/child loggers. For example, if I set logging.basicConfig(level=xxx) directly in mitie_entity_extractor.py, not only warining messages within but also those in other modules (such as sklearn_intent_classifier) will be printed.

In addition, Rasa wants xxx= "log_level" in config.py. So logging needs to be configurated more properly.

tmbo · 2017-08-16T13:58:40Z

Now I get you, I thought you are using one of the provided scripts, but I was quite sure they are all properly configuring the logging. But I agree, the code examples should mention/include that as well. Feel free to create a PR 👍

…asaHQ#529) fixes RasaHQ#528 Co-authored-by: Hugh Lunt <hugh.lunt@itv.com> Co-authored-by: Hugh Lunt <hugh.lunt@itv.com>

tmbo added the type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR label Aug 16, 2017

Jinchao-Yu mentioned this issue Aug 16, 2017

issue528 in extractors/mitie_entity_extractor.py #531

Merged

4 tasks

tmbo closed this as completed Aug 18, 2017

taytzehao pushed a commit to taytzehao/rasa that referenced this issue Jul 14, 2023

Removed pipefail from bash shell execution to match GitHub Actions (R…

6066980

…asaHQ#529) fixes RasaHQ#528 Co-authored-by: Hugh Lunt <hugh.lunt@itv.com> Co-authored-by: Hugh Lunt <hugh.lunt@itv.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

suggest: warning instead of error concerning ner_mitie #528

suggest: warning instead of error concerning ner_mitie #528

Jinchao-Yu commented Aug 16, 2017

tmbo commented Aug 16, 2017

Jinchao-Yu commented Aug 16, 2017

tmbo commented Aug 16, 2017

Jinchao-Yu commented Aug 16, 2017

tmbo commented Aug 16, 2017

suggest: warning instead of error concerning ner_mitie #528

suggest: warning instead of error concerning ner_mitie #528

Comments

Jinchao-Yu commented Aug 16, 2017

tmbo commented Aug 16, 2017

Jinchao-Yu commented Aug 16, 2017

tmbo commented Aug 16, 2017

Jinchao-Yu commented Aug 16, 2017

tmbo commented Aug 16, 2017