Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

suggest: warning instead of error concerning ner_mitie #528

Closed
Jinchao-Yu opened this issue Aug 16, 2017 · 5 comments
Closed

suggest: warning instead of error concerning ner_mitie #528

Jinchao-Yu opened this issue Aug 16, 2017 · 5 comments
Labels
type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR

Comments

@Jinchao-Yu
Copy link
Contributor

rasa NLU version : 0.10.0

Used backend / pipeline : ner_mitie

Operating system : Any

Issue: Error while Mitie encounter partial tokens

I suggest to raise a warning and skip the problematic example instead of raising an error. Because in some cases, we cannot make sure a tokenizer works as we expect for thousands of cases.

More details: While designing custom entities, Rasa-NLU suggest ner_mitie or ner_crf. While ner_crf accept a fraction of token as entity, ner_mitie does not. I choose tokenizer_spacy after some tests and comparisons with NTLK and mitie. However, we still cannot make sure that the tokenizer works always as expected (i.e. our tagged entities). For example, if a restaurant name is a little strange "XXX-XXX-XX" or "XXX.xx" and it is the last word of a sentence, the tokenizer does not separate the name and the full-stop dot. Let alone NTLK and mitie tokenizers, which have a little more problems with apostrophes and dots (ex: "2.5 dollars" and "in 2017."). But since such problems rarely happen, I don't want to change my training data (ner_crf works, and maybe other software tools, too). I don't want that ner_mitie stops because of these rare examples.

One quick solution is modifying rasa_nlu/extractors/mitie_entity_extractor.py. At line 89, I added a "try ... except" clause to give a warning and skip the problematic example.

@tmbo
Copy link
Member

tmbo commented Aug 16, 2017

@fakesecret that sounds very reasonable, and I agree with you. Mind creating a pull request with the proposed change?

@tmbo tmbo added the type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR label Aug 16, 2017
@Jinchao-Yu
Copy link
Contributor Author

@tmbo I correct myself concerning ner_crf. While I wanted to create warning messages in a Rasa-NLU style, I found that ner_crf also have warning messages about entity boundries. But these warning messages are not outputted by default (because of logging settings). I need to add logging.basicConfig(level=xxx) in my root python script of training to see these warnings, which is not mentionned in Rasa-NLU documentation "Using rasa NLU from python".

I think warning/info messages might be better managed in the project, including basicConfig problem mentionned above, and other minor issues: warning messages managed by different modules -- warnings and logging, logging.warn method (logging.warning is the official method name).

@tmbo
Copy link
Member

tmbo commented Aug 16, 2017

Yes, you are absolutely on track here, we definitely can do a better job in terms of logging. We did a couple of improvements there lately (e.g. every class should now have it's own logger) but there is still some room for improvements.

Concerning the log level, which script do you mean, the one for training?

@Jinchao-Yu
Copy link
Contributor Author

A simple example: rasa NLU docs --> Using rasa NLU from python --> Training Time. It calls rasa_nlu.model.Trainer. Even though rasa_nlu.model defines a logger, if there is no "logging.basicConfig(level=xxx)" in the root python script ("Training time" one in this example) or any child process, warning/info messages will not printed. This is a problem of logging.handler and parent/child loggers. For example, if I set logging.basicConfig(level=xxx) directly in mitie_entity_extractor.py, not only warining messages within but also those in other modules (such as sklearn_intent_classifier) will be printed.

In addition, Rasa wants xxx= "log_level" in config.py. So logging needs to be configurated more properly.

@tmbo
Copy link
Member

tmbo commented Aug 16, 2017

Now I get you, I thought you are using one of the provided scripts, but I was quite sure they are all properly configuring the logging. But I agree, the code examples should mention/include that as well. Feel free to create a PR 👍

@tmbo tmbo closed this as completed Aug 18, 2017
taytzehao pushed a commit to taytzehao/rasa that referenced this issue Jul 14, 2023
…asaHQ#529)

fixes RasaHQ#528

Co-authored-by: Hugh Lunt <hugh.lunt@itv.com>

Co-authored-by: Hugh Lunt <hugh.lunt@itv.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR
Projects
None yet
Development

No branches or pull requests

2 participants