Predictions NER for entities with interpunction #1642

ronaldvelzen · 2023-04-26T08:09:12Z

DeepPavlov version: 1.1.1
Python version: 3.10
Operating system: Ubuntu

Issue:
I am using the ner_ontonotes_bert_mult model to predict entities in text. For sentences where an entity contains punctuation, the model gives unexpected results. Before the 1.0.0 release, I used the DeepPavlov Docker image with the same ner_ontonotes_bert_mult config and did not encounter this issue.

Content or a name of a configuration file:

[ner_ontonotes_bert_mult](https://github.com/deeppavlov/DeepPavlov/blob/1.0.2/deeppavlov/configs/ner/ner_ontonotes_bert_mult.json)

Command that led to the unexpected results:

from deeppavlov import build_model

# Build the multilingual OntoNotes NER model, installing its
# requirements and downloading the pretrained weights.
deeppavlov_model = build_model(
    "ner_ontonotes_bert_mult",
    install=True,
    download=True,
)

sentence = 'Today at 13:10 we had a meeting'
output = deeppavlov_model([sentence])
print(output[0])  # tokens
# [['Today', 'at', '13', ':', '10', 'we', 'had', 'a', 'meeting']]
print(output[1])  # predicted tags
# [['O', 'O', 'B-TIME', 'O', 'B-TIME', 'O', 'O', 'O', 'O']]

As you can see, 13:10 is not recognized as a single TIME entity: 13 is tagged B-TIME, : is tagged O, and 10 is tagged B-TIME again. The same happens for names containing punctuation, such as E.A. Jones. I also tried the ner_ontonotes_bert configuration, which gave the same results. In any case, since I also want to use the model for languages other than English, that configuration is not an option for me.

I already opened an issue about this problem. However, the issue was closed without giving me a satisfying outcome.

What can I do to solve this issue? Is it possible to fine-tune the model on such examples?
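As a stopgap, the split fragments could be stitched back together in post-processing. The following is only a minimal sketch, not a fix for the model itself: `merge_split_entities` is a hypothetical helper (not part of DeepPavlov) that works on the token and tag lists shown above. It assumes that a single punctuation token tagged O, sitting directly between two fragments of the same entity type, belongs to one entity. That assumption can over-merge, for example two separate times separated by a comma.

```python
import string

PUNCT = set(string.punctuation)


def merge_split_entities(tokens, tags):
    """Return a copy of `tags` where same-type entity fragments that are
    separated by one punctuation token tagged 'O' are joined into one span.

    Hypothetical helper for illustration; not part of DeepPavlov.
    """
    merged = list(tags)
    for i in range(1, len(tokens) - 1):
        prev_tag, next_tag = merged[i - 1], merged[i + 1]
        if (
            merged[i] == "O"
            and tokens[i] in PUNCT
            and prev_tag != "O"
            and next_tag.startswith("B-")
            and prev_tag[2:] == next_tag[2:]  # same entity type on both sides
        ):
            entity_type = prev_tag[2:]
            merged[i] = "I-" + entity_type      # punctuation joins the entity
            merged[i + 1] = "I-" + entity_type  # next fragment continues it
    return merged


tokens = ['Today', 'at', '13', ':', '10', 'we', 'had', 'a', 'meeting']
tags = ['O', 'O', 'B-TIME', 'O', 'B-TIME', 'O', 'O', 'O', 'O']
print(merge_split_entities(tokens, tags))
# ['O', 'O', 'B-TIME', 'I-TIME', 'I-TIME', 'O', 'O', 'O', 'O']
```

Because the loop reads from the already-updated list, chains of fragments are merged left to right into a single span.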

Kolpnick · 2023-07-10T06:48:14Z

Hello, @ronaldvelzen. Thank you for your interest!
We are sorry that you have encountered this problem. It turned out that the incorrect labeling is related to the specific markup of the training data. We have trained a new model, which is available in a pull request and will be added to the main branch soon.

IgnatovFedor · 2023-10-11T04:52:02Z

Fixed in #1661

ronaldvelzen added the bug label Apr 26, 2023

Kolpnick mentioned this issue Sep 4, 2023: New classification models #1657 (Open)

Kolpnick added a commit that referenced this issue Sep 18, 2023: Fixed issue #1642 (c1d03f0)

Kolpnick mentioned this issue Sep 18, 2023: NER bugs fixes #1661 (Merged)

IgnatovFedor mentioned this issue Oct 11, 2023: Release 1.4.0 #1664 (Merged)

IgnatovFedor closed this as completed Oct 11, 2023
