Named Entity Linking #1545

Open
rogerwaldvogel opened this issue Nov 2, 2021 · 6 comments
Labels
feature request feature request for doccano

Comments

@rogerwaldvogel

For me it would be extremely important that Doccano supports Named Entity Linking (NEL) and not only Named Entity Recognition (NER). For this, it would have to be possible to link a span to a specific entity from a knowledge base, as is done in INCEpTION.

@github-actions

github-actions bot commented Nov 2, 2021

Would you write your environment? Thank you!

@ghontolux

ghontolux commented Nov 10, 2021

It looks like there is nothing about this in the improvement ideas or project boards. I have working entity linking in my fork, but it is tied to my company's entity linking parser and lexicon.

Hironsan added the feature request label Nov 10, 2021
@Hironsan
Member

To achieve this feature, I think we need to create a mechanism to upload a knowledge base in some format (e.g. id, title) and make suggestions in real time.
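A minimal sketch of what such an upload could look like, assuming the knowledge base arrives as CSV text with `id` and `title` columns. The function names `load_knowledge_base` and `suggest` are hypothetical and are not part of doccano:

```python
import csv
import io


def load_knowledge_base(csv_text):
    """Parse CSV text with 'id' and 'title' columns into a list of entries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{"id": row["id"], "title": row["title"]} for row in reader]


def suggest(entries, query, limit=5):
    """Return up to `limit` entries whose title contains the query, ignoring case."""
    needle = query.casefold()
    matches = [entry for entry in entries if needle in entry["title"].casefold()]
    return matches[:limit]


# Hypothetical upload content using Wikidata identifiers as ids.
kb = load_knowledge_base("id,title\nQ42,Douglas Adams\nQ5,human\nQ2,Earth\nQ1,universe\n")
print(suggest(kb, "adams"))
```

A linear scan like this is only reasonable for small knowledge bases; larger ones would need an index.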

@rogerwaldvogel
Author

rogerwaldvogel commented Nov 29, 2021

Sorry for my late reply. I would suggest implementing the feature in two steps.

  1. The minimum requirement would be the ability to capture URIs that link a span to the corresponding entity in an ontology. As a first step, this could be done by simply entering the URL by hand or by copy and paste. The user interface would only need a corresponding field. It must be possible to enter more than one URI. For example, in a project I am currently working on, each entity must reference a private ontology and, if possible, a public ontology as well. The question now is how best to integrate the identifier into the Doccano JSONL format. One suggestion would be to extend the labels as follows:
{"text": "Douglas Adams", "labels": [[0, 13, "person", "https://www.wikidata.org/wiki/Q42"]]}

Other suggestions? I think implementing this part of the feature is possible with relatively little effort and would add tremendous value. I'm sure there are others besides me doing NEL tasks 😉
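To make the proposal concrete, here is a rough sketch of how such a record could be read. It assumes each label is written as a JSON array of the form `[start, end, type, uri, ...]` under a `labels` key, with any number of URIs after the type so that a private and a public ontology can both be referenced. The names `LinkedSpan` and `parse_record` are hypothetical:

```python
import json
from dataclasses import dataclass, field


@dataclass
class LinkedSpan:
    start: int
    end: int
    label: str
    uris: list = field(default_factory=list)


def parse_record(line):
    """Parse one JSONL line into its text and a list of linked spans."""
    record = json.loads(line)
    text = record["text"]
    spans = []
    # Assumed layout: [start, end, type, uri, uri, ...]; zero URIs is allowed.
    for start, end, label, *uris in record["labels"]:
        if not 0 <= start < end <= len(text):
            raise ValueError(f"span {start}-{end} is outside the text")
        spans.append(LinkedSpan(start, end, label, list(uris)))
    return text, spans


line = '{"text": "Douglas Adams", "labels": [[0, 13, "person", "https://www.wikidata.org/wiki/Q42"]]}'
text, spans = parse_record(line)
print(text[spans[0].start:spans[0].end], spans[0].uris)
```

Keeping the URIs at the end of the array means existing three-element labels would still parse, with an empty URI list.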

  2. In the second step, you could implement the upload and use of a knowledge base. I would suggest that users can upload their knowledge base as a NIF file. This format is supported by many named entity linking engines, such as DBpedia Spotlight. Such an existing engine could then be used to automatically make suggestions to the user.
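As an illustration of the second step, the sketch below turns an engine's JSON response into span suggestions. The field names (`Resources`, `@URI`, `@surfaceForm`, `@offset`) reflect my understanding of DBpedia Spotlight's JSON output and should be treated as an assumption to verify against the engine's documentation. The sample response is hand-written, no request is made, and `spotlight_to_suggestions` is a hypothetical helper:

```python
def spotlight_to_suggestions(response):
    """Convert a Spotlight-style response dict into span suggestions."""
    suggestions = []
    for resource in response.get("Resources", []):
        # Offsets are assumed to arrive as strings, so convert them.
        start = int(resource["@offset"])
        surface = resource["@surfaceForm"]
        suggestions.append(
            {
                "start": start,
                "end": start + len(surface),
                "surface": surface,
                "uri": resource["@URI"],
            }
        )
    return suggestions


# Hand-written sample in the assumed response shape.
sample = {
    "@text": "Douglas Adams wrote novels.",
    "Resources": [
        {
            "@URI": "http://dbpedia.org/resource/Douglas_Adams",
            "@surfaceForm": "Douglas Adams",
            "@offset": "0",
        }
    ],
}
print(spotlight_to_suggestions(sample))
```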

@ghontolux

You are right. The first step is fast to implement:

  • add the models into the backend
    • new project type entity linking
    • new label type entity link as freetext field
    • migrate the database
  • build the frontend
    • goal: open a text input for the selected text and save it in the suggested data structure
    • works similarly to sequence labeling, so you can copy the opening menu and change the input mechanism
    • clicking on the link opens the entity URL in a new tab
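The backend part of this list can be sketched independently of any framework. The example below uses plain dataclasses rather than doccano's actual models, and the names `EntityLink` and `EntityLinkLabel` are hypothetical. It shows a free-text link field with basic URI validation and an export into an array of the form `[start, end, type, uri, ...]`, which is an assumed serialization:

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class EntityLink:
    uri: str

    def __post_init__(self):
        # Accept only absolute http(s) URIs so the frontend can open them in a new tab.
        parsed = urlparse(self.uri)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URI: {self.uri!r}")


@dataclass
class EntityLinkLabel:
    start_offset: int
    end_offset: int
    label: str
    links: list = field(default_factory=list)

    def add_link(self, uri):
        """Attach a validated link, skipping exact duplicates."""
        link = EntityLink(uri)
        if all(existing.uri != uri for existing in self.links):
            self.links.append(link)

    def to_export(self):
        """Serialize as [start, end, label, uri, ...]."""
        return [self.start_offset, self.end_offset, self.label] + [link.uri for link in self.links]


span = EntityLinkLabel(0, 13, "person")
span.add_link("https://www.wikidata.org/wiki/Q42")
span.add_link("https://www.wikidata.org/wiki/Q42")  # duplicate, ignored
print(span.to_export())
```

In a real backend the two classes would map to database tables, with the link table holding a foreign key to the span.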

But functions based on this will soon become complex. A few points need to be considered for this:

  • Storage of entities: what should be stored (link, label, category, description, ...).
  • What should be displayed in the menu when marking?
    • Typeahead
    • How many suggestions
    • Can insufficient suggestions still be corrected with free text?
  • How to ensure fast database access to all entities?
  • Can external databases/services be connected? If so, how (REST, proxy, ...)?
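Some of these points can be explored with a small in-memory prototype. The sketch below is one possible approach, not a recommendation for the final design: it keeps titles in a sorted list so that prefix lookups use binary search, caps the number of suggestions, and falls back to the typed text when nothing matches. `TypeaheadIndex` and `resolve` are hypothetical names:

```python
import bisect


class TypeaheadIndex:
    def __init__(self, entries):
        # entries: iterable of (entity_id, title) pairs, sorted by case-folded title.
        self._items = sorted((title.casefold(), title, entity_id) for entity_id, title in entries)
        self._keys = [key for key, _, _ in self._items]

    def suggest(self, prefix, limit=5):
        """Return up to `limit` (entity_id, title) pairs whose title starts with prefix."""
        key = prefix.casefold()
        # All titles sharing the prefix are contiguous, starting at this position.
        position = bisect.bisect_left(self._keys, key)
        results = []
        for folded, title, entity_id in self._items[position:]:
            if len(results) >= limit or not folded.startswith(key):
                break
            results.append((entity_id, title))
        return results


def resolve(index, typed, limit=5):
    """Return suggestions, or the typed text as a free-text entry with no id."""
    suggestions = index.suggest(typed, limit)
    return suggestions if suggestions else [(None, typed)]


index = TypeaheadIndex([("Q42", "Douglas Adams"), ("Q5", "human"), ("Q2", "Earth"), ("Q1", "universe")])
print(index.suggest("do"))
print(resolve(index, "unknown entity"))
```

For knowledge bases that do not fit in memory, the same prefix query would have to be pushed down to a database index or an external search service.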

@cuihaoleo

Upvote for this feature. A general-purpose free-text field associated with each labeled span would suffice for our needs.


4 participants