"Mesh" and "Hpo" linkers give the same result #463

Closed
almogmor opened this issue Dec 27, 2022 · 8 comments

Comments

@almogmor

Hi,
I'm trying to annotate data using scispacy. Loading the "mesh" and "hpo" linkers gives exactly the same results regardless of the input.
For example:
image-1
image-2
image-3

I tried many texts, and both linkers produced the same results.

@dakinggg
Collaborator

Hi, there are two components related to entity recognition and linking in scispacy. One is the Named Entity Recognition (NER) component, which identifies textual spans that are likely to be entities (and, depending on which scispacy model you use, also their broad type). This information can be accessed as you've done via doc.ents and doc.ents[0].ent_type_. The second is the Entity Linking component, which is the one you specify mesh/hpo for. That component takes the textual spans selected by the NER component and attempts to link them to an entity from the knowledge base. That information can be accessed via doc.ents[0]._.kb_ents. Hope that helps!
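To make the distinction concrete, here is a minimal sketch that collects both kinds of information side by side. The helper name `summarize_entities` is hypothetical, and the sketch reads the span label through spaCy's `label_` attribute; it assumes `doc` comes from a pipeline that already has the scispacy linker added.

```python
def summarize_entities(doc):
    """Return (text, NER label, linker candidates) for each entity span.

    Hypothetical helper: `doc` is assumed to be a spaCy Doc produced by a
    pipeline that includes the "scispacy_linker" component.
    """
    rows = []
    for ent in doc.ents:
        # ent.text and ent.label_ come from the NER component, so they are
        # the same whichever linker (mesh, hpo, umls, ...) is loaded.
        # ent._.kb_ents is set by the linker: (concept_id, score) pairs.
        rows.append((ent.text, ent.label_, list(ent._.kb_ents)))
    return rows
```

Comparing the first two fields across linkers should show identical values, while the third field is where the choice of knowledge base can change the output.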

@almogmor
Author

Thanks for the quick response, yes it does help.
I see now that the entity links are different:
diff

But I couldn't find a way to map an ID, e.g. 'C0346073', back to the name of the entity in the knowledge base ('mesh'/'hpo').

@hrshdhgd

hrshdhgd commented Dec 28, 2022

I have a similar question. In the above example, in spite of using hpo as the linker, the ID returned is C0346073 instead of HP:0012329, as we'd expect from the mapping shown here. I tried go as well and got the same result. Am I missing something?

@dakinggg
Collaborator

dakinggg commented Dec 29, 2022

All of the ontology options are implemented as subsets of UMLS. We don't have any cross mapping to the root ontology identifier. You would have to get that from UMLS or another source. The entity information available from UMLS in scispacy can be accessed as in this example code:

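# Look up the full knowledge-base entry for each linked concept ID
linker = nlp.get_pipe("scispacy_linker")
for umls_ent in entity._.kb_ents:  # entity is a span taken from doc.ents
    print(linker.kb.cui_to_entity[umls_ent[0]])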
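For the earlier question about getting from an ID back to a name, the following is a minimal sketch built on the same lookup. The helper name `describe_candidates` and its `max_candidates` parameter are hypothetical; the sketch assumes each knowledge-base entry exposes a `canonical_name` field, as scispacy's entity records do.

```python
def describe_candidates(entity, linker, max_candidates=3):
    """Return (concept_id, canonical_name, score) for an entity's top links.

    Hypothetical helper: `entity` is a span from doc.ents and `linker` is
    the object returned by nlp.get_pipe("scispacy_linker").
    """
    described = []
    for concept_id, score in entity._.kb_ents[:max_candidates]:
        # The knowledge base is keyed by UMLS concept ID (CUI).
        kb_entry = linker.kb.cui_to_entity[concept_id]
        described.append((concept_id, kb_entry.canonical_name, score))
    return described


# Possible usage, assuming `nlp` already has the linker added:
#   linker = nlp.get_pipe("scispacy_linker")
#   for ent in nlp("some clinical text").ents:
#       print(ent.text, describe_candidates(ent, linker))
```

The returned ID is still the UMLS CUI; as noted above, mapping it to the source ontology's own identifier requires UMLS or another source.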

@hrshdhgd

Then how do linkers like hpo and go change the output?

@dakinggg
Collaborator

They link to subsets of UMLS that are more specific than the full UMLS. If you know that you just want entities that fall into one of those subsets, this can be useful for at least two reasons: 1) the downloaded file is much smaller and memory usage is lower, and 2) the results will be higher precision, because you won't get links to entities of a different type that you are not interested in.
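One way to see the effect of the subset is to run the same text through two pipelines that differ only in the linker and compare the proposed concept IDs. This is a rough sketch: the function names are hypothetical, and the model name `en_core_sci_sm` is an assumption not taken from this thread. The imports sit inside the builder so the comparison helper can be used without loading any models.

```python
def build_linked_pipeline(linker_name, model_name="en_core_sci_sm"):
    """Load a scispacy model and attach a linker for the given subset.

    model_name is an assumed default; use whichever scispacy model you have
    installed. linker_name is e.g. "umls", "mesh", "hpo" or "go".
    """
    import spacy
    from scispacy.linking import EntityLinker  # noqa: F401 (registers "scispacy_linker")

    nlp = spacy.load(model_name)
    nlp.add_pipe("scispacy_linker", config={"linker_name": linker_name})
    return nlp


def linked_ids(doc):
    """Map each entity text to the set of concept IDs the linker proposed."""
    ids = {}
    for ent in doc.ents:
        ids.setdefault(ent.text, set()).update(cui for cui, _score in ent._.kb_ents)
    return ids


# Possible usage (downloads the linker data on first run):
#   text = "some clinical text"
#   mesh_ids = linked_ids(build_linked_pipeline("mesh")(text))
#   hpo_ids = linked_ids(build_linked_pipeline("hpo")(text))
#   for span_text in mesh_ids:
#       print(span_text, mesh_ids[span_text], hpo_ids.get(span_text, set()))
```

The entity spans should match between the two runs, since they come from the same NER component; only the candidate IDs are expected to differ.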

@almogmor
Author

almogmor commented Jan 1, 2023

Is there any way to map the 'mesh' or 'hpo' linker results back to the relevant UMLS entities?
In other words, if I'm using the umls linker, can I filter which entities are 'mesh' related and which are 'hpo' related?

e.g.
Screenshot 2023-01-01 212041

@dakinggg
Collaborator

dakinggg commented Jan 2, 2023

The mesh and hpo linker entities should contain the exact same information as the umls linker entities since they are just a subset.
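Since the subsets appear to be keyed by the same UMLS concept IDs, one possible way to filter umls linker results is to check each ID against the subset knowledge bases. This is only a sketch under that assumption; the helper name `split_by_subset` is hypothetical, and it requires loading the subset linkers alongside the umls one.

```python
def split_by_subset(kb_ents, subset_cui_maps):
    """Group (concept_id, score) pairs by the subset KBs that contain the ID.

    kb_ents: candidates from the umls linker, e.g. entity._.kb_ents.
    subset_cui_maps: name -> mapping keyed by concept ID, e.g.
        {"mesh": mesh_linker.kb.cui_to_entity,
         "hpo": hpo_linker.kb.cui_to_entity}
    Assumes the subset knowledge bases use the same UMLS CUIs as keys.
    """
    grouped = {name: [] for name in subset_cui_maps}
    for concept_id, score in kb_ents:
        for name, cui_to_entity in subset_cui_maps.items():
            if concept_id in cui_to_entity:
                grouped[name].append((concept_id, score))
    return grouped
```

A concept that belongs to both subsets lands in both groups, and one that belongs to neither is dropped from the result.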
