biomarker_for results appear to be rubbish #1774

Closed
dkoslicki opened this issue Jan 21, 2022 · 5 comments

@dkoslicki
Member

From today's standup: https://arax.ncats.io/?r=0a0efe04-9f73-410e-89ae-1e1f9c2e658f
It appears that most or all of the results are just plain terrible, with things like "Rheumatologist is a biomarker for psoriatic arthritis". It's unclear what the cause could be (since poor results are returned by KG2, BTE, etc.), but it certainly seems worth investigating.

Tagging KG2 team, but feel free to un-assign as appropriate.

@dkoslicki
Member Author

Other notes:
Changing the NamedThing to Protein returns no results, but changing it to BiologicalEntity reveals that quite a few nodes appear to be misclassified: https://arax.ncats.io/?r=35579
E.g., "diagnosis" isn't a biological entity, right?
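
For anyone wanting to repeat the category experiment above, here is a minimal sketch of a one-hop TRAPI-style query graph in which only the subject category changes. The actual query behind the linked responses is not shown in this thread, so the node keys, the helper name `build_biomarker_query`, and the placeholder CURIE `EXAMPLE:0000` are all assumptions for illustration.

```python
import json


def build_biomarker_query(disease_curie, subject_category="biolink:NamedThing"):
    """Build a one-hop query graph: (?subject) -[biomarker_for]-> (disease).

    Hypothetical helper; the layout follows the general TRAPI query-graph
    shape (nodes keyed by id, edges with subject/object/predicates).
    """
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    # Unpinned node whose category we vary between runs.
                    "n0": {"categories": [subject_category]},
                    # Pinned disease node.
                    "n1": {"ids": [disease_curie]},
                },
                "edges": {
                    "e0": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": ["biolink:biomarker_for"],
                    }
                },
            }
        }
    }


# "EXAMPLE:0000" is a placeholder, not a real disease identifier.
query = build_biomarker_query("EXAMPLE:0000", subject_category="biolink:Protein")
print(json.dumps(query, indent=2))
```

Swapping `subject_category` between `biolink:NamedThing`, `biolink:Protein`, and `biolink:BiologicalEntity` mirrors the three variants described above.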

@saramsey
Member

Thank you for the bug report.

Digging into this, one thing I notice right off the bat is that we are getting (from SemMedDB) a lot of procedures being connected to diseases/conditions, via biolink:biomarker_for. Details in RTX-KG2 issue 194. So that should be an easy fix for the next KG2 build. However, that may not be the only issue with the results for the original query that was looking for biomarkers of rheumatoid arthritis. I will have to do some more digging.
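
A rough way to spot this pattern in a set of edges is to flag every `biolink:biomarker_for` edge whose subject is categorized as a procedure. The sketch below assumes a simplified edge/node representation (plain dicts, made-up `EX:` identifiers) rather than the actual RTX-KG2 schema, and the function name is hypothetical.

```python
# Categories that should not plausibly be the subject of biomarker_for.
# Only biolink:Procedure is listed here; extend as needed.
SUSPECT_SUBJECT_CATEGORIES = {"biolink:Procedure"}


def find_suspect_biomarker_edges(edges, node_categories):
    """Return biomarker_for edges whose subject has a suspect category.

    edges: iterable of dicts with "subject", "predicate", "object" keys.
    node_categories: dict mapping node id -> set of Biolink categories.
    """
    suspects = []
    for edge in edges:
        if edge["predicate"] != "biolink:biomarker_for":
            continue
        subject_cats = node_categories.get(edge["subject"], set())
        if subject_cats & SUSPECT_SUBJECT_CATEGORIES:
            suspects.append(edge)
    return suspects


# Toy data with invented identifiers, for illustration only.
node_categories = {
    "EX:proc1": {"biolink:Procedure"},
    "EX:prot1": {"biolink:Protein"},
    "EX:dis1": {"biolink:Disease"},
}
edges = [
    {"subject": "EX:proc1", "predicate": "biolink:biomarker_for", "object": "EX:dis1"},
    {"subject": "EX:prot1", "predicate": "biolink:biomarker_for", "object": "EX:dis1"},
    {"subject": "EX:proc1", "predicate": "biolink:treats", "object": "EX:dis1"},
]
suspects = find_suspect_biomarker_edges(edges, node_categories)
print(suspects)
```

Only the first toy edge is flagged: the second has a protein subject, and the third uses a different predicate.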

@saramsey
Member

saramsey commented Jan 21, 2022

In fairness, we were using the mapping of SEMMEDDB:diagnoses to biolink:biomarker_for that came straight from Biolink:
https://github.com/biolink/biolink-model/blob/ca93f7ebd7fd52437b23a7195b51f54d74b81d37/biolink-model.yaml#L4469

but in RTX-KG2 issue 194, there's pretty good evidence that the Biolink-suggested mapping is problematic.
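
One way to handle a problematic upstream mapping is a local override table that is consulted before the Biolink-suggested one. This is only a sketch of that idea, not the RTX-KG2 build code: the table and function names are hypothetical, and `biolink:related_to` is a conservative placeholder, not the predicate that was ultimately adopted.

```python
# Mapping as suggested by Biolink at the time (per the link above).
BIOLINK_SUGGESTED = {
    "SEMMEDDB:diagnoses": "biolink:biomarker_for",
}

# Local overrides take precedence over the suggested mapping.
# ASSUMPTION: biolink:related_to is a placeholder target chosen for
# illustration, not the actual fix.
LOCAL_OVERRIDES = {
    "SEMMEDDB:diagnoses": "biolink:related_to",
}


def map_predicate(source_predicate, overrides=None, suggested=None):
    """Map a source predicate to a Biolink predicate, preferring overrides.

    Returns None when neither table knows the source predicate.
    """
    overrides = LOCAL_OVERRIDES if overrides is None else overrides
    suggested = BIOLINK_SUGGESTED if suggested is None else suggested
    if source_predicate in overrides:
        return overrides[source_predicate]
    return suggested.get(source_predicate)


print(map_predicate("SEMMEDDB:diagnoses"))                 # uses the override
print(map_predicate("SEMMEDDB:diagnoses", overrides={}))   # falls back to suggested
```

Keeping the override separate from the suggested table makes it easy to drop the override once the upstream mapping changes.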

@saramsey
Member

saramsey commented Jan 21, 2022

I haven't yet looked into the BTE results, but I note that if they are using the Biolink-suggested predicate mapping of SEMMEDDB:diagnoses to biolink:biomarker_for, then they would be affected as well.

@saramsey
Member

saramsey commented Feb 22, 2023

Fixed (both in Biolink and in the RTX-KG2 build system).
