Predict DOID mappings to UMLS, MeSH, and EFO #68

Merged (2 commits) on Oct 21, 2021

Conversation

cthoyt
Member

@cthoyt cthoyt commented Oct 12, 2021

@allenbaron @lschriml there is a tiny issue with the way some of the entries are normalized, but this is the outline of generating the DOID mappings

Issues

  • MONDO and IDO both have some conflicts with how their identifiers are checked, which is part of a larger discussion on local identifiers that will come up in the workshop next week
  • ORDO and CIDO are both in a format that could not be parsed as either OBO or OWL via pronto

@cthoyt cthoyt marked this pull request as ready for review October 21, 2021 08:35
@cthoyt cthoyt changed the title Add DOID mappings to MONDO, UMLS, MeSH, EFO, and IDO Add DOID mappings to UMLS, MeSH, EFO Oct 21, 2021
@cthoyt cthoyt changed the title Add DOID mappings to UMLS, MeSH, EFO Predict DOID mappings to UMLS, MeSH, and EFO Oct 21, 2021
@cthoyt cthoyt merged commit 7917bbb into master Oct 21, 2021
@cthoyt cthoyt deleted the add-doid-mappings branch October 21, 2021 08:39
@bgyori
Contributor

bgyori commented Oct 25, 2021

There is an important issue with this that just caused issues in INDRA through which I noticed this. Namely, the script doesn't take into account mappings that are already provided by DOID to MeSH, and adds redundant predictions for these. @cthoyt could you look into this and remove these predictions?
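One way to avoid the redundant predictions is to filter the predicted pairs against the cross-references DOID already provides before writing them out. The following is only a minimal sketch of that idea, not the script's actual logic: the function name, the dictionary keys, and the example identifiers are all hypothetical, and mappings are assumed to be symmetric.

```python
def drop_known_mappings(predictions, known_xrefs):
    """Return only the predictions whose pair is not already a known xref.

    Both arguments are hypothetical structures used for illustration:
    ``predictions`` is a list of dicts with "source" and "target" CURIEs,
    ``known_xrefs`` is an iterable of (source, target) CURIE pairs.
    """
    known = set()
    for source, target in known_xrefs:
        known.add((source, target))
        known.add((target, source))  # assumption: mappings are symmetric
    return [
        prediction
        for prediction in predictions
        if (prediction["source"], prediction["target"]) not in known
    ]


# Made-up identifiers, for illustration only
known_xrefs = [("doid:A", "mesh:X")]
predictions = [
    {"source": "doid:A", "target": "mesh:X"},  # already provided, dropped
    {"source": "doid:B", "target": "mesh:Y"},  # new, kept
]
filtered = drop_known_mappings(predictions, known_xrefs)
```

Note that a filter like this only works if both sides write identifiers in the same form, so any prefix normalization would have to happen before the comparison.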

@bgyori
Contributor

bgyori commented Oct 25, 2021

Another issue is that a lot of one-to-many mappings are added as predicted exact matches. I think it would be better to leave these out or improve the script to take more features into account when deciding what mappings to propose. For instance here (for convenience this is not directly from predictions.tsv but a derived table):

EFO    0000182 hepatocellular carcinoma        DOID    DOID:3571       liver cancer
EFO    0000182 hepatocellular carcinoma        DOID    DOID:684        hepatocellular carcinoma
EFO    0000182 hepatocellular carcinoma        DOID    DOID:686        liver carcinoma
EFO    0000182 hepatocellular carcinoma        DOID    DOID:687        hepatoblastoma

there is an exact match at the name level so that single mapping could be proposed.
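As a rough sketch of the filtering proposed here: group the candidate rows by source entry, keep the single candidate whose name matches exactly if there is one, and otherwise drop ambiguous one-to-many groups. The row layout, key names, and function name are assumptions made for illustration, and the name comparison is a plain case-insensitive string match rather than whatever scoring the script uses.

```python
from collections import defaultdict


def prefer_exact_name_matches(rows):
    """Collapse one-to-many candidates to at most one mapping per source."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["source_prefix"], row["source_id"])].append(row)

    kept = []
    for candidates in grouped.values():
        exact = [
            row
            for row in candidates
            if row["source_name"].casefold() == row["target_name"].casefold()
        ]
        if len(exact) == 1:
            kept.extend(exact)  # a single name-level exact match wins
        elif len(candidates) == 1:
            kept.extend(candidates)  # one-to-one: nothing to disambiguate
        # otherwise the group is ambiguous and is left out
    return kept


# Rows taken from the derived table above
rows = [
    {"source_prefix": "EFO", "source_id": "0000182",
     "source_name": "hepatocellular carcinoma",
     "target_id": target_id, "target_name": target_name}
    for target_id, target_name in [
        ("DOID:3571", "liver cancer"),
        ("DOID:684", "hepatocellular carcinoma"),
        ("DOID:686", "liver carcinoma"),
        ("DOID:687", "hepatoblastoma"),
    ]
]
kept = prefer_exact_name_matches(rows)
```

With the four rows from the table, only the DOID:684 row survives.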

@cthoyt
Member Author

cthoyt commented Oct 26, 2021

> There is an important issue with this that just caused issues in INDRA through which I noticed this. Namely, the script doesn't take into account mappings that are already provided by DOID to MeSH, and adds redundant predictions for these. @cthoyt could you look into this and remove these predictions?

I think this is an issue with the redundant prefixes in the identifiers. I think a potential solution would be to start standardizing identifiers in the main files, then provide an export that uses Identifiers.org rules for import in INDRA, since a non-general solution requires writing custom handling for this in many places.
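To make the idea concrete, here is a minimal sketch of what standardizing and re-exporting could look like. It assumes the redundancy takes the form of the namespace prefix being repeated inside the local identifier (for example `DOID:684` stored under the `doid` namespace). Both function names and the `embedded_prefixes` set are hypothetical, not an existing API, and which namespaces need the prefix re-added on export is left as an input.

```python
def standardize(prefix, identifier):
    """Strip a redundant leading "<prefix>:" from a local identifier."""
    redundant = f"{prefix}:".lower()
    if identifier.lower().startswith(redundant):
        return identifier[len(redundant):]
    return identifier


def to_export_style(prefix, identifier, embedded_prefixes):
    """Re-add the embedded prefix for namespaces that expect it on export.

    ``embedded_prefixes`` is an assumed, caller-supplied set of namespaces
    whose exported identifiers repeat the prefix.
    """
    local = standardize(prefix, identifier)
    if prefix in embedded_prefixes:
        return f"{prefix.upper()}:{local}"
    return local


standardized = standardize("doid", "DOID:684")
exported = to_export_style("doid", "684", embedded_prefixes={"doid"})
```

Keeping the main files in the standardized form would mean the custom handling lives in one export function instead of in every consumer.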

> Another issue is that a lot of one-to-many mappings are added as predicted exact matches. I think it would be better to leave these out or improve the script to take more features into account when deciding what mappings to propose. For instance here (for convenience this is not directly from predictions.tsv but a derived table):

That's an excellent idea. It seems totally obvious that one is better than the others. I think the current logic outputs all mappings returned by Gilda, but if the scored match object indicates an "exact match" somewhere inside it, then that's definitely good enough to keep only that one.

@bgyori
Contributor

bgyori commented Nov 14, 2021

These issues haven't been resolved yet; I will try to do something about them now.
