Handling MESH-CHEBI Mappings #77

Closed
4 tasks done
callahantiff opened this issue Dec 22, 2020 · 2 comments · Fixed by #81
Assignees
Labels
ISMB challenge release v2.0.0 noting work and issues related to release v2.0.0

Comments

@callahantiff
Owner

callahantiff commented Dec 22, 2020

TASK

Task Type: CODEBASE

Decide how to handle MESH to CHEBI mappings. Currently, a GitHub Gist (ncbo_rest_api.py) queries the BioPortal API; it is a script that can be run as part of the KG CI/CD build.

Problems: The ncbo_rest_api.py script runs fine, but it is brittle because it relies on the BioPortal API, which is notoriously unstable. A potential solution (for now or in the future) could be to implement the LOOM algorithm, which creates the mappings underlying the API.

TODO

@callahantiff callahantiff added release v2.0.0 noting work and issues related to release v2.0.0 ISMB challenge labels Dec 22, 2020
@callahantiff callahantiff self-assigned this Dec 22, 2020
@callahantiff callahantiff added this to To do in ISMB Bio-Ontologies Challenge via automation Dec 22, 2020
@callahantiff
Owner Author

This work impacts issue #72 because of its reference in the associated Jupyter Notebook.

@callahantiff callahantiff linked a pull request Dec 31, 2020 that will close this issue
5 tasks
@callahantiff
Owner Author

@bill-baumgartner - this is complete (will be integrated with PR #81). I followed the details for the LOOM algorithm described on the BioPortal Wiki. It's very simple, just a few methods. There is nothing fancy: it is essentially accomplished by preprocessing the input MeSH and ChEBI data and performing an inner join to find overlapping concepts.

In a Nutshell: We download the mesh2021.nt data file directly from MeSH and the Flat_file_tab_delimited/names.tsv.gz file directly from ChEBI. Using these files, we have recapitulated the LOOM algorithm implemented by BioPortal when creating mappings between these resources. The procedure is relatively straightforward and utilizes the following information from each resource:

  • For all MeSH SCR Chemicals, obtain the following information:
    • Identifiers: MeSH identifiers
    • Labels: string labels using the RDFS:label object property
    • Synonyms: track down all synonyms using the vocab:concept and vocab:preferredConcept object properties
  • For all ChEBI classes, obtain the following information:
    • Labels: string labels using the RDFS:label object property
    • Synonyms: track down all synonyms using all synonym object properties
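
As a rough illustration of the preprocessing and inner join described above, here is a minimal Python sketch. It is not the code from the notebook or from PR #81. The function names, the normalization rule (lowercase, drop non-alphanumeric characters), the minimum length of 3 characters, and the toy identifiers are all assumptions made for the example.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase and drop non-alphanumeric characters (assumed normalization)."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def build_index(records):
    """Map each normalized label/synonym to the identifiers that carry it.

    `records` is an iterable of (identifier, [label and synonym strings]).
    """
    index = defaultdict(set)
    for identifier, names in records:
        for name in names:
            key = normalize(name)
            if len(key) >= 3:  # assumption: ignore very short strings
                index[key].add(identifier)
    return index


def loom_style_mappings(mesh_records, chebi_records):
    """Inner join of the two resources on their normalized strings."""
    mesh_index = build_index(mesh_records)
    chebi_index = build_index(chebi_records)
    shared = mesh_index.keys() & chebi_index.keys()
    pairs = {
        (mesh_id, chebi_id)
        for key in shared
        for mesh_id in mesh_index[key]
        for chebi_id in chebi_index[key]
    }
    return sorted(pairs)


# Toy records with made-up identifiers, for illustration only.
mesh = [
    ("MESH_0001", ["Aspirin", "Acetyl-salicylic acid"]),
    ("MESH_0002", ["Caffeine"]),
]
chebi = [
    ("CHEBI_0001", ["acetylsalicylic acid", "aspirin"]),
    ("CHEBI_0002", ["Glucose"]),
]

print(loom_style_mappings(mesh, chebi))
```

In this sketch, a pair is reported once even when several labels or synonyms match, because the pairs are collected in a set before sorting.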

Details and a description are in the notebook here, under ChEBI Identifiers, as well as in the scripted version of this notebook (lines 496-628, here).

ISMB Bio-Ontologies Challenge automation moved this from To do to Done Jan 7, 2021