Tiffany J. Callahan edited this page Oct 13, 2022 · 41 revisions



Project Description

A significant promise of electronic health records (EHRs) lies in their potential to support large-scale investigations of the mechanistic drivers of complex diseases. Despite substantial progress in biomarker discovery, this promise remains largely aspirational because EHR data are disconnected from biomedical knowledge (PMID:32335224, PMID:30304648). Linking molecular data to the clinical data stored in EHRs will support biologically meaningful analysis of those data, and this linkage can be achieved by integrating knowledge about biology and pathophysiology from multiple ontologies. Like clinical terminologies, computational ontologies are classification systems that provide detailed representations of a specific knowledge domain as a set of concepts connected by logically defined relationships. Unlike most clinical terminologies, however, ontologies are computable and interoperable: they can be logically verified using description logics and readily integrated with other ontologies and with non-ontological data, including data from basic science and clinical research (PMID:30304648).

Normalizing (i.e., mapping or annotating) clinical data to ontologies, such as those in the Open Biomedical Ontology (OBO) Foundry, has been recognized as a fundamental need for the future of deep phenotyping (PMID:32335224). Existing work has largely focused on using ontologies to improve phenotyping for specific diseases (e.g., infectious diseases, PMID:31160594, and rare diseases, PMID:31231902) and to enhance specific biological and clinical domains (e.g., laboratory tests, PMID:31119199, and diagnoses, PMID:29295235).

Prior work has been largely limited to one-to-one mappings (e.g., mapping a single clinical term to a single ontology concept) and rarely includes external validation. Unfortunately, learning algorithms are not yet able to capture the complex clinical and biological semantics underlying these concepts and their relationships. Until a comprehensive, robust, and validated resource that maps multiple clinical domains to biomedical ontologies exists, automated inference linking patient-level clinical observations to biological knowledge will not be possible.
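To make the difference between one-to-one mappings and more complex mappings concrete, the minimal sketch below represents a single clinical concept linked to one or more ontology concepts. It is an illustration only: the `ConceptMapping` class, its fields, the `AND` combining operator, and all identifiers are hypothetical placeholders and do not describe the format of the released OMOP2OBO mapping files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConceptMapping:
    """Hypothetical record linking one clinical concept to ontology concepts."""
    clinical_id: str                 # placeholder for a clinical concept identifier
    clinical_label: str
    ontology_ids: List[str] = field(default_factory=list)
    operator: str = "AND"            # assumed: how multiple ontology concepts combine

    def is_one_to_one(self) -> bool:
        # A one-to-one mapping links exactly one ontology concept.
        return len(self.ontology_ids) == 1

    def describe(self) -> str:
        # Join the ontology identifiers using the combining operator.
        joined = f" {self.operator} ".join(self.ontology_ids)
        return f"{self.clinical_label} -> {joined}"


# Placeholder identifiers; a real mapping would use actual OMOP and OBO IDs.
simple = ConceptMapping("CLIN:0001", "asthma", ["DISEASE:0001"])
complex_ = ConceptMapping("CLIN:0002", "kidney infection", ["DISEASE:0002", "ANATOMY:0003"])

print(simple.is_one_to_one())    # True
print(complex_.is_one_to_one())  # False
print(complex_.describe())       # kidney infection -> DISEASE:0002 AND ANATOMY:0003
```

A one-to-one scheme can only express the first record; the second needs a list of targets plus some rule for how they combine, which is the kind of semantics the paragraph above argues is hard to capture automatically.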

We have developed OMOP2OBO, the first health system-wide integration and alignment between the Observational Health Data Sciences and Informatics' Observational Medical Outcomes Partnership (OMOP) standardized clinical terminologies and eight OBO biomedical ontologies spanning diseases, phenotypes, anatomical entities, cell types, organisms, chemicals, metabolites, hormones, vaccines, and proteins.

To ensure that the mappings are both clinically and biologically meaningful, we performed extensive experiments evaluating the accuracy, generalizability, and logical consistency of each released mapping set.

📒 Manuscript preprint is available 👉

What Does This Repository Provide?

Through this repository we provide the following:
An algorithm and mapping pipeline that enables you to construct your own set of OMOP2OBO mappings. The figure below provides a high-level overview of the algorithm workflow. The code in this repository automates every step shown in the figure except manual mapping, which for now remains manual (we are currently working on a deep learning model to address this step).

Open-source OMOP2OBO mappings that can be used out of the box (no coding required) for 92,367 OMOP Conditions, 8,615 Drug Exposure ingredients, and 3,827 Measurements (10,673 measurement test results). The mappings can be downloaded from Zenodo using the links included below.
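As a rough illustration of what one automatic mapping step can look like, the sketch below matches clinical concept labels to ontology labels and synonyms by exact comparison after simple normalization. This is a simplified assumption, not the repository's implementation: the normalization rules, the dictionary inputs, the helper names (`normalize`, `build_index`, `exact_match`), and all identifiers are hypothetical. Concepts without an exact match are returned separately, since those are the ones that would fall through to the manual mapping step.

```python
import re
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def build_index(ontology: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each normalized label or synonym to the ontology IDs that use it."""
    index: Dict[str, List[str]] = {}
    for onto_id, strings in ontology.items():
        for s in strings:
            ids = index.setdefault(normalize(s), [])
            if onto_id not in ids:
                ids.append(onto_id)
    return index


def exact_match(
    clinical: Dict[str, str], ontology: Dict[str, List[str]]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (matched, unmatched) using exact normalized string comparison."""
    index = build_index(ontology)
    matched: Dict[str, List[str]] = {}
    unmatched: List[str] = []
    for concept_id, label in clinical.items():
        hits = index.get(normalize(label))
        if hits:
            matched[concept_id] = hits
        else:
            unmatched.append(concept_id)  # candidate for manual review
    return matched, unmatched


# Hypothetical inputs: clinical ID -> label; ontology ID -> [label, synonyms...]
clinical = {"C1": "Asthma", "C2": "Heart attack", "C3": "Kidney infection"}
ontology = {
    "ONT:1": ["asthma"],
    "ONT:2": ["myocardial infarction", "heart attack"],
}

matched, unmatched = exact_match(clinical, ontology)
print(matched)    # {'C1': ['ONT:1'], 'C2': ['ONT:2']}
print(unmatched)  # ['C3']
```

Indexing synonyms as well as primary labels is what lets "Heart attack" resolve to the concept whose primary label is "myocardial infarction"; exact matching is fast and precise but misses paraphrases, which is why a manual step remains.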

⚙ Releases ⚙

If you would like to explore or use the mappings, please see our dashboard, which includes mapping statistics, interactive plots and tables, and links to download the latest release.

Current Release

πŸ“πŸ“Š Publications and Presentations

We created a dedicated Zenodo Community, which provides access to data, mappings, and presentations (

Have Suggestions or Questions?

We are always looking for ways to make this resource useful for the community. If you have ideas or suggestions, we’d love to hear from you! To get in touch with us, please create an issue or send us an email 💌