monarch-mapping-commons

This repository contains the source code that generates the SSSOM-style mapping files used for the Monarch Initiative knowledge graph.
The pipeline runs on Jenkins, and the resulting mapping files are uploaded to Google Cloud Storage and served from https://data.monarchinitiative.org/mappings/

Repository Structure

Usage

Prerequisites

Installation

git clone https://github.com/monarch-initiative/monarch-mapping-commons.git
cd monarch-mapping-commons
poetry install

Running

make mappings

Note:
The first time you run this command, it will take a while to download and process the data.
Subsequent runs will be much faster.
This is because Monarch Gene Mapping depends on a very large (11 GB) file from UniProtKB.
There are plans to cache this file in Google Cloud Storage or to use the UniProtKB API instead,
but for now the file must be downloaded in its entirety.

Developer Documentation

To update the mapping registry from OLS:

sh odk.sh make update_registry -B

To update the mappings:

sh odk.sh make mappings

If the run requires a recently published SSSOM or OAK feature, first update ODK:

docker pull obolibrary/odkfull:dev

and then run the dependencies goal together with the mappings goal:

IMAGE=odkfull:dev sh odk.sh make mappings

For Windows, append :dev to obolibrary/odkfull in the odk.bat file.

Note: If running on a Windows machine, replace sh odk.sh with odk.bat in the above commands.

Design decisions:

  1. Only mappings of base entities are extracted. This ensures that we do not import the same UBERON mapping for every species-specific anatomy ontology (such as XAO). It is implemented as a filtering step that relies on the crude assumption that the ontology ID is reflected in the prefix of the subject_id.
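
The filtering idea in the design decision above can be illustrated with a minimal Python sketch. It is not the repository's actual implementation: the function names `is_base_mapping` and `filter_base_mappings`, the case-insensitive comparison, and the sample identifiers are all assumptions made for illustration. It only assumes that mappings are rows with a `subject_id` column holding a CURIE, as in an SSSOM TSV file.

```python
import csv
import io


def is_base_mapping(subject_id: str, ontology_id: str) -> bool:
    """Return True when the CURIE prefix of subject_id matches ontology_id.

    Hypothetical helper: this encodes the crude assumption that the
    ontology ID (e.g. "xao") is reflected in the subject prefix ("XAO:...").
    """
    prefix, sep, _ = subject_id.partition(":")
    # A subject_id without a colon has no CURIE prefix and is rejected.
    return bool(sep) and prefix.lower() == ontology_id.lower()


def filter_base_mappings(rows, ontology_id):
    """Keep only the rows whose subject belongs to the given ontology."""
    return [row for row in rows if is_base_mapping(row["subject_id"], ontology_id)]


# Made-up sample rows; the identifiers are placeholders, not real mappings.
SAMPLE_TSV = (
    "subject_id\tpredicate_id\tobject_id\n"
    "XAO:0000001\tskos:exactMatch\tEX:0000001\n"
    "UBERON:0000002\tskos:exactMatch\tEX:0000002\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
base_rows = filter_base_mappings(rows, "xao")
print([row["subject_id"] for row in base_rows])
```

In this sketch the row with the UBERON subject is dropped when processing XAO, which is the behaviour the design decision describes: mappings imported from another ontology are not repeated for every ontology that includes them.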

Credits

This project was made with the mapping-commons-cookiecutter.