For a good overview of the biolink-model, watch Chris Mungall's talk at ICBO 2020.
- Browse the model: https://biolink.github.io/biolink-model
Refer to the following resources for a quick introduction to the Biolink Model:
- Introduction to the Biolink Datamodel
- Biolink Model - A community driven data model for life sciences (Biocuration 2020)
See also Biolink Model Guidelines for help understanding, curating, and working with the model.
The purpose of the Biolink Model is to provide a high-level datamodel of biological entities (genes, diseases, phenotypes, pathways, individuals, substances, etc.), their properties and relationships, and to enumerate the ways in which they can be associated.
The representation is independent of storage technology or metamodel (Solr documents, neo4j/property graphs, RDF/OWL, JSON, CSVs, etc.). Different mappings to each of these are provided.
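To illustrate what storage independence means in practice, here is a minimal sketch that writes one association in three shapes: a JSON document, a CSV row, and a subject-predicate-object triple. The identifiers are placeholders and the helper functions are hypothetical names for this example only; they are not the mappings shipped with the model.

```python
import csv
import io
import json

# Hypothetical association; the identifiers are placeholders, not real records.
edge = {
    "subject": "EXAMPLE:gene1",
    "predicate": "biolink:related_to",
    "object": "EXAMPLE:disease1",
}


def to_json(record):
    """Serialize the association as a JSON document."""
    return json.dumps(record, sort_keys=True)


def to_csv(record):
    """Serialize the association as a header line plus one CSV row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["subject", "predicate", "object"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()


def to_triple(record):
    """Express the association as a (subject, predicate, object) tuple of CURIEs."""
    return (record["subject"], record["predicate"], record["object"])


print(to_json(edge))
print(to_csv(edge), end="")
print(to_triple(edge))
```

The same three fields carry the association in every form; only the container changes.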
The specification of the Biolink Model is a single YAML file built using LinkML. The basic elements of the YAML are:
- Class Definitions: definitions of upper level classes representing both 'named thing' and 'association'
- Slot Definitions: definitions of slots (aka properties) that can be used to relate members of these classes to other classes or data types. Slots collectively refer to predicates, node properties, and edge properties
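The layout described above can be sketched with plain Python dictionaries. This is a hand-written, simplified fragment, not the real YAML file. It assumes that the three kinds of slot are rooted at slots named `related to` (predicates), `node property`, and `association slot` (edge properties), and the `slot_kind` helper is a hypothetical name for this example.

```python
# A hand-written fragment mirroring the shape of the YAML (not the real file).
model = {
    "classes": {
        "named thing": {},
        "gene": {"is_a": "named thing"},
        "association": {},
    },
    "slots": {
        "related to": {},
        "interacts with": {"is_a": "related to"},
        "node property": {},
        "symbol": {"is_a": "node property"},
        "association slot": {},
        "negated": {"is_a": "association slot"},
    },
}

# Assumed root slots and the kind of slot each one stands for.
ROOT_KINDS = {
    "related to": "predicate",
    "node property": "node property",
    "association slot": "edge property",
}


def slot_kind(name, slots):
    """Follow is_a links up to a root slot and report which kind it denotes.

    Returns None for unknown slots, slots without a known root, or cycles.
    """
    seen = set()
    while name not in ROOT_KINDS:
        if name in seen or name not in slots:
            return None
        seen.add(name)
        name = slots[name].get("is_a")
    return ROOT_KINDS[name]


for slot in ("interacts with", "symbol", "negated"):
    print(slot, "->", slot_kind(slot, model["slots"]))
```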
The model itself is being used in the following projects:
- NCATS Biomedical Data Translator
- Monarch Initiative
- KG Microbe
- Illuminating the Druggable Genome
The main source of truth is biolink-model.yaml. This is a YAML file that is intended to be relatively simple to view and edit in its native form.
The YAML definition is currently used to derive:
- JSON Schema
- Python dataclasses
- Java code gen
- ProtoBuf definitions
- RDF Shape Expressions
- JSON-LD context
- GOlr YAML schemas
  - these can be compiled down to Solr XML schemas
  - these are also intermediate targets used within the BBOP/AmiGO framework
- Markdown documentation
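As a rough illustration of what deriving an artifact means, the sketch below turns a toy class definition into a JSON Schema fragment. The class definition, its field names, and the `to_json_schema` helper are all hypothetical and heavily simplified. The project's real artifacts are produced by its build tooling, not by this code.

```python
import json

# Hypothetical, simplified class definition; not copied from biolink-model.yaml.
class_def = {
    "name": "gene",
    "slots": {
        "id": {"range": "string", "required": True},
        "name": {"range": "string"},
        "synonym": {"range": "string", "multivalued": True},
    },
}


def to_json_schema(definition):
    """Turn the toy class definition into a JSON Schema object description."""
    properties = {}
    required = []
    for slot_name, slot in definition["slots"].items():
        value_schema = {"type": slot["range"]}
        if slot.get("multivalued"):
            # Multivalued slots become arrays of the underlying type.
            value_schema = {"type": "array", "items": value_schema}
        properties[slot_name] = value_schema
        if slot.get("required"):
            required.append(slot_name)
    return {
        "title": definition["name"],
        "type": "object",
        "properties": properties,
        "required": required,
    }


schema = to_json_schema(class_def)
print(json.dumps(schema, indent=2))
```

Keeping one source definition and generating each target from it is what lets the JSON Schema, dataclasses, and documentation stay in step with one another.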
Make and build instructions
Prerequisites: Python 3.7+ and pipenv
To install pipenv,
pip3 install pipenv
To install the project,
To regenerate artifacts from the Biolink Model YAML,
Note: the Makefile requires the following dependencies to be installed:
jsonschema can generally be installed using
pip3 install jsonschema
If you are on a Mac, jsonschema2pojo can be installed using
brew install jsonschema2pojo
For other OS environments, download the latest release and extract it into your execution path, e.g.
wget https://github.com/joelittlejohn/jsonschema2pojo/releases/download/jsonschema2pojo-1.0.2/jsonschema2pojo-1.0.2.tar.gz
tar -xvzf jsonschema2pojo-1.0.2.tar.gz
export PATH=$PATH:`pwd`/jsonschema2pojo-1.0.2/bin
See the GraphViz site for installation instructions for your operating system.
How do I use Biolink Model YAML programmatically?
For operations such as CURIE lookup, finding classes by synonym, and getting parents or ancestors, please make use of biolink-model-toolkit. It provides convenience methods for traversing the Biolink Model.
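To make these operations concrete, here is a minimal standard-library sketch over a tiny hand-written hierarchy. It does not use biolink-model-toolkit, and the function names are local helpers that are not claimed to match the toolkit's method signatures. The `locus` alias is a made-up value for illustration.

```python
# Hypothetical mini-hierarchy; the real model is far larger.
CLASSES = {
    "named thing": {
        "is_a": None,
        "curie": "biolink:NamedThing",
        "aliases": [],
    },
    "biological entity": {
        "is_a": "named thing",
        "curie": "biolink:BiologicalEntity",
        "aliases": [],
    },
    "gene": {
        "is_a": "biological entity",
        "curie": "biolink:Gene",
        "aliases": ["locus"],  # made-up alias for this sketch
    },
}


def find_by_curie(curie):
    """Return the class name whose CURIE matches, or None."""
    for name, info in CLASSES.items():
        if info["curie"] == curie:
            return name
    return None


def find_by_alias(alias):
    """Return the class name that lists the given alias, or None."""
    for name, info in CLASSES.items():
        if alias in info["aliases"]:
            return name
    return None


def get_parent(name):
    """Return the direct is_a parent of a class, or None for a root."""
    return CLASSES[name]["is_a"]


def get_ancestors(name):
    """Return all is_a ancestors, nearest first, excluding the class itself."""
    ancestors = []
    parent = get_parent(name)
    while parent is not None:
        ancestors.append(parent)
        parent = get_parent(parent)
    return ancestors


print(find_by_curie("biolink:Gene"))
print(find_by_alias("locus"))
print(get_parent("gene"))
print(get_ancestors("gene"))
```

For real work the toolkit is the better choice, since it reads the full model and stays current with new releases.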
Citing Biolink Model
Unni DR, Moxon SAT, Bada M, Brush M, Bruskiewich R, Caufield JH, Clemons PA, Dancik V, Dumontier M, Fecho K, Glusman G, Hadlock JJ, Harris NL, Joshi A, Putman T, Qin G, Ramsey SA, Shefchek KA, Solbrig H, Soman K, Thessen AE, Haendel MA, Bizon C, Mungall CJ, The Biomedical Data Translator Consortium (2022). Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science. Clin Transl Sci. Wiley; 2022 Jun 6; https://onlinelibrary.wiley.com/doi/10.1111/cts.13302