Schema and generated objects for biolink data model and upper ontology
mbrush and hsolbrig extend mappings with new slots (#173)
* extend mappings with new slots

Addresses issue #172.
Update definition of current 'mapping' slot.
Create new mapping subtype slots for exact_matches, broader_matches, narrower_matches, and close_matches.

* Rebuild all artifacts effected by the change

* Reset images flag on make file
Latest commit 08aa448 Nov 12, 2018
about docs Jul 3, 2018
biolinkmodel Test and build of patch change Nov 6, 2018
contrib extend mappings with new slots (#173) Nov 13, 2018
docs extend mappings with new slots (#173) Nov 13, 2018
golr-views Fixes issue #133 and does complete build Sep 23, 2018
graphql Fixes issue #133 and does complete build Sep 23, 2018
graphviz Added comparefiles tool and added to makefile. Nov 4, 2018
images Theoretically complete except for ShEx Jun 13, 2018
java Fix for issue #138. Also updated version number Sep 24, 2018
json-schema Fix for issue #138. Also updated version number Sep 24, 2018
metamodel extend mappings with new slots (#173) Nov 13, 2018
notebooks Sample test code for RDF files Nov 3, 2018
ontology extend mappings with new slots (#173) Nov 13, 2018
proto Test and build of patch change Nov 6, 2018
rdf extend mappings with new slots (#173) Nov 13, 2018
script regen Feb 5, 2018
shex extend mappings with new slots (#173) Nov 13, 2018
tests extend mappings with new slots (#173) Nov 13, 2018
.gitignore Update secondary files Nov 6, 2018
.travis.yml Add COLUMNS to travis build Sep 23, 2018
Gemfile docs Feb 9, 2018
LICENSE license Mar 19, 2018
Makefile extend mappings with new slots (#173) Nov 13, 2018
Pipfile Fix issue #163. Also added pipenv files and fixed make file bug Nov 5, 2018
Pipfile.lock Fix issue #163. Also added pipenv files and fixed make file bug Nov 5, 2018
README.md Resync build Sep 19, 2018
_config.yml Set theme jekyll-theme-dinky Feb 4, 2018
biolink-model.yaml Change base URI to http://w3id.org/biolink/vocab Nov 7, 2018
context.jsonld extend mappings with new slots (#173) Nov 13, 2018
index.md added links to context file, see #65 May 10, 2018
init.sh initial commit Dec 4, 2017
jekyll-theme-dinky.gemspec regen Feb 5, 2018
meta.yaml extend mappings with new slots (#173) Nov 13, 2018
notes.md Theoretically complete except for ShEx Jun 13, 2018
requirements.txt Update to fix a bug in the shexc generator Oct 29, 2018
setup.py Added comparefiles tool and added to makefile. Nov 4, 2018
tox.ini Resync build Sep 19, 2018

biolink-models

Quickstart docs:

See the slides

Conversion/validation code: https://github.com/NCATS-Tangerine/kgx

Introduction

The purpose of the biolink datamodel is to provide a high-level datamodel of biological entities (genes, diseases, phenotypes, pathways, individuals, substances, etc.), their properties, their relationships, and the ways in which they can be associated.

The representation is independent of storage technology or metamodel (solr/documents, neo4j/property graphs, RDF/OWL, JSON, CSVs, etc). Different mappings to each of these are provided.

The specification of the reference biolink model is a single YAML file following a custom meta-model. The basic elements of the YAML are:

  • definitions of upper level classes representing both named things (genes, phenotypes, etc) and associations between them
  • definitions of slots (aka properties) that can be used to relate members of these classes to other classes or datatypes
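
As a rough illustration of these two kinds of element, the sketch below models classes and slots as plain Python objects and resolves the slots a class inherits through `is_a`. The `ClassDef` type, the helper functions, and the slot lists are hypothetical and chosen for illustration; they are not the metamodel's actual API or the full model content.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClassDef:
    """An upper-level class: a named thing or an association (illustrative)."""
    name: str
    is_a: Optional[str] = None
    slots: List[str] = field(default_factory=list)


def ancestors(name: str, classes: Dict[str, ClassDef]) -> List[str]:
    """Follow is_a links from a class up to its root, nearest parent first."""
    chain = []
    parent = classes[name].is_a
    while parent is not None:
        chain.append(parent)
        parent = classes[parent].is_a
    return chain


def all_slots(name: str, classes: Dict[str, ClassDef]) -> List[str]:
    """Slots inherited from ancestors, followed by the class's own slots."""
    lineage = [name] + ancestors(name, classes)
    collected: List[str] = []
    for class_name in reversed(lineage):  # root first
        for slot in classes[class_name].slots:
            if slot not in collected:
                collected.append(slot)
    return collected


# Assumed, heavily simplified extract; the slot assignments are illustrative.
CLASSES = {
    "association": ClassDef("association", slots=["subject", "object", "relation"]),
    "gene to expression site association": ClassDef(
        "gene to expression site association", is_a="association", slots=["stage"]
    ),
}

print(all_slots("gene to expression site association", CLASSES))
```
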

This datamodel is being used in the NCATS Translator project. The Translator uses only a subset of the elements in the datamodel.

Entity (Node) Types

Protege view: img

Property and Edge Types

We divide these into relationship types, which connect two nodes together, and node or edge properties.
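
The distinction can be sketched with a small property-graph structure: the relationship type lives on the edge that joins two nodes, while properties hang off either a node or an edge. The `Node` and `Edge` types and the `EXAMPLE:gene1` identifier are hypothetical; the UBERON and RO identifiers are the ones used elsewhere in this README.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    """An entity such as a gene or an anatomical entity."""
    id: str
    category: str
    properties: Dict[str, Any] = field(default_factory=dict)  # node properties


@dataclass
class Edge:
    """A relationship type connecting two nodes, with its own properties."""
    subject: str   # id of the source node
    relation: str  # relationship type
    object: str    # id of the target node
    properties: Dict[str, Any] = field(default_factory=dict)  # edge properties


def dangling_edges(nodes: List[Node], edges: List[Edge]) -> List[Edge]:
    """Return edges whose subject or object is not a known node id."""
    known = {node.id for node in nodes}
    return [e for e in edges if e.subject not in known or e.object not in known]


# EXAMPLE:gene1 is a placeholder id, not a real gene identifier.
gene = Node("EXAMPLE:gene1", "gene", {"name": "example gene"})
site = Node("UBERON:0002037", "anatomical entity", {"name": "cerebellum"})
expressed_in = Edge(gene.id, "RO:0002206", site.id, {"stage": "UBERON:0000069"})

print(dangling_edges([gene, site], [expressed_in]))
```
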

Association Hierarchy

Identifiers

See biolink json-ld context
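
Identifiers in the model are written as CURIEs such as `UBERON:0002037`. A minimal sketch of how a prefix map expands a CURIE to a full URI follows. The `PREFIXES` dictionary is a hand-written assumption using the standard OBO PURL pattern; it is not copied from `context.jsonld`, which should be treated as the authoritative prefix source.

```python
from typing import Dict

# Assumed prefix map for illustration only; see context.jsonld for the real one.
PREFIXES: Dict[str, str] = {
    "UBERON": "http://purl.obolibrary.org/obo/UBERON_",
    "RO": "http://purl.obolibrary.org/obo/RO_",
}


def expand_curie(curie: str, prefixes: Dict[str, str]) -> str:
    """Expand 'PREFIX:local_id' to a full URI using the prefix map."""
    prefix, separator, local_id = curie.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"cannot expand {curie!r}: unknown or missing prefix")
    return prefixes[prefix] + local_id


def contract_uri(uri: str, prefixes: Dict[str, str]) -> str:
    """Contract a URI back to a CURIE, preferring the longest matching namespace."""
    matches = [(ns, p) for p, ns in prefixes.items() if uri.startswith(ns)]
    if not matches:
        raise ValueError(f"no known namespace for {uri!r}")
    namespace, prefix = max(matches, key=lambda pair: len(pair[0]))
    return f"{prefix}:{uri[len(namespace):]}"


print(expand_curie("UBERON:0002037", PREFIXES))
```
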

Mapping to specific database and modeling platforms

Neo4J Mapping

See mapping to neo4j

RDF Mapping

See mapping to RDF

Organization

The datamodel source is biolink-model.yaml. This is a YAML file intended to be relatively simple to view and edit in its native form.

The yaml definition is currently used to derive:

We leverage existing frameworks where possible. For example, json-schema allows code generation for other languages.

Additionally, this repo contains the metamodel's definition of itself in YAML, together with code for working with datamodels. In theory this could be used in other domains, but there is no plan for this at the moment.

Metamodel

See metamodel for details of the metamodel.

Usage in existing projects

Case study: gene expression in Monarch

Currently this is documented in the ingest artefacts repo, using non-computable cmap images:

bgee model

It is also documented by the gene-anatomy Cypher query, which maps graphs conforming to the pattern to denormalized tuples for indexing in Solr.

In the biolink model this is represented explicitly by the gene to expression site association class definition:

  - name: gene to expression site association
    is_a: association
    description: >-
      An association between a gene and an expression site, possibly qualified by stage/timing info
    see_also: "https://github.com/monarch-initiative/ingest-artifacts/tree/master/sources/BGee"
    slot_usage:
      - slot: subject
        type: gene or gene product
        description: "gene in which variation is correlated with the phenotypic feature"
      - slot: object
        type: anatomical entity
        description: "location in which the gene is expressed"
        subclass_of: UBERON:0001062
        examples:
          - value: UBERON:0002037
            description: cerebellum
      - slot: relation
        description: "expression relationship"
        subproperty_of: "RO:0002206"
      - slot: stage
        type: developmental stage
        description: "stage at which the gene is expressed in the site"
        examples:
          - value: UBERON:0000069
            description: larval stage
      - slot: quantifier
        description: >-
          can be used to indicate magnitude, or also ranking
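
To show how such a definition might be used to check data, here is a minimal sketch that tests a record against the slots declared above. The `SLOT_USAGE` dictionary is a hand-copied summary of the YAML, the choice of `subject`, `object`, and `relation` as mandatory is an assumption, and `check_association` is a hypothetical helper, not part of this repo's tooling.

```python
import re
from typing import Dict, List, Optional, Sequence

# Slot name -> declared type (None where the YAML above gives no type).
SLOT_USAGE: Dict[str, Optional[str]] = {
    "subject": "gene or gene product",
    "object": "anatomical entity",
    "relation": None,
    "stage": "developmental stage",
    "quantifier": None,
}

# Loose CURIE shape: a prefix, a colon, then a non-empty local id.
CURIE_PATTERN = re.compile(r"^[A-Za-z][\w.-]*:\S+$")


def check_association(
    record: Dict[str, str],
    slot_usage: Dict[str, Optional[str]] = SLOT_USAGE,
    required: Sequence[str] = ("subject", "object", "relation"),  # assumption
) -> List[str]:
    """Return a list of problems; an empty list means the record looks valid."""
    problems = []
    for slot in required:
        if slot not in record:
            problems.append(f"missing slot: {slot}")
    for slot, value in record.items():
        if slot not in slot_usage:
            problems.append(f"undeclared slot: {slot}")
        elif slot != "quantifier" and not CURIE_PATTERN.match(str(value)):
            # quantifier holds a magnitude or ranking, so it is not a CURIE
            problems.append(f"slot {slot} is not a CURIE: {value!r}")
    return problems


# EXAMPLE:gene1 is a placeholder id; the other ids come from the YAML above.
record = {
    "subject": "EXAMPLE:gene1",
    "object": "UBERON:0002037",
    "relation": "RO:0002206",
    "stage": "UBERON:0000069",
}
print(check_association(record))
```

The check is purely structural. Constraints such as `subclass_of: UBERON:0001062` need access to the ontology hierarchy and are out of scope for this sketch.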

This is used to generate various artefacts, such as:

  • golr view definition
    • (which is itself later compiled to solr xml using the bbop-golr framework)
  • java class
    • generated from json-schema, so inheritance is unfolded
    • in future we may generate directly

Auto-generated image:

img

snippet of generated graphql

type GeneToExpressionSiteAssociation {
  qualifiers: [String]
  stageQualifier: LifeStage
  objectExtensions: [PropertyValuePair]
  hasEvidence: String
  publications: [Publication]
  object: AnatomicalEntity!
  hasEvidenceType: EvidenceType
  hasEvidenceGraph: String
  providedBy: Provider
  label: String
  relation: String!
  negated: String
  subject: GeneOrGeneProduct!
  id: String!
  quantifierQualifier: String
  associationType: String
  subjectExtensions: [PropertyValuePair]
}
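
The generated names follow a visible pattern: the space-separated class name becomes `GeneToExpressionSiteAssociation`, and slot names become lowerCamelCase fields. The sketch below reproduces that naming pattern under the assumption that it is a simple word-wise capitalization; it is not the generator's actual code, and `graphql_field` is a hypothetical helper.

```python
def camel_case(name: str) -> str:
    """'gene to expression site association' -> 'GeneToExpressionSiteAssociation'."""
    words = name.replace("_", " ").split()
    return "".join(word[:1].upper() + word[1:] for word in words)


def lower_camel_case(name: str) -> str:
    """'stage qualifier' or 'stage_qualifier' -> 'stageQualifier'."""
    result = camel_case(name)
    return result[:1].lower() + result[1:]


def graphql_field(slot: str, range_name: str,
                  required: bool = False, multivalued: bool = False) -> str:
    """Render one field line in the style of the type shown above."""
    type_name = camel_case(range_name)
    if multivalued:
        type_name = f"[{type_name}]"
    if required:
        type_name += "!"
    return f"{lower_camel_case(slot)}: {type_name}"


print(camel_case("gene to expression site association"))
print(graphql_field("object", "anatomical entity", required=True))
print(graphql_field("publications", "publication", multivalued=True))
```
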

snippet of generated json-schema

        "GeneToExpressionSiteAssociation": {
            "description": "An association between a gene and an expression site, possibly qualified by stage/timing info. TBD: introduce subclasses for distinction between wild-type and experimental conditions?",
            "properties": {
                "association_type": {
                    "description": "connects an association to the type of association (e.g. gene to phenotype)",
                    "type": "string"
                },
                "has_evidence": {
                    "description": "connects an association to an instance of supporting evidence",
                    "type": "string"
                },
                "has_evidence_graph": {
                    "description": "connects an association to a graph object including a path from subject to object",
                    "type": "string"
                },
                "has_evidence_type": {
                    "description": "connects an association to the class of evidence used",
                    "type": "string"
                },
                "id": {
                    "type": "string"
                },
                "label": {
                    "description": "A human-readable name for a thing",
                    "type": "string"
                },
                "negated": {
                    "description": "if set to true, then the association is negated i.e. is not true",
                    "type": "string"
                },
                "object": {
                    "description": "connects an association to the object of the association. For example, in a gene-to-phenotype association, the gene is subject and phenotype is object.",
                    "type": "string"
                },
                "object_extensions": {
                    "description": "Additional relationships that are true of the object in the context of the association. For example, if the object is an anatomical term in an expression association, the object extensions may include part-of links",
                    "items": {
                        "type": "string"
                    },
                    "type": "array"
                },
                "provided_by": {
                    "description": "connects an association to the agent (person, organization or group) that provided it",
                    "type": "string"
                },
                "publications": {
                    "description": "connects an association to publications supporting the association",
                    "items": {
                        "type": "string"
                    },
                    "type": "array"
                },
                "qualifiers": {
                    "description": "connects an association to qualifiers that modify or qualify the meaning of that association",
                    "items": {
                        "type": "string"
                    },
                    "type": "array"
                },
                "quantifier_qualifier": {
                    "description": "A measurable quantity for the object of the association",
                    "type": "string"
                },
                "relation": {
                    "description": "the relationship type by which a subject is connected to an object in an association",
                    "type": "string"
                },
                "stage_qualifier": {
                    "description": "stage at which expression takes place",
                    "type": "string"
                },
                "subject": {
                    "description": "connects an association to the subject of the association. For example, in a gene-to-phenotype association, the gene is subject and phenotype is object.",
                    "type": "string"
                },
                "subject_extensions": {
                    "description": "Additional relationships that are true of the subject in the context of the association. For example, if the subject is a gene product in a functional association, the subject extensions may represent  an isoform or a specific post-translational state",
                    "items": {
                        "type": "string"
                    },
                    "type": "array"
                }
            },
            "required": [],
            "title": "GeneToExpressionSiteAssociation",
            "type": "object"
        },
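
A schema of this shape can be used to check instance data. The sketch below is a deliberately tiny checker that handles only the constructs appearing in the snippet (`string`, `array` of `string`, and `required`), using the standard library alone. `SCHEMA` is an abbreviated hand-copy of the snippet and `check_instance` is a hypothetical helper; a full JSON Schema validator would be the better choice in practice.

```python
from typing import Any, Dict, List

# Abbreviated copy of the generated schema above (a few properties only).
SCHEMA: Dict[str, Any] = {
    "title": "GeneToExpressionSiteAssociation",
    "type": "object",
    "required": [],
    "properties": {
        "id": {"type": "string"},
        "subject": {"type": "string"},
        "object": {"type": "string"},
        "relation": {"type": "string"},
        "publications": {"type": "array", "items": {"type": "string"}},
    },
}


def check_instance(instance: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of violations; an empty list means the instance passes."""
    errors = []
    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"{name}: required property is missing")
    for name, value in instance.items():
        spec = schema["properties"].get(name)
        if spec is None:
            continue  # JSON Schema allows additional properties by default
        if spec["type"] == "string":
            if not isinstance(value, str):
                errors.append(f"{name}: expected string")
        elif spec["type"] == "array":
            if not isinstance(value, list):
                errors.append(f"{name}: expected array")
            elif spec.get("items", {}).get("type") == "string" and not all(
                isinstance(item, str) for item in value
            ):
                errors.append(f"{name}: expected array of strings")
    return errors


# EXAMPLE:* values are placeholders, not real identifiers.
print(check_instance({"id": "EXAMPLE:assoc1", "publications": ["EXAMPLE:pub1"]}, SCHEMA))
print(check_instance({"publications": "EXAMPLE:pub1"}, SCHEMA))
```

Because the generated `required` list is empty, an empty object passes this check.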

FAQ

Why not use X as the modeling framework?

Why invent our own YAML and not use JSON-Schema, SQL, UML, ProtoBuf, OWL, ...?

Each of these is tied to a particular formalism, e.g. JSON-Schema to trees and OWL to open-world logic. There are various impedance mismatches in converting between these. The goal was to develop something simple and more general that is not tied to any one serialization format or set of assumptions.

There are other projects with similar goals, e.g. https://github.com/common-workflow-language/schema_salad

It may be possible to align with these.

Why not use X as the datamodel?

Here X may be bioschemas, some upper ontology (BioTop), UMLS metathesaurus, bio*, various other attempts to model all of biology in an object model.

Currently, as far as we know, there is no existing reference datamodel that is flexible enough to be used here.

Make and build instructions

Note: the Makefile requires jsonschema2pojo -- see https://github.com/joelittlejohn/jsonschema2pojo. If you are on a Mac, it can be installed using brew.