cmungall/linkml-phenopackets

Phenopackets EXPERIMENTAL linkml schema

Browse the autogenerated schema documentation here:

Note that the LinkML markdown rendering is still incomplete; for the full schema, see

What is this repo and how was it made?

This is an experiment in rendering the phenopackets in LinkML. It is NOT the official GA4GH phenopackets schema.

The intent is to demonstrate some of the tooling and integrative capabilities of LinkML over Protobuf, in particular:

  • Additional validation not possible in Protobuf, including:
    • required fields
    • ontology constraints
  • Generation of Python classes and tooling
  • Export and import of phenopackets to and from RDF
  • Semantic annotation of schemas
  • Cross-schema integration

The LinkML schema is generated using a script proto2linkml that converts from the Protobuf source. This makes use of specific conventions in the Protobuf source, such as the use of particular controlled keywords in // comments. As such, this code is not generalizable to other protobuf schemas.
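
To illustrate the kind of comment-driven conversion described above, here is a minimal Python sketch. It is not the actual proto2linkml script: the regular expressions, the function name parse_message_fields, the sample message, and the assumption that a REQUIRED keyword in a preceding // comment marks a required slot are all hypothetical, chosen only to show the idea.

```python
import re

# Hypothetical Protobuf fragment; the REQUIRED/OPTIONAL keyword convention is an assumption.
PROTO = """
message Individual {
  // An identifier for the individual. REQUIRED.
  string id = 1;
  // Alternate identifiers. OPTIONAL.
  repeated string alternate_ids = 2;
}
"""

FIELD_RE = re.compile(r"^\s*(repeated\s+)?([\w.]+)\s+(\w+)\s*=\s*\d+\s*;")
COMMENT_RE = re.compile(r"^\s*//\s?(.*)$")


def parse_message_fields(proto_text):
    """Turn field declarations into LinkML-like slot dictionaries."""
    slots = {}
    pending = []  # comment lines seen since the last non-comment line
    for line in proto_text.splitlines():
        comment = COMMENT_RE.match(line)
        if comment:
            pending.append(comment.group(1).strip())
            continue
        field = FIELD_RE.match(line)
        if field:
            repeated, type_name, name = field.groups()
            description = " ".join(pending)
            slots[name] = {
                "range": type_name,
                "multivalued": repeated is not None,
                "required": "REQUIRED" in description,
                "description": description,
            }
        pending = []  # comments only attach to the line directly after them
    return slots


slots = parse_message_fields(PROTO)
for name, slot in slots.items():
    print(name, slot["range"], slot["multivalued"], slot["required"])
```

Because a converter of this kind depends on keywords appearing in comments, it only works for sources that follow the same convention, which is why the approach does not generalize to arbitrary Protobuf schemas.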

Note that we intentionally don't use out-of-band information. Some records in the Protobuf are currently undefined, so they are also undefined in the LinkML. We have not done any additional curation based on the .rst docs; everything comes from the Protobuf.

Ontology Enhancements

The file cv_terms.yaml is hand-curated, rather than derived from the Protobuf. It is based on the recommended ontologies from the official phenopackets repo. It makes use of dynamic enums, which allow for more advanced ontology checking; for example:

  • Uberon anatomy terms must be found under the "anatomical entity branch"
  • HPO abnormality terms must be found under the "phenotypic abnormality branch"

and so on.
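
The branch checks above boil down to asking whether a term is a descendant of a given root. The following is a minimal Python sketch of that idea only; it does not use LinkML or any ontology library. The PARENTS table is a toy, heavily simplified set of is-a edges written for illustration (real ontologies have many intermediate classes), and the function names are hypothetical.

```python
# Toy is-a edges (child -> parents). Simplified for illustration, not a real ontology release.
PARENTS = {
    "UBERON:0002107": ["UBERON:0000062"],  # liver -> organ
    "UBERON:0000062": ["UBERON:0001062"],  # organ -> anatomical entity
    "HP:0001250": ["HP:0000118"],          # seizure -> phenotypic abnormality
}


def ancestors(term):
    """Return every term reachable from `term` by following is-a edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def in_branch(term, root):
    """True if `term` is the root itself or one of its descendants."""
    return term == root or root in ancestors(term)


print(in_branch("UBERON:0002107", "UBERON:0001062"))  # anatomy term, anatomy branch
print(in_branch("HP:0001250", "UBERON:0001062"))      # phenotype term, anatomy branch
```

A dynamic enum expresses the same constraint declaratively, so the set of permitted values follows the ontology instead of being listed by hand.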

We also include constants.yaml, which is a direct transform from the phenopackets-tools repo.

How to use this repo

You can browse the schema docs, which are generated from the LinkML schema.

You can also explore the schema in the schema directory.

As part of the build process, we also validate and convert all canonical Phenopacket examples into YAML, JSON, and RDF.

You can also use the generated Python classes in combination with the linkml-runtime. Note that we have NOT released this to PyPI, to avoid confusion with the official Phenopackets libraries, so to run it you will need to clone the repo and install it locally:

poetry install

There will also be demonstrator Jupyter notebooks here:

Validation

Use p3 validate to validate objects. This goes beyond what can be done with JSON-Schema alone, and includes ontology validation using OAK and CURIE validation using BioRegistry.
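
As a rough picture of what CURIE validation involves, here is a minimal Python sketch. It is not how p3 validate is implemented and it does not call BioRegistry: the KNOWN_PREFIXES set is a hypothetical stand-in for a registry lookup, and the pattern and the check_curie function are assumptions made for illustration.

```python
import re

# Hypothetical stand-in for a prefix registry lookup.
KNOWN_PREFIXES = {"HP", "UBERON", "MONDO", "NCIT"}

# A deliberately loose CURIE pattern: prefix, colon, non-empty local identifier.
CURIE_RE = re.compile(r"^([A-Za-z][\w.]*):(\S+)$")


def check_curie(curie):
    """Return None if the CURIE looks valid, otherwise a short error message."""
    match = CURIE_RE.match(curie)
    if match is None:
        return f"{curie!r} is not a syntactically valid CURIE"
    prefix = match.group(1)
    if prefix not in KNOWN_PREFIXES:
        return f"unknown prefix {prefix!r} in {curie!r}"
    return None


for candidate in ["HP:0001250", "HP_0001250", "FOO:123"]:
    print(candidate, "->", check_curie(candidate) or "ok")
```

Checks of this kind sit on top of structural validation: a value can be a well-formed string as far as JSON-Schema is concerned and still use a prefix that no registry recognizes.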

See this notebook

Repairing ontology terms

Phenopackets include ontology terms, which are liable to become stale.

This toolkit uses OAK to assist in the automatic migration of obsolete terms and stale labels.
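
The repair step can be pictured as two lookups: replace an obsolete identifier with its successor, then refresh the label from the current ontology. The Python sketch below shows only that logic. It does not use OAK; the REPLACED_BY and CURRENT_LABELS tables, the EX: identifiers, and the repair_term function are all made up for illustration, where a real run would take this information from an ontology.

```python
# Hypothetical lookup tables with made-up identifiers and labels.
REPLACED_BY = {"EX:0000001": "EX:0000002"}   # obsolete id -> replacement id
CURRENT_LABELS = {
    "EX:0000002": "abnormal growth",
    "EX:0000003": "fever",
}


def repair_term(term):
    """Return a copy of an ontology term with its id migrated and label refreshed."""
    term_id = REPLACED_BY.get(term["id"], term["id"])
    # Fall back to the existing label when the term is unknown to the lookup table.
    label = CURRENT_LABELS.get(term_id, term.get("label"))
    return {"id": term_id, "label": label}


obsolete = {"id": "EX:0000001", "label": "growth abnormality (obsolete)"}
stale = {"id": "EX:0000003", "label": "pyrexia"}
print(repair_term(obsolete))
print(repair_term(stale))
```

Returning a new dictionary instead of editing in place keeps the original record available for comparison, which is useful when reviewing what a migration changed.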

See this notebook

Querying Phenopackets as RDF

TODO: Add documentation here

Using Phenopackets in conjunction with OAK

TODO: Add documentation here