Making a new subset

Raissa Meyer edited this page May 7, 2021 · 3 revisions

We export subsets of ENVO for users that need a slimmed down, but still logically sound version of the ontology.

Classes are added to these subsets by adding the [in_subset](http://www.geneontology.org/formats/oboInOwl#inSubset) annotation property with a tag corresponding to the slim name.

For example, all classes tagged with the "wwfBiome" tag will end up in the wwfBiome.owl subset in our /subsets/ release folder (staged in the envo/src/envo/subsets folder).

Adding new terms to an existing subset is just a question of adding the annotation property and tag.
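To see which classes already carry a given tag, a plain text search over the ontology source is often enough. The following is a minimal sketch, assuming the source is in OWL functional syntax and that tags are written as IRIs of the form `http://purl.obolibrary.org/obo/envo#TAG`. The sample file, the class IDs in it, and the function name `list_subset_members` are made up for illustration.

```shell
#!/bin/sh
# Sketch: list the class IDs tagged with a given subset in an OWL functional-syntax file.
# The sample file and its class IDs are hypothetical; point the function at your own file.

list_subset_members() {
    # $1 = subset tag, $2 = ontology file
    grep -F "inSubset" "$2" \
        | grep -F "envo#$1>" \
        | grep -o 'ENVO_[0-9]*' \
        | sort -u
}

# Build a small stand-in file so the sketch runs without the real ontology.
sample=$(mktemp)
cat > "$sample" <<'EOF'
AnnotationAssertion(oboInOwl:inSubset obo:ENVO_99999901 <http://purl.obolibrary.org/obo/envo#wwfBiome>)
AnnotationAssertion(oboInOwl:inSubset obo:ENVO_99999902 <http://purl.obolibrary.org/obo/envo#envoPolar>)
AnnotationAssertion(oboInOwl:inSubset obo:ENVO_99999903 <http://purl.obolibrary.org/obo/envo#wwfBiome>)
EOF

members=$(list_subset_members wwfBiome "$sample")
echo "$members"
rm -f "$sample"
```

If your source file writes the annotation differently (for example with full IRIs instead of prefixed names), the `grep` patterns would need adjusting.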

Creating a new subset takes a little more, and is outlined below. For an example, please also see PR #1092.

Declare the Annotation Property

You'll need a new annotation property to be compatible with the OBO export. Look for where others are declared and add a new declaration for your subset:

Declaration(AnnotationProperty(<http://purl.obolibrary.org/obo/envo#[MY_SUBSET_TAG]>))

Add Annotation Property

Once declared, define the annotation property:

# Annotation Property: <http://purl.obolibrary.org/obo/envo#[MY_SUBSET_TAG]> (<http://purl.obolibrary.org/obo/envo#[MY_SUBSET_TAG]>)

AnnotationAssertion(rdfs:comment <http://purl.obolibrary.org/obo/envo#[MY_SUBSET_TAG]> "[MY_SUBSET_TAG]"^^xsd:string)
SubAnnotationPropertyOf(<http://purl.obolibrary.org/obo/envo#[MY_SUBSET_TAG]> <http://www.geneontology.org/formats/oboInOwl#SubsetProperty>)
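After editing, it can help to confirm that both the declaration and the SubAnnotationPropertyOf axiom are present for the new tag. This is a minimal sketch, assuming the axioms are written exactly as in the templates above. `myNewSubset`, the sample file, and the function name `check_subset_property` are placeholders.

```shell
#!/bin/sh
# Sketch: check that a subset tag is declared and registered as a SubsetProperty.
# "myNewSubset" and the sample file are placeholders, not real ENVO content.

check_subset_property() {
    # $1 = subset tag, $2 = ontology file; returns 0 only if both axioms are found
    iri="<http://purl.obolibrary.org/obo/envo#$1>"
    grep -qF "Declaration(AnnotationProperty($iri))" "$2" || return 1
    grep -qF "SubAnnotationPropertyOf($iri <http://www.geneontology.org/formats/oboInOwl#SubsetProperty>)" "$2" || return 1
}

# Build a small stand-in file so the sketch runs without the real ontology.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Declaration(AnnotationProperty(<http://purl.obolibrary.org/obo/envo#myNewSubset>))
AnnotationAssertion(rdfs:comment <http://purl.obolibrary.org/obo/envo#myNewSubset> "myNewSubset"^^xsd:string)
SubAnnotationPropertyOf(<http://purl.obolibrary.org/obo/envo#myNewSubset> <http://www.geneontology.org/formats/oboInOwl#SubsetProperty>)
EOF

if check_subset_property myNewSubset "$sample"; then declared=yes; else declared=no; fi
if check_subset_property otherSubset "$sample"; then other=yes; else other=no; fi
echo "myNewSubset declared: $declared, otherSubset declared: $other"
rm -f "$sample"
```

The match is on the literal axiom text, so a file that abbreviates the IRIs with prefixes would report the tag as missing even if it is defined.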

Update Subsets list in Makefile

For the Make process to generate your subset, you'll need to append the name(s) of the new subset(s) in the section noted below.

# ----------------------------------------
# Subsets
# ----------------------------------------

## -- subset targets --
##
## By default this is the cross-product of SUBSETS x FORMATS
## Note we also include TSV as a format


SUBSETS = envo-basic EnvO-Lite-GSC envoEmpo envoAstro envoPolar envoEmpo envoOmics envoCesab environmental_hazards \
   biome-hierarchy astronomical-body-part-hierarchy material-hierarchy process-hierarchy [MY_SUBSET_TAG]
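Because the SUBSETS list spans backslash-continued lines, a simple `grep SUBSETS` can miss entries on the second line. The following sketch joins continued lines before checking for a tag. The sample Makefile, `myNewSubset`, and the function name `in_subsets_list` are placeholders.

```shell
#!/bin/sh
# Sketch: check whether a tag appears in the SUBSETS variable of a Makefile,
# including entries on backslash-continued lines.
# The sample Makefile and "myNewSubset" are placeholders.

in_subsets_list() {
    # $1 = subset tag, $2 = Makefile path; returns 0 if the tag is listed
    awk '/\\$/ { sub(/\\$/, ""); printf "%s ", $0; next } { print }' "$2" \
        | grep '^SUBSETS *=' \
        | tr -s ' \t' '\n\n' \
        | grep -qxF "$1"
}

# Build a small stand-in Makefile so the sketch runs anywhere.
sample=$(mktemp)
cat > "$sample" <<'EOF'
SUBSETS = envo-basic envoPolar \
   material-hierarchy myNewSubset
EOF

if in_subsets_list myNewSubset "$sample"; then listed=yes; else listed=no; fi
if in_subsets_list notThere "$sample"; then absent=yes; else absent=no; fi
echo "myNewSubset listed: $listed, notThere listed: $absent"
rm -f "$sample"
```

The final `grep -x` matches whole words only, so a tag that is merely a prefix of another subset name is not reported as present.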


That's all you need to do to cue the subset creation for the next release.

⬇️ However, if you would like to test if the subset was built successfully or even use the subset you've created in further work before said release, continue reading below. ⬇️

NOTE: The docker section below will be generalised and separated later on.


Build ENVO subsets locally

In some cases you'll want to build the subsets locally, either to test that your changes have the intended effect or to use the subset right away.

We recommend using the ODK docker container for convenience.

Boot the docker container

With the docker container we can quickly set up an OBO-compliant ontology development environment using the INCATools Ontology Development Kit (ODK).

Pull the odkfull image from Docker Hub. You only need to do this the first time, or when you want an updated image.

docker pull obolibrary/odkfull

Start the docker container

How much memory and how many CPUs you need depends on the task you are performing. For building the subsets, 4-8 GB of RAM should be sufficient.

# Run a bash terminal in the container interactively with enough memory to handle your task.
docker run --cpus 5 --memory="8g" -it obolibrary/odkfull /bin/bash

## you will now be in your docker container with all ontology software ready at hand

## the following commands will run in the docker container

Clone and work on the ontology inside your docker container

# clone github envo repo to docker and enter the specified directory
git clone https://github.com/EnvironmentOntology/envo.git && cd envo/src/envo/

# MAKE SURE YOU'RE ON THE BRANCH YOU WANT TO BE ON - You'll be on master by default.
git checkout --track origin/[BRANCH_NAME]
git log # check that the commit history matches the branch you expect to be on

# set credentials to the ones you use for github
git config --global user.email "[EMAIL_YOU_USE_FOR_YOUR_GITHUB_ACCOUNT]" && git config --global user.name "[GITHUB_USER_NAME]"

# cache the credentials for 24 hours (86400 seconds)
git config credential.helper 'cache --timeout=86400'

Create your subset

ENVO's Makefile handles most of the routine work for you. All it needs to know is which subsets you want and which terms belong to each subset.

Instead of building the whole ontology, we can call all_subsets, which builds every subset listed in the SUBSETS line of the Makefile.

# make the subset files
make all_subsets

# check if the corresponding files have been created in the subsets directory (.json, .obo, .owl, .tsv)
ls subsets/

# check if the expected terms are in the file (envoPlastics is the example subset here)
# The file should include the terms tagged for the subset and their ENVO parent terms.
head subsets/envoPlastics.tsv
grep "[ID_OF_TERM_IN_SUBSET]" subsets/envoPlastics.owl
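Rather than reading through the `ls` output by eye, the four expected formats can be checked in a loop. This is a minimal sketch, assuming output files are named `subsets/NAME.EXT` as in the listing above. The demo directory stands in for the real `subsets/` folder, and `missing_formats` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report which of the expected output formats are missing for one subset.
# The demo directory created here is a stand-in so the sketch runs anywhere;
# in the container you would pass the real subsets/ directory instead.

missing_formats() {
    # $1 = subsets directory, $2 = subset name
    # prints the extension of every file that is absent or empty, one per line
    for ext in json obo owl tsv; do
        [ -s "$1/$2.$ext" ] || echo "$ext"
    done
}

# Create three of the four files on purpose, leaving out the TSV.
demo=$(mktemp -d)
for ext in json obo owl; do
    echo "placeholder" > "$demo/envoPlastics.$ext"
done

missing=$(missing_formats "$demo" envoPlastics)
echo "missing formats: ${missing:-none}"
rm -rf "$demo"
```

The `-s` test treats an empty file as missing, which catches builds that created the file but wrote nothing to it.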

Double check that it all went smoothly / get your subset

To check that the subset's OWL file was built correctly, you can also copy it to your local system and inspect it in Protégé or a similar tool.

Everything you have done above exists only inside your docker container. To get the changes onto your ENVO branch, run git add, git commit, and git push.

Now, pull that branch to your local machine (outside of the docker container) and check in Protégé that everything looks okay: you should see the subset terms and their ENVO parents inside the [SUBSET_NAME].owl file.

Once you have confirmed that the subset was successfully created, leave the docker container by typing exit and start a pull request.

🏆