Document the difference between the 4 released OWL files #926

Open
dhimmel opened this issue Dec 21, 2020 · 4 comments


dhimmel commented Dec 21, 2020

EFO Release v3.25.0 contains the following OWL files:

  • efo-base.owl 60.8 MB
  • efo.owl 175 MB
  • efo_otar_profile.owl 147 MB
  • efo_otar_slim.owl 130 MB

I am curious as to how these files differ and haven't been able to find much information.

From an OpenTargets blog post:

we curated an extensive list of therapeutic areas that reflect the most appropriate body system, and therefore slimmed the ontology to ignore higher order terms (e.g. disease by anatomical system). The result is an EFO3-derived Open Targets Platform-specific profile-ontology which will be automatically generated with every monthly EFO release.

From opentargets/OnToma:

The ontology we use in the Open Targets platform is a subset (aka. slim) of the EFO ontology plus any HPO terms for which a valid EFO mapping could not be found.

Is there any other documentation I'm missing?

@ravwojdyla

@dhimmel, the notes for the 3.11.0 release, the first release that contains those files, have some info:

For Open Targets, we have also generated an Open Targets profile (which contains all of EFO with the new Open Targets therapeutic areas) and slim file (which contains just Open Targets therapeutic areas). Both are also attached to this release.


dhimmel commented Jan 14, 2021

EFO vs EFO-OTAR node comparison

I compared efo.owl and efo_otar_profile.owl from v3.25.0 and found that EFO-OTAR adds 10 nodes and removes 57 nodes from EFO.
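The comparison itself comes down to two set differences over the node identifiers of each file. Here is a minimal sketch, assuming the identifiers have already been extracted from each OWL file; the function name and the toy identifier sets are hypothetical and are not the code behind the numbers above:

```python
def compare_node_sets(base_nodes, variant_nodes):
    """Return (added, removed): nodes only in the variant, and nodes only in the base."""
    base = set(base_nodes)
    variant = set(variant_nodes)
    added = variant - base
    removed = base - variant
    return added, removed


# Toy identifier sets standing in for the full node lists of efo.owl and
# efo_otar_profile.owl (hypothetical subset, for illustration only).
efo_nodes = {"EFO:0000408", "MONDO:0021199", "EFO:0000405"}
otar_nodes = {"EFO:0000408", "OTAR:0000018"}

added, removed = compare_node_sets(efo_nodes, otar_nodes)
print(sorted(added))    # ['OTAR:0000018']
print(sorted(removed))  # ['EFO:0000405', 'MONDO:0021199']
```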

Nodes added by EFO-OTAR

Here are the nodes EFO-OTAR adds (in purple outline) and their ancestors:

[Figure: graph of the nodes added by EFO-OTAR (purple outline) and their ancestors]

| identifier | name | depth | n_ancestors | n_descendants | ic_resnik | ic_sanchez | uri |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MONDO:0018797 | None | 5 | 6 | 1 | 0.98 | 1.00 | http://purl.obolibrary.org/obo/MONDO_0018797 |
| OTAR:0000010 | respiratory or thoracic disease | 4 | 5 | 1147 | 0.50 | 0.31 | http://www.ebi.ac.uk/efo/OTAR_0000010 |
| OTAR:0000019 | familial disease | 5 | 6 | 1 | 0.98 | 1.00 | http://www.ebi.ac.uk/efo/OTAR_0000019 |
| OTAR:0000008 | other | 4 | 5 | 1 | 0.98 | 1.00 | http://www.ebi.ac.uk/efo/OTAR_0000008 |
| OTAR:0000018 | genetic, familial or congenital disease | 4 | 5 | 7927 | 0.29 | 0.12 | http://www.ebi.ac.uk/efo/OTAR_0000018 |
| OTAR:0000003 | cyst | 5 | 6 | 1 | 0.98 | 1.00 | http://www.ebi.ac.uk/efo/OTAR_0000003 |
| OTAR:0000014 | pregnancy or perinatal disease | 4 | 5 | 120 | 0.71 | 0.53 | http://www.ebi.ac.uk/efo/OTAR_0000014 |
| OTAR:0000009 | injury, poisoning or other complication | 4 | 5 | 117 | 0.71 | 0.53 | http://www.ebi.ac.uk/efo/OTAR_0000009 |
| OTAR:0000017 | reproductive system or breast disease | 4 | 5 | 859 | 0.53 | 0.34 | http://www.ebi.ac.uk/efo/OTAR_0000017 |
| OTAR:0000006 | musculoskeletal or connective tissue disease | 4 | 5 | 3002 | 0.39 | 0.21 | http://www.ebi.ac.uk/efo/OTAR_0000006 |

One question I have: what is the purpose of adding "familial disease", "other", and "cyst", given that all three are leaf nodes? Are they actually a helpful way for OpenTargets to categorize disease? CC @d0choa. MONDO:0018797 also has no descendants, but appears to be a relic, soon to be removed, as per #938.
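For readers unfamiliar with the depth, n_ancestors, and n_descendants columns, here is a minimal sketch on a toy hierarchy. It assumes that ancestor and descendant counts include the node itself (consistent with leaf nodes showing n_descendants of 1 in the table) and that depth is the longest path from a root; both are assumptions about the table, and the helper names and toy graph are hypothetical. The ic_resnik and ic_sanchez columns are not reproduced here.

```python
from collections import deque


def reachable(start, edges):
    """Return every node reachable from start via edges, including start itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(children):
    """Turn a parent -> children mapping into a child -> parents mapping."""
    parents = {}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(parent)
    return parents


def depth(node, parents):
    """Length in edges of the longest path from any root down to node."""
    direct_parents = parents.get(node, [])
    if not direct_parents:
        return 0
    return 1 + max(depth(p, parents) for p in direct_parents)


# Toy hierarchy: "c" has two parents, "leaf" has no children.
children = {"root": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["leaf"]}
parents = invert(children)

n_descendants = len(reachable("leaf", children))  # counts "leaf" itself
n_ancestors = len(reachable("leaf", parents))     # counts "leaf" itself
print(n_descendants, n_ancestors, depth("leaf", parents))  # 1 5 3
```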

Nodes removed by EFO-OTAR

Here are the nodes EFO-OTAR removes (in purple outline) and their ancestors:

[Figure: graph of the nodes removed by EFO-OTAR (purple outline) and their ancestors]

Table of removed nodes:

| identifier | name | depth | n_ancestors | n_descendants | ic_resnik | ic_sanchez | uri |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MONDO:0044999 | scalp disease | 7 | 8 | 8 | 0.95 | 0.80 | http://purl.obolibrary.org/obo/MONDO_0044999 |
| MONDO:0021017 | synaptopathy | 6 | 7 | 13 | 0.93 | 0.75 | http://purl.obolibrary.org/obo/MONDO_0021017 |
| MONDO:0019038 | rare maxillo-facial surgical disease | 8 | 16 | 222 | 0.75 | 0.47 | http://purl.obolibrary.org/obo/MONDO_0019038 |
| MONDO:0043786 | serositis | 5 | 6 | 10 | 0.94 | 0.77 | http://purl.obolibrary.org/obo/MONDO_0043786 |
| MONDO:0044974 | disease of supramolecular complex | 6 | 7 | 389 | 0.62 | 0.42 | http://purl.obolibrary.org/obo/MONDO_0044974 |
| MONDO:0021635 | neurocristopathy | 5 | 8 | 134 | 0.75 | 0.52 | http://purl.obolibrary.org/obo/MONDO_0021635 |
| MONDO:0044969 | disease of membrane bound organelle | 6 | 7 | 403 | 0.62 | 0.41 | http://purl.obolibrary.org/obo/MONDO_0044969 |
| MONDO:0021668 | disorder involving pain | 4 | 5 | 13 | 0.90 | 0.75 | http://purl.obolibrary.org/obo/MONDO_0021668 |
| EFO:1000755 | pigmentation disease | 6 | 11 | 117 | 0.78 | 0.53 | http://www.ebi.ac.uk/efo/EFO_1000755 |
| MONDO:0044980 | disease of signal transduction | 6 | 7 | 125 | 0.73 | 0.53 | http://purl.obolibrary.org/obo/MONDO_0044980 |
| MONDO:0044979 | disease by cell type | 6 | 7 | 506 | 0.60 | 0.39 | http://purl.obolibrary.org/obo/MONDO_0044979 |
| MONDO:0021197 | disease by cellular component affected | 5 | 6 | 1339 | 0.49 | 0.29 | http://purl.obolibrary.org/obo/MONDO_0021197 |
| MONDO:0024623 | otorhinolaryngologic disease | 6 | 7 | 337 | 0.64 | 0.43 | http://purl.obolibrary.org/obo/MONDO_0024623 |
| MONDO:0044975 | disease of transporter activity | 6 | 7 | 74 | 0.77 | 0.58 | http://purl.obolibrary.org/obo/MONDO_0044975 |
| MONDO:0024627 | phagocytic cell dysfunction | 7 | 8 | 47 | 0.83 | 0.62 | http://purl.obolibrary.org/obo/MONDO_0024627 |
| MONDO:0002436 | nasal disorder | 7 | 10 | 40 | 0.89 | 0.64 | http://purl.obolibrary.org/obo/MONDO_0002436 |
| MONDO:0021073 | paraneoplastic syndrome | 5 | 6 | 9 | 0.93 | 0.78 | http://purl.obolibrary.org/obo/MONDO_0021073 |
| MONDO:0018652 | biological anomaly without phenotypic characterization | 5 | 6 | 4 | 0.96 | 0.86 | http://purl.obolibrary.org/obo/MONDO_0018652 |
| MONDO:0044989 | foot disease | 6 | 7 | 10 | 0.93 | 0.77 | http://purl.obolibrary.org/obo/MONDO_0044989 |
| MONDO:0044987 | face disease | 7 | 8 | 1719 | 0.50 | 0.27 | http://purl.obolibrary.org/obo/MONDO_0044987 |
| MONDO:0020683 | acute disease | 4 | 5 | 89 | 0.75 | 0.56 | http://purl.obolibrary.org/obo/MONDO_0020683 |
| MONDO:0021195 | disease by cellular process disrupted | 5 | 6 | 2008 | 0.45 | 0.25 | http://purl.obolibrary.org/obo/MONDO_0021195 |
| EFO:0000524 | head and neck disorder | 5 | 6 | 2103 | 0.45 | 0.25 | http://www.ebi.ac.uk/efo/EFO_0000524 |
| EFO:0009470 | soft tissue disease | 4 | 5 | 124 | 0.72 | 0.53 | http://www.ebi.ac.uk/efo/EFO_0009470 |
| MONDO:0024317 | chronic pain syndrome | 5 | 6 | 6 | 0.95 | 0.82 | http://purl.obolibrary.org/obo/MONDO_0024317 |
| EFO:0000405 | digestive system disease | 5 | 6 | 1236 | 0.51 | 0.30 | http://www.ebi.ac.uk/efo/EFO_0000405 |
| MONDO:0021670 | post-infectious syndrome | 5 | 7 | 2 | 0.99 | 0.93 | http://purl.obolibrary.org/obo/MONDO_0021670 |
| MONDO:0017368 | systemic disease with skin involvement | 6 | 7 | 42 | 0.83 | 0.63 | http://purl.obolibrary.org/obo/MONDO_0017368 |
| MONDO:0021196 | disease by molecular activity disrupted | 5 | 6 | 251 | 0.65 | 0.46 | http://purl.obolibrary.org/obo/MONDO_0021196 |
| Orphanet:79389 | Premature aging | 5 | 6 | 83 | 0.75 | 0.57 | http://www.orpha.net/ORDO/Orphanet_79389 |
| MONDO:0021147 | disorder of development or morphogenesis | 4 | 5 | 3827 | 0.36 | 0.19 | http://purl.obolibrary.org/obo/MONDO_0021147 |
| MONDO:0044977 | disease of receptor activity | 6 | 7 | 7 | 0.95 | 0.81 | http://purl.obolibrary.org/obo/MONDO_0044977 |
| MONDO:0017261 | systemic diseases with panuveitis | 6 | 7 | 6 | 0.96 | 0.82 | http://purl.obolibrary.org/obo/MONDO_0017261 |
| EFO:0009714 | chronic disease | 4 | 5 | 107 | 0.73 | 0.54 | http://www.ebi.ac.uk/efo/EFO_0009714 |
| MONDO:0021674 | post-viral disorder | 5 | 6 | 56 | 0.79 | 0.61 | http://purl.obolibrary.org/obo/MONDO_0021674 |
| MONDO:0002254 | syndromic disease | 4 | 5 | 2541 | 0.39 | 0.23 | http://purl.obolibrary.org/obo/MONDO_0002254 |
| MONDO:0021673 | post-bacterial disorder | 5 | 6 | 1 | 0.98 | 1.00 | http://purl.obolibrary.org/obo/MONDO_0021673 |
| EFO:0009903 | inflammatory disease | 4 | 5 | 597 | 0.57 | 0.37 | http://www.ebi.ac.uk/efo/EFO_0009903 |
| MONDO:0021199 | disease by anatomical system | 4 | 5 | 10922 | 0.27 | 0.09 | http://purl.obolibrary.org/obo/MONDO_0021199 |
| MONDO:0005042 | head disease | 6 | 7 | 2012 | 0.47 | 0.25 | http://purl.obolibrary.org/obo/MONDO_0005042 |
| MONDO:0024626 | defective phagocytic cell engulfment | 6 | 10 | 8 | 0.96 | 0.80 | http://purl.obolibrary.org/obo/MONDO_0024626 |
| MONDO:0044971 | disease of macromolecular complex | 6 | 7 | 155 | 0.72 | 0.51 | http://purl.obolibrary.org/obo/MONDO_0044971 |
| MONDO:0020595 | disease of retroperitoneum | 6 | 7 | 18 | 0.94 | 0.72 | http://purl.obolibrary.org/obo/MONDO_0020595 |
| EFO:0009479 | throat disease | 6 | 7 | 1 | 0.99 | 1.00 | http://www.ebi.ac.uk/efo/EFO_0009479 |
| MONDO:0017259 | systemic diseases with anterior uveitis | 6 | 7 | 13 | 0.92 | 0.75 | http://purl.obolibrary.org/obo/MONDO_0017259 |
| MONDO:0021016 | channelopathy | 7 | 8 | 57 | 0.81 | 0.60 | http://purl.obolibrary.org/obo/MONDO_0021016 |
| MONDO:0044965 | abdominal and pelvic region disorder | 5 | 6 | 977 | 0.53 | 0.32 | http://purl.obolibrary.org/obo/MONDO_0044965 |
| MONDO:0020012 | systemic or rheumatic disease | 4 | 5 | 312 | 0.62 | 0.44 | http://purl.obolibrary.org/obo/MONDO_0020012 |
| MONDO:0024505 | disorder by anatomical region | 4 | 5 | 4746 | 0.35 | 0.17 | http://purl.obolibrary.org/obo/MONDO_0024505 |
| MONDO:0015938 | systemic disease | 5 | 6 | 257 | 0.65 | 0.46 | http://purl.obolibrary.org/obo/MONDO_0015938 |
| MONDO:0044976 | disease of catalytic activity | 6 | 7 | 173 | 0.70 | 0.49 | http://purl.obolibrary.org/obo/MONDO_0044976 |
| MONDO:0017260 | systemic diseases with posterior uveitis | 7 | 8 | 4 | 0.98 | 0.86 | http://purl.obolibrary.org/obo/MONDO_0017260 |
| EFO:0009664 | disease of orbital region | 6 | 9 | 1481 | 0.52 | 0.28 | http://www.ebi.ac.uk/efo/EFO_0009664 |
| MONDO:0044967 | limb disorder | 5 | 6 | 69 | 0.78 | 0.58 | http://purl.obolibrary.org/obo/MONDO_0044967 |
| MONDO:0044990 | hand disease | 6 | 7 | 6 | 0.95 | 0.82 | http://purl.obolibrary.org/obo/MONDO_0044990 |
| EFO:0001058 | sensory system disease | 6 | 7 | 291 | 0.64 | 0.44 | http://www.ebi.ac.uk/efo/EFO_0001058 |
| MONDO:0021194 | disease by subcellular system affected | 4 | 5 | 2901 | 0.40 | 0.22 | http://purl.obolibrary.org/obo/MONDO_0021194 |
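The identifier column in both tables appears to be derived from the last path segment of the uri column, with the first underscore replaced by a colon. Here is a minimal sketch of that conversion; the function name is hypothetical, and the rule is inferred only from the rows shown here, so it may not hold for every URI in EFO:

```python
def uri_to_curie(uri):
    """Convert a term URI to a prefix:local identifier, e.g. .../MONDO_0018797 -> MONDO:0018797."""
    # Take the final path segment, then split it on the first underscore.
    local = uri.rstrip("/").rsplit("/", 1)[-1]
    prefix, sep, number = local.partition("_")
    if not sep:
        raise ValueError(f"no underscore in final path segment: {uri!r}")
    return f"{prefix}:{number}"


print(uri_to_curie("http://purl.obolibrary.org/obo/MONDO_0018797"))  # MONDO:0018797
print(uri_to_curie("http://www.ebi.ac.uk/efo/OTAR_0000010"))         # OTAR:0000010
print(uri_to_curie("http://www.orpha.net/ORDO/Orphanet_79389"))      # Orphanet:79389
```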

Code

Code to produce these figures and tables is not yet available, but is based on nxontology. I hope to make the nxontology importer for EFO available soon.


d0choa commented Jan 14, 2021

Thanks @dhimmel for the analysis. It's really useful. @zoependlington can provide more details.

From the Open Targets perspective, the background to the slim is that we wanted to align EFO with a more clinical interpretation. EFO has a lot of high-level organisational nodes based on anatomical characteristics (many of them can be seen in your analysis). However, they have little or no clinical value (e.g. disease by anatomical system). Instead, the top nodes of the slim resemble other clinical classifications like MedDRA.

In the process of reorganising the terms, a few terms had to be removed, relocated or split. You can find the logic behind most of the changes in the respective tickets. For the ones that you raised, I found the following:

  • Familial disease. In this case, I believe the term was not populated. The ticket describes the actions to take, but it's still open and was probably never implemented.
  • Cyst. I suspect this is a similar issue. The Cyst term was never populated with the many cyst-related diseases contained in EFO.

@paolaroncaglia and @zoependlington can comment on these two.

Regarding Other, it's a placeholder for newly introduced terms in EFO that have no parentage relationship in the slim. We aimed to have it empty, as all diseases should be children of other root level terms (therapeutic areas). You can consider it an artefact of the process and we should eventually remove it.


dhimmel commented Jan 15, 2021

Quoting @zoependlington from #927 (comment) regarding forced relationships in EFO-OTAR:

The forced relationships are defined in the subclasses templates file found in the temporary/working home of OTAR_profiler here: https://github.com/EBISPOT/otar_profiler

Just a note that the "final" version for use by Open Targets is the slim file, which only contains the therapeutic areas that are useful for annotating their data. The profile is our master EFO with a few extra terms, which will eventually be added to the master EFO file once we have completed the ongoing work with our profile and slim files to be compatible with the Open Targets pipelines and their needs.

Great to know about EBISPOT/otar_profiler. I see that otar_ta.sh is the script that creates efo_otar_profile.owl and efo_otar_slim.owl. allTAs.txt contains a list of therapeutic areas and newterms.tsv contains nodes added by EFO-OTAR.

Based on otar_ta.sh, it looks like efo_otar_slim.owl is derived from efo_otar_profile.owl by filtering to therapeutic areas and their descendants (via robot MIREOT --branch-from-terms). So this is useful for OpenTargets, which wants a hierarchy of diseases only, without other parts of the ontology?
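To make the "therapeutic areas and their descendants" filter concrete, here is a minimal sketch on a plain parent-to-children mapping. It only illustrates the idea of keeping a set of root terms plus everything below them; it is not what ROBOT does internally, and the function name and toy terms are hypothetical:

```python
def slim_to_branches(children, roots):
    """Keep only the given root terms and everything below them."""
    keep = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in keep:
            continue
        keep.add(node)
        stack.extend(children.get(node, ()))
    # Every child of a kept parent is itself kept, so its edges carry over unchanged.
    edges = {parent: list(kids) for parent, kids in children.items() if parent in keep}
    return keep, edges


# Toy ontology: two therapeutic areas under "disease", plus an unrelated branch.
children = {
    "disease": ["TA:respiratory", "TA:genetic"],
    "TA:respiratory": ["disease X"],
    "TA:genetic": ["disease X", "disease Y"],
    "measurement": ["measurement Z"],
}

keep, edges = slim_to_branches(children, ["TA:respiratory", "TA:genetic"])
print(sorted(keep))  # ['TA:genetic', 'TA:respiratory', 'disease X', 'disease Y']
```

In this toy case both the parent of the therapeutic areas ("disease") and the unrelated "measurement" branch are dropped, which matches the description of the slim as a disease hierarchy rooted at therapeutic areas.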

Regarding "the profile is our master EFO with a few extra terms, which will eventually be added to the master EFO file", does that mean the eventual plan is to take all the modifications in efo_otar_profile.owl and move them upstream to efo.owl? If so, does that mean efo_otar_profile.owl might eventually go away, because it would be the same as efo.owl? And does this also mean EFO intends to remove the "organisational nodes that attend to anatomical characteristics" in favor of EFO-OTAR's "clinical interpretation"?

Getting back to the original documentation request, it would be nice to have guidance in the README regarding when to use efo-base.owl, efo.owl, efo_otar_profile.owl, versus efo_otar_slim.owl. My current understanding is:

  1. efo-base.owl: use if you only want terms from the EFO namespace (subClassOf relationships might be incomplete?)
  2. efo.owl: use if you want the primary EFO release with terms from the EFO namespace and those imported from other ontologies
  3. efo_otar_profile.owl: use if you want the complete ontology, with modifications introduced by OpenTargets, which might eventually be adopted in efo.owl.
  4. efo_otar_slim.owl: use if you want an ontology of diseases rooted to therapeutic areas, as defined and used by OpenTargets

Is this understanding correct?

@zoependlington zoependlington added this to To Do in Zoë Mar 31, 2023