# Exploration of MAxO

The Medical Action Ontology (MAxO) (https://obofoundry.org/ontology/maxo) is an ontology of medical procedures, interventions, therapies, and treatments for disease with an emphasis on rare disease (RD).

This notebook explores the contents of MAxO. It focuses on the ontology content, but as
annotations are added we will include these.

## Setup

We will use the sqlite version of MAxO:

In [1]:
%alias maxo runoak -i sqlite:obo:maxo

In [2]:
maxo ontology-metadata --all

dce:creator:
- Leigh Carmody
- Peter Robinson
dce:description:
- An ontology to represent medically relevant actions, procedures, therapies, interventions,
  and recommendations.
dce:title:
- Medical Action Ontology
dcterms:license:
- <https://creativecommons.org/licenses/by/4.0/>
id:
- obo:maxo.owl
owl:versionIRI:
- obo:maxo/releases/2022-12-09/maxo.owl
owl:versionInfo:
- '2022-12-09'
rdf:type:
- owl:Ontology
rdfs:isDefinedBy:
- http://purl.obolibrary.org/obo/obo.owl
schema:url:
- http://purl.obolibrary.org/obo/maxo.owl
sh:prefix:
- obo


## Root terms

We will query for roots (all terms without an is-a parent).

This reveals a few dangling classes that should be cleaned up.

In [3]:
maxo roots -p i

BFO:0000001 ! entity
CARO:0000003 ! None
CARO:0000006 ! None
CARO:0001001 ! None
CARO:0001010 ! None
CARO:0010000 ! None
CL:0000000 ! cell
CL:0017500 ! neutrophillic cytoplasm
CL:0017502 ! acidophilic cytoplasm
CL:0017503 ! basophilic cytoplasm
CL:0017504 ! polychromatophilic cytoplasm
CL:0017505 ! increased nucleus size
HP:0000001 ! All
MAXO:0000001 ! medical action
MAXO:0000058 ! pharmacotherapy
NCBITaxon:110815 ! None
NCBITaxon:147099 ! None
NCBITaxon:147554 ! None
NCBITaxon:189497 ! None
NCBITaxon:3176 ! None
NCBITaxon:3312 ! None
NCBITaxon:33630 ! None
NCBITaxon:33682 ! None
NCBITaxon:3378 ! None
NCBITaxon:38254 ! None
NCBITaxon:4891 ! None
NCBITaxon:4895 ! None
NCBITaxon:4896 ! None
NCBITaxon:4932 ! None
NCBITaxon:5782 ! None
NCBITaxon:Union_0000023 ! None
SO:0000110 ! sequence_feature


### Roots in the MAXO id space

As can be seen above, there are only two root nodes in the MAXO namespace.

We can also query for MAXO-roots, i.e. classes in MAXO that have no MAXO is-a parent:

In [4]:
maxo roots -p i --has-prefix MAXO

MAXO:0000001 ! medical action
MAXO:0000058 ! pharmacotherapy


## Visualization

You might be wondering why all of these collection and biopsy classes show as MAXO roots.

We can use the `viz` command to see the full ancestry.

For now, we will restrict to is-a parents (`-p i`):

In [5]:
maxo viz -p i 'biopsy of thymus' 'biopsy of thyroid gland' -o output/maxo-biopsy.png

![img](output/maxo-biopsy.png)

We can see there is a "striping" pattern: MAXO is-a OBI is-a MAXO.

## Upper level

We can use the `tree` command to explore the upper level:



In [6]:
maxo tree -p i MAXO:0000001 MAXO:0000058 MAXO:0000002 MAXO:0000003 MAXO:0000013 MAXO:0000017 MAXO:0000021 MAXO:0001014

* [] **MAXO:0000001 ! medical action**
    * [i] **MAXO:0000002 ! therapeutic procedure**
    * [i] **MAXO:0000003 ! diagnostic procedure**
    * [i] **MAXO:0000013 ! complementary and alternative medical therapy**
    * [i] **MAXO:0000017 ! preventative therapeutics**
    * [i] **MAXO:0000021 ! palliative care**
    * [i] **MAXO:0001014 ! medical action avoidance**


## Summary Statistics

We can get summary statistics using the `statistics` command.

Note that like many ontologies, MAXO comes "bundled" with other ontologies, so it's important to pass in
`--has-prefix` to restrict summary statistics only to things in the MAXO space.


In [7]:
maxo statistics --has-prefix MAXO > output/maxo-stats.yaml

Let's take a look at the output. It's quite large, so we'll use the Unix `head` command:

In [9]:
!head output/maxo-stats.yaml

annotation_property_count: 0
class_count: 1440
class_count_with_text_definitions: 1373
class_count_without_text_definitions: 67
contributor_summary:
  <http://orcid.org/0000-0001-7941-2961>:
    contributor_id: <http://orcid.org/0000-0001-7941-2961>
    role_counts:
      oio:created_by:
        facet: oio:created_by


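Hmm, maybe you'd prefer a tabular view?

We can do that with `-O csv` (`-O` is short for `--output-type`, which is available
for many other OAK commands). Note that the output written here is tab-separated, which is why the file is named `.tsv` and is read with `sep="\t"` below.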

In [19]:
maxo statistics --has-prefix MAXO -O csv -o output/maxo-stats.tsv

If you are running this on the command line yourself, you can then browse the output
using whatever method you prefer, e.g. by loading it into Excel.

In a Jupyter notebook, pandas dataframes show up nicely:

In [20]:
import pandas as pd

In [21]:
df = pd.read_csv("output/maxo-stats.tsv", sep="\t").fillna("")
df

| | id | compared_with | agents | class_count | deprecated_class_count | non_deprecated_class_count | class_count_with_text_definitions | class_count_without_text_definitions | object_property_count | annotation_property_count | ... | synonym_statement_count_by_predicate_hasExactSynonym | synonym_statement_count_by_predicate_hasNarrowSynonym | synonym_statement_count_by_predicate_hasRelatedSynonym | mapping_statement_count_by_predicate_hasDbXref | was_generated_by_started_at_time | was_generated_by_was_associated_with | was_generated_by_acted_on_behalf_of | ontologies_id | ontologies_version | ontologies_version_info |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AllOntologies | [] | [] | 1440 | 53 | 1387 | 1373 | 67 | 4 | 0 | ... | 4246 | 97 | 19 | 552 | 2022-12-18T12:27:10.229667 | OAK | cjm | obo:maxo.owl | obo:maxo/releases/2022-12-09/maxo.owl | 2022-12-09 |


The default TSV output for most OAK commands follows a tidy data format, with one row per item.

Let's unpivot this for easy viewing, using pandas `melt`:

In [23]:
df.melt()

| | variable | value |
|---|---|---|
| 0 | id | AllOntologies |
| 1 | compared_with | [] |
| 2 | agents | [] |
| 3 | class_count | 1440 |
| 4 | deprecated_class_count | 53 |
| 5 | non_deprecated_class_count | 1387 |
| 6 | class_count_with_text_definitions | 1373 |
| 7 | class_count_without_text_definitions | 67 |
| 8 | object_property_count | 4 |
| 9 | annotation_property_count | 0 |
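
Outside a notebook, the same unpivot can be sketched with the standard library alone. The sketch below assumes a cut-down, inline copy of the one-row table rather than reading `output/maxo-stats.tsv`, and `unpivot` is a hypothetical helper name, not part of OAK or pandas.

```python
import csv
import io

# A cut-down, tab-separated copy of the one-row statistics table shown earlier.
TSV_TEXT = (
    "id\tclass_count\tdeprecated_class_count\tnon_deprecated_class_count\n"
    "AllOntologies\t1440\t53\t1387\n"
)


def unpivot(tsv_text: str):
    """Turn each row of a wide TSV into a list of (variable, value) pairs."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    pairs = []
    for row in reader:
        pairs.extend(row.items())
    return pairs


PAIRS = unpivot(TSV_TEXT)
for variable, value in PAIRS:
    print(f"{variable}\t{value}")
```

Unlike the pandas version, every value stays a string here, so counts need an explicit `int(...)` before any arithmetic.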


Note that the original YAML report included many "nested" summaries, where counts were broken down by a facet - e.g. edge counts:

```yaml
edge_count_by_predicate:
  BFO:0000051:
    facet: BFO:0000051
    filtered_count: 8
  MAXO:0000521:
    facet: MAXO:0000521
    filtered_count: 20
  MAXO:0000864:
    facet: MAXO:0000864
    filtered_count: 134
...
```

These nested summaries are flattened ("denormalized") in the conversion to TSV, with a new column created for each facet. This isn't done for all objects, since that would make the table too wide (e.g. the contributor report: some ontologies may have hundreds of contributors).
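
The flattening step can be sketched in a few lines. This is an illustration of the idea only, not OAK's code: `flatten` is a hypothetical helper, the input reuses numbers from the YAML report shown earlier, and the sketch flattens every nested object rather than skipping wide ones such as the contributor report.

```python
# Nested facet counts, using values from the YAML report shown earlier.
NESTED = {
    "class_count": 1440,
    "edge_count_by_predicate": {
        "BFO:0000051": {"facet": "BFO:0000051", "filtered_count": 8},
        "MAXO:0000521": {"facet": "MAXO:0000521", "filtered_count": 20},
        "MAXO:0000864": {"facet": "MAXO:0000864", "filtered_count": 134},
    },
}


def flatten(stats: dict) -> dict:
    """Create one '<statistic>_<facet>' column per facet; copy scalars as-is."""
    flat = {}
    for name, value in stats.items():
        if isinstance(value, dict):
            for facet, summary in value.items():
                flat[f"{name}_{facet}"] = summary["filtered_count"]
        else:
            flat[name] = value
    return flat


FLAT = flatten(NESTED)
for column, count in FLAT.items():
    print(f"{column}\t{count}")
```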

The field names may be verbose, but they are intended to be unambiguous. One problem with ontology stats reporting is that there is often built-in ambiguity; e.g. what does "number of terms" mean?

- number of classes
- number of classes plus relations (object properties?)
- are obsolete classes included? what about merged classes?

If you want to look up documentation for any of these fields, just prefix with `https://incatools.github.io/ontology-access-kit/datamodels/summary-statistics/`

E.g. https://incatools.github.io/ontology-access-kit/datamodels/summary-statistics/non_deprecated_class_count.html

(In the future, these elements may have their own w3id URIs.)

Note this only works for "non-denormalized" fields - to interpret "edge_count_by_predicate_BFO:0000051" you must split this into the base statistic `edge_count_by_predicate` and the entity (BFO:0000051, aka has-part). Or you can simply look at the native normalized YAML output.
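
Splitting a denormalized column name and building the documentation link can be sketched as follows. The helper names `split_field` and `doc_url` are hypothetical, and the list of faceted statistics is an assumption that covers only the ones appearing in this notebook; whether each resulting page exists has not been checked here.

```python
DOC_BASE = (
    "https://incatools.github.io/ontology-access-kit/"
    "datamodels/summary-statistics/"
)

# Faceted statistics seen in this notebook; assumed to be an incomplete list.
FACETED_STATISTICS = (
    "edge_count_by_predicate",
    "synonym_statement_count_by_predicate",
    "mapping_statement_count_by_predicate",
)


def split_field(column: str):
    """Return (base_statistic, entity); entity is None for plain fields."""
    for base in FACETED_STATISTICS:
        if column.startswith(base + "_"):
            return base, column[len(base) + 1:]
    return column, None


def doc_url(column: str) -> str:
    """Build the documentation URL for the base statistic of a column."""
    base, _ = split_field(column)
    return f"{DOC_BASE}{base}.html"


SPLIT = split_field("edge_count_by_predicate_BFO:0000051")
URL = doc_url("non_deprecated_class_count")
print(SPLIT)
print(URL)
```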

## Summarizing by branches

We can pass in a list of branches:

In [24]:
maxo statistics --has-prefix MAXO -O csv -o output/maxo-stats-roots.tsv MAXO:0000001 MAXO:0000058 MAXO:0000002 MAXO:0000003 MAXO:0000013 MAXO:0000017 MAXO:0000021 MAXO:0001014

In [32]:
df = pd.read_csv("output/maxo-stats-roots.tsv", sep="\t").fillna("")
df

| | id | compared_with | agents | class_count | deprecated_class_count | non_deprecated_class_count | class_count_with_text_definitions | class_count_without_text_definitions | object_property_count | annotation_property_count | ... | edge_count_by_predicate_MAXO:0000864 | edge_count_by_predicate_MAXO:0001015 | edge_count_by_predicate_MAXO:0001027 | edge_count_by_predicate_RO:0002233 | edge_count_by_predicate_rdfs:subClassOf | synonym_statement_count_by_predicate_hasBroadSynonym | synonym_statement_count_by_predicate_hasExactSynonym | synonym_statement_count_by_predicate_hasNarrowSynonym | synonym_statement_count_by_predicate_hasRelatedSynonym | mapping_statement_count_by_predicate_hasDbXref |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | medical action | [] | [] | 1145 | 0 | 1145 | 1137 | 8 | 0 | 0 | ... | 6.0 | 9.0 | 76.0 | 441.0 | 1706 | 18.0 | 2814 | 78.0 | 10.0 | 369.0 |
| 1 | pharmacotherapy | [] | [] | 256 | 0 | 256 | 248 | 8 | 0 | 0 | ... | 134.0 | | | | 421 | 12.0 | 1517 | 19.0 | 7.0 | 196.0 |
| 2 | therapeutic procedure | [] | [] | 870 | 0 | 870 | 862 | 8 | 0 | 0 | ... | 6.0 | | 76.0 | 336.0 | 1294 | 17.0 | 2218 | 54.0 | 8.0 | 284.0 |
| 3 | diagnostic procedure | [] | [] | 556 | 0 | 556 | 555 | 1 | 0 | 0 | ... | | | | 246.0 | 889 | 2.0 | 1446 | 32.0 | 6.0 | 178.0 |
| 4 | complementary and alternative medical therapy | [] | [] | 6 | 0 | 6 | 6 | 0 | 0 | 0 | ... | | | | | 6 | | 7 | 2.0 | | 8.0 |
| 5 | preventative therapeutics | [] | [] | 22 | 0 | 22 | 22 | 0 | 0 | 0 | ... | | | | | 23 | | 46 | | | 14.0 |
| 6 | palliative care | [] | [] | 6 | 0 | 6 | 6 | 0 | 0 | 0 | ... | | | | | 8 | | 7 | 8.0 | | 6.0 |
| 7 | medical action avoidance | [] | [] | 9 | 0 | 9 | 9 | 0 | 0 | 0 | ... | | 9.0 | | | 9 | | 18 | | | |


Here each row is a branch (note that numbers are cumulative, so numbers for a parent include
numbers for children, even if there is a separate row for that child).

Note that you will never see deprecated classes when using this option if the deprecated classes are not in the hierarchy (as is conventional for OBO ontologies). This is why `deprecated_class_count` is 0 for every branch above.
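
The cumulative counting described above can be illustrated on a toy hierarchy. This is a sketch under stated assumptions: the "procedure A/B/C" children are invented, the `branch` helper is hypothetical, and each count is assumed to include the branch root itself.

```python
# Toy hierarchy as parent -> direct children. The leaf labels are invented.
CHILDREN = {
    "medical action": ["therapeutic procedure", "diagnostic procedure"],
    "therapeutic procedure": ["procedure A", "procedure B"],
    "diagnostic procedure": ["procedure B", "procedure C"],
}


def branch(root: str) -> set:
    """Return the root plus all of its descendants."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(CHILDREN.get(node, []))
    return seen


# One count per branch; a parent's count includes everything under its children.
COUNTS = {root: len(branch(root)) for root in CHILDREN}
for root, count in COUNTS.items():
    print(f"{root}\t{count}")
```

Because "procedure B" sits under two branches, the child counts add up to more than the parent's count minus one. The real table shows the same effect: therapeutic procedure (870) plus diagnostic procedure (556) exceeds medical action (1145), which is consistent with the branches sharing classes.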

We may prefer to view the transposed form:

In [29]:
df.convert_dtypes().transpose()

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| id | medical action | pharmacotherapy | therapeutic procedure | diagnostic procedure | complementary and alternative medical therapy | preventative therapeutics | palliative care | medical action avoidance |
| compared_with | [] | [] | [] | [] | [] | [] | [] | [] |
| agents | [] | [] | [] | [] | [] | [] | [] | [] |
| class_count | 1145 | 256 | 870 | 556 | 6 | 22 | 6 | 9 |
| deprecated_class_count | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| non_deprecated_class_count | 1145 | 256 | 870 | 556 | 6 | 22 | 6 | 9 |
| class_count_with_text_definitions | 1137 | 248 | 862 | 555 | 6 | 22 | 6 | 9 |
| class_count_without_text_definitions | 8 | 8 | 8 | 1 | 0 | 0 | 0 | 0 |
| object_property_count | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| annotation_property_count | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
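
One use of the transposed view is to compare text-definition coverage across branches. The sketch below copies `class_count` and `class_count_with_text_definitions` from the table above into a plain dictionary instead of reading them from `df`; the `COVERAGE` name is only illustrative.

```python
# (class_count, class_count_with_text_definitions) per branch, from the table above.
BRANCHES = {
    "medical action": (1145, 1137),
    "pharmacotherapy": (256, 248),
    "therapeutic procedure": (870, 862),
    "diagnostic procedure": (556, 555),
    "complementary and alternative medical therapy": (6, 6),
    "preventative therapeutics": (22, 22),
    "palliative care": (6, 6),
    "medical action avoidance": (9, 9),
}

# Fraction of classes in each branch that have a text definition.
COVERAGE = {
    name: with_definitions / total
    for name, (total, with_definitions) in BRANCHES.items()
}

for name, fraction in COVERAGE.items():
    print(f"{name}: {fraction:.1%}")
```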
