# Subsetting Queries

This notebook is intended as a supplement to the [main OAK CLI docs](https://incatools.github.io/ontology-access-kit/cli.html).

This notebook provides examples of subsetting and graph operations.

For more background, see:

- [OAK Expression Language](https://incatools.github.io/ontology-access-kit/howtos/use-oak-expression-language.html)
- [Relationships and Graphs](https://incatools.github.io/ontology-access-kit/guide/relationships-and-graphs.html)



## Set up an alias

For convenience, we will set up an alias for use in this notebook:


In [1]:
alias cl runoak -i sqlite:obo:cl

## Example Terms

We'll pick a few example terms from the immune and nervous systems:

In [3]:
cl info CL:0000601  CL:0017006 CL:0002128

CL:0000601 ! cochlear outer hair cell
CL:0017006 ! B-lymphoblast
CL:0002128 ! Tc17 cell


In [4]:
!mkdir -p output

## Visualization (no subsetting)

First, we'll visualize the example terms together with their is-a closure:

In [6]:
cl viz -p i CL:0000601  CL:0017006 CL:0002128 -o output/cl-example-3-terms.png

![img](output/cl-example-3-terms.png)

__TODO__: regenerate after this is fixed: https://github.com/obophenotype/cell-ontology/issues/2923

We can see that there are a lot of terms between each highlighted seed term and the root. This notebook will explore
ways of reducing that space.

First, we will show the `--gap-fill` option in the `viz` command, as well as `--add-mrcas`.

 * `--add-mrcas` extends the seed set with the MRCAs of all combinations of seeds
 * `--gap-fill` connects the terms in the extended seed set to each other, traversing (rather than displaying) intermediate nodes that are not in that set

In [17]:
cl viz --gap-fill --add-mrcas -p i CL:0000601  CL:0017006 CL:0002128 -o output/cl-example-3-terms-mrca.png

![img](output/cl-example-3-terms-mrca.png)

This looks a little imbalanced; we can balance it out by adding another neuron:

In [20]:
cl viz --gap-fill --add-mrcas -p i CL:4023050 CL:0000601  CL:0017006 CL:0002128 -o output/cl-example-4-terms-mrca.png

![img](output/cl-example-4-terms-mrca.png)

The `viz` command is doing a lot of work here. We'll explore how this can be broken down and composed more flexibly.
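
As a rough illustration of the gap-filling idea described above, the following Python sketch links each seed term to its nearest seed ancestors while skipping over non-seed intermediates. It is not OAK's implementation: the toy edges, the labels, and the `gap_fill` helper are all assumptions made up for this example, and the sketch makes no attempt to remove redundant edges.

```python
from collections import defaultdict

# Toy is-a edges (child, parent); labels are invented for illustration.
EDGES = [
    ("hair cell", "sensory cell"),
    ("sensory cell", "neural cell"),
    ("neural cell", "cell"),
    ("B cell", "lymphocyte"),
    ("T cell", "lymphocyte"),
    ("lymphocyte", "blood cell"),
    ("blood cell", "cell"),
]


def gap_fill(edges, seeds):
    """Link each seed to its nearest seed ancestors, skipping non-seed nodes."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    seeds = set(seeds)
    filled = set()
    for seed in seeds:
        stack = list(parents[seed])
        visited = set()
        while stack:
            node = stack.pop()
            if node in visited:
                continue
            visited.add(node)
            if node in seeds:
                # Nearest seed ancestor reached: record the edge and stop climbing.
                filled.add((seed, node))
            else:
                # Not a seed: traverse through it without keeping it.
                stack.extend(parents[node])
    return filled


seeds = {"hair cell", "B cell", "T cell", "lymphocyte", "cell"}
for child, parent in sorted(gap_fill(EDGES, seeds)):
    print(f"{child} -> {parent}")
```

In this toy graph, `sensory cell`, `neural cell`, and `blood cell` drop out, and `hair cell` is linked directly to `cell`.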

## Most Recent Common Ancestors (MRCAs)

The `.mrca` expression finds the MRCA of *all* specified terms. Like many OAK commands and expressions, it is parameterized by predicate. This is important as we frequently want to use other relations such as part-of here.

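Note that we have to provide the list within `[ ... ]` brackets. The spaces around each bracket matter: the shell splits the command line on whitespace, so each bracket must stand alone to reach OAK as a separate argument.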
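
The effect of the spaces can be seen with Python's standard `shlex` module, which approximates how a POSIX shell splits a command line into arguments. This is only a sketch of the tokenization step; how OAK then interprets the tokens is not shown here.

```python
import shlex

# With spaces, each bracket becomes its own argument.
spaced = shlex.split(".mrca//p=i [ CL:0000601 CL:0017006 CL:0002128 ]")

# Without spaces, the brackets are fused onto the neighbouring identifiers.
unspaced = shlex.split(".mrca//p=i [CL:0000601 CL:0017006 CL:0002128]")

print(spaced)
print(unspaced)
```

In the second case the first and last arguments are `[CL:0000601` and `CL:0002128]`, which are no longer plain term identifiers.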

In [11]:
cl info .mrca//p=i [ CL:0000601  CL:0017006 CL:0002128 ]

CL:0000255 ! eukaryotic cell
BFO:0000002 ! None


Let's ignore the strange BFO class for now. The MRCA of our terms (from the neuron and lymphocyte branches) is the very high-level term
_eukaryotic cell_, indicating that our selected terms don't have much in common.

We can restrict this to the two immune cells:

In [12]:
cl info .mrca//p=i [ CL:0017006 CL:0002128 ]

CL:0000542 ! lymphocyte


This makes sense: with only two terms, this is just a pairwise MRCA.
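
To make the MRCA idea concrete, here is a minimal Python sketch on a toy is-a graph. The graph, its labels, and the `ancestors` and `mrca` helpers are assumptions made up for this example; they are not OAK's API or its implementation.

```python
# Toy is-a graph (child -> parents); labels are illustrative only.
PARENTS = {
    "hair cell": {"neuron"},
    "neuron": {"cell"},
    "B cell": {"lymphocyte"},
    "T cell": {"lymphocyte"},
    "lymphocyte": {"leukocyte"},
    "leukocyte": {"cell"},
    "cell": set(),
}


def ancestors(term):
    """Reflexive ancestors: the term itself plus everything above it."""
    seen = {term}
    stack = list(PARENTS[term])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen


def mrca(terms):
    """Common ancestors of all terms with no other common ancestor below them."""
    common = set.intersection(*(ancestors(t) for t in terms))
    return {
        c for c in common
        if not any(c != d and c in ancestors(d) for d in common)
    }


print(mrca(["B cell", "T cell"]))               # {'lymphocyte'}
print(mrca(["hair cell", "B cell", "T cell"]))  # {'cell'}
```

As in the CL example, adding a term from a distant branch pushes the MRCA up to a very general class.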

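The first cell below shows another pairwise MRCA, this time across the two branches, which again yields the high-level result. If instead we want *all* pairwise MRCAs at once, we can use `.multimrca`, as in the second cell: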

In [14]:
cl info .mrca//p=i [ CL:0000601 CL:0002128 ]

CL:0000255 ! eukaryotic cell
BFO:0000002 ! None


In [15]:
cl info .multimrca//p=i [ CL:0000601  CL:0017006 CL:0002128 ]

CL:0000542 ! lymphocyte
BFO:0000002 ! None
CL:0000255 ! eukaryotic cell


We can also use the `reflexive` parameter (a boolean, written in YAML syntax, e.g. `true`) to include the initial terms in the result.
This is useful for composing operations together:

In [16]:
cl info .multimrca//p=i//reflexive=true [ CL:0000601  CL:0017006 CL:0002128 ]

BFO:0000002 ! None
CL:0000255 ! eukaryotic cell
CL:0000542 ! lymphocyte
CL:0000601 ! cochlear outer hair cell
CL:0017006 ! B-lymphoblast
CL:0002128 ! Tc17 cell
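
The following Python sketch mirrors the behaviour described above on a toy graph: it collects the MRCAs of every pair of terms and, when `reflexive` is set, also keeps the input terms. The graph and the `ancestors`, `mrca`, and `multi_mrca` helpers are hypothetical names for this illustration, not OAK functions.

```python
from itertools import combinations

# Toy is-a graph (child -> parents); labels are illustrative only.
PARENTS = {
    "hair cell": {"neuron"},
    "neuron": {"cell"},
    "B cell": {"lymphocyte"},
    "T cell": {"lymphocyte"},
    "lymphocyte": {"leukocyte"},
    "leukocyte": {"cell"},
    "cell": set(),
}


def ancestors(term):
    """Reflexive ancestors: the term itself plus everything above it."""
    seen = {term}
    stack = list(PARENTS[term])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen


def mrca(a, b):
    """Most recent common ancestors of a single pair of terms."""
    common = ancestors(a) & ancestors(b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(d) for d in common)
    }


def multi_mrca(terms, reflexive=False):
    """Union of the pairwise MRCAs; optionally include the input terms."""
    result = set(terms) if reflexive else set()
    for a, b in combinations(terms, 2):
        result |= mrca(a, b)
    return result


terms = ["hair cell", "B cell", "T cell"]
print(sorted(multi_mrca(terms)))
print(sorted(multi_mrca(terms, reflexive=True)))
```

The first call returns only the shared ancestors (`cell` and `lymphocyte`); the second also returns the three input terms, which is what makes the result usable as a seed set for a later step.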


Next we can use the output of the expression as an input to `viz`, with `--gap-fill` on:

In [24]:
cl viz -p i --gap-fill .multimrca//p=i//reflexive=true [ CL:0000601  CL:0017006 CL:0002128 ] -o output/cl-example-combined.png

![img](output/cl-example-combined.png)

This is the same result as the earlier `--add-mrcas` visualization, but the MRCA step is now an explicit query expression, which can be combined with others:


In [26]:
cl viz -p i --gap-fill .multimrca//p=i//reflexive=true [ CL:0000601  CL:0017006 CL:0002128 ] .minus BFO:0000002 -o output/cl-example-filtered.png

![img](output/cl-example-filtered.png)

Much cleaner!
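
Conceptually, the filtering step behaves like a set difference over the term list. The sketch below assumes that reading of `.minus` and uses the identifiers from the reflexive `.multimrca` output shown earlier; the variable names are made up for the example.

```python
# Identifiers copied from the reflexive .multimrca output shown earlier.
multimrca_result = {
    "BFO:0000002",
    "CL:0000255",
    "CL:0000542",
    "CL:0000601",
    "CL:0017006",
    "CL:0002128",
}

# Assumption: `.minus` removes the listed terms from the left-hand result.
excluded = {"BFO:0000002"}

filtered = multimrca_result - excluded
print(sorted(filtered))
```

Only the CL terms remain, so the visualization is built from the five cell types alone.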

## ENVO subsets use case

In [27]:
alias envo runoak -i sqlite:obo:envo

In [28]:
envo info .idfile input/water_env_local_scale.tsv

ENVO:00012408 ! aquifer
ENVO:00000067 ! cave
ENVO:00000076 ! mine
ENVO:00000025 ! reservoir
ENVO:00000153 ! headwater
ENVO:02000145 ! subterranean lake
ENVO:03600052 ! water tap
ENVO:01000142 ! wood fall
ENVO:01000140 ! whale fall
ENVO:01001871 ! pit
ENVO:00000055 ! saline evaporation pond
ENVO:01000002 ! water well
ENVO:00000044 ! peatland
ENVO:00002034 ! biofilm
ENVO:00000035 ! marsh
ENVO:03600074 ! aquaculture farm
ENVO:00000133 ! glacier
ENVO:00000054 ! saline marsh
ENVO:00000244 ! abyssal plain
ENVO:00001997 ! acid mine drainage
ENVO:00000114 ! agricultural field
ENVO:01001072 ! anoxic lake
ENVO:00000220 ! archipelago
ENVO:00000091 ! beach
ENVO:00000218 ! black smoker
ENVO:00000057 ! mangrove swamp
ENVO:01000687 ! coast
ENVO:01000263 ! cold seep
ENVO:01000298 ! continental margin
ENVO:00000150 ! coral reef
ENVO:03600071 ! cyanobacterial bloom
ENVO:02000139 ! desert spring
ENVO:00002131 ! epilimnion
ENVO:00000045 ! estuary
ENVO:00000039 ! fjord
ENVO:00000255 ! flood plain
ENVO:0000

In [32]:
envo viz -p i,p .idfile input/water_env_local_scale.tsv -o output/envo-water-local.png

![img](output/envo-water-local.png)

In [33]:
envo viz --gap-fill --add-mrcas -p i,p .idfile input/water_env_local_scale.tsv -o output/envo-water-local-mrcas.png

![img](output/envo-water-local-mrcas.png)

In [43]:
!runoak -i sqlite:obo:envo viz --gap-fill -p i,p \
  .multimrca//p=i,p [ .idfile input/water_env_local_scale.tsv ] \
  .minus [ system object l~astronomical layer planet 'environmental system' ] \
  .or [ 'manufactured product' ] \
  -o output/envo-water-local-filtered.png

![img](output/envo-water-local-filtered.png)