# OAK mappings command

This notebook is intended as a supplement to the [main OAK CLI docs](https://incatools.github.io/ontology-access-kit/cli.html).

This notebook provides examples for the `mappings` command, which can be used to look up mappings that are bundled with ontologies.

Overall background on the concepts here can be found in the [OAK Guide to Mappings](https://incatools.github.io/ontology-access-kit/guide/mappings.html).

## Help Option

You can get help on any OAK command using `--help`:

In [1]:
!runoak mappings --help

Usage: runoak mappings [OPTIONS] [TERMS]...

  List all mappings encoded in the ontology

  Example:

      runoak -i sqlite:obo:envo mappings

  The default output is SSSOM YAML. To use the (canonical) csv format:

      runoak -i sqlite:obo:envo mappings -O sssom

  By default, labels are not included. Use --autolabel to include labels (but
  note that if the label is not in the source ontology, then no label will be
  retrieved)

      runoak -i sqlite:obo:envo mappings -O sssom

  To constrain the mapped object source:

      runoak -i sqlite:obo:foodon mappings -O sssom --maps-to-source
      SUBSET_SIREN

  Python API:

     https://incatools.github.io/ontology-access-kit/interfaces/mapping-
     provider

  Data model:

     https://w3id.org/oak/mapping-provider

Options:
  -o, --output FILENAME         Output file, e.g. obo file
  -O, --output-type TEXT        Desired output type
  --autolabel / --no-autolabel  If set, results will automatical

## Set up an alias

For convenience we will set up an alias for use in this notebook. This will allow us to use `uberon ...` rather than `runoak -i sqlite:obo:uberon ...` for the rest of the notebook.

We use Uberon as an example because it bundles a large and diverse set of mappings. See the [Uberon docs](https://obophenotype.github.io/uberon/bridges/).

In [2]:
alias uberon runoak -i sqlite:obo:uberon

## Direct mappings for a subject term

First, we will look up the mappings for the Uberon term for the CA4 region of the hippocampus:

In [4]:
uberon mappings UBERON:0003884

subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: DHBA:10300
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: DHBA

---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: EFO:0002457
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: EFO

---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: EMAPA:32771
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: EMAPA

---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: FMA:75741
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: FMA

---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: HBA:12895
mapping_justification: se

The above YAML follows the SSSOM data model (https://w3id.org/sssom).

We can also get the results back in SSSOM TSV format (this time querying for "brain"). Here we will view it via pandas:
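Each record in the YAML output above is a flat set of `key: value` pairs, with `---` separating one mapping from the next. As a minimal sketch, assuming the output stays this flat, the stream can be split into dictionaries with the standard library alone. The function name `parse_sssom_yaml_stream` is hypothetical and not part of OAK; for anything beyond flat records, a real YAML parser would be the safer choice.

```python
def parse_sssom_yaml_stream(text):
    """Split a '---'-separated stream of flat `key: value` records into dicts.

    Illustrative helper only (not part of OAK); assumes no nested YAML.
    """
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "---":
            # Document separator: close the current record, if any.
            if current:
                records.append(current)
                current = {}
        elif line:
            # Split on the first ": " only, because CURIE values contain colons.
            key, _, value = line.partition(": ")
            current[key] = value
    if current:
        records.append(current)
    return records


# Two records copied from the output above.
sample = """\
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: DHBA:10300
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: DHBA

---
subject_id: UBERON:0003884
predicate_id: oio:hasDbXref
object_id: EFO:0002457
mapping_justification: semapv:UnspecifiedMatching
subject_label: CA4 field of hippocampus
subject_source: UBERON
object_source: EFO
"""

records = parse_sssom_yaml_stream(sample)
for record in records:
    print(record["subject_id"], record["predicate_id"], record["object_id"])
```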

In [11]:
uberon mappings UBERON:0000955 -o output/brain-mappings.tsv -O sssom


In [13]:
import pandas as pd
df = pd.read_csv("output/brain-mappings.tsv", sep="\t", comment="#")
df

Unnamed: 0,subject_id,subject_label,predicate_id,object_id,object_label,mapping_justification,subject_source,object_source
0,UBERON:0000955,brain,oio:hasDbXref,AAO:0010478,,semapv:UnspecifiedMatching,UBERON,AAO
1,UBERON:0000955,brain,oio:hasDbXref,ABA:Brain,,semapv:UnspecifiedMatching,UBERON,ABA
2,UBERON:0000955,brain,oio:hasDbXref,BAMS:Br,,semapv:UnspecifiedMatching,UBERON,BAMS
3,UBERON:0000955,brain,oio:hasDbXref,BAMS:Brain,,semapv:UnspecifiedMatching,UBERON,BAMS
4,UBERON:0000955,brain,oio:hasDbXref,BILA:0000135,,semapv:UnspecifiedMatching,UBERON,BILA
5,UBERON:0000955,brain,oio:hasDbXref,BIRNLEX:796,,semapv:UnspecifiedMatching,UBERON,BIRNLEX
6,UBERON:0000955,brain,oio:hasDbXref,BTO:0000142,,semapv:UnspecifiedMatching,UBERON,BTO
7,UBERON:0000955,brain,oio:hasDbXref,CALOHA:TS-0095,,semapv:UnspecifiedMatching,UBERON,CALOHA
8,UBERON:0000955,brain,oio:hasDbXref,DHBA:10155,,semapv:UnspecifiedMatching,UBERON,DHBA
9,UBERON:0000955,brain,oio:hasDbXref,EFO:0000302,,semapv:UnspecifiedMatching,UBERON,EFO


If we are only interested in a particular source we can use `--maps-to-source` (`-M`).

For example, to filter to the Allen Institute Developmental Human Brain Atlas (DHBA):

In [15]:
 uberon mappings UBERON:0000955 -M DHBA

subject_id: UBERON:0000955
predicate_id: oio:hasDbXref
object_id: DHBA:10155
mapping_justification: semapv:UnspecifiedMatching
subject_label: brain
subject_source: UBERON
object_source: DHBA


In theory, all mappings should be to CURIEs registered in bioregistry.io, but in practice different ontologies may have
a number of ad-hoc unmapped targets.
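One way to spot ad-hoc targets is to extract the prefix of each `object_id` and compare it against a list of prefixes you expect. The sketch below is only an illustration: `known_prefixes` is a hypothetical allow-list invented for the example and says nothing about what Bioregistry actually registers. A real check would consult Bioregistry itself.

```python
def curie_prefix(curie):
    """Return the prefix of a CURIE such as 'DHBA:10155', or None if it has no colon."""
    prefix, sep, _ = curie.partition(":")
    return prefix if sep else None


# Hypothetical allow-list for illustration only; it does NOT reflect
# the actual contents of Bioregistry.
known_prefixes = {"DHBA", "EFO", "FMA"}

# Object IDs taken from the mapping output shown earlier.
object_ids = ["DHBA:10155", "EFO:0000302", "ABA:Brain", "BAMS:Br"]

# Prefixes that appear in the mappings but are absent from the allow-list.
not_in_allow_list = sorted({curie_prefix(o) for o in object_ids} - known_prefixes)
print(not_in_allow_list)
```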

## Mapping via reciprocal term

We can also query Uberon for mappings to an external term:

In [17]:
 uberon mappings DHBA:10155

subject_id: UBERON:0000955
predicate_id: oio:hasDbXref
object_id: DHBA:10155
mapping_justification: semapv:UnspecifiedMatching
subject_label: brain
subject_source: UBERON
object_source: DHBA


## Complex queries

Like most OAK commands, the `mappings` command can take lists of labels, lists of IDs, or even complex query terms (which might themselves involve graph traversal).

For example, we can look up mappings for all brain regions:

In [18]:
uberon mappings .desc//p=i,p brain -M ZFA -O sssom -o output/all-brain-zfa-mappings.tsv


In [19]:
df = pd.read_csv("output/all-brain-zfa-mappings.tsv", sep="\t", comment="#")
df

Unnamed: 0,subject_id,subject_label,predicate_id,object_id,mapping_justification,subject_source,object_source
0,UBERON:0000007,pituitary gland,oio:hasDbXref,ZFA:0000118,semapv:UnspecifiedMatching,UBERON,ZFA
1,UBERON:0000203,pallium,oio:hasDbXref,ZFA:0000505,semapv:UnspecifiedMatching,UBERON,ZFA
2,UBERON:0000204,ventral part of telencephalon,oio:hasDbXref,ZFA:0000304,semapv:UnspecifiedMatching,UBERON,ZFA
3,UBERON:0000430,ventral intermediate nucleus of thalamus,oio:hasDbXref,ZFA:0000370,semapv:UnspecifiedMatching,UBERON,ZFA
4,UBERON:0000935,anterior commissure,oio:hasDbXref,ZFA:0001108,semapv:UnspecifiedMatching,UBERON,ZFA
...,...,...,...,...,...,...,...
280,UBERON:2005340,nucleus of the posterior recess,oio:hasDbXref,ZFA:0005340,semapv:UnspecifiedMatching,UBERON,ZFA
281,UBERON:2007001,dorso-rostral cluster,oio:hasDbXref,ZFA:0007001,semapv:UnspecifiedMatching,UBERON,ZFA
282,UBERON:2007002,ventro-rostral cluster,oio:hasDbXref,ZFA:0007002,semapv:UnspecifiedMatching,UBERON,ZFA
283,UBERON:2007003,ventro-caudal cluster,oio:hasDbXref,ZFA:0007003,semapv:UnspecifiedMatching,UBERON,ZFA


## Predicates

At the time of writing, most ontologies bundle their mappings as `oio:hasDbXref` in the ontology. Some ontologies are starting to release richer SSSOM files. Other ontologies include both xref mappings and mappings with richer SKOS predicates as part of the ontology release. This keeps backwards compatibility with tools that expect xrefs while allowing more modern tools to use the richer mappings.

We will use Mondo as an example here:

In [20]:
alias mondo runoak -i sqlite:obo:mondo

In [22]:
mondo mappings  MONDO:0000179 -M NCIT

ERROR:root:Skipping statements(subject=MONDO:0000179,predicate=skos:exactMatch,object=<https://omim.org/phenotypicSeries/PS256520>,value=None,datatype=None,language=None,); ValueError: <https://omim.org/phenotypicSeries/PS256520> is not a valid URI or CURIE
subject_id: MONDO:0000179
predicate_id: oio:hasDbXref
object_id: NCIT:C14089
mapping_justification: semapv:UnspecifiedMatching
subject_label: Neu-Laxova syndrome
subject_source: MONDO
object_source: NCIT

---
subject_id: MONDO:0000179
predicate_id: skos:exactMatch
object_id: NCIT:C14089
mapping_justification: semapv:UnspecifiedMatching
subject_label: Neu-Laxova syndrome
subject_source: MONDO
object_source: NCIT


Here we can see what appears to be a duplicate mapping, but this is on purpose: Mondo includes the xref for backwards compatibility and the `skos:exactMatch` for more modern tools.
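If a downstream tool needs only one mapping per subject/object pair, the paired records can be grouped and a single predicate chosen. The sketch below makes an assumption that is not an OAK or Mondo rule: it prefers any predicate other than `oio:hasDbXref` and falls back to the xref otherwise. The helper `preferred_predicate` is hypothetical.

```python
from collections import defaultdict

# The two Mondo records shown above, reduced to the fields needed here.
mappings = [
    {"subject_id": "MONDO:0000179", "predicate_id": "oio:hasDbXref", "object_id": "NCIT:C14089"},
    {"subject_id": "MONDO:0000179", "predicate_id": "skos:exactMatch", "object_id": "NCIT:C14089"},
]

# Collect every predicate asserted for each (subject, object) pair.
predicates_by_pair = defaultdict(set)
for m in mappings:
    predicates_by_pair[(m["subject_id"], m["object_id"])].add(m["predicate_id"])


def preferred_predicate(predicates):
    """Prefer a non-xref predicate; fall back to oio:hasDbXref (assumed policy)."""
    richer = sorted(p for p in predicates if p != "oio:hasDbXref")
    return richer[0] if richer else "oio:hasDbXref"


for (subject, obj), predicates in predicates_by_pair.items():
    print(subject, preferred_predicate(predicates), obj)
```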

## Generating Mappings

The `lexmatch` command can be used to generate mappings between ontologies. This is a complex topic and is covered in the [OAK Guide to Mappings](https://incatools.github.io/ontology-access-kit/guide/mappings.html).

See also the [OBO Academy tutorial on lexmatch](https://oboacademy.github.io/obook/tutorial/lexmatch-tutorial/).

## Validating Mappings

See the [ValidateMappings](ValidateMappings.ipynb) notebook for details on how to validate mappings using rule-based and LLM methods.