# Command line OAK usage for ontology editors (Alpha)

## Background (read this carefully)

This guide is for usage of OAK on the command line with the *-edit* version of ontologies.

In general, most users should **never** use the edit version. The exception is if you are part of the
development team for that ontology, in which case you may want to:

- do quick ad-hoc queries of the edit version
- apply automated changes with KGCL

**If this doesn't apply to you then this guide probably isn't for you!** See some of the other notebooks for examples of working with the release versions of ontologies.

These kinds of operations are typically done with ROBOT, possibly with ad-hoc SPARQL queries.
For certain kinds of operations, however, OAK may provide some advantages.

Another thing to bear in mind is that conventions for edit files differ, not least in which format they use.
OAK has different adaptors for different formats, *and these may vary in how complete they are*.

At this time, this guide is primarily for editors of:

- mondo
- go
- uberon
- other ontologies that use .obo for the edit version

If this doesn't cover you, don't worry: we will expand this guide later. For now, support for OWL functional
syntax in OAK is incomplete.

Currently this guide works best when a non-technical ontology editor pairs up with someone
moderately technical (who need not have domain knowledge). The technical person can help with running commands on
the command line and with automating certain kinds of tasks, with the editor guiding.

At least one person should have gone through [part 1 of the tutorial](https://incatools.github.io/ontology-access-kit/intro/tutorial01.html).

Note that this guide doesn't require any Python coding; everything is done via the command line.

## Local edit files

We assume that you have your project checked out and you are in `src/ontology` (if you don't know what that means, then this guide likely isn't for you - try some other notebooks in this repo!)

But for demo purposes we will download a copy:

In [2]:
!wget https://raw.githubusercontent.com/geneontology/go-ontology/master/src/ontology/go-edit.obo -O input/go-edit.obo

--2022-09-12 14:40:56--  https://raw.githubusercontent.com/geneontology/go-ontology/master/src/ontology/go-edit.obo
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 32434161 (31M) [text/plain]
Saving to: ‘input/go-edit.obo’


2022-09-12 14:40:59 (12.9 MB/s) - ‘input/go-edit.obo’ saved [32434161/32434161]



### Aliases

Next we will set up an alias to avoid repetitive typing.

Note that the `%alias` syntax only applies in the context of a Jupyter notebook, and the path used here will only work for this notebook.

On your own command line, type one of the following instead:

```bash
alias goedit="runoak -i simpleobo:$HOME/repos/go-ontology/src/ontology/go-edit.obo"
```

or

```bash
alias mondoedit="runoak -i simpleobo:$HOME/repos/mondo/src/ontology/mondo-edit.obo"
```

or

```bash
alias uberonedit="runoak -i simpleobo:$HOME/repos/uberon/src/ontology/uberon-edit.obo"
```

This assumes your checkouts live under a top-level `repos` directory in your home directory. Change the path to match your setup.

The `simpleobo` prefix tells OAK to use the simple obo parser. This is slower than pronto but more forgiving of
issues with imports.
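
If you want the same shortcut inside a shell script, a function may be more convenient than an alias, since bash does not expand aliases in non-interactive shells by default. The following is a minimal sketch: the `GO_EDIT` variable is a hypothetical name introduced here, and the path is the same assumed checkout location as in the aliases above.

```shell
# Path to the edit file; set GO_EDIT beforehand if your checkout lives elsewhere.
# (GO_EDIT is an illustrative variable name, not something OAK reads itself.)
GO_EDIT="${GO_EDIT:-$HOME/repos/go-ontology/src/ontology/go-edit.obo}"

# Function equivalent of the goedit alias: forwards all arguments to runoak.
goedit() {
  runoak -i "simpleobo:$GO_EDIT" "$@"
}
```

With this defined, `goedit info nucleus` behaves like the aliased form.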

Note: some of this tutorial will work with ontologies that use other source formats, but you should expect
support there to be only partial. For now, obo is best supported, but this is temporary.

In [26]:
%alias goedit runoak -i simpleobo:input/go-edit.obo

In [27]:
goedit info nucleus

GO:0005634 ! nucleus


In [28]:
goedit tree nucleus -p i

* [] GO:0043229 ! intracellular organelle
    * [i] GO:0043231 ! intracellular membrane-bounded organelle
        * [i] **GO:0005634 ! nucleus**


**Important:** note how the root of the tree appears to be "intracellular organelle".

This is because OAK assumes a *relaxed graph* (whether obo or owl). We can see the *structure* of the obo file:

```yaml
[Term]
id: GO:0043229
name: intracellular organelle
namespace: cellular_component
def: "Organized structure of distinctive morphology and function, occurring within the cell. Includes the nucleus, mitochondria, plastids, vacuoles, vesicles, ribosomes and the cytoskeleton. Excludes the plasma membrane." [GOC:go_curators]
subset: goslim_pir
intersection_of: GO:0043226 ! organelle
intersection_of: part_of GO:0005622 ! intracellular anatomical structure
```

We *intentionally* don't assert an is-a in the editors' file; we let the reasoner work it out.
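
To get a feel for how many terms in an edit file follow this pattern, a plain-text scan can help. The sketch below does not use OAK at all: it is a simple `awk` pass that assumes the usual one-tag-per-line OBO layout, and the function name `defined_without_isa` and the sample stanzas are made up for illustration.

```shell
# Tiny OBO sample to demonstrate on (hypothetical IDs, not real GO content).
sample=$(mktemp)
cat > "$sample" <<'EOF'
[Term]
id: X:1
name: defined class
intersection_of: X:2 ! genus
intersection_of: part_of X:3 ! whole

[Term]
id: X:4
name: asserted class
is_a: X:2 ! genus
EOF

# Print the id of every stanza that has intersection_of lines but no is_a line.
defined_without_isa() {
  awk '
    function emit() { if (id != "" && has_def && !has_isa) print id }
    /^\[/               { emit(); id = ""; has_def = 0; has_isa = 0; next }
    /^id: /             { id = $2 }
    /^intersection_of:/ { has_def = 1 }
    /^is_a:/            { has_isa = 1 }
    END                 { emit() }
  ' "$1"
}

defined_without_isa "$sample"
```

On a real checkout you would call `defined_without_isa go-edit.obo`. Every id it prints is a term whose parent will only appear after reasoning.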

In future we may add a simple *structural reasoner* to OAK, equivalent to the structural reasoner in Protege,
that would [relax](https://robot.obolibrary.org/relax) the equivalence axiom to a SubClassOf.

However, looking at these relaxed structures can be misleading. For Protege editors we **always emphasize
looking at the reasoned view** with the reasoner synchronized.

For now, it may be less misleading to show the actual structure of the asserted edit file. If you want to look at the reasoner view, run robot reason on the edit file and use that as the input into OAK (with conversion to sqlite for speed).
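
The reasoning workflow just described could be scripted roughly as follows. This is a sketch under several assumptions: `robot` and `semsql` are installed and on your PATH, your imports resolve from the current directory, and the function name `make_reasoned_db` and the output stem are hypothetical. It relies on `semsql make NAME.db` building the database from a `NAME.owl` file in the same directory.

```shell
# Build a reasoned SQLite copy of an edit file.
#   $1: the edit file, e.g. go-edit.obo
#   $2: output stem (illustrative), e.g. go-reasoned
make_reasoned_db() {
  edit_file=$1
  stem=$2
  # Classify the ontology and write the result, including inferred is-a links, to OWL.
  robot reason --input "$edit_file" --output "$stem.owl" &&
    # Convert the reasoned OWL file into $stem.db.
    semsql make "$stem.db"
}
```

You could then run, for example, `make_reasoned_db go-edit.obo go-reasoned` followed by `runoak -i sqlite:go-reasoned.db tree nucleus -p i` to see the tree over the reasoned hierarchy.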

[In future this reasoning step may be performed more dynamically in OAK](https://github.com/INCATools/semantic-sql/issues/41) but for now, keep this limitation in mind.

## Querying logical definitions



In [29]:
goedit logical-definitions GO:0043229

definedClassId: GO:0043229
genusIds:
- GO:0043226
restrictions:
- fillerId: part_of
  propertyId: GO:0005622

---


In [30]:
goedit logical-definitions GO:0043229 -O csv

meta	definedClassId	genusIds	restrictions
None	GO:0043229	['GO:0043226']	[ExistentialRestrictionExpression(fillerId='part_of', propertyId='GO:0005622')]


## Applying changes

Next we will try applying some changes, using the [apply](https://incatools.github.io/ontology-access-kit/cli.html#runoak-apply) command.

These changes will be specified using the KGCL language. You can also pass these changes in as
CSV or YAML files conforming to the KGCL data model, but for now we will restrict ourselves to the KGCL DSL.

First let's try the rename command:

In [52]:
goedit apply "rename GO:0005634 from 'nucleus' to 'nuclear compartment'" -o input/go-edit-modified.obo

Note that we are specifying a path [input/go-edit-modified.obo](input/go-edit-modified.obo) (this link will only work if you are running the notebook locally).

As a sanity check, let's use the unix diff command to see if our apply command had any effect. We will use the `-u` option to see the diff as it would look in a GitHub PR.

**If you don't know what that means, or you aren't accustomed to looking at obo format diffs, this guide isn't for you.**

In [53]:
!diff -u input/go-edit.obo input/go-edit-modified.obo 

--- input/go-edit.obo	2022-09-12 14:40:59.000000000 -0700
+++ input/go-edit-modified.obo	2022-09-12 17:18:31.000000000 -0700
@@ -53301,7 +53301,7 @@
 
 [Term]
 id: GO:0005634
-name: nucleus
+name: nuclear compartment
 namespace: cellular_component
 def: "A membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. In most cells, the nucleus contains all of the cell's chromosomes except the organellar chromosomes, and is the site of RNA synthesis and processing. In some species, or in specialized cell types, RNA metabolism or DNA replication may be absent." [GOC:go_curators]
 subset: goslim_agr


Looks like our change took effect, great!
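
When a diff touches many stanzas, it can help to strip the context and show only the lines that were added or removed. The helper below is a small sketch using standard `diff` and `grep`; the name `changed_lines` and the two sample files are made up for illustration.

```shell
# Show only added/removed lines of a unified diff, dropping the ---/+++ file headers.
# Caveat: a changed line whose own text starts with "-- " or "++ " would also be dropped.
changed_lines() {
  diff -u "$1" "$2" | grep -E '^[+-]' | grep -vE '^(\+\+\+|---) '
}

# Two tiny sample files standing in for the original and modified edit files.
old=$(mktemp); new=$(mktemp)
printf 'id: X:1\nname: nucleus\n' > "$old"
printf 'id: X:1\nname: nuclear compartment\n' > "$new"

changed_lines "$old" "$new"
```

For the files in this notebook the call would be `changed_lines input/go-edit.obo input/go-edit-modified.obo`. Keep the full `diff -u` output at hand as well, since the context lines tell you which term a change belongs to.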

Next let's set up another alias. We are going to be working with this `go-edit-modified.obo` file, to avoid messing
up the original obo file -- this is just a convenience for working with the notebook and may not be something
you do in your own workflow:

In [54]:
%alias goedit2 runoak -i simpleobo:input/go-edit-modified.obo

Let's now try more changes. Let's put nucleus back, but as a synonym:

In [55]:
goedit2 apply "create synonym 'nucleus' for GO:0043229"



Uh-oh, what does that mean?

It means that we didn't specify an output file, or that changes should be saved in place, so our change had no
material effect. This probably isn't what we want.

But OAK follows best practice for command line tools and avoids modifying input files **unless you specifically ask it to**. We can do that with the `--overwrite` global option, which saves changes in place.

In [56]:
goedit2 --overwrite apply "create exact synonym 'nucleus' for GO:0043229"

Note that this is a *global* option, so it comes before the `apply` subcommand.

Our changes will be saved in place. Remember that `goedit2` is aliased to use go-edit-*modified*.obo.

Let's look at cumulative changes from the original:

In [57]:
!diff -u input/go-edit.obo input/go-edit-modified.obo 

--- input/go-edit.obo	2022-09-12 14:40:59.000000000 -0700
+++ input/go-edit-modified.obo	2022-09-12 17:18:45.000000000 -0700
@@ -53301,7 +53301,7 @@
 
 [Term]
 id: GO:0005634
-name: nucleus
+name: nuclear compartment
 namespace: cellular_component
 def: "A membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. In most cells, the nucleus contains all of the cell's chromosomes except the organellar chromosomes, and is the site of RNA synthesis and processing. In some species, or in specialized cell types, RNA metabolism or DNA replication may be absent." [GOC:go_curators]
 subset: goslim_agr
@@ -241775,6 +241775,7 @@
 subset: goslim_pir
 intersection_of: GO:0043226 ! organelle
 intersection_of: part_of GO:0005622 ! intracellular anatomical structure
+synonym: "nucleus" EXACT []
 
 [Term]
 id: GO:0043230


In [58]:
goedit2 --autosave apply "change relationship between vacuole and cytoplasm from part_of to is_a"

In [59]:
!diff -u input/go-edit.obo input/go-edit-modified.obo 

--- input/go-edit.obo	2022-09-12 14:40:59.000000000 -0700
+++ input/go-edit-modified.obo	2022-09-12 17:18:52.000000000 -0700
@@ -53301,7 +53301,7 @@
 
 [Term]
 id: GO:0005634
-name: nucleus
+name: nuclear compartment
 namespace: cellular_component
 def: "A membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. In most cells, the nucleus contains all of the cell's chromosomes except the organellar chromosomes, and is the site of RNA synthesis and processing. In some species, or in specialized cell types, RNA metabolism or DNA replication may be absent." [GOC:go_curators]
 subset: goslim_agr
@@ -54683,7 +54683,7 @@
 synonym: "vacuolar carboxypeptidase Y" RELATED []
 xref: Wikipedia:Vacuole
 is_a: GO:0043231 ! intracellular membrane-bounded organelle
-relationship: part_of GO:0005737 ! cytoplasm
+is_a: GO:0005737
 
 [Term]
 id: GO:0005774
@@ -241775,6 +241775,7 @@
 subset: goslim_pir
 intersection_of: GO:0043226 ! organelle
 i

## Performing KGCL Diffs

So far we have performed diffing using the unix diff command. It's never a bad idea to look at these diffs,
as they reflect differences at the level of how the file is stored, and you will need to interpret these diffs
to properly evaluate PRs.

However, it is better to look at diffs at the right level of abstraction, such as KGCL.

In OAK, the same language/data model is used for diffs as it is for applying changes!


In [66]:
goedit diff -X simpleobo:input/go-edit-modified.obo -O kgcl

ERROR:root:Cannot render: RemoveSynonym(id='x', type=None, was_generated_by=None, see_also=None, pull_request=None, creator=None, change_date=None, contributor=None, has_undo=None, old_value='nucleus', new_value=None, old_value_type=None, new_value_type=None, new_language=None, old_language=None, new_datatype=None, old_datatype=None, about_node='GO:0005634', about_node_representation=None, language=None)



__TEMPORARY ISSUE__: for now, rendering the change in KGCL syntax doesn't always work, so we use the data model instead:

In [70]:
goedit diff -X simpleobo:input/go-edit-modified.obo 

[
{
  "id": "x",
  "old_value": "nucleus",
  "about_node": "GO:0005634",
  "@type": "RemoveSynonym"
}
]


In [71]:
# TODO - check why move is not there

In [75]:
goedit diff -X simpleobo:input/go-edit-modified.obo -o input/change.json

In [76]:
!cat input/change.json

[
{
  "id": "x",
  "old_value": "nucleus",
  "new_value": "nuclear compartment",
  "about_node": "GO:0005634",
  "@type": "NodeRename"
}
,
{
  "id": "x",
  "old_value": "rdfs:subClassOf",
  "new_value": "rdfs:subClassOf",
  "about_edge": {
    "subject": "GO:0005634",
    "predicate": "rdfs:subClassOf",
    "object": "GO:0043231"
  },
  "@type": "PredicateChange"
}
,
{
  "id": "x",
  "old_value": "BFO:0000050",
  "new_value": "rdfs:subClassOf",
  "about_edge": {
    "subject": "GO:0005773",
    "predicate": "BFO:0000050",
    "object": "GO:0005737"
  },
  "@type": "PredicateChange"
}
,
{
  "id": "x",
  "old_value": "rdfs:subClassOf",
  "new_value": "rdfs:subClassOf",
  "about_edge": {
    "subject": "GO:0005773",
    "predicate": "rdfs:subClassOf",
    "object": "GO:0043231"
  },
  "@type": "PredicateChange"
}
,
{
  "id": "x",
  "new_value": "nucleus",
  "about_node": "GO:0043229",
  "@type": "NewSynonym"
}
]
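
Before applying a change file, a quick tally of change types is a cheap sanity check. The sketch below uses only `grep`, `sed`, `sort`, `uniq` and `awk`, and assumes the pretty-printed layout shown above, with one `"@type"` entry per change. The function name `summarize_changes` and the sample file are hypothetical.

```shell
# Count how many changes of each type a JSON change file contains.
summarize_changes() {
  grep -o '"@type": "[A-Za-z]*"' "$1" |
    sed 's/.*: "\(.*\)"/\1/' |
    sort | uniq -c |
    awk '{print $2 ": " $1}'
}

# Small sample in the same layout as the change file above (made-up IDs).
changes=$(mktemp)
cat > "$changes" <<'EOF'
[
{
  "id": "x",
  "about_node": "X:1",
  "@type": "NodeRename"
}
,
{
  "id": "x",
  "about_node": "X:2",
  "@type": "NewSynonym"
}
,
{
  "id": "x",
  "about_node": "X:3",
  "@type": "NewSynonym"
}
]
EOF

summarize_changes "$changes"
```

Running `summarize_changes input/change.json` on the file above gives a one-line count per change type, which makes unexpected entries easier to spot.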


In [None]:
goedit apply --change-file input/change.json -o input/go-edit-sanity-check.obo