# command: generate-synonyms

This notebook is intended as a supplement to the [main OAK CLI docs](https://incatools.github.io/ontology-access-kit/cli.html).

This notebook provides examples for the `generate-synonyms` command, which is used to create new synonyms for a set
of terms in an ontology, according to a set of rules.

The command has two main modes of operation:

1. generating new synonyms and applying them, creating a new ontology
2. generating new synonyms and outputting them as a set of [KGCL](https://w3id.org/kgcl) change directives

The advantage of the second mode is that it gives finer-grained control over the changes and allows manual inspection before they are applied. The KGCL output can then be applied with the [apply](Apply.html) command or, if the ontology supports it, submitted via an issue for ontobot to apply.

## Help Option

You can get help on any OAK command using `--help`:

In [1]:
!runoak generate-synonyms --help

Usage: runoak generate-synonyms [OPTIONS] [TERMS]...

  Generate synonyms based on a set of synonymizer rules.

  If the `--apply-patch` flag is set, the output will be an ontology file with
  the changes applied. Pass the `--patch` argument to lso get the patch file
  in KGCL format.

  Example:

      runoak -i foo.obo generate-synonyms -R foo_rules.yaml --patch patch.kgcl
      --apply-patch -o foo_syn.obo

  If the `apply-patch` flag is NOT set then the main input will be KGCL
  commands

  Example:

      runoak -i foo.obo generate-synonyms -R foo_rules.yaml -o changes.kgcl

  see https://github.com/INCATools/kgcl.

Options:
  -R, --rules-file TEXT           path to rules file. Conforms to
                                  rules_datamodel.        e.g.
                                  https://github.com/INCATools/ontology-
                                  access-
                                  kit/blob/main/tests/input/matcher_rules.yaml
           

## Synonymizer rules

The rules YAML has a list of rules following the [Synonymizer](https://incatools.github.io/ontology-access-kit/datamodels/mapping-rules/Synonymizer.html) class.

An example rule file for GO, called `go-synonym-rules.yaml`, is shown below:

```yaml
rules:
  - description: activity
    match: "(.*) activity"
    match_scope: "*"
    replacement: "\\1"
    qualifier: exact
```

This will match any term that ends in "activity" and remove that suffix, creating a new exact synonym.
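To make the rule's effect concrete, here is a minimal Python sketch that applies the same pattern and replacement with the standard `re` module. It only mirrors the regular expression semantics and is not OAK's implementation. The function name `propose_synonym` is hypothetical, and the choice to require the pattern to cover the whole label is an assumption.

```python
import re
from typing import Optional

# Rule fields copied from the YAML above. After YAML parsing, "\\1" is the string \1.
MATCH = r"(.*) activity"
REPLACEMENT = r"\1"


def propose_synonym(label: str) -> Optional[str]:
    """Return the rewritten label, or None if the rule does not apply."""
    # Assumption: the pattern must cover the whole label (fullmatch).
    m = re.fullmatch(MATCH, label)
    if m is None:
        return None
    # Substitute the captured group into the replacement template.
    return m.expand(REPLACEMENT)


print(propose_synonym("high-affinity zinc transmembrane transporter activity"))
print(propose_synonym("mitochondrion inheritance"))
```

The first call prints the label with the " activity" suffix removed; the second prints `None` because the label does not end in "activity".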

Note that while a rule can restrict its scope so that it applies only to some terms, it is often easier to do this when running the command, using [OAK queries](https://incatools.github.io/ontology-access-kit/howtos/use-oak-expression-language.html) on the command line.

## Test ontology

Typically you would run this command on an ontology's edit file, although this isn't necessary if you wish to apply the changes. For test purposes, we'll use the GO edit file:


In [7]:
!curl -L -s https://github.com/geneontology/go-ontology/raw/master/src/ontology/go-edit.obo > input/go-edit.obo

In [2]:
!mkdir -p output

Note that the GO edit file is in *obo* format. A number of ontologies, such as GO, Uberon, and Mondo, use obo as their edit format because it was designed to produce human-readable diffs.
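As a rough illustration of why a line-oriented format diffs well, the sketch below uses Python's `difflib` to compare a stanza before and after a synonym is added. The stanza, its `EX:0000001` identifier, and the file names are hypothetical and not taken from GO.

```python
import difflib

# Hypothetical stanza for illustration only; not copied from GO.
before = [
    "[Term]",
    "id: EX:0000001",
    "name: example transporter activity",
]

# Adding a synonym is a single new line in the stanza.
after = before + ['synonym: "example transporter" EXACT []']

diff = list(
    difflib.unified_diff(before, after, "before.obo", "after.obo", lineterm="")
)
# Keep only the added content lines, skipping the "+++" file header.
added = [line for line in diff if line.startswith("+") and not line.startswith("+++")]

print("\n".join(diff))
```

The only added line in the diff is the new `synonym:` tag, which is what makes a review of generated synonyms straightforward.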

## Generate changes from a rule file

We will run this over all non-obsolete terms in the ontology. The OAK query expression `.non_obsolete` is used to generate a list of all terms that are not obsolete.

In [13]:
!runoak -i simpleobo:input/go-edit.obo generate-synonyms -R  input/go-synonym-rules.yaml -o output/changes.kgcl .non_obsolete

The changes were placed in a file called `output/changes.kgcl`. Let's take a look at the first few lines:

In [14]:
!head -25 output/changes.kgcl

create exact synonym 'high-affinity zinc transmembrane transporter' for GO:0000006
create exact synonym 'high affinity zinc uptake transmembrane transporter' for GO:0000006
create exact synonym 'high-affinity zinc uptake transmembrane transporter' for GO:0000006
create exact synonym 'low-affinity zinc ion transmembrane transporter' for GO:0000007
create exact synonym 'alpha-1,6-mannosyltransferase' for GO:0000009
create exact synonym '1,6-alpha-mannosyltransferase' for GO:0000009
create exact synonym 'trans-hexaprenyltranstransferase' for GO:0000010
create exact synonym 'all-trans-heptaprenyl-diphosphate synthase' for GO:0000010
create exact synonym 'HepPP synthase' for GO:0000010
create exact synonym 'heptaprenyl diphosphate synthase' for GO:0000010
create exact synonym 'heptaprenyl pyrophosphate synthase' for GO:0000010
create exact synonym 'heptaprenyl pyrophosphate synthetase' for GO:0000010
create exact synonym 'single-stranded DNA endodeoxyribonuclease' for GO:0000014

We can then apply these as a separate step, with the option of editing the file first.
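Because the change file has one directive per line, it can be reviewed with a short script before it is applied. The sketch below groups proposed synonyms by term. The regular expression covers only the line shape shown in the output above and is an assumption, not a KGCL grammar. The function name `group_by_term` is hypothetical.

```python
import re
from collections import defaultdict

# Matches only lines of the form shown above; other change types are skipped.
LINE_RE = re.compile(r"^create (\w+) synonym '(.*)' for (\S+)$")


def group_by_term(lines):
    """Map each term ID to a list of (qualifier, synonym) pairs."""
    grouped = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            qualifier, synonym, term_id = m.groups()
            grouped[term_id].append((qualifier, synonym))
    return dict(grouped)


sample = [
    "create exact synonym 'low-affinity zinc ion transmembrane transporter' for GO:0000007",
    "create exact synonym 'alpha-1,6-mannosyltransferase' for GO:0000009",
    "create exact synonym '1,6-alpha-mannosyltransferase' for GO:0000009",
]
for term_id, synonyms in group_by_term(sample).items():
    print(term_id, len(synonyms))
```

To run this over the generated file, pass an open file handle such as `open("output/changes.kgcl")` in place of `sample`.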

In [16]:
!runoak -i simpleobo:input/go-edit.obo apply --changes-input output/changes.kgcl -o output/go-edit-modified.obo

lark.exceptions.UnexpectedCharacters: No terminal matches '-' in the current parser context, at line 1 col 26

create exact synonym '3\'-5\'-RNA exonuclease' for GO:0000175
                         ^
Expected one of: 
	* _WS
	* AT



In [17]:
!robot kgcl:apply --input input/go-edit.obo --kgcl output/changes.kgcl --output output/go-edit-modified.obo

org.semanticweb.owlapi.model.UnloadableImportException: Could not load imported ontology: <http://purl.obolibrary.org/obo/go/imports/go-pattern-conformance.ttl> Cause: http://current.geneontology.org/ontology/imports/go-pattern-conformance.ttl
Use the -vvv option to show the stack trace.
Use the --help option to see usage information.


## Generating synonyms and applying them in one step

We don't have to create the intermediate KGCL file. We can generate and apply in one step:

In [11]:
!runoak -i simpleobo:input/go-edit.obo generate-synonyms -R  input/go-synonym-rules.yaml --apply-patch -o output/go-edit-modified.obo --patch output/changes.kgcl .non_obsolete