# Exercise 1: Basic Annotations

This exercise introduces the basic Ruta types and shows how simple annotations are created.

#### Defining the document text

First, we define some input text for the following examples. In UIMA, this document text is also called the **S**ubject **o**f **A**nalysis (**Sofa**).

In [None]:
%%documentText
The dog barked at the cat.
Dogs, cats and mice are mammals.
Zander and tuna are fishes.

### Types

A central component in UIMA is the `TypeSystem`. A TypeSystem contains a list of `Types`. Each Type has a distinct name and optionally a list of features. The TypeSystem thereby determines which kinds of annotations can be created and what information each of them can store.

#### Ruta Basic Types

Ruta provides some initial annotations that are automatically generated for each document. Important Ruta Basic Types are:
* `ANY`: Any single Token, e.g. “hello” or “123”
* `W`: Any word, e.g. “hello”
* `NUM`: Any number, e.g. “123”
* `SPECIAL`: Any special character, e.g. “-”
* `COMMA` (,) `COLON` (:) `SEMICOLON` (;) `PERIOD` (.) `EXCLAMATION` (!) `QUESTION` (?)
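
As a small sketch of how these basic types can be used, the following rules annotate every comma and period in the document. The type name `Separator` is hypothetical and serves only as an illustration; the rule syntax itself is explained in the section on creating annotations.

```ruta
// Hypothetical type, introduced only for this illustration
DECLARE Separator;

// Basic types can be used directly as matching conditions:
// annotate each comma and each period as a Separator
COMMA{-> Separator};
PERIOD{-> Separator};
```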

#### Declaring a new Type

We can also declare new types using the `DECLARE` command. In the following, we define a new type `Animal`. With that, we can create annotations that mark mentions of animals in the text.

In [None]:
DECLARE Animal;

// Highlight Animal annotation in the following output
COLOR(Animal, "lightgreen");
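
`DECLARE` also covers the optional features mentioned above. The following is a minimal sketch; the type names `Mammal`, `Fish` and `Pet` and the feature `kind` are assumptions made for illustration and are not used in the rest of this exercise.

```ruta
// Several types can be declared in one statement (hypothetical names)
DECLARE Mammal, Fish;

// A type with an explicit parent type (Annotation) and one string feature
DECLARE Annotation Pet (STRING kind);
```

Features are only needed when an annotation should carry additional information beyond its type and position; for the examples here, the plain `Animal` type is sufficient.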

### Creating annotations
In the following, we present different options that can be used to create new annotations of type Animal. 

#### Option 1: Direct string matching
The following line creates a new annotation of type `Animal` on all occurrences of "dog" in the document. Please note that this literal string matching can be inefficient when many such rules are applied to large documents.

In [None]:
"dog" {-> Animal};

#### Option 2: General approach using a condition-action block

While the simple string matching in option 1 may be useful for quickly annotating simple keywords, Ruta provides more powerful logic for complex annotations. The following line illustrates the most basic form of a condition-action block.

In [None]:
W{REGEXP("Dogs|cats") -> Animal};

**Explanation**: The rule starts with the Ruta basic type `W` that iterates over all words in the document. For each word, it is checked whether the condition `REGEXP("Dogs|cats")` is satisfied. This condition is a regular expression that matches if the word is "Dogs" or "cats" (case sensitive). If the condition is satisfied, then the action is executed. In that case, the action is to create a new annotation of type `Animal`. You will see more complex conditions and actions in Exercise 4.

*Hint*: Please note that "dog" is still highlighted as the annotations are kept across cells.
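
The rule above uses a shorthand: writing only the type name after `->` is an implicit `MARK` action. The sketch below spells the action out and adds a case-insensitive variant. It assumes that the optional second argument of `REGEXP` switches on case-insensitive matching, as described in the Ruta reference; the pattern `dogs?|cats?` is chosen here only for illustration.

```ruta
// Same rule as above with the action written explicitly
W{REGEXP("Dogs|cats") -> MARK(Animal)};

// Assumed variant: the second argument (true) ignores case, so that
// singular and plural forms are covered regardless of capitalization
W{REGEXP("dogs?|cats?", true) -> MARK(Animal)};
```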

An example of a slightly different condition-action block is given below. It matches on any word (`W`) and references it with the label `w`. The condition then checks whether the covered text (`ct`) of that word is "mice", and if so, the action creates a new `Animal` annotation.

In [None]:
w:W{w.ct == "mice" -> Animal};

#### Option 3: Using a wordlist

When many terms need to be annotated, it is useful to collect them in a wordlist. The following snippet shows how we can annotate mentions of fishes by using the wordlist `fishes.txt`, a simple external dictionary file.

In [None]:
WORDLIST fishList = "resources/fishes.txt";
// Perform lookup for fishes and annotate them with the type Animal
// The third parameter specifies whether the lookup should be case insensitive.
MARKFAST(Animal, fishList, true);
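
For a handful of terms, an inline list can serve as an alternative to an external file. The following is a minimal sketch using a `STRINGLIST` together with the `INLIST` condition; the variable name `fishNames` and its entries are assumptions chosen to match the example text, not the contents of `fishes.txt`.

```ruta
// Hypothetical inline list; entries are spelled as they appear in the document text
STRINGLIST fishNames = {"Zander", "tuna"};

// Annotate every word whose covered text is contained in the list
W{INLIST(fishNames) -> Animal};
```

An external wordlist with `MARKFAST` remains the better fit for large dictionaries, since the terms are kept outside the script and the lookup is designed for many entries.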