# Exercise 6: Visibility

This notebook introduces a very powerful (and dangerous) feature of the UIMA Ruta language: its coverage-based visibility concept.

### Visibility: A motivating example

So far, we have completely ignored whitespace. By default, whitespace (`WS`) and line breaks are invisible, so we did not have to care about them in our rules. Suppose now that we want our rules to be more specific. Consider the following text, in which an amount of money appears once with and once without whitespace before the currency symbol.

In [None]:
%%documentText 
10 € are less than 100€.

The action `ADDRETAINTYPE` can be used to activate the visibility of invisible types. In this example, we use it to activate the visibility of whitespaces. The following rule now only matches if there is **no** whitespace between the number and the currency symbol.

In [None]:
ADDRETAINTYPE(WS);
DECLARE MoneyWithoutWS;
(NUM "€") {->MoneyWithoutWS};
COLOR(MoneyWithoutWS,"lightgreen");

Similarly, we can write a rule that only matches if there is a whitespace between the number and the currency symbol.

In [None]:
ADDRETAINTYPE(WS);
DECLARE MoneyWithWS;
(NUM WS "€") {->MoneyWithWS};
COLOR(MoneyWithWS,"lightblue");
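
For comparison, the following sketch runs the first rule without `ADDRETAINTYPE(WS)`. Since whitespace stays invisible in this cell, we would expect the rule to match both amounts, with and without the space. The type name `Money` and the color are chosen only for this illustration and are not part of the original exercise.

In [None]:
// Default visibility: whitespace is invisible, so it is skipped during matching
DECLARE Money;
(NUM "€") {->Money};
COLOR(Money,"pink");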

The ability to modify visibility is very powerful, but it can also lead to unwanted behavior, as we will see in the following example.

### Visibility: How invisible Types can affect annotations

It is important to know that **annotations and positions are invisible if their start or end is covered by something invisible.** We will illustrate the consequences with the following example.

#### Setup

In [None]:
%resetCas

In [None]:
%%documentText
 The dog barked at the cat.
Dogs, cats and mice are mammals.
 Zander and tuna are fishes.
Peter works for Frank. 
 10€ are less than 100€.

##### Utilizing the external PlainTextAnnotator for annotating lines

You can ignore the syntax of the following block for now. Its purpose is to create a `Line` annotation for each line in the given document.

In [None]:
%displayMode CSV
%csvConfig Line

TYPESYSTEM org.apache.uima.ruta.engine.PlainTextTypeSystem;
UIMAFIT org.apache.uima.ruta.engine.PlainTextAnnotator;
EXEC(PlainTextAnnotator,{Line});

#### An unexpected behavior?

The `Line` annotations look like sentences. However, you might have noticed that some lines (1, 3, 5) start with a whitespace and one line (4) ends with one. Let us try to create a `Sentence` annotation for each `Line` annotation.

In [None]:
DECLARE Sentence;
Line{-> Sentence};
COLOR(Sentence, "lightgreen");

A `Sentence` annotation was created only for line 2, because all other lines start or end with a whitespace. As we mentioned before, *annotations and positions are invisible if their start or end is covered by something invisible*. By default, whitespace and line breaks are invisible.

Hence, we need to change the visibility to correctly add `Sentence` annotations in this use case.

#### Changing the visibility

We can change the visibility settings so that all lines are annotated with Sentence annotations. We make all whitespaces (`WS`) visible by using `ADDRETAINTYPE(WS)`. 

In this notebook, visibility settings are not carried over from one cell to the next. Nevertheless, it is advisable to restore the original visibility (using `REMOVERETAINTYPE(WS)`) once the change is no longer needed, because modified visibility can badly affect other rules (see the example at the beginning).

In [None]:
// Clean-up: Undo the previous sentences
s:Sentence{-> UNMARK(s)};

ADDRETAINTYPE(WS);
Line{-> Sentence};
REMOVERETAINTYPE(WS);

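Next, we additionally make capitalized words (`CW`) invisible and annotate all lines that can still be matched. We see the same effect as before: a line that starts or ends with something invisible, here a capitalized word, does not receive a `Sentence` annotation.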

In [None]:
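ADDRETAINTYPE(WS);
// Clean-up: Undo the previous sentences. Whitespace must already be visible
// here, because most of the previous sentences start or end with a whitespace.
s:Sentence{-> UNMARK(s)};

ADDFILTERTYPE(CW);
Line{-> Sentence};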

### Visibility: Removing visibility of (irrelevant) annotations

In the following example, we use the visibility concept to annotate a specific span using a rule that would normally not match. This illustrates how visibility can be used to hide annotations that we want to ignore in a specific scenario. These annotations may or may not be present; our rules should match in either case.

Imagine that we want to annotate telephone numbers. Unfortunately, numbers may contain special characters that we want to ignore. 

In [None]:
%resetCas

In [None]:
%%documentText
My number is 0123.4324.54336.
Call me: 2342-2341- 20513.

Using `ADDFILTERTYPE`, we remove the visibility of the `PERIOD` symbol and `SPECIAL` characters. We then match on the longest sequence of numbers. This leaves us with the two telephone numbers of interest.

// TODO: Discuss why the "-PARTOF(PhoneNumber)" is needed and what would happen otherwise (this is great to discuss the "left to right" concept again)

In [None]:
%csvConfig PhoneNumber
DECLARE PhoneNumber;
ADDFILTERTYPE(PERIOD, SPECIAL);
NUM+{-PARTOF(PhoneNumber) ->PhoneNumber};
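
To see what the condition `-PARTOF(PhoneNumber)` contributes, it can help to run the same rule without it. The cell below is a sketch for comparison only and is not part of the original exercise. Our expectation (an assumption about how the rule is matched, not an output captured from this notebook) is that matching `NUM+` is attempted at every visible `NUM`. In addition to the two complete numbers, we would then also get shorter annotations that begin at a later number block, for example `4324.54336`. With the condition in place, those later start positions are rejected because they are already part of a `PhoneNumber` created by the match further to the left.

In [None]:
// Remove previous results
pn:PhoneNumber{->UNMARK(pn)};

// Same visibility as above, but without the -PARTOF(PhoneNumber) condition
ADDFILTERTYPE(PERIOD, SPECIAL);
NUM+{->PhoneNumber};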

Instead of hiding a complete type (like `SPECIAL` above), another useful approach is to declare a helper type `Invisible`, make it invisible, and annotate the irrelevant tokens (for example `"-"`) with it.

In [None]:
// Remove previous results
pn:PhoneNumber{->UNMARK(pn)};

DECLARE Invisible;
ADDFILTERTYPE(Invisible);
"." {-> Invisible};
"-" {-> Invisible};
NUM+{-PARTOF(PhoneNumber) ->PhoneNumber};