# Exercise 7: Special language elements

This notebook introduces some special language elements. Some of them may already have appeared in previous notebooks, but each gets a separate example here.

### Regex rules

Simple regex rules are a special kind of rule in UIMA Ruta. They do not build on a sequence of rule elements with matching conditions but on a single regular expression. This regular expression is applied to the document, or to the current window if matching is restricted. This enables two interesting use cases: quickly adding annotations for prototyping, and creating smaller annotations (for example, within a token) that cannot be matched with normal rules.
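
The difference between annotation-based matching and regex rules can be sketched outside the Ruta kernel in plain Python. This is only an approximation under stated assumptions: the `\w+|[^\w\s]` tokenizer is a hypothetical stand-in for the token annotations that Ruta provides, and Python's `re` module is assumed to behave like the regex engine behind Ruta for these simple patterns. The sentence and the patterns are the ones used in the cell that follows.

```python
import re

text = "This is the first line."

# Hypothetical stand-in for Ruta's tokens: words and single punctuation marks.
tokens = [(m.group(), m.span()) for m in re.finditer(r"\w+|[^\w\s]", text)]

# A normal rule element needs an annotation that covers exactly "in".
token_hits = [span for tok, span in tokens if tok == "in"]

# A regex rule searches the raw text and ignores token boundaries.
regex_hits = [m.span() for m in re.finditer("in", text)]

# Whole match and first capturing group of "(is).*first".
m = re.search(r"(is).*first", text)
span_text, start_span = m.group(0), m.span(1)

print(token_hits)   # []
print(regex_hits)   # [(19, 21)]
print(span_text)    # is is the first
print(start_span)   # (2, 4)
```

In this sketch no token equals "in", while the regular expression finds it inside "line". The leftmost match of `(is).*first` starts inside "This", which is the kind of sub-token annotation that normal rules cannot create.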

In [None]:
%documentText "This is the first line."
%displayMode DYNAMIC_HTML
%dynamicHtmlAllowedTypes In1 In2 In3 Span Start
DECLARE In1, In2, In3;

// this won't match because any matching condition relies on existing annotations and there is no annotation on 
// "in", only on "line"
"in"{-> In1};

// this will match within a token since regex rules do not care about existing annotations
"in"-> In2;

// this will match because an In2 annotation now covers exactly this text
"in"{-> In3};

DECLARE Span, Start;

// complex regular expression: create Span for the complete match and Start for the first group
"(is).*first" -> Span, 1 = Start;

### Optional matching condition `_`

The optional matching condition `_` makes it possible to specify rule elements that are completely optional. They are fulfilled even if they cannot match at all.

In [None]:
%resetCas
%documentText "This is the first line."
%displayMode RUTA_COLORING
DECLARE NotBeforeNum1, NotBeforeNum2;

// this rule requires some position for the second rule element
ANY{->NotBeforeNum1} ANY{-PARTOF(NUM)};
// this rule is also able to match on the last period
ANY{->NotBeforeNum2} _{-PARTOF(NUM)};

COLOR(NotBeforeNum1, "pink");
COLOR(NotBeforeNum2, "yellow");

### The anchor `@`

In [None]:
%resetCas

In [None]:
%%documentText
This is the first line.
This is the second line.
This is the third line.
This is the fourth line.
This is the fifth line.

The anchor `@` specifies the starting point of the matching process of a rule. Normally, a rule matches from left to right, starting with the first rule element and continuing with the second, and so on. The anchor can be used to write more efficient rules by selecting a more suitable rule element as the starting point.

In [None]:
DECLARE WPeriod1, WPeriod2;

// a normal rule: look at each W, then check if there is a period afterwards
// this results in 25 matches in the W index and 25 checks on the next position.
W{-> WPeriod1} PERIOD;

// we start at the period and check on the W before it
// this results in 5 matches in the PERIOD index and 5 checks on the previous position.
W{-> WPeriod2} @PERIOD;

COLOR(WPeriod2, "pink");
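
The counts given in the comments above can be checked with a small Python emulation. It is a sketch under assumptions, not a description of how UIMA Ruta iterates internally: words are approximated with the regular expression `[A-Za-z]+`, and all variable names are hypothetical.

```python
import re

ordinals = ["first", "second", "third", "fourth", "fifth"]
text = "\n".join(f"This is the {n} line." for n in ordinals)

words = [m.span() for m in re.finditer(r"[A-Za-z]+", text)]
periods = [m.start() for m in re.finditer(r"\.", text)]

# Starting at every word and looking right: one candidate per word.
starts_from_w = len(words)
# Starting at every period and looking left: one candidate per period.
starts_from_period = len(periods)

# Both strategies select the same words, the ones directly before a period.
from_left = [text[s:e] for s, e in words if text[e:e + 1] == "."]
word_by_end = {e: (s, e) for s, e in words}
from_right = [text[slice(*word_by_end[p])] for p in periods if p in word_by_end]

print(starts_from_w, starts_from_period)  # 25 5
print(from_left == from_right)            # True
```

The result is the same in both directions, but anchoring at the period needs a fifth of the starting positions in this document.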

The selection of the anchor can also change the outcome, because the iterator of the anchoring rule element determines the initial number of rule matches, and the actions are applied once for each rule match.

In [None]:
%displayMode CSV
%csvConfig Span1

DECLARE Span1;
W+{-> Span1} PERIOD;

In [None]:
%displayMode CSV
%csvConfig Span2

DECLARE Span2;
W+{-> Span2} @PERIOD;

### The wildcard element `#`

The wildcard `#` can be described as "I do not care". This matching condition does not match anything itself. Instead, it considers the next rule element and sets its own matching span according to a valid match of that element. A similar behavior can be specified using a general annotation type like `ANY` in combination with a reluctant quantifier. However, the wildcard can span long distances more efficiently, since it does not explicitly match on every annotation as `ANY` does.



In [None]:
%displayMode RUTA_COLORING

DECLARE Span3, Span4;

// just start matching up to the first period
#{-> Span3} PERIOD;

// match backwards from the word "fourth" to the closest preceding period
(PERIOD # @"fourth"){-> Span4};


COLOR(Span3, "lightgreen");
COLOR(Span4, "red");

### Conjunctive and disjunctive composed rule elements using `&` and `|`

In the following example, we take a look at conjunctive and disjunctive composed rule elements. These rule elements should be used with care, or avoided if possible, since they produce many matches internally.

In [None]:
%resetCas

In [None]:
%%documentText
This is the first line.
This is the second line.
This is the third line.
This is the fourth line.
This is the fifth line.

In [None]:
DECLARE Span;

PERIOD (# PERIOD){-> Span};

DECLARE Conjunctive, Disjunctive;

// a position that is any token with the covered text "is" AND a word that is part of a Span annotation
(ANY{REGEXP("is")} & W{PARTOF(Span)}){-> Conjunctive};

// a word that is not part of a Span annotation OR the word "fourth"
(W{-PARTOF(Span)} | "fourth"){-> Disjunctive};

COLOR(Conjunctive, "lightblue");
COLOR(Disjunctive, "yellow");

### Conjunction of rules with the operator `%`

The next language element is the conjunction of rules with the operator `%`. Here, two rules are combined without sequential constraints. The combination matches if both linked rules are able to match independently of each other. 

In [None]:
DECLARE ConjunctRules;

Span{-> ConjunctRules}<-{
  // a Conjunctive preceded by a SW
  // and possibly somewhere else within the span
  // a Disjunctive followed by an ANY and a PERIOD
  SW Conjunctive % Disjunctive ANY PERIOD;  
};

COLOR(ConjunctRules, "lightgreen");

### Advanced functionality of type expressions

Type expressions are often used to refer to annotations within a certain matching context. A type expression can also be extended with additional conditions in order to restrict these annotations. This extension is useful if no new types and annotations should be created just to represent the restriction. The rule in the following example assigns to the feature `targets` all annotations of the type `W` that fulfill at least one of two conditions: they either match the regular expression or they are a capitalized word.

In [None]:
%resetCas
%documentText "This is a simple text."
%displayMode DYNAMIC_HTML
%dynamicHtmlAllowedTypes Structure

DECLARE Structure (FSArray targets);

Document{-> CREATE(Structure, "targets" = W{OR(REGEXP("simpl.*"), IS(CW))})};


### Assigning annotations to a feature structure array

UIMA Ruta provides some support for lists and arrays, as the previous example has already shown by assigning annotations to a feature structure array. There is also some restricted functionality to access the annotations of a feature structure array individually. In the next example, we use an indexed expression to match on the annotations within the array.

In [None]:
DECLARE First, Second;

%dynamicHtmlAllowedTypes Structure First Second

Structure.targets[0]{-> First};
Structure.targets[1]{-> Second};
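
To reason about which words the indexed expressions refer to, the restricted type expression from the previous section can be emulated in plain Python. This is a sketch under assumptions: the regular expression is assumed to be tested against the whole covered text, `CW` is approximated as a word with an uppercase first letter followed by lowercase letters, and the array is assumed to keep document order. The helper `is_capitalized` is hypothetical.

```python
import re

text = "This is a simple text."
words = re.findall(r"[A-Za-z]+", text)

def is_capitalized(word):
    # Rough approximation of the CW type: uppercase first letter, rest lowercase.
    return word[0].isupper() and word[1:] == word[1:].lower()

# Emulates W{OR(REGEXP("simpl.*"), IS(CW))} under the assumptions above.
targets = [w for w in words
           if re.fullmatch(r"simpl.*", w) or is_capitalized(w)]

first, second = targets[0], targets[1]
print(targets)        # ['This', 'simple']
print(first, second)  # This simple
```

Under these assumptions, `Structure.targets[0]` would refer to "This" and `Structure.targets[1]` to "simple".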