# Welcome to IRuta Notebooks!

An IRuta Notebook is a notebook that lets you execute [UIMA Ruta](http://uima.apache.org/ruta.html) code. Ruta is a rule-based language for quickly adding annotations to a document and removing them again. In this context, an annotation is simply an annotated text span that holds its position (begin/end) and other features (e.g. a count or a category).
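
To make the notion of an annotation more concrete, here is a minimal Python sketch. It does not show how UIMA or Ruta store annotations internally; the `Annotation` class and its field names are hypothetical and only illustrate a typed span with begin/end offsets and additional features.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical stand-in for an annotation: a typed text span with features."""
    type_name: str
    begin: int  # offset of the first covered character
    end: int    # offset just past the last covered character
    features: dict = field(default_factory=dict)

    def covered_text(self, document: str) -> str:
        # The annotation stores only offsets; the text is looked up in the document.
        return document[self.begin:self.end]


document = "Anna likes to play football."
name = Annotation("Name", begin=0, end=4, features={"category": "person"})
print(name.covered_text(document))
```

The point of the sketch is that an annotation does not copy the text. It only points into the document, which is why several annotations can overlap or cover the same span.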

### Why use IRuta Notebooks?
- ✅ Easy to set up
- ✅ Allow rapid, iterative development of Ruta rules
- ✅ Full support of Ruta functionality (including loading resources and type systems, and reading input files as a single CAS or in batch mode)
- ✅ Extensive IDE support
    - Syntax highlighting works automatically.
    - *Hint 1*: Autocompletion can be triggered by starting to type and pressing the `Tab` key.
    - *Hint 2*: Documentation for functions can be accessed by pressing `Shift` + `Tab`.
- ✅ Can be combined with markdown to create self-explanatory scripts
- ✅ Can easily be combined with Python in the same Notebook

---
# A first example: Annotating names

IRuta Notebooks have two major components:
1. When a code cell is executed in this notebook, its content is interpreted by the IRuta kernel.
2. On top of the IRuta kernel, there are so-called *magic commands* that allow interacting with the kernel.

#### 1. (CellMagic) Setting the document text

In [1]:
%%documentText 
Anna likes to play football.
Bryan eats a sandwich.

#### 2. (Ruta) Declaring a new type in Ruta

In [2]:
DECLARE Name;

#### 3. (Ruta) Annotating names

In [3]:
"Anna" -> Name;

#### 4. (Ruta) The annotation is not yet visible. We need to highlight it using `COLOR()`

In [4]:
COLOR(Name,"lightgreen");

#### 5. (Ruta) Annotating another name

In [5]:
"Bryan" -> Name;

# A second example: Detecting negations for medical problems

#### 1. (CellMagic) Set document text

In [6]:
%%documentText 
Patient has severe headache and nausea. 
He denies fevers, chills or sweats.

#### 2. (Ruta) Detecting negations for medical problems

In [7]:
DECLARE Problem, NoProblem;
DECLARE NegationCue;
DECLARE Conj, Sentence, ProblemEnumeration;

// Mock annotations
"fevers?|chills?|sweats?|headache|nausea" -> Problem;
"deny|denies" -> NegationCue;
"and|or" -> Conj;

// Detect sentences and problem enumerations
ANY+{-PARTOF(Sentence),-PARTOF(PERIOD)-> Sentence};
(Problem (COMMA Problem)* Conj Problem){-> ProblemEnumeration};

// Find negations and apply them to enumeration
Sentence{CONTAINS(Problem)} -> {
    cue:NegationCue # ProblemEnumeration 
        ->{p:@Problem{-> UNMARK(p), NoProblem};};
    };

// Coloring specific annotations
COLOR(Problem, "red");
COLOR(NoProblem, "lightgreen");
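
The rules above can be hard to read at first. The following Python sketch approximates their effect with regular expressions: it splits the text into sentences, looks for a negation cue followed by an enumeration of problems, and treats the problems inside that enumeration as negated. It is only an illustration under simplifying assumptions and does not reproduce Ruta's matching semantics; the function name `classify_problems` and the regular expressions are hypothetical.

```python
import re

PROBLEM = r"\b(?:fevers?|chills?|sweats?|headache|nausea)\b"
CUE = r"\b(?:deny|denies)\b"
# Problem (, Problem)* and|or Problem
ENUMERATION = rf"{PROBLEM}(?:\s*,\s*{PROBLEM})*\s+(?:and|or)\s+{PROBLEM}"


def classify_problems(text):
    """Return (problems, negated_problems) as two lists of matched strings."""
    problems, negated = [], []
    # Simplification: a sentence is everything up to and including a period.
    for sentence in re.findall(r"[^.]+\.", text):
        negated_span = None
        cue = re.search(CUE, sentence)
        if cue:
            # Look for the first enumeration anywhere after the cue.
            enum = re.search(ENUMERATION, sentence[cue.end():])
            if enum:
                negated_span = (cue.end() + enum.start(), cue.end() + enum.end())
        for match in re.finditer(PROBLEM, sentence):
            inside = (
                negated_span is not None
                and negated_span[0] <= match.start()
                and match.end() <= negated_span[1]
            )
            (negated if inside else problems).append(match.group())
    return problems, negated


text = "Patient has severe headache and nausea. \nHe denies fevers, chills or sweats."
problems, negated = classify_problems(text)
print("Problem:", problems)
print("NoProblem:", negated)
```

Note that the first sentence also contains an enumeration ("headache and nausea"), but without a negation cue in front of it the problems stay untouched, which mirrors the `NegationCue # ProblemEnumeration` pattern in the Ruta rule.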

---

# Advanced topics

## Topic 1: Important magics
So-called *magics* are used for communicating with the IRuta kernel. They let you set a variety of parameters or trigger specific actions. The most important magics are listed below.
- `%documentText [text]`: Sets the text of the CAS. Optionally takes a language parameter, e.g. `%documentText en "my great covered text"`.
- `%loadCas [path]`: Loads a CAS from a path
- `%inputDir [directory]`: Sets the input directory
- `%outputDir [directory]`: Sets the output directory
- `%typeSystemDir [directory]`: Sets the path of the descriptors (e.g. for loading external TypeSystems)
- `%resourceDir [directory]`: Sets the path of the resources (e.g. wordlists)
- `%scriptPaths [directory]`: Sets the path for loading auxiliary Ruta scripts
- `%displayMode [NONE|RUTA_COLORING|DYNAMIC_HTML|CSV]`: Determines the output format (see below for more information)
- `%configParams [--key1=value1] [--key2=value2] ...`: Sets parameters of the RutaEngine, e.g. whether strict imports are activated
- `%writerules [directory]`: Writes the current cell to the specified directory (Ruta compatible, i.e. line magics are commented out).

Please note that autocompletion also works on these magics.
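
As a rough illustration of what "Ruta compatible" means for `%writerules` in the list above, the Python sketch below turns every line magic of a cell into a Ruta line comment so that the remaining text is a plain Ruta script. This is an assumption about the observable result, not the kernel's actual implementation; the helper `comment_out_line_magics` is hypothetical.

```python
def comment_out_line_magics(cell_source: str) -> str:
    """Prefix every line that starts with '%' with a Ruta line comment ('//')."""
    result = []
    for line in cell_source.splitlines():
        if line.lstrip().startswith("%"):
            # Line magics are not valid Ruta, so they are kept only as comments.
            result.append("// " + line)
        else:
            result.append(line)
    return "\n".join(result)


cell = '%displayMode RUTA_COLORING\nDECLARE Name;\n"Anna" -> Name;'
print(comment_out_line_magics(cell))
```

Keeping the magics as comments rather than dropping them preserves the information about how the cell was configured when the script is later read outside the notebook.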

## Topic 2: Importing documents into the Common Analysis Structure (CAS)
There are three options for loading data into the CAS.

#### Option 1: Using `documentText` as cell or line magic

In [8]:
%%documentText en
This is my
great example text.

#### Option 2: Using `%loadCas` to load a single CAS from disk.

In [9]:
%loadCas input/xmi/short_example.xmi

#### Option 3: Setting an input directory. This can be used for processing multiple files in batch mode.
- Hint: `%displayMode NONE` suppresses the output.

In [10]:
%displayMode NONE
%inputDir input/xmi

Processed 3/3 files. (took 0s)


## Topic 3: Files & Resource Loading
- Files and resources can be loaded from disk
- The paths can be adapted using the magic commands above

#### The example shows how to:
   - load an existing TypeSystem (`description/MergedTypeSystem.xml`) 
   - load the CAS (`input/xmi/example_en.xmi`)
   - load an external wordlist (`wordlists/section_header.txt`) for marking section headers.

In [11]:
%displayMode RUTA_COLORING
%typeSystemDir typesystems/
%resourceDir wordlists/
%loadCas input/xmi/example_en.xmi
%outputDir output/
TYPESYSTEM MergedTypeSystem;
DECLARE DiagnosisInWrongSection;

// Mark all words from section_header.txt with the annotation type SectionHeader
DECLARE SectionHeader;
WORDLIST sectionHeader = 'section_header.txt';
MARKFAST(SectionHeader, sectionHeader);

Document{-> COLOR(Diagnosis, "yellow")};
Document{-> COLOR(SectionHeader, "green")};

## Topic 4: DisplayMode / CAS Viewer
- The easiest way to highlight annotations is to use the Ruta function `COLOR()` in combination with `%displayMode RUTA_COLORING` (default), as demonstrated in the example above.
- A more advanced HTML/JS/CSS CasViewer can be activated using `%displayMode DYNAMIC_HTML`.

In [12]:
%loadCas input/xmi/example_en.xmi
%displayMode DYNAMIC_HTML