# Integration with Python
In many use cases, integrating Python with Ruta is desirable. Python can be used for all sorts of general-purpose operations (e.g. loading and preprocessing the documents, or statistically evaluating the results), while Ruta is used for creating and removing annotations.

For this purpose, we use [SoS Notebooks](https://vatlab.github.io/sos-docs/). They are based on the SoS kernel, a Python-based kernel that can communicate with other kernels, e.g. with the IRuta kernel.

In an SoS Notebook, Python and Ruta code can exist side by side. On the right, you can see a dropdown menu that can be used to select the kernel of each cell.

![image-3.png](attachment:image-3.png)

Each kernel has its own workspace. To share variables across workspaces, custom magic commands are used. In the following, we explain how to
   1. Pass the content of string variables from Python (SoS kernel) to an IRuta cell using `%expand`
   2. Pass a UIMA CAS object from Python (SoS kernel) to the IRuta kernel using `%get`
   3. Pass a UIMA CAS from the IRuta kernel to Python (SoS kernel) using `%put`

# 1. Passing the content of string variables from Python to Ruta using `%expand`

### Variables in a Python Cell

In [1]:
documentText = '"Patient has fevers, but no chills."'
problem_list = '"fevers|chills|nausea"'
newTypeName  = "Diagnosis"
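
The `problem_list` value above is a Ruta string literal, so the double quotes are part of the Python string. When the terms come from a Python list, such a value could be assembled as in the following minimal sketch. The helper name `to_ruta_alternation` is hypothetical, and the sketch assumes purely alphabetic terms so that no escaping is needed.

```python
def to_ruta_alternation(terms):
    """Join plain words into a quoted alternation such as "a|b|c".

    Hypothetical helper for illustration; it assumes every term is purely
    alphabetic, so no regex or string escaping is required.
    """
    for term in terms:
        if not term.isalpha():
            raise ValueError(f"unsupported term: {term!r}")
    return '"' + "|".join(terms) + '"'


problem_list = to_ruta_alternation(["fevers", "chills", "nausea"])
print(problem_list)  # "fevers|chills|nausea" (including the double quotes)
```

Terms containing spaces, quotes or regex metacharacters would need additional escaping, which this sketch deliberately rejects instead of guessing.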

### Using `%expand`, these values can be transferred to the IRuta kernel.
- Everything in curly braces `{ }` is replaced by the value of the corresponding Python variable (similar to f-strings in Python)
- This makes it possible to pass the document text, wordlists, generated Ruta code and other configuration parameters

In [2]:
%expand
%documentText {documentText}
DECLARE {newTypeName};
{problem_list} -> {newTypeName};
COLOR({newTypeName},"green");

##### is identical to

In [3]:
%documentText "Patient has fevers, but no chills."
DECLARE Diagnosis;
"fevers|chills|nausea" -> Diagnosis;
COLOR(Diagnosis,"green");
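
The substitution can be previewed in plain Python. The sketch below uses `str.format` as a stand-in for `%expand`; it only mimics the effect shown above and makes no claim about how SoS implements the magic. The variable names `template` and `expanded` are chosen for illustration.

```python
documentText = '"Patient has fevers, but no chills."'
problem_list = '"fevers|chills|nausea"'
newTypeName = "Diagnosis"

# The Ruta cell content, with placeholders in curly braces.
template = (
    "%documentText {documentText}\n"
    "DECLARE {newTypeName};\n"
    "{problem_list} -> {newTypeName};\n"
    'COLOR({newTypeName},"green");'
)

# Replace every placeholder by the value of the matching Python variable.
expanded = template.format(
    documentText=documentText,
    problem_list=problem_list,
    newTypeName=newTypeName,
)
print(expanded)
```

Printing the expanded text before running the cell is a cheap way to check that quotes and semicolons end up where Ruta expects them.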

# 2. Passing a UIMA CAS object from Python (SoS Kernel) to IRuta using `%get`

A Common Analysis Structure (CAS) is an object that contains the document together with all annotations and a TypeSystem. You can read more about it in the [UIMA glossary](https://uima.apache.org/d/uimaj-current/overview_and_setup.html#ugr.glossary).

While UIMA is originally based on Java, the [dkpro-cassis](https://github.com/dkpro/dkpro-cassis) module is used for handling CAS objects in Python.

### Loading the UIMA CAS in Python using dkpro-cassis

In [4]:
import cassis
with open('typesystems/MergedTypeSystem.xml', 'rb') as f:
    typesystem = cassis.load_typesystem(f)   
    
with open("input/xmi/example_en.xmi", "rb") as f:
    cas1 = cassis.load_cas_from_xmi(f, typesystem=typesystem)
print(f"Loaded a document of length {len(cas1.sofa_string)} characters with {len(cas1.select('de.averbis.textanalysis.types.health.Drug'))} drug mentions.")

Loaded a document of length 1510 characters with 3 drug mentions.


### Loading it into the IRuta kernel with the line magic `%get` and highlighting all "Drug" mentions

In [5]:
%get cas1
%displayMode RUTA_COLORING
COLOR(Drug,"lightgreen");

#### Removing all drug mentions

In [6]:
d:Drug{-> UNMARK(d)};

# 3. Passing a CAS from the IRuta kernel to Python (SoS Kernel) using `%put`
`%put modified_cas` saves the current CAS into the SoS kernel (Python) variable named `modified_cas`.

In [7]:
%put modified_cas

In [8]:
print(f"The document has {len(modified_cas.sofa_string)} characters with {len(modified_cas.select('de.averbis.textanalysis.types.health.Drug'))} drug mentions.")

The document has 1510 characters with 0 drug mentions.
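
Once the CAS is back in Python, comparisons like the one above can be wrapped in a small helper. The following is a minimal sketch; `count_annotations` and the `StubCas` stand-in are hypothetical names, and the stub only imitates the `select` method so that the example runs without the XMI files. In the notebook, `cas1` and `modified_cas` would be passed instead.

```python
def count_annotations(cas, type_names):
    """Return a {type name: number of annotations} mapping for one CAS-like object."""
    return {name: len(cas.select(name)) for name in type_names}


class StubCas:
    """Hypothetical stand-in that imitates only the `select` method of a CAS."""

    def __init__(self, annotations_by_type):
        self._annotations_by_type = annotations_by_type

    def select(self, type_name):
        return self._annotations_by_type.get(type_name, [])


DRUG = "de.averbis.textanalysis.types.health.Drug"

# Made-up annotation lists standing in for the CAS before and after the Ruta rule.
before = count_annotations(StubCas({DRUG: ["a", "b", "c"]}), [DRUG])
after = count_annotations(StubCas({DRUG: []}), [DRUG])

# Number of annotations removed per type.
removed = {name: before[name] - after[name] for name in before}
print(removed)
```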
