# AVID (Analysis of Variation in Interfractional Dose) in a nutshell:

### Purpose

AVID is a **data processing tool for automated image analysis**. Among other things, it supports
- batch-processing of large amounts of data
- repeatedly running a workflow to update results (e.g. when new datasets arrive)
- running a processing pipeline with as little manual user interaction as possible, avoiding tedious, error-prone manual analysis workflows
- parallel analyses of many different alternatives: "multiverse"-support

### Advantages

- **simple**: AVID is lightweight; its workflows are compact and flexible.
- **portable**: AVID workflows are operating-system independent and can be easily deployed as a Python script or as an executable.
- **data-driven & scalable**: The data-driven approach enables AVID to automatically scale to the data from single cases to cohorts.
- **extendable**: New tools can be easily added.

### Technical note

AVID is written in Python.



# Basic concepts

AVID is a tool that facilitates the composition of workflows for data analysis. A variety of processing steps can be combined to obtain the overall workflow.

The concrete flow of the data through the workflow, and how data is combined as input for individual processing steps, is determined primarily by the properties of the data itself rather than by a fixed set of algorithms. 
This makes AVID scale automatically to a variety of processing scenarios: it doesn't matter whether a single case or a cohort is processed, or whether the image modalities, the number of sequences, or the time points vary. As a result, AVID adapts well to the varying demands of medical image processing.
<br>
This is realized by the following design:


<img src="AVIDDesign.PNG" width="681" height="342" />



### Data 
This is the dataset which is being processed in the workflow. It contains the "raw" input data to the workflow and the output data of each processing step. The data can be located for example in a data folder or a database such as a PACS system. Each data item within the dataset has a URL pointing to where it is located.

### Session 
The session is the central data repository of AVID. It contains all the relevant metadata about the *data* and the processing pipeline in the form of *artefacts*. *Actions* can read from and write to the session. The user can feed information about the initial input data directly to the session in the form of an XML file. The current state of a session can also be inspected by writing out its artefacts as items of an XML file.

### Artefacts 
In AVID, data is handled in the form of artefacts. Each artefact is stored as an item of the session and refers to a specific data entry, e.g. a data folder. It contains relevant metadata, referred to as *properties*, such as the patient case, time point, and data format, as well as the URL where the data entry is located.<br>
The artefacts of a session can be written out to an XML file. An example artefact looks like this:<br>
```xml
<avid:artefact>
    <avid:property key="case">pat1</avid:property>
    <avid:property key="timePoint">TP1</avid:property>
    <avid:property key="actionTag">CT</avid:property>
    <avid:property key="type">result</avid:property>
    <avid:property key="format">itk</avid:property>
    <avid:property key="url">../data/img/pat1_TP1_CT.txt</avid:property>
    <avid:property key="objective">CT</avid:property>
    <avid:property key="invalid">False</avid:property>
    <avid:property key="id">bbe232b8-5740-11ec-85a6-e9d058c65a83</avid:property>
    <avid:property key="timestamp">1638869608.3330662</avid:property>
</avid:artefact>
```
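
To make the key/value structure of an artefact concrete, here is a minimal sketch that reads the properties of one artefact into a Python dictionary using only the standard library. It does not use AVID's own reader. The namespace URI bound to the `avid:` prefix and the helper name `artefact_properties` are assumptions made for illustration, since the snippet above does not show the namespace declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real one is not shown in the snippet above.
AVID_NS = "http://example.org/avid"

ARTEFACT_XML = f"""
<avid:artefact xmlns:avid="{AVID_NS}">
    <avid:property key="case">pat1</avid:property>
    <avid:property key="timePoint">TP1</avid:property>
    <avid:property key="actionTag">CT</avid:property>
    <avid:property key="url">../data/img/pat1_TP1_CT.txt</avid:property>
    <avid:property key="invalid">False</avid:property>
</avid:artefact>
"""


def artefact_properties(xml_text):
    """Return the key/value properties of one artefact element as a dict."""
    root = ET.fromstring(xml_text.strip())
    return {
        prop.get("key"): (prop.text or "")
        for prop in root.iter(f"{{{AVID_NS}}}property")
    }


properties = artefact_properties(ARTEFACT_XML)
print(properties["case"], properties["timePoint"], properties["actionTag"])
```

Note that all values are read as strings here, so `invalid` is the text `"False"` rather than a boolean.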

### Actions
Actions make things happen in AVID. Each action corresponds to a specific, independent step of the data processing pipeline. <br>
Various actions are available in AVID in the form of Python scripts and command line interface (CLI) applications. A variety of CLI applications is provided in the form of the so-called *Mini-Apps*, which are CLI applications (implemented in C++) imported from MITK. 
Actions can do all sorts of things, including fitting, registration, and resampling. The actions implemented as Python scripts are located in the folder `AVID\avid\actions`. If a required action is not available, AVID can be extended with new actions.<br>
Once an action has been called, the session is updated, and if the action has generated new data, new artefacts are added. The artefact property *actionTag* describes the action that created a data entry. If *action_1* has created new processed data, the default actionTag of the corresponding artefact looks like this:
```xml
<avid:property key="actionTag">action_1</avid:property>
```
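
The bookkeeping described above can be pictured with a small plain-Python model. This is only a conceptual sketch, not AVID code: artefacts are modelled as dictionaries, and `run_unary_batch` and `write_name` are hypothetical names. It shows one call per input artefact and a new artefact per result that carries the tag of the producing action.

```python
import os
import tempfile


def run_unary_batch(artefacts, action_tag, func, workdir):
    """Call func once per input artefact and return the new result artefacts."""
    results = []
    for index, artefact in enumerate(artefacts):
        out_path = os.path.join(workdir, f"{action_tag}_{index}.txt")
        func(outputs=[out_path], inputs=[artefact["url"]])
        # The result keeps the input's properties but gets its own url
        # and the tag of the action that produced it.
        results.append({**artefact, "url": out_path, "actionTag": action_tag})
    return results


def write_name(outputs, inputs, **kwargs):
    """Toy processing step: write the input file name into the output file."""
    with open(outputs[0], "w") as ofile:
        ofile.write(os.path.basename(inputs[0]))


bootstrap = [
    {"case": "pat1", "timePoint": "TP1", "actionTag": "CT", "url": "pat1_TP1_CT.txt"}
]

with tempfile.TemporaryDirectory() as workdir:
    # The list of artefacts grows by the results of the action.
    all_artefacts = bootstrap + run_unary_batch(bootstrap, "action_1", write_name, workdir)

print([a["actionTag"] for a in all_artefacts])
```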


### Workflow script
The workflow script is a Python script which orchestrates the interaction between the session and the actions. It determines which actions to run and in which order. The workflow script can be triggered by the data, e.g. it can be called when new data arrives in a database. All currently available workflow scripts can be found in the folder `AVIDWorkflows`. New workflow scripts can be added easily.

### Selector 
Data is not handed to actions explicitly. Instead, *selectors* and *linkers* are used to specify which artefacts should serve as input to an action. This way, an action can work on a subset of the artefacts in the session instead of all of them. A selector selects artefacts based on their properties. For example, we can tell the selector of `action_2` to pick only artefacts with the property `actionTag`=`action_1`. Then action_2 is performed only on artefacts that have been generated by action_1. All available selectors are located in the folder `AVID\avid\selectors`. 
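
Conceptually, property-based selection is a filter over the artefact list. The following sketch illustrates the idea in plain Python with artefacts modelled as dictionaries; the function `select` is hypothetical and is not part of AVID's selector API.

```python
def select(artefacts, **criteria):
    """Keep only artefacts whose properties match all given key/value pairs."""
    return [
        artefact
        for artefact in artefacts
        if all(artefact.get(key) == value for key, value in criteria.items())
    ]


artefacts = [
    {"case": "pat1", "timePoint": "TP1", "actionTag": "action_1"},
    {"case": "pat1", "timePoint": "TP2", "actionTag": "action_1"},
    {"case": "pat2", "timePoint": "TP1", "actionTag": "CT"},
]

# Only the artefacts produced by action_1 would be handed to action_2.
from_action_1 = select(artefacts, actionTag="action_1")
print(len(from_action_1))
```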

### Linker
Actions can also be given more than one input: they don't have to work on individual artefacts, but can also work on pairs of artefacts (or even larger groups). For example, we may want to register MR and CT images, which takes both images as input. We can use selectors to select the desired images, but there is a problem: how do we specify which artefacts belong together in a pair? In principle, each MR image could be paired with each CT image, across patients and time points. To get exactly the pairs we want, we use *linkers*. All available linkers are located in the folder `AVID\avid\linkers`. 
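
The pairing problem can be illustrated with a short plain-Python sketch. It is a conceptual model under the assumption that artefacts are dictionaries of properties; `link_pairs` is a hypothetical helper, not an AVID linker. Without any constraint every MR image is paired with every CT image, while requiring equal property values narrows the pairs down.

```python
from itertools import product


def link_pairs(primary, secondary, keys=()):
    """Pair artefacts, keeping only pairs that agree on every property in keys."""
    return [
        (a, b)
        for a, b in product(primary, secondary)
        if all(a[key] == b[key] for key in keys)
    ]


mr_images = [
    {"case": "pat1", "timePoint": "TP1"},
    {"case": "pat1", "timePoint": "TP2"},
    {"case": "pat3", "timePoint": "TP1"},
]
ct_images = [
    {"case": "pat1", "timePoint": "TP1"},
    {"case": "pat2", "timePoint": "TP1"},
]

unconstrained = link_pairs(mr_images, ct_images)
same_case = link_pairs(mr_images, ct_images, keys=("case",))
print(len(unconstrained), len(same_case))
```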

### Splitter
*Splitters* can be used to split an artefact list by certain criteria and return the resulting sublists. For example, a *splitter* can split an artefact list such that all artefacts of the same case end up in one split. 
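
As a rough illustration of splitting by case, the sketch below groups dictionaries by one property. It assumes artefacts can be modelled as dictionaries; `split_by` is a hypothetical helper and not AVID's splitter interface.

```python
from collections import defaultdict


def split_by(artefacts, key):
    """Group artefacts into sublists that share the same value for key."""
    groups = defaultdict(list)
    for artefact in artefacts:
        groups[artefact[key]].append(artefact)
    # Groups are returned in order of first appearance.
    return list(groups.values())


artefacts = [
    {"case": "pat1", "timePoint": "TP1"},
    {"case": "pat2", "timePoint": "TP1"},
    {"case": "pat1", "timePoint": "TP2"},
]

splits = split_by(artefacts, "case")
print([len(split) for split in splits])
```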

### Sorter
*Sorters* can be used to sort an artefact list by certain criteria and return the sorted list. For example, a *sorter* can order the artefacts in a list by their time point property.
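
Sorting by time point can be sketched in the same spirit. Again, this is plain Python on dictionaries with a hypothetical helper `sort_by`, not AVID's sorter interface.

```python
def sort_by(artefacts, key):
    """Return a new list of artefacts ordered by the given property."""
    return sorted(artefacts, key=lambda artefact: artefact[key])


artefacts = [
    {"case": "pat1", "timePoint": "TP2"},
    {"case": "pat1", "timePoint": "TP3"},
    {"case": "pat1", "timePoint": "TP1"},
]

ordered = sort_by(artefacts, "timePoint")
print([artefact["timePoint"] for artefact in ordered])
```

Because the values are compared as strings in this sketch, `TP10` would sort before `TP2`; a real sorter would need a numeric comparison for such labels.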


### Datacrawler
To start the data-driven workflow, there needs to be an initial *bootstrap session* which contains artefacts of the input data to the processing pipeline. This is generated by a Python script typically called ```datacrawler.py```. The datacrawler generates an xml-file which can be read in by the session during the initialization of a workflow. Typically a datacrawler script can be found along with a workflow script. Usually, the datacrawler assumes a certain pre-defined naming convention of the data folders in order to set the properties.
Remark: Using the datacrawler is a design choice here. The bootstrap artefacts could also be provided to the session in a different way.


# Basic example:

This example walks through the basic concepts described above.<br>
<br>
Assume the following example data set found at ```examples\data```. For the purpose of this exercise, each of these files is an empty text file, but the same concepts apply to actual patient data.

The folder ```img``` contains the data:
 - Patient1
   - pat1_TP1_MR
   - pat1_TP2_MR
   - pat1_TP1_CT
 - Patient2
   - pat2_TP1_MR1
   - pat2_TP1_MR2
   - pat2_TP2_MR1
   - pat2_TP1_CT
   - pat2_TP2_CT
 - Patient3
   - pat3_TP1_MR
   
with the following segmentations located in the ```mask``` folder:
 - Patient1
   - pat1_TP1_Seg1
   - pat1_TP1_Seg2
   - pat1_TP2_Seg1
 - Patient2
   - pat2_TP1_Seg1
   - pat2_TP2_Seg1
 - Patient3
   - pat3_TP1_Seg1
 
The naming convention `[case]_[timepoint]_[actiontag]` encodes properties that are used in this example.
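
How such a naming convention can be turned into artefact properties is sketched below. This is not the `datacrawler.py` shipped with the example; it is a minimal stand-in that assumes `.txt` files, builds a small temporary copy of the Patient1 image folder, and uses a hypothetical function `crawl`.

```python
import tempfile
from pathlib import Path


def crawl(root):
    """Derive properties from files named [case]_[timepoint]_[actiontag].txt."""
    artefacts = []
    for path in sorted(Path(root).rglob("*.txt")):
        case, time_point, action_tag = path.stem.split("_")
        artefacts.append(
            {
                "case": case,
                "timePoint": time_point,
                "actionTag": action_tag,
                "url": str(path),
            }
        )
    return artefacts


with tempfile.TemporaryDirectory() as root:
    folder = Path(root) / "img" / "Patient1"
    folder.mkdir(parents=True)
    for name in ("pat1_TP1_MR", "pat1_TP2_MR", "pat1_TP1_CT"):
        (folder / f"{name}.txt").touch()  # empty text files, as in the example data
    artefacts = crawl(root)

print([(a["case"], a["timePoint"], a["actionTag"]) for a in artefacts])
```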

We will assume that the bootstrap artefacts for our input data set have already been generated. You can find them in an XML-like format under ```examples/output/example.avid```.<br> 
If you are interested in how that was done, take a look at ```datacrawler.py```.

Ok, now that we have the bootstrap artefacts, we are ready to run our workflow step-by-step.
First, we import all the libraries we need. Specifically, we import the actions, selectors and linkers we need for this example. <br>
We also initialize the workflow, which e.g. loads the bootstrap artefact xml-file, defines the workflow name and sets the output path for the resulting output data.

```python
###############################################################################
# Imports
###############################################################################
import os
import avid.common.workflow as workflow

from avid.actions.pythonAction import PythonUnaryBatchAction, PythonBinaryBatchAction
from avid.selectors import ActionTagSelector, CaseSelector
from avid.linkers import CaseLinker, TimePointLinker, FractionLinker
```

```python
###############################################################################
# Initialize session with existing artefacts
###############################################################################
session = workflow.initSession(
    bootstrapArtefacts=os.path.join(os.getcwd(), 'output', 'example.avid'),
    sessionPath=os.path.join(os.getcwd(), 'output', 'example'),
    name="example_session",
    expandPaths=True,
    debug=True,
    autoSave=True,
)
```

Now we want to do something with the initial artefacts. Remember from the basic concepts: what makes things happen in AVID are *actions*.<br>  
As a start, we will define a simple action that calls a specified Python function `my_function` for each input artefact and writes "Result for file (name of file)" into a text file.<br> 
Instead of using all artefacts of the bootstrap XML file, we pick only a subset using a *selector*. For example, let's call `my_function` for each data entry of patient 1, selected via the `case` property.

```python
def my_function(outputs, inputs, **kwargs):
    """
        Simple callable that outputs a sentence including the filename of the input
    """
    inputName = os.path.basename(inputs[0])

    with open(outputs[0], "w") as ofile:
        ofile.write(f"Result for file '{inputName}'")
```

```python
pat1_selector = CaseSelector('pat1')

with session:
    PythonUnaryBatchAction(
        inputSelector=pat1_selector,
        actionTag="basic_example1",
        generateCallable=my_function,
        defaultoutputextension="txt"
    ).do()
```

```text
2023-10-05 10:37:36,006 [INFO] Starting action: PythonUnaryBatchAction_basic_example1 (UID: 5af958a3-1439-4586-b45f-556d91e57c21) ...
2023-10-05 10:37:36,008 [INFO] Starting action: my_function (UID: 73a9c564-7083-4a68-bb47-6c5e4e6e640a) ...
2023-10-05 10:37:36,010 [INFO] Finished action: my_function (UID: 73a9c564-7083-4a68-bb47-6c5e4e6e640a) -> SUCCESS
2023-10-05 10:37:36,011 [INFO] Starting action: my_function (UID: 40d0fdc3-6595-4a92-ab91-73cfb60ed7a1) ...
2023-10-05 10:37:36,014 [INFO] Finished action: my_function (UID: 40d0fdc3-6595-4a92-ab91-73cfb60ed7a1) -> SUCCESS
2023-10-05 10:37:36,015 [INFO] Starting action: my_function (UID: 72fdb7fb-ef70-4330-b24f-f1fc416b4365) ...
2023-10-05 10:37:36,018 [INFO] Finished action: my_function (UID: 72fdb7fb-ef70-4330-b24f-f1fc416b4365) -> SUCCESS
2023-10-05 10:37:36,019 [INFO] Starting action: my_function (UID: a8e00334-5871-47c2-9b4a-97f49b480efd) ...
2023-10-05 10:37:36,022 [INFO] Finished action: my_function (UID: a8e00334-5871-47c2-9b4a
```

The results can be found in ```examples/output/example_session/basic_example1/result```. Typically, a new folder will be created for the results of each action.  
For each output-file there is now also an artefact we can use in further actions.  
To easily chain actions together, the *action tag* is useful. In our previous action, every resulting artefact has its action tag set to *basic_example1*. If no action tag is specified, it defaults to the name of the action (here: *PythonUnaryBatchAction*). For the initial artefacts, it can make sense to set the action tag based on the role of the data entry, such as *CT* or *MR* here.

Let's use the output of the previous step as input to a different action.

```python
def extend_content(outputs, inputs):
    """
        Simple callable that reads the content of an input file and writes a new file, including the previous content
    """
    inputName = os.path.basename(inputs[0])

    with open(inputs[0], "r") as ifile:
        content = ifile.read()
    with open(outputs[0], "w") as ofile:
        ofile.write(f"New content based on '{inputName}'\nOriginal content: '{content}'")
```

```python
example1_selector = ActionTagSelector('basic_example1')

with session:
    PythonUnaryBatchAction(
        inputSelector=example1_selector,
        actionTag="basic_example2",
        generateCallable=extend_content,
        defaultoutputextension="txt"
    ).do()
```

```text
2023-09-29 10:01:10,405 [INFO] Starting action: PythonUnaryBatchAction_basic_example2 (UID: 0393a45b-8801-44ae-a2c9-6665aea27bff) ...
2023-09-29 10:01:10,406 [INFO] Starting action: extend_content (UID: ff2a1554-d8b0-49d1-b835-9c74b8f6b3ef) ...
2023-09-29 10:01:10,410 [INFO] Finished action: extend_content (UID: ff2a1554-d8b0-49d1-b835-9c74b8f6b3ef) -> SUCCESS
2023-09-29 10:01:10,411 [INFO] Starting action: extend_content (UID: f179c65d-5eda-4b27-8e11-89ee86def2ef) ...
2023-09-29 10:01:10,414 [INFO] Finished action: extend_content (UID: f179c65d-5eda-4b27-8e11-89ee86def2ef) -> SUCCESS
2023-09-29 10:01:10,414 [INFO] Starting action: extend_content (UID: a4c4b4a5-a524-4e17-af01-84bd0f79bda2) ...
2023-09-29 10:01:10,418 [INFO] Finished action: extend_content (UID: a4c4b4a5-a524-4e17-af01-84bd0f79bda2) -> SUCCESS
2023-09-29 10:01:10,419 [INFO] Starting action: extend_content (UID: 963d0443-3515-4bdc-b452-5fe156bd6f3f) ...
2023-09-29 10:01:10,422 [INFO] Finished action: extend_content (UID:
```

Actions can also work on pairs of artefacts (or even larger groups) instead of individual ones.  
For example, we may want to pair up MR images with CT images. We can use `ActionTagSelector`s to select the desired images, but we still need to specify which artefacts belong together in a pair: in principle, each MR image could be paired with each CT image, across patients and time points. This is what *linkers* are for. In our case, the *CaseLinker* ensures that pairs are only created between artefacts that share the same case.

```python
def pair_two_images(inputs1, inputs2, outputs):
    """
        Simple callable that outputs the names of the two inputs
    """
    text = f"Matched up two images.  Input 1: {os.path.basename(inputs1[0])}  Input 2: {os.path.basename(inputs2[0])}"

    with open(outputs[0], "w") as ofile:
        ofile.write(text)
```

```python
mr_selector = ActionTagSelector('MR')
ct_selector = ActionTagSelector('CT')

with session:
    PythonBinaryBatchAction(
        inputs1Selector=mr_selector,
        inputs2Selector=ct_selector,
        inputLinker=CaseLinker(),
        actionTag="basic_example3",
        generateCallable=pair_two_images,
        defaultoutputextension="txt"
    ).do()
```

```text
2023-09-29 10:01:58,598 [INFO] Starting action: PythonBinaryBatchAction_basic_example3 (UID: 4d72bb73-6005-499d-b364-cfb79a38d5e1) ...
2023-09-29 10:01:58,600 [INFO] Starting action: pair_two_images (UID: 0e4f7978-c48a-4bf7-a7c5-716789a7e8e1) ...
2023-09-29 10:01:58,605 [INFO] Finished action: pair_two_images (UID: 0e4f7978-c48a-4bf7-a7c5-716789a7e8e1) -> SUCCESS
2023-09-29 10:01:58,606 [INFO] Starting action: pair_two_images (UID: 6a5de30a-3794-4128-8e92-9b900145c1a2) ...
2023-09-29 10:01:58,609 [INFO] Finished action: pair_two_images (UID: 6a5de30a-3794-4128-8e92-9b900145c1a2) -> SUCCESS
2023-09-29 10:01:58,610 [INFO] Starting action: pair_two_images (UID: b8cd7fea-0a0f-460e-9c0a-71810c5d3bcc) ...
2023-09-29 10:01:58,612 [INFO] Finished action: pair_two_images (UID: b8cd7fea-0a0f-460e-9c0a-71810c5d3bcc) -> SUCCESS
2023-09-29 10:01:58,612 [INFO] Starting action: pair_two_images (UID: f5103194-03e9-46ef-b63a-abc25e1cf398) ...
2023-09-29 10:01:58,616 [INFO] Finished action: pair_two_ima
```

Looking into the output folder of *basic_example3*, we can see the pairs that were matched up.  
For pat1, there are two results: pat1_TP1_MR + pat1_TP1_CT and pat1_TP2_MR + pat1_TP1_CT. The CT image of time point 1 is matched up twice, with the MR images of time points 1 and 2. If we instead want to link only pairs with the same case and time point, we can combine linkers like so: `CaseLinker() + TimePointLinker()`

```python
combined_linker = CaseLinker() + TimePointLinker()

with session:
    PythonBinaryBatchAction(
        inputs1Selector=mr_selector,
        inputs2Selector=ct_selector,
        inputLinker=combined_linker,
        actionTag="basic_example4",
        generateCallable=pair_two_images,
        defaultoutputextension="txt"
    ).do()
```

```text
2023-09-29 10:03:05,113 [INFO] Starting action: PythonBinaryBatchAction_basic_example4 (UID: cdb152f5-108a-4eef-b3bc-f64c430aa269) ...
2023-09-29 10:03:05,115 [INFO] Starting action: pair_two_images (UID: 6fb3d589-2241-41de-841b-2986abc1989c) ...
2023-09-29 10:03:05,118 [INFO] Finished action: pair_two_images (UID: 6fb3d589-2241-41de-841b-2986abc1989c) -> SUCCESS
2023-09-29 10:03:05,119 [INFO] Starting action: pair_two_images (UID: e892eb0b-b5a1-4b8b-b72d-65e7bebecfd1) ...
2023-09-29 10:03:05,122 [INFO] Finished action: pair_two_images (UID: e892eb0b-b5a1-4b8b-b72d-65e7bebecfd1) -> SUCCESS
2023-09-29 10:03:05,123 [INFO] Starting action: pair_two_images (UID: 912631b8-7de4-41a7-893e-9af4deb03b5e) ...
2023-09-29 10:03:05,126 [INFO] Finished action: pair_two_images (UID: 912631b8-7de4-41a7-893e-9af4deb03b5e) -> SUCCESS
2023-09-29 10:03:05,126 [INFO] Starting action: pair_two_images (UID: 8111cf49-e181-45a5-9759-611a2e1e2f57) ...
2023-09-29 10:03:05,130 [INFO] Finished action: pair_two_ima
```
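
To build some intuition for why combining linkers with `+` narrows the result, the following plain-Python sketch models a linker as a set of properties that must agree, with `+` merging the criteria. It is a conceptual illustration only: the class `PropertyLinker` is hypothetical and says nothing about how AVID implements `CaseLinker` or `TimePointLinker` internally.

```python
class PropertyLinker:
    """Hypothetical linker: two artefacts match if they agree on all keys."""

    def __init__(self, *keys):
        self.keys = keys

    def __add__(self, other):
        # Combining two linkers requires both sets of criteria to hold.
        return PropertyLinker(*self.keys, *other.keys)

    def link(self, primary, secondary):
        """Return the names of all matching (primary, secondary) pairs."""
        return [
            (a["name"], b["name"])
            for a in primary
            for b in secondary
            if all(a[key] == b[key] for key in self.keys)
        ]


mr = [
    {"name": "pat1_TP1_MR", "case": "pat1", "timePoint": "TP1"},
    {"name": "pat1_TP2_MR", "case": "pat1", "timePoint": "TP2"},
]
ct = [{"name": "pat1_TP1_CT", "case": "pat1", "timePoint": "TP1"}]

case_linker = PropertyLinker("case")
time_point_linker = PropertyLinker("timePoint")

by_case = case_linker.link(mr, ct)
by_case_and_time = (case_linker + time_point_linker).link(mr, ct)
print(by_case)
print(by_case_and_time)
```

For the pat1 images, linking by case alone yields both MR images paired with the single CT image, while the combined criteria keep only the pair from time point 1.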