# Basic Example:

In this example we go through the basic concepts described above using concrete data. We will learn how to
- run a datacrawler to generate bootstrap artefacts
- run actions
- chain actions together
- use a selector to apply actions to a selection of data
- use a linker to apply actions to a combination of data

*Remark: Before running this example, please make sure the ```examples\output``` folder is empty. This is especially relevant if you run the whole example multiple times. Additionally, execute the cells only in the given order and do not re-run individual cells, since this might lead to errors or change the example output.*

## Example Data:

Assume the following example data set found at ```examples\data\img```:

We have data from 2 patients "pat1" and "pat2", acquired at different time points "TP1" and "TP2" with the different modalities "MR" and "CT": 
- Patient1
  - pat1_TP1_CT.txt
  - pat1_TP1_MR.txt
  - pat1_TP2_MR.txt
- Patient2
  - pat2_TP1_CT.txt
  - pat2_TP1_MR.txt

For the purpose of this exercise, each of these files is an empty text file, and the actions produce new text files based on them. However, the same concepts would apply when applying more complex image processing actions to actual image data.

We use the naming convention `[case]_[timepoint]_[actiontag]` which encodes the properties that are used in this example. For the initial raw data we set the actionTag to *CT* or *MR* since the modality can be interpreted as an "action" that generates the images.

## Bootstrap Artefacts and Datacrawler

As a first step we need to create the bootstrap artefacts. More specifically, in our case it is an XML file containing a collection of artefact items for each input dataset, serving as a starting point for our pipeline.

The bootstrap artefacts XML file is generated by running the script ```examples\datacrawler.py```. It takes as input
- the data root folder, in this example: ```data/```
- the output file path, in this example: ```output/bootstrap.avid```

As the name implies the datacrawler "crawls" through the specified root folder and its subfolders and creates an artefact item for each file it encounters. Some of the metadata are set based on the naming scheme of the files, e.g. in our case the naming convention `[case]_[timepoint]_[actiontag]` is used to set the *case*, the *timePoint* and the *actionTag* property. 
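
The principle can be sketched in a few lines of plain Python. This is only an illustration, not the code of `datacrawler.py`: the helper `crawl_folder` and the dictionary layout are hypothetical, and the sketch assumes that every file follows the `[case]_[timepoint]_[actiontag]` convention.

```python
import os
import tempfile


def crawl_folder(root):
    """Hypothetical helper: derive properties from every file below root."""
    artefacts = []
    for folder, _subfolders, filenames in os.walk(root):
        for filename in sorted(filenames):
            stem, _extension = os.path.splitext(filename)
            # Assumed naming convention: [case]_[timepoint]_[actiontag]
            case, time_point, action_tag = stem.split("_")
            artefacts.append({
                "case": case,
                "timePoint": time_point,
                "actionTag": action_tag,
                "url": os.path.join(folder, filename),
            })
    return artefacts


# Build a throw-away folder with the example file names and crawl it.
example_files = ["pat1_TP1_CT.txt", "pat1_TP1_MR.txt", "pat1_TP2_MR.txt",
                 "pat2_TP1_CT.txt", "pat2_TP1_MR.txt"]
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "img"))
    for name in example_files:
        open(os.path.join(root, "img", name), "w").close()
    crawled = crawl_folder(root)

print(len(crawled))  # 5
```

Each dictionary plays the role of one artefact item; the real crawler additionally stores properties such as *id* and *timestamp*.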

Typically, the datacrawling needs to be done only once, even if the pipeline is executed multiple times. This is useful because generating the bootstrap artefacts can be time-consuming for large datasets.
In practice, when using AVID, the bootstrap artefacts might already be provided by us. In that case you can skip this step.

In [53]:
###############################################################################
# Run Datacrawler
###############################################################################
!python datacrawler.py data/ output/bootstrap.avid



Found folders to scan ----------------------------------------

Found a total of 3 folders to scan. Starting to analyse folders ...
Finished folders      ---------------------------------------- 100% 0:00:00

Finished crawling. Number of generated artefacts: 5


*Result*: The output file ```examples\output\bootstrap.avid``` contains artefact items for each text file found in the ```examples\data``` folder, including the subfolder ```examples\data\img```.

For example, the corresponding artefact item of pat1_TP1_CT.txt looks something like this (with a varying timestamp):<br>
```xml
  <avid:artefact>
    <avid:property key="case">pat1</avid:property>
    <avid:property key="timePoint">TP1</avid:property>
    <avid:property key="actionTag">CT</avid:property>
    <avid:property key="type">result</avid:property>
    <avid:property key="format">itk</avid:property>
    <avid:property key="url">../data/img/pat1_TP1_CT.txt</avid:property>
    <avid:property key="objective">CT</avid:property>
    <avid:property key="invalid">False</avid:property>
    <avid:property key="id">bbe232b8-5740-11ec-85a6-e9d058c65a83</avid:property>
    <avid:property key="timestamp">1638869608.3330662</avid:property>
  </avid:artefact>
```
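
To inspect such artefact items programmatically, the properties can be read into a dictionary with Python's standard library. The following is a sketch under assumptions: the wrapper element `avid:artefacts` and the namespace URI are placeholders invented for this illustration, since the example above does not show how the real file declares them.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (an assumption made for this sketch only).
AVID_NS = "http://example.org/avid-placeholder"

# Hypothetical wrapper element around one shortened artefact item.
snippet = f"""
<avid:artefacts xmlns:avid="{AVID_NS}">
  <avid:artefact>
    <avid:property key="case">pat1</avid:property>
    <avid:property key="timePoint">TP1</avid:property>
    <avid:property key="actionTag">CT</avid:property>
    <avid:property key="invalid">False</avid:property>
  </avid:artefact>
</avid:artefacts>
""".strip()

root = ET.fromstring(snippet)
namespaces = {"avid": AVID_NS}
for artefact in root.findall("avid:artefact", namespaces):
    # Map each property key to its text content.
    properties = {prop.get("key"): prop.text
                  for prop in artefact.findall("avid:property", namespaces)}
    print(properties)
```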

## Required Libraries

Ok, now that we have the bootstrap artefacts, we are ready to run our workflow step-by-step.<br>
First, we import all the libraries we need. Specifically, we import the workflow script, actions, selectors and linkers we need for this example. We will dive deeper into the details of the specific actions, selectors and linkers once we encounter them again in the example workflow. <br>  

In [54]:
###############################################################################
# Imports
###############################################################################
import os
import avid.common.workflow as workflow

from avid.actions.pythonAction import PythonUnaryBatchAction, PythonBinaryBatchAction
from avid.selectors import ActionTagSelector, CaseSelector, ValiditySelector
from avid.linkers import CaseLinker, TimePointLinker

## Workflow Initialization

Next, we need to initialize the workflow session by
- loading the previously generated bootstrap artefact XML file into the workflow
- setting the session path, in this example ```examples\output\output.avid```. All artefacts of the session are stored here, including the bootstrap artefacts and every artefact generated by an action. The session file is kept separate from the bootstrap file so that the potentially time-consuming datacrawling has to run only once, even if the workflow is restarted.
- setting the session name, here to "example_session"
- setting some optional flags, e.g. for debugging purposes 

In [55]:
###############################################################################
# Initialize session with existing Artefacts
###############################################################################
session = workflow.initSession(
    bootstrapArtefacts=os.path.join(os.getcwd(), 'output', 'bootstrap.avid'),
    sessionPath=os.path.join(os.getcwd(), 'output', 'output.avid'),
    name="example_session",
    expandPaths=True,
    debug=True,
    autoSave=True,
)

## Example 1
Now we want to do something with the initial artefacts. Remember from the basic concepts: what makes things happen in AVID are *actions*.
As a start, we define a simple self-written action that calls the Python function `write_filename` for each input artefact. The function writes the sentence "Result for file is '(name of file)'" into a new output text file. <br>

In [56]:
def write_filename(outputs, inputs, **kwargs):
    """
        Simple callable that outputs a sentence including the filename of the input
    """
    inputName = os.path.basename(inputs[0])
    
    with open(outputs[0], "w") as ofile:
        ofile.write(f"Result for file is '{inputName}'")

Now we want to apply this action to our artefacts. We do this by using the ```PythonUnaryBatchAction```. We use the *Unary* batch action because we assume only one input artefact at a time will be passed to the script.<br> 
*Spoiler*: Later on we will find that also more than one input artefact can be passed to an action e.g. by using the ```PythonBinaryBatchAction```.<br>

The ```PythonUnaryBatchAction``` requires as input
- *inputSelector*: Specifies which artefacts the action is applied to. This is where *selectors* come into play. Let's start with the simple case that we want to apply the action to all valid artefacts. Recall what an artefact looks like: one of its properties is the *invalid* property, e.g.
    ```xml
      <avid:property key="invalid">False</avid:property>
    ``` 
    We use the predefined ```ValiditySelector```, which selects only the artefacts whose *invalid* property is false.
- *generateCallable*: The action we want to use, in our case ```write_filename```.
- *actionTag*: The name of the *actionTag* of all artefacts produced by this action, in this example *example1*. If no name is specified, by default it will just be the name of the action (here: *PythonUnaryBatchAction*)
- *defaultoutputextension*: The file extension of the files produced by this action. In this example, the resulting data will be stored as "txt" files.  

In [57]:
allValid_selector = ValiditySelector()


with session:
    PythonUnaryBatchAction(
        inputSelector=allValid_selector,
        generateCallable=write_filename,
        actionTag="example1",
        defaultoutputextension="txt"
    ).do()


*Result*: Let's have a look at the resulting output in the ```examples\output``` folder:
- The file ```examples\output\output.avid``` has been created. As specified in the initialization, this is the session path. It is another XML file containing artefact items, just like ```bootstrap.avid```. In addition to the bootstrap artefact items, it contains all the artefacts generated by the action run in example 1, which all have the actionTag *example1*. Here is an example of a newly created artefact item:
```xml
  <avid:artefact>
    <avid:property key="case">pat1</avid:property>
    <avid:property key="timePoint">TP1</avid:property>
    <avid:property key="actionTag">example1</avid:property>
    <avid:property key="type">result</avid:property>
    <avid:property key="format">itk</avid:property>
    <avid:property key="url">example_session/example1/result/pat1/write_filename.cdcadf1a-9b68-11ef-a1a2-f894c218a9f1.txt</avid:property>
    <avid:property key="objective">CT</avid:property>
    <avid:property key="invalid">False</avid:property>
    <avid:property key="input_ids">
      <avid:input_id key="inputs">b3694134-9b68-11ef-b3e7-f894c218a9f1</avid:input_id>
    </avid:property>
    <avid:property key="action_class">PythonAction</avid:property>
    <avid:property key="action_instance_uid">783d5956-d137-4399-9384-ec13b4fa25dc</avid:property>
    <avid:property key="id">cdcadf1a-9b68-11ef-a1a2-f894c218a9f1</avid:property>
    <avid:property key="timestamp">1730805990.330345</avid:property>
    <avid:property key="execution_duration">0.001995563507080078</avid:property>
  </avid:artefact>
  ```
Compared to the bootstrap artefacts, the property list is extended by additional properties that describe the action execution, such as *input_ids* or *execution_duration*.

- A new folder "example_session" has been created (recall that we set "example_session" as "name" in the initialization of the session). Here, the resulting files created by the action are stored. For each applied action, a new subfolder with the name of the actionTag is created. Therefore, the resulting files of "example1" can be found in the folder ```output\example_session\example1```. Specifically, they are text files that contain the name of the respective input file as text.

## Example 2
Let's use the output we produced in the previous example and apply a different action. For that, we define another self-written action that calls the Python-function `extend_content` for each input artefact which reads the content of an input file and writes a new text file also including the previous content.

In order to chain actions together, we use the *ActionTagSelector*. By setting the ActionTagSelector to "example1", only artefacts generated in the previous example are considered for the next action.

This time we also wish to select only a smaller portion of these artefacts using an additional *selector*: let's call the function `extend_content` only for the entries of patient 1. For this we use the CaseSelector which is based on the `case` property and set it to "pat1".<br>
We can logically combine these two selectors by using the "+"-operator. Now only data with the case property *'pat1'* AND the ActionTag *'example1'* will be processed.
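
How a "+"-operator can act as a logical AND is sketched below in plain Python. This is a conceptual model only, not AVID's implementation: the class `PropertySelector` and the helpers `case_is` and `action_tag_is` are hypothetical, and artefacts are reduced to plain dictionaries.

```python
class PropertySelector:
    """Hypothetical stand-in for a selector: keeps artefacts passing all checks."""

    def __init__(self, *checks):
        self.checks = checks

    def __add__(self, other):
        # "+" yields a selector that requires the checks of both operands.
        return PropertySelector(*self.checks, *other.checks)

    def select(self, artefacts):
        return [a for a in artefacts if all(check(a) for check in self.checks)]


def case_is(value):
    return PropertySelector(lambda a: a["case"] == value)


def action_tag_is(value):
    return PropertySelector(lambda a: a["actionTag"] == value)


artefacts = [
    {"case": "pat1", "timePoint": "TP1", "actionTag": "CT"},
    {"case": "pat1", "timePoint": "TP1", "actionTag": "example1"},
    {"case": "pat2", "timePoint": "TP1", "actionTag": "example1"},
]

selected = (case_is("pat1") + action_tag_is("example1")).select(artefacts)
print(selected)
```

Only the second dictionary fulfils both conditions, so it is the only one selected.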

In [58]:
def extend_content(outputs, inputs):
    """
        Simple callable that reads the content of an input file and writes a new file, including the previous content
    """
    inputName = os.path.basename(inputs[0])
    
    with open(inputs[0], "r") as ifile:
        content = ifile.read()
    with open(outputs[0], "w") as ofile:
        ofile.write(f"New content based on '{inputName}'\nOriginal content: '{content}'")

In [59]:
pat1_selector = CaseSelector('pat1') + ActionTagSelector('example1')

with session:
    PythonUnaryBatchAction(
        inputSelector=pat1_selector,
        actionTag="example2",
        generateCallable=extend_content,
        defaultoutputextension="txt"
    ).do()

*Result*: The resulting text files can be found in ```examples/output/example_session/example2```. We find that text files have only been generated for patient 1, not for patient 2. 

When looking at ```examples/output/output.avid```, we find that the content has been extended by the artefact items with the actionTag "example2".
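
The content of one such result file can be reproduced outside of AVID by calling the two callables one after the other. The sketch assumes that the callables receive `inputs` and `outputs` as lists of file paths, as the indexing in the functions suggests. The file names `step1.txt` and `step2.txt` are hypothetical; the files written by AVID carry generated names.

```python
import os
import tempfile


def write_filename(outputs, inputs, **kwargs):
    """Write a sentence that includes the file name of the input."""
    inputName = os.path.basename(inputs[0])
    with open(outputs[0], "w") as ofile:
        ofile.write(f"Result for file is '{inputName}'")


def extend_content(outputs, inputs):
    """Write a new file that includes the content of the input file."""
    inputName = os.path.basename(inputs[0])
    with open(inputs[0], "r") as ifile:
        content = ifile.read()
    with open(outputs[0], "w") as ofile:
        ofile.write(f"New content based on '{inputName}'\nOriginal content: '{content}'")


with tempfile.TemporaryDirectory() as folder:
    raw = os.path.join(folder, "pat1_TP1_CT.txt")
    step1 = os.path.join(folder, "step1.txt")  # hypothetical output name
    step2 = os.path.join(folder, "step2.txt")  # hypothetical output name
    open(raw, "w").close()  # empty input file, as in the example data

    # Chain: the output of the first callable is the input of the second.
    write_filename(outputs=[step1], inputs=[raw])
    extend_content(outputs=[step2], inputs=[step1])

    with open(step2) as result_file:
        final_text = result_file.read()

print(final_text)
```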

## Example 3
Actions can also be given more than a single input, meaning they don't have to work on individual artefacts, but can also work on pairs of artefacts (or even more). In case of two inputs we can use the ```PythonBinaryBatchAction```.

In this example, we want to pair up MR images and CT images of the same patient, for both "pat1" and "pat2", using the function ```pair_two_images```, which writes both filenames as text into a new text file. We use two ActionTagSelectors to select the actionTags 'MR' and 'CT' from the bootstrap artefacts. But there is a problem: how do we specify which artefacts belong together in a pair? In principle, each MR image could be paired with each CT image, across patients and time points. To get exactly what we want, there are *linkers*. In our case, the *CaseLinker* ensures that pairs are only created between artefacts that share the same case.

In [60]:
def pair_two_images(inputs1, inputs2, outputs):
    """
        Simple callable that outputs the names of the two inputs
    """
    text = f"Matched up two images.  Input 1: {os.path.basename(inputs1[0])}  Input 2: {os.path.basename(inputs2[0])}"
    
    with open(outputs[0], "w") as ofile:
        ofile.write(text)

In [61]:
mr_selector = ActionTagSelector('MR')
ct_selector = ActionTagSelector('CT')


with session:
    PythonBinaryBatchAction(
        inputs1Selector=mr_selector,
        inputs2Selector=ct_selector,
        inputLinker=CaseLinker(),
        actionTag="example3",
        generateCallable=pair_two_images,
        defaultoutputextension="txt"
    ).do()

*Result*: Looking into the output folder ```examples/output/example_session/example3```, we can see the pairs that were matched up. 

For pat2, there is one MR and one CT image both at TP1, which were matched as: 
- ```Input 1: pat2_TP1_MR.txt  Input 2: pat2_TP1_CT.txt```
All good here.

For pat1, there are two results: 
- ```Input 1: pat1_TP1_MR.txt  Input 2: pat1_TP1_CT.txt```
- ```Input 1: pat1_TP2_MR.txt  Input 2: pat1_TP1_CT.txt```
  
We can see that the CT image of timepoint 1 is matched twice: with the MR image of timepoint 1 and with the MR image of timepoint 2. This might not be what we want; it could be more meaningful to match only data of the same patient and the same timepoint.
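
The numbers can be checked with a small plain-Python model of the pairing. It is a conceptual sketch, not AVID code: images are reduced to hypothetical `(case, timepoint)` tuples, and linking by case is modelled as a filter over all possible pairs.

```python
from itertools import product

# (case, timepoint) of the MR and CT images in the example data
mr_images = [("pat1", "TP1"), ("pat1", "TP2"), ("pat2", "TP1")]
ct_images = [("pat1", "TP1"), ("pat2", "TP1")]

# Without any linking, every MR image could be paired with every CT image.
all_pairs = list(product(mr_images, ct_images))

# Linking by case keeps only the pairs that share the same patient.
same_case = [(mr, ct) for mr, ct in all_pairs if mr[0] == ct[0]]

print(len(all_pairs))  # 6
print(len(same_case))  # 3
```

The three remaining pairs correspond to the three result files listed above.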

## Example 4
In this example we see how to match only images of the same patient that were acquired at the same timepoint.
To achieve that, we combine linkers, again using the '+'-operator. In this example it looks like this: `CaseLinker() + TimePointLinker()`

In [62]:
combined_linker = CaseLinker() + TimePointLinker()

with session:
    PythonBinaryBatchAction(
        inputs1Selector=mr_selector,
        inputs2Selector=ct_selector,
        inputLinker=combined_linker,
        actionTag="example4",
        generateCallable=pair_two_images,
        defaultoutputextension="txt"
    ).do()

*Result*: We can find the results in the folder ```examples/output/example_session/example4```. When looking into the text files we now find the following matches:

For pat2 we get the same result as in example 3:
- ```Input 1: pat2_TP1_MR.txt  Input 2: pat2_TP1_CT.txt```

For pat1, there is now also only one match, based on case AND timepoint:
- ```Input 1: pat1_TP1_MR.txt  Input 2: pat1_TP1_CT.txt```
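
Conceptually, combining two linkers means that a pair must fulfil both linking criteria. The sketch below models this in plain Python; the helper `link` and the criterion functions are hypothetical and only illustrate the idea, they are not part of AVID.

```python
from itertools import product


def same_case(first, second):
    return first["case"] == second["case"]


def same_time_point(first, second):
    return first["timePoint"] == second["timePoint"]


def link(primary, secondary, *criteria):
    """Hypothetical helper: keep the pairs that fulfil every criterion."""
    return [(p, s) for p, s in product(primary, secondary)
            if all(criterion(p, s) for criterion in criteria)]


mr_images = [
    {"case": "pat1", "timePoint": "TP1", "name": "pat1_TP1_MR.txt"},
    {"case": "pat1", "timePoint": "TP2", "name": "pat1_TP2_MR.txt"},
    {"case": "pat2", "timePoint": "TP1", "name": "pat2_TP1_MR.txt"},
]
ct_images = [
    {"case": "pat1", "timePoint": "TP1", "name": "pat1_TP1_CT.txt"},
    {"case": "pat2", "timePoint": "TP1", "name": "pat2_TP1_CT.txt"},
]

pairs = link(mr_images, ct_images, same_case, same_time_point)
pair_names = [(mr["name"], ct["name"]) for mr, ct in pairs]
print(pair_names)
```

The MR image of pat1 at TP2 has no CT partner at the same timepoint, so it no longer appears in any pair.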
 

## Example 5

In all previous examples, we have run one action after another in separate steps. When actions are consecutively executed like that, the output can be overwhelming at times. 

Another more elegant option is to run all the actions in an automated way by using the ```run_batches``` command. This will perform the same steps and produce the same results as above, however, this time with a more user-friendly output.

In this example we create a new workflow session ```session2``` and run all the previously described actions in an automated way using the ```run_batches``` command.

Technical note: The Python package `rich` is required for this option.
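
Whether `rich` is importable in the current environment can be checked up front with the standard library. This is a small optional sketch; the installation hint assumes that `pip` is used to manage packages.

```python
import importlib.util

# find_spec returns None if the package cannot be found.
rich_available = importlib.util.find_spec("rich") is not None

if not rich_available:
    print("The package 'rich' is missing; install it, e.g. with: pip install rich")
```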

In [52]:
###############################################################################
# Initialize session with existing Artefacts
###############################################################################
session2 = workflow.initSession(
    bootstrapArtefacts=os.path.join(os.getcwd(), 'output', 'bootstrap.avid'),
    sessionPath=os.path.join(os.getcwd(), 'output', 'output2.avid'),
    name="example_session2",
    expandPaths=True,
    debug=True,
    autoSave=True,
)



In [64]:
with session2:
    PythonUnaryBatchAction(
        inputSelector=allValid_selector,
        generateCallable=write_filename,
        actionTag="example1",
        defaultoutputextension="txt"
    )
    PythonUnaryBatchAction(
        inputSelector=pat1_selector,
        actionTag="example2",
        generateCallable=extend_content,
        defaultoutputextension="txt"
    )
    PythonBinaryBatchAction(
        inputs1Selector=mr_selector,
        inputs2Selector=ct_selector,
        inputLinker=CaseLinker(),
        actionTag="example3",
        generateCallable=pair_two_images,
        defaultoutputextension="txt"
    )
    PythonBinaryBatchAction(
        inputs1Selector=mr_selector,
        inputs2Selector=ct_selector,
        inputLinker=combined_linker,
        actionTag="example4",
        generateCallable=pair_two_images,
        defaultoutputextension="txt"
    )

    session2.run_batches()

