# Analysis of AVID output files

All information about an AVID session can be accessed via its XML-style session file. However, this file is hard to read directly, so instead we will look at how to analyze `.avid` files programmatically.  
As an example, we will use `output/introduction.avid` from the basic introduction notebook. If it doesn't exist yet, simply execute all cells of the basic introduction notebook.

# AVID Diagnostics Tool

The most direct way to get an overview is the **AVID Diagnostics Tool**, a script you can execute from the command line. It loads an artefact collection and lets you filter it by criteria such as validity, root status, or whether artefacts are used as sources.

The call signature looks like this:  
`aviddiag <artefactfile> [options]`  
where "artefactfile" is the path to the artefact file to be analysed and "options" can be any combination of the following flags:
- `-h`, `--help` Shows the help text
- `--invalids` Selects only artefacts that are _invalid_.
- `--roots` Selects only artefacts that have _no input artefacts/sources_ (root artefacts).
- `--prime-invalids` Selects only _prime invalid artefacts_, i.e. invalid artefacts whose inputs are all valid or missing. This flag overrides `--invalids`.
- `--sources` Selects only artefacts that are _inputs for other artefacts_.
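To make the selection criteria more concrete, here is a minimal sketch of their semantics on a hypothetical, simplified artefact model (an `Artefact` record with `name`, `invalid` and `inputs`). It is not how `aviddiag` is implemented; it only mirrors the flag descriptions above, and it interprets "missing" inputs as inputs that are not present in the collection.

```python
from dataclasses import dataclass, field


@dataclass
class Artefact:
    """Hypothetical, simplified artefact record (not AVID's own class)."""
    name: str
    invalid: bool = False
    inputs: list = field(default_factory=list)  # names of the input artefacts


def select_invalids(artefacts):
    return [a for a in artefacts if a.invalid]


def select_roots(artefacts):
    # Root artefacts have no input artefacts at all.
    return [a for a in artefacts if not a.inputs]


def select_sources(artefacts):
    # Sources are referenced as an input by at least one artefact in the collection.
    used_names = {i for a in artefacts for i in a.inputs}
    return [a for a in artefacts if a.name in used_names]


def select_prime_invalids(artefacts):
    # Invalid artefacts whose inputs are all valid or absent from the collection,
    # i.e. the failure did not simply propagate from an invalid input.
    by_name = {a.name: a for a in artefacts}
    return [
        a for a in artefacts
        if a.invalid and all(i not in by_name or not by_name[i].invalid for i in a.inputs)
    ]


collection = [
    Artefact("ct"),
    Artefact("reg", invalid=True, inputs=["ct"]),
    Artefact("seg", invalid=True, inputs=["reg"]),
]
print([a.name for a in select_prime_invalids(collection)])  # ['reg']
```

In this toy collection both `reg` and `seg` are invalid, but only `reg` is a prime invalid: `seg` most likely failed because its input `reg` had already failed. This is why `--prime-invalids` is usually the better starting point when hunting for the root cause of a failure.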

As output, the tool prints:
- The path of the loaded file
- The number of selected artefacts
- A tabular overview with:
    - Case
    - Action Tag
    - Timepoint
    - Fail (i.e. invalid)
    - URL


An exemplary call could look like this:  
(In a command line with an active environment where AVID is installed, you would simply call `aviddiag output/introduction.avid --sources`.)

In [1]:
# making sure we use the "aviddiag" executable of this virtual environment
import sys, pathlib
venv_bin = pathlib.Path(sys.executable).parent
aviddiag_executable = venv_bin / "aviddiag"

!{aviddiag_executable} output/introduction.avid --sources

AVID diagnostics tool

Artefacts loaded from: output/introduction.avid
Number of selected artefacts: 8

#    | Case                 | Action tag           | Timepoint  | Fail  | URL
   0 | pat1                 | CT                   |          1 |     0 | data\pat1\CT_TP1.txt
   1 | pat1                 | MR                   |          1 |     0 | data\pat1\MR_TP1.txt
   2 | pat1                 | MR                   |          2 |     0 | data\pat1\MR_TP2.txt
   3 | pat2                 | CT                   |          1 |     0 | data\pat2\CT_TP1.txt
   4 | pat2                 | MR                   |          1 |     0 | data\pat2\MR_TP1.txt
   5 | pat1                 | example1             |          1 |     0 | output\introduction.avid_session\example1\result\pat1\write_filename.427d844d-93a3-11f0-885a-847b57957cf1.txt
   6 | pat1                 | example1             |          1 |     0 | output\introduction.avid_session\example1\result\pat1\write_filename.427e98e9-93a3-11
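Outside a notebook, where the `!` shell escape is not available, the same call can be made with the standard `subprocess` module. The sketch below also parses the printed table into dictionaries. The helper names `run_aviddiag` and `parse_aviddiag_table` are hypothetical, and the parser assumes the column layout shown in the output above (a header row starting with `#`, then `|`-separated columns with integer timepoints). Parsing human-readable output is brittle, so for anything beyond a quick check, the Python API described in the next section is the more robust route.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def run_aviddiag(avid_file, *flags):
    """Run aviddiag from the current interpreter's environment and return its stdout."""
    venv_bin = Path(sys.executable).parent
    # shutil.which also resolves platform-specific extensions such as .exe on Windows
    exe = shutil.which("aviddiag", path=str(venv_bin))
    if exe is None:
        raise FileNotFoundError(f"aviddiag not found in {venv_bin}")
    result = subprocess.run([exe, str(avid_file), *flags],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_aviddiag_table(text):
    """Parse the tabular part of aviddiag's output into a list of dicts."""
    rows = []
    in_table = False
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            in_table = True  # header row found; data rows follow
            continue
        if not in_table or "|" not in line:
            continue
        # maxsplit=5 keeps any '|' inside the URL column intact
        index, case, tag, timepoint, fail, url = (c.strip() for c in line.split("|", 5))
        rows.append({
            "index": int(index),
            "case": case,
            "action_tag": tag,
            "timepoint": int(timepoint),
            "fail": fail != "0",
            "url": url,
        })
    return rows


# Example (requires aviddiag to be installed in the current environment):
# rows = parse_aviddiag_table(run_aviddiag("output/introduction.avid", "--sources"))
```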

If we are interested in more detailed information, we can load the artefact file in Python and analyze its contents directly, as shown in the following sections.

# Analysis in Python

## Loading
First, we specify the file we want to open. We will again be using the `output/introduction.avid` from the basic introduction notebook.

When loading the artefacts this way, we can use the argument `check_validity`. It determines whether the loaded artefacts are checked for validity (i.e. whether their files exist) and marked as `invalid` if needed. This matters when analyzing outputs that were produced on a different machine, e.g. when a workflow ran on a high-performance cluster and your current machine has no access to the session data. In that case, setting `check_validity=False` prevents _every_ artefact from being marked as invalid: the session is read in as is, without checking or changing anything.
Here it makes no difference, since we have access to all files used in the example session.
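Conceptually, the validity check boils down to "does the file behind each artefact exist?". The following is a minimal sketch of that idea, not AVID's implementation: the function `check_artefact_validity` is hypothetical, artefacts are modelled as plain dicts with assumed keys `url` and `invalid`, and relative URLs are resolved against the current working directory. It only ever sets artefacts to invalid, matching the description above.

```python
from pathlib import Path


def check_artefact_validity(artefacts, check_validity=True):
    """Hypothetical sketch: mark artefacts whose file is missing as invalid.

    With check_validity=False, the artefacts are returned unchanged.
    """
    if not check_validity:
        return artefacts
    for artefact in artefacts:
        if not Path(artefact["url"]).exists():
            artefact["invalid"] = True  # the referenced file cannot be found here
    return artefacts


arts = [
    {"url": ".", "invalid": False},                          # exists (current directory)
    {"url": "definitely_missing_file.txt", "invalid": False},  # assumed not to exist
]
check_artefact_validity(arts)
print([a["invalid"] for a in arts])
```

This also shows why loading a cluster session on a local machine with the check enabled would mark everything as invalid: none of the remote paths exist locally.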

In [2]:
from avid.common.artefact import ArtefactCollection
import avid.common.artefact.fileHelper as fileHelper

In [3]:
filepath = "output/introduction.avid"
artefacts = ArtefactCollection(fileHelper.load_artefact_collection_from_xml(filepath, check_validity=False))

In [4]:
artefacts

ArtefactCollection(18 artefacts)

## Selecting specific artefacts
Just like in an AVID workflow script, we can use _Selectors_ to filter for specific artefacts, e.g. all artefacts with a specific _actionTag_. Unlike in a workflow script, where we usually hand _Selectors_ over to actions, here we apply them directly to our artefact collection:

In [5]:
from avid.selectors import ActionTagSelector, TimepointSelector

In [6]:
mr_selector = ActionTagSelector('MR')
mr_artefacts = mr_selector.getSelection(artefacts)

In [7]:
mr_artefacts

ArtefactCollection(3 artefacts)

These results can then be filtered further by subsequent selectors. As usual, selectors can also be combined to enforce several conditions at once:

In [8]:
tp1_selector = TimepointSelector(1)
mr_artefacts_tp1 = tp1_selector.getSelection(mr_artefacts)
print(f"Further filtering of mr_artefacts: {mr_artefacts_tp1}")

mr_tp1_selector = mr_selector + tp1_selector
mr_tp1_artefacts = mr_tp1_selector.getSelection(artefacts)
print(f"Combined filtering of both selectors: {mr_tp1_artefacts}")

print(f"Both methods yield the same result: {mr_artefacts_tp1 == mr_tp1_artefacts}")

Further filtering of mr_artefacts: ArtefactCollection(2 artefacts)
Combined filtering of both selectors: ArtefactCollection(2 artefacts)
Both methods yield the same result: True
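The result above suggests that combining selectors with `+` behaves like a logical AND. The following toy model illustrates why sequential filtering and a combined selector then give the same result. The `Selector` class, the two factory functions and the dict keys `actionTag` and `timePoint` are hypothetical stand-ins that only mimic the `getSelection`/`+` pattern; they are not AVID's selector classes, and the data is made up for illustration.

```python
class Selector:
    """Hypothetical mini selector mimicking the getSelection/+ pattern."""

    def __init__(self, predicate):
        self.predicate = predicate

    def getSelection(self, artefacts):
        return [a for a in artefacts if self.predicate(a)]

    def __add__(self, other):
        # The combined selector keeps only artefacts satisfying both conditions.
        return Selector(lambda a: self.predicate(a) and other.predicate(a))


def action_tag_selector(tag):
    return Selector(lambda a: a["actionTag"] == tag)


def timepoint_selector(tp):
    return Selector(lambda a: a["timePoint"] == tp)


toy_artefacts = [
    {"case": "pat1", "actionTag": "MR", "timePoint": 1},
    {"case": "pat1", "actionTag": "MR", "timePoint": 2},
    {"case": "pat2", "actionTag": "MR", "timePoint": 1},
    {"case": "pat1", "actionTag": "CT", "timePoint": 1},
]

mr, tp1 = action_tag_selector("MR"), timepoint_selector(1)
sequential = tp1.getSelection(mr.getSelection(toy_artefacts))
combined = (mr + tp1).getSelection(toy_artefacts)
print(sequential == combined)  # True
```

Because an AND of conditions does not depend on the order in which they are applied, chaining selections and combining selectors are interchangeable. The combined form is handy when the same filter is reused several times.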


Without looking into the individual artefacts, we can already gather some insights. For example, the following code computes what percentage of the artefacts with the actionTag `MR` are valid:

In [9]:
from avid.selectors import ActionTagSelector, ValiditySelector

In [10]:
mr_selector = ActionTagSelector('MR')
mr_artefacts = mr_selector.getSelection(artefacts)
mr_count = len(mr_artefacts)

valid_selector = ValiditySelector()
valid_mr_artefacts = valid_selector.getSelection(mr_artefacts)
valid_mr_count = len(valid_mr_artefacts)

# guard against division by zero if no MR artefacts were selected
percentage = 100 * valid_mr_count / mr_count if mr_count else 0.0

print(f"Valid MR artefacts: {valid_mr_count}/{mr_count} ({percentage:2.0f}%)")

Valid MR artefacts: 3/3 (100%)
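The same idea extends to a breakdown for all action tags at once. The sketch below is a hypothetical helper working on plain dicts with assumed keys `actionTag` and `invalid`. To use it on a real collection, you would first extract these values from each artefact; the exact AVID metadata key names are not confirmed here. Computing the percentage only for tags that actually occur avoids the division-by-zero case.

```python
from collections import Counter


def validity_by_action_tag(artefacts):
    """Return {actionTag: (valid_count, total_count, percentage)}."""
    totals = Counter(a["actionTag"] for a in artefacts)
    valid = Counter(a["actionTag"] for a in artefacts if not a["invalid"])
    # every tag in `totals` occurs at least once, so total > 0
    return {
        tag: (valid[tag], total, 100 * valid[tag] / total)
        for tag, total in totals.items()
    }


report = validity_by_action_tag([
    {"actionTag": "MR", "invalid": False},
    {"actionTag": "MR", "invalid": True},
    {"actionTag": "CT", "invalid": False},
])
for tag, (n_valid, n_total, pct) in sorted(report.items()):
    print(f"Valid {tag} artefacts: {n_valid}/{n_total} ({pct:.0f}%)")
# Valid CT artefacts: 1/1 (100%)
# Valid MR artefacts: 1/2 (50%)
```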


## Accessing artefact information
If we are interested in the actual contents of the artefacts, we can simply iterate over the artefact collection. All metadata is accessible via keys, e.g. `artefact['case']` or `artefact['url']`.

In [11]:
from avid.common.workflow.console_abstraction import Console
console = Console()

In [12]:
for artefact in mr_artefacts:
    console.print(artefact)

In [13]:
for artefact in mr_artefacts:
    print(f"{artefact['case']}:  {artefact['url']}")

pat1:  ..\data\pat1\MR_TP1.txt
pat1:  ..\data\pat1\MR_TP2.txt
pat2:  ..\data\pat2\MR_TP1.txt
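Since metadata is accessed by key, a selection can easily be exported for use in other tools. The following is a minimal sketch using Python's `csv` module. The function `artefacts_to_csv` is hypothetical; it only assumes that each artefact supports `artefact[key]`, as shown above. Keys that raise `KeyError` are written as empty cells, and a value of `None` is also written as an empty cell by the `csv` module. The example data is a plain dict standing in for a real artefact.

```python
import csv
import io


def artefacts_to_csv(artefacts, fields, stream):
    """Write the given metadata fields of each artefact as one CSV row."""
    writer = csv.DictWriter(stream, fieldnames=fields)
    writer.writeheader()
    for artefact in artefacts:
        row = {}
        for key in fields:
            try:
                row[key] = artefact[key]
            except KeyError:
                row[key] = ""  # field not present for this artefact
        writer.writerow(row)


buffer = io.StringIO()
artefacts_to_csv([{"case": "pat1", "url": "data/pat1/MR_TP1.txt"}],
                 ["case", "url", "timePoint"], buffer)
print(buffer.getvalue())

# With a real selection you might write to a file instead (newline="" as the csv docs advise):
# with open("mr_artefacts.csv", "w", newline="") as f:
#     artefacts_to_csv(mr_artefacts, ["case", "url"], f)
```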
