# Part 3: Radiotherapy Image Data Analysis

This notebook is part of the **Radiotherapy image data analysis using Python** Workshop at ASMIRT 2023 in Sydney, Australia.

In this part you will learn how to convert and analyse RT DICOM data:
- Convert a collection of DICOM data into NIfTI format
- Automatically select the data objects we want to analyse for each patient
- Compute DVHs and extract dose metrics
- Plot dose metrics as box plots

## Import libraries and download some sample data

First we import the libraries we need and download some sample DICOM data to use in these
examples.

```python
try:
    from pydicer import PyDicer
    from pydicer.utils import read_converted_data
except ImportError:
    # PyDicer is not installed yet: install it from GitHub (Jupyter shell escape) and retry
    ! pip install git+https://github.com/AustralianCancerDataNetwork/pydicer.git
    from pydicer import PyDicer
    from pydicer.utils import read_converted_data

import tempfile
import zipfile
import requests
from pathlib import Path
import seaborn as sns

dicom_zip_url = "https://unsw-my.sharepoint.com/:u:/g/personal/z3523015_ad_unsw_edu_au/EfuOALdQEHtFph3EzdpmbOUBx3-kPcLGpuQI2sML7vje-g?download=1"
dicom_directory = "dicom"

# Download the sample zip into a temporary folder, then extract it into dicom_directory
with tempfile.TemporaryDirectory() as temp_dir:
    temp_file = Path(temp_dir).joinpath("tmp.zip")

    data = requests.get(dicom_zip_url, timeout=300)
    data.raise_for_status()  # stop early if the download failed
    with open(temp_file, "wb") as out_file:
        out_file.write(data.content)

    with zipfile.ZipFile(temp_file, "r") as zip_ref:
        zip_ref.extractall(dicom_directory)
```

## Set up the PyDicer tool

Create a PyDicer object, telling it where to store the converted data we will analyse. We also add
an input folder containing the DICOM data we want to process.

```python
# Define a directory in which to store our converted data
working_directory = Path("./working")

# Create a PyDicer object (called pyd) for us to work with
pyd = PyDicer(working_directory)

# Turn off generating NRRD files (makes conversion run faster)
pyd.config.set_config("generate_nrrd", False)

# Add the directory containing the downloaded DICOM as an input path
pyd.add_input(dicom_directory)
```

## Preprocess data

Next we call the `preprocess` function, which reads through our folder of DICOM files and keeps
track of the files available for conversion.

```python
pyd.preprocess()
```
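We don't know exactly how PyDicer's preprocessing identifies DICOM files, but the idea of scanning a folder for them can be sketched with the standard library. This minimal sketch assumes every file is a DICOM Part 10 file, which starts with a 128-byte preamble followed by the bytes `DICM`; the function names `is_dicom_part10` and `find_dicom_files` are hypothetical.

```python
from pathlib import Path


def is_dicom_part10(path: Path) -> bool:
    """Return True if the file carries the DICOM Part 10 signature.

    A Part 10 file begins with a 128-byte preamble followed by b"DICM".
    """
    with open(path, "rb") as f:
        header = f.read(132)
    return len(header) == 132 and header[128:132] == b"DICM"


def find_dicom_files(directory) -> list:
    """Recursively collect files under `directory` that look like DICOM Part 10 files."""
    return sorted(
        p for p in Path(directory).rglob("*") if p.is_file() and is_dicom_part10(p)
    )
```

Real tools also need to handle files without a preamble and then read each file's header (patient, modality, UIDs) to group related objects, which this sketch does not attempt.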

## Convert data

The `convert` function converts the DICOM data into NIfTI format. Check the `working` folder to see
the files appear as they are converted.

```python
pyd.convert.convert()
```
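Part of turning DICOM pixel data into a NIfTI image is converting the stored integer values into physical units. The DICOM standard defines a linear rescale for CT (Rescale Slope (0028,1053) and Rescale Intercept (0028,1052), giving Hounsfield units) and a Dose Grid Scaling (3004,000E) factor for RTDOSE. We have not checked how PyDicer applies these internally; this is only a sketch of the arithmetic, and the function names are hypothetical.

```python
def stored_to_physical(stored_values, rescale_slope=1.0, rescale_intercept=0.0):
    """Apply the DICOM linear rescale: physical = stored * slope + intercept (HU for CT)."""
    return [v * rescale_slope + rescale_intercept for v in stored_values]


def stored_to_dose(stored_values, dose_grid_scaling):
    """Scale RTDOSE pixel values by Dose Grid Scaling.

    The result is in the units given by the Dose Units attribute (usually GY).
    """
    return [v * dose_grid_scaling for v in stored_values]
```

For example, with a slope of 1 and an intercept of -1024, a stored value of 1024 corresponds to 0 HU (water).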

## Visualise data

PyDicer can create visualisations of the images, structures and dose grids it converts. Check them
out in the `working` folder.

```python
pyd.visualise.visualise()
```
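CT snapshots are usually displayed with a window (centre and width in HU) that maps a range of Hounsfield units onto display grey levels. We don't know which window PyDicer's visualisations use; this sketch assumes a common soft-tissue window (centre 40 HU, width 400 HU), and `window_level` is a hypothetical helper.

```python
def window_level(hu_values, centre=40.0, width=400.0):
    """Map HU values to display intensities in [0, 1] with a linear window.

    Values at or below centre - width/2 become 0 (black);
    values at or above centre + width/2 become 1 (white).
    """
    lower = centre - width / 2
    return [min(max((v - lower) / width, 0.0), 1.0) for v in hu_values]
```

With the defaults, air (-1000 HU) maps to black, 40 HU maps to mid-grey and dense bone maps to white.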

## Prepare dataset

Now that our data is converted, we are almost ready to start analysing it. But first, we want to
prepare a clean dataset since some of our data has multiple structure sets and multiple dose grids.

We can use the `read_converted_data` function from the PyDicer library to fetch a Pandas DataFrame
containing all the converted data.

```python
df_data = read_converted_data(working_directory)
df_data
```

Next we'll use the PyDicer preparation module to select the latest RTDOSE for each patient, along
with the linked datasets.

```python
clean_dataset = "clean"
pyd.dataset.prepare(clean_dataset, "rt_latest_dose")
```
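The idea behind selecting the latest dose and its linked objects can be sketched as: for each patient, pick the RTDOSE with the most recent date, then follow the references from dose to plan, plan to structure set, and structure set to image. This is not PyDicer's implementation of `rt_latest_dose`; the record layout (keys `patient_id`, `modality`, `uid`, `referenced_uid`, `date`) and the function name are hypothetical.

```python
def select_latest_dose_chain(objects):
    """Pick each patient's latest RTDOSE and follow its chain of references.

    `objects` is a list of dicts with hypothetical keys: patient_id, modality,
    uid, referenced_uid (the uid it links to, or None) and date ('YYYYMMDD',
    so plain string comparison orders the dates).
    Returns {patient_id: [dose uid, plan uid, structure uid, image uid, ...]}.
    """
    by_uid = {o["uid"]: o for o in objects}

    # Keep the most recent RTDOSE per patient
    latest = {}
    for o in objects:
        if o["modality"] != "RTDOSE":
            continue
        current = latest.get(o["patient_id"])
        if current is None or o["date"] > current["date"]:
            latest[o["patient_id"]] = o

    # Walk the references from each selected dose, guarding against cycles
    selected = {}
    for patient_id, dose in latest.items():
        chain = []
        node = dose
        while node is not None and node["uid"] not in chain:
            chain.append(node["uid"])
            node = by_uid.get(node["referenced_uid"])
        selected[patient_id] = chain
    return selected
```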

Now let's read the converted data in our clean dataset. We should now have exactly one CT,
RTSTRUCT, RTPLAN and RTDOSE per patient.

```python
df_data = read_converted_data(working_directory, dataset_name=clean_dataset)
df_data
```
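To confirm that the clean dataset really holds exactly one CT, RTSTRUCT, RTPLAN and RTDOSE per patient, we can count objects per patient and modality. This sketch works on plain `(patient_id, modality)` pairs; the helper name is hypothetical, and passing `zip(df_data.patient_id, df_data.modality)` assumes the DataFrame has `patient_id` and `modality` columns.

```python
from collections import Counter

REQUIRED_MODALITIES = ("CT", "RTSTRUCT", "RTPLAN", "RTDOSE")


def find_incomplete_patients(rows, required=REQUIRED_MODALITIES):
    """Return {patient_id: {modality: count}} for patients without exactly one of each.

    `rows` is an iterable of (patient_id, modality) pairs.
    """
    counts = {}
    for patient_id, modality in rows:
        counts.setdefault(patient_id, Counter())[modality] += 1

    problems = {}
    for patient_id, counter in counts.items():
        per_modality = {m: counter[m] for m in required}  # missing modalities count as 0
        if any(n != 1 for n in per_modality.values()):
            problems[patient_id] = per_modality
    return problems
```

An empty result means every patient has exactly one object of each required modality.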

## Compute Dose Volume Histogram (DVH)

Before we can extract dose metrics, we first need to compute the DVHs on our cleaned up dataset.

```python
pyd.analyse.compute_dvh(dataset_name=clean_dataset)
```
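A cumulative DVH gives, for each dose level, the percentage of a structure's volume receiving at least that dose. PyDicer computes DVHs for us; the sketch below only illustrates the idea, assuming we already have the dose of every voxel inside one structure and that all voxels have equal volume. The function name and bin width are hypothetical choices.

```python
from bisect import bisect_left


def cumulative_dvh(voxel_doses, bin_width=0.1):
    """Compute a cumulative DVH from the doses (Gy) of voxels inside one structure.

    Returns (dose_bins, volume_percent), where volume_percent[i] is the
    percentage of voxels receiving at least dose_bins[i] Gy.
    """
    if not voxel_doses:
        raise ValueError("structure contains no voxels")
    doses = sorted(voxel_doses)
    n = len(doses)
    n_bins = int(max(doses) / bin_width) + 2  # extend one bin past the maximum dose
    dose_bins = [i * bin_width for i in range(n_bins)]
    # bisect_left counts voxels below each bin, so n minus that is the count at or above it
    volume_percent = [100.0 * (n - bisect_left(doses, b)) / n for b in dose_bins]
    return dose_bins, volume_percent
```

Sorting once and using binary search keeps each bin lookup logarithmic rather than rescanning all voxels.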

## Extract Dose Metrics

We can then extract some common dose metrics.

> Tip: The `compute_dose_metrics` function accepts the parameters `d_point`, `v_point` and
> `d_cc_point`, each of which takes a list of values to compute dose metrics for.

```python
df_dose_metrics = pyd.analyse.compute_dose_metrics(d_point=[50, 95], d_cc_point=[2], dataset_name=clean_dataset)
df_dose_metrics
```
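As a reminder of what these metrics mean: D95 is the minimum dose received by the hottest 95% of a structure's volume, D50 is roughly the median dose, and D2cc is the minimum dose received by the hottest 2 cc. The sketch below computes them directly from voxel doses, assuming equal-volume voxels; PyDicer may derive them from the DVH with interpolation, so its values can differ slightly. The function names are hypothetical.

```python
import math


def dose_at_volume_percent(voxel_doses, percent):
    """D{percent}: the minimum dose received by the hottest `percent`% of voxels."""
    if not voxel_doses or not 0 < percent <= 100:
        raise ValueError("need at least one voxel and 0 < percent <= 100")
    doses = sorted(voxel_doses, reverse=True)
    # Number of voxels making up the hottest `percent`% of the structure
    k = max(1, math.ceil(len(doses) * percent / 100.0))
    return doses[k - 1]


def dose_at_volume_cc(voxel_doses, cc, voxel_volume_cc):
    """D{cc}cc: the minimum dose received by the hottest `cc` cubic centimetres."""
    doses = sorted(voxel_doses, reverse=True)
    # Number of voxels needed to make up `cc` cubic centimetres
    k = max(1, math.ceil(cc / voxel_volume_cc))
    if k > len(doses):
        raise ValueError("structure is smaller than the requested volume")
    return doses[k - 1]
```

For a structure of 100 voxels with doses 1 to 100 Gy, 95 voxels receive at least 6 Gy, so D95 is 6 Gy.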

## Clean up dose metrics

This has produced dose metrics for every structure. We only want to analyse a subset, so the code
below standardises the label names and keeps only the structures of interest.

```python
# Map each standard structure name to the label variations found in the data
structure_names = {
    "PTV": ["PTV_57_Gy", "PTV57", "ptv57"],
    "CTV": ["CTV_57_Gy", "CTV_57", "ctv_57"],
    "Brainstem": [],
    "SpinalCord": ["Cord"],
    "Lt_Parotid": ["L_parotid"],
    "Rt_Parotid": ["R_parotid"],
}

# Rename every label variation to its standard structure name
for structure_name, name_variations in structure_names.items():
    for name_variation in name_variations:
        df_dose_metrics.loc[df_dose_metrics.label == name_variation, "label"] = structure_name

# Keep only the structures we want to analyse
df_dose_metrics = df_dose_metrics[df_dose_metrics.label.isin(list(structure_names))]
```
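Listing every capitalisation of a label (such as `PTV57` and `ptv57`) gets tedious. One alternative, sketched here with a hypothetical helper, is to invert the mapping into a case-insensitive lookup table that ignores surrounding spaces and skips empty variations.

```python
def standardise_labels(labels, structure_names):
    """Map structure labels to standard names, ignoring case and surrounding spaces.

    Returns a list holding the standard name for each label, or None when
    the label matches no standard name or variation.
    """
    lookup = {}
    for standard_name, variations in structure_names.items():
        for name in [standard_name, *variations]:
            key = name.strip().lower()
            if key:  # skip empty variations
                lookup[key] = standard_name
    return [lookup.get(label.strip().lower()) for label in labels]
```

Labels that come back as `None` are the ones to drop before plotting.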

## Plot dose metrics

Now we can use the `seaborn` library to produce a box plot from these dose metrics!

```python
metric = "D95"
sns.boxplot(data=df_dose_metrics, x="label", y=metric, order=list(structure_names))
```
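By default, seaborn's box plot (drawn with matplotlib) places the box at the quartiles and extends the whiskers to the most extreme points within 1.5 times the interquartile range (IQR), plotting anything beyond as an outlier. This sketch reproduces those numbers with the standard library, assuming linear-interpolation quartiles (`statistics.quantiles` with `method="inclusive"`); `box_plot_stats` is a hypothetical helper and needs at least two values.

```python
import statistics


def box_plot_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers for a Tukey-style box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # most extreme point within the lower fence
        "whisker_high": max(inside),  # most extreme point within the upper fence
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }
```

This makes it easy to check, for example, which patient's D95 is being drawn as an outlier point.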

## Output table of statistics

And we can also output the dose metric statistics!

```python
df_dose_metrics.groupby("label")[metric].agg(["mean", "std", "min", "max"])
```
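For readers who want to see what this aggregation does, here is the same summary written with the standard library. pandas' `std` defaults to the sample standard deviation (dividing by n - 1), and `statistics.stdev` does the same; following pandas, a group with a single value gets `nan`. The helper name and the `(label, value)` input format are hypothetical.

```python
import statistics
from collections import defaultdict


def summarise_by_label(rows):
    """Mean, sample standard deviation, min and max of a metric for each label.

    `rows` is an iterable of (label, value) pairs; labels are returned sorted.
    """
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(value)

    summary = {}
    for label, values in sorted(groups.items()):
        summary[label] = {
            "mean": statistics.fmean(values),
            "std": statistics.stdev(values) if len(values) > 1 else float("nan"),
            "min": min(values),
            "max": max(values),
        }
    return summary
```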

## Exercise

Rerun the cells above, and try computing some different dose metrics. Here are a few things to try:

- Try computing V dose metrics: add the `v_point` parameter to the
`pyd.analyse.compute_dose_metrics` call above.

- Try plotting some different metrics.

- Try plotting some metrics for some different labels.
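For the first exercise, recall that a V metric such as V20 is the percentage of a structure's volume receiving at least 20 Gy. This sketch computes it from voxel doses, assuming equal-volume voxels and doses in Gy; it is not PyDicer's code, and the function name is hypothetical.

```python
def volume_at_dose_percent(voxel_doses, dose_gy):
    """V{dose_gy}: percentage of the structure volume receiving at least dose_gy.

    Assumes equal-volume voxels; multiply by the structure volume to get cc.
    """
    if not voxel_doses:
        raise ValueError("structure contains no voxels")
    receiving = sum(1 for d in voxel_doses if d >= dose_gy)
    return 100.0 * receiving / len(voxel_doses)
```

For voxel doses of 10, 20, 30 and 40 Gy, three of the four voxels receive at least 20 Gy, so V20 is 75%.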