# Example Notebook to run PyDicer

This notebook provides a basic example of running the PyDicer pipeline on some test data.

In [1]:
import sys
sys.path.insert(0, "..")

from pathlib import Path

from pydicer import PyDicer

from pydicer.input.test import TestInput

## Setup working directory

First we'll create a directory for our project. Change the `directory` location to a folder on your
system where you'd like PyDicer to work with this data.

In [2]:
directory = Path("./data")
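
If you would rather create the project folder explicitly before handing it to PyDicer, a minimal sketch using only the standard library might look like the following. The temporary parent folder is an assumption made to keep the example free of side effects; in the notebook you would use `Path("./data")` directly.

```python
import tempfile
from pathlib import Path

# Hypothetical parent folder so the sketch does not touch a real project
parent = Path(tempfile.mkdtemp())
directory = parent / "data"

# Create the folder (and any missing parents); no error if it already exists
directory.mkdir(parents=True, exist_ok=True)

print(directory.is_dir())
```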

## Create a PyDicer object

The `PyDicer` class provides all the functionality needed to run the pipeline and work with the
data stored and converted in your project directory.

In [3]:
pydicer = PyDicer(directory)

## Fetch some data

A `TestInput` class is provided in pydicer to download some sample data to work with. Several other
input classes exist if you'd like to retrieve DICOM data for conversion from somewhere else;
[see the docs for information on how these work](https://australiancancerdatanetwork.github.io/pydicer/html/input.html).

In [5]:
dicom_directory = directory.joinpath("dicom")
test_input = TestInput(dicom_directory)
test_input.fetch_data()

# Add the input DICOM location to the pydicer object
pydicer.add_input(dicom_directory)

Directory not empty, won't download files
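
The message above indicates that the download was skipped because the target folder already contained files. That kind of check can be pictured with the sketch below. It is an illustration using the standard library only, not PyDicer's actual implementation, and `directory_is_empty` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def directory_is_empty(path: Path) -> bool:
    """Return True if `path` does not exist or contains no entries."""
    return not path.exists() or not any(path.iterdir())


# Hypothetical stand-in for the notebook's `dicom_directory`
dicom_directory = Path(tempfile.mkdtemp()) / "dicom"
print(directory_is_empty(dicom_directory))  # folder has not been created yet

dicom_directory.mkdir(parents=True)
(dicom_directory / "example.dcm").touch()
print(directory_is_empty(dicom_directory))  # folder now holds one file
```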


## Run the pipeline

The `run_pipeline` function runs the entire PyDicer pipeline on the test DICOM data. This includes:
- Preprocessing the DICOM data (data which is corrupt or can't be handled is placed in quarantine)
- Converting the data to NIfTI format (see the output in the `data` directory)
- Visualising the data (PNG files are placed alongside the converted NIfTI files)
- Computing radiomics features (results are stored in a CSV file alongside the converted structures)
- Computing dose volume histograms (results are stored alongside the converted dose data)

> Note that the entire pipeline can be quite time-consuming to run. Depending on your project's
> dataset, you will likely want to run only portions of the pipeline with finer control over each
> step. The commented-out line in the cell below shows how to restrict the run to a single patient.

In [6]:
# To process a single patient only, pass the patient ID instead:
# pydicer.run_pipeline(patient="HNSCC-01-0019")
pydicer.run_pipeline()

2022-08-15 14:16:56.794 | DEBUG    | platipy.dicom.io.rtstruct_to_nifti:transform_point_set_from_dicom_struct:134 - Converting structure 0 with name: Marked_Iso
2022-08-15 14:16:56.795 | DEBUG    | platipy.dicom.io.rtstruct_to_nifti:transform_point_set_from_dicom_struct:152 - This is not a closed planar structure, skipping.
2022-08-15 14:16:56.796 | DEBUG    | platipy.dicom.io.rtstruct_to_nifti:transform_point_set_from_dicom_struct:134 - Converting structure 1 with name: Calc_pt
2022-08-15 14:16:56.796 | DEBUG    | platipy.dicom.io.rtstruct_to_nifti:transform_point_set_from_dicom_struct:152 - This is not a closed planar structure, skipping.
2022-08-15 14:16:56.811 | DEBUG    | platipy.dicom.io.rtstruct_to_nifti:transform_point_set_from_dicom_struct:134 - Converting structure 2 with name: External
2022-08-15 14:17:00.725 | DEBUG    | platipy.dicom.io.rtstruct_to_nifti:transform_point_set_from_dicom_struct:134 - Converting structure 3 with name: GTV1
2022-08-15 14:17:01.422 | DEBUG    | 

## Prepare a dataset

Datasets extracted in DICOM format can often be a bit messy and require some cleaning up after
conversion. Exactly which data objects to extract for the clean dataset will differ by project,
but here we use a somewhat common approach: extracting the latest dose object for each patient
together with the plan, structure set and image linked to it.

The resulting dataset is stored in a folder with your dataset name (`clean` for this example).


In [5]:
pydicer.dataset.prepare(dataset_name="clean", preparation_function="rt_latest_dose")

../../../data/HNSCC-01-0199/doses/c16e76
data/clean/HNSCC-01-0199/doses/c16e76
../../../data/HNSCC-01-0199/plans/664e96
data/clean/HNSCC-01-0199/plans/664e96
../../../data/HNSCC-01-0199/structures/06e49c
data/clean/HNSCC-01-0199/structures/06e49c
../../../data/HNSCC-01-0199/images/72b0f9
data/clean/HNSCC-01-0199/images/72b0f9
../../../data/HNSCC-01-0019/doses/309e1a
data/clean/HNSCC-01-0019/doses/309e1a
../../../data/HNSCC-01-0019/plans/57b99f
data/clean/HNSCC-01-0019/plans/57b99f
../../../data/HNSCC-01-0019/structures/7cdcd9
data/clean/HNSCC-01-0019/structures/7cdcd9
../../../data/HNSCC-01-0019/images/b281ea
data/clean/HNSCC-01-0019/images/b281ea
../../../data/HNSCC-01-0176/doses/833a74
data/clean/HNSCC-01-0176/doses/833a74
../../../data/HNSCC-01-0176/plans/a6b346
data/clean/HNSCC-01-0176/plans/a6b346
../../../data/HNSCC-01-0176/structures/cbbf5b
data/clean/HNSCC-01-0176/structures/cbbf5b
../../../data/HNSCC-01-0176/images/c4ffd0
data/clean/HNSCC-01-0176/images/c4ffd0
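
The idea of keeping only the most recent object per patient can be sketched in plain Python. The records, patient IDs, hashed IDs and dates below are invented for illustration and do not reflect how PyDicer stores its metadata; `latest_per_patient` is a hypothetical helper, not part of the library.

```python
from datetime import date

# Hypothetical converted dose objects: (patient, hashed id, date)
dose_objects = [
    ("PAT-A", "aaa111", date(2020, 1, 10)),
    ("PAT-A", "aaa222", date(2020, 3, 5)),
    ("PAT-B", "bbb111", date(2019, 7, 1)),
]


def latest_per_patient(objects):
    """Keep, for each patient, the object with the most recent date."""
    latest = {}
    for patient, hashed_id, obj_date in objects:
        if patient not in latest or obj_date > latest[patient][1]:
            latest[patient] = (hashed_id, obj_date)
    return {patient: hashed_id for patient, (hashed_id, _) in latest.items()}


selected = latest_per_patient(dose_objects)
print(selected)
```

In a real project the linked plan, structure set and image would then be looked up from each selected dose object.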


## Analyse the dataset

The pipeline computes first-order radiomics features by default, as well as dose volume histograms.
Here we load these results into pandas DataFrames for analysis.

In [9]:
# Display the DataFrame of radiomics computed
df_radiomics = pydicer.analyse.get_all_computed_radiomics_for_dataset(dataset_name="clean")
df_radiomics

| | Contour | Patient | ImageHashedUID | StructHashedUID | firstorder\|10Percentile | firstorder\|90Percentile | firstorder\|Energy | firstorder\|Entropy | firstorder\|InterquartileRange | firstorder\|Kurtosis | ... | firstorder\|Mean | firstorder\|Median | firstorder\|Minimum | firstorder\|Range | firstorder\|RobustMeanAbsoluteDeviation | firstorder\|RootMeanSquared | firstorder\|Skewness | firstorder\|TotalEnergy | firstorder\|Uniformity | firstorder\|Variance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | +1 | HNSCC-01-0019 | b281ea | 7cdcd9 | -999.0 | 339.0 | 5.178130e+11 | 4.785593 | 407.00 | 4.227583 | ... | -127.630179 | 18.0 | -1024.0 | 4000.0 | 276.691595 | 565.017962 | 0.216289 | 1.481475e+12 | 0.076182 | 302955.835075 |
| 4 | -.3 | HNSCC-01-0019 | b281ea | 7cdcd9 | -68.0 | 669.0 | 1.587794e+11 | 4.415742 | 73.00 | 9.756706 | ... | 142.809678 | 41.0 | -1024.0 | 4000.0 | 77.747524 | 410.946993 | 1.886750 | 4.542715e+11 | 0.111299 | 148482.826970 |
| 27 | Brain | HNSCC-01-0019 | b281ea | 7cdcd9 | 24.0 | 70.0 | 6.420357e+08 | 1.698910 | 21.00 | 266.544166 | ... | 45.375629 | 40.0 | -849.0 | 2151.0 | 9.055957 | 57.813075 | 10.351606 | 1.836879e+09 | 0.427362 | 1283.403884 |
| 28 | Brainstem | HNSCC-01-0019 | b281ea | 7cdcd9 | 14.0 | 46.0 | 1.860914e+07 | 1.275139 | 17.00 | 133.476694 | ... | 30.442222 | 30.0 | -26.0 | 581.0 | 7.063362 | 33.431514 | 3.682653 | 5.324118e+07 | 0.491624 | 190.937232 |
| 25 | CTV63 | HNSCC-01-0019 | b281ea | 7cdcd9 | -65.0 | 308.0 | 1.346678e+10 | 4.124500 | 60.00 | 10.650191 | ... | 58.805603 | 37.0 | -1014.0 | 3990.0 | 36.643247 | 287.047711 | 0.524907 | 3.852878e+10 | 0.125626 | 78938.289561 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 18 | max_50 | HNSCC-01-0199 | 72b0f9 | 06e49c | -631.5 | 250.0 | 3.920070e+07 | 5.282432 | 442.25 | 2.732802 | ... | -69.705357 | 40.5 | -905.0 | 1382.0 | 190.102584 | 341.568097 | -0.882366 | 9.346176e+07 | 0.033943 | 111809.928067 |
| 25 | post_neck | HNSCC-01-0199 | 72b0f9 | 06e49c | -28.0 | 352.0 | 9.173127e+09 | 3.457737 | 32.00 | 12.417805 | ... | 107.575110 | 45.0 | -891.0 | 2488.0 | 31.982812 | 243.143026 | 2.970106 | 2.187044e+10 | 0.205189 | 47546.126922 |
| 15 | total_ptv | HNSCC-01-0199 | 72b0f9 | 06e49c | -262.0 | 110.0 | 2.146647e+10 | 4.334682 | 109.00 | 8.478738 | ... | -17.168810 | 25.0 | -1011.0 | 2717.0 | 49.117380 | 304.491275 | -0.258097 | 5.118006e+10 | 0.104536 | 92420.168569 |
| 31 | uni | HNSCC-01-0199 | 72b0f9 | 06e49c | -438.0 | 439.0 | 2.621229e+09 | 4.976606 | 115.00 | 5.307995 | ... | 26.582496 | 35.0 | -996.0 | 2616.0 | 84.668009 | 356.600440 | -0.164137 | 6.249497e+09 | 0.078507 | 126457.244989 |
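
To give a feel for what the first-order columns represent, the sketch below computes a few of them from a list of voxel intensities using simplified textbook definitions. The intensities are invented, and the radiomics library behind PyDicer may apply additional settings (for example intensity shifts or binning), so its values need not match these formulas exactly.

```python
import math
import statistics

# Hypothetical voxel intensities (e.g. CT numbers) inside one structure
voxels = [-1024, 0, 18, 40, 100]

features = {
    "Minimum": min(voxels),
    "Range": max(voxels) - min(voxels),
    "Mean": statistics.fmean(voxels),
    "Median": statistics.median(voxels),
    # Population variance: mean squared deviation from the mean
    "Variance": statistics.pvariance(voxels),
    # Square root of the mean of the squared intensities
    "RootMeanSquared": math.sqrt(sum(v * v for v in voxels) / len(voxels)),
}

for name, value in features.items():
    print(f"{name}: {value}")
```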


In [10]:
# Extract the D95, D50 and V3 dose metrics
df_dose_metrics = pydicer.analyse.compute_dose_metrics(dataset_name="clean", d_point=[95, 50], v_point=[3])
df_dose_metrics

| | patient | struct_hash | label | cc | mean | D95 | D50 | V3 |
|---|---|---|---|---|---|---|---|---|
| 0 | HNSCC-01-0019 | 7cdcd9 | +1 | 4640.553474 | 29.679710 | 0.0 | 25.3 | 3709.925652 |
| 1 | HNSCC-01-0019 | 7cdcd9 | -.3 | 2689.948082 | 43.784290 | 11.3 | 43.3 | 2688.291550 |
| 2 | HNSCC-01-0019 | 7cdcd9 | Brain | 549.576759 | 24.043829 | 7.6 | 22.3 | 549.553871 |
| 3 | HNSCC-01-0019 | 7cdcd9 | Brainstem | 47.636032 | 39.998913 | 28.9 | 39.2 | 47.636032 |
| 4 | HNSCC-01-0019 | 7cdcd9 | CTV63 | 467.602730 | 71.274086 | 66.1 | 71.5 | 467.602730 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 101 | HNSCC-01-0199 | 06e49c | max_50 | 0.801086 | 44.847424 | 38.9 | 44.9 | 0.801086 |
| 102 | HNSCC-01-0199 | 06e49c | post_neck | 369.942188 | 23.905859 | 12.4 | 23.4 | 369.849205 |
| 103 | HNSCC-01-0199 | 06e49c | total_ptv | 552.015305 | 67.680992 | 57.4 | 71.0 | 552.015305 |
| 104 | HNSCC-01-0199 | 06e49c | uni | 49.145222 | 70.718925 | 66.1 | 71.3 | 49.145222 |
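
For readers unfamiliar with the notation, a D-metric such as D95 is commonly defined as the minimum dose received by the hottest 95% of a structure's volume, and V3 as the volume receiving at least 3 Gy. The following sketch computes both from a flat list of voxel doses. The data, the voxel volume and the helper names are hypothetical, doses are assumed to be in Gy, and PyDicer's own computation may bin or interpolate differently.

```python
import math


def dose_at_volume(doses, percent):
    """Minimum dose received by the hottest `percent` % of the voxels."""
    ordered = sorted(doses, reverse=True)
    # Number of voxels making up the hottest `percent` % (at least one)
    count = max(1, math.ceil(len(ordered) * percent / 100))
    return ordered[count - 1]


def volume_at_dose(doses, threshold, voxel_volume_cc):
    """Volume (in cc) of the voxels receiving at least `threshold`."""
    return sum(1 for d in doses if d >= threshold) * voxel_volume_cc


# Hypothetical structure: 20 voxels of 0.5 cc receiving 1, 2, ..., 20 Gy
doses = [float(d) for d in range(1, 21)]

d95 = dose_at_volume(doses, 95)
d50 = dose_at_volume(doses, 50)
v3 = volume_at_dose(doses, 3, voxel_volume_cc=0.5)
print(d95, d50, v3)
```

With these definitions, a structure that lies entirely inside the 3 Gy region has a V3 equal to its total volume, which is the pattern seen in rows where the `V3` and `cc` columns agree.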


array(['HNSCC-01-0019', 'HNSCC-01-0176', 'HNSCC-01-0199'], dtype=object)