# Getting Started

This notebook walks through a basic example of running the PyDicer pipeline on some test data.

In [1]:
try:
    from pydicer import PyDicer
except ImportError:
    !pip install pydicer
    from pydicer import PyDicer

from pathlib import Path

from pydicer.input.test import TestInput

## Setup working directory

First we'll choose a directory for our project. Change the `directory` location to a folder on your
system where you'd like PyDicer to work with this data.

In [2]:
directory = Path("./data")

## Create a PyDicer object

The `PyDicer` class provides all the functionality needed to run the pipeline and work with the
data stored and converted in your project directory.

In [3]:
pydicer = PyDicer(directory)

## Fetch some data

PyDicer provides a `TestInput` class to download some sample data to work with. Several other
input classes exist if you'd like to retrieve DICOM data for conversion from somewhere else;
[see the docs for information on how these work](https://australiancancerdatanetwork.github.io/pydicer/html/input.html).

In [4]:
dicom_directory = directory.joinpath("dicom")
test_input = TestInput(dicom_directory)
test_input.fetch_data()

# Add the input DICOM location to the pydicer object
pydicer.add_input(dicom_directory)

## Run the pipeline

The `run_pipeline` function runs the entire PyDicer pipeline on the test DICOM data. This includes:
- Preprocessing the DICOM data (data which can't be handled or is corrupt will be placed in quarantine)
- Converting the data to NIfTI format (see the output in the `data` directory)
- Visualising the data (PNG files will be placed alongside the converted NIfTI files)
- Computing radiomics features (results are stored in a CSV file alongside the converted structures)
- Computing dose volume histograms (results are stored alongside the converted dose data)

> Note that the entire pipeline can be quite time-consuming to run. Depending on your project's
> dataset, you will likely want to run only portions of the pipeline with finer control over each
> step. For this reason, we run the pipeline for only one patient here as a demonstration.

In [5]:
pydicer.run_pipeline(patient="HNSCC-01-0019")

100%|██████████| 1309/1309 [00:05<00:00, 247.62files/s, preprocess]

100%|██████████| 4/4 [01:14<00:00, 18.64s/objects, convert]

100%|██████████| 3/3 [00:26<00:00,  8.76s/objects, visualise]

100%|██████████| 1/1 [00:28<00:00, 28.83s/objects, Compute Radiomics]

100%|██████████| 1/1 [00:13<00:00, 13.55s/objects, Compute DVH]
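
The note above mentions running only portions of the pipeline. A minimal sketch of what that could look like follows. The helper `run_selected_steps` is hypothetical, and the method names it calls (`preprocess`, `convert.convert`, `visualise.visualise`, `analyse.compute_radiomics`, `analyse.compute_dvh`) and their `patient` argument are assumptions about the PyDicer API that should be checked against the PyDicer documentation before use.

```python
def run_selected_steps(pydicer, patient, steps=("preprocess", "convert", "visualise")):
    """Run only the named pipeline steps for one patient, in pipeline order.

    `pydicer` is expected to be a PyDicer object. The method names below are
    assumptions about its API, not confirmed by this notebook.
    """
    actions = {
        "preprocess": lambda: pydicer.preprocess(),
        "convert": lambda: pydicer.convert.convert(patient=patient),
        "visualise": lambda: pydicer.visualise.visualise(patient=patient),
        "radiomics": lambda: pydicer.analyse.compute_radiomics(patient=patient),
        "dvh": lambda: pydicer.analyse.compute_dvh(patient=patient),
    }
    # Dictionaries preserve insertion order, so steps always run in pipeline order
    # regardless of the order in which they are listed in `steps`.
    for name, action in actions.items():
        if name in steps:
            action()


# Usage with the `pydicer` object created earlier in this notebook:
# run_selected_steps(pydicer, "HNSCC-01-0019", steps=("preprocess", "convert"))
```

Skipping the radiomics and DVH steps is one way to shorten a run when only the converted images are needed.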




## Prepare a dataset

Datasets extracted in DICOM format are often messy and require some cleaning up after conversion.
Exactly which data objects to extract for the clean dataset will differ by project, but here we
use the `rt_latest_dose` preparation function, a somewhat common approach of extracting the latest
dose object for each patient along with the objects linked to it (such as the structure set and image).

The resulting dataset is stored in a folder with your dataset name (`clean` for this example).


In [6]:
pydicer.dataset.prepare(dataset_name="clean", preparation_function="rt_latest_dose")
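
The idea of keeping only the latest object per patient can be sketched in plain Python. This is an illustration of the selection logic only, not PyDicer's implementation: the record fields (`patient_id`, `object_id`, `date`) and the sample records are hypothetical.

```python
from datetime import date

# Hypothetical records, one per converted object
records = [
    {"patient_id": "PAT-A", "object_id": "a1", "date": date(2020, 1, 5)},
    {"patient_id": "PAT-A", "object_id": "a2", "date": date(2020, 3, 9)},
    {"patient_id": "PAT-B", "object_id": "b1", "date": date(2019, 11, 2)},
]


def latest_per_patient(records):
    """Return a dict mapping each patient ID to its most recent record."""
    latest = {}
    for record in records:
        current = latest.get(record["patient_id"])
        if current is None or record["date"] > current["date"]:
            latest[record["patient_id"]] = record
    return latest


selected = latest_per_patient(records)
for patient_id, record in selected.items():
    print(patient_id, record["object_id"], record["date"])
```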

## Analyse the dataset

The pipeline computes first-order radiomics features by default, as well as dose volume histograms.
Here we extract the results into a pandas DataFrame for analysis.

In [7]:
# Display the computed radiomics features as a DataFrame
df_radiomics = pydicer.analyse.get_all_computed_radiomics_for_dataset(dataset_name="clean")
df_radiomics

| | Contour | Patient | ImageHashedUID | StructHashedUID | firstorder\|10Percentile | firstorder\|90Percentile | firstorder\|Energy | firstorder\|Entropy | firstorder\|InterquartileRange | firstorder\|Kurtosis | ... | firstorder\|Mean | firstorder\|Median | firstorder\|Minimum | firstorder\|Range | firstorder\|RobustMeanAbsoluteDeviation | firstorder\|RootMeanSquared | firstorder\|Skewness | firstorder\|TotalEnergy | firstorder\|Uniformity | firstorder\|Variance |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 17 | +1 | HNSCC-01-0019 | b281ea | 7cdcd9 | -999.0 | 339.0 | 517813000000.0 | 4.785593 | 407.0 | 4.227583 | ... | -127.630179 | 18.0 | -1024.0 | 4000.0 | 276.691595 | 565.017962 | 0.216289 | 1481475000000.0 | 0.076182 | 302955.835075 |
| 24 | -.3 | HNSCC-01-0019 | b281ea | 7cdcd9 | -68.0 | 669.0 | 158779400000.0 | 4.415742 | 73.0 | 9.756706 | ... | 142.809678 | 41.0 | -1024.0 | 4000.0 | 77.747524 | 410.946993 | 1.88675 | 454271500000.0 | 0.111299 | 148482.82697 |
| 23 | Brain | HNSCC-01-0019 | b281ea | 7cdcd9 | 24.0 | 70.0 | 642035700.0 | 1.69891 | 21.0 | 266.544166 | ... | 45.375629 | 40.0 | -849.0 | 2151.0 | 9.055957 | 57.813075 | 10.351606 | 1836879000.0 | 0.427362 | 1283.403884 |
| 29 | Brainstem | HNSCC-01-0019 | b281ea | 7cdcd9 | 14.0 | 46.0 | 18609140.0 | 1.275139 | 17.0 | 133.476694 | ... | 30.442222 | 30.0 | -26.0 | 581.0 | 7.063362 | 33.431514 | 3.682653 | 53241180.0 | 0.491624 | 190.937232 |
| 3 | CTV63 | HNSCC-01-0019 | b281ea | 7cdcd9 | -65.0 | 308.0 | 13466780000.0 | 4.1245 | 60.0 | 10.650191 | ... | 58.805603 | 37.0 | -1014.0 | 3990.0 | 36.643247 | 287.047711 | 0.524907 | 38528780000.0 | 0.125626 | 78938.289561 |
| 25 | CTV63_Sep | HNSCC-01-0019 | b281ea | 7cdcd9 | -409.0 | 498.2 | 8517524000.0 | 5.069875 | 137.0 | 5.673825 | ... | 37.578143 | 32.0 | -1014.0 | 3990.0 | 89.611043 | 407.993479 | 0.16345 | 24368830000.0 | 0.06147 | 165046.562407 |
| 22 | CTV70 | HNSCC-01-0019 | b281ea | 7cdcd9 | -34.0 | 209.0 | 5101866000.0 | 3.585289 | 46.0 | 16.344663 | ... | 69.823863 | 38.0 | -997.0 | 2572.0 | 24.191215 | 211.758612 | 1.98796 | 14596550000.0 | 0.164658 | 39966.337967 |
| 30 | Cord | HNSCC-01-0019 | b281ea | 7cdcd9 | 5.0 | 85.0 | 51364840.0 | 2.646141 | 34.0 | 25.991774 | ... | 48.951795 | 49.0 | -296.0 | 1058.0 | 14.980855 | 71.085212 | 2.263382 | 146956000.0 | 0.23373 | 2656.829157 |
| 27 | Cord_EXPANDED | HNSCC-01-0019 | b281ea | 7cdcd9 | 24.0 | 754.0 | 5387667000.0 | 4.928948 | 394.0 | 7.598874 | ... | 276.231702 | 123.0 | -296.0 | 3272.0 | 177.244691 | 417.346112 | 1.615239 | 15414240000.0 | 0.061246 | 97873.824272 |
| 4 | External | HNSCC-01-0019 | b281ea | 7cdcd9 | -78.0 | 582.0 | 166320800000.0 | 4.426843 | 79.0 | 10.768368 | ... | 121.129199 | 37.0 | -1024.0 | 4000.0 | 62.170197 | 389.791941 | 2.028668 | 475847600000.0 | 0.10493 | 137265.474901 |
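
Several of the first-order features are related to one another, which gives a quick sanity check on the output. The sketch below uses values copied from the Brainstem row above and assumes the usual first-order definitions (Range = Maximum - Minimum, population variance, and no intensity shift applied before computing the root mean squared). Whether PyDicer's radiomics backend is configured exactly this way is an assumption.

```python
import math

# Values copied from the Brainstem row of the radiomics output above.
brainstem = {
    "Mean": 30.442222,
    "Minimum": -26.0,
    "Range": 581.0,
    "RootMeanSquared": 33.431514,
    "Variance": 190.937232,
}

# Range is Maximum - Minimum, so the maximum intensity can be recovered.
maximum = brainstem["Minimum"] + brainstem["Range"]

# With population variance and no intensity shift: RMS^2 = Mean^2 + Variance.
rms_from_moments = math.sqrt(brainstem["Mean"] ** 2 + brainstem["Variance"])

print(f"Maximum intensity: {maximum}")
print(f"RMS from mean and variance: {rms_from_moments:.4f}")
print(f"RMS reported in the table:  {brainstem['RootMeanSquared']:.4f}")
```

The recomputed and reported values agree to within rounding, which suggests the assumed definitions hold for this output.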


In [8]:
# Extract the D95, D50 and V3 dose metrics
df_dose_metrics = pydicer.analyse.compute_dose_metrics(dataset_name="clean", d_point=[95, 50], v_point=[3])
df_dose_metrics

| | patient | struct_hash | dose_hash | label | cc | mean | D95 | D50 | V3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | HNSCC-01-0019 | 7cdcd9 | 309e1a | +1 | 4640.553474 | 29.67971 | 0.030835 | 25.358893 | 3709.925652 |
| 1 | HNSCC-01-0019 | 7cdcd9 | 309e1a | -.3 | 2689.948082 | 43.78429 | 11.384142 | 43.358867 | 2688.29155 |
| 2 | HNSCC-01-0019 | 7cdcd9 | 309e1a | Brain | 549.576759 | 24.043829 | 7.657684 | 22.339093 | 549.553871 |
| 3 | HNSCC-01-0019 | 7cdcd9 | 309e1a | Brainstem | 47.636032 | 39.998913 | 28.935185 | 39.202222 | 47.636032 |
| 4 | HNSCC-01-0019 | 7cdcd9 | 309e1a | CTV63 | 467.60273 | 71.274086 | 66.177108 | 71.589748 | 467.60273 |
| 5 | HNSCC-01-0019 | 7cdcd9 | 309e1a | CTV63_Sep | 146.395683 | 68.858795 | 64.433568 | 69.12256 | 146.395683 |
| 6 | HNSCC-01-0019 | 7cdcd9 | 309e1a | CTV70 | 325.512886 | 72.30452 | 69.484503 | 72.385685 | 325.512886 |
| 7 | HNSCC-01-0019 | 7cdcd9 | 309e1a | Cord | 29.082298 | 24.179092 | 3.371394 | 32.13 | 28.092384 |
| 8 | HNSCC-01-0019 | 7cdcd9 | 309e1a | Cord_EXPANDED | 88.497162 | 23.978754 | 3.222574 | 31.201053 | 84.640503 |
| 9 | HNSCC-01-0019 | 7cdcd9 | 309e1a | External | 3131.858826 | 41.149075 | 8.677738 | 39.766531 | 3123.699188 |
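
To make the meaning of these metrics concrete, here is a simplified sketch that computes them directly from a list of voxel doses. It follows the conventional definitions: D95 is the minimum dose received by the hottest 95% of the structure, and V3 is the volume receiving at least 3 dose units. In the output above, V3 appears to be an absolute volume, since it equals `cc` for structures that are fully covered. The helpers `d_metric` and `v_metric` and the toy data are hypothetical, and PyDicer's own calculation may bin and interpolate differently.

```python
import math


def d_metric(doses, percent):
    """Minimum dose received by the hottest `percent` % of voxels."""
    ordered = sorted(doses, reverse=True)
    # Number of voxels that make up `percent` % of the structure (at least one).
    n_voxels = max(1, math.ceil(len(ordered) * percent / 100))
    return ordered[n_voxels - 1]


def v_metric(doses, threshold, voxel_volume_cc):
    """Volume (cc) of voxels receiving at least `threshold` dose."""
    return sum(1 for dose in doses if dose >= threshold) * voxel_volume_cc


# Toy structure: 20 voxels of 0.5 cc each, with doses 1.0, 2.0, ..., 20.0
doses = [float(value) for value in range(1, 21)]

print(d_metric(doses, 95))     # 2.0
print(d_metric(doses, 50))     # 11.0
print(v_metric(doses, 3, 0.5)) # 9.0
```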
