# Med Imagetools Tutorial 1: Forming the dataset

In this tutorial, we show how Med-ImageTools can process a raw dataset that contains several imaging modalities, each needing its own processing pipeline. The aim is to show how easily a new, PyTorch-ready dataset can be formed for a machine learning pipeline.  
We showcase one of the crucial classes, AutoPipeline in imgtools/autopipeline.py, which processes all user-defined modalities and stores them as NRRD files while taking into account the relationships between the modalities.

## Setup

In [1]:
%pip install --quiet med-imagetools
%pip install --quiet med-imagetools[debug]


In [2]:
import os
import shutil
import pathlib
import urllib.request as request
from zipfile import ZipFile
import torchio as tio
from torch.utils.data import DataLoader
from imgtools.autopipeline import AutoPipeline
from imgtools.io import file_name_convention, Dataset
import pandas as pd

## Download Sample Dataset

In [3]:
curr_path = pathlib.Path().parent.parent.resolve()
quebec_path = pathlib.Path(os.path.join(curr_path, "data", "Head-Neck-PET-CT"))

# Download and unpack the sample data only if it is not already on disk
if not os.path.exists(quebec_path):
  print("Downloading the test dataset...")
  pathlib.Path(quebec_path).mkdir(parents=True, exist_ok=True)
  quebec_data_url = "https://github.com/bhklab/tcia_samples/blob/main/Head-Neck-PET-CT.zip?raw=true"
  quebec_zip_path = os.path.join(quebec_path, "Head-Neck-PET-CT.zip")
  request.urlretrieve(quebec_data_url, quebec_zip_path)
  with ZipFile(quebec_zip_path, 'r') as zipfile:
    zipfile.extractall(quebec_path)
  os.remove(quebec_zip_path)  # the archive is no longer needed once extracted
else:
  print("Data already present")

Downloading the test dataset...


Looking at the downloaded data in /content/data/Head-Neck-PET-CT/, the directory structure is not straightforward and would be difficult for any ML pipeline to consume directly. It is also not clear from the file structure which modalities are present or how they relate to one another.
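
One quick way to see this is to print the first few levels of the downloaded folder. The sketch below uses only the standard library; the helper name `summarize_tree` and its `max_depth` parameter are illustrative and not part of med-imagetools.

```python
import os

def summarize_tree(root, max_depth=3):
    """Return indented lines for the folders under `root`, up to `max_depth` levels deep."""
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        depth = dirpath.count(os.sep) - base_depth
        if depth >= max_depth:
            dirnames[:] = []  # stop descending below this level
        indent = "  " * depth
        lines.append(f"{indent}{os.path.basename(dirpath)}/ ({len(filenames)} files)")
    return lines
```

In the notebook, `print("\n".join(summarize_tree(quebec_path)))` lists the patient, study, and series folders with their file counts, which shows that modality information is not visible from the folder names alone.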

## Autopipeline

For this test case, let's consider a dataset with the following modalities:  
1. CT
2. RTSTRUCT
3. RTDOSE 

AutoPipeline first crawls through the raw dataset, indexes it, and saves the index alongside the data folder. Because modalities can be related in many different ways, an edge table is then formed from the crawl. Using the user-defined modalities, the resulting graph is queried, and the relevant files are returned along with their relationships.
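
To make the idea of an edge table concrete, here is a minimal sketch that links toy crawl records by matching a structure set's `reference_ct` to a CT's `series`. The column names mirror the crawl output shown later in this tutorial, but the UIDs, the `build_edges` helper, and the matching rule are simplified assumptions, not the library's actual implementation.

```python
def build_edges(records):
    """Link each RTSTRUCT record to the CT series it references (same patient only)."""
    ct_by_series = {r["series"]: r for r in records if r["modality"] == "CT"}
    edges = []
    for r in records:
        if r["modality"] != "RTSTRUCT":
            continue
        ct = ct_by_series.get(r["reference_ct"])
        if ct is not None and ct["patient_ID"] == r["patient_ID"]:
            edges.append((r["patient_ID"], ct["series"], r["series"]))
    return edges

# Toy records with made-up UIDs (hypothetical data)
records = [
    {"patient_ID": "P1", "modality": "CT", "series": "ct-1", "reference_ct": None},
    {"patient_ID": "P1", "modality": "RTSTRUCT", "series": "rs-1", "reference_ct": "ct-1"},
    {"patient_ID": "P2", "modality": "CT", "series": "ct-2", "reference_ct": None},
    # This structure set points at a CT that was never crawled, so no edge is formed
    {"patient_ID": "P2", "modality": "RTSTRUCT", "series": "rs-2", "reference_ct": "ct-9"},
]

edges = build_edges(records)
print(edges)  # [('P1', 'ct-1', 'rs-1')]
```

Querying for "CT,RTSTRUCT" then amounts to keeping the patients that appear in such edges; in this toy example only P1 qualifies.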

AutoPipeline takes 4 main inputs:
1. input_directory: (str) Location of the raw dataset
2. output_directory: (str) Location where the processed dataset will be saved
3. modalities: (str) Comma-separated modalities to process, e.g. "CT,RTSTRUCT,RTDOSE"; the relationships between them are taken into account
4. spacing: (tuple(int,int,int)) Voxel spacing along the x, y, z axes
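
The spacing input determines the voxel grid of the saved volumes. As a rough illustration of the arithmetic, the sketch below estimates the resampled grid size, assuming the physical extent of the image is preserved and that spacings are in mm (as is usual for DICOM). The input size and spacing are hypothetical values, not read from this dataset, and the library's exact rounding may differ.

```python
def resampled_size(size, spacing, new_spacing):
    """Estimate the voxel grid size after resampling from `spacing` to `new_spacing`."""
    return tuple(
        int(round(n * s / ns)) for n, s, ns in zip(size, spacing, new_spacing)
    )

# Hypothetical CT: 512 x 512 x 100 voxels at 1 x 1 x 3 mm, resampled to 5 mm isotropic
print(resampled_size((512, 512, 100), (1.0, 1.0, 3.0), (5, 5, 5)))  # (102, 102, 60)
```

A coarse spacing such as (5,5,5) keeps the volumes small, which is convenient for a quick tutorial run.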

In [4]:
output_path = pathlib.Path(os.path.join(curr_path, "processed_dataset"))
pipeline = AutoPipeline(input_directory = quebec_path,
                        output_directory = output_path,
                        modalities = "CT,RTSTRUCT,RTDOSE",
                        spacing = (5,5,5),
                        n_jobs = 2,
                        visualize = True
                        )
pipeline.run()

Couldn't find the dataset index CSV. Indexing the dataset...


100%|██████████| 2/2 [00:00<00:00, 22.73it/s]


Number of patients in the dataset: 2
Edge table not present. Forming the edge table based on the crawl data...


100%|██████████| 2/2 [00:00<00:00, 24.91it/s]


Total time taken: 0.08674931526184082
Saving edge table in /content/data/imgtools_Head-Neck-PET-CT_edges.csv





Generating visualizations...
Forming the graph based on the given modalities: CT,RTSTRUCT,RTDOSE
There are 2 cases containing all CT,RTSTRUCT,RTDOSE modalities.




## Output Files
AutoPipeline produces 4 files along with the processed images:
1. /data/imgtools_*{dataset_name}*.csv: crawl output
2. /data/imgtools_*{dataset_name}*_edges.csv: edge table
3. /data/datanet.html: graph visualization of the dataset
4. *{output_path}*/dataset.csv: metadata of the saved components
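
The cells in this notebook read the first three files from the parent of the dataset folder, which agrees with the log line `Saving edge table in /content/data/imgtools_Head-Neck-PET-CT_edges.csv`. The small helper below collects those locations in one place; `expected_outputs` is a hypothetical convenience function, not part of med-imagetools, and other library versions may place these files elsewhere.

```python
from pathlib import Path

def expected_outputs(input_directory, output_directory):
    """Return the paths where this tutorial looks for the four output files."""
    input_directory = Path(input_directory)
    parent = input_directory.parent
    name = input_directory.name
    return {
        "crawl": parent / f"imgtools_{name}.csv",
        "edges": parent / f"imgtools_{name}_edges.csv",
        "graph": parent / "datanet.html",
        "dataset": Path(output_directory) / "dataset.csv",
    }

paths = expected_outputs("/content/data/Head-Neck-PET-CT", "/content/processed_dataset")
for label, path in paths.items():
    print(f"{label}: {path}")
```

In the notebook, `expected_outputs(quebec_path, output_path)` gives the same paths that the following cells build by hand with `os.path.join`.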

### Crawl output

In [11]:
df = pd.read_csv(os.path.join(os.path.dirname(quebec_path),"imgtools_Head-Neck-PET-CT.csv"),index_col=0)
df

Unnamed: 0,folder,instance_uid,instances,modality,patient_ID,reference_ct,reference_frame,reference_pl,reference_rs,series,series_description,study,study_description
0,/content/data/Head-Neck-PET-CT/HN-CHUS-082/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.766477766548...,1.0,RTSTRUCT,HN-CHUS-082,1.3.6.1.4.1.14519.5.2.1.5168.2407.289342954540...,1.3.6.1.4.1.14519.5.2.1.5168.2407.175687945653...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.766477766548...,RTstruct_CTsim->PET(PET-CT),1.3.6.1.4.1.14519.5.2.1.5168.2407.510837068387...,TEP cancerologique [TEP]
1,/content/data/Head-Neck-PET-CT/HN-CHUS-082/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.105007968190...,234.0,PT,HN-CHUS-082,,1.3.6.1.4.1.14519.5.2.1.5168.2407.175687945653...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.289342954540...,LOR-RAMLA,1.3.6.1.4.1.14519.5.2.1.5168.2407.510837068387...,TEP cancerologique [TEP]
2,/content/data/Head-Neck-PET-CT/HN-CHUS-082/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.179765748272...,2.0,RTSTRUCT,HN-CHUS-082,1.3.6.1.4.1.14519.5.2.1.5168.2407.208685212796...,1.3.6.1.4.1.14519.5.2.1.5168.2407.175687945653...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.156522899867...,Pinnacle POI,1.3.6.1.4.1.14519.5.2.1.5168.2407.510837068387...,TEP cancerologique [TEP]
3,/content/data/Head-Neck-PET-CT/HN-CHUS-082/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.282571492555...,1.0,RTDOSE,HN-CHUS-082,,1.3.6.1.4.1.14519.5.2.1.5168.2407.175687945653...,1.3.6.1.4.1.14519.5.2.1.5168.2407.121619654285...,,1.3.6.1.4.1.14519.5.2.1.5168.2407.104174488062...,,1.3.6.1.4.1.14519.5.2.1.5168.2407.510837068387...,TEP cancerologique [TEP]
4,/content/data/Head-Neck-PET-CT/HN-CHUS-082/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.121619654285...,1.0,RTPLAN,HN-CHUS-082,,1.3.6.1.4.1.14519.5.2.1.5168.2407.175687945653...,,1.3.6.1.4.1.14519.5.2.1.5168.2407.179765748272...,1.3.6.1.4.1.14519.5.2.1.5168.2407.112437338536...,,1.3.6.1.4.1.14519.5.2.1.5168.2407.510837068387...,TEP cancerologique [TEP]
5,/content/data/Head-Neck-PET-CT/HN-CHUS-082/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.329354641638...,134.0,CT,HN-CHUS-082,1.3.6.1.4.1.14519.5.2.1.5168.2407.575900405303...,1.3.6.1.4.1.14519.5.2.1.5168.2407.175687945653...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.208685212796...,Merged,1.3.6.1.4.1.14519.5.2.1.5168.2407.510837068387...,TEP cancerologique [TEP]
6,/content/data/Head-Neck-PET-CT/HN-CHUS-052/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.645289681167...,1.0,RTPLAN,HN-CHUS-052,,1.3.6.1.4.1.14519.5.2.1.5168.2407.162021466661...,,1.3.6.1.4.1.14519.5.2.1.5168.2407.196238472009...,1.3.6.1.4.1.14519.5.2.1.5168.2407.180996930421...,,1.3.6.1.4.1.14519.5.2.1.5168.2407.270192284011...,Oroph_CB.0:Oroph_CB::TRTID derived (StudyInsta...
7,/content/data/Head-Neck-PET-CT/HN-CHUS-052/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.196238472009...,1.0,RTSTRUCT,HN-CHUS-052,1.3.6.1.4.1.14519.5.2.1.5168.2407.316675519384...,1.3.6.1.4.1.14519.5.2.1.5168.2407.162021466661...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.259673657557...,Pinnacle POI,1.3.6.1.4.1.14519.5.2.1.5168.2407.270192284011...,Oroph_CB.0:Oroph_CB::TRTID derived (StudyInsta...
8,/content/data/Head-Neck-PET-CT/HN-CHUS-052/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.172774347407...,1.0,RTDOSE,HN-CHUS-052,,1.3.6.1.4.1.14519.5.2.1.5168.2407.162021466661...,1.3.6.1.4.1.14519.5.2.1.5168.2407.645289681167...,,1.3.6.1.4.1.14519.5.2.1.5168.2407.241156363783...,,1.3.6.1.4.1.14519.5.2.1.5168.2407.270192284011...,Oroph_CB.0:Oroph_CB::TRTID derived (StudyInsta...
9,/content/data/Head-Neck-PET-CT/HN-CHUS-052/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.174060535505...,132.0,CT,HN-CHUS-052,1.3.6.1.4.1.14519.5.2.1.5168.2407.127778300374...,1.3.6.1.4.1.14519.5.2.1.5168.2407.162021466661...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.316675519384...,Merged,1.3.6.1.4.1.14519.5.2.1.5168.2407.270192284011...,Oroph_CB.0:Oroph_CB::TRTID derived (StudyInsta...
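
A crawl table like this is easier to read once it is summarized per patient. The sketch below counts series by modality using the `patient_ID` and `modality` columns; the `modalities_per_patient` helper is illustrative, and the rows are a small hand-written subset rather than the full table.

```python
from collections import Counter, defaultdict

def modalities_per_patient(rows):
    """Count how many series of each modality every patient has."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["patient_ID"]][row["modality"]] += 1
    return {patient: dict(c) for patient, c in counts.items()}

# Hand-written subset of the crawl output above
rows = [
    {"patient_ID": "HN-CHUS-082", "modality": "CT"},
    {"patient_ID": "HN-CHUS-082", "modality": "RTSTRUCT"},
    {"patient_ID": "HN-CHUS-082", "modality": "RTSTRUCT"},
    {"patient_ID": "HN-CHUS-052", "modality": "CT"},
]
print(modalities_per_patient(rows))
```

With the DataFrame loaded in the previous cell, the same helper can be called as `modalities_per_patient(df.to_dict("records"))`.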


### Edge table

In [15]:
df = pd.read_csv(os.path.join(os.path.dirname(quebec_path),"imgtools_Head-Neck-PET-CT_edges.csv"))
df

Unnamed: 0,folder_x,instance_uid_x,instances_x,modality_x,patient_ID_x,reference_ct_x,reference_frame_x,reference_pl_x,reference_rs_x,series_x,series_description_x,study_x,study_description_x,folder_y,instance_uid_y,instances_y,modality_y,reference_ct_y,reference_frame_y,reference_pl_y,reference_rs_y,series_y,edge_type
0,/content/data/Head-Neck-PET-CT/HN-CHUS-082/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.179765748272...,2.0,RTSTRUCT,HN-CHUS-082,1.3.6.1.4.1.14519.5.2.1.5168.2407.208685212796...,1.3.6.1.4.1.14519.5.2.1.5168.2407.175687945653...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.156522899867...,Pinnacle POI,1.3.6.1.4.1.14519.5.2.1.5168.2407.510837068387...,TEP cancerologique [TEP],/content/data/Head-Neck-PET-CT/HN-CHUS-082/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.282571492555...,1.0,RTDOSE,,1.3.6.1.4.1.14519.5.2.1.5168.2407.175687945653...,1.3.6.1.4.1.14519.5.2.1.5168.2407.121619654285...,1.3.6.1.4.1.14519.5.2.1.5168.2407.179765748272...,1.3.6.1.4.1.14519.5.2.1.5168.2407.104174488062...,0
1,/content/data/Head-Neck-PET-CT/HN-CHUS-082/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.329354641638...,134.0,CT,HN-CHUS-082,1.3.6.1.4.1.14519.5.2.1.5168.2407.575900405303...,1.3.6.1.4.1.14519.5.2.1.5168.2407.175687945653...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.208685212796...,Merged,1.3.6.1.4.1.14519.5.2.1.5168.2407.510837068387...,TEP cancerologique [TEP],/content/data/Head-Neck-PET-CT/HN-CHUS-082/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.179765748272...,2.0,RTSTRUCT,1.3.6.1.4.1.14519.5.2.1.5168.2407.208685212796...,1.3.6.1.4.1.14519.5.2.1.5168.2407.175687945653...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.156522899867...,2
2,/content/data/Head-Neck-PET-CT/HN-CHUS-082/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.105007968190...,234.0,PT,HN-CHUS-082,,1.3.6.1.4.1.14519.5.2.1.5168.2407.175687945653...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.289342954540...,LOR-RAMLA,1.3.6.1.4.1.14519.5.2.1.5168.2407.510837068387...,TEP cancerologique [TEP],/content/data/Head-Neck-PET-CT/HN-CHUS-082/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.766477766548...,1.0,RTSTRUCT,1.3.6.1.4.1.14519.5.2.1.5168.2407.289342954540...,1.3.6.1.4.1.14519.5.2.1.5168.2407.175687945653...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.766477766548...,3
3,/content/data/Head-Neck-PET-CT/HN-CHUS-082/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.329354641638...,134.0,CT,HN-CHUS-082,1.3.6.1.4.1.14519.5.2.1.5168.2407.575900405303...,1.3.6.1.4.1.14519.5.2.1.5168.2407.175687945653...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.208685212796...,Merged,1.3.6.1.4.1.14519.5.2.1.5168.2407.510837068387...,TEP cancerologique [TEP],/content/data/Head-Neck-PET-CT/HN-CHUS-082/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.105007968190...,234.0,PT,,1.3.6.1.4.1.14519.5.2.1.5168.2407.175687945653...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.289342954540...,4
4,/content/data/Head-Neck-PET-CT/HN-CHUS-052/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.196238472009...,1.0,RTSTRUCT,HN-CHUS-052,1.3.6.1.4.1.14519.5.2.1.5168.2407.316675519384...,1.3.6.1.4.1.14519.5.2.1.5168.2407.162021466661...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.259673657557...,Pinnacle POI,1.3.6.1.4.1.14519.5.2.1.5168.2407.270192284011...,Oroph_CB.0:Oroph_CB::TRTID derived (StudyInsta...,/content/data/Head-Neck-PET-CT/HN-CHUS-052/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.172774347407...,1.0,RTDOSE,,1.3.6.1.4.1.14519.5.2.1.5168.2407.162021466661...,1.3.6.1.4.1.14519.5.2.1.5168.2407.645289681167...,1.3.6.1.4.1.14519.5.2.1.5168.2407.196238472009...,1.3.6.1.4.1.14519.5.2.1.5168.2407.241156363783...,0
5,/content/data/Head-Neck-PET-CT/HN-CHUS-052/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.174060535505...,132.0,CT,HN-CHUS-052,1.3.6.1.4.1.14519.5.2.1.5168.2407.127778300374...,1.3.6.1.4.1.14519.5.2.1.5168.2407.162021466661...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.316675519384...,Merged,1.3.6.1.4.1.14519.5.2.1.5168.2407.270192284011...,Oroph_CB.0:Oroph_CB::TRTID derived (StudyInsta...,/content/data/Head-Neck-PET-CT/HN-CHUS-052/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.196238472009...,1.0,RTSTRUCT,1.3.6.1.4.1.14519.5.2.1.5168.2407.316675519384...,1.3.6.1.4.1.14519.5.2.1.5168.2407.162021466661...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.259673657557...,2
6,/content/data/Head-Neck-PET-CT/HN-CHUS-052/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.105065754632...,255.0,PT,HN-CHUS-052,,1.3.6.1.4.1.14519.5.2.1.5168.2407.162021466661...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.267511642604...,LOR-RAMLA,1.3.6.1.4.1.14519.5.2.1.5168.2407.270192284011...,Oroph_CB.0:Oroph_CB::TRTID derived (StudyInsta...,/content/data/Head-Neck-PET-CT/HN-CHUS-052/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.442248252736...,1.0,RTSTRUCT,1.3.6.1.4.1.14519.5.2.1.5168.2407.267511642604...,1.3.6.1.4.1.14519.5.2.1.5168.2407.162021466661...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.442248252736...,3
7,/content/data/Head-Neck-PET-CT/HN-CHUS-052/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.174060535505...,132.0,CT,HN-CHUS-052,1.3.6.1.4.1.14519.5.2.1.5168.2407.127778300374...,1.3.6.1.4.1.14519.5.2.1.5168.2407.162021466661...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.316675519384...,Merged,1.3.6.1.4.1.14519.5.2.1.5168.2407.270192284011...,Oroph_CB.0:Oroph_CB::TRTID derived (StudyInsta...,/content/data/Head-Neck-PET-CT/HN-CHUS-052/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.105065754632...,255.0,PT,,1.3.6.1.4.1.14519.5.2.1.5168.2407.162021466661...,,,1.3.6.1.4.1.14519.5.2.1.5168.2407.267511642604...,4
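
The `edge_type` column is an integer code. Reading the rows above, code 0 pairs RTSTRUCT with RTDOSE, 2 pairs CT with RTSTRUCT, 3 pairs PT with RTSTRUCT, and 4 pairs CT with PT. The sketch below recovers that mapping from the `modality_x`, `modality_y`, and `edge_type` columns; the `edge_type_pairs` helper is illustrative, and codes that do not appear in this table are not covered.

```python
def edge_type_pairs(rows):
    """Map each edge_type code to the set of (modality_x, modality_y) pairs it links."""
    pairs = {}
    for row in rows:
        pairs.setdefault(int(row["edge_type"]), set()).add(
            (row["modality_x"], row["modality_y"])
        )
    return pairs

# The four edges listed above for patient HN-CHUS-082
rows = [
    {"modality_x": "RTSTRUCT", "modality_y": "RTDOSE", "edge_type": 0},
    {"modality_x": "CT", "modality_y": "RTSTRUCT", "edge_type": 2},
    {"modality_x": "PT", "modality_y": "RTSTRUCT", "edge_type": 3},
    {"modality_x": "CT", "modality_y": "PT", "edge_type": 4},
]
for code, linked in sorted(edge_type_pairs(rows).items()):
    print(code, sorted(linked))
```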


### Graph visualization  
For a better view, download the datanet.html file and open it in a browser.  
You can also change the size of the graph by editing the HTML file.

In [26]:
import IPython
IPython.display.HTML(filename=os.path.join(os.path.dirname(quebec_path),"datanet.html"))

### Metadata and Component table

In [22]:
df = pd.read_csv(os.path.join(output_path,"dataset.csv"))
df.head()

Unnamed: 0.1,Unnamed: 0,study,patient_ID,series_CT,folder_CT,series_RTSTRUCT_CT,folder_RTSTRUCT_CT,series_RTDOSE_CT,folder_RTDOSE_CT,size_CT,metadata_RTSTRUCT_CT,size_RTDOSE_CT,metadata_RTDOSE_CT
0,0_HN-CHUS-052,1.3.6.1.4.1.14519.5.2.1.5168.2407.270192284011...,HN-CHUS-052,1.3.6.1.4.1.14519.5.2.1.5168.2407.316675519384...,/content/data/Head-Neck-PET-CT/HN-CHUS-052/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.259673657557...,/content/data/Head-Neck-PET-CT/HN-CHUS-052/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.241156363783...,/content/data/Head-Neck-PET-CT/HN-CHUS-052/08-...,"(120, 120, 79)","[['IsoCT', 'Isocentre', 'GTV1', 'GTV2 GG', 'CT...","(120, 120, 79)",[{}]
1,1_HN-CHUS-082,1.3.6.1.4.1.14519.5.2.1.5168.2407.510837068387...,HN-CHUS-082,1.3.6.1.4.1.14519.5.2.1.5168.2407.208685212796...,/content/data/Head-Neck-PET-CT/HN-CHUS-082/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.156522899867...,/content/data/Head-Neck-PET-CT/HN-CHUS-082/08-...,1.3.6.1.4.1.14519.5.2.1.5168.2407.104174488062...,/content/data/Head-Neck-PET-CT/HN-CHUS-082/08-...,"(120, 120, 80)","[['Iso Nasopharynx', 'GTV1', 'CTV1', 'CTV2', '...","(120, 120, 80)",[{}]
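
Because dataset.csv is plain text, tuple-valued columns such as `size_CT` come back from `pd.read_csv` as strings like `"(120, 120, 79)"`. A minimal sketch for turning them back into tuples with the standard library follows; `parse_size` is a hypothetical helper name.

```python
import ast

def parse_size(cell):
    """Convert a size string such as "(120, 120, 79)" into a tuple of ints."""
    value = ast.literal_eval(cell)  # safely evaluates Python literals only
    return tuple(int(v) for v in value)

# Values copied from the size_CT column above
sizes = {
    "0_HN-CHUS-052": parse_size("(120, 120, 79)"),
    "1_HN-CHUS-082": parse_size("(120, 120, 80)"),
}
print(sizes["0_HN-CHUS-052"])  # (120, 120, 79)
```

In the notebook this can be applied to the whole column with `df["size_CT"].apply(parse_size)`, for example to check that every CT and its matching RTDOSE volume have the same shape.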
