# Object-oriented Programming for Medical Imaging Functionality Demo

This demonstration showcases the potential of an object-oriented approach to managing medical image data in Python in preparation for machine learning (ML). We will demonstrate the framework using the CT scans of the first 30 subjects in the following dataset: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/6ACUZJ

The dataset consists of CT images of subjects with confirmed lung infections after a positive COVID-19 diagnosis. Each subject has its own directory, which may or may not contain additional subdirectories. For each subject, a single one of these directories holds a series of .dcm (DICOM) files that together make up several CT volumes, each representing a different type of CT scan for that patient.

We will demonstrate how to use the library to view and analyse the scans individually, per patient and across the whole dataset. We will also demonstrate how these objects can be saved in the file formats common in machine-learning-assisted medical imaging.



In [17]:
from Management import medOOP_3D
from Visualisation import medVIZ_3D
import Parsing.parse_funcs as pf
from tqdm import tqdm


First, we will create a list of pairs consisting of our subject directory path and the subject title.

In [18]:
dicom_root_dir = "/Users/eamonmcandrew/Desktop/ML/CT_work/Covid_Positive_CT/Covid_Positive_CT_Dicom"
nifti_output_root_dir = "/Users/eamonmcandrew/Desktop/ML/CT_work/Covid_Positive_CT/Covid_Positive_CT_Nifti"
slice_output_root_dir = "/Users/eamonmcandrew/Desktop/ML/CT_work/Covid_Positive_CT/Covid_Positive_Slices"



dir_list = pf.get_sub_directories(dicom_root_dir)
dir_list = pf.inner_sub_dir_replace(dir_list) 
paired_list = pf.get_path_id_list(dir_list)




Here is what this list looks like. Because some of the patient directories contain multiple subdirectories, this extra step is necessary for this dataset to achieve a consistent patient naming convention. In the future, it may be a good idea to add a regex-powered search function to do this within the library itself, but for now this is the simplest and most customisable way of doing it.

In [19]:
paired_list[:3]

[('/Users/eamonmcandrew/Desktop/ML/CT_work/Covid_Positive_CT/Covid_Positive_CT_Dicom/Subject (1009)',
  'Subject_(1009)'),
 ('/Users/eamonmcandrew/Desktop/ML/CT_work/Covid_Positive_CT/Covid_Positive_CT_Dicom/Subject (1003)',
  'Subject_(1003)'),
 ('/Users/eamonmcandrew/Desktop/ML/CT_work/Covid_Positive_CT/Covid_Positive_CT_Dicom/Subject (1000)',
  'Subject_(1000)')]
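
The regex-powered search mentioned above is not part of the library yet. As a rough sketch of what such a helper might look like, the hypothetical function `normalise_subject_id` below extracts the numeric part of a directory name such as `Subject (1009)` and returns an identifier-friendly ID. The function name, the pattern and the output format are all assumptions made for illustration, and the path in the example is a placeholder.

```python
import re
from pathlib import PurePosixPath

# Hypothetical helper, not part of the library. The pattern accepts names such
# as "Subject (1009)", "Subject_(1009)" or "Subject 1009" and captures the digits.
_SUBJECT_PATTERN = re.compile(r"^Subject[ _]*\(?(\d+)\)?$")


def normalise_subject_id(directory: str) -> str:
    """Return an ID such as 'Subject_1009' for a subject directory path."""
    name = PurePosixPath(directory).name  # last path component only
    match = _SUBJECT_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognised subject directory: {name!r}")
    return f"Subject_{match.group(1)}"


# Placeholder path, used only to exercise the helper
print(normalise_subject_id("/data/Covid_Positive_CT_Dicom/Subject (1009)"))
```

Raising an error on an unrecognised name, rather than guessing, keeps a misnamed directory from silently producing a wrong patient ID.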

Let's use one of the pairs in this list to create a patient object and explore its functionality.

In [20]:
Paitent_1 = medOOP_3D.Patient(paired_list[0][0], paired_list[0][1])




First, let us check we are working with the correct patient by checking the patient ID.

In [21]:
Paitent_1.patient_id

'Subject_(1009)'

A Patient object is a collection of one or more Scan objects belonging to a single subject, together with their associated metadata. All we need to do is supply a directory and a title to the class constructor, and it will automatically generate a Scan object for each individual scan from the underlying DICOM files in the directory.

Let's take a look at the scan objects attributed to this patient.

In [22]:
Paitent_1.scan_list

['UNNAMED_SERIES', 'LUNG_3_mm', 'LUNG', 'Mediastinum']

Let's now take a closer look at the scan titled "LUNG".

In [23]:
Paitent_1.LUNG

Scan Object titled LUNG, with dimensions (158, 512, 512)
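
The library's internals are not shown in this notebook, but the attribute-style access used above can be illustrated with a minimal sketch. The classes `MiniScan` and `MiniPatient` below are hypothetical stand-ins, not the library's implementation; they assume each scan is attached to its patient with `setattr` and described through `__repr__`.

```python
class MiniScan:
    """Hypothetical stand-in for a scan: a title plus the volume's shape."""

    def __init__(self, title, shape):
        self.title = title
        self.shape = shape  # (slices, rows, columns)

    def __repr__(self):
        return f"Scan Object titled {self.title}, with dimensions {self.shape}"


class MiniPatient:
    """Hypothetical stand-in for a patient that exposes scans as attributes."""

    def __init__(self, patient_id, scans):
        self.patient_id = patient_id
        self.scan_list = [scan.title for scan in scans]
        for scan in scans:
            # setattr accepts any string, so every title becomes an attribute
            setattr(self, scan.title, scan)


patient = MiniPatient(
    "Subject_(1009)",
    [
        MiniScan("LUNG", (158, 512, 512)),  # shape copied from the output above
        MiniScan("Lung_1.5", (2, 4, 4)),    # toy title and shape for illustration
    ],
)
print(patient.LUNG)
print(getattr(patient, "Lung_1.5"))
```

A title such as `Lung_1.5` is not a valid Python identifier, so dot access cannot reach it; `getattr` with the title as a string still works.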

We can quickly write this scan to a directory of our choice in several different formats; here we will convert the 3D volume into 2D axial slices in PNG format.

In [24]:
Paitent_1.LUNG.Vol_to_slices(slice_output_root_dir)
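
To make the idea of slicing a volume concrete, here is a standard-library-only sketch of splitting a volume along its first axis and writing each slice as an 8-bit greyscale PNG. It is not the code behind `Vol_to_slices`: the function names, the file naming scheme and the intensity window of -1000 to 400 used for rescaling are all assumptions chosen for illustration.

```python
import struct
import tempfile
import zlib
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(tag, data):
    """Build one PNG chunk: length, tag, data, CRC over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def write_gray_png(path, rows):
    """Write rows of 0..255 integers as an 8-bit greyscale PNG."""
    height, width = len(rows), len(rows[0])
    # Each scanline is prefixed with filter type 0 (no filtering)
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    png = (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
           + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b""))
    Path(path).write_bytes(png)


def rescale_to_uint8(slice_2d, lo, hi):
    """Clip intensities to [lo, hi] and map them linearly onto 0..255."""
    span = hi - lo
    return [[int(255 * (min(max(v, lo), hi) - lo) / span) for v in row]
            for row in slice_2d]


def volume_to_slices(volume, out_dir, title, lo=-1000, hi=400):
    """Write one PNG per slice along the first axis; return the file paths."""
    target = Path(out_dir) / title
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, slice_2d in enumerate(volume):
        path = target / f"{title}_{index:04d}.png"
        write_gray_png(path, rescale_to_uint8(slice_2d, lo, hi))
        paths.append(path)
    return paths


# Toy volume with 2 slices of 2 rows x 3 columns, written to a temporary directory
toy_volume = [
    [[-1000, 0, 400], [-2000, 50, 3000]],
    [[-500, -500, -500], [100, 200, 300]],
]
paths = volume_to_slices(toy_volume, tempfile.mkdtemp(), "LUNG")
print([p.name for p in paths])
```

In practice an imaging library would normally handle the PNG encoding; the hand-written encoder is used here only so the sketch has no dependencies.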

If we want to quickly examine the scans, we can use the following to display the image and use the slider to examine the whole volume: *NOTE this is disabled at present because it stops GitHub from rendering the example notebook - but it should function perfectly with a local install*

In [25]:
# Paitent_1.LUNG.display_3D_volume()

Working at the patient level is useful, but we often need to work with large datasets. We can do this by passing the entire list to the Dataset class constructor.

In [26]:
Dataset_1 = medOOP_3D.Dataset(paired_list)


Processing Subject_1002: 100%|██████████| 30/30 [00:44<00:00,  1.47s/it]


The constructor will automatically create a Patient object for each subject and a Scan object for each scan attributed to that patient.

Let's look up the same patient we used in the earlier steps, this time through the Dataset object.

In [27]:
Dataset_1.Subject_1009

Patient Object concisting of 4 scans : UNNAMED_SERIES, LUNG_3_mm, LUNG, Mediastinum

Let's take a look at another patient in the dataset. Selecting one is easy because the patients are attributes of the Dataset object, so editor autocompletion will list them. As you can see, this patient has a different number of scans from the previous example.

In [28]:
Dataset_1.Subject_102

Patient Object concisting of 5 scans : UNNAMED_SERIES, Mediastinum, Lung, Lung_1.5, Mediastinum_1.5
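
How the library makes patient names discoverable is not shown in this notebook. One common Python approach, sketched here with a hypothetical `MiniDataset` class, is to resolve unknown attributes through `__getattr__` and to advertise the patient IDs through `__dir__`, which interactive shells typically consult when offering completions. The class and its internal `_patients` mapping are assumptions for illustration only.

```python
class MiniDataset:
    """Hypothetical stand-in for a dataset keyed by patient ID."""

    def __init__(self, patients):
        # patients: mapping of patient ID -> any patient-like object
        self._patients = dict(patients)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails
        patients = self.__dict__.get("_patients", {})
        try:
            return patients[name]
        except KeyError:
            raise AttributeError(f"No patient named {name!r}") from None

    def __dir__(self):
        # Add the patient IDs to the names reported by dir()
        return sorted(set(super().__dir__()) | set(self._patients))


dataset = MiniDataset({"Subject_1009": "patient A", "Subject_102": "patient B"})
print([name for name in dir(dataset) if name.startswith("Subject_")])
print(dataset.Subject_102)
```

Reading `_patients` through `self.__dict__` inside `__getattr__` avoids infinite recursion if the attribute is requested before `__init__` has run.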

We can have a closer look at the scans attributed to this patient.

In [29]:
Dataset_1.Subject_102.scan_list

['UNNAMED_SERIES', 'Mediastinum', 'Lung', 'Lung_1.5', 'Mediastinum_1.5']

Perhaps a more detailed description of each scan will help us understand what's going on here.

In [30]:
for scan in Dataset_1.Subject_102:
    print(scan)

Scan Object titled UNNAMED_SERIES, with dimensions (1, 512, 512)
Scan Object titled Mediastinum, with dimensions (46, 512, 512)
Scan Object titled Lung, with dimensions (37, 512, 512)
Scan Object titled Lung_1.5, with dimensions (181, 512, 512)
Scan Object titled Mediastinum_1.5, with dimensions (181, 512, 512)


We now have a Dataset object that consists of 30 patients, each with several distinct Scan objects (between three and five in this subset).

In [31]:
Dataset_1.num_patients

30

Here we will quickly list the patients by their IDs to ensure they are all correct.

In [34]:
Dataset_1.patient_list

['Subject_1009',
 'Subject_1003',
 'Subject_1000',
 'Subject_1011',
 'Subject_1018',
 'Subject_1012',
 'Subject_102',
 'Subject_1016',
 'Subject_10',
 'Subject_1015',
 'Subject_101',
 'Subject_1021',
 'Subject_1004',
 'Subject_1022',
 'Subject_1007',
 'Subject_1023',
 'Subject_1006',
 'Subject_1020',
 'Subject_1005',
 'Subject_1014',
 'Subject_100',
 'Subject_1017',
 'Subject_1019',
 'Subject_1013',
 'Subject_1',
 'Subject_1010',
 'Subject_1024',
 'Subject_1001',
 'Subject_1008',
 'Subject_1002']

To demonstrate the composability of the Dataset, Patient and Scan objects, I have written the following loops directly in the notebook. In the finished version, this and similar functionality will have dedicated function calls, but the loops serve as an example of the power enabled by the object-oriented approach.

First, let us create a directory of PNG slices for each scan titled "LUNG". This task takes six lines of code to complete (fancy progress bar included) and is highly customisable, allowing a user to select precisely which patients and scans they want.

In [39]:
pbar = tqdm(Dataset_1)
for paitent in pbar:
    for scan in paitent:
        if scan.title == "LUNG":
            scan.Vol_to_slices(slice_output_root_dir)
            pbar.set_description(f"Creating PNG slices for {paitent.patient_id} : {scan.title}")

           

Creating PNG slices for Subject_1002 : LUNG: 100%|██████████| 30/30 [00:19<00:00,  1.57it/s]
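
The selection logic in the loop above could be factored into a reusable generator. The sketch below uses hypothetical namedtuple stand-ins rather than the library's classes, so it reads scans from a `scans` field instead of iterating over the patient directly. The earlier outputs show that titles differ in case between patients (`LUNG` for Subject_1009, `Lung` for Subject_102), so this sketch compares titles case-insensitively; that choice is an assumption, not the library's behaviour.

```python
from collections import namedtuple

# Hypothetical stand-ins: a patient is modelled as an ID plus a list of scans
Scan = namedtuple("Scan", ["title"])
Patient = namedtuple("Patient", ["patient_id", "scans"])


def iter_scans(dataset, title=None, patient_ids=None):
    """Yield (patient, scan) pairs, optionally filtered by title and patient ID."""
    wanted = None if title is None else title.casefold()
    for patient in dataset:
        if patient_ids is not None and patient.patient_id not in patient_ids:
            continue
        for scan in patient.scans:
            if wanted is None or scan.title.casefold() == wanted:
                yield patient, scan


# Scan titles copied from the two patients shown earlier
dataset = [
    Patient("Subject_1009",
            [Scan("UNNAMED_SERIES"), Scan("LUNG_3_mm"), Scan("LUNG"), Scan("Mediastinum")]),
    Patient("Subject_102",
            [Scan("UNNAMED_SERIES"), Scan("Mediastinum"), Scan("Lung"),
             Scan("Lung_1.5"), Scan("Mediastinum_1.5")]),
]
matches = [(p.patient_id, s.title) for p, s in iter_scans(dataset, title="lung")]
print(matches)
```

With an exact comparison such as `scan.title == "LUNG"`, the scan titled `Lung` would not be selected.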


Now we shall do the same thing, but instead of slices, let us generate a NIfTI file for each of the "LUNG" scans.

In [40]:
pbar = tqdm(Dataset_1)
for paitent in pbar:
    for scan in paitent:
        if scan.title == "LUNG":
            scan.write_nifti(nifti_output_root_dir)
            pbar.set_description(f"Creating Nifti file for {paitent.patient_id} : {scan.title}")


Creating Nifti file for Subject_1002 : LUNG: 100%|██████████| 30/30 [01:41<00:00,  3.37s/it]


The complete dataset we are working with contains CT scans for 1000 unique patients, and it is interesting to note that the number of scans per patient varies even within the first 30 patients.

Here we print the number of scans for each patient in the dataset in a concise, Pythonic manner using a list comprehension. We can also see that calling len on a Patient object returns its number of scans.

In [41]:
print("Number of scans per paitent :",[len(paitent) for paitent in Dataset_1])

Number of scans per paitent : [4, 4, 4, 4, 4, 4, 5, 4, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 4, 4, 4, 5, 4, 4, 4, 4, 4]
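
The list above is easier to read as a distribution. As a small sketch, the values from that output can be tallied with `collections.Counter` from the standard library; the variable names are chosen only for illustration.

```python
from collections import Counter

# Values copied from the output above
scans_per_patient = [4, 4, 4, 4, 4, 4, 5, 4, 3, 4, 4, 4, 4, 4, 4,
                     4, 4, 4, 4, 4, 5, 4, 4, 4, 5, 4, 4, 4, 4, 4]

distribution = Counter(scans_per_patient)
for n_scans, n_patients in sorted(distribution.items()):
    print(f"{n_patients} patient(s) with {n_scans} scans")
```

With the real objects, the same list could be built with the comprehension from the cell above instead of being copied by hand.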


In [42]:
len(Dataset_1)

30
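
The behaviour of `len` and of the `for` loops used throughout this demo relies on Python's container protocol. The sketch below shows, with hypothetical stand-in classes, how implementing `__len__` and `__iter__` at both levels gives this behaviour; it is not the library's code, and scans are represented by their titles only.

```python
class MiniPatient:
    """Hypothetical stand-in: a patient holding a list of scan titles."""

    def __init__(self, patient_id, scan_titles):
        self.patient_id = patient_id
        self._scans = list(scan_titles)

    def __iter__(self):
        return iter(self._scans)  # enables: for scan in patient

    def __len__(self):
        return len(self._scans)   # enables: len(patient)


class MiniDataset:
    """Hypothetical stand-in: a dataset holding a list of patients."""

    def __init__(self, patients):
        self._patients = list(patients)

    def __iter__(self):
        return iter(self._patients)  # enables: for patient in dataset

    def __len__(self):
        return len(self._patients)   # enables: len(dataset)


# Scan titles copied from the two patients shown earlier
dataset = MiniDataset([
    MiniPatient("Subject_1009", ["UNNAMED_SERIES", "LUNG_3_mm", "LUNG", "Mediastinum"]),
    MiniPatient("Subject_102", ["UNNAMED_SERIES", "Mediastinum", "Lung",
                                "Lung_1.5", "Mediastinum_1.5"]),
])
print(len(dataset), [len(patient) for patient in dataset])
```

Because both classes are iterable, they also work directly with tools such as `tqdm`, `sorted` and list comprehensions.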