# Exercise
Prepare DICOM Images for ML Exercise

In this exercise, you'll receive a small set of seven DICOM images. Here, rather than extracting the image itself from the DICOM file, we'll be extracting other attributes that tell us about the image and the patient who is represented in it.

To complete this exercise, create a single dataframe that has the following columns:

- Patient ID
- Patient Age (as an integer)
- Patient Sex (M/F)
- Imaging Modality
- Type of finding in the image
- Number of rows in the image
- Number of columns in the image

Print the contents of this dataframe.

Note: When you see an attribute listed with a name like "Patient's Age" in a DICOM file, it can usually be accessed in pydicom through its keyword, which is the name with the spaces and special characters removed, e.g. dcm.PatientAge.

In [1]:
import pandas as pd
import numpy as np
import pydicom
import glob

In [2]:
## First, collect the paths of all of my DICOM files into a list
mydicoms = glob.glob("*.dcm")

### Let's look at the contents of the first DICOM:

In [3]:
dcm1 = pydicom.dcmread(mydicoms[0])
dcm1

(0008, 0016) SOP Class UID                       UI: Secondary Capture Image Storage
(0008, 0018) SOP Instance UID                    UI: 1.3.6.1.4.1.11129.5.5.139539879914217162512411239901306132962191
(0008, 0060) Modality                            CS: 'DX'
(0008, 1030) Study Description                   LO: 'Atelectasis'
(0010, 0020) Patient ID                          LO: '13118'
(0010, 0040) Patient's Sex                       CS: 'M'
(0010, 1010) Patient's Age                       AS: '69'
(0020, 000d) Study Instance UID                  UI: 1.3.6.1.4.1.11129.5.5.120992059193772113283592409393507044871674
(0020, 000e) Series Instance UID                 UI: 1.3.6.1.4.1.11129.5.5.110922964580080663514009950443538578354984
(0028, 0002) Samples per Pixel                   US: 1
(0028, 0004) Photometric Interpretation          CS: 'MONOCHROME2'
(0028, 0010) Rows                                US: 1024
(0028, 0011) Columns                             US: 1024
(0028, 0100) Bits Allo
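Each line of the printed header follows the same layout: the tag as a (group, element) pair of hexadecimal numbers, the attribute name, the two-letter value representation (VR, e.g. `UI` Unique Identifier, `CS` Code String, `LO` Long String, `AS` Age String, `US` Unsigned Short), and the value. The tag pair is often written as a single 32-bit number with the group in the high 16 bits, and pydicom accepts both forms for item access (`dcm1[0x0010, 0x1010]` or `dcm1[0x00101010]` returns the data element, whose `.value` is the value). A minimal pure-Python sketch of that conversion, where `combine_tag` and `split_tag` are hypothetical helper names:

```python
def combine_tag(group: int, element: int) -> int:
    """Pack a (group, element) pair into one 32-bit tag number."""
    return (group << 16) | element


def split_tag(tag: int) -> tuple[int, int]:
    """Unpack a 32-bit tag number into its (group, element) pair."""
    return tag >> 16, tag & 0xFFFF


# (0010, 1010) Patient's Age, as printed in the header above
tag = combine_tag(0x0010, 0x1010)
print(f"0x{tag:08X}")                    # 0x00101010
group, element = split_tag(tag)
print(f"({group:04x}, {element:04x})")   # (0010, 1010)
```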

In [5]:
## Do some exploratory work first to see how to extract each of these attributes using pydicom...
dcm1.Modality

'DX'

In [6]:
dcm1.StudyDescription

'Atelectasis'

In [7]:
dcm1.PatientID

'13118'

In [8]:
dcm1.PatientSex

'M'

In [9]:
dcm1.PatientAge

'69'
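The age comes back as the string `'69'`, so it has to be converted for the "Patient Age (as an integer)" column. `int()` works for these files, but the DICOM standard defines the Age String (AS) value representation as three digits plus a unit letter (`nnnD`, `nnnW`, `nnnM` or `nnnY` for days, weeks, months or years), so a conformant value such as `'069Y'` would make `int()` raise a `ValueError`. A more tolerant parser might look like this; the name `parse_patient_age`, treating a bare number as years, and rounding sub-year ages down to whole years are all assumptions of this sketch, not pydicom behaviour.

```python
import re


def parse_patient_age(value) -> int:
    """Convert a DICOM Age String (e.g. '069Y', '018M', or a bare '69') to whole years.

    Assumptions for this sketch: a value without a unit letter is in years,
    and ages given in days, weeks or months are rounded down to whole years.
    """
    text = str(value).strip().upper()
    match = re.fullmatch(r"(\d+)([DWMY]?)", text)
    if match is None:
        raise ValueError(f"Unrecognised Patient's Age: {value!r}")
    number, unit = int(match.group(1)), match.group(2) or "Y"
    if unit == "Y":
        return number
    if unit == "M":
        return number // 12
    if unit == "W":
        return number * 7 // 365
    return number // 365  # unit == "D"


print(parse_patient_age("69"))    # 69
print(parse_patient_age("069Y"))  # 69
print(parse_patient_age("018M"))  # 1
```

Inside the loop further down, `parse_patient_age(dcm.PatientAge)` could stand in for `int(dcm.PatientAge)` if some files use the standard `nnnY` form.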

In [10]:
dcm1.Rows

1024

In [11]:
dcm1.Columns

1024
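Since every column is just a keyword lookup, the seven attributes can also be collected through a mapping from column name to keyword, with `getattr` reading `dcm.PatientAge` and the rest exactly as attribute access does. This sketch assumes the column names used for the dataframe below; `FIELD_KEYWORDS` and `extract_fields` are hypothetical names, and a `types.SimpleNamespace` filled with the `dcm1` values shown above stands in for a real pydicom dataset so the example runs without a DICOM file.

```python
from types import SimpleNamespace

# Column name -> pydicom keyword; "Findings" comes from Study Description
FIELD_KEYWORDS = {
    "PatientID": "PatientID",
    "PatientAge": "PatientAge",
    "PatientSex": "PatientSex",
    "Modality": "Modality",
    "Findings": "StudyDescription",
    "Rows": "Rows",
    "Columns": "Columns",
}


def extract_fields(dcm) -> dict:
    """Hypothetical helper: read each keyword, using None when it is absent."""
    return {column: getattr(dcm, keyword, None)
            for column, keyword in FIELD_KEYWORDS.items()}


# Stand-in for pydicom.dcmread(...), using the dcm1 values printed above
dcm_stub = SimpleNamespace(PatientID="13118", PatientAge="69", PatientSex="M",
                           Modality="DX", StudyDescription="Atelectasis",
                           Rows=1024, Columns=1024)
record = extract_fields(dcm_stub)
print(record["Findings"])  # Atelectasis
```

With this approach a missing attribute shows up as `None` in that row instead of stopping the loop with an `AttributeError`; whether that is preferable depends on whether incomplete headers should be tolerated or flagged.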

## Now, let's create the dataframe that we want and populate it in a loop with all of our DICOMs:

Save this dataframe as a .CSV file.

In [13]:
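all_data = []

for path in mydicoms:
    # Only header attributes are needed, so skip reading the pixel data
    dcm = pydicom.dcmread(path, stop_before_pixels=True)
    # Patient's Age is stored as a string ('69' here), so cast it to int;
    # Study Description holds the finding (e.g. 'Atelectasis')
    fields = [dcm.PatientID, int(dcm.PatientAge), dcm.PatientSex, dcm.Modality,
              dcm.StudyDescription, dcm.Rows, dcm.Columns]
    all_data.append(fields)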

In [14]:
mydata = pd.DataFrame(all_data,
                      columns=['PatientID', 'PatientAge', 'PatientSex', 'Modality',
                               'Findings', 'Rows', 'Columns'])

In [None]:
mydata
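The exercise also asks for the dataframe to be printed and saved as a .CSV file; with pandas that is `print` plus `DataFrame.to_csv`. A minimal sketch, assuming the hypothetical file name `dicom_metadata.csv` and, so the snippet is self-contained, a one-row stand-in frame (`example_df`) built from the `dcm1` values shown earlier. In the notebook itself, `mydata.to_csv(...)` would be called directly.

```python
import os
import tempfile

import pandas as pd

# One-row stand-in built from the dcm1 values shown above
example_df = pd.DataFrame(
    [["13118", 69, "M", "DX", "Atelectasis", 1024, 1024]],
    columns=["PatientID", "PatientAge", "PatientSex", "Modality",
             "Findings", "Rows", "Columns"],
)
print(example_df)

# Hypothetical output path; index=False keeps the row index out of the file
out_path = os.path.join(tempfile.mkdtemp(), "dicom_metadata.csv")
example_df.to_csv(out_path, index=False)

# Read PatientID back as a string so IDs are not converted to integers
reloaded = pd.read_csv(out_path, dtype={"PatientID": str})
print(reloaded)
```

Passing `dtype={"PatientID": str}` when reading the file back keeps IDs such as `'13118'` as strings, which matters if any ID has leading zeros.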