# Data Exploration: Duke Breast Cancer MRI

- **Dynamic contrast-enhanced magnetic resonance images of breast cancer patients with tumor locations (Duke-Breast-Cancer-MRI)**: https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70226903

In [43]:
import pandas as pd
import numpy as np
import os
import pydicom as dicom
from tqdm import tqdm
from skimage.io import imsave
import matplotlib.pyplot as plt
import seaborn as sns

## Load Lesion Annotation Boxes Excel

In [8]:
annot_boxes_df = pd.read_excel("/media/james/My Passport/Jetson_TX2_CMPE258/duke-breast-cancer-mri/Annotation_Boxes.xlsx", index_col=None, names=["Patient ID","Start Row","End Row","Start Column", "End Column", "Start Slice", "End Slice"])
annot_boxes_df.head()

Unnamed: 0,Patient ID,Start Row,End Row,Start Column,End Column,Start Slice,End Slice
0,Breast_MRI_001,234,271,308,341,89,112
1,Breast_MRI_002,251,294,108,136,59,72
2,Breast_MRI_003,351,412,82,139,96,108
3,Breast_MRI_004,262,280,193,204,86,95
4,Breast_MRI_005,188,213,138,178,76,122
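
Each row of the annotation table describes one bounding box per patient. As a minimal sketch of how such a box might be used, the example below rebuilds the first two rows from the preview above (instead of reading the Excel file), computes the box extent along each axis, and crops a placeholder volume. The helper `crop_box`, the `(slice, row, column)` axis order, and the 0-based, end-exclusive index convention are assumptions made for illustration; they should be checked against the dataset documentation before use.

```python
import numpy as np
import pandas as pd

# First two rows copied from the annot_boxes_df.head() output above.
boxes = pd.DataFrame({
    "Patient ID": ["Breast_MRI_001", "Breast_MRI_002"],
    "Start Row": [234, 251],
    "End Row": [271, 294],
    "Start Column": [308, 108],
    "End Column": [341, 136],
    "Start Slice": [89, 59],
    "End Slice": [112, 72],
}).set_index("Patient ID")

# Box extent along each axis (End minus Start).
extent = pd.DataFrame({
    "rows": boxes["End Row"] - boxes["Start Row"],
    "columns": boxes["End Column"] - boxes["Start Column"],
    "slices": boxes["End Slice"] - boxes["Start Slice"],
})


def crop_box(volume, box):
    """Hypothetical helper: crop a (slice, row, column) volume to one box.

    ASSUMPTION: indices are 0-based and the End values are exclusive.
    """
    return volume[
        box["Start Slice"]:box["End Slice"],
        box["Start Row"]:box["End Row"],
        box["Start Column"]:box["End Column"],
    ]


# Placeholder volume; a real one would be stacked from the DICOM slices.
volume = np.zeros((160, 512, 512), dtype=np.uint8)
tumor = crop_box(volume, boxes.loc["Breast_MRI_001"])

print(extent)
print(tumor.shape)
```

If the dataset turns out to use 1-based or end-inclusive indices, only the slicing inside `crop_box` would need to change.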


## Load Breast Radiologist Density Assessment Excel

In [13]:
radiologist_density_asmt_df = pd.read_excel("/media/james/My Passport/Jetson_TX2_CMPE258/duke-breast-cancer-mri/Breast_Radiologist_Density_Assessments.xlsx", index_col=None, names=["Subject_ID","Radiologist A","Radiologist B","Radiologist C","BI‐RADS density categories"])
radiologist_density_asmt_df.head()

Unnamed: 0,Subject_ID,Radiologist A,Radiologist B,Radiologist C,BI‐RADS density categories
0,Breast_MRI_010,c,c,c,
1,Breast_MRI_014,b,c,c,
2,Breast_MRI_026,a,b,b,
3,Breast_MRI_029,a,b,a,
4,Breast_MRI_038,b,c,b,
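
The three radiologist columns do not always agree (for example `Breast_MRI_014` is rated b, c, c). One possible way to summarize them, sketched here under the assumption that a simple majority vote is an acceptable consensus, is shown below. The rows are copied from the preview above, and the `majority` and `unanimous` columns are names introduced only for this illustration.

```python
from collections import Counter

import pandas as pd

# Rows copied from the radiologist_density_asmt_df.head() output above.
ratings = pd.DataFrame({
    "Subject_ID": ["Breast_MRI_010", "Breast_MRI_014", "Breast_MRI_026",
                   "Breast_MRI_029", "Breast_MRI_038"],
    "Radiologist A": ["c", "b", "a", "a", "b"],
    "Radiologist B": ["c", "c", "b", "b", "c"],
    "Radiologist C": ["c", "c", "b", "a", "b"],
}).set_index("Subject_ID")

reader_cols = ["Radiologist A", "Radiologist B", "Radiologist C"]


def majority(row):
    """Return the most frequent rating, or None when all three readers differ."""
    label, count = Counter(row).most_common(1)[0]
    return label if count >= 2 else None


# ASSUMPTION: majority vote is used as the consensus rule.
ratings["majority"] = ratings[reader_cols].apply(majority, axis=1)
ratings["unanimous"] = ratings[reader_cols].nunique(axis=1) == 1

print(ratings)
```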


## Load Clinical and Other Features Excel

In [14]:
clinical_features_df = pd.read_excel("/media/james/My Passport/Jetson_TX2_CMPE258/duke-breast-cancer-mri/Clinical_and_Other_Features.xlsx", index_col=None, header=None)
clinical_features_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,88,89,90,91,92,93,94,95,96,97
0,Patient Information,MRI Technical Information,,,,,,,,,...,,Anti-Her2 Neu Therapy,,Neoadjuvant therapy,Pathologic Response to Neoadjuvant Therapy,,,Near Complete Response,,
1,Patient ID,Days to MRI (From the Date of Diagnosis),Manufacturer,Manufacturer Model Name,Scan Options,Field Strength (Tesla),Patient Position During MRI,Image Position of Patient,Contrast Agent,Contrast Bolus Volume (mL),...,Therapeutic or Prophylactic Oophorectomy as pa...,Neoadjuvant Anti-Her2 Neu Therapy,Adjuvant Anti-Her2 Neu Therapy,Received Neoadjuvant Therapy or Not,Pathologic response to Neoadjuvant therapy: Pa...,Pathologic response to Neoadjuvant therapy: P...,Pathologic response to Neoadjuvant therapy: P...,Overall Near-complete Response: Stricter Defi...,Overall Near-complete Response: Looser Defini...,Near-complete Response (Graded Measure)
2,,,"GE MEDICAL SYSTEMS=0, MPTronic software=1, SIE...","Avanto=0, Optima MR450w=1, SIGNA EXCITE=2, SIG...","FAST_GEMS\SAT_GEMS\ACC_GEMS\PFP\FS=0,FAST_GEMS...","1.494=0,1.5=1,2.8936=2,3=3","FFP=0,HFP=1",,"GADAVIST=0,MAGNEVIST=1,MMAGNEVIST=2,MULTIHANCE...","6=0,7=1,8=2,9=3,10=4,11=5,11.88=6,12=7,13=8,13...",...,"{0 = no, 1 = yes, NP = not pertinent}","{0 = no, 1 = yes}","{0 = no, 1 = yes}","{1 = yes, 2 = no, NA = not applicable}",{ -1 = TX; 0 = T0; 1 = T1; 2 = T2; 3 = T3;...,{ -1 = NX; 0 = N0; 1 = N1; 2 = N2; 3 = N3...,{ -1 = MX; 0 = M0; 1 = M1; NA = not applica...,"{0 = not complete or near-complete, 1 = comple...","{0 = not complete or near-complete, 1 = comple...",{0 = Not complete or near-complete; 1 = Compl...
3,Breast_MRI_001,6,2,0,5,1,0,-191.8003 X -176.1259 X 86.6065,1,15,...,1,1,1,1,1,-1,-1,0,0,0
4,Breast_MRI_002,12,0,4,1,3,0,154.724 X 176.048 X 94.5771,1,,...,0,0,0,1,,,,2,2,4


In [23]:
# grab the second row (index 1), which holds the column names, as a list
clinical_features_header = clinical_features_df.loc[1,:].values.tolist()

In [25]:
clinical_features_header[:5]

['Patient ID',
 'Days to MRI (From the Date of Diagnosis)',
 'Manufacturer',
 'Manufacturer Model Name',
 'Scan Options']

In [24]:
len(clinical_features_header)

98

In [26]:
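# drop the first two rows (section titles and column names) and use the second row as the header;
# the value-coding legend (index 2) is still the first row of the result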
clinical_features_df = clinical_features_df[2:]
clinical_features_df.columns = clinical_features_header

In [27]:
clinical_features_df.head()

Unnamed: 0,Patient ID,Days to MRI (From the Date of Diagnosis),Manufacturer,Manufacturer Model Name,Scan Options,Field Strength (Tesla),Patient Position During MRI,Image Position of Patient,Contrast Agent,Contrast Bolus Volume (mL),...,Therapeutic or Prophylactic Oophorectomy as part of Endocrine Therapy,Neoadjuvant Anti-Her2 Neu Therapy,Adjuvant Anti-Her2 Neu Therapy,Received Neoadjuvant Therapy or Not,Pathologic response to Neoadjuvant therapy: Pathologic stage (T) following neoadjuvant therapy,Pathologic response to Neoadjuvant therapy: Pathologic stage (N) following neoadjuvant therapy,Pathologic response to Neoadjuvant therapy: Pathologic stage (M) following neoadjuvant therapy,Overall Near-complete Response: Stricter Definition,Overall Near-complete Response: Looser Definition,Near-complete Response (Graded Measure)
2,,,"GE MEDICAL SYSTEMS=0, MPTronic software=1, SIE...","Avanto=0, Optima MR450w=1, SIGNA EXCITE=2, SIG...","FAST_GEMS\SAT_GEMS\ACC_GEMS\PFP\FS=0,FAST_GEMS...","1.494=0,1.5=1,2.8936=2,3=3","FFP=0,HFP=1",,"GADAVIST=0,MAGNEVIST=1,MMAGNEVIST=2,MULTIHANCE...","6=0,7=1,8=2,9=3,10=4,11=5,11.88=6,12=7,13=8,13...",...,"{0 = no, 1 = yes, NP = not pertinent}","{0 = no, 1 = yes}","{0 = no, 1 = yes}","{1 = yes, 2 = no, NA = not applicable}",{ -1 = TX; 0 = T0; 1 = T1; 2 = T2; 3 = T3;...,{ -1 = NX; 0 = N0; 1 = N1; 2 = N2; 3 = N3...,{ -1 = MX; 0 = M0; 1 = M1; NA = not applica...,"{0 = not complete or near-complete, 1 = comple...","{0 = not complete or near-complete, 1 = comple...",{0 = Not complete or near-complete; 1 = Compl...
3,Breast_MRI_001,6.0,2,0,5,1,0,-191.8003 X -176.1259 X 86.6065,1,15,...,1,1,1,1,1,-1,-1,0,0,0
4,Breast_MRI_002,12.0,0,4,1,3,0,154.724 X 176.048 X 94.5771,1,,...,0,0,0,1,,,,2,2,4
5,Breast_MRI_003,10.0,0,3,2,3,0,174.658 X 228.317 X 88.4878,1,,...,0,0,0,1,1,1,-1,0,0,0
6,Breast_MRI_004,18.0,0,4,1,1,0,188.148 X 194.282 X 94.1832,1,,...,0,0,0,2,,,,,,
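
The preview above still carries the value-coding legend as its first row (index 2). The sketch below shows one way the three header rows could be separated and the legend used to decode a column. It works on a toy frame that mimics the sheet layout with three of the 98 columns, using values from the output above. The helper `parse_legend` and the decoded column name are hypothetical, and the parser only handles the simple `NAME=code,NAME=code` form; legends written like `{0 = no, 1 = yes}` would need different handling.

```python
import pandas as pd

# Toy frame mimicking the sheet layout (three of the 98 columns).
raw = pd.DataFrame([
    ["Patient Information", "MRI Technical Information", None],
    ["Patient ID", "Days to MRI (From the Date of Diagnosis)", "Patient Position During MRI"],
    [None, None, "FFP=0,HFP=1"],
    ["Breast_MRI_001", 6, 0],
    ["Breast_MRI_002", 12, 0],
])

header = raw.iloc[1].tolist()
legend = dict(zip(header, raw.iloc[2]))  # column name -> coding string (or None)

# Keep only the patient rows and attach the column names.
clinical = raw.iloc[3:].reset_index(drop=True)
clinical.columns = header


def parse_legend(text):
    """Hypothetical helper: parse 'NAME=code,NAME=code' into {code: NAME}.

    ASSUMPTION: codes are integers and entries are comma-separated.
    """
    pairs = (item.split("=") for item in text.split(","))
    return {int(code): name.strip() for name, code in pairs}


position_codes = parse_legend(legend["Patient Position During MRI"])
clinical["Patient Position (decoded)"] = (
    clinical["Patient Position During MRI"].map(position_codes)
)

print(clinical)
```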


## Load Imaging Features Excel

In [28]:
imaging_features_df = pd.read_excel("/media/james/My Passport/Jetson_TX2_CMPE258/duke-breast-cancer-mri/Imaging_Features.xlsx", index_col=None, header=None)
imaging_features_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,520,521,522,523,524,525,526,527,528,529
0,Patient ID,"F1_DT_POSTCON (T11=0.05,T12=0.5)","F1_DT_POSTCON (T11=0.05,T12=0.1)","F1_DT_POSTCON (T11=0.02,T12=0.5)","F1_DT_POSTCON (T11=0.02,T12=0.8)","F1_DT_POSTCON (T11=0.05,T12=0.8)","F1_DT_POSTCON (T11=0.1,T12=0.5)","F1_DT_POSTCON (T11=0.1,T12=0.8)","F1_DT_POSTCON (T11=0.2,T12=0.5)","F1_DT_POSTCON (T11=0.2,T12=0.8)",...,WashinRate_map_difference_entropy_tissue_PostCon,WashinRate_map_information_measure_correlation...,WashinRate_map_information_measure_correlation...,WashinRate_map_inverse_difference_is_homom_tis...,WashinRate_map_inverse_difference_normalized_t...,WashinRate_map_inverse_difference_moment_norma...,WashinRate_map_mean_tissue_PostCon,WashinRate_map_std_dev_tissue_PostCon,WashinRate_map_skewness_tissue_PostCon,WashinRate_map_kurtosis_tissue_PostCon
1,Breast_MRI_001,1,0.120721,0.530395,1,1,1,1,1,1,...,3.380663,-0.025575,0.422391,0.171959,0.960359,0.996829,14.517894,20.347506,1.62587,11.406955
2,Breast_MRI_002,1,0.129546,0.485217,1,1,1,1,1,1,...,3.444474,-0.036063,0.505652,0.177087,0.959067,0.996363,47.29795,83.909561,0.251498,5.659428
3,Breast_MRI_003,0.174775,0.062051,0.06991,0.132265,0.330662,0.34955,0.661324,0.6991,1,...,3.478455,-0.04373,0.546674,0.170507,0.957527,0.995981,114.171582,129.252343,1.928743,11.554948
4,Breast_MRI_004,0.086546,0.045111,0.034619,0.051265,0.128162,0.173093,0.256325,0.346185,0.51265,...,3.389678,-0.017802,0.363818,0.17721,0.960705,0.996827,33.499175,69.164227,1.171314,8.493319


In [30]:
# grab the first row (index 0), which holds the column names, as a list
imaging_features_header = imaging_features_df.loc[0,:].values.tolist()

In [31]:
imaging_features_header[:5]

['Patient ID',
 'F1_DT_POSTCON (T11=0.05,T12=0.5)',
 'F1_DT_POSTCON (T11=0.05,T12=0.1)',
 'F1_DT_POSTCON (T11=0.02,T12=0.5)',
 'F1_DT_POSTCON (T11=0.02,T12=0.8)']

In [32]:
# drop the first row and use it as the column header
imaging_features_df = imaging_features_df[1:]
imaging_features_df.columns = imaging_features_header

In [33]:
imaging_features_df.head()

Unnamed: 0,Patient ID,"F1_DT_POSTCON (T11=0.05,T12=0.5)","F1_DT_POSTCON (T11=0.05,T12=0.1)","F1_DT_POSTCON (T11=0.02,T12=0.5)","F1_DT_POSTCON (T11=0.02,T12=0.8)","F1_DT_POSTCON (T11=0.05,T12=0.8)","F1_DT_POSTCON (T11=0.1,T12=0.5)","F1_DT_POSTCON (T11=0.1,T12=0.8)","F1_DT_POSTCON (T11=0.2,T12=0.5)","F1_DT_POSTCON (T11=0.2,T12=0.8)",...,WashinRate_map_difference_entropy_tissue_PostCon,WashinRate_map_information_measure_correlation1_tissue_PostCon,WashinRate_map_information_measure_correlation2_tissue_PostCon,WashinRate_map_inverse_difference_is_homom_tissue_PostCon,WashinRate_map_inverse_difference_normalized_tissue_PostCon,WashinRate_map_inverse_difference_moment_normalized_tissue_PostCon,WashinRate_map_mean_tissue_PostCon,WashinRate_map_std_dev_tissue_PostCon,WashinRate_map_skewness_tissue_PostCon,WashinRate_map_kurtosis_tissue_PostCon
1,Breast_MRI_001,1.0,0.120721,0.530395,1.0,1.0,1.0,1.0,1.0,1.0,...,3.380663,-0.025575,0.422391,0.171959,0.960359,0.996829,14.517894,20.347506,1.62587,11.406955
2,Breast_MRI_002,1.0,0.129546,0.485217,1.0,1.0,1.0,1.0,1.0,1.0,...,3.444474,-0.036063,0.505652,0.177087,0.959067,0.996363,47.29795,83.909561,0.251498,5.659428
3,Breast_MRI_003,0.174775,0.062051,0.06991,0.132265,0.330662,0.34955,0.661324,0.6991,1.0,...,3.478455,-0.04373,0.546674,0.170507,0.957527,0.995981,114.171582,129.252343,1.928743,11.554948
4,Breast_MRI_004,0.086546,0.045111,0.034619,0.051265,0.128162,0.173093,0.256325,0.346185,0.51265,...,3.389678,-0.017802,0.363818,0.17721,0.960705,0.996827,33.499175,69.164227,1.171314,8.493319
5,Breast_MRI_005,0.289669,0.052031,0.115868,0.378575,0.839984,0.579338,1.0,0.958287,1.0,...,4.009938,-0.049294,0.603426,0.117966,0.930624,0.989135,34.406635,26.951415,0.985464,4.331451
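
Because the sheet is read with `header=None`, each column initially mixes the header text with the numeric values, so the feature columns are likely to remain `object` dtype after the header row is promoted. A minimal sketch of converting them to numeric values follows, using a toy frame built from the first rows and columns of the preview above. Indexing by `Patient ID` and coercing unparseable cells to NaN are choices made for this illustration.

```python
import pandas as pd

# Toy frame: header row plus three patients, two feature columns from the preview.
raw = pd.DataFrame([
    ["Patient ID", "F1_DT_POSTCON (T11=0.05,T12=0.5)", "F1_DT_POSTCON (T11=0.05,T12=0.1)"],
    ["Breast_MRI_001", 1, 0.120721],
    ["Breast_MRI_002", 1, 0.129546],
    ["Breast_MRI_003", 0.174775, 0.062051],
])

# Promote the first row to the header, as in the cells above.
features = raw.iloc[1:].copy()
features.columns = raw.iloc[0].tolist()

# Index by patient and convert every feature column to a numeric dtype.
# errors="coerce" turns unparseable cells into NaN instead of raising.
features = features.set_index("Patient ID").apply(pd.to_numeric, errors="coerce")

print(features.dtypes)
```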


## Load Breast Cancer MRI Filepath Filename Mapping Excel

In [41]:
bc_mri_path_file_mapping_df = pd.read_excel("/media/james/My Passport/Jetson_TX2_CMPE258/duke-breast-cancer-mri/Breast-Cancer-MRI-filepath_filename-mapping.xlsx", index_col=None, header=None)
bc_mri_path_file_mapping_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
0,sop_instance_UID,original_path_and_filename,classic_path,descriptive_path,,,,,,,,,,,,series_sort
1,1.3.6.1.4.1.14519.5.2.1.1805789812895034139917...,DICOM_Images/Breast_MRI_001/post_1/Breast_MRI_...,Duke-Breast-Cancer-MRI/Breast_MRI_001/1.3.6.1....,Duke-Breast-Cancer-MRI/BreastMRI001/01-01-1990...,,,,,,,,,,,,01.dcm
2,1.3.6.1.4.1.14519.5.2.1.4903237729147735321973...,DICOM_Images/Breast_MRI_001/post_1/Breast_MRI_...,Duke-Breast-Cancer-MRI/Breast_MRI_001/1.3.6.1....,Duke-Breast-Cancer-MRI/BreastMRI001/01-01-1990...,,,,,,,,,,,,
3,1.3.6.1.4.1.14519.5.2.1.3061160038794820079325...,DICOM_Images/Breast_MRI_001/post_1/Breast_MRI_...,Duke-Breast-Cancer-MRI/Breast_MRI_001/1.3.6.1....,Duke-Breast-Cancer-MRI/BreastMRI001/01-01-1990...,,,,,,,,,,,,
4,1.3.6.1.4.1.14519.5.2.1.1574717199045785031549...,DICOM_Images/Breast_MRI_001/post_1/Breast_MRI_...,Duke-Breast-Cancer-MRI/Breast_MRI_001/1.3.6.1....,Duke-Breast-Cancer-MRI/BreastMRI001/01-01-1990...,,,,,,,,,,,,


In [53]:
# collect the labels of columns 4 onward (mostly NaN in the preview above) so they can be dropped
drop_nan_cols_list = bc_mri_path_file_mapping_df.columns[4:].values.tolist()

In [54]:
bc_mri_path_file_mapping_df = bc_mri_path_file_mapping_df.drop(drop_nan_cols_list, axis=1)

In [55]:
bc_mri_path_file_mapping_df.head()

Unnamed: 0,0,1,2,3
0,sop_instance_UID,original_path_and_filename,classic_path,descriptive_path
1,1.3.6.1.4.1.14519.5.2.1.1805789812895034139917...,DICOM_Images/Breast_MRI_001/post_1/Breast_MRI_...,Duke-Breast-Cancer-MRI/Breast_MRI_001/1.3.6.1....,Duke-Breast-Cancer-MRI/BreastMRI001/01-01-1990...
2,1.3.6.1.4.1.14519.5.2.1.4903237729147735321973...,DICOM_Images/Breast_MRI_001/post_1/Breast_MRI_...,Duke-Breast-Cancer-MRI/Breast_MRI_001/1.3.6.1....,Duke-Breast-Cancer-MRI/BreastMRI001/01-01-1990...
3,1.3.6.1.4.1.14519.5.2.1.3061160038794820079325...,DICOM_Images/Breast_MRI_001/post_1/Breast_MRI_...,Duke-Breast-Cancer-MRI/Breast_MRI_001/1.3.6.1....,Duke-Breast-Cancer-MRI/BreastMRI001/01-01-1990...
4,1.3.6.1.4.1.14519.5.2.1.1574717199045785031549...,DICOM_Images/Breast_MRI_001/post_1/Breast_MRI_...,Duke-Breast-Cancer-MRI/Breast_MRI_001/1.3.6.1....,Duke-Breast-Cancer-MRI/BreastMRI001/01-01-1990...


In [56]:
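# grab the first row (index 0), which holds the column names, as a list
bc_mri_path_file_mapping_header = bc_mri_path_file_mapping_df.loc[0,:].values.tolist()

In [57]:
# drop the first row and use it as the column header
bc_mri_path_file_mapping_df = bc_mri_path_file_mapping_df[1:]
bc_mri_path_file_mapping_df.columns = bc_mri_path_file_mapping_header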

In [58]:
bc_mri_path_file_mapping_df.head()

Unnamed: 0,sop_instance_UID,original_path_and_filename,classic_path,descriptive_path
1,1.3.6.1.4.1.14519.5.2.1.1805789812895034139917...,DICOM_Images/Breast_MRI_001/post_1/Breast_MRI_...,Duke-Breast-Cancer-MRI/Breast_MRI_001/1.3.6.1....,Duke-Breast-Cancer-MRI/BreastMRI001/01-01-1990...
2,1.3.6.1.4.1.14519.5.2.1.4903237729147735321973...,DICOM_Images/Breast_MRI_001/post_1/Breast_MRI_...,Duke-Breast-Cancer-MRI/Breast_MRI_001/1.3.6.1....,Duke-Breast-Cancer-MRI/BreastMRI001/01-01-1990...
3,1.3.6.1.4.1.14519.5.2.1.3061160038794820079325...,DICOM_Images/Breast_MRI_001/post_1/Breast_MRI_...,Duke-Breast-Cancer-MRI/Breast_MRI_001/1.3.6.1....,Duke-Breast-Cancer-MRI/BreastMRI001/01-01-1990...
4,1.3.6.1.4.1.14519.5.2.1.1574717199045785031549...,DICOM_Images/Breast_MRI_001/post_1/Breast_MRI_...,Duke-Breast-Cancer-MRI/Breast_MRI_001/1.3.6.1....,Duke-Breast-Cancer-MRI/BreastMRI001/01-01-1990...
5,1.3.6.1.4.1.14519.5.2.1.2594404476894572978078...,DICOM_Images/Breast_MRI_001/post_1/Breast_MRI_...,Duke-Breast-Cancer-MRI/Breast_MRI_001/1.3.6.1....,Duke-Breast-Cancer-MRI/BreastMRI001/01-01-1990...
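
The `original_path_and_filename` values appear to encode the patient and the series in their directory names. A small sketch of extracting both is given below. The directory prefix is taken from the preview above, but the file name `example.dcm` is a hypothetical placeholder because the preview truncates the real one, and the four-part `root/patient/series/file` layout is an assumption.

```python
import pandas as pd

# Directory prefix from the preview above; "example.dcm" is a HYPOTHETICAL
# placeholder because the real file name is truncated in the output.
mapping = pd.DataFrame({
    "original_path_and_filename": [
        "DICOM_Images/Breast_MRI_001/post_1/example.dcm",
    ],
})

# ASSUMPTION: every path has the form root/patient/series/file.
parts = mapping["original_path_and_filename"].str.split("/", expand=True)
mapping["patient_id"] = parts[1]
mapping["series"] = parts[2]

print(mapping[["patient_id", "series"]])
```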


## Load Breast Cancer Segmentation Filepath Mapping CSV

In [61]:
bc_seg_file_mapping_df = pd.read_csv("/media/james/My Passport/Jetson_TX2_CMPE258/duke-breast-cancer-mri/segmentation_filepath_mapping.csv")
bc_seg_file_mapping_df.head()

Unnamed: 0,Patient ID,Segmentation Label,Slice File,Full Descriptive Path
0,Breast_MRI_002,Fatty tissue of breast,1-125.dcm,Duke-Breast-Cancer-MRI/Breast_MRI_002/01-01-19...
1,Breast_MRI_002,Fatty tissue of breast,1-030.dcm,Duke-Breast-Cancer-MRI/Breast_MRI_002/01-01-19...
2,Breast_MRI_002,Fatty tissue of breast,1-073.dcm,Duke-Breast-Cancer-MRI/Breast_MRI_002/01-01-19...
3,Breast_MRI_002,Mammary Fibroglandular Tissue,1-116.dcm,Duke-Breast-Cancer-MRI/Breast_MRI_002/01-01-19...
4,Breast_MRI_002,Mammary Fibroglandular Tissue,1-073.dcm,Duke-Breast-Cancer-MRI/Breast_MRI_002/01-01-19...
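
The `Slice File` names look like they end in a slice number (for example `1-125.dcm`). Assuming the digits after the dash are the slice index, the sketch below extracts that number and lists the slices per segmentation label. The rows are copied from the preview above; `slice_number` and `slices_per_label` are names introduced for this illustration.

```python
import pandas as pd

# Rows copied from the bc_seg_file_mapping_df.head() output above.
seg = pd.DataFrame({
    "Patient ID": ["Breast_MRI_002"] * 5,
    "Segmentation Label": ["Fatty tissue of breast"] * 3
                          + ["Mammary Fibroglandular Tissue"] * 2,
    "Slice File": ["1-125.dcm", "1-030.dcm", "1-073.dcm", "1-116.dcm", "1-073.dcm"],
})

# ASSUMPTION: the digits between the dash and ".dcm" are the slice index.
seg["slice_number"] = (
    seg["Slice File"].str.extract(r"-(\d+)\.dcm$", expand=False).astype(int)
)

# Sorted slice numbers for each segmentation label.
slices_per_label = (
    seg.groupby("Segmentation Label")["slice_number"]
    .apply(lambda s: sorted(s.tolist()))
)

print(slices_per_label)
```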


## Load Train_Ids CSV

In [59]:
bc_train_ids_df = pd.read_csv("/media/james/My Passport/Jetson_TX2_CMPE258/duke-breast-cancer-mri/train_ids.csv")
bc_train_ids_df.head()

Unnamed: 0,0
0,Breast_MRI_002
1,Breast_MRI_008
2,Breast_MRI_018
3,Breast_MRI_020
4,Breast_MRI_047


## Load Test_Ids CSV

In [60]:
bc_test_ids_df = pd.read_csv("/media/james/My Passport/Jetson_TX2_CMPE258/duke-breast-cancer-mri/test_ids.csv")
bc_test_ids_df.head()

Unnamed: 0,0
0,Breast_MRI_014
1,Breast_MRI_026
2,Breast_MRI_187
3,Breast_MRI_197
4,Breast_MRI_231
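
Before using the two ID lists, it may be worth confirming that no patient appears in both splits. A minimal sketch of that check follows, using the five IDs from each preview above; the column name `"0"` matches the previews, while `overlap` is a name introduced here.

```python
import pandas as pd

# IDs copied from the bc_train_ids_df.head() and bc_test_ids_df.head() outputs above.
train_ids = pd.DataFrame({"0": [
    "Breast_MRI_002", "Breast_MRI_008", "Breast_MRI_018",
    "Breast_MRI_020", "Breast_MRI_047",
]})
test_ids = pd.DataFrame({"0": [
    "Breast_MRI_014", "Breast_MRI_026", "Breast_MRI_187",
    "Breast_MRI_197", "Breast_MRI_231",
]})

# Patients present in both splits; an empty set means the splits are disjoint.
overlap = set(train_ids["0"]) & set(test_ids["0"])

print(f"{len(overlap)} patient(s) appear in both splits")
```

On the full CSV files the same expression would be applied to `bc_train_ids_df` and `bc_test_ids_df`.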
