# <center>**RADIOMICS EXTRACTION - Merge with labels and clinical data**</center>

(*Step 8*)

## **Radiomics Workflow:**
  
  **1. Download** DICOM images and convert to **NRRD.**

  **2.** Perform **target segmentations** and save in **NRRD.**
  
  **3.** Perform a **first Radiomic Features** ***Dummy*** **Extraction,** to:
    
  - Detect **errors in segmentations**: only one dimension, no label 1, only one segmented voxel...
  - Analyze **bin width**.


  **4. Analyze** and **correct mask errors.**

  **5. Adjust bin width.**
   - **Tune the feature extractor parameter file.**


  **6.** Perform **final Radiomic Feature Extraction.**

  **7. Clean Radiomic Features.**

  **8. Merge** with **labels** and **clinical data.**

  **9. Descriptive Statistics.**

  **10. Inferential Statistics.**

  **11. Machine Learning.**

## **MERGE CLEANED RADIOMIC FEATURES WITH LABELS AND CLINICAL DATA**

0. Environment **configuration**
1. **Load** Cleaned Radiomic Features, Cleaned Clinical data, and Labels.

2. **Merge** Cleaned Radiomic Features, Cleaned Clinical data, and Labels.
  * **Check** Datatypes & Merge.

3. **Save** merged and cleaned dataframe.

In [1]:
import os
import pandas as pd

### **0. Environment configuration**

#### Set the working directory

In [2]:
# Set working directory
wd = '/Users/pablomenendezfernandez-miranda/Proyecto Otosclerosis/'
os.chdir(wd)

print(f'Current directory: {os.getcwd()}')

# Check directory files
print(f'Directory files: {os.listdir(wd)}')

Current directory: /Users/pablomenendezfernandez-miranda/Proyecto Otosclerosis
Directory files: ['Tables', 'Databases', '.DS_Store', 'Episodes', 'Figures', 'Notebooks']


#### Mount Drive (if executed on Google Colab)

In [3]:
#from google.colab import drive
#drive.mount('/content/drive')

### **1. Load Cleaned Radiomic Features, Cleaned Clinical data, and Labels**

In [4]:
# Load pandas dataframes
clinical_data_dir=      'Databases/2_df_clinical_data_cleaned.feather'
df_STAPES_features_dir= 'Databases/5_1_df_radiomics_STAPES_tuned_final_features_cleaned.feather'
df_AF_features_dir=     'Databases/5_2_df_radiomics_AF_tuned_final_features_cleaned.feather'
df_OW_features_dir=     'Databases/5_3_df_radiomics_OW_tuned_final_features_cleaned.feather'

df_clinical=        pd.read_feather(clinical_data_dir)
df_STAPES_features= pd.read_feather(df_STAPES_features_dir)
df_AF_features=     pd.read_feather(df_AF_features_dir)
df_OW_features=     pd.read_feather(df_OW_features_dir)

print(f'df_clinical shape:        {df_clinical.shape}')
print(f'df_STAPES_features shape: {df_STAPES_features.shape}')
print(f'df_AF_features shape:     {df_AF_features.shape}')
print(f'df_OW_features shape:     {df_OW_features.shape}\n')

# Check dataframes
df_clinical.head(4)

df_clinical shape:        (127, 29)
df_STAPES_features shape: (89, 457)
df_AF_features shape:     (99, 479)
df_OW_features shape:     (83, 386)



Unnamed: 0,EPI_CODE,Sex,Pathological_Ear,Hearing_impairment_Pathological_Ear,Sensorineural_impairment_dB_250Hz,Sensorineural_impairment_dB_500Hz,Sensorineural_impairment_dB_1000Hz,Sensorineural_impairment_dB_2000Hz,Sensorineural_impairment_dB_3000Hz,Sensorineural_impairment_dB_4000Hz,...,Surgical_Treatment,Post_surgical_Vertigo,One_Week_Post_surgical_Tonal_Audiometry_Pathological_Ear,Hearing_impairment_One_Week_Post_surgical,One_Month_Post_surgical_Tonal_Audiometry_Pathological_Ear,Hearing_impairment_One_Month_Post_surgical,Otosclerosis_Contralateral_Ear,Label,age,Label_extended
0,EPI_0001,Hombre,Oído izquierdo,Moderado (40-70 dB),25.0,35.0,35.0,45.0,35.0,15.0,...,Estapedotomía,No,Hipoacusia mixta,Moderado (40-70 dB),Hipoacusia neurossensorial,Leve (20-40 dB),No,Otosclerosis,39.0,NoTC_Otosclerosis
1,EPI_0002,Mujer,Oído izquierdo,Moderado (40-70 dB),0.0,10.0,20.0,30.0,10.0,5.0,...,Estapedotomía,No,Hipoacusia mixta,Leve (20-40 dB),Hipoacusia mixta,Leve (20-40 dB),"Sí, confirmación quirúrgica con TC+",Otosclerosis,45.0,Otosclerosis
2,EPI_0003,Mujer,Oído derecho,Moderado (40-70 dB),35.0,35.0,25.0,60.0,55.0,50.0,...,Estapedotomía,No,Hipoacusia mixta,Moderado (40-70 dB),Hipoacusia mixta,Moderado (40-70 dB),"Sí, confirmación quirúrgica con TC+",Otosclerosis,65.0,Otosclerosis
3,EPI_0004,Hombre,Oído derecho,Moderado (40-70 dB),0.0,5.0,20.0,30.0,20.0,20.0,...,Estapedotomía,No,,,Hipoacusia neurossensorial,Leve (20-40 dB),"Sí, sospecha en TC",Otosclerosis,44.0,Otosclerosis


In [5]:
print(df_STAPES_features.shape)
df_STAPES_features.head(4)

(89, 457)


Unnamed: 0,EPI_CODE,R_STAPES_original_shape_Elongation,R_STAPES_original_shape_LeastAxisLength,R_STAPES_original_shape_MajorAxisLength,R_STAPES_original_shape_Maximum2DDiameterColumn,R_STAPES_original_shape_Maximum2DDiameterRow,R_STAPES_original_shape_Maximum2DDiameterSlice,R_STAPES_original_shape_Maximum3DDiameter,R_STAPES_original_shape_MeshVolume,R_STAPES_original_shape_SurfaceVolumeRatio,...,R_STAPES_lbp-3D-k_glszm_ZoneVariance,R_STAPES_lbp-3D-k_gldm_DependenceEntropy,R_STAPES_lbp-3D-k_gldm_DependenceNonUniformity,R_STAPES_lbp-3D-k_gldm_DependenceNonUniformityNormalized,R_STAPES_lbp-3D-k_gldm_GrayLevelNonUniformity,R_STAPES_lbp-3D-k_ngtdm_Busyness,R_STAPES_lbp-3D-k_ngtdm_Coarseness,R_STAPES_lbp-3D-k_ngtdm_Complexity,R_STAPES_lbp-3D-k_ngtdm_Contrast,R_STAPES_lbp-3D-k_ngtdm_Strength
0,EPI_0001,0.88339,1.23937,3.716135,3.162278,3.162278,3.162278,3.741657,3.416667,5.413905,...,1.6875,2.521641,2.142857,0.306122,2.714286,0.225694,0.646154,8.171429,0.338516,7.488722
1,EPI_0002,0.790084,1.23753,3.530601,3.162278,3.162278,3.162278,3.316625,2.166667,7.170049,...,0.75,2.251629,2.333333,0.388889,2.0,0.426471,0.413793,10.819444,0.417824,3.22807
2,EPI_0003,8.231927e-09,4.440892e-16,3.464102,1.414214,1.414214,1.414214,2.44949,0.333333,10.392305,...,0.0,-3.203427e-16,2.0,1.0,2.0,0.0,1000000.0,0.0,0.0,0.0
3,EPI_0004,0.0,0.0,4.0,3.0,1.0,3.0,3.0,0.333333,10.392305,...,0.0,-3.203427e-16,2.0,1.0,2.0,0.0,1000000.0,0.0,0.0,0.0


In [6]:
print(df_AF_features.shape)
df_AF_features.head(4)

(99, 479)


Unnamed: 0,EPI_CODE,R_AF_original_shape_LeastAxisLength,R_AF_original_shape_MajorAxisLength,R_AF_original_shape_Maximum2DDiameterColumn,R_AF_original_shape_Maximum2DDiameterRow,R_AF_original_shape_Maximum2DDiameterSlice,R_AF_original_shape_Maximum3DDiameter,R_AF_original_shape_MeshVolume,R_AF_original_shape_MinorAxisLength,R_AF_original_shape_SurfaceVolumeRatio,...,R_AF_lbp-3D-k_glszm_SmallAreaLowGrayLevelEmphasis,R_AF_lbp-3D-k_glszm_ZoneEntropy,R_AF_lbp-3D-k_glszm_ZoneVariance,R_AF_lbp-3D-k_gldm_DependenceEntropy,R_AF_lbp-3D-k_gldm_DependenceNonUniformity,R_AF_lbp-3D-k_gldm_DependenceVariance,R_AF_lbp-3D-k_gldm_LargeDependenceLowGrayLevelEmphasis,R_AF_lbp-3D-k_gldm_SmallDependenceHighGrayLevelEmphasis,R_AF_lbp-3D-k_ngtdm_Busyness,R_AF_lbp-3D-k_ngtdm_Strength
0,EPI_0001,3.023389,4.257578,5.09902,4.123106,5.09902,5.196152,21.541667,3.277801,2.199027,...,0.211944,2.251629,22.555556,3.609496,4.384615,5.213018,15.010085,1.769625,1.51536,1.871021
1,EPI_0002,2.815211,4.59063,4.472136,3.605551,4.472136,4.898979,17.375,3.390659,2.823894,...,0.058159,1.584963,48.222222,3.265583,2.727273,8.652893,11.243687,0.546965,0.508064,0.730429
2,EPI_0003,3.140533,5.315134,5.385165,4.472136,5.385165,5.477226,28.458333,3.458037,2.079975,...,0.259601,2.0,58.1875,3.801377,4.515152,7.415978,17.345118,0.243797,1.118533,0.488941
3,EPI_0004,3.097962,5.282717,5.385165,4.123106,5.09902,5.477226,21.875,3.446484,2.496766,...,0.065748,2.0,44.1875,3.661933,3.222222,7.654321,12.31893,0.495049,1.046149,0.485882


In [7]:
print(df_OW_features.shape)
df_OW_features.head(4)

(83, 386)


Unnamed: 0,EPI_CODE,R_OW_original_shape_Elongation,R_OW_original_shape_LeastAxisLength,R_OW_original_shape_MajorAxisLength,R_OW_original_shape_Maximum2DDiameterColumn,R_OW_original_shape_Maximum2DDiameterRow,R_OW_original_shape_Maximum2DDiameterSlice,R_OW_original_shape_MeshVolume,R_OW_original_shape_SurfaceVolumeRatio,R_OW_original_firstorder_10Percentile,...,R_OW_lbp-3D-k_glszm_SmallAreaEmphasis,R_OW_lbp-3D-k_glszm_SmallAreaLowGrayLevelEmphasis,R_OW_lbp-3D-k_glszm_ZoneVariance,R_OW_lbp-3D-k_gldm_DependenceEntropy,R_OW_lbp-3D-k_gldm_DependenceNonUniformity,R_OW_lbp-3D-k_gldm_DependenceNonUniformityNormalized,R_OW_lbp-3D-k_gldm_LargeDependenceEmphasis,R_OW_lbp-3D-k_gldm_LargeDependenceLowGrayLevelEmphasis,R_OW_lbp-3D-k_ngtdm_Busyness,R_OW_lbp-3D-k_ngtdm_Strength
0,EPI_0002,0.227735,0.0,3.677089,1.0,2.0,3.162278,0.833333,7.551034,221.8,...,1.0,0.43,0.0,1.584963,3.0,1.0,1.0,0.43,0.65625,3.301587
1,EPI_0003,0.0,0.0,2.828427,1.0,2.236068,1.0,0.666667,6.495191,442.8,...,0.25,0.25,0.0,-3.203427e-16,2.0,1.0,4.0,4.0,0.0,0.0
2,EPI_0005,0.57735,0.0,2.309401,2.0,2.236068,2.0,1.416667,5.147667,1216.2,...,0.625,0.53125,0.25,0.9182958,1.666667,0.555556,3.0,1.0,0.5,1.0
3,EPI_0006,0.408248,0.0,3.464102,2.0,3.162278,3.0,1.916667,5.280499,489.8,...,0.25,0.15625,0.0,1.0,4.0,1.0,4.0,2.5,0.833333,1.2
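
Before merging, it can help to confirm that `EPI_CODE` is a usable key in every table: no missing values, no duplicates, and a known set of episodes that appear in one table but not another. The following is a minimal sketch on toy data; the `*_toy` dataframes, their values, and the helper `check_merge_key` are hypothetical and are not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for the real tables (values are made up for illustration)
df_clinical_toy = pd.DataFrame({'EPI_CODE': ['EPI_0001', 'EPI_0002', 'EPI_0003'],
                                'age': [39.0, 45.0, 65.0]})
df_features_toy = pd.DataFrame({'EPI_CODE': ['EPI_0002', 'EPI_0003', 'EPI_0009'],
                                'R_OW_feature': [0.23, 0.0, 0.58]})

def check_merge_key(df, key='EPI_CODE'):
    """Hypothetical helper: True when the key has no missing or duplicated values."""
    return bool(df[key].notna().all() and df[key].is_unique)

keys_ok = check_merge_key(df_clinical_toy) and check_merge_key(df_features_toy)

# Episodes present in the feature table but absent from the clinical table
unmatched = sorted(set(df_features_toy['EPI_CODE']) - set(df_clinical_toy['EPI_CODE']))

print(f'Keys OK: {keys_ok}')
print(f'Feature episodes without clinical data: {unmatched}')
```

With the real data, the same helper could be applied to `df_clinical`, `df_STAPES_features`, `df_AF_features`, and `df_OW_features` in turn.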


### **2. Merge Cleaned Radiomic Features, Cleaned Clinical data, and Labels**

In [8]:
# Merge all dataframes

# Merge Radiomic features
df_all_radiomic_features = df_STAPES_features.merge(df_AF_features, on='EPI_CODE', how='outer') \
                                             .merge(df_OW_features, on='EPI_CODE', how='outer')

# Merge Radiomic features with Clinical data
df_all_features = df_clinical.merge(df_all_radiomic_features, on='EPI_CODE', how='left')


print(f'df_STAPES_features shape:         {df_STAPES_features.shape}')
print(f'df_AF_features shape:             {df_AF_features.shape}')
print(f'df_OW_features shape:             {df_OW_features.shape}')
print(f'df_all_radiomic_features shape:   {df_all_radiomic_features.shape}')
print(f'\ndf_all_features shape:            {df_all_features.shape}\n')

df_all_features.head(4)

df_STAPES_features shape:         (89, 457)
df_AF_features shape:             (99, 479)
df_OW_features shape:             (83, 386)
df_all_radiomic_features shape:   (99, 1320)

df_all_features shape:            (127, 1348)



Unnamed: 0,EPI_CODE,Sex,Pathological_Ear,Hearing_impairment_Pathological_Ear,Sensorineural_impairment_dB_250Hz,Sensorineural_impairment_dB_500Hz,Sensorineural_impairment_dB_1000Hz,Sensorineural_impairment_dB_2000Hz,Sensorineural_impairment_dB_3000Hz,Sensorineural_impairment_dB_4000Hz,...,R_OW_lbp-3D-k_glszm_SmallAreaEmphasis,R_OW_lbp-3D-k_glszm_SmallAreaLowGrayLevelEmphasis,R_OW_lbp-3D-k_glszm_ZoneVariance,R_OW_lbp-3D-k_gldm_DependenceEntropy,R_OW_lbp-3D-k_gldm_DependenceNonUniformity,R_OW_lbp-3D-k_gldm_DependenceNonUniformityNormalized,R_OW_lbp-3D-k_gldm_LargeDependenceEmphasis,R_OW_lbp-3D-k_gldm_LargeDependenceLowGrayLevelEmphasis,R_OW_lbp-3D-k_ngtdm_Busyness,R_OW_lbp-3D-k_ngtdm_Strength
0,EPI_0001,Hombre,Oído izquierdo,Moderado (40-70 dB),25.0,35.0,35.0,45.0,35.0,15.0,...,,,,,,,,,,
1,EPI_0002,Mujer,Oído izquierdo,Moderado (40-70 dB),0.0,10.0,20.0,30.0,10.0,5.0,...,1.0,0.43,0.0,1.584963,3.0,1.0,1.0,0.43,0.65625,3.301587
2,EPI_0003,Mujer,Oído derecho,Moderado (40-70 dB),35.0,35.0,25.0,60.0,55.0,50.0,...,0.25,0.25,0.0,-3.203427e-16,2.0,1.0,4.0,4.0,0.0,0.0
3,EPI_0004,Hombre,Oído derecho,Moderado (40-70 dB),0.0,5.0,20.0,30.0,20.0,20.0,...,,,,,,,,,,
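
The left merge keeps every clinical row, so episodes without segmentations end up with `NaN` in all radiomic columns (as in `EPI_0001` and `EPI_0004` for the OW features above). One way to make this explicit is to use the `validate` and `indicator` parameters of `DataFrame.merge`. This is a sketch on toy data, not the notebook's own method; the `left` and `right` frames and their values are assumptions for illustration.

```python
import pandas as pd

# Toy frames (hypothetical values) standing in for clinical data and radiomic features
left = pd.DataFrame({'EPI_CODE': ['EPI_0001', 'EPI_0002', 'EPI_0003'],
                     'age': [39.0, 45.0, 65.0]})
right = pd.DataFrame({'EPI_CODE': ['EPI_0002', 'EPI_0003'],
                      'R_OW_feature': [0.23, 0.0]})

# validate='one_to_one' raises pandas.errors.MergeError if EPI_CODE is duplicated
# on either side; indicator=True adds a '_merge' column recording each row's origin
merged = left.merge(right, on='EPI_CODE', how='left',
                    validate='one_to_one', indicator=True)

counts = merged['_merge'].value_counts()
n_without_features = int(counts['left_only'])

print(merged)
print(f'Episodes without radiomic features: {n_without_features}')
```

The `_merge` column would normally be dropped (`merged.drop(columns='_merge')`) once the counts have been inspected, so that it does not end up in the saved dataframe.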


#### **Check Datatypes & Merge**


In [9]:
# Check Datatypes & Merge
dtypes_count= df_all_features.dtypes.value_counts()

objects=    dtypes_count[dtypes_count.index == 'object'].sum()
floats64=   dtypes_count[dtypes_count.index == 'float64'].sum()
categories= dtypes_count[dtypes_count.index == 'category'].sum()

print(f"Merged_df:")
print(f"object:   {objects}")
print(f"float64:  {floats64}")
print(f"category: {categories}")
print(f"--------------")
print(f"Total:   {objects + floats64 + categories}")

# Check number of original variables
removed= 3 # EPI_CODE appeared in all four dataframes; only one copy remains after merging

print('')
print(f'Radiomic STAPES Features: {df_STAPES_features.shape[1]}')
print(f'Radiomic AF Features:     {df_AF_features.shape[1]}')
print(f'Radiomic OW Features:     {df_OW_features.shape[1]}')
print(f'Clinical Features:        {df_clinical.shape[1]}')
print(f'Removed variables:        {removed}')
print(f"-----------------------------")
print(f"Total:                   {df_STAPES_features.shape[1] + df_AF_features.shape[1] + df_OW_features.shape[1] + df_clinical.shape[1] - removed}")

Merged_df:
object:   1
float64:  1332
category: 15
--------------
Total:   1348

Radiomic STAPES Features: 457
Radiomic AF Features:     479
Radiomic OW Features:     386
Clinical Features:        29
Removed variables:        3
-----------------------------
Total:                   1348
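
The column arithmetic above (457 + 479 + 386 + 29 - 3 = 1348) can also be written as an assertion, so that the notebook stops if the merged width ever differs from what the inputs imply. The sketch below uses toy frames; `expected_n_columns` is a hypothetical helper and assumes all frames are merged on one shared key column.

```python
import pandas as pd

def expected_n_columns(frames):
    """Hypothetical helper: columns expected after merging `frames` on one shared key.

    The key appears once per frame but only once in the result,
    so len(frames) - 1 copies are removed.
    """
    return sum(df.shape[1] for df in frames) - (len(frames) - 1)

# Toy frames (made-up values): 2 + 3 + 2 columns, sharing EPI_CODE
a = pd.DataFrame({'EPI_CODE': ['EPI_0001', 'EPI_0002'], 'age': [39.0, 45.0]})
b = pd.DataFrame({'EPI_CODE': ['EPI_0001', 'EPI_0002'], 'f1': [0.1, 0.2], 'f2': [1.0, 2.0]})
c = pd.DataFrame({'EPI_CODE': ['EPI_0002'], 'f3': [3.3]})

merged = a.merge(b, on='EPI_CODE', how='outer').merge(c, on='EPI_CODE', how='outer')

n_expected = expected_n_columns([a, b, c])
assert merged.shape[1] == n_expected, 'Unexpected number of columns after merge'
print(f'Expected columns: {n_expected}, merged columns: {merged.shape[1]}')
```

Note that this check only counts columns. If two frames shared a non-key column name, pandas would add `_x`/`_y` suffixes and the total would still match, so a separate check on overlapping names would be needed to catch that case.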


### **3. Save merged and cleaned dataframe**

In [10]:
# Save data
save_path= 'Databases/'
file_stem= '6_df_radiomics_&_clinical'  # no leading space, consistent with the input file names

df_all_features.to_excel(save_path + file_stem + '.xlsx')
df_all_features.to_csv(save_path + file_stem + '.csv')
df_all_features.to_feather(save_path + file_stem + '.feather')
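
The three output formats do not behave identically when read back. As a small illustration, the sketch below round-trips a toy dataframe through CSV in memory: with default settings the index is written as an unnamed first column, and `category` columns are not restored as categories, whereas Feather stores the categorical dtype. The toy frame and its values are assumptions for illustration only.

```python
import io
import pandas as pd

# Toy frame (made-up values) with one categorical column, as in the clinical data
df_toy = pd.DataFrame({'EPI_CODE': ['EPI_0001', 'EPI_0002'],
                       'Sex': pd.Categorical(['Hombre', 'Mujer']),
                       'age': [39.0, 45.0]})

# Default to_csv: the index is written as an unnamed first column
buffer = io.StringIO()
df_toy.to_csv(buffer)
buffer.seek(0)
df_back = pd.read_csv(buffer)

# index=False: only the data columns are written
buffer_no_index = io.StringIO()
df_toy.to_csv(buffer_no_index, index=False)
buffer_no_index.seek(0)
df_back_no_index = pd.read_csv(buffer_no_index)

print(list(df_back.columns))
print(list(df_back_no_index.columns))
# The categorical dtype is not recovered from CSV
print(isinstance(df_back_no_index['Sex'].dtype, pd.CategoricalDtype))
```

For downstream steps that depend on the `category` dtypes checked earlier, loading the `.feather` file is therefore the safer choice; the CSV and Excel copies are mainly convenient for inspection.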