# <center>**RADIOMICS EXTRACTION - Adjust bin width - STAPES**</center>

(*Step 5*)

## **Radiomics Workflow:**
  
  **1. Download** DICOM images and convert to **NRRD.**

  **2.** Perform **target segmentations** and save in **NRRD.**
  
  **3.** Perform a **first Radiomic Feature** ***Dummy*** **Extraction,** to:
    
  - Detect **errors in segmentations**: only one dimension, no label 1, only one segmented voxel...
  - Analyze **bin width**.


  **4. Analyze** and **correct mask errors.**

  **5. Adjust bin width.**
   - **Tune featureextractor param file.**


  **6.** Perform **final Radiomic Feature Extraction.**

  **7. Clean Radiomic Features.**

  **8. Merge** with **labels** and **clinical data.**

  **9. Descriptive Statistics.**

  **10. Inferential Statistics.**

  **11. Machine Learning.**

## **Adjust Bin width - STAPES**

This notebook summarizes the first-order Range features of the radiomic dummy extraction to adjust the bin widths in the <code>param_file</code> for the <code>featureextractor</code> class.

0.   Environment **configuration**.
1.   **Load Radiomic** ***Dummy*** **Features.**
2.   Check **extraction information**.
3.   **Select First Order Range features** (***from the original, square, squareroot, logarithm, exponential, gradient, log-sigma, wavelet and LBP3D images***).
4.   Calculate **Median - P50 of First Order Feature Ranges.**

**Bin width** will be:

$$
\textit{Bin width} = \frac{P_{50}\textit{ of Ranges}}{50}
$$

- For **original, square, squareroot, logarithm, exponential, gradient** images: the **exact value**.
- For **log-sigma** and **wavelet** images: the **mean over all sigma values** and the **mean over all wavelet sub-bands**, respectively.
- For **LBP3D**: **left unchanged** (1.0).
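Why dividing the range by 50 targets about 50 bins: with a fixed bin width $W$, the PyRadiomics documentation describes the discretization as assigning each intensity $X$ the bin $\lfloor X/W \rfloor - \lfloor \min(X)/W \rfloor + 1$, so a region whose intensities span a range $R$ ends up with roughly $R/W$ bins. The sketch below applies that formula to hypothetical intensities spanning a range of 1250.5; the function name `discretize_fixed_bin_width` is ours, not part of the PyRadiomics API.

```python
import numpy as np

def discretize_fixed_bin_width(values, bin_width):
    """Assign a 1-based bin index to each intensity using a fixed bin width W:
    X_b = floor(X / W) - floor(min(X) / W) + 1 (formula as described in the PyRadiomics docs)."""
    values = np.asarray(values, dtype=float)
    return (np.floor(values / bin_width) - np.floor(values.min() / bin_width) + 1).astype(int)

# Hypothetical ROI intensities spanning a range of 1250.5 (chosen for illustration)
roi = np.linspace(-200.0, 1050.5, 1000)
bin_width = (roi.max() - roi.min()) / 50  # range / 50 = 25.01

bins = discretize_fixed_bin_width(roi, bin_width)
n_bins = bins.max()  # 50 or 51, depending on where the minimum falls on the bin grid
print(bin_width, n_bins)
```

The count can exceed 50 by one because the bin edges are anchored at multiples of $W$, not at the minimum intensity itself.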

In [1]:
```python
# Import libraries
import os
import pandas as pd
```

### **0. Environment configuration**

#### Set the working directory

In [2]:
```python
# Set working directory
wd = '/Users/pablomenendezfernandez-miranda/Proyecto Otosclerosis/'
os.chdir(wd)

print(f'Current directory: {os.getcwd()}')

# Check directory files
print(f'Directory files: {os.listdir(wd)}')
```

```text
Current directory: /Users/pablomenendezfernandez-miranda/Proyecto Otosclerosis
Directory files: ['Tables', 'Databases', 'Episodes', 'Figures', 'Notebooks']
```


#### Mount Drive (if executed on Google Colab)

In [3]:
```python
# from google.colab import drive
# drive.mount('/content/drive')
```

### **1. Load Radiomic Features**

In [4]:
```python
# Load data
df_path = 'Databases/3_1_df_radiomics_STAPES_dummy_features.csv'

df = pd.read_csv(df_path)
print(df.shape)
df.head(4)
```

```text
(42, 2054)
```


| | Image_Segmentation | diagnostics_Versions_PyRadiomics | diagnostics_Versions_Numpy | diagnostics_Versions_SimpleITK | diagnostics_Versions_PyWavelet | diagnostics_Versions_Python | diagnostics_Configuration_Settings | diagnostics_Configuration_EnabledImageTypes | diagnostics_Image-original_Hash | diagnostics_Image-original_Dimensionality | ... | lbp-3D-k_gldm_LargeDependenceLowGrayLevelEmphasis | lbp-3D-k_gldm_LowGrayLevelEmphasis | lbp-3D-k_gldm_SmallDependenceEmphasis | lbp-3D-k_gldm_SmallDependenceHighGrayLevelEmphasis | lbp-3D-k_gldm_SmallDependenceLowGrayLevelEmphasis | lbp-3D-k_ngtdm_Busyness | lbp-3D-k_ngtdm_Coarseness | lbp-3D-k_ngtdm_Complexity | lbp-3D-k_ngtdm_Contrast | lbp-3D-k_ngtdm_Strength |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | EPI_0001_STAPES.nrrd | 3.1.0a2.post14+gaab3c6f | 1.26.4 | 2.4.0 | 1.7.0 | 3.12.4 | {'minimumROIDimensions': 2, 'minimumROISize': ... | {'Original': {'binWidth': 6.0}, 'Square': {'bi... | 1310b8441367f403dccd0241fde06dd70966c56a | 3D | ... | 1.519841 | 0.305556 | 0.50496 | 6.876984 | 0.181796 | 0.225694 | 0.646154 | 8.171429 | 0.338516 | 7.488722 |
| 1 | EPI_0002_STAPES.nrrd | 3.1.0a2.post14+gaab3c6f | 1.26.4 | 2.4.0 | 1.7.0 | 3.12.4 | {'minimumROIDimensions': 2, 'minimumROISize': ... | {'Original': {'binWidth': 6.0}, 'Square': {'bi... | 64fb22f54e86a46714e45ec0a26705a81f50b639 | 3D | ... | 0.900185 | 0.316852 | 0.601852 | 6.240741 | 0.217315 | 0.426471 | 0.413793 | 10.819444 | 0.417824 | 3.22807 |
| 2 | EPI_0003_STAPES.nrrd | 3.1.0a2.post14+gaab3c6f | 1.26.4 | 2.4.0 | 1.7.0 | 3.12.4 | {'minimumROIDimensions': 2, 'minimumROISize': ... | {'Original': {'binWidth': 6.0}, 'Square': {'bi... | c37a6f6b17a1a89cd5f06d640fb8e33dc6219c6e | 3D | ... | 4.0 | 1.0 | 0.25 | 0.25 | 0.25 | 0.0 | 1000000.0 | 0.0 | 0.0 | 0.0 |
| 3 | EPI_0005_STAPES.nrrd | 3.1.0a2.post14+gaab3c6f | 1.26.4 | 2.4.0 | 1.7.0 | 3.12.4 | {'minimumROIDimensions': 2, 'minimumROISize': ... | {'Original': {'binWidth': 6.0}, 'Square': {'bi... | 3f11e7a4b9e487bfc8299630acbc69287f745d63 | 3D | ... | 0.634722 | 0.334722 | 0.7 | 5.6 | 0.259722 | 0.333333 | 0.75 | 5.822222 | 0.196444 | 3.247059 |


In [5]:
```python
# Check column list
print(list(df.columns), end='')
```

```text
['Image_Segmentation', 'diagnostics_Versions_PyRadiomics', 'diagnostics_Versions_Numpy', 'diagnostics_Versions_SimpleITK', 'diagnostics_Versions_PyWavelet', 'diagnostics_Versions_Python', 'diagnostics_Configuration_Settings', 'diagnostics_Configuration_EnabledImageTypes', 'diagnostics_Image-original_Hash', 'diagnostics_Image-original_Dimensionality', 'diagnostics_Image-original_Spacing', 'diagnostics_Image-original_Size', 'diagnostics_Image-original_Mean', 'diagnostics_Image-original_Minimum', 'diagnostics_Image-original_Maximum', 'diagnostics_Mask-original_Hash', 'diagnostics_Mask-original_Spacing', 'diagnostics_Mask-original_Size', 'diagnostics_Mask-original_BoundingBox', 'diagnostics_Mask-original_VoxelNum', 'diagnostics_Mask-original_VolumeNum', 'diagnostics_Mask-original_CenterOfMassIndex', 'diagnostics_Mask-original_CenterOfMass', 'diagnostics_Image-interpolated_Spacing', 'diagnostics_Image-interpolated_Size', 'diagnostics_Image-interpolated_Mean', 'diagnostics_Image-interpolated
```
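The `diagnostics_*` columns listed above can also support step 3 of the workflow (detecting segmentation errors). A minimal sketch, assuming the CSV stores `diagnostics_Mask-original_BoundingBox` as a text tuple in SimpleITK's (index, size) order: it flags masks with a single segmented voxel or spanning fewer than two axes. The demo rows and the `MIN_VOXELS` threshold are hypothetical, and the "no label 1" case is not covered here.

```python
import ast
import pandas as pd

# Hypothetical rows mimicking two diagnostics columns from the list above
demo = pd.DataFrame({
    'Image_Segmentation': ['EPI_A.nrrd', 'EPI_B.nrrd', 'EPI_C.nrrd'],
    'diagnostics_Mask-original_VoxelNum': [120, 1, 7],
    # Assumed layout: (x, y, z, size_x, size_y, size_z), stored as text in the CSV
    'diagnostics_Mask-original_BoundingBox': ['(10, 12, 5, 6, 5, 4)',
                                              '(40, 41, 7, 1, 1, 1)',
                                              '(22, 30, 9, 7, 1, 1)'],
})

MIN_VOXELS = 2  # hypothetical threshold: a single segmented voxel is flagged

# Parse the text tuples and count the axes along which the mask spans more than one voxel
bbox = demo['diagnostics_Mask-original_BoundingBox'].apply(ast.literal_eval)
spanned_axes = bbox.apply(lambda b: sum(size > 1 for size in b[3:]))

too_small = demo['diagnostics_Mask-original_VoxelNum'] < MIN_VOXELS
too_flat = spanned_axes < 2  # mirrors 'minimumROIDimensions': 2 in the extraction settings

suspect = demo.loc[too_small | too_flat, 'Image_Segmentation'].tolist()
print(suspect)
```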

### **2. Check extraction information**

In [6]:
```python
# Configuration list
df['diagnostics_Configuration_EnabledImageTypes'].iloc[0]
```

```text
"{'Original': {'binWidth': 6.0}, 'Square': {'binWidth': 0.6}, 'SquareRoot': {'binWidth': 15.0}, 'Logarithm': {'binWidth': 15.0}, 'Exponential': {'binWidth': 0.1}, 'Gradient': {'binWidth': 4.0}, 'LoG': {'binWidth': 4.0, 'sigma': [1.0, 2.0, 3.0, 4.0, 5.0]}, 'Wavelet': {'binWidth': 2.0}, 'LBP3D': {'binWidth': 1.0}}"
```

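To compare the bin widths used in the dummy extraction with the new ones programmatically, the configuration string can be parsed back into a dictionary. A sketch using the standard-library `ast.literal_eval`; in the notebook, `config_str` would simply be `df['diagnostics_Configuration_EnabledImageTypes'].iloc[0]`.

```python
import ast

# The EnabledImageTypes cell holds a Python dict repr (single quotes), as printed above
config_str = (
    "{'Original': {'binWidth': 6.0}, 'Square': {'binWidth': 0.6}, "
    "'SquareRoot': {'binWidth': 15.0}, 'Logarithm': {'binWidth': 15.0}, "
    "'Exponential': {'binWidth': 0.1}, 'Gradient': {'binWidth': 4.0}, "
    "'LoG': {'binWidth': 4.0, 'sigma': [1.0, 2.0, 3.0, 4.0, 5.0]}, "
    "'Wavelet': {'binWidth': 2.0}, 'LBP3D': {'binWidth': 1.0}}"
)

# ast.literal_eval only accepts Python literals, so it is safer than eval()
enabled = ast.literal_eval(config_str)
current_bin_widths = {name: opts.get('binWidth') for name, opts in enabled.items()}
print(current_bin_widths)
```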
### **3. Select First Order Range features**

In [7]:
```python
# First Order Range Features columns
ranges = [c for c in df.columns if c.endswith('_firstorder_Range')]
ranges
```

```text
['original_firstorder_Range',
 'square_firstorder_Range',
 'squareroot_firstorder_Range',
 'logarithm_firstorder_Range',
 'exponential_firstorder_Range',
 'gradient_firstorder_Range',
 'log-sigma-1-0-mm-3D_firstorder_Range',
 'log-sigma-2-0-mm-3D_firstorder_Range',
 'log-sigma-3-0-mm-3D_firstorder_Range',
 'log-sigma-4-0-mm-3D_firstorder_Range',
 'log-sigma-5-0-mm-3D_firstorder_Range',
 'wavelet-LLH_firstorder_Range',
 'wavelet-LHL_firstorder_Range',
 'wavelet-LHH_firstorder_Range',
 'wavelet-HLL_firstorder_Range',
 'wavelet-HLH_firstorder_Range',
 'wavelet-HHL_firstorder_Range',
 'wavelet-HHH_firstorder_Range',
 'wavelet-LLL_firstorder_Range',
 'lbp-3D-m1_firstorder_Range',
 'lbp-3D-m2_firstorder_Range',
 'lbp-3D-k_firstorder_Range']
```

In [8]:
```python
# First Order Range Features
df[ranges].head(4)
```

| | original_firstorder_Range | square_firstorder_Range | squareroot_firstorder_Range | logarithm_firstorder_Range | exponential_firstorder_Range | gradient_firstorder_Range | log-sigma-1-0-mm-3D_firstorder_Range | log-sigma-2-0-mm-3D_firstorder_Range | log-sigma-3-0-mm-3D_firstorder_Range | log-sigma-4-0-mm-3D_firstorder_Range | ... | wavelet-LHL_firstorder_Range | wavelet-LHH_firstorder_Range | wavelet-HLL_firstorder_Range | wavelet-HLH_firstorder_Range | wavelet-HHL_firstorder_Range | wavelet-HHH_firstorder_Range | wavelet-LLL_firstorder_Range | lbp-3D-m1_firstorder_Range | lbp-3D-m2_firstorder_Range | lbp-3D-k_firstorder_Range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1694.0 | 537.357895 | 2926.087474 | 4477.770647 | 36.541693 | 1600.437775 | 586.967697 | 566.224533 | 748.454971 | 715.162109 | ... | 1492.528684 | 1749.910852 | 1565.636734 | 888.751665 | 1179.813072 | 1149.87868 | 3006.424738 | 17.534513 | 13.833421 | 4.442462 |
| 1 | 1253.0 | 238.190627 | 2490.135038 | 4119.659126 | 11.432528 | 1374.701599 | 626.115555 | 355.480682 | 539.355728 | 543.371246 | ... | 1495.067807 | 1211.872491 | 1168.526472 | 1646.952847 | 653.268811 | 1072.438896 | 2393.295045 | 10.314419 | 7.882235 | 3.107804 |
| 2 | 1073.0 | 279.664894 | 2175.477821 | 3803.409287 | 14.82214 | 1202.388733 | 195.604366 | 257.693115 | 460.524155 | 415.096397 | ... | 119.742991 | 559.109042 | 833.024017 | 650.54538 | 1702.895491 | 379.483039 | 2005.097857 | 1.547163 | 2.628434 | 0.028817 |
| 3 | 2566.0 | 1488.505414 | 3502.272197 | 4603.002734 | 388.13708 | 1595.481369 | 693.08075 | 365.035095 | 666.056076 | 702.410179 | ... | 379.111777 | 2372.017166 | 2393.782356 | 1770.539929 | 1517.379345 | 1843.137309 | 4326.883707 | 9.798698 | 8.160206 | 2.527292 |


### **4. Calculate Median - P50 of First Order Feature Ranges**

**Bin width** will be:

$$
\textit{Bin width} = \frac{P_{50}\textit{ of Ranges}}{50}
$$

- For **original, square, squareroot, logarithm, exponential, gradient** images: the **exact value**.
- For **log-sigma** and **wavelet** images: the **mean over all sigma values** and the **mean over all wavelet sub-bands**, respectively.
- For **LBP3D**: **left unchanged** (1.0).
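The rule above can be sketched as a small function that turns the `_firstorder_Range` columns into one `binWidth` per PyRadiomics image type. The helper names `image_type_of` and `bin_widths_by_image_type` are hypothetical, and the prefix-to-key mapping is an assumption inferred from the column names and the `EnabledImageTypes` keys shown in this notebook. In the notebook it would be called as `bin_widths_by_image_type(df[ranges])`.

```python
import pandas as pd

def image_type_of(column):
    """Map '<prefix>_firstorder_Range' to a PyRadiomics imageType key (assumed mapping).
    Unknown prefixes raise KeyError on purpose, so new image types are not silently skipped."""
    prefix = column.split('_firstorder_')[0]
    for start, key in (('log-sigma', 'LoG'), ('wavelet', 'Wavelet'), ('lbp-3D', 'LBP3D')):
        if prefix.startswith(start):
            return key
    return {'original': 'Original', 'square': 'Square', 'squareroot': 'SquareRoot',
            'logarithm': 'Logarithm', 'exponential': 'Exponential',
            'gradient': 'Gradient'}[prefix]

def bin_widths_by_image_type(range_df, n_bins=50):
    """Median range / n_bins per column, then one binWidth per image type:
    exact value for single-column types, mean over sigmas / sub-bands for LoG and Wavelet."""
    per_column = range_df.median() / n_bins
    per_type = per_column.groupby(per_column.index.map(image_type_of)).mean()
    per_type['LBP3D'] = 1.0  # rule: the LBP3D bin width is left unchanged
    return per_type.to_dict()

# Hypothetical ranges for three cases
demo = pd.DataFrame({
    'original_firstorder_Range': [1000.0, 1500.0, 1250.0],
    'log-sigma-1-0-mm-3D_firstorder_Range': [500.0, 400.0, 600.0],
    'log-sigma-2-0-mm-3D_firstorder_Range': [300.0, 400.0, 500.0],
    'lbp-3D-k_firstorder_Range': [3.0, 4.0, 5.0],
})
print(bin_widths_by_image_type(demo))
```

Averaging per image type is needed because PyRadiomics accepts one `binWidth` per entry in `imageType`, not one per sigma or sub-band.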

In [9]:
```python
# Descriptive Statistics of Ranges
desc = df[ranges].describe().T
desc
```

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| original_firstorder_Range | 42.0 | 1273.309524 | 622.948124 | 210.0 | 788.25 | 1250.5 | 1599.0 | 2795.0 |
| square_firstorder_Range | 42.0 | 480.848218 | 463.982453 | 17.25356 | 173.655214 | 322.959401 | 535.405856 | 1932.805379 |
| squareroot_firstorder_Range | 42.0 | 2188.579026 | 865.335094 | 283.394201 | 1684.239395 | 2423.242303 | 2783.019214 | 3646.488838 |
| logarithm_firstorder_Range | 42.0 | 3435.192288 | 1411.951031 | 195.130251 | 3376.194825 | 3973.720296 | 4197.707677 | 4771.281939 |
| exponential_firstorder_Range | 42.0 | 81.630675 | 170.407122 | 0.887406 | 8.35912 | 16.602468 | 40.624597 | 822.945674 |
| gradient_firstorder_Range | 42.0 | 1025.323687 | 429.575618 | 7.02063 | 705.15052 | 1175.54644 | 1373.971504 | 1600.437775 |
| log-sigma-1-0-mm-3D_firstorder_Range | 42.0 | 496.221621 | 211.261541 | 95.219582 | 358.151333 | 535.095844 | 612.350846 | 1090.989319 |
| log-sigma-2-0-mm-3D_firstorder_Range | 42.0 | 446.565187 | 190.774843 | 70.776398 | 296.332138 | 444.226059 | 580.928978 | 836.292713 |
| log-sigma-3-0-mm-3D_firstorder_Range | 42.0 | 501.603488 | 215.974287 | 4.142166 | 374.581697 | 555.726559 | 635.939293 | 882.786591 |
| log-sigma-4-0-mm-3D_firstorder_Range | 42.0 | 482.258226 | 202.939757 | 7.310234 | 409.482701 | 539.672729 | 617.394062 | 806.931351 |


In [10]:
```python
# Divide the P50 (median) of each range by 50: the results are the bin widths
# to set in the featureextractor param_file
binW = df[ranges].agg('median') / 50
pd.DataFrame(binW, columns=['P50 of Ranges / 50'])
```

| | P50 of Ranges / 50 |
|---|---|
| original_firstorder_Range | 25.01 |
| square_firstorder_Range | 6.459188 |
| squareroot_firstorder_Range | 48.464846 |
| logarithm_firstorder_Range | 79.474406 |
| exponential_firstorder_Range | 0.332049 |
| gradient_firstorder_Range | 23.510929 |
| log-sigma-1-0-mm-3D_firstorder_Range | 10.701917 |
| log-sigma-2-0-mm-3D_firstorder_Range | 8.884521 |
| log-sigma-3-0-mm-3D_firstorder_Range | 11.114531 |
| log-sigma-4-0-mm-3D_firstorder_Range | 10.793455 |
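As a check of the arithmetic, the bin widths for two image types follow directly from the `50%` column of the descriptive statistics:

$$
\textit{Bin width}_{\text{original}} = \frac{1250.5}{50} = 25.01, \qquad \textit{Bin width}_{\text{exponential}} = \frac{16.602468}{50} \approx 0.332049
$$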


In [11]:
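```python
# Check the bin widths: dividing each range statistic by its bin width gives a number of bins,
# so the median (50%) column should be exactly 50.
# desc.T has one column per range feature, so the division aligns on binW's index.
# Note: 'count' is divided as well, so that column is not meaningful here.
(desc.T / binW).T
```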

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| original_firstorder_Range | 1.679328 | 50.912016 | 24.907962 | 8.396641 | 31.517393 | 50.0 | 63.934426 | 111.755298 |
| square_firstorder_Range | 6.502365 | 74.444066 | 71.832938 | 2.671165 | 26.884991 | 50.0 | 82.890582 | 299.233491 |
| squareroot_firstorder_Range | 0.866608 | 45.158072 | 17.854902 | 5.847418 | 34.751774 | 50.0 | 57.423461 | 75.239873 |
| logarithm_firstorder_Range | 0.528472 | 43.223881 | 17.76611 | 2.455259 | 42.481536 | 50.0 | 52.818359 | 60.035453 |
| exponential_firstorder_Range | 126.487218 | 245.838976 | 513.198159 | 2.672514 | 25.174328 | 50.0 | 122.345052 | 2478.383536 |
| gradient_firstorder_Range | 1.786403 | 43.610514 | 18.271316 | 0.298611 | 29.992457 | 50.0 | 58.439695 | 68.072078 |
| log-sigma-1-0-mm-3D_firstorder_Range | 3.924531 | 46.367546 | 19.740533 | 8.897432 | 33.466092 | 50.0 | 57.2188 | 101.943356 |
| log-sigma-2-0-mm-3D_firstorder_Range | 4.727323 | 50.263281 | 21.472721 | 7.966259 | 33.353754 | 50.0 | 65.386639 | 94.129182 |
| log-sigma-3-0-mm-3D_firstorder_Range | 3.778837 | 45.130423 | 19.431705 | 0.37268 | 33.701979 | 50.0 | 57.216925 | 79.426345 |
| log-sigma-4-0-mm-3D_firstorder_Range | 3.891247 | 44.680618 | 18.802113 | 0.677284 | 37.938058 | 50.0 | 57.200784 | 74.761175 |
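The medians land on exactly 50 bins, but individual cases can be far off (for the exponential image, the maximum is about 2478 bins). A sketch that reports, per feature, the share of cases whose approximate number of bins (range / bin width) falls outside a band; the 30 to 130 band, the function name `bins_out_of_band` and the demo values are assumptions for illustration, not part of the original analysis. In the notebook it could be called as `bins_out_of_band(df[ranges], binW)`.

```python
import pandas as pd

# Assumed acceptable band of bins per case; adjust to your own criterion
MIN_BINS, MAX_BINS = 30, 130

def bins_out_of_band(range_df, bin_widths, low=MIN_BINS, high=MAX_BINS):
    """Fraction of cases per feature whose approximate bin count (range / width) is outside [low, high]."""
    n_bins = range_df.div(bin_widths, axis=1)  # bin widths are aligned with the columns by name
    outside = (n_bins < low) | (n_bins > high)
    return outside.mean()  # mean of booleans = fraction of cases

# Hypothetical ranges for four cases and hypothetical bin widths
demo = pd.DataFrame({
    'original_firstorder_Range': [1250.0, 500.0, 3000.0, 1300.0],
    'exponential_firstorder_Range': [16.6, 0.9, 822.9, 20.0],
})
widths = pd.Series({'original_firstorder_Range': 25.0, 'exponential_firstorder_Range': 0.33})
print(bins_out_of_band(demo, widths))
```

A high fraction for one image type suggests that a single median-based bin width does not suit that transformation across all cases.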
