# <center>**RADIOMICS EXTRACTION - Adjust bin width - OW**</center>

*Repetition of Step 5 (bin width calculation) for the Oval Window (OW) segmentations.*

(*Step 5*)

## **Radiomics Workflow:**
  
  **1. Download** DICOM images and convert to **NRRD.**

  **2.** Perform **target segmentations** and save in **NRRD.**
  
  **3.** Perform a **first Radiomic Feature** ***Dummy*** **Extraction,** to:
    
  - Detect **errors in segmentations**: only one dimension, no label 1, only one segmented voxel...
  - Analyze **bin width**.


  **4. Analyze** and **correct mask errors.**

  **5. Adjust bin width.**
   - **Tune featureextractor param file.**


  **6.** Perform **final Radiomic Feature Extraction.**

  **7. Clean Radiomic Features.**

  **8. Merge** with **labels** and **clinical data.**

  **9. Descriptive Statistics.**

  **10. Inferential Statistics.**

  **11. Machine Learning.**

## **Adjust Bin width - OW**

This notebook calculates the ranges of the radiomic features in order to adjust the bin width in the <code>param_file</code> for the <code>featureextractor</code> class.

0.   Environment **configuration**.
1.   **Load Radiomic** ***Dummy*** **Features.**
2.   Check the **extraction information**.
3.   **Select First Order Range features** (***from image - original, square, square root, logarithm, exponential, gradient, log-sigma, wavelet***).
4.   Calculate **Median - P50 of First Order Feature Ranges.**

**Bin width** will be:

$$
\textit{Bin width} = \frac{P_{50}\textit{ of Ranges}}{50}
$$

- For **original, square, square root, logarithm, exponential, gradient** images: the **exact value**.
- For **log-sigma and wavelet** images: the **mean of all transformations**, respectively.
- For **LBP3D**: **not to change** (1.0).
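
As a minimal sketch of the rule above, the snippet below applies it to a toy table of first-order ranges. The column names mimic the PyRadiomics output used in this notebook, but the values and the helper name `bin_width_from_ranges` are hypothetical; only the divisor of 50 comes from the formula.

```python
import pandas as pd


def bin_width_from_ranges(ranges: pd.DataFrame, target_bins: int = 50) -> pd.Series:
    """Return median(range) / target_bins for every column (hypothetical helper)."""
    return ranges.median() / target_bins


# Toy ranges for three cases (made-up values, not taken from the study data)
toy = pd.DataFrame({
    'original_firstorder_Range': [600.0, 900.0, 1500.0],
    'square_firstorder_Range': [200.0, 500.0, 1000.0],
})

bin_widths = bin_width_from_ranges(toy)
print(bin_widths)
# original: median 900 / 50 = 18.0
# square:   median 500 / 50 = 10.0
```

With this choice, a case whose range equals the median is discretized into about 50 bins, while cases with smaller or larger ranges get proportionally fewer or more bins.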

In [1]:
# Import libraries
import os
import pandas as pd

### **0. Environment configuration**

#### Set the working directory

In [2]:
# Set working directory
wd = '/Users/pablomenendezfernandez-miranda/Proyecto Otosclerosis/'
os.chdir(wd)

print(f'Current directory: {os.getcwd()}')

# Check directory files
print(f'Directory files: {os.listdir(wd)}')

Current directory: /Users/pablomenendezfernandez-miranda/Proyecto Otosclerosis
Directory files: ['Tables', 'Databases', 'Episodes', 'Figures', 'Notebooks']


#### Mount Drive (if executed on Google Colab)

In [3]:
#from google.colab import drive
#drive.mount('/content/drive')

### **1. Load Radiomic Features**

In [4]:
# Load data
df_path = 'Databases/3_3_df_radiomics_OW_dummy_features.csv'

df = pd.read_csv(df_path)
print(df.shape)
df.head(4)

(45, 2054)


| | Image_Segmentation | diagnostics_Versions_PyRadiomics | diagnostics_Versions_Numpy | diagnostics_Versions_SimpleITK | diagnostics_Versions_PyWavelet | diagnostics_Versions_Python | diagnostics_Configuration_Settings | diagnostics_Configuration_EnabledImageTypes | diagnostics_Image-original_Hash | diagnostics_Image-original_Dimensionality | ... | lbp-3D-k_gldm_LargeDependenceLowGrayLevelEmphasis | lbp-3D-k_gldm_LowGrayLevelEmphasis | lbp-3D-k_gldm_SmallDependenceEmphasis | lbp-3D-k_gldm_SmallDependenceHighGrayLevelEmphasis | lbp-3D-k_gldm_SmallDependenceLowGrayLevelEmphasis | lbp-3D-k_ngtdm_Busyness | lbp-3D-k_ngtdm_Coarseness | lbp-3D-k_ngtdm_Complexity | lbp-3D-k_ngtdm_Contrast | lbp-3D-k_ngtdm_Strength |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | EPI_0002_OW.nrrd | 3.1.0a2.post14+gaab3c6f | 1.26.4 | 2.4.0 | 1.7.0 | 3.12.4 | {'minimumROIDimensions': 1, 'minimumROISize': ... | {'Original': {'binWidth': 25.01}, 'Square': {'... | 64fb22f54e86a46714e45ec0a26705a81f50b639 | 3D | ... | 0.43 | 0.43 | 1.0 | 10.0 | 0.43 | 0.65625 | 0.285714 | 18.833333 | 3.37037 | 3.301587 |
| 1 | EPI_0003_OW.nrrd | 3.1.0a2.post14+gaab3c6f | 1.26.4 | 2.4.0 | 1.7.0 | 3.12.4 | {'minimumROIDimensions': 1, 'minimumROISize': ... | {'Original': {'binWidth': 25.01}, 'Square': {'... | c37a6f6b17a1a89cd5f06d640fb8e33dc6219c6e | 3D | ... | 4.0 | 1.0 | 0.25 | 0.25 | 0.25 | 0.0 | 1000000.0 | 0.0 | 0.0 | 0.0 |
| 2 | EPI_0005_OW.nrrd | 3.1.0a2.post14+gaab3c6f | 1.26.4 | 2.4.0 | 1.7.0 | 3.12.4 | {'minimumROIDimensions': 1, 'minimumROISize': ... | {'Original': {'binWidth': 25.01}, 'Square': {'... | 3f11e7a4b9e487bfc8299630acbc69287f745d63 | 3D | ... | 1.0 | 0.5 | 0.5 | 1.0 | 0.375 | 0.5 | 1.0 | 0.666667 | 0.148148 | 1.0 |
| 3 | EPI_0006_OW.nrrd | 3.1.0a2.post14+gaab3c6f | 1.26.4 | 2.4.0 | 1.7.0 | 3.12.4 | {'minimumROIDimensions': 1, 'minimumROISize': ... | {'Original': {'binWidth': 25.01}, 'Square': {'... | ef0cc9a99861c73aa781a695855f83fcfe173d7c | 3D | ... | 2.5 | 0.625 | 0.25 | 0.625 | 0.15625 | 0.833333 | 1.2 | 0.416667 | 0.104167 | 1.2 |


In [5]:
# Check column list
print(list(df.columns), end='')

['Image_Segmentation', 'diagnostics_Versions_PyRadiomics', 'diagnostics_Versions_Numpy', 'diagnostics_Versions_SimpleITK', 'diagnostics_Versions_PyWavelet', 'diagnostics_Versions_Python', 'diagnostics_Configuration_Settings', 'diagnostics_Configuration_EnabledImageTypes', 'diagnostics_Image-original_Hash', 'diagnostics_Image-original_Dimensionality', 'diagnostics_Image-original_Spacing', 'diagnostics_Image-original_Size', 'diagnostics_Image-original_Mean', 'diagnostics_Image-original_Minimum', 'diagnostics_Image-original_Maximum', 'diagnostics_Mask-original_Hash', 'diagnostics_Mask-original_Spacing', 'diagnostics_Mask-original_Size', 'diagnostics_Mask-original_BoundingBox', 'diagnostics_Mask-original_VoxelNum', 'diagnostics_Mask-original_VolumeNum', 'diagnostics_Mask-original_CenterOfMassIndex', 'diagnostics_Mask-original_CenterOfMass', 'diagnostics_Image-interpolated_Spacing', 'diagnostics_Image-interpolated_Size', 'diagnostics_Image-interpolated_Mean', 'diagnostics_Image-interpolated

### **2. Check the extraction information**

In [6]:
# Configuration list
df['diagnostics_Configuration_EnabledImageTypes'].iloc[0]

"{'Original': {'binWidth': 25.01}, 'Square': {'binWidth': 6.459188}, 'SquareRoot': {'binWidth': 48.464846}, 'Logarithm': {'binWidth': 79.474406}, 'Exponential': {'binWidth': 0.332049}, 'Gradient': {'binWidth': 23.510929}, 'LoG': {'binWidth': 9.0, 'sigma': [1.0, 2.0, 3.0, 4.0, 5.0]}, 'Wavelet': {'binWidth': 25}, 'LBP3D': {'binWidth': 1.0}}"
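
The value above is stored as a string, so comparing the current bin widths with the new ones requires parsing it first. A minimal sketch, assuming the string is a valid Python literal (as it appears to be in this output), uses `ast.literal_eval` from the standard library. The variable names are illustrative only.

```python
import ast

# String copied from the diagnostics output shown above
enabled_image_types = (
    "{'Original': {'binWidth': 25.01}, 'Square': {'binWidth': 6.459188}, "
    "'SquareRoot': {'binWidth': 48.464846}, 'Logarithm': {'binWidth': 79.474406}, "
    "'Exponential': {'binWidth': 0.332049}, 'Gradient': {'binWidth': 23.510929}, "
    "'LoG': {'binWidth': 9.0, 'sigma': [1.0, 2.0, 3.0, 4.0, 5.0]}, "
    "'Wavelet': {'binWidth': 25}, 'LBP3D': {'binWidth': 1.0}}"
)

# literal_eval only evaluates Python literals, so it is safer than eval()
settings = ast.literal_eval(enabled_image_types)

# Keep just the bin width used for each image type in the dummy extraction
current_bin_widths = {name: cfg['binWidth'] for name, cfg in settings.items()}
for name, width in current_bin_widths.items():
    print(f'{name}: {width}')
```

In the notebook the same call could be applied to `df['diagnostics_Configuration_EnabledImageTypes'].iloc[0]` instead of the hard-coded string.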

### **3. Select First Order Range features**

In [7]:
# First Order Range Features columns
ranges = [c for c in df.columns if c.endswith('_firstorder_Range')]
ranges

['original_firstorder_Range',
 'square_firstorder_Range',
 'squareroot_firstorder_Range',
 'logarithm_firstorder_Range',
 'exponential_firstorder_Range',
 'gradient_firstorder_Range',
 'log-sigma-1-0-mm-3D_firstorder_Range',
 'log-sigma-2-0-mm-3D_firstorder_Range',
 'log-sigma-3-0-mm-3D_firstorder_Range',
 'log-sigma-4-0-mm-3D_firstorder_Range',
 'log-sigma-5-0-mm-3D_firstorder_Range',
 'wavelet-LLH_firstorder_Range',
 'wavelet-LHL_firstorder_Range',
 'wavelet-LHH_firstorder_Range',
 'wavelet-HLL_firstorder_Range',
 'wavelet-HLH_firstorder_Range',
 'wavelet-HHL_firstorder_Range',
 'wavelet-HHH_firstorder_Range',
 'wavelet-LLL_firstorder_Range',
 'lbp-3D-m1_firstorder_Range',
 'lbp-3D-m2_firstorder_Range',
 'lbp-3D-k_firstorder_Range']

In [8]:
# First Order Range Features
df[ranges].head(4)

| | original_firstorder_Range | square_firstorder_Range | squareroot_firstorder_Range | logarithm_firstorder_Range | exponential_firstorder_Range | gradient_firstorder_Range | log-sigma-1-0-mm-3D_firstorder_Range | log-sigma-2-0-mm-3D_firstorder_Range | log-sigma-3-0-mm-3D_firstorder_Range | log-sigma-4-0-mm-3D_firstorder_Range | ... | wavelet-LHL_firstorder_Range | wavelet-LHH_firstorder_Range | wavelet-HLL_firstorder_Range | wavelet-HLH_firstorder_Range | wavelet-HHL_firstorder_Range | wavelet-HHH_firstorder_Range | wavelet-LLL_firstorder_Range | lbp-3D-m1_firstorder_Range | lbp-3D-m2_firstorder_Range | lbp-3D-k_firstorder_Range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 622.0 | 236.646545 | 759.994155 | 496.229474 | 9.983231 | 1220.590881 | 626.115601 | 142.979095 | 206.318104 | 256.811768 | ... | 1290.811771 | 168.810509 | 765.679457 | 654.643117 | 513.078569 | 1014.287497 | 1893.248154 | 3.610047 | 3.071986 | 3.107804 |
| 1 | 458.0 | 234.621931 | 460.529971 | 239.901049 | 11.769717 | 1051.31134 | 377.23571 | 23.504059 | 345.007812 | 360.553528 | ... | 424.867676 | 544.929158 | 229.670377 | 114.408989 | 324.23916 | 437.043097 | 281.798239 | 1.547163 | 1.53594 | 0.078448 |
| 2 | 792.0 | 959.220418 | 513.063676 | 170.008313 | 353.302393 | 1124.984863 | 62.020111 | 137.579285 | 188.409424 | 241.671467 | ... | 303.448985 | 1442.611471 | 1265.354188 | 2860.870311 | 919.161331 | 1843.137309 | 769.070663 | 3.610047 | 6.18289 | 0.500143 |
| 3 | 847.0 | 491.485009 | 821.377132 | 416.827873 | 32.084238 | 867.44458 | 208.775917 | 149.376526 | 372.51889 | 477.289688 | ... | 416.338375 | 2012.471589 | 1853.98985 | 2606.586126 | 1163.182347 | 1174.596352 | 774.170115 | 8.767256 | 5.760433 | 1.906281 |


### **4. Calculate Median - P50 of First Order Feature Ranges**

**Bin width** will be:

$$
\textit{Bin width} = \frac{P_{50}\textit{ of Ranges}}{50}
$$

- For **original, square, square root, logarithm, exponential, gradient** images: the **exact value**.
- For **log-sigma and wavelet** images: the **mean of all transformations**, respectively.
- For **LBP3D**: **not to change** (1.0).

In [9]:
# Descriptive Statistics of Ranges
desc = df[ranges].describe().T
desc

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| original_firstorder_Range | 45.0 | 1048.133333 | 606.043638 | 72.0 | 595.0 | 915.0 | 1348.0 | 2580.0 |
| square_firstorder_Range | 45.0 | 685.804135 | 516.695827 | 36.768379 | 236.646545 | 542.846468 | 1025.446595 | 1992.743331 |
| squareroot_firstorder_Range | 45.0 | 1201.758335 | 840.961597 | 71.2714 | 632.785024 | 1006.792146 | 1417.104999 | 3427.027317 |
| logarithm_firstorder_Range | 45.0 | 1174.606428 | 1423.176953 | 35.965211 | 303.330509 | 517.301254 | 1195.613767 | 4561.169166 |
| exponential_firstorder_Range | 45.0 | 145.783525 | 230.434855 | 1.652508 | 12.772167 | 39.609205 | 161.814157 | 1121.800691 |
| gradient_firstorder_Range | 45.0 | 778.231266 | 387.409286 | 38.64209 | 492.560791 | 737.849304 | 1090.803101 | 1582.105164 |
| log-sigma-1-0-mm-3D_firstorder_Range | 45.0 | 415.160307 | 234.347448 | 5.135178 | 261.821564 | 398.655106 | 545.353287 | 1078.208984 |
| log-sigma-2-0-mm-3D_firstorder_Range | 45.0 | 201.474166 | 124.458838 | 2.137299 | 126.123936 | 186.373489 | 265.431412 | 600.640472 |
| log-sigma-3-0-mm-3D_firstorder_Range | 45.0 | 264.379471 | 141.801722 | 0.868164 | 180.787735 | 277.087093 | 345.007812 | 703.751564 |
| log-sigma-4-0-mm-3D_firstorder_Range | 45.0 | 321.444518 | 157.738757 | 81.201126 | 215.824083 | 317.431656 | 405.954125 | 811.537781 |


In [10]:
# Divide P50 of ranges by 50 - the results will be the bin widths to set in the featureextractor param_file
binW = df[ranges].agg('median')/50
pd.DataFrame(binW, columns=['P50 of Ranges / 50'])

| | P50 of Ranges / 50 |
|---|---|
| original_firstorder_Range | 18.3 |
| square_firstorder_Range | 10.856929 |
| squareroot_firstorder_Range | 20.135843 |
| logarithm_firstorder_Range | 10.346025 |
| exponential_firstorder_Range | 0.792184 |
| gradient_firstorder_Range | 14.756986 |
| log-sigma-1-0-mm-3D_firstorder_Range | 7.973102 |
| log-sigma-2-0-mm-3D_firstorder_Range | 3.72747 |
| log-sigma-3-0-mm-3D_firstorder_Range | 5.541742 |
| log-sigma-4-0-mm-3D_firstorder_Range | 6.348633 |
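
The rule for this step asks for a single bin width for all log-sigma images and another for all wavelet images, taken as the mean over their transformations, with LBP3D left at 1.0. The sketch below shows one possible way to collapse per-column bin widths into per-image-type values. The helper `image_type` and the numbers in `bin_w` are hypothetical; in the notebook, `binW` would take the place of `bin_w`.

```python
import pandas as pd


def image_type(column: str) -> str:
    """Map a feature column to its image-type family (hypothetical helper).

    'log-sigma-2-0-mm-3D_firstorder_Range' -> 'log-sigma'
    'wavelet-LLH_firstorder_Range'         -> 'wavelet'
    'original_firstorder_Range'            -> 'original'
    """
    prefix = column.split('_')[0]
    for family in ('log-sigma', 'wavelet', 'lbp-3D'):
        if prefix.startswith(family):
            return family
    return prefix


# Made-up per-column bin widths, standing in for the Series computed above
bin_w = pd.Series({
    'original_firstorder_Range': 18.0,
    'log-sigma-1-0-mm-3D_firstorder_Range': 8.0,
    'log-sigma-2-0-mm-3D_firstorder_Range': 4.0,
    'wavelet-LLH_firstorder_Range': 10.0,
    'wavelet-HHH_firstorder_Range': 20.0,
    'lbp-3D-k_firstorder_Range': 0.05,
})

# Group by image-type family (the function is applied to each index label)
# and average; families with a single column keep their exact value.
per_type = bin_w.groupby(image_type).mean()

# LBP3D is not adjusted: keep the default of 1.0
per_type['lbp-3D'] = 1.0
print(per_type)
```

Grouping by a function of the index keeps the mapping in one place, so adding or removing sigmas or wavelet sub-bands does not require changes elsewhere.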


In [11]:
# Check bin widths are now correct (the 50% column should be 50)
# Note: 'count' is also divided by the bin width here, so that column is not meaningful
(desc.T/binW).T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| original_firstorder_Range | 2.459016 | 57.275046 | 33.117139 | 3.934426 | 32.513661 | 50.0 | 73.661202 | 140.983607 |
| square_firstorder_Range | 4.144818 | 63.167413 | 47.591341 | 3.386628 | 21.796821 | 50.0 | 94.450886 | 183.545758 |
| squareroot_firstorder_Range | 2.234821 | 59.682544 | 41.76441 | 3.539529 | 31.425803 | 50.0 | 70.377237 | 170.195374 |
| logarithm_firstorder_Range | 4.349497 | 113.532146 | 137.557849 | 3.476235 | 29.318555 | 50.0 | 115.56262 | 440.861986 |
| exponential_firstorder_Range | 56.804977 | 184.02733 | 290.885482 | 2.086015 | 16.122726 | 50.0 | 204.263323 | 1416.085841 |
| gradient_firstorder_Range | 3.049403 | 52.736464 | 26.252602 | 2.618562 | 33.378143 | 50.0 | 73.917743 | 107.210589 |
| log-sigma-1-0-mm-3D_firstorder_Range | 5.643976 | 52.07011 | 29.392255 | 0.644063 | 32.838105 | 50.0 | 68.399135 | 135.230801 |
| log-sigma-2-0-mm-3D_firstorder_Range | 12.072532 | 54.051187 | 33.38963 | 0.573391 | 33.83634 | 50.0 | 71.209541 | 161.138924 |
| log-sigma-3-0-mm-3D_firstorder_Range | 8.120191 | 47.706926 | 25.587933 | 0.156659 | 32.622908 | 50.0 | 62.256204 | 126.99104 |
| log-sigma-4-0-mm-3D_firstorder_Range | 7.08814 | 50.632083 | 24.846097 | 12.790332 | 33.995362 | 50.0 | 63.943548 | 127.828741 |
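
The same check can also be done programmatically. The sketch below uses made-up ranges to confirm that the median number of bins is 50 for every feature, then counts the cases whose bin count falls outside a chosen interval. The bounds `low` and `high` are assumptions picked for illustration, not values used in this notebook.

```python
import numpy as np
import pandas as pd

# Made-up ranges for three cases (not the study data)
toy = pd.DataFrame({
    'original_firstorder_Range': [72.0, 900.0, 2580.0],
    'exponential_firstorder_Range': [2.0, 40.0, 1100.0],
})

# Bin width per feature: median range divided by 50
bin_w = toy.median() / 50

# Approximate number of bins per case and feature (columns are aligned by name)
n_bins = toy / bin_w

# By construction, the median number of bins must be 50 for every feature
assert np.allclose(n_bins.median(), 50)

# Count cases with very few or very many bins (assumed, illustrative bounds)
low, high = 16, 128
outside = ((n_bins < low) | (n_bins > high)).sum()
print(outside)
```

A large count for a given image type suggests that a single fixed bin width fits that transformation poorly across cases.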
