# <center>**RADIOMICS EXTRACTION - Adjust bin width - AF**</center>

*Repetition of Step 5 (bin width calculation) for the Antefenestram (AF) segmentations.*

(*Step 5*)

## **Radiomics Workflow:**
  
  **1. Download** DICOM images and convert to **NRRD.**

  **2.** Perform **target segmentations** and save in **NRRD.**
  
  **3.** Perform a **first Radiomic Feature** ***Dummy*** **Extraction,** to:
    
  - Detect **errors in segmentations**: only one dimension, no label 1, only one segmented voxel...
  - Analyze **bin width**.


  **4. Analyze** and **correct mask errors.**

  **5. Adjust bin width.**
   - **Tune the featureextractor param file.**


  **6.** Perform **final Radiomic Feature Extraction.**

  **7. Clean Radiomic Features.**

  **8. Merge** with **labels** and **clinical data.**

  **9. Descriptive Statistics.**

  **10. Inferential Statistics.**

  **11. Machine Learning.**

## **Adjust Bin width - AF**

This notebook calculates the ranges of the first-order radiomic features in order to adjust the bin width in the <code>param_file</code> for the <code>featureextractor</code> class.

0.   Environment **configuration**.
1.   **Load Radiomic** ***Dummy*** **Features.**
2.   Check **extraction information**.
3.   **Select First Order Range features** (***from image - original, square, square root, logarithm, exponential, gradient, log-sigma, wavelet***).
4.   Calculate **Median - P50 of First Order Feature Ranges.**

**Bin width** will be:

$$
\textit{Bin width} = \frac{P_{50}\textit{ of Ranges}}{50}
$$

- For **original, square, square root, logarithm, exponential, gradient** images: the **exact value**.
- For **log-sigma and wavelet** images: the **mean across all transformations** of each family (all sigmas for log-sigma, all decompositions for wavelet).
- For **LBP3D**: **leave unchanged** (1.0).
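
As a minimal sketch of the formula above, the following cell applies it to a small toy table. The column values and the names `toy_ranges`, `TARGET_BINS` and `toy_bin_width` are hypothetical and only serve as illustration; they are not project data.

```python
import pandas as pd

# Hypothetical toy ranges for three cases (illustrative values, not project data)
toy_ranges = pd.DataFrame({
    'original_firstorder_Range': [1000.0, 1500.0, 2000.0],
    'gradient_firstorder_Range': [500.0, 800.0, 1100.0],
})

TARGET_BINS = 50  # divisor used in the formula above

# P50 (median) of each range column, divided by the target number of bins
toy_bin_width = toy_ranges.median() / TARGET_BINS
print(toy_bin_width)
```

With a fixed bin width, the number of gray-level bins in a ROI is roughly its intensity range divided by the bin width, so this choice gives the median case about 50 bins.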

In [1]:
# Import libraries
import os
import pandas as pd

### **0. Environment configuration**

#### Set the working directory

In [2]:
# Set working directory
wd = '/Users/pablomenendezfernandez-miranda/Proyecto Otosclerosis/'
os.chdir(wd)

print(f'Current directory: {os.getcwd()}')

# Check directory files
print(f'Directory files: {os.listdir(wd)}')

Current directory: /Users/pablomenendezfernandez-miranda/Proyecto Otosclerosis
Directory files: ['Tables', 'Databases', 'Episodes', 'Figures', 'Notebooks']


#### Mount Drive (if executed on Google Colab)

In [3]:
#from google.colab import drive
#drive.mount('/content/drive')

### **1. Load Radiomic Features**

In [4]:
# Load data
df_path = 'Databases/3_2_df_radiomics_AF_dummy_features.csv'

df = pd.read_csv(df_path)
print(df.shape)
df.head(4)

(52, 2054)


Unnamed: 0,Image_Segmentation,diagnostics_Versions_PyRadiomics,diagnostics_Versions_Numpy,diagnostics_Versions_SimpleITK,diagnostics_Versions_PyWavelet,diagnostics_Versions_Python,diagnostics_Configuration_Settings,diagnostics_Configuration_EnabledImageTypes,diagnostics_Image-original_Hash,diagnostics_Image-original_Dimensionality,...,lbp-3D-k_gldm_LargeDependenceLowGrayLevelEmphasis,lbp-3D-k_gldm_LowGrayLevelEmphasis,lbp-3D-k_gldm_SmallDependenceEmphasis,lbp-3D-k_gldm_SmallDependenceHighGrayLevelEmphasis,lbp-3D-k_gldm_SmallDependenceLowGrayLevelEmphasis,lbp-3D-k_ngtdm_Busyness,lbp-3D-k_ngtdm_Coarseness,lbp-3D-k_ngtdm_Complexity,lbp-3D-k_ngtdm_Contrast,lbp-3D-k_ngtdm_Strength
0,EPI_0001_AF.nrrd,3.1.0a2.post14+gaab3c6f,1.26.4,2.4.0,1.7.0,3.12.4,"{'minimumROIDimensions': 1, 'minimumROISize': ...","{'Original': {'binWidth': 25.01}, 'Square': {'...",1310b8441367f403dccd0241fde06dd70966c56a,3D,...,15.010085,0.606239,0.192961,1.769625,0.078111,1.51536,0.164977,5.904967,0.093878,1.871021
1,EPI_0002_AF.nrrd,3.1.0a2.post14+gaab3c6f,1.26.4,2.4.0,1.7.0,3.12.4,"{'minimumROIDimensions': 1, 'minimumROISize': ...","{'Original': {'binWidth': 25.01}, 'Square': {'...",64fb22f54e86a46714e45ec0a26705a81f50b639,3D,...,11.243687,0.380051,0.104544,0.546965,0.044444,0.508064,0.349207,1.050969,0.023217,0.730429
2,EPI_0003_AF.nrrd,3.1.0a2.post14+gaab3c6f,1.26.4,2.4.0,1.7.0,3.12.4,"{'minimumROIDimensions': 1, 'minimumROISize': ...","{'Original': {'binWidth': 25.01}, 'Square': {'...",c37a6f6b17a1a89cd5f06d640fb8e33dc6219c6e,3D,...,17.345118,0.491582,0.076806,0.243797,0.051893,1.118533,0.216933,1.293046,0.042011,0.488941
3,EPI_0004_AF.nrrd,3.1.0a2.post14+gaab3c6f,1.26.4,2.4.0,1.7.0,3.12.4,"{'minimumROIDimensions': 1, 'minimumROISize': ...","{'Original': {'binWidth': 25.01}, 'Square': {'...",5dce836924eb76ab17bb61896784da9f4cd31cbd,3D,...,12.31893,0.467078,0.103222,0.495049,0.046418,1.046149,0.195522,1.568187,0.04057,0.485882


In [5]:
# Check column list
print(list(df.columns), end='')

['Image_Segmentation', 'diagnostics_Versions_PyRadiomics', 'diagnostics_Versions_Numpy', 'diagnostics_Versions_SimpleITK', 'diagnostics_Versions_PyWavelet', 'diagnostics_Versions_Python', 'diagnostics_Configuration_Settings', 'diagnostics_Configuration_EnabledImageTypes', 'diagnostics_Image-original_Hash', 'diagnostics_Image-original_Dimensionality', 'diagnostics_Image-original_Spacing', 'diagnostics_Image-original_Size', 'diagnostics_Image-original_Mean', 'diagnostics_Image-original_Minimum', 'diagnostics_Image-original_Maximum', 'diagnostics_Mask-original_Hash', 'diagnostics_Mask-original_Spacing', 'diagnostics_Mask-original_Size', 'diagnostics_Mask-original_BoundingBox', 'diagnostics_Mask-original_VoxelNum', 'diagnostics_Mask-original_VolumeNum', 'diagnostics_Mask-original_CenterOfMassIndex', 'diagnostics_Mask-original_CenterOfMass', 'diagnostics_Image-interpolated_Spacing', 'diagnostics_Image-interpolated_Size', 'diagnostics_Image-interpolated_Mean', 'diagnostics_Image-interpolated

### **2. Check extraction information**

In [6]:
# Configuration list
df['diagnostics_Configuration_EnabledImageTypes'].iloc[0]

"{'Original': {'binWidth': 25.01}, 'Square': {'binWidth': 6.459188}, 'SquareRoot': {'binWidth': 48.464846}, 'Logarithm': {'binWidth': 79.474406}, 'Exponential': {'binWidth': 0.332049}, 'Gradient': {'binWidth': 23.510929}, 'LoG': {'binWidth': 9.0, 'sigma': [1.0, 2.0, 3.0, 4.0, 5.0]}, 'Wavelet': {'binWidth': 25}, 'LBP3D': {'binWidth': 1.0}}"

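The configuration is stored as the string representation of a Python dictionary, so the bin widths used in the dummy extraction can be read back programmatically. The sketch below assumes that format and parses the string shown in the output above with `ast.literal_eval`; the names `config_str`, `enabled` and `current_bin_widths` are illustrative.

```python
import ast

# String copied from the cell output above
config_str = (
    "{'Original': {'binWidth': 25.01}, 'Square': {'binWidth': 6.459188}, "
    "'SquareRoot': {'binWidth': 48.464846}, 'Logarithm': {'binWidth': 79.474406}, "
    "'Exponential': {'binWidth': 0.332049}, 'Gradient': {'binWidth': 23.510929}, "
    "'LoG': {'binWidth': 9.0, 'sigma': [1.0, 2.0, 3.0, 4.0, 5.0]}, "
    "'Wavelet': {'binWidth': 25}, 'LBP3D': {'binWidth': 1.0}}"
)

# Safely evaluate the literal into a real dict (no arbitrary code execution)
enabled = ast.literal_eval(config_str)

# Keep only the bin width of each enabled image type
current_bin_widths = {name: opts['binWidth'] for name, opts in enabled.items()}
print(current_bin_widths)
```

In the notebook itself, `config_str` would be replaced by `df['diagnostics_Configuration_EnabledImageTypes'].iloc[0]`.
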
### **3. Select First Order Range features**

In [7]:
# First Order Range Features columns
ranges = [c for c in df.columns if c.endswith('_firstorder_Range')]
ranges

['original_firstorder_Range',
 'square_firstorder_Range',
 'squareroot_firstorder_Range',
 'logarithm_firstorder_Range',
 'exponential_firstorder_Range',
 'gradient_firstorder_Range',
 'log-sigma-1-0-mm-3D_firstorder_Range',
 'log-sigma-2-0-mm-3D_firstorder_Range',
 'log-sigma-3-0-mm-3D_firstorder_Range',
 'log-sigma-4-0-mm-3D_firstorder_Range',
 'log-sigma-5-0-mm-3D_firstorder_Range',
 'wavelet-LLH_firstorder_Range',
 'wavelet-LHL_firstorder_Range',
 'wavelet-LHH_firstorder_Range',
 'wavelet-HLL_firstorder_Range',
 'wavelet-HLH_firstorder_Range',
 'wavelet-HHL_firstorder_Range',
 'wavelet-HHH_firstorder_Range',
 'wavelet-LLL_firstorder_Range',
 'lbp-3D-m1_firstorder_Range',
 'lbp-3D-m2_firstorder_Range',
 'lbp-3D-k_firstorder_Range']
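
Because the log-sigma, wavelet and LBP3D columns each belong to a single image type in the param file, it can help to group the range columns by family. The helper `image_family` below is a hypothetical illustration based only on the column prefixes listed above; the family labels it returns are chosen for readability.

```python
from collections import defaultdict

def image_family(column):
    """Map a '<imagetype>_firstorder_Range' column name to its image-type family."""
    prefix = column.split('_firstorder_Range')[0]
    if prefix.startswith('log-sigma'):
        return 'LoG'
    if prefix.startswith('wavelet'):
        return 'Wavelet'
    if prefix.startswith('lbp-3D'):
        return 'LBP3D'
    return prefix  # original, square, squareroot, ...

# A few of the column names listed above
example_cols = [
    'original_firstorder_Range',
    'log-sigma-1-0-mm-3D_firstorder_Range',
    'log-sigma-2-0-mm-3D_firstorder_Range',
    'wavelet-LLH_firstorder_Range',
    'lbp-3D-k_firstorder_Range',
]

# Collect the columns of each family
families = defaultdict(list)
for col in example_cols:
    families[image_family(col)].append(col)
print(dict(families))
```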

In [8]:
# First Order Range Features
df[ranges].head(4)

Unnamed: 0,original_firstorder_Range,square_firstorder_Range,squareroot_firstorder_Range,logarithm_firstorder_Range,exponential_firstorder_Range,gradient_firstorder_Range,log-sigma-1-0-mm-3D_firstorder_Range,log-sigma-2-0-mm-3D_firstorder_Range,log-sigma-3-0-mm-3D_firstorder_Range,log-sigma-4-0-mm-3D_firstorder_Range,...,wavelet-LHL_firstorder_Range,wavelet-LHH_firstorder_Range,wavelet-HLL_firstorder_Range,wavelet-HLH_firstorder_Range,wavelet-HHL_firstorder_Range,wavelet-HHH_firstorder_Range,wavelet-LLL_firstorder_Range,lbp-3D-m1_firstorder_Range,lbp-3D-m2_firstorder_Range,lbp-3D-k_firstorder_Range
0,1586.0,1884.120301,1065.539644,371.971466,1125.627784,1605.314087,1160.950302,918.002716,560.796249,690.438591,...,3875.810947,1928.86553,3714.331963,1464.284697,2048.290879,1367.904222,3966.017723,13.408745,11.870833,3.658574
1,1318.0,1473.982526,908.071877,326.103375,609.013585,877.574158,882.340027,563.249237,608.43811,787.693817,...,2249.994784,1709.792433,3239.904589,1446.439711,1905.118967,1889.774459,2754.671531,14.955908,12.101362,1.460743
2,1499.0,1779.295827,1009.88332,357.973924,1112.248498,1610.102631,1187.166992,610.630814,720.057877,881.987885,...,3392.110041,1894.437295,4166.778252,2114.79926,2749.092785,2553.066215,3436.368512,15.471629,14.551673,1.701119
3,1677.0,1996.698018,1144.125087,416.905004,1588.227659,1494.398682,1071.759949,808.585266,696.122131,813.793945,...,2861.499077,1515.522024,4179.093701,2896.524541,2729.5157,1978.459236,3864.878152,10.314419,9.452155,1.749189


### **4. Calculate Median - P50 of First Order Feature Ranges**

**Bin width** will be:

$$
\textit{Bin width} = \frac{P_{50}\textit{ of Ranges}}{50}
$$

- For **original, square, square root, logarithm, exponential, gradient** images: the **exact value**.
- For **log-sigma and wavelet** images: the **mean across all transformations** of each family (all sigmas for log-sigma, all decompositions for wavelet).
- For **LBP3D**: **leave unchanged** (1.0).

In [9]:
# Descriptive Statistics of Ranges
desc = df[ranges].describe().T
desc

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
original_firstorder_Range,52.0,1828.769231,418.959431,602.0,1605.0,1796.5,2109.25,2824.0
square_firstorder_Range,52.0,1949.955052,327.263832,996.834959,1804.721109,1961.709484,2153.798562,2548.146186
squareroot_firstorder_Range,52.0,1473.748846,665.586134,331.641393,1085.424257,1332.53069,1714.491501,3643.520652
logarithm_firstorder_Range,52.0,897.526336,1137.502797,93.217734,382.473707,525.746831,740.136556,4728.066962
exponential_firstorder_Range,52.0,1361.005188,611.957621,446.321918,882.723903,1193.338505,1666.40315,2655.926057
gradient_firstorder_Range,52.0,1475.915869,290.538608,734.580994,1279.502724,1491.564983,1641.684177,2050.35582
log-sigma-1-0-mm-3D_firstorder_Range,52.0,1095.001927,202.336823,572.924104,972.507788,1085.021881,1199.006506,1606.463623
log-sigma-2-0-mm-3D_firstorder_Range,52.0,766.902522,177.734519,480.323212,618.362698,753.037125,897.396973,1132.879333
log-sigma-3-0-mm-3D_firstorder_Range,52.0,617.201752,111.657067,402.66449,530.346066,595.838196,696.828228,835.718811
log-sigma-4-0-mm-3D_firstorder_Range,52.0,769.687733,105.769251,541.801453,699.59815,781.746964,826.182072,996.307709


In [10]:
# Divide P50 of ranges by 50 - the results are the bin widths to set in the featureextractor param_file
binW = df[ranges].agg('median')/50
pd.DataFrame(binW, columns=['P50 of Ranges / 50'])

Unnamed: 0,P50 of Ranges / 50
original_firstorder_Range,35.93
square_firstorder_Range,39.23419
squareroot_firstorder_Range,26.650614
logarithm_firstorder_Range,10.514937
exponential_firstorder_Range,23.86677
gradient_firstorder_Range,29.8313
log-sigma-1-0-mm-3D_firstorder_Range,21.700438
log-sigma-2-0-mm-3D_firstorder_Range,15.060742
log-sigma-3-0-mm-3D_firstorder_Range,11.916764
log-sigma-4-0-mm-3D_firstorder_Range,15.634939
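
For the log-sigma and wavelet families, the per-transformation values are averaged into a single bin width. The sketch below shows that averaging for the four log-sigma values visible in the output above. The value for sigma = 5 is not shown in this truncated output, so the result is only illustrative and is not the number to set in the param file; `log_bin_widths` and `partial_log_mean` are hypothetical names.

```python
import pandas as pd

# Per-sigma values copied from the output above (sigma = 5 is not shown there)
log_bin_widths = pd.Series({
    'log-sigma-1-0-mm-3D_firstorder_Range': 21.700438,
    'log-sigma-2-0-mm-3D_firstorder_Range': 15.060742,
    'log-sigma-3-0-mm-3D_firstorder_Range': 11.916764,
    'log-sigma-4-0-mm-3D_firstorder_Range': 15.634939,
})

# Select the family by column prefix and average across its transformations
is_log = log_bin_widths.index.str.startswith('log-sigma')
partial_log_mean = log_bin_widths[is_log].mean()
print(f'Mean over the four listed sigmas: {partial_log_mean:.6f}')
```

Applied to the full `binW` series, the same prefix selection with `'wavelet'` would give the wavelet bin width.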


In [11]:
# Check bin widths are now correct (P50 should be 50; the divided 'count' column is not meaningful)
(desc.T/binW).T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
original_firstorder_Range,1.447259,50.898114,11.660435,16.754801,44.670192,50.0,58.704425,78.597272
square_firstorder_Range,1.325375,49.700403,8.341292,25.407303,45.998684,50.0,54.895961,64.947083
squareroot_firstorder_Range,1.951175,55.298871,24.974514,12.444043,40.727927,50.0,64.332158,136.714324
logarithm_firstorder_Range,4.945346,85.357275,108.17971,8.865268,36.374324,50.0,70.389065,449.652445
exponential_firstorder_Range,2.178762,57.02511,25.640571,18.700558,36.985478,50.0,69.821058,111.281336
gradient_firstorder_Range,1.743136,49.475413,9.739388,24.624505,42.891283,50.0,55.032271,68.731696
log-sigma-1-0-mm-3D_firstorder_Range,2.396265,50.459901,9.32409,26.4015,44.815123,50.0,55.252642,74.029089
log-sigma-2-0-mm-3D_firstorder_Range,3.452685,50.920632,11.801179,31.892399,41.057916,50.0,59.585175,75.220683
log-sigma-3-0-mm-3D_firstorder_Range,4.363601,51.792731,9.369747,33.789751,44.504202,50.0,58.474619,70.129678
log-sigma-4-0-mm-3D_firstorder_Range,3.325884,49.2287,6.764929,34.65325,44.745818,50.0,52.842039,63.723158
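
An alternative way to run the same check, sketched here on hypothetical toy data, is to divide the ranges themselves by the bin widths and then summarize. This gives the approximate number of bins per case and leaves out the `count` column, which has no meaning once divided. The names `toy_ranges`, `n_bins` and `summary` are illustrative.

```python
import pandas as pd

# Hypothetical toy ranges (illustrative values, not project data)
toy_ranges = pd.DataFrame({
    'original_firstorder_Range': [1000.0, 1500.0, 2600.0],
    'gradient_firstorder_Range': [500.0, 800.0, 900.0],
})
toy_bin_width = toy_ranges.median() / 50

# Approximate number of bins per case: each column divided by its own bin width
n_bins = toy_ranges.div(toy_bin_width, axis=1)

# Summary per image type, without the 'count' column
summary = n_bins.describe().T.drop(columns='count')
print(summary)
```

The `50%` column of `summary` should equal 50 for every image type, while `min` and `max` show how few or how many bins the extreme cases receive.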
