<a target="_blank" href="https://colab.research.google.com/github/cerr/pyCERR-Notebooks/blob/main/batch_extract_radiomics_lung_ct.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Extract radiomics features from a batch of DICOM datasets

The example below demonstrates extraction of radiomics features from the CT scan and tumor segmentation of each dataset in a batch of DICOM datasets.

#### Dataset description

**Citation:**  Bakr, S., Gevaert, O., Echegaray, S., Ayers, K., Zhou, M., Shafiq, M., Zheng, H., Zhang, W., Leung, A., Kadoch, M., Shrager, J., Quon, A., Rubin, D., Plevritis, S., & Napel, S. (2017). Data for NSCLC Radiogenomics (Version 4) [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/K9/TCIA.2017.7hs46erv <br>
https://www.cancerimagingarchive.net/collection/nsclc-radiogenomics/ 

#### Dataset download

In [None]:
lungDcmPath = '/content/lung_dicom_5pts'
dataTarPath = '/content/lung_data_5pt.gz'
settingsTarPath = '/content/settings.gz'

In [None]:
! wget -O {dataTarPath} https://mskcc.box.com/shared/static/cpngvd2kn6ff97laz5amkax7e60iz05z.gz
! tar xf {dataTarPath}
! rm {dataTarPath}
! wget -O {settingsTarPath} https://mskcc.box.com/shared/static/dg53m58l3o8txu71a7ycd25qd9sa0ild.gz
! tar xf {settingsTarPath}
! rm {settingsTarPath}

### Install pyCERR

In [None]:
%%capture
!pip install -U git+https://github.com/cerr/pyCERR/

#### Get a list of patient directories

In [None]:
import os

# Collect one directory per patient, skipping any stray files
all_pat_dirs = sorted(d.path for d in os.scandir(lungDcmPath) if d.is_dir())


#### Define location of settings file

In [None]:
settingsFile = '/content/settings/original_settings.json'

#### Define location of output CSV file

In [None]:
csvFileName = r"/content/feats_from_CT.csv"

### Loop over DICOM directories and extract features

Each example dataset contains only one scan and one segmentation, so `scanNum = 0` and `structNum = 0` are used here. For datasets with multiple scans or segmentations, determine the appropriate indices and pass them to `ibsi1.computeScalarFeatures`.
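
When a dataset holds several structures, the index can be looked up by name instead of being hard-coded. The sketch below is a minimal, self-contained illustration of that lookup. `find_index_by_name` is a hypothetical helper and the list of names is made up. With pyCERR, the names would have to be collected from the loaded `planC` first; how to do that is not shown here and should be checked against the pyCERR documentation.

```python
def find_index_by_name(names, target):
    """Return the index of the first name matching target.

    Matching ignores case and surrounding whitespace.
    """
    wanted = target.strip().lower()
    for idx, name in enumerate(names):
        if name.strip().lower() == wanted:
            return idx
    raise ValueError(f"No entry named {target!r}; available: {names}")


# Hypothetical structure names, for illustration only
example_names = ["Lung_L", "Lung_R", "GTV"]
structNum = find_index_by_name(example_names, "gtv")
print(structNum)
```

Raising an error when no name matches avoids silently extracting features from the wrong structure.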

In [None]:
import os
from cerr import plan_container as pc
from cerr.radiomics import ibsi1

featList = []
writeHeader = True  # write the column names only for the first patient
for pt_dir in all_pat_dirs[:5]:
    pt_id = os.path.basename(pt_dir)
    print("Data dir: " + pt_id)
    planC = pc.load_dcm_dir(pt_dir)
    # Each example dataset holds a single scan and a single segmentation
    scanNum = 0
    structNum = 0
    featDict = ibsi1.computeScalarFeatures(scanNum, structNum, settingsFile, planC)
    # Put the patient identifier in the first column
    featDict = {'id': pt_id, **featDict}
    featList.append(featDict)
    ibsi1.writeFeaturesToFile(featDict, csvFileName, writeHeader)
    writeHeader = False
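
The loop above passes `writeHeader = True` only for the first patient, so the column names appear once at the top of the CSV file. The sketch below mimics that pattern with the standard-library `csv` module. It is not pyCERR's implementation: `append_row`, the file name, and the feature values are hypothetical and chosen for illustration only.

```python
import csv
import os
import tempfile


def append_row(row, csv_path, write_header):
    """Append one dict as a CSV row, writing the column names first if requested."""
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row.keys()))
        if write_header:
            writer.writeheader()
        writer.writerow(row)


# Made-up feature values, for illustration only
rows = [
    {"id": "pt1", "feat_a": 1.0, "feat_b": 2.0},
    {"id": "pt2", "feat_a": 3.0, "feat_b": 4.0},
]

demo_path = os.path.join(tempfile.mkdtemp(), "demo_feats.csv")
write_header = True
for row in rows:
    append_row(row, demo_path, write_header)
    write_header = False  # the header goes in once, before the first row

with open(demo_path, newline="") as f:
    lines = f.read().splitlines()
print(lines)
```

Writing one row per patient inside the loop means that features already computed are kept on disk even if a later dataset fails to load.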


#### Explore features

In [None]:
import pandas as pd
df = pd.read_csv(csvFileName)
featNames = list(df.columns)
df.head()


In [None]:
!pip install seaborn

In [None]:
from matplotlib import pyplot as plt
import seaborn as sns

plt.subplots(figsize=(8, 8))
# Rows: filled volume; columns: patient id; cell color: mean intensity
meanVals = df.pivot(index="Original_shape_filledVolume", columns="id", values="Original_firstOrder_mean")
sns.heatmap(meanVals, cmap='viridis')