# Extract PyRadiomics Features Full Study
## Extraction of the full feature set
This notebook extracts PyRadiomics features from the mpReview segmentations in the designated folder. Results are saved in the "EvalData" folder located next to this notebook file.

### User Input
Specify the folder containing the data (mpReview-style directory/file structure).
Specify the settings files used to configure the feature extractor.

In [None]:
import os
MPREVIEW_BASE_FOLDER = "D:/Dropbox (Partners HealthCare)/Repeat_studies-Cases"
SETTINGS_BASE_FILE_NAMES = ["FullStudySettings_2D",
                            "FullStudySettings_3D",
                            "FullStudySettings_noNormalization_2D",
                            "FullStudySettings_noNormalization_3D"]
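The four settings file names suggest that the variants differ only in two switches: 2D versus 3D texture computation and intensity normalization. The sketch below builds the corresponding parameter dictionaries, assuming the files use PyRadiomics' `setting` section with the `force2D` and `normalize` keys. The actual file contents are not shown in this notebook, and the helper `make_settings` is hypothetical.

```python
def make_settings(force_2d: bool, normalize: bool) -> dict:
    """Hypothetical helper: build a PyRadiomics-style parameter dict.

    Assumption: the real FullStudySettings_*.yaml files differ only in
    these two switches; their actual contents are not shown here.
    """
    return {
        "imageType": {"Original": {}},  # extract from the unfiltered image
        "setting": {
            "force2D": force_2d,     # compute texture features slice by slice
            "normalize": normalize,  # normalize intensities (mean/std) first
        },
    }


# Map each settings file base name to the switches it is assumed to encode.
VARIANTS = {
    "FullStudySettings_2D": make_settings(force_2d=True, normalize=True),
    "FullStudySettings_3D": make_settings(force_2d=False, normalize=True),
    "FullStudySettings_noNormalization_2D": make_settings(force_2d=True, normalize=False),
    "FullStudySettings_noNormalization_3D": make_settings(force_2d=False, normalize=False),
}

for name, params in VARIANTS.items():
    print(name, params["setting"])
```

The bin width is deliberately left out of these dictionaries, since the extraction loop overrides `binWidth` for every run anyway.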


### Preparation
First, do some basic setup: import all required libraries, parse the mpReview directory for all available cases with segmentations, and select the cases to process.

In [None]:
import os
from mpReviewUtils import MpReviewParser
import radiomics.featureextractor  # explicitly load the submodule used below (also binds "radiomics")
import pandas as pd

# tell pandas the display width (more or less), since it cannot autodetect it in a browser;
# lower this if it is too high for your display (and you encounter ugly wrapping effects)
pd.set_option('display.width', 120)

mpReviewParser = MpReviewParser(MPREVIEW_BASE_FOLDER)
dfSegs = pd.DataFrame(mpReviewParser.getSegmentationRecords())
dfSelectedSegs = dfSegs[(dfSegs.measurements.notnull())]

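# sanity check: number of segmentations with measurements expected in this data set
assert dfSelectedSegs.shape[0] == 326, "Unexpected number of selected segmentations: %d" % dfSelectedSegs.shape[0]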
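The selection keeps only segmentations that have an associated measurements entry; pandas' `notnull()` treats both `None` and NaN as missing. A stdlib-only sketch of roughly the same filter on hypothetical records follows. Only the `measurements`, `study` and `labelValue` keys mirror the notebook; the values are made up for illustration.

```python
import math


def has_value(x) -> bool:
    """Approximate pandas' notnull(): None and float NaN count as missing."""
    if x is None:
        return False
    return not (isinstance(x, float) and math.isnan(x))


# Hypothetical segmentation records with made-up values.
records = [
    {"study": "case01", "labelValue": 1, "measurements": "case01-measurements.json"},
    {"study": "case02", "labelValue": 1, "measurements": None},
    {"study": "case03", "labelValue": 1, "measurements": float("nan")},
]

selected = [r for r in records if has_value(r["measurements"])]
print(len(selected), [r["study"] for r in selected])
```

If the contents of the data folder change, the hard-coded count in the `assert` above has to be updated accordingly.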


### Extract the features
NOTE: This only works with 64-bit Python. With 32-bit Python, we run into memory allocation problems.
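One way to confirm that the running interpreter is 64-bit before starting the long extraction loop is to check the pointer size. This guard is a suggested addition, not part of the original workflow.

```python
import struct
import sys

# Size of a C pointer in bits: 64 on a 64-bit interpreter, 32 on a 32-bit one.
pointer_bits = struct.calcsize("P") * 8
# Equivalent check via the largest container index the interpreter supports.
is_64bit = sys.maxsize > 2**32

if not is_64bit:
    raise RuntimeError(
        f"Feature extraction needs 64-bit Python; this interpreter is {pointer_bits}-bit."
    )
print(f"Running {pointer_bits}-bit Python")
```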

Extract features for each extractor setting in the file list. For each setting, features are extracted for several bin widths and the filenames are tagged accordingly. Changes to other parameters of the extraction (e.g. normalization, 2D/3D texture features) should be done in the settings files.
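For context, PyRadiomics documents its fixed-bin-width discretization as `X_b = floor(X / W) - floor(min(X) / W) + 1`, so a larger bin width `W` yields fewer gray levels for the texture matrices. The pure-Python sketch below applies that formula to a handful of made-up intensities to show how the bin widths used here change the number of distinct bins. PyRadiomics itself performs this step on NumPy arrays restricted to the mask.

```python
import math


def discretize(values, bin_width):
    """Fixed-bin-width discretization following the documented formula:
    X_b = floor(X / W) - floor(min(X) / W) + 1
    """
    low = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - low + 1 for v in values]


# Made-up intensities inside a hypothetical region of interest.
intensities = [103, 118, 127, 141, 152, 167, 180, 199]

for bin_width in [5, 10, 15, 20, 40]:
    bins = discretize(intensities, bin_width)
    print(bin_width, bins, "->", len(set(bins)), "gray levels")
```

With these values, bin width 5 keeps all eight intensities in separate bins, while bin width 40 collapses them into three.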

In [None]:
for settingsBaseFileName in SETTINGS_BASE_FILE_NAMES:
  settingsFile = os.path.join(os.getcwd(), "PyRadiomicsSettings", settingsBaseFileName + ".yaml")
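  # PyRadiomics >= 2.0 renamed RadiomicsFeaturesExtractor to RadiomicsFeatureExtractor;
  # fall back to the old name so the notebook runs with either version
  ExtractorClass = getattr(radiomics.featureextractor, "RadiomicsFeatureExtractor", None) \
    or radiomics.featureextractor.RadiomicsFeaturesExtractor
  extractor = ExtractorClass(settingsFile)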
  
  #print(extractor._enabledFeatures)
  #print(list(dfSelectedSegs))
  #extractor.settings

  for binWidth in [5, 10, 15, 20, 40]:
    extractor.settings["binWidth"] = binWidth
    features = []
    print("Extracting features (settings: %s, bin width: %d) for label files:" % (settingsBaseFileName, binWidth))
    for idx, row in dfSelectedSegs.iterrows():
      print(idx, row.labelFileName)
      featureVector = extractor.execute(row.origFileName, row.labelFileName, label = row.labelValue)
      featureVector["study"] = row.study
      featureVector["series"] = row.series
      featureVector["canonicalType"] = row.canonicalType
      featureVector["segmentedStructure"] = row.segmentedStructure
      features.append(featureVector)
    print("------------ DONE ------------")


    # postprocess and check features
    dfFeatures = pd.DataFrame(features)
    print(dfFeatures.shape)

    dfNanSel = dfFeatures[(dfFeatures.isnull().any(axis = 1))]
    print(dfNanSel.shape)
    print(dfNanSel.loc[:, dfNanSel.isnull().any()])

    # save to disk
    outputBaseDir = os.path.join(os.getcwd(), "EvalData")
    os.makedirs(outputBaseDir, exist_ok=True)  # create the output folder if needed
    outFileName = os.path.join(outputBaseDir, "pyRadiomicsFeatures_" + settingsBaseFileName + "_bin" + str(binWidth) + ".csv")
    dfFeatures.to_csv(outFileName, sep = ";", index = False)
    print("Features saved to", outFileName)
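Each run writes one semicolon-separated CSV per settings file and bin width, for 4 × 5 = 20 files in total. The sketch below enumerates the expected file names following the pattern in the loop above and reads such a file back with the standard `csv` module using the same `;` delimiter. The two-row table written to a temporary directory is made up for illustration; `original_firstorder_Mean` only imitates PyRadiomics' column naming.

```python
import csv
import os
import tempfile

SETTINGS_BASE_FILE_NAMES = ["FullStudySettings_2D",
                            "FullStudySettings_3D",
                            "FullStudySettings_noNormalization_2D",
                            "FullStudySettings_noNormalization_3D"]
BIN_WIDTHS = [5, 10, 15, 20, 40]

# Same naming pattern as the extraction loop: one file per (settings, bin width).
expected = ["pyRadiomicsFeatures_" + s + "_bin" + str(b) + ".csv"
            for s in SETTINGS_BASE_FILE_NAMES for b in BIN_WIDTHS]
print(len(expected), expected[0])

# Round-trip a made-up, semicolon-separated table to show the file layout.
with tempfile.TemporaryDirectory() as outputBaseDir:
    path = os.path.join(outputBaseDir, expected[0])
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=";")
        writer.writerow(["study", "series", "original_firstorder_Mean"])
        writer.writerow(["case01", "5", "123.4"])
        writer.writerow(["case02", "7", ""])  # empty cell: how to_csv writes NaN by default
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f, delimiter=";"))

# Report which columns contain missing values, similar to the NaN check above.
missing = sorted({k for r in rows for k, v in r.items() if v == ""})
print(rows[0]["study"], missing)
```

With pandas available, `pd.read_csv(outFileName, sep=";")` reads a result file back directly.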
