# MR-Longitudinal Radiomics
### Radiomics pipeline created for longitudinal images collected at subsequent fractions of treatment.
##### Full model: Feature Extraction, Feature Reduction via volume correlation & test-retest stability, Feature Selection via Euclidean distance between feature pair trajectories and hierarchical clustering.
##### Compares the results of the longitudinal model with a standard delta-radiomics approach to illustrate the importance of accounting for the full feature trajectory over treatment.

#### Set the variables and options below to customise the notebook.

In [43]:
import os
import pandas as pd
from tqdm import tqdm

# Specify the output path
# specify the tag to use - could be anything, helps to identify the output if running multiple models
# default is "Test"
tag = "HM-FS-Final-RS"
# output_path = "C:/Users/b01297ar/Documents/ProstateMRL-local/ProstateMRL-Radiomics/ReleaseCode/Output/" + tag + "/"
cwd = os.getcwd()
output_path = cwd + "/Output/" + tag + "/"
# Create the output folders; exist_ok avoids an error if they are already present
for sub in ["Features", "Extraction", "Plots"]:
    os.makedirs(output_path + sub, exist_ok=True)

# specify if you want to compare to a delta model
# default is False
Delta_Model = True

# Specify if you want to visualise the results in plots
# default is False, can specify at given stages below if you want to visualise
plot = True

# Specify if you want to extract Features
# default is False, option to do so is below
# If features are already extracted, set to false and provide the path to the extracted features below
extract = False


## Feature Extraction
#### To extract features, provide a CSV key file with one row per image/mask pair and the following columns:
####               | PatID | Fraction | Contour | ContourType | ImagePath | MaskPath |
#### Place the CSV in the Input dir and set its path in the cell below.
#### Features are calculated using the specified parameter file. The default uses the PyRadiomics base extraction parameters: fixed bin size (FBS) of 25, no resampling, no normalisation, 107 features (IBSI compliant), and no wavelet or Laplacian filters.
#### Extracted features are saved per patient as CSV files in the Extraction folder of the Output dir. Columns will be:
#### | PatID | Fraction | Contour | ContourType | Feature | FeatureValue |
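The extraction loop reads the columns `PatID`, `Fraction`, `Contour`, `ContourType`, `ImagePath` and `MaskPath` from the key file. A minimal sketch of what such a key file might look like, and a check that the required columns are present, is shown here. The IDs and file paths are hypothetical placeholders, not real data.

```python
import io
import pandas as pd

# Hypothetical two-row key file; the paths and IDs are placeholders only.
key_csv = io.StringIO(
    "PatID,Fraction,Contour,ContourType,ImagePath,MaskPath\n"
    "1,1,RP,Manual,/data/pat1/fr1/image.nii,/data/pat1/fr1/mask_RP.nii\n"
    "1,2,RP,Manual,/data/pat1/fr2/image.nii,/data/pat1/fr2/mask_RP.nii\n"
)
key = pd.read_csv(key_csv)

# Columns the extraction loop looks up on every row
required = ["PatID", "Fraction", "Contour", "ContourType", "ImagePath", "MaskPath"]
missing = [c for c in required if c not in key.columns]
if missing:
    raise ValueError(f"Key file is missing columns: {missing}")

# One extraction call is made per row, grouped by patient
rows_per_patient = key.groupby("PatID").size().to_dict()
```

Running a check like this before extraction fails fast on a malformed key file instead of part-way through a long run.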

In [44]:
if extract:
    from Functions import Extraction as FE
    key_extraction = pd.read_csv(cwd + "/Input/Default/PepKey_All.csv")

    # output_path already ends with "/"
    extraction_path = output_path + "Extraction/"
    os.makedirs(extraction_path, exist_ok=True)

    params_extractor = cwd + "/Input/Default/Default_ExtractionParams.yaml"

    # Loop over all patients
    print("Extracting features for patients...")
    for pat in key_extraction["PatID"].unique():
        print("Processing patient: " + str(pat) + "...")

        pat_file = extraction_path + str(pat) + "_" + tag + ".csv"
        # Skip patients whose features have already been extracted
        if os.path.exists(pat_file):
            continue

        # Get the patient's rows from the key
        key_pat = key_extraction[key_extraction["PatID"] == pat]
        features_list = []
        for _, row in key_pat.iterrows():
            # Extract features for one image/mask pair
            Features = FE.ExtractFeatures(row["PatID"], row["Fraction"], row["Contour"], row["ContourType"],
                                          row["ImagePath"], row["MaskPath"], params_extractor)
            features_list.append(Features)

        # DataFrame.append was removed in pandas 2.0, so concatenate once per patient
        Features_pat = pd.concat(features_list, ignore_index=True)
        Features_pat.to_csv(pat_file, index=False)


#### Once all features have been extracted, combine them into one dataframe
#### Or
#### Specify the path of previously extracted feature values.
##### The cell below reads CSV files. Parquet gives smaller files and quicker reads; to use it, change pd.read_csv to pd.read_parquet and update the path.
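Since the feature table may be stored as either CSV or parquet, one option is a small loader that picks the reader from the file extension. This is only a sketch: `load_features` is a hypothetical helper, not part of the `Functions` package, and the demo values are made up.

```python
import os
import tempfile
import pandas as pd

def load_features(path):
    """Read a long-format feature table from CSV or parquet, based on the file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".parquet":
        # Requires a parquet engine such as pyarrow to be installed
        return pd.read_parquet(path)
    if ext == ".csv":
        return pd.read_csv(path)
    raise ValueError(f"Unsupported feature file type: {ext}")

# Round trip through a temporary CSV with made-up values
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "Features_All.csv")
    pd.DataFrame({
        "PatID": [1], "Fraction": [1], "Contour": ["RP"], "ContourType": ["Manual"],
        "Feature": ["firstorder_Mean"], "FeatureValue": [0.5],
    }).to_csv(demo_path, index=False)
    demo = load_features(demo_path)
```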

In [45]:
if extract:
    # Combine the per-patient CSV files into one dataframe
    files = os.listdir(output_path + "Extraction/")
    df_all = pd.concat([pd.read_csv(output_path + "Extraction/" + file) for file in files], ignore_index=True)

    # Save the combined features to CSV
    os.makedirs(output_path + "Features/", exist_ok=True)
    df_all.to_csv(output_path + "Features/Features_All.csv", index=False)

else:
    features_file = output_path + "Features/Features_All.csv"
    if not os.path.exists(features_file):
        # Fall back to the default features provided in the Input dir
        df_all = pd.read_csv('Input/Default/Features_HM-FS.csv')
        df_all.to_csv(features_file, index=False)
    else:
        df_all = pd.read_csv(features_file)

In [46]:
df_all = df_all[['PatID', 'Fraction', 'Contour', 'ContourType', 'Feature', 'FeatureValue']]
df_all = df_all[~df_all["Feature"].isin(["firstorder_Minimum", "firstorder_Maximum"])]
df_all.head()

Unnamed: 0,PatID,Fraction,Contour,ContourType,Feature,FeatureValue
0,1642,1,RP,Manual,shape_Elongation,0.839757
1,1642,1,RP,Manual,shape_Flatness,0.679305
2,1642,1,RP,Manual,shape_LeastAxisLength,23.488558
3,1642,1,RP,Manual,shape_MajorAxisLength,34.57734
4,1642,1,RP,Manual,shape_Maximum2DDiameterColumn,40.050273


In [47]:
df_man = df_all.loc[df_all["ContourType"] == "Manual"]
df_limbus = df_all.loc[df_all["ContourType"] == "Auto"]

In [48]:
def RescaleFts(df):
    '''
    Rescale each feature to the range [0, 1] across all patients
    '''
    from sklearn.preprocessing import MinMaxScaler

    df = df.copy()

    # Rescale each feature independently
    for ft in df["Feature"].unique():
        is_ft = df["Feature"] == ft
        # MinMaxScaler expects a 2D array, so reshape to one column and flatten afterwards
        vals = df.loc[is_ft, "FeatureValue"].values.reshape(-1, 1)
        vals = MinMaxScaler(feature_range=(0, 1)).fit_transform(vals)
        df.loc[is_ft, "FeatureValue"] = vals.ravel()

    return df


In [49]:
df_man = RescaleFts(df_man)
df_limbus = RescaleFts(df_limbus)

In [50]:
df_man.to_csv('./Features_ManualRS.csv')

# Feature Reduction
#### Radiomics produces a high-dimensional feature set, so features that offer no unique information must be removed.
#### Many features are calculated with similar formulas and therefore carry similar information. We aim to remove all redundant features. In this model a feature is redundant if it is strongly correlated with volume (Spearman rank coefficient rho > 0.6) or unstable to contour differences (ICC < 0.5).
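The volume criterion can be illustrated with a short sketch. It assumes a long-format table with the columns used in this notebook, that the volume feature is named `shape_MeshVolume`, and that the absolute value of rho is compared against the threshold. These are assumptions for illustration; `volume_redundant` is a hypothetical helper and may differ from what `FR.Volume` does internally.

```python
import pandas as pd
from scipy.stats import spearmanr

def volume_redundant(df, volume_feature="shape_MeshVolume", threshold=0.6):
    """Return features whose |Spearman rho| with the volume feature exceeds the threshold."""
    # One row per (patient, fraction), one column per feature
    wide = df.pivot_table(index=["PatID", "Fraction"], columns="Feature", values="FeatureValue")
    volume = wide[volume_feature]
    redundant = []
    for ft in wide.columns:
        if ft == volume_feature:
            continue
        rho, _ = spearmanr(wide[ft], volume)
        if abs(rho) > threshold:
            redundant.append(ft)
    return redundant

# Made-up values: toy_A tracks volume exactly, toy_B does not
volume = [1, 2, 3, 4, 5, 6]
toy_values = {
    "shape_MeshVolume": volume,
    "toy_A": [2 * v for v in volume],
    "toy_B": [3, 1, 6, 2, 5, 4],
}
rows = []
for name, values in toy_values.items():
    for i, val in enumerate(values):
        rows.append({"PatID": i // 3 + 1, "Fraction": i % 3 + 1,
                     "Feature": name, "FeatureValue": float(val)})
toy = pd.DataFrame(rows)
flagged = volume_redundant(toy)
```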

In [51]:
from Functions import Reduction as FR

## Volume Correlation
#### Previous studies have shown that many radiomic feature values correlate strongly with the volume of the mask. Features with a Spearman rho > 0.6 against volume are flagged as redundant.



In [52]:
df_man.head()

Unnamed: 0,PatID,Fraction,Contour,ContourType,Feature,FeatureValue
0,1642,1,RP,Manual,shape_Elongation,0.484071
1,1642,1,RP,Manual,shape_Flatness,0.428445
2,1642,1,RP,Manual,shape_LeastAxisLength,0.579815
3,1642,1,RP,Manual,shape_MajorAxisLength,0.642106
4,1642,1,RP,Manual,shape_Maximum2DDiameterColumn,0.642992


In [53]:
print("Manual")
FR.Volume(df_man, output_path, plot=False)
print("Limbus")
FR.Volume(df_limbus, output_path, plot=False)

Manual
------------------------------
Volume Correlation
Correlating features to volume...


100%|██████████| 105/105 [00:00<00:00, 466.22it/s]
100%|██████████| 105/105 [00:00<00:00, 486.33it/s]
100%|██████████| 105/105 [00:00<00:00, 345.80it/s]
100%|██████████| 105/105 [00:00<00:00, 532.98it/s]
100%|██████████| 105/105 [00:00<00:00, 573.49it/s]


Volume redundant features: 22/105
------------------------------
Limbus
------------------------------
Volume Correlation
Correlating features to volume...


100%|██████████| 105/105 [00:00<00:00, 620.84it/s]
100%|██████████| 105/105 [00:00<00:00, 655.11it/s]
100%|██████████| 105/105 [00:00<00:00, 617.52it/s]
100%|██████████| 105/105 [00:00<00:00, 657.25it/s]
100%|██████████| 105/105 [00:00<00:00, 568.43it/s]

Volume redundant features: 26/105
------------------------------





## ICC Stability
#### The intra-class correlation coefficient (ICC) is a statistical measure of how closely two observed quantities within a group agree with each other.
#### It has been widely used in radiomics studies as a test-retest stability measure between two delineations.
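As an illustration of the stability measure, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater) for one feature measured on two delineations. The choice of ICC form is an assumption, since the notebook does not state which form `FR.ICC` uses, and `icc_2_1` is a hypothetical helper with made-up inputs.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1) for an array of shape (n_targets, k_raters)."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares for targets (rows), raters (columns) and residual error
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Made-up feature values from two contours of the same four images
manual = [1.0, 2.0, 3.0, 4.0]
auto_same = [1.0, 2.0, 3.0, 4.0]      # identical delineation result
auto_shuffled = [4.0, 1.0, 3.0, 2.0]  # poor agreement

icc_stable = icc_2_1(np.column_stack([manual, auto_same]))
icc_unstable = icc_2_1(np.column_stack([manual, auto_shuffled]))
```

With the threshold described above, the first feature would be kept and the second flagged as unstable (ICC < 0.5).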



In [54]:
FR.ICC(df_all, output_path, plot=False)

------------------------------
Stability Test
Calculating ICC...


100%|██████████| 105/105 [00:04<00:00, 22.30it/s]
100%|██████████| 105/105 [00:04<00:00, 23.96it/s]
100%|██████████| 105/105 [00:04<00:00, 22.12it/s]
100%|██████████| 105/105 [00:04<00:00, 22.77it/s]
100%|██████████| 105/105 [00:04<00:00, 24.23it/s]

ICC redudant features: 16/105
------------------------------





#### Remove the features flagged as redundant by the volume correlation and ICC tests.

#### Further feature reduction is still needed after this step.

In [55]:
df_red = FR.RemoveFts(df_man, output_path)

------------------------------
Removing redundant features...
Number of features removed: 33
Number of features remaining: 72
------------------------------


## Clustering I - Distance between Feature Trajectories
#### Calculate the Euclidean distance between the trajectories of each feature pair.
#### The distance values can then be used to visualise the relationship between features.
#### They can also be used to group similar features together.
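The distance calculation can be sketched for a single patient as follows. It assumes each feature trajectory is the vector of (rescaled) feature values ordered by fraction. `trajectory_distances` is a hypothetical helper, not the implementation of `Ln.DistanceMatrix`, and the values are made up.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

def trajectory_distances(df_patient):
    """Pairwise Euclidean distances between feature trajectories of one patient."""
    # Rows are features, columns are fractions in ascending order
    wide = df_patient.pivot(index="Feature", columns="Fraction", values="FeatureValue")
    wide = wide.sort_index(axis=1)
    dist = squareform(pdist(wide.to_numpy(), metric="euclidean"))
    return pd.DataFrame(dist, index=wide.index, columns=wide.index)

# Made-up trajectories over three fractions
toy_traj = {"A": [0.0, 0.0, 0.0], "B": [3.0, 4.0, 0.0], "C": [1.0, 1.0, 1.0]}
rows = [
    {"Feature": name, "Fraction": fr + 1, "FeatureValue": val}
    for name, vals in toy_traj.items()
    for fr, val in enumerate(vals)
]
dist = trajectory_distances(pd.DataFrame(rows))
```

The result is a symmetric matrix with zeros on the diagonal, which can be plotted as a heatmap or passed to a clustering routine.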

In [56]:
from Functions import model as Ln
Ln.DistanceMatrix(df_red, False, output_path, plot=True)

Calculating Euclidean distance between feature pair trajectories...


100%|██████████| 20/20 [00:50<00:00,  2.55s/it]


## Clustering II - Grouping Features
#### Hierarchical clustering using SciPy
####   - Weighted linkage (the algorithm by which clusters are formed)
####   - Starting T-val = 2 (the distance threshold at which clusters are separated, i.e. how far a feature can be from a cluster before it is placed in a different cluster)
##### Clusters with < 3 features are discarded as they are deemed too unstable.
##### Clusters with > 10 features are re-clustered into subclusters.
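A minimal sketch of this clustering step with SciPy is given below. It assumes the threshold is applied with the `distance` criterion of `fcluster`, which the notebook does not confirm, and it omits the re-clustering of large clusters. `cluster_trajectories` is a hypothetical helper and the trajectories are made up.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def cluster_trajectories(trajectories, t=2.0, min_size=3):
    """Cluster rows of `trajectories` and mark which rows sit in clusters of at least min_size."""
    condensed = pdist(trajectories, metric="euclidean")
    Z = linkage(condensed, method="weighted")
    # Cut the dendrogram where the cophenetic distance exceeds t
    labels = fcluster(Z, t=t, criterion="distance")
    counts = np.bincount(labels)
    keep = counts[labels] >= min_size
    return labels, keep

# Made-up trajectories: two tight groups of three features and one outlier
toy = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [10.0, 10.0], [10.1, 10.0], [10.0, 10.1],
    [50.0, 50.0],
])
labels, keep = cluster_trajectories(toy)
```

Here the outlier forms a cluster of one and is therefore dropped by the minimum-size rule.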


In [57]:
Ln.ClusterFeatures(df_red, output_path, 1, plot=True)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["ClusterLabel"] = ""
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["ClusterNumFts"] = ""


------------------------------
Clustering Feature Trajectories...


100%|██████████| 20/20 [00:01<00:00, 17.39it/s]

------------------------------





## Clustering III - Feature Selection
#### The cross-correlation between feature trajectories within each cluster is calculated to determine the most “representative” features.
####    - Features with the highest mean cross-correlation are passed through.
####    - Top 20% of features.
#### Each patient passes through its own set of features.
#### The features are then tallied across patients and the top 10 ranked features are selected.
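The selection logic can be sketched in two parts: picking a representative feature within one cluster, then tallying picks across patients. The sketch uses the Pearson correlation between trajectories as a stand-in for the cross-correlation value, which is an assumption, and it picks a single feature per cluster rather than the top 20%. `representative` and `tally` are hypothetical helpers with made-up inputs.

```python
from collections import Counter
import numpy as np

def representative(trajectories):
    """Return the feature with the highest mean correlation to the other features in its cluster."""
    names = list(trajectories)
    corr = np.corrcoef(np.vstack([trajectories[n] for n in names]))
    # Ignore each feature's correlation with itself
    np.fill_diagonal(corr, np.nan)
    mean_corr = np.nanmean(corr, axis=1)
    return names[int(np.argmax(mean_corr))]

def tally(picks_per_patient, top_n=10):
    """Count how often each feature was picked and return the top_n most frequent."""
    counts = Counter(ft for picks in picks_per_patient for ft in picks)
    return [ft for ft, _ in counts.most_common(top_n)]

# Made-up cluster of three trajectories for one patient
cluster = {
    "a": np.array([1.0, 2.0, 3.0, 4.0]),
    "b": np.array([1.0, 2.0, 3.0, 5.0]),
    "c": np.array([1.0, 3.0, 2.0, 4.0]),
}
best = representative(cluster)

# Made-up picks from three patients
selected = tally([["f1", "f2"], ["f1", "f3"], ["f1", "f2"]], top_n=2)
```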


In [58]:
Ln.FeatureSelection(df_red, True, output_path)

------------------------------
Feature Selection
Calculating Cross-Correlation values...
Rescaling Features...


100%|██████████| 20/20 [00:01<00:00, 13.62it/s]

------------------------------
Selected Features: 
glcm_JointEnergy
gldm_LowGrayLevelEmphasis
glrlm_HighGrayLevelRunEmphasis
firstorder_Mean
glcm_Correlation
glcm_MCC
glcm_Idm
glcm_Contrast
firstorder_Uniformity
glcm_DifferenceEntropy
glcm_JointAverage
gldm_DependenceEntropy
firstorder_MeanAbsoluteDeviation
------------------------------
Number of Selected Features: 13
------------------------------



