# MR-Longitudinal Radiomics
### Radiomics pipeline created for longitudinal images acquired at successive treatment fractions.
##### Full model: feature extraction; feature reduction via volume correlation and test-retest stability; feature selection via the Euclidean distance between feature-pair trajectories and hierarchical clustering.
##### Compares the results of the longitudinal model with a standard delta-radiomics approach to illustrate the importance of accounting for the full feature trajectory over treatment.

#### Specify the variables and options for customising the notebook below.

In [1]:
import os
import pandas as pd
from tqdm import tqdm

# Tag used to identify the output (could be anything; helps when running multiple models)
# default is "Test"
tag = "HM-FSTP-Delta"

# Output is written to <cwd>/Output/<tag>/ with Features, Extraction and Plots sub-folders
cwd = os.getcwd()
output_path = cwd + "/Output/" + tag + "/"
for sub in ("Features", "Extraction", "Plots"):
    # exist_ok=True: also creates missing sub-folders when the tag folder already exists
    os.makedirs(output_path + sub, exist_ok=True)

# specify if you want to compare to a delta model
# default is False
Delta_Model = False

# Specify if you want to visualise the results in plots
# default is False, can specify at given stages below if you want to visualise
plot = False

# Specify if you want to extract features
# default is False; the option to do so is below
# If features are already extracted, set to False and provide the path to the extracted features below
extract = True


## Feature Extraction
#### To extract features, provide a CSV containing the columns: PatID, Fraction, Image file, Mask Name, Mask file.
#### Specify the location of the CSV in the Input directory.
#### Features are calculated using the specified parameter file. The default is the PyRadiomics base extraction parameters: a fixed bin size (FBS) of 25, no resampling, no normalisation and no wavelet or Laplacian of Gaussian filters, giving 107 features (IBSI compliant).
#### The features are saved in parquet format in a new folder in the Output directory, with the columns: PatID, Fraction, Mask, Feature, Feature Value.
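##### A minimal sketch of extracting one image/mask pair with PyRadiomics and converting the result into the long format described above. It assumes PyRadiomics is installed. The helper names `extract_one` and `to_long_format` are hypothetical, not part of this repository. PyRadiomics prefixes each feature name with the image type (e.g. `original_`) and also returns `diagnostics_` entries, so both are stripped here. The value column is named `FeatureValue` to match the DataFrame used later in this notebook.

```python
from typing import Mapping

import pandas as pd


def to_long_format(result: Mapping, pat_id, fraction, mask) -> pd.DataFrame:
    """Convert one PyRadiomics result dict into long-format rows (hypothetical helper)."""
    rows = []
    for key, value in result.items():
        if key.startswith("diagnostics_"):
            continue  # skip metadata such as versions and image hashes
        # 'original_shape_Elongation' -> 'shape_Elongation'
        feature = key.removeprefix("original_")
        rows.append({"PatID": pat_id, "Fraction": fraction, "Mask": mask,
                     "Feature": feature, "FeatureValue": float(value)})
    return pd.DataFrame(rows)


def extract_one(image_file, mask_file, pat_id, fraction, mask, params_file=None):
    """Extract features for one image/mask pair (hypothetical helper)."""
    # Imported here so that to_long_format can be used without PyRadiomics installed
    from radiomics import featureextractor

    if params_file:
        extractor = featureextractor.RadiomicsFeatureExtractor(params_file)
    else:
        extractor = featureextractor.RadiomicsFeatureExtractor()  # base parameters
    result = extractor.execute(image_file, mask_file)
    return to_long_format(result, pat_id, fraction, mask)
```

Looping `extract_one` over the rows of the input CSV and concatenating the returned frames with `pd.concat` produces one long-format table for all patients and fractions.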

#### Once all features have been extracted, combine them into one DataFrame
#### or
#### specify the path to previously extracted feature values.
##### Parquet files are smaller, and therefore quicker to read, than CSV files. The cell below reads a CSV (`Features_All.csv`); switch to `pd.read_parquet` and update the path if your features are stored as parquet.

In [2]:
# Read the features extracted for the longitudinal model (same tag without "-Delta")
df_all = pd.read_csv(output_path.replace("-Delta", "") + "Features/Features_All.csv")
df_all = df_all.drop(columns=["Unnamed: 0"])  # index column written by to_csv
df_all = df_all[~df_all["Feature"].isin(["firstorder_Minimum", "firstorder_Maximum"])]

# The delta model only uses the first and fifth fractions
df_all = df_all[df_all["Fraction"].isin([1, 5])]

patIDs = df_all["PatID"].unique()
fts = df_all["Feature"].unique()
Contours = df_all["ContourType"].unique()

# Delta = value at fraction 5 minus value at fraction 1, per patient, feature and contour type.
# A stable sort keeps the original row order while guaranteeing fraction 1 precedes fraction 5.
df_all = df_all.sort_values("Fraction", kind="mergesort")
df_all["FeatureValue"] = df_all.groupby(["PatID", "Feature", "ContourType"])["FeatureValue"].diff()

# Only fraction 5 rows hold a delta value (fraction 1 rows are NaN)
df_all = df_all[df_all["Fraction"] == 5]

df_all.head()

|     | PatID | Fraction | Contour | ContourType | Feature | FeatureValue |
|-----|-------|----------|---------|-------------|---------|--------------|
| 428 | 1601 | 5 | RP | Manual | shape_Elongation | 0.026751 |
| 429 | 1601 | 5 | RP | Manual | shape_Flatness | -0.023634 |
| 430 | 1601 | 5 | RP | Manual | shape_LeastAxisLength | -1.104368 |
| 431 | 1601 | 5 | RP | Manual | shape_MajorAxisLength | -0.59049 |
| 432 | 1601 | 5 | RP | Manual | shape_Maximum2DDiameterColumn | -0.90709 |


In [3]:
df_man = df_all.loc[df_all["ContourType"] == "Manual"]
df_limbus = df_all.loc[df_all["ContourType"] == "Auto"]

# Feature Reduction
#### Because radiomic feature sets are high-dimensional, it is vital to remove features that offer no unique information.
#### Features are calculated by applying different formulas to the image, and many of these formulas are similar, so some features carry near-identical information. We aim to remove all redundant features. In this model, redundant features are those strongly correlated with volume (Spearman's rank correlation coefficient rho > 0.6) and those unstable under contour differences (ICC < 0.5).

In [4]:
from Functions import Reduction as FR

## Volume Correlation
#### Previous studies have shown that many radiomic feature values are strongly correlated with the volume of the mask, so such features add little information beyond volume itself.
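##### A minimal sketch of the volume-correlation filter, using the rho > 0.6 threshold stated above. It assumes one value per patient and feature (as in the delta table above). It also assumes that volume is represented by the PyRadiomics feature `shape_VoxelVolume` and that strong negative correlations count as well, so |rho| is used. This is not necessarily how `FR.Volume` is implemented.

```python
import pandas as pd


def volume_redundant(df: pd.DataFrame, volume_feature: str = "shape_VoxelVolume",
                     threshold: float = 0.6) -> list:
    """Return features whose |Spearman rho| with volume exceeds the threshold."""
    # One row per patient, one column per feature
    wide = df.pivot_table(index="PatID", columns="Feature", values="FeatureValue")
    vol_rank = wide[volume_feature].rank()
    redundant = []
    for ft in wide.columns:
        if ft == volume_feature:
            continue
        # Spearman's rho is the Pearson correlation of the (average-tie) ranks
        rho = wide[ft].rank().corr(vol_rank)
        if abs(rho) > threshold:
            redundant.append(ft)
    return redundant
```

Ranking first and then taking the Pearson correlation gives Spearman's rho without calling SciPy directly.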



In [5]:
df_man.head()

|     | PatID | Fraction | Contour | ContourType | Feature | FeatureValue |
|-----|-------|----------|---------|-------------|---------|--------------|
| 428 | 1601 | 5 | RP | Manual | shape_Elongation | 0.026751 |
| 429 | 1601 | 5 | RP | Manual | shape_Flatness | -0.023634 |
| 430 | 1601 | 5 | RP | Manual | shape_LeastAxisLength | -1.104368 |
| 431 | 1601 | 5 | RP | Manual | shape_MajorAxisLength | -0.59049 |
| 432 | 1601 | 5 | RP | Manual | shape_Maximum2DDiameterColumn | -0.90709 |


In [6]:
df_limbus.head()

|     | PatID | Fraction | Contour | ContourType | Feature | FeatureValue |
|-----|-------|----------|---------|-------------|---------|--------------|
| 963 | 1601 | 5 | Limbus | Auto | shape_Elongation | -0.07288 |
| 964 | 1601 | 5 | Limbus | Auto | shape_Flatness | -0.01623 |
| 965 | 1601 | 5 | Limbus | Auto | shape_LeastAxisLength | 1.348814 |
| 966 | 1601 | 5 | Limbus | Auto | shape_MajorAxisLength | 2.755153 |
| 967 | 1601 | 5 | Limbus | Auto | shape_Maximum2DDiameterColumn | 1.781248 |


In [7]:
print("Manual")
FR.Volume(df_man, output_path, plot=False)
print("Limbus")
FR.Volume(df_limbus, output_path, plot=False)

Manual
------------------------------
Volume Correlation
Correlating features to volume...


100%|██████████| 105/105 [00:00<00:00, 720.46it/s]


Volume redundant features: 11/105
------------------------------
Limbus
------------------------------
Volume Correlation
Correlating features to volume...


100%|██████████| 105/105 [00:00<00:00, 581.76it/s]


Volume redundant features: 8/105
------------------------------


## ICC Stability
#### The intraclass correlation coefficient (ICC) is a statistical measure of how strongly measurements within the same group agree with each other.
#### It has been used widely in radiomics studies as a test-retest stability measure between two delineations; here, between the manual and auto (Limbus) contours.
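##### A minimal sketch of the ICC stability filter with the ICC < 0.5 threshold stated above. The choice of ICC(2,1) (two-way random effects, absolute agreement, single rater, from Shrout and Fleiss) is an assumption; `FR.ICC` may use a different ICC form. Each feature is arranged as a patients × contour-types matrix before computing the ICC.

```python
import numpy as np
import pandas as pd


def icc_2_1(ratings) -> float:
    """ICC(2,1) for a (n_subjects, k_raters) array."""
    Y = np.asarray(ratings, dtype=float)
    n, k = Y.shape
    grand = Y.mean()
    ss_rows = k * ((Y.mean(axis=1) - grand) ** 2).sum()  # between-subject
    ss_cols = n * ((Y.mean(axis=0) - grand) ** 2).sum()  # between-rater
    ss_total = ((Y - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def icc_unstable(df: pd.DataFrame, threshold: float = 0.5) -> list:
    """Return features whose ICC between contour types is below the threshold."""
    unstable = []
    for ft, sub in df.groupby("Feature"):
        # One row per patient, one column per contour type (e.g. Manual, Auto)
        wide = sub.pivot_table(index="PatID", columns="ContourType",
                               values="FeatureValue").dropna()
        if icc_2_1(wide.to_numpy()) < threshold:
            unstable.append(ft)
    return unstable
```

Because ICC(2,1) measures absolute agreement, a constant offset between contours lowers the ICC even when the two sets of values are perfectly correlated.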



In [8]:
FR.ICC(df_all, output_path, plot=False)

------------------------------
Stability Test
Calculating ICC...


100%|██████████| 105/105 [00:03<00:00, 30.05it/s]

ICC redudant features: 37/105
------------------------------





#### Remove redundant features

#### Further feature reduction is still needed after this step; it is carried out by the clustering steps that follow.

In [9]:
df_red = FR.RemoveFts(df_man, output_path)

------------------------------
Removing redundant features...
Number of features removed: 37
Number of features remaining: 68
------------------------------


## Clustering I - Distance between Feature Trajectories
#### Calculate the Euclidean distance between the trajectories of each pair of features.
#### The distance values can be used to visualise the relationships between features
#### and to group features together.
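##### A minimal sketch of the distance matrix for one patient's longitudinal trajectories. It assumes each trajectory is min-max rescaled to [0, 1] before distances are computed (the notebook prints "Rescaling Features...", but the exact scaling used by `Cl.DistanceMatrix_Delta` is not shown). The function name is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform


def trajectory_distance_matrix(df_pat: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Euclidean distances between rescaled feature trajectories."""
    # Rows: features, columns: fractions
    traj = df_pat.pivot(index="Feature", columns="Fraction", values="FeatureValue").astype(float)
    # Min-max rescale each trajectory (assumption); constant trajectories become all zeros
    mins = traj.min(axis=1)
    rng = (traj.max(axis=1) - mins).replace(0, 1.0)
    scaled = traj.sub(mins, axis=0).div(rng, axis=0)
    dist = squareform(pdist(scaled.to_numpy(), metric="euclidean"))
    return pd.DataFrame(dist, index=scaled.index, columns=scaled.index)
```

After rescaling, two features that follow the same shape over treatment have distance 0 regardless of their original units, so the distance reflects trajectory shape rather than magnitude.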

In [10]:
from Functions import Clustering as Cl
Cl.DistanceMatrix_Delta(df_red, True, output_path, True)

Calculating Euclidean distance between feature pair trajectories...
Rescaling Features...


100%|██████████| 20/20 [00:37<00:00,  1.88s/it]


In [None]:
from Functions import Delta as Dl

Dl.CorrMatrix()

## Clustering II - Grouping Features
#### Hierarchical clustering using SciPy:
####   - Weighted linkage (WPGMA; the rule used to compute the distance between clusters when merging them)
####   - Starting t-value = 2 (the distance threshold: a feature further than this from a cluster is placed in a different or new cluster)
##### Clusters with < 3 features are discarded as too unstable.
##### Clusters with > 10 features are re-clustered into sub-clusters.
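##### A minimal sketch of the clustering rules above using SciPy's `linkage(method="weighted")` and `fcluster(criterion="distance")`. Halving `t` when re-clustering a cluster with more than 10 features is an assumption; the notebook only states that such clusters are re-clustered. `cluster_features` is a hypothetical name, not `Cl.ClusterFeatures`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_features(dist, names, t=2.0, min_size=3, max_size=10):
    """Return a list of feature-name clusters from a square distance matrix."""
    dist = np.asarray(dist, dtype=float)
    if len(names) < 2:
        return [list(names)] if len(names) >= min_size else []
    # Weighted (WPGMA) linkage on the condensed distance vector
    Z = linkage(squareform(dist, checks=False), method="weighted")
    labels = fcluster(Z, t=t, criterion="distance")
    clusters = []
    for lab in np.unique(labels):
        idx = np.flatnonzero(labels == lab)
        members = [names[i] for i in idx]
        if len(members) < min_size:
            continue  # too small: treated as unstable and discarded
        if len(members) > max_size and t > 1e-6:
            # Assumption: re-cluster large groups with a halved threshold
            sub = dist[np.ix_(idx, idx)]
            clusters.extend(cluster_features(sub, members, t / 2, min_size, max_size))
        else:
            clusters.append(members)
    return clusters
```

The `t > 1e-6` guard stops the recursion for groups that never split (e.g. identical trajectories), which are then kept as one cluster.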


In [11]:
Cl.ClusterFeatures(df_red, output_path, 2)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["ClusterLabel"] = ""
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["ClusterNumFts"] = ""


------------------------------
Clustering Feature Trajectories...


100%|██████████| 20/20 [00:01<00:00, 15.11it/s]

------------------------------





## Clustering III - Feature Selection
#### Within each cluster, the cross-correlation between feature trajectories is calculated to determine the most “representative” features:
####    - Features are ranked by their mean cross-correlation with the other features in the cluster.
####    - The top 20% of features are passed through.
#### This is repeated for each patient, so each patient passes through its own set of features.
#### The features are then tallied across patients and the top 10 ranked features are selected.
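##### A minimal sketch of the per-cluster selection and the tally across patients. It assumes "cross-correlation" means the zero-lag normalised cross-correlation (the Pearson correlation) between two trajectories, and that the top 20% is rounded up to at least one feature. `representative_features` and `tally` are hypothetical names, not the notebook's `Cl.FeatureSelection_Delta`.

```python
import math
from collections import Counter

import numpy as np


def representative_features(traj: dict, frac: float = 0.2) -> list:
    """Top `frac` of a cluster's features by mean cross-correlation (one patient)."""
    names = list(traj)
    if len(names) == 1:
        return names
    X = np.vstack([np.asarray(traj[name], dtype=float) for name in names])
    corr = np.corrcoef(X)  # zero-lag cross-correlation between every pair of trajectories
    # Mean correlation with the other features (subtract the self-correlation of 1.0)
    mean_corr = (corr.sum(axis=1) - 1.0) / (len(names) - 1)
    k = max(1, math.ceil(frac * len(names)))
    order = np.argsort(-mean_corr, kind="stable")
    return [names[i] for i in order[:k]]


def tally(selected_per_patient, top_n: int = 10) -> list:
    """Count how often each feature was passed through and keep the top_n."""
    counts = Counter(ft for feats in selected_per_patient for ft in feats)
    return [ft for ft, _ in counts.most_common(top_n)]
```

`representative_features` is called once per cluster and per patient; the lists it returns are then pooled and passed to `tally`.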


In [12]:
Cl.FeatureSelection_Delta(df_red, output_path)

------------------------------
Feature Selection
Calculating Cross-Correlation values...


100%|██████████| 20/20 [00:00<00:00, 29.88it/s]


------------------------------
Selected Features: 
glrlm_LongRunHighGrayLevelEmphasis
glcm_ClusterTendency
gldm_HighGrayLevelEmphasis
glcm_JointEntropy
glrlm_ShortRunHighGrayLevelEmphasis
glrlm_GrayLevelVariance
glcm_Autocorrelation
glrlm_LongRunEmphasis
glcm_Contrast
firstorder_10Percentile
glrlm_RunPercentage
firstorder_Mean
glcm_SumAverage
firstorder_RobustMeanAbsoluteDeviation
glszm_ZoneEntropy
glrlm_LongRunLowGrayLevelEmphasis
gldm_SmallDependenceHighGrayLevelEmphasis
------------------------------
Number of Selected Features: 17
------------------------------


In [14]:
long = pd.read_csv(output_path.replace("-Delta", "") + "Features/Features_Selected.csv")
delta = pd.read_csv(output_path + "Features/Features_Selected.csv")

long_fts = long["Feature"].unique()
delta_fts = delta["Feature"].unique()

# Features selected by the longitudinal model but not by the delta model
long_only_fts = [x for x in long_fts if x not in delta_fts]
print(long_only_fts)
print(len(long_only_fts))

['gldm_SmallDependenceEmphasis', 'glcm_JointEnergy', 'glcm_Imc2', 'glrlm_RunEntropy', 'firstorder_Uniformity', 'gldm_DependenceEntropy', 'glrlm_LowGrayLevelRunEmphasis', 'gldm_LargeDependenceLowGrayLevelEmphasis', 'glcm_Idmn', 'firstorder_MeanAbsoluteDeviation', 'glcm_JointAverage', 'glrlm_ShortRunEmphasis']
12
