# Neural Fine Gray on FRAMINGHAM Dataset

In this notebook, we will apply Neural Fine Gray (NFG) to the FRAMINGHAM data.

In [None]:
import sys
sys.path.append('../')
sys.path.append('../DeepSurvivalMachines/')

### Load the FRAMINGHAM Dataset

The package includes helper functions to load the dataset.

`x` is an `np.array` of features (covariates),
`t` holds the event or censoring times, and
`e` is the event indicator (0 for a censored observation, a positive integer identifying the observed risk otherwise).
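
To make the layout concrete, here is a minimal sketch with hypothetical toy values (not FRAMINGHAM data). It assumes the convention suggested by the later use of `t[e > 0]`: `0` marks censoring and positive integers mark the competing event types.

```python
import numpy as np

# Hypothetical toy data, not the FRAMINGHAM values.
x = np.array([[63.0, 1.0],
              [48.0, 0.0],
              [57.0, 1.0],
              [70.0, 0.0]])          # covariates, one row per patient
t = np.array([5.2, 12.0, 3.1, 8.4])  # time of the event or of censoring
e = np.array([1, 0, 2, 1])           # assumed: 0 = censored, 1 and 2 = competing events

censored = e == 0
print("censored fraction:", censored.mean())
print("observed event types:", np.unique(e[~censored]))
```

All three arrays share the same first dimension, so row `i` of `x` goes with `t[i]` and `e[i]`.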

In [None]:
from nfg import datasets
x, t, e, columns = datasets.load_dataset('FRAMINGHAM', path = '../', competing = True)

### Compute horizons at which we evaluate the performance of Neural Fine Gray

Survival predictions are issued at fixed time horizons. Here we evaluate NFG at the 25th, 50th and 75th percentiles
of the observed event times, as is standard practice in survival analysis.
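
As a small illustration of how such horizons are obtained, the sketch below computes the quantiles on hypothetical toy values. Only uncensored times (`e > 0`) enter the quantile computation; the names `event_times` and `toy_times` are illustrative.

```python
import numpy as np

# Hypothetical toy data: two censored patients (e == 0), five observed events.
t = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0])
e = np.array([1, 0, 2, 1, 0, 1, 2])

event_times = t[e > 0]                                   # drops the censored times 4 and 10
toy_times = np.quantile(event_times, [0.25, 0.5, 0.75])  # horizons on the original time scale
print(toy_times)
```

Because censored times are excluded, the horizons reflect when events actually occur rather than how long patients were followed.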

In [None]:
import numpy as np
import torch
np.random.seed(42)
torch.random.manual_seed(42)

horizons = [0.25, 0.5, 0.75]
times = np.quantile(t[e > 0], horizons) # Fixed horizons for a fair comparison between competing and non-competing settings

In [None]:
# Display the percentage of events observed before each time horizon
for time in times:
    print('At time {:.2f}'.format(time))
    for risk in np.unique(e):
        print('\t {:.2f} % observed risk {}'.format(100 * ((e == risk) & (t < time)).mean(), risk))

print('Total')
for risk in np.unique(e):
    print('\t {:.2f} % observed risk {}'.format(100 * ((e == risk)).mean(), risk))

### Splitting the data into train, test and validation sets

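We will train NFG on 80% of the data and report performance on the remaining 20% held-out test set. Within the 80%, 10% is set aside for the stopping criterion and 10% for model selection, which amounts to 64%, 8% and 8% of the full dataset for fitting, stopping and model selection respectively.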
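
The resulting proportions can be checked with a few lines of arithmetic. This is a sketch for a hypothetical dataset of 1000 rows; it ignores the rounding that `train_test_split` applies when sizes do not divide evenly.

```python
n = 1000             # hypothetical number of patients
test_frac = 0.2      # first split: held-out test set
holdout_frac = 0.2   # second split: share of the remainder kept for dev + val

n_test = round(n * test_frac)              # held-out test set
n_rest = n - n_test                        # everything that is not test
n_holdout = round(n_rest * holdout_frac)   # dev + val together
n_train = n_rest - n_holdout               # used to fit the network
n_val = n_holdout // 2                     # third split: half for model selection
n_dev = n_holdout - n_val                  # the other half for the stopping criterion

print(n_train, n_dev, n_val, n_test)
```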

In [None]:
from sklearn.model_selection import train_test_split

x_train, x_test, t_train, t_test, e_train, e_test = train_test_split(x, t, e, test_size = 0.2, random_state = 42)
x_train, x_val, t_train, t_val, e_train, e_val = train_test_split(x_train, t_train, e_train, test_size = 0.2, random_state = 42)
x_dev, x_val, t_dev, t_val, e_dev, e_val = train_test_split(x_val, t_val, e_val, test_size = 0.5, random_state = 42)

# Time normalisation is critical because time is an input of the network
# Time 0 must stay at 0, so we only divide by the maximum training time (no shift by the minimum)
minmax = lambda x: x / t_train.max() # Training times are scaled to at most 1
t_train_ddh = minmax(t_train)
t_dev_ddh = minmax(t_dev)
t_val_ddh = minmax(t_val)
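
The effect of this scaling can be seen on hypothetical toy values. The sketch below shows that time 0 is preserved, that training times land in [0, 1], and that a validation time larger than the training maximum maps above 1.

```python
import numpy as np

# Hypothetical toy times, not the FRAMINGHAM values.
t_train = np.array([0.0, 3.0, 6.0, 12.0])
t_val = np.array([1.5, 15.0])        # 15 exceeds the largest training time

scale = t_train.max()                # the scale is taken from the training split only
t_train_scaled = t_train / scale     # 0 stays 0, the maximum becomes 1
t_val_scaled = t_val / scale         # values above the training maximum exceed 1

print(t_train_scaled, t_val_scaled)
```

Multiplying by `scale` recovers the original times, which is why prediction grids built on the original scale can be passed through the same function.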

### Setting the parameter grid

Because NFG is a constrained neural network, training is sensitive to the hyperparameters. We recommend searching over the following grid, from which five configurations are sampled at random:

In [None]:
from sklearn.model_selection import ParameterSampler

In [None]:
layers = [[50], [50, 50], [50, 50, 50], [100], [100, 100], [100, 100, 100]]
param_grid = {
    'learning_rate': [1e-3, 1e-4],
    'layers_surv': layers,
    'layers': layers,
    'batch': [100, 250],
}
params = ParameterSampler(param_grid, 5, random_state = 42)
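
To give a sense of how sparsely five draws cover this grid, the sketch below enumerates every combination using only the standard library. It is a stand-in for illustration: `random.sample` will not reproduce the configurations drawn by `ParameterSampler`, and the names `combos` and `sampled` are hypothetical.

```python
import itertools
import random

layers = [[50], [50, 50], [50, 50, 50], [100], [100, 100], [100, 100, 100]]
param_grid = {
    'learning_rate': [1e-3, 1e-4],
    'layers_surv': layers,
    'layers': layers,
    'batch': [100, 250],
}

# Enumerate the full Cartesian product of the grid.
keys = list(param_grid)
combos = [dict(zip(keys, values)) for values in itertools.product(*param_grid.values())]

# Draw five distinct configurations (illustrative stand-in for ParameterSampler).
rng = random.Random(42)
sampled = rng.sample(combos, 5)

print(len(combos), "possible configurations,", len(sampled), "sampled")
```

Five draws cover only a small fraction of the grid, so increasing the number of sampled configurations is a natural first step if the selected model underperforms.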

### Model Training and Selection

In [None]:
from nfg import NeuralFineGray

In [None]:
models = []
for param in params:
    model = NeuralFineGray(layers = param['layers'], layers_surv = param['layers_surv'])
    # The fit method is called to train the model
    model.fit(x_train, t_train_ddh, e_train, n_iter = 1000, bs = param['batch'],
            lr = param['learning_rate'], val_data = (x_dev, t_dev_ddh, e_dev))
    nll = model.compute_nll(x_val, t_val_ddh, e_val)
    if not np.isnan(nll):
        models.append([nll, model])
    else:
        print("WARNING: NaN value observed")

In [None]:
best_model = min(models, key = lambda x: x[0])
model = best_model[1]
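
The selection step can be illustrated in isolation. In this sketch the candidates are hypothetical (validation NLL, name) pairs, with strings standing in for fitted models. NaN values are filtered first because comparisons with NaN are always false, which would make the result of `min` depend on the order of the list.

```python
import math

# Hypothetical (validation NLL, model) pairs; strings stand in for fitted models.
candidates = [
    (1.42, 'model_a'),
    (float('nan'), 'model_b'),   # a diverged run
    (1.17, 'model_c'),
    (1.30, 'model_d'),
]

# Keep only runs with a finite validation score, then pick the lowest NLL.
valid = [(nll, name) for nll, name in candidates if not math.isnan(nll)]
best_nll, best_name = min(valid, key=lambda pair: pair[0])

print(best_name, best_nll)
```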

### Inference

We now predict for the held-out test patients and analyse the results. Because the evaluation metrics are cumulative over time, we predict over a grid of time points rather than only at the evaluation horizons.

In [None]:
pred_times = np.linspace(0, t_train.max(), 100)
out_survival = model.predict_survival(x_test, minmax(pred_times).tolist())
out_risk = 1 - out_survival
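
One possible way to read a prediction off such a grid at a given horizon is sketched below on hypothetical curves. It assumes predictions are laid out as one row per patient and one column per grid time, and it is not necessarily how the functions in `metrics` perform the lookup.

```python
import numpy as np

pred_times = np.linspace(0.0, 10.0, 11)   # toy grid: 0, 1, ..., 10

# Hypothetical survival curves for two patients (rows), non-increasing over time.
out_survival = np.vstack([
    np.linspace(1.0, 0.5, 11),
    np.linspace(1.0, 0.8, 11),
])
out_risk = 1 - out_survival               # predicted risk on the same grid

horizon = 4.2
# Index of the last grid point that does not exceed the horizon.
idx = np.searchsorted(pred_times, horizon, side='right') - 1
risk_at_horizon = out_risk[:, idx]

print(idx, risk_at_horizon)
```

A finer grid reduces the gap between the requested horizon and the grid point actually used.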

### Evaluation

We evaluate NFG on its discriminative ability (time-dependent concordance index and cumulative/dynamic AUC) and on its Brier score. Note that these metrics are implemented for the competing risks setting.
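
For intuition about what the Brier score measures for one risk at one horizon, here is a simplified sketch on hypothetical toy data without censoring. It deliberately omits the inverse probability of censoring weighting (IPCW) and is not the `brier_score` function imported from `metrics`.

```python
import numpy as np

# Hypothetical toy data with no censoring (every e is positive).
t = np.array([2.0, 5.0, 7.0, 9.0])
e = np.array([1, 2, 1, 1])
# Hypothetical predicted cumulative incidence of risk 1 at the horizon, one value per patient.
risk_pred = np.array([0.9, 0.2, 0.4, 0.1])

horizon = 6.0
# Outcome indicator: 1 if the patient experienced risk 1 by the horizon, else 0.
observed = ((t <= horizon) & (e == 1)).astype(float)
brier = np.mean((observed - risk_pred) ** 2)

print(observed, brier)
```

The second patient experienced the competing event before the horizon and therefore counts as not having experienced risk 1, which is what distinguishes the competing risks version from the single-risk one.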

In [None]:
from metrics import truncated_concordance_td, auc_td, brier_score

In [None]:
cis, brs, rocs = [], [], []
km = (e_train, t_train)
for te in times:
    # Compute metrics for risk = 1 and estimate the km used for IPCW
    ci, km = truncated_concordance_td(e_test, t_test, out_risk, pred_times, te, km = km, risk = 1)
    cis.append(ci)
    brs.append(brier_score(e_test, t_test, out_risk, pred_times, te, km = km, risk = 1)[0])
    rocs.append(auc_td(e_test, t_test, out_risk, pred_times, te, km = km, risk = 1)[0])

for i, horizon in enumerate(horizons):
    print(f"For {horizon} quantile,")
    print("Truncated Concordance Index: {:.3f}".format(cis[i]))
    print("tdAUC: {:.3f}".format(rocs[i]))
    print("Brier Score: {:.3f}".format(brs[i]))
    print('*' * 50)