<a href="https://colab.research.google.com/github/Code-CloudSG/CS506-Computational-Tools-for-Data-Science/blob/master/Done!_DCM_on_SUPPORT_Dataset_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DSM on SUPPORT Dataset

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
%cd gdrive/My Drive/project_folder/MA/auton-survival

/content/gdrive/My Drive/project_folder/MA/auton-survival


The SUPPORT dataset comes from the Vanderbilt University study
to estimate survival for seriously ill hospitalized adults.
(Refer to http://biostat.mc.vanderbilt.edu/wiki/Main/SupportDesc
for the original data source.)

In this notebook, we apply Deep Survival Machines (DSM) to survival prediction on the SUPPORT data.

### Load the SUPPORT Dataset

The package includes helper functions to load the dataset.

`x` is an `np.array` of features (covariates),
`t` holds the event or censoring times, and
`e` is the event indicator (1 if the event was observed, 0 if the record is censored).
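To make the roles of `t` and `e` concrete, here is a toy sketch with made-up values (not SUPPORT records). For a censored row (`e == 0`), `t` is only the last time the patient was known to be event-free, so it is a lower bound on the true event time.

```python
import numpy as np

# Made-up follow-up data for five patients (illustrative only)
t_toy = np.array([5.0, 12.0, 3.5, 30.0, 8.0])  # event or censoring time
e_toy = np.array([1, 0, 1, 0, 1])             # 1 = event observed, 0 = censored

# Share of censored records, i.e. rows where only a lower bound on the event time is known
censored_fraction = 1 - e_toy.mean()
print(f"{censored_fraction:.0%} of records are censored")  # 40% of records are censored
```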

In [None]:
import sys
# Put the repository root on the import path so that `auton_survival` can be imported as a package
sys.path.insert(1, "/content/gdrive/MyDrive/project_folder/MA/auton-survival/")

In [None]:
from auton_survival.models.dsm import DeepSurvivalMachines

In [None]:
from auton_survival.models.dsm import datasets
x, t, e = datasets.load_dataset('SUPPORT')

In [None]:
!apt update
!apt install -y cmake

Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:2 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease [3,626 B]
Hit:3 http://archive.ubuntu.com/ubuntu bionic InRelease

In [None]:
!pip install scikit-survival

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting scikit-survival
  Downloading scikit_survival-0.17.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
     |████████████████████████████████| 3.1 MB 33.0 MB/s 
Installing collected packages: scikit-survival
Successfully installed scikit-survival-0.17.2


In [None]:
!pwd

/content/gdrive/MyDrive/project_folder/MA/auton-survival


### Compute horizons at which we evaluate the performance of DSM

Survival predictions are issued at specific time horizons. Here we evaluate how well DSM predicts
at the 25th, 50th, and 75th quantiles of the observed (uncensored) event times, a common choice in survival analysis.

In [None]:
import numpy as np
horizons = [0.25, 0.5, 0.75]
times = np.quantile(t[e==1], horizons).tolist()
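Restricting the quantiles to `t[e==1]` means the horizons are computed from observed event times only, so censoring times do not shift them. A toy sketch with made-up times (not SUPPORT data) shows the difference:

```python
import numpy as np

# Toy data: the last record is censored at a long follow-up time
t_toy = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 100.0])
e_toy = np.array([1, 1, 1, 1, 1, 0])
horizons = [0.25, 0.5, 0.75]

# Quantiles of uncensored times only, as in the cell above
times_events = np.quantile(t_toy[e_toy == 1], horizons).tolist()
# Quantiles of all times, censored ones included, for comparison
times_all = np.quantile(t_toy, horizons).tolist()
print(times_events)  # [4.0, 6.0, 8.0]
print(times_all)     # [4.5, 7.0, 9.5]
```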

### Splitting the data into train, test and validation sets

We train DSM on 70% of the data, use a 10% validation set for model selection, and report performance on the remaining 20% held-out test set. The split follows the stored row order of the arrays, without shuffling.

In [None]:
n = len(x)

tr_size = int(n*0.70)
vl_size = int(n*0.10)
te_size = int(n*0.20)

x_train, x_test, x_val = x[:tr_size], x[-te_size:], x[tr_size:tr_size+vl_size]
t_train, t_test, t_val = t[:tr_size], t[-te_size:], t[tr_size:tr_size+vl_size]
e_train, e_test, e_val = e[:tr_size], e[-te_size:], e[tr_size:tr_size+vl_size]
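The split above takes rows in their stored order. If that order is not random, a shuffled split may be preferable. A minimal sketch, assuming a seeded NumPy permutation is acceptable; `shuffled_split` is a hypothetical helper, not part of auton-survival:

```python
import numpy as np

def shuffled_split(n, train_frac=0.70, val_frac=0.10, seed=0):
    """Hypothetical helper: return disjoint train/validation/test index arrays
    drawn from a seeded random permutation of range(n)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    tr_size = int(n * train_frac)
    vl_size = int(n * val_frac)
    # All remaining rows (about 20%) form the test set
    return idx[:tr_size], idx[tr_size:tr_size + vl_size], idx[tr_size + vl_size:]

tr_idx, vl_idx, te_idx = shuffled_split(1000)
print(len(tr_idx), len(vl_idx), len(te_idx))
```

The index arrays can then be applied as `x[tr_idx]`, `t[tr_idx]`, `e[tr_idx]`, and likewise for validation and test. Unlike the slicing above, this sketch assigns any rows left over by integer truncation to the test set instead of leaving them unused.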

### Setting the parameter grid

Let's set up the parameter grid for hyper-parameter tuning. We tune the number of underlying survival distributions ($K \in \{3, 4, 6\}$),
the choice of distribution (Log-Normal or Weibull), the learning rate of the Adam optimizer ($1\times10^{-4}$ or $1\times10^{-3}$), and the number of hidden layers ($0$, $1$, or $2$, each with 100 units).

In [None]:
from sklearn.model_selection import ParameterGrid

In [None]:
param_grid = {'k' : [3, 4, 6],
              'distribution' : ['LogNormal', 'Weibull'],
              'learning_rate' : [ 1e-4, 1e-3],
              'layers' : [ [], [100], [100, 100] ]
             }
params = ParameterGrid(param_grid)
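Because `ParameterGrid` enumerates every combination of one value per key, the grid above contains $3 \times 2 \times 2 \times 3 = 36$ configurations, each of which is trained once during model selection. A pure-Python sketch of that Cartesian product using `itertools.product` (it illustrates the idea and is not scikit-learn's implementation):

```python
from itertools import product

param_grid = {'k': [3, 4, 6],
              'distribution': ['LogNormal', 'Weibull'],
              'learning_rate': [1e-4, 1e-3],
              'layers': [[], [100], [100, 100]]}

# One dict per combination, choosing one value for every key
keys = sorted(param_grid)
combos = [dict(zip(keys, values))
          for values in product(*(param_grid[key] for key in keys))]
print(len(combos))  # 36
```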

### Model Training and Selection

In [None]:
from auton_survival.models.dsm import DeepSurvivalMachines

In [None]:
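models = []
for param in params:
    model = DeepSurvivalMachines(k = param['k'],
                                 distribution = param['distribution'],
                                 layers = param['layers'])
    # Train this configuration for up to 100 iterations with its learning rate
    model.fit(x_train, t_train, e_train, iters = 100, learning_rate = param['learning_rate'])
    # Score the configuration by its negative log-likelihood on the validation set
    models.append((model.compute_nll(x_val, t_val, e_val), model))
# Keep the model with the lowest validation NLL; key= avoids comparing model objects on ties
best_nll, model = min(models, key=lambda pair: pair[0])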

 12%|█▏        | 1190/10000 [00:02<00:16, 529.48it/s]
100%|██████████| 100/100 [00:15<00:00,  6.43it/s]
 12%|█▏        | 1190/10000 [00:02<00:15, 557.66it/s]
 56%|█████▌    | 56/100 [00:08<00:06,  6.84it/s]
 12%|█▏        | 1190/10000 [00:02<00:15, 555.93it/s]
 92%|█████████▏| 92/100 [00:14<00:01,  6.17it/s]
 12%|█▏        | 1190/10000 [00:02<00:15, 557.25it/s]
 16%|█▌        | 16/100 [00:02<00:14,  5.62it/s]
 12%|█▏        | 1190/10000 [00:02<00:21, 409.34it/s]
 60%|██████    | 60/100 [00:11<00:07,  5.31it/s]
 12%|█▏        | 1190/10000 [00:02<00:16, 525.28it/s]
 11%|█         | 11/100 [00:02<00:18,  4.88it/s]
 12%|█▏        | 1190/10000 [00:02<00:15, 560.92it/s]
100%|██████████| 100/100 [00:16<00:00,  6.20it/s]
 12%|█▏        | 1190/10000 [00:02<00:15, 561.76it/s]
 92%|█████████▏| 92/100 [00:14<00:01,  6.21it/s]
 12%|█▏        | 1190/10000 [00:02<00:15, 558.88it/s]
 92%|█████████▏| 92/100 [00:17<00:01,  5.27it/s]
 12%|█▏        | 1190/10000 [00:02<00:18, 471.75it/s]
 16%|█▌        | 
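When configurations are ranked by validation NLL, comparing whole `(nll, model)` containers makes Python fall back to comparing the model objects whenever two scores tie, which raises `TypeError`. Passing `key=` compares the scores only. A minimal sketch in which the hypothetical `Placeholder` class stands in for fitted models:

```python
class Placeholder:
    """Hypothetical stand-in for a fitted model; defines no ordering."""
    def __init__(self, name):
        self.name = name

# (validation NLL, model) pairs with a tie at 1.47
scored = [(1.52, Placeholder("a")), (1.47, Placeholder("b")), (1.47, Placeholder("c"))]

# Comparing whole tuples reaches the Placeholder objects on the tie
try:
    min(scored)
    tie_error = False
except TypeError:
    tie_error = True

# Comparing only the NLL works and keeps the first model with the lowest score
best_nll, best = min(scored, key=lambda pair: pair[0])
print(tie_error, best_nll, best.name)  # True 1.47 b
```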

### Inference

In [None]:
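# Predicted probability of the event by each horizon (risk) and of surviving past it;
# each array has one row per test patient and one column per horizon in `times`
out_risk = model.predict_risk(x_test, times)
out_survival = model.predict_survival(x_test, times)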

### Evaluation

We evaluate the discriminative ability of DSM with the time-dependent concordance index and the cumulative/dynamic AUC, and its overall accuracy (discrimination and calibration together) with the time-dependent Brier score.
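The scikit-survival metrics take the ground truth as a NumPy structured array whose first field is the boolean event indicator and whose second is the time. The evaluation cell builds these arrays element by element; a vectorized sketch that fills the same layout field by field follows (the helper name `to_structured` is hypothetical). scikit-survival also offers `sksurv.util.Surv.from_arrays(event, time)` for building such arrays.

```python
import numpy as np

def to_structured(e, t):
    """Hypothetical helper: pack event indicators and times into a structured
    array with a boolean field 'e' and a float field 't'."""
    y = np.empty(len(e), dtype=[('e', bool), ('t', float)])
    y['e'] = np.asarray(e, dtype=bool)
    y['t'] = np.asarray(t, dtype=float)
    return y

# Toy values: the second record is censored at time 12
y = to_structured([1, 0, 1], [5.0, 12.0, 3.5])
print(y.dtype.names, y['e'], y['t'])
```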

In [None]:
from sksurv.metrics import concordance_index_ipcw, brier_score, cumulative_dynamic_auc

In [None]:
# Ground truth as structured arrays of (event indicator, time), the format scikit-survival expects
et_train = np.array(list(zip(e_train, t_train)), dtype=[('e', bool), ('t', float)])
et_test = np.array(list(zip(e_test, t_test)), dtype=[('e', bool), ('t', float)])
et_val = np.array(list(zip(e_val, t_val)), dtype=[('e', bool), ('t', float)])

# Time-dependent concordance index (IPCW), truncated at each horizon
cis = [concordance_index_ipcw(et_train, et_test, out_risk[:, i], times[i])[0]
       for i in range(len(times))]
# Brier score at all horizons at once; brier_score returns (times, scores)
brs = brier_score(et_train, et_test, out_survival, times)[1]
# Cumulative/dynamic AUC per horizon; [0] is the array of AUCs, which has a single entry here
roc_auc = [cumulative_dynamic_auc(et_train, et_test, out_risk[:, i], times[i])[0][0]
           for i in range(len(times))]

for i, horizon in enumerate(horizons):
    print(f"For {horizon} quantile,")
    print("TD Concordance Index:", cis[i])
    print("Brier Score:", brs[i])
    print("ROC AUC ", roc_auc[i], "\n")

For 0.25 quantile,
TD Concordance Index: 0.7555853798907455
Brier Score: 0.1113088360231026
ROC AUC  0.7649158542616377 

For 0.5 quantile,
TD Concordance Index: 0.699380756108624
Brier Score: 0.18341050843284837
ROC AUC  0.718643804635684 

For 0.75 quantile,
TD Concordance Index: 0.6531435936608161
Brier Score: 0.22063558940580705
ROC AUC  0.7093596655964137 
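For reference, as we read the scikit-survival documentation, `brier_score` computes an inverse-probability-of-censoring-weighted estimate at each horizon $\tau$ over the $n$ test records $(t_i, e_i, x_i)$. Here $\hat S(\tau \mid x_i)$ is the predicted survival probability (a column of `out_survival`) and $\hat G$ is the Kaplan-Meier estimate of the censoring survival function fitted on the training data (`et_train`); lower values are better.

```latex
\mathrm{BS}(\tau) = \frac{1}{n}\sum_{i=1}^{n}\left[
  \frac{\hat S(\tau \mid x_i)^2 \,\mathbf{1}(t_i \le \tau,\ e_i = 1)}{\hat G(t_i)}
  + \frac{\bigl(1 - \hat S(\tau \mid x_i)\bigr)^2 \,\mathbf{1}(t_i > \tau)}{\hat G(\tau)}
\right]
```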



In [None]:
from auton_survival.models.dsm import DeepSurvivalMachines
from auton_survival.models.dsm import datasets


# Load the SUPPORT dataset.
x, t, e = datasets.load_dataset('SUPPORT')
# Instantiate a DeepSurvivalMachines model with default hyper-parameters.
model = DeepSurvivalMachines()
# Fit the model to the full dataset.
model.fit(x, t, e)
# Estimate the predicted risk of the event by time t = 10.
model.predict_risk(x, 10)

 19%|█▉        | 1881/10000 [00:03<00:16, 492.02it/s]
100%|██████████| 1/1 [00:00<00:00,  5.11it/s]


array([[0.1513105 ],
       [0.18146217],
       [0.17543338],
       ...,
       [0.21745589],
       [0.20384838],
       [0.16535427]])

In [None]:
from auton_survival.models.dsm import DeepSurvivalMachines
model = DeepSurvivalMachines()
model.fit(x, t, e)

 19%|█▉        | 1881/10000 [00:03<00:15, 518.99it/s]
100%|██████████| 1/1 [00:00<00:00,  6.01it/s]


<auton_survival.models.dsm.DeepSurvivalMachines at 0x7f0dacae4c10>

The cells above follow the example usage in the auton-survival documentation: https://autonlab.org/auton-survival/models/dsm/index.html#example-usage
