This repository provides code that implements the state learner which is a super learner for right-censored data. We provide examples of how to use it as a stand-alone tool and in combination with targeted learning. We also provide code to reproduce the numerical experiments described in the article The state learner – a super learner for right-censored data (Munch & Gerds). Details are given in ./experiments/readme-experiments.org. The article describes a real data example, but as we cannot share the data from that study, the tutorial below uses an emulated data set with the same structure as the original data analysed in the article.
To run the examples below, first run the following code to load the needed functions.
library(here)
library(targets)
tar_source(here("R-code/functions"))
We load and setup the emulated data as follows.
library(riskRegression)
canc_dt <- fread(here("emulated-data/emulated-data.txt"))
canc_dt[,stage:=factor(stage)]
canc_dt[,hormones:=factor(hormones)]
The data looks as follows.
canc_dt
time status logPSA stage ggtot sDose hormones A <num> <int> <num> <fctr> <num> <num> <fctr> <num> 1: 28.92331 1 2.043593 T3ab 9.851837 -0.76183395 No 0 2: 14.20877 1 2.274948 T3c 6.881895 -0.35527003 No 0 3: 15.89122 0 1.133611 T2a 7.794667 -1.59754020 No 0 4: 61.61761 2 2.595044 T1c 8.879701 -0.81713882 Yes 1 5: 26.01292 0 2.360323 T1c 5.804236 0.61412150 Yes 1 --- 1038: 44.26178 0 1.137501 T2b 5.942560 -0.06968029 Yes 1 1039: 13.93434 0 1.898109 T1c 6.785073 0.46413725 No 0 1040: 52.19202 1 3.521309 T3ab 5.539725 -1.02355475 No 0 1041: 10.06398 0 2.900889 T2b 5.906983 1.18218871 Yes 1 1042: 25.70154 0 1.888293 T3c 7.163754 1.25946357 Yes 1
We specify a list of learners. Each learner is specified through a list with a
model
entry (for now, cox
, GLMnet
, and rfsrc
are implemented), and a
x_form
entry, which provides a formula for which and how the covariates enter
the model. Additional arguments can be supplied to each learner by adding
entries to the list (e.g., ntree
for a random survival forest).
library(glmnet)
library(randomForestSRC)
learners <- list(
cox = list(model = "cox", x_form = ~logPSA+stage+ggtot+sDose+hormones),
cox_penalty = list(model = "GLMnet", x_form = ~logPSA+stage+ggtot+sDose+hormones),
N_Aa = list(model = "cox", x_form = ~1),
N_Aa_strat = list(model = "cox", x_form = ~strata(hormones)),
rf = list(model = "rfsrc", x_form = ~logPSA+stage+ggtot+sDose+hormones, ntree = 50)
)
Below we use the same list of learners to learn the cumulative hazards for cause
1, cause 2, and censoring. The state learner returns a list
consisting of
- a
data.table
of triples of learners sorted according to their joint performance in predicting the state of the observed data in the interval 0 totime
; - the top ranked triple of models fitted to the full data.
Here we use an interval of 36 month and show the 6 highest ranked triples.
set.seed(114)
sl = statelearner(learners = list(cause1 = learners,
cause2 = learners,
censor = learners),
data = canc_dt,
time = 36)
head(sl$cv_fit)
Key: <loss> cause1 cause2 censor loss b <fctr> <fctr> <fctr> <num> <int> 1: cox cox_penalty cox 8.608565 1 2: cox N_Aa cox 8.608565 1 3: cox_penalty cox_penalty cox 8.609008 1 4: cox_penalty N_Aa cox 8.609008 1 5: cox cox cox 8.610104 1 6: cox_penalty cox cox 8.610470 1
We estimate effect of hormone therapy on tumor recurrence (cause=1) and death (cause=2). To do so, we need an estimate of the treatment mechanism, i.e., the probability of the presence of ulcers given other covariates.
canc_dt[,A := as.numeric(hormones)-1]
treat_fit <- GLMnet(A ~ logPSA+stage+ggtot+sDose, family = binomial, data = canc_dt)
We can now estimate the ATE using the fitted nuisance models.
ate_est <- os_abs_risk_ate(data = canc_dt,
eval_times = seq(6,36,6),
fit_1 = sl$fitted_winners$cause1,
fit_2 = sl$fitted_winners$cause2,
fit_cens = sl$fitted_winners$censor,
fit_treat = treat_fit)
Below we show the estimated ATE for both tumor recurrence and death for 6-month intervals up to 3 years after baseline.
setorder(ate_est, cause, time)
ate_est[effect=="ATE" & est_type=="one-step"]
cause time effect est_type est see lower upper <char> <num> <char> <char> <num> <num> <num> <num> 1: cause1 6 ATE one-step 0.012155916 0.013268835 -0.013850999 0.03816283 2: cause1 12 ATE one-step 0.010001492 0.017758405 -0.024804982 0.04480796 3: cause1 18 ATE one-step -0.003063623 0.019611287 -0.041501745 0.03537450 4: cause1 24 ATE one-step -0.005076999 0.023713768 -0.051555983 0.04140199 5: cause1 30 ATE one-step 0.020519005 0.029185821 -0.036685203 0.07772321 6: cause1 36 ATE one-step 0.034855532 0.032061411 -0.027984834 0.09769590 7: cause2 6 ATE one-step 0.006344508 0.005310849 -0.004064755 0.01675377 8: cause2 12 ATE one-step -0.004604911 0.009646595 -0.023512237 0.01430241 9: cause2 18 ATE one-step -0.007466150 0.013269851 -0.033475058 0.01854276 10: cause2 24 ATE one-step -0.006363876 0.015592043 -0.036924281 0.02419653 11: cause2 30 ATE one-step 0.014983272 0.031678356 -0.047106306 0.07707285 12: cause2 36 ATE one-step 0.014682592 0.032697166 -0.049403852 0.07876904