Time-dependent C-index #6
Hello @bblodfon, thank you! I am thrilled to hear that aorsf has been working well in your own analyses.
This is correct. I have waffled back and forth on whether it makes sense to take a mean C-statistic over a range of times. My rationale for doing it in this analysis was that I wanted to make relative comparisons between different models rather than interpret how effectively a model would discriminate between high- and low-risk patients for a specific task, so taking a mean made sense. If I were trying to estimate how effective a model would be for a specific prediction task, e.g., 10-year risk of CVD, I would just use the time-dependent value of C at 10 years.
Equation 3 in Blanche et al., 2013. This method has made its way through different software packages over the years. I include a reprex below so that you can see that timeROC and riskRegression give the same answer.
I would say the above calculation is just a mean of time-dependent C-statistics, and in that sense, you could say we are 'integrating out' the time dimension. One reason for taking the mean here was that otherwise I would have had to pick a specific prediction horizon for each prediction task, and I did not see a reasonably objective way to do that. Additionally, for the sake of comparing models, I thought taking the mean C-stat made the most sense (it's pretty similar to how we compare Brier scores via integration over time, right?). Overall, my opinion on the matter is that time-dependent C-statistics are confusing because there are a lot of ways to compute them. I would love to see a stable R package that clears this up. Making them more accessible might just convince the majority of scientific authors to stop using Harrell's C-statistic in analyses with lots of censoring!

```r
library(timeROC)
library(riskRegression)
#> riskRegression version 2023.03.22
library(aorsf)
library(survival)

cstat_timeroc <- timeROC(pbc_orsf$time,
                         delta = pbc_orsf$status,
                         marker = pbc_orsf$bili,
                         cause = 1,
                         weighting = "marginal",
                         times = c(1000, 2000, 3000),
                         iid = TRUE)

cstat_riskregression <- Score(object = list(pbc_orsf$bili),
                              formula = Surv(time, status) ~ 1,
                              data = pbc_orsf,
                              times = c(1000, 2000, 3000),
                              metrics = 'auc')

cstat_timeroc$AUC - cstat_riskregression$AUC$score$AUC
#> t=1000 t=2000 t=3000
#>      0      0      0
```

Created on 2023-10-11 with reprex v2.0.2
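For readers following along, the "marginal" (IPCW) time-dependent AUC that both packages estimate, in the spirit of Equation 3 in Blanche et al. (2013), can be sketched from scratch. This is a simplified illustration in Python, not what `timeROC` or `riskRegression` actually implement; the helper names `km_censoring` and `ipcw_auc` are mine, and tie and left-limit handling are glossed over.

```python
import numpy as np

def km_censoring(time, status):
    """Kaplan-Meier estimate of the censoring survival function G(t),
    treating censoring (status == 0) as the 'event' of interest.
    Simplified sketch: ignores tie-handling subtleties."""
    order = np.argsort(time)
    t, s = time[order], status[order]
    n = len(t)
    surv, times, probs = 1.0, [0.0], [1.0]
    for i in range(n):
        if s[i] == 0:                      # a censoring "event"
            surv *= (n - i - 1) / (n - i)  # KM step at this time
        times.append(t[i])
        probs.append(surv)
    times, probs = np.array(times), np.array(probs)
    def G(x):
        return probs[np.searchsorted(times, x, side="right") - 1]
    return G

def ipcw_auc(time, status, marker, horizon):
    """Cumulative/dynamic AUC(t): cases are subjects with an event by
    `horizon`, controls are event-free past it; each pair is weighted
    by inverse probabilities of remaining uncensored."""
    time, status, marker = map(np.asarray, (time, status, marker))
    G = km_censoring(time, status)
    case = (time <= horizon) & (status == 1)
    ctrl = time > horizon
    w_case = 1.0 / np.array([G(u) for u in time[case]])
    w_ctrl = np.full(ctrl.sum(), 1.0 / G(horizon))
    num, m_ctrl = 0.0, marker[ctrl]
    for m_i, w_i in zip(marker[case], w_case):
        conc = (m_i > m_ctrl) + 0.5 * (m_i == m_ctrl)  # ties count half
        num += w_i * np.sum(conc * w_ctrl)
    return num / (w_case.sum() * w_ctrl.sum())
```

With no censoring, the weights are all 1 and the estimate reduces to the ordinary AUC between cases and controls defined at the horizon.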
Thanks so much for your thoughts on this! I also exchanged a few emails with Thomas Gerds and Paul Blanche to clear this up. It's as you said. In particular, the reprex below adds survAUC::AUC.uno to the comparison:

```r
library(timeROC)
library(riskRegression)
#> riskRegression version 2023.03.22
library(aorsf)
library(survival)
library(survAUC)

cstat_timeroc <- timeROC(pbc_orsf$time,
                         delta = pbc_orsf$status,
                         marker = pbc_orsf$bili,
                         cause = 1,
                         weighting = "marginal",
                         times = c(1000, 2000, 3000),
                         iid = TRUE)

cstat_riskregression <- Score(object = list(pbc_orsf$bili),
                              formula = Surv(time, status) ~ 1,
                              data = pbc_orsf,
                              times = c(1000, 2000, 3000),
                              metrics = 'auc')

cstat_survAUC <- survAUC::AUC.uno(
  Surv.rsp = Surv(pbc_orsf$time, pbc_orsf$status),
  Surv.rsp.new = Surv(pbc_orsf$time, pbc_orsf$status), # train = test data
  times = c(1000, 2000, 3000),
  lpnew = pbc_orsf$bili
)

cstat_timeroc$AUC - cstat_riskregression$AUC$score$AUC
#> t=1000 t=2000 t=3000
#>      0      0      0
# SAME RESULTS! USE riskRegression!

cstat_survAUC$auc - cstat_timeroc$AUC
#>        t=1000        t=2000        t=3000
#>  2.220446e-16 -7.090504e-04 -3.869590e-03
```

Created on 2023-10-11 with reprex v2.0.2

My only concern is the following: since the above are the same (some differences in implementation for sure, but I would call them the same) and their formulas reduce to Uno's AUC for right-censored data using the Kaplan-Meier estimator for the censoring distribution, wouldn't Uno's AUC assumptions then have to hold? E.g., a 1-1 correspondence between the predictor/marker and the expected survival times, and random censoring?
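To make the 1-1 correspondence concern concrete, here is a small hypothetical illustration (mine, in Python, with made-up hazard functions): two predicted survival curves whose risk ordering flips with the evaluation time, so no single scalar marker could reproduce the ranking at every horizon.

```python
import numpy as np

# Two hypothetical patients: A has a high constant hazard, B has a
# hazard that grows with time, so their survival curves cross.
def surv_a(t):
    return np.exp(-0.5 * t)        # S_A(t) = exp(-0.5 t)

def surv_b(t):
    return np.exp(-0.08 * t ** 2)  # S_B(t) = exp(-0.08 t^2)

def risk(surv, t):
    return 1.0 - surv(t)           # risk(t) = 1 - surv(t)

# Early on, A looks higher-risk; later, B does. The ranking is
# time-dependent, so a single marker value per patient cannot agree
# with the predicted curves at every horizon.
early = risk(surv_a, 1.0) > risk(surv_b, 1.0)
late = risk(surv_a, 10.0) > risk(surv_b, 10.0)
```

At a single fixed horizon the ranking is well defined either way, which matches the point below that a single-time-point AUC poses no problem.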
FYI @RaphaelS1
I'm not sure if the equality of the C-statistics implies the assumptions are true, but I do think it makes sense that the
I am still trying to work out whether the 1-1 assumption should hold or not (maybe it doesn't affect the results that much in the end). My rationale comes from the following example: if we have two or more patients/observations whose survival predictions/curves intersect (with risk(t) = 1 - surv(t), as is documented), then the risk ranking of those patients depends on the evaluation time.

Note that for a particular model, or for comparing models (as you suggested), taking a single time point, I definitely see no problem with that (and that's a time-dependent AUC, but not an integrated one, maybe).

Paul's answer to the above was:
I agree also with what you said:
Yes, that would have been the better solution, but I see the difficulty in trying to find the optimal per-task horizon (and there might be multiple, depending on the application). But may I ask what the vector of time points/horizons chosen for each particular task was (I guess something akin to the unique event points, different per test set)? I am asking because a weighted AUC mean (weighted by the differences between time points) would be better than a simple mean in case the time points weren't equidistant.
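The weighted mean suggested here could be sketched as follows (a hypothetical Python helper, not code from the repository); each AUC(t) is weighted by the width of the interval ending at its horizon, in the spirit of the IBS:

```python
import numpy as np

def time_weighted_mean_auc(times, aucs):
    """Average AUC(t) over horizons, weighting each value by the width
    of the time interval it covers (a left-step approximation)."""
    times = np.asarray(times, dtype=float)
    aucs = np.asarray(aucs, dtype=float)
    widths = np.diff(np.concatenate([[0.0], times]))  # interval widths
    return float(np.sum(widths * aucs) / times[-1])
```

With equidistant horizons this coincides with the simple mean; with irregular horizons the two can differ, which is the concern raised above.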
It depends, here is the relevant code:
A time-weighted C would certainly be more like the IBS. Tldr I did not use the time-weighted C. |
ok! So you take the unique event times on the test set, keep the 95% interval and you force the vector by "spreading out" the time values so as they are not more than 30. Why the value 30 was chosen? |
Arbitrary choice. Some prediction tasks had hundreds of unique event times and that made |
Minimizing the number of time points to calculate the AUC at, is a very smart tactic I would say! I think we can implement something similar in Feel free to keep the post as a question/discussion, thanks for spending the time to answer all of this Byron! |
Minimizing the number of time points at which to calculate the AUC is a very smart tactic, I would say! I think we can implement something similar. Feel free to keep the post as a question/discussion. Thanks for spending the time to answer all of this, Byron!

Hi @bcjaeger,

Congrats on publishing the `aorsf` paper! It has also come up as the best learner in my own work. I wanted to ask your opinion on the following regarding this benchmarking work:

I was looking at the code that calculates the time-dependent C-index, and I saw that you are using the `riskRegression::Score()` function and getting the `AUC` for each of the `times = pred_horizon` (a vector of time points). Then you take the `mean` of those values. And you have calculated risk scores per time point for each model here. Right?

First, do you know exactly which paper the AUC calculation comes from? In the paper you cite the `timeROC` paper (Blanche 2013), but is that different?

Secondly, is the above calculation really a general time-dependent C-index? I have read Antolini's (2005) work, which uses the predicted survival distribution to derive a C-index (so it's quite general), and there are other AUC-ROC-based C-index metrics that use linear predictors (so they require Cox models and/or some random censoring assumption to be in place). I really don't know, I just wanted your opinion on this, since the details were missing from the paper.

In `mlr3proba` we will soon include Antolini's C-index, and after I saw this, I was considering also adding this one, as it is already implemented in `riskRegression`, but we'll see!