
Add outputmargin = TRUE in predict of a xgboost model #7

Merged
merged 2 commits into from Mar 31, 2024

Conversation

darentsai
Contributor

Dear Maintainer,

I'm trying to train an xgboost model with the Cox proportional hazards loss using your code.
You estimate the baseline cumulative hazard $H_0(t)$ with gbm::basehaz.gbm:

aorsf-bench/R/model_fit.R

Lines 397 to 406 in c8a9b5b

# baseline hazard estimate at pred horizon
lin_preds <- predict(fit, newdata = xmat)
base_haz <-
  gbm::basehaz.gbm(t = train[, 'time'],
                   delta = train[, 'status'],
                   f.x = lin_preds,
                   t.eval = pred_horizon,
                   smooth = TRUE,
                   cumulative = TRUE)

I searched its documentation and saw that the parameter f.x must be the predicted values of the regression model on the log hazard scale. However, the output of predict(fit, newdata = xmat) is passed in, and the predict method for class 'xgb.Booster' defaults to returning predictions on the hazard ratio scale (i.e., $HR = \exp(X\beta)$ in the proportional hazards function $h(t) = h_0(t) \cdot HR$), not the log hazard scale $\log(HR) = X\beta$.

I think you may need to set outputmargin = TRUE before passing it into basehaz.gbm, i.e.

lin_preds <- predict(fit, newdata = xmat, outputmargin = TRUE)

outputmargin controls whether the predictions are returned as the original, untransformed sum of predictions across boosting iterations, i.e., the linear predictor $X\beta$.
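As a quick sanity check (a sketch, assuming fit is an xgb.Booster trained with objective = "survival:cox"), the default output should simply be the exponentiated margin:

# hypothetical check: the default predictions equal exp() of the margin
hr_preds  <- predict(fit, newdata = xmat)
lin_preds <- predict(fit, newdata = xmat, outputmargin = TRUE)
all.equal(hr_preds, exp(lin_preds))  # should be TRUE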


The same issue appears in the xgb_cox_pred function from model_pred.R.

aorsf-bench/R/model_pred.R

Lines 209 to 213 in c8a9b5b

lin_preds <- predict(object$fit, newdata = .test)
for(i in seq_along(pred_horizon)){
  res[, i] <- exp(exp(lin_preds) * -object$base_haz[i])
}

It seems you use the formula $S(t) = \exp(-H_0(t) \exp(X\beta))$ to evaluate survival probabilities. But the lin_preds defined here is already $\exp(X\beta)$, and one more exp() is applied on top of it. So I think you also need to set outputmargin = TRUE here.
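For clarity, the corrected snippet matching this PR's change would be:

lin_preds <- predict(object$fit, newdata = .test, outputmargin = TRUE)
for(i in seq_along(pred_horizon)){
  # lin_preds is now X*beta, so exp(lin_preds) is the hazard ratio
  res[, i] <- exp(exp(lin_preds) * -object$base_haz[i])
}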

Maybe all of the above is just my misunderstanding! Thank you for any reply. This project is outstanding work!

Best regards,

@bcjaeger
Owner

Thank you! Your change is correct. I very much appreciate your PR and explanation. I'll merge this in now.

@bcjaeger bcjaeger merged commit 1bc0fde into bcjaeger:main Mar 31, 2024
@darentsai
Contributor Author

darentsai commented May 11, 2024

@bcjaeger Thanks for your quick reply and for accepting the PR.

I'm reading your excellent work on AORSF published in JCGS (2024), and some of its content confuses me a little. Could I ask you about it here?

  1. I have also read your article on ORSF published in AOAS (2019). In that work, you used a regularized Cox model to calculate the decision boundary, but in the recent paper you use a standard Cox model. Besides improving computational speed, are there any other reasons for this change?
  2. You said that memory is conserved by conducting bootstrap resampling via "random integer-valued weights", and you express this part in the Algorithm with the pseudocode below:

w <- sample(from = {0,...,10}, size = nrow(xtrain), replace = T)

(the weights w are then used in Newton-Raphson scoring to solve for the betas)
This is the first time I've seen this bootstrap scheme. Is there any literature describing this approach? (Or any keyword I can google?)

Thanks for any reply!

@bcjaeger
Owner

@darentsai no problem! Thank you for reviewing the code and for your kind words about the paper.

You've asked two very good questions.

I have also read your article on ORSF published in AOAS (2019). In that work, you used a regularized Cox model to calculate the decision boundary, but in the recent paper you use a standard Cox model. Besides improving computational speed, are there any other reasons for this change?

There is no benefit apart from the improved computational speed. One of the most interesting findings from the JCGS paper is that the much faster approach of using one iteration of the Newton-Raphson procedure ends up being practically equivalent, in terms of C-statistic, to the much more thorough penalized Cox regression approach from the AOAS paper.

You said that memory is conserved by conducting bootstrap resampling via "random integer-valued weights". Is there any literature describing this approach? (Or any keyword I can google?)

I would guess that other packages (e.g., ranger) do this to make bootstrapping more efficient. I don't know of a good prior reference for this approach (you can always cite the JCGS paper, though), but I don't think you need one, as you can convince yourself with code that it is asymptotically identical to the usual bootstrap procedure.

# bootstrap a standard error

n_obs <- 1000
n_boots <- 10000
x <- rnorm(n_obs)

boot_stats_regular <- vector('numeric', n_boots)
boot_stats_weights <- vector('numeric', n_boots)

for(i in seq(n_boots)){

  # regular bootstrap approach
  index <- sample(n_obs, replace = TRUE)

  # weighted bootstrap (using same sample as regular)
  weights <- vector("numeric", n_obs)
  for(w in seq_along(weights)) weights[w] <- sum(index == w)

  # do this to match what aorsf does (i.e., no row gets sampled > 10 times)
  weights <- pmin(weights, 10)

  boot_stats_regular[i] <- mean(x[index])
  boot_stats_weights[i] <- weighted.mean(x, w = weights)
}

# check
sd(boot_stats_regular)
#> [1] 0.03138031
sd(boot_stats_weights)
#> [1] 0.03138031
t.test(x)$stderr
#> [1] 0.03088334

Created on 2024-05-12 with reprex v2.1.0

@darentsai
Contributor Author

darentsai commented May 14, 2024

@bcjaeger I really appreciate your kindness and detailed explanation!

After testing your example code, I clearly understand that using "random integer-valued weights" indeed mimics the bootstrap procedure. Initially I had doubts about its validity because of this line of pseudocode from the JCGS paper:

$$w \leftarrow sample(from = \{0, ..., 10\}, size = nrow(x_{train}), replace = T)$$

I intuitively read it as the weights being drawn from a discrete uniform distribution on 0 to 10.
However, if we want a statistical distribution that describes the number of times an observation is selected into a bootstrap sample, it should be binomial or Poisson.

I adapted your example code to include weights generated from the binomial, Poisson, and uniform distributions.
It shows that uniformly distributed weights underestimate the standard error.
Does this mean the pseudocode is misleading?

n_obs <- 1000
n_boots <- 10000
x <- rnorm(n_obs)

boot_stats_weights <- vector('numeric', n_boots)
boot_stats_weights_binom <- vector('numeric', n_boots)
boot_stats_weights_pois <- vector('numeric', n_boots)
boot_stats_weights_unif <- vector('numeric', n_boots)

for(i in seq(n_boots)){

  # weighted bootstrap
  index <- sample(n_obs, replace = TRUE)
  weights <- vector("numeric", n_obs)
  for(w in seq_along(weights)) weights[w] <- sum(index == w)
  
  # weighted bootstrap (Binomial distribution)
  w_binom <- rbinom(n_obs, size = n_obs, prob = 1/n_obs)
  
  # weighted bootstrap (Poisson distribution)
  w_pois <- rpois(n_obs, lambda = 1)
  
  # weighted bootstrap (Uniform distribution)
  w_unif <- sample(0:10, size = n_obs, replace = TRUE)
  
  boot_stats_weights[i] <- weighted.mean(x, w = weights)
  boot_stats_weights_binom[i] <- weighted.mean(x, w = w_binom)
  boot_stats_weights_pois[i] <- weighted.mean(x, w = w_pois)
  boot_stats_weights_unif[i] <- weighted.mean(x, w = w_unif)
}

# check
t.test(x)$stderr               # 0.03226258
sd(boot_stats_weights)         # 0.03236514
sd(boot_stats_weights_binom)   # 0.03218649
sd(boot_stats_weights_pois)    # 0.03287533
sd(boot_stats_weights_unif)    # 0.02058976
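A rough sketch of why the uniform weights fall short: for weights drawn independently of x, the variance of the weighted mean, conditional on the data, is approximately

$$Var\left(\frac{\sum_i w_i x_i}{\sum_i w_i}\right) \approx \frac{Var(w)}{E(w)^2} \cdot \frac{\hat{\sigma}^2}{n}$$

Binomial(n, 1/n) and Poisson(1) weights both give $Var(w) / E(w)^2 \approx 1$, matching the bootstrap, while the uniform distribution on $\{0, ..., 10\}$ gives $Var(w) / E(w)^2 = 10/25 = 0.4$, deflating the standard error by $\sqrt{0.4} \approx 0.63$ (and indeed $0.63 \times 0.0323 \approx 0.0204$, close to the 0.0206 above).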

@bcjaeger
Owner

bcjaeger commented May 14, 2024

Ahh I see. This is a very helpful illustration!

I can see how the pseudocode implies uniform sampling. That was not my intention and I apologize for the confusion 😞

The sampling procedure used to mimic bootstrapping is carried out mostly by the code below:

uword i, draw, n = data->n_rows;

std::uniform_int_distribution<uword> udist_rows(0, n - 1);

if(sample_with_replacement){

  for (i = 0; i < n; ++i) {
    draw = udist_rows(random_number_generator);
    ++w_inbag[draw];
  }

}

Do you think we could describe this with pseudocode? I am thinking maybe something like this:

probs <- (1 / n_obs) ^ (seq(0, 10))
w <- sample(seq(0, 10), size = n_obs, replace = TRUE, probs = probs)

@darentsai
Contributor Author

darentsai commented May 15, 2024

Maybe a loop expressing the sampling would be easy to read, but it could be somewhat lengthy for pseudocode:

w <- rep(0, times = nrow(x))
for(i in seq_along(w)) {
  pos <- sample(from = seq_along(w), size = 1)
  w[pos] <- w[pos] + 1
}

Your idea is nice! But the probabilities probs may not align with the distribution behind the bootstrap. I ran it in R and got a vector of weights with all values equal to 0. Perhaps using probabilities from the binomial distribution would be more reasonable:

probs <- choose(n_obs, seq(0, 10)) * (1 / n_obs) ^ seq(0, 10) * (1 - 1 / n_obs) ^ (n_obs - seq(0, 10))
# equivalent to
probs <- dbinom(0:10, n_obs, 1 / n_obs)

w <- sample(seq(0, 10), size = n_obs, replace = TRUE, prob = probs)
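For reference, here is why your probs gave weights of all zeros (a quick check, reusing n_obs = 1000 from the example above):

probs_geom <- (1 / n_obs) ^ seq(0, 10)
probs_geom / sum(probs_geom)  # ~99.9% of the normalized mass sits on weight 0

With nearly all the probability on 0, almost every sampled weight is 0.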

I'm not sure whether putting this arithmetic into algorithm pseudocode is proper. Maybe we can just write:

$$ probs \leftarrow prob\_binomial(x = seq(0, 10), n\_trial = n\_obs, prob = 1 / n\_obs)$$

$$ w \leftarrow sample(from = seq(0, 10), size = n\_obs, replace = TRUE, probs = probs) $$

Sorry, I don't have better ideas right now...

@bcjaeger
Owner

Oh right, probs in my idea was off because it didn't include the binomial coefficient or the probability of not being sampled. dbinom works very well. How about this?

$probs \leftarrow dbinom(seq(0, 10), size = n_{obs}, prob = 1 / n_{obs})$

$w \leftarrow sample(seq(0, 10), size = n_{obs}, replace = TRUE, probs = probs) $

@darentsai
Contributor Author

darentsai commented May 15, 2024

It looks great! My concern is that $dbinom(...)$ is R-specific syntax, and I'm not sure non-R users will understand what it means. That is why I used the term $prob\_binomial(...)$ in my previous comment, to make it more informative. If you think it's not an issue, you can go ahead and use $dbinom$.

Another option is

$$w \leftarrow sample\_binomial(n_{obs}, size = n_{obs}, probs = 1/n_{obs})$$

corresponding to R command:

w <- rbinom(n_obs, size = n_obs, prob = 1/n_obs)

This saves the line defining $probs \leftarrow (...)$ and avoids imposing an upper bound of 10.

Best regards,

@bcjaeger
Owner

I love this!

I would like to re-run this analysis with the xgboost predictions working as intended and then update my pseudocode. This will take a little while since the analysis requires a lot of computing time, but when it's done, may I include an acknowledgment to you in the paper for your help?

@darentsai
Contributor Author

Of course, it's my pleasure!
I'm still a first-year PhD student with limited insight, so communicating with you about this project has been a valuable experience for me. I use the name Dai-Rong Tsai in academic articles, and you can mention me in the paper with it. I've added my email to my profile intro. Feel free to contact me whenever needed.
