predict.flexsurvreg #39

chjackson · 2017-05-30T13:28:25Z

General function to return predictions, either corresponding to the observed data, or to a user-supplied "newdata". Should behave similarly to other predict methods in base R and common packages. Could return predictions as linear predictors for the location parameter, or on the scale of the response, e.g. mean or median survival.

jrdnmdhl · 2017-05-30T13:35:50Z

The typical functionality of predict is the same as what summary.flexsurvreg already does, except it returns a simple vector instead of a dataframe or list of dataframes.

In order to avoid duplicate code, I think the best way to do this might be to take the existing code for summary, use it to write a predict method which returns a simple vector, then call predict from summary to add the predicted values to the dataframe(s).

jwdink · 2017-06-08T01:20:05Z

I've written a function for generating predictions from flexsurvreg objects for one of my packages:

https://github.com/jwdink/tidysurv/blob/master/R/flexsurv-helpers.R

It's probably lacking in some important functionality, but thought I'd share in case its helpful.

topepo · 2018-08-16T14:48:26Z

Any thoughts/progress on this?

I'd like to make flexsurv the default parametric modeling function in the parsnip package. Using the summary method is kinda kludgy and a bona fide prediction method would be great.

chjackson · 2018-08-17T09:26:52Z

I'll happily add this to play nicely with a tidyverse package!

What would you need it to do? Would predicted means (as type="response") and medians / other quantiles (as type="quantile") be sufficient? That should be an easy wrapper around summary.flexsurvreg. Linear predictors for the location parameter as well?

And what should be the returned format? A simple vector if no SEs are requested, or a matrix for multiple quantiles? What about with SEs? Would it have to return list(fit, se.fit) by convention, even though a data frame would be nicer?

Ideologically I prefer to present CIs than SEs, but I can add SEs as well if that's what people want.

topepo · 2018-08-17T11:48:34Z

I'll happily add this to play nicely with a tidyverse package!

Excellent!

What would you need it to do? Would predicted means (as type="response") and medians / other quantiles (as type="quantile") be sufficient? That should be an easy wrapper around summary.flexsurvreg. Linear predictors for the location parameter as well?

Use "response" for for the mean response and "percentile" for others. Is there a parameter for the percentile in the predict function? Linear predictors (maybe type = "link"?) would be good too.

And what should be the returned format? A simple vector if no SEs are requested, or a matrix for multiple quantiles? What about with SEs? Would it have to return list(fit, se.fit) by convention, even though a data frame would be nicer?

Ideologically I prefer to present CIs than SEs, but I can add SEs as well if that's what people want.

Yeah, that's not obvious (and we are coming up with a standard in tidymodels/parsnip#41). I'd say to keep everything in a data frame. For simple mean predictions, I would suggest having a column named .pred.

Put the standard error in .std_error and bounds in .pred_lower and .pred_upper. For those, I would generate them when type = "conf_int" or type = "pred_int" (as appropriate) along with a level parameter.

I'm not sure about the percentiles. Ideally, predict should return the same number of rows as was in newdata so you would need to put then percentile predictions in columns. However, I hate to encode the percentile number (e.g. 75th, etc) in the column name. I'll think about that and then make a suggestion soon.

topepo · 2018-08-17T14:02:21Z

How about a list column? Basically, you can add a data frame column (more easily with tibbles) that contains each subject-specific data frame. It's pretty easy to adapt your current data structure to that (code below).

You can still easily get to the single, stacked data frame easily (see the code at the end).

Here's an example using one of the models in ? flexsurvreg :

data(ovarian)
fitg <- flexsurvreg(formula = Surv(futime, fustat) ~ age, data = ovarian, dist="gengamma")

preds <- summary(fitg, newdata = ovarian[1:2, "age", drop = FALSE], t = c(100, 200, 300))

# set the names

preds <- lapply(preds, setNames, c(".time", ".pred", ".pred_lower", ".pred_upper"))

library(tibble)

preds <- tibble(.pred = preds)
preds

# if someone wants a single data frame, they could just:

tidyr::unnest(preds)

chjackson · 2018-09-06T09:26:03Z

I've uploaded a work in progress on predict.flexsurvreg to https://github.com/chjackson/flexsurv-dev/blob/master/R/predict.flexsurvreg.R. Happy to keep on refining it as your spec becomes more official. This needed some additions to summary.flexsurvreg to deal with SEs and type="link".

As I understand it, your suggestion was to return a data frame with pred,std_error etc columns for scalar predictions, but for vector predictions it would return a data frame with a single list column.
What do you think of the alternative I've written where it always returns pred,std_error etc columns, but these would be numeric columns for a scalar prediction and list columns for a vector prediction? That would seem to degrade more nicely.

I'm not sure about having "prediction intervals" for this kind of model. predict.glm and predict.survreg don't have that. type="quantile" seems to cover that concept by returning quantiles of the estimated distribution of the data.

topepo · 2018-09-19T18:58:18Z

I'm not sure about having "prediction intervals" for this kind of model.

The notion of prediction intervals would be complex here and I don't know of any censored regression packages that do that so I wouldn't worry about it.

You could do it with Bayesian models really easily but otherwise it can be difficult to compute.

As I understand it, your suggestion was to return a data frame with pred, std_error etc columns for scalar predictions, but for vector predictions it would return a data frame with a single list column. What do you think of the alternative I've written where it always returns pred , std_error etc columns, but these would be numeric columns for a scalar prediction and list columns for a vector prediction? That would seem to degrade more nicely.

Based some feedback that we've gotten, I think that the optimal structure would be a list column of data frames (or tibbles). For the quantile code, that would look like:

[[1]]
# A tibble: 2 x 2
 .quantile  .pred
     <dbl>  <dbl>
1      0.1 243.  
2      0.9   9.93

[[2]]
# A tibble: 2 x 2
 .quantile  .pred
     <dbl>  <dbl>
1      0.1 1256. 
2      0.9   51.2

Similar for probabilities, the columns would be .time and .pred. On the website, we suggest:

> tidy_preds
# A tibble: 2 x 1
  .pred           
  <list>          
1 <tibble [3 × 4]>
2 <tibble [3 × 4]>
> tidy_preds$.pred[[1]]
# A tibble: 3 x 4
  .time .pred .pred_lower .pred_upper
  <dbl> <dbl>       <dbl>       <dbl>
1   100 0.799      0.484        0.966
2   200 0.485      0.128        0.806
3   300 0.278      0.0123       0.663

Always returning .pred and .std_error as their own columns (and others that are scalars per row) is perfect.

Looking at the predict code

Would we still use the summary function to get survival and/or hazard probabilities (or will you do these via predict too)?

##' TODO remove covariates?

Yes please!

How much would you want to use our prediction specification? It's not finalized yet but I feel pretty solid about what is in there. There are generally good ideas but we aren't trying to coerce people.

mattwarkentin · 2020-05-15T20:55:42Z

Hi @chjackson,

Any interest in resurrecting this issue? I would be interested in helping out and could start working on a PR. I am an avid user of the tidyverse and tidymodels ecosystems and also flexsurv. I could probably also help with #71.

I also wanted to check whether you would be interested in a PR which would add a function for estimating cumulative incidence in the presence of competing-risks when modeling cause-specific hazards with flexible survival models (flexsurvreg() or flexsurvspline()). I wrote this function for my own project, but I think it would be at home in this package. Happy to discuss further and I can create a separate issue, if you prefer.

Let me know!

chjackson · 2020-05-16T11:51:32Z

Thanks - happy for you to work on this. I've updated the current source a bit to make the TODOs clearer. I think the main thing left to do is to create those list columns when there's multiple predictions, and write examples that demonstrate how users can deal with them. Someone fluent in tidyverse would be appreciated for that.

I'm actively working on multi-state model stuff at the moment. The package already has some functions in this area, but anything that enhances it would be welcome. Happy to discuss in another issue.

mattwarkentin · 2020-05-20T01:02:13Z

Great to hear! I've got a deadline this Friday for one of my doctoral projects thats currently occupying most of my time. I am planning to spend some time this weekend to read over the relevant source code and get more familiar with what exactly needs to be done.

I can start a new issue once I've got a better grasp on what the next steps might look like and we can discuss things further, if that works for you. I look forward to helping out in any way I can.

mattwarkentin · 2022-03-30T15:26:36Z

@chjackson This can be closed, I think. Up to you.

chjackson added the enhancement label May 30, 2017

chjackson mentioned this issue Apr 5, 2020

support for broom #71

Closed

chjackson mentioned this issue May 16, 2020

predictions on the scale of the response #17

Closed

mattwarkentin mentioned this issue May 27, 2020

Develop out S3 methods for predict and broom generics #75

Closed

chjackson closed this as completed Mar 30, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

predict.flexsurvreg #39

predict.flexsurvreg #39

chjackson commented May 30, 2017

jrdnmdhl commented May 30, 2017

jwdink commented Jun 8, 2017

topepo commented Aug 16, 2018

chjackson commented Aug 17, 2018

topepo commented Aug 17, 2018

topepo commented Aug 17, 2018

chjackson commented Sep 6, 2018

topepo commented Sep 19, 2018

mattwarkentin commented May 15, 2020 •

edited

chjackson commented May 16, 2020

mattwarkentin commented May 20, 2020

mattwarkentin commented Mar 30, 2022

predict.flexsurvreg #39

predict.flexsurvreg #39

Comments

chjackson commented May 30, 2017

jrdnmdhl commented May 30, 2017

jwdink commented Jun 8, 2017

topepo commented Aug 16, 2018

chjackson commented Aug 17, 2018

topepo commented Aug 17, 2018

topepo commented Aug 17, 2018

chjackson commented Sep 6, 2018

topepo commented Sep 19, 2018

mattwarkentin commented May 15, 2020 • edited

chjackson commented May 16, 2020

mattwarkentin commented May 20, 2020

mattwarkentin commented Mar 30, 2022

mattwarkentin commented May 15, 2020 •

edited