Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create formula interface for RMST test #216

Merged
merged 7 commits into from
Mar 19, 2024
Merged

Conversation

jdblischak
Copy link
Collaborator

@jdblischak jdblischak commented Mar 14, 2024

Closes #189

xref: #188 (comment)

I added a simple formula interface, rmst.formula, and converted rmst() to an S3 generic function. I say simple because it only extracts the variable names from the formula, and then uses these to call rmst.default(). Converting to the S3 generic required changing the first argument name to something that could stand for a data frame or a formula. I chose x since this is a common S3 pattern (eg see ?aggregate). The main downside I see of making rmst() an S3 generic is that its first argument is no longer data like all the other test functions like wlr() and maxcombo(). UPDATE: After feedback, I instead added an additional argument formula = NULL to the existing rmst() function

Please give it a try and let me know how it can be improved.

Copy link
Collaborator

@LittleBeannie LittleBeannie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello John, I am thrilled to see the remarkable progress that has been achieved! I only have one minor comment to add. This topic would make an excellent presentation for Friday's meeting, where we can gather input and suggestions regarding the formula choice between month ~ event + trt or Surv(month, event, trt).

Thank you for your diligent efforts in moving this forward. It's truly outstanding work. I am fine with merging this pull request first, considering the numerous changes we already have in place.

R/rmst.R Outdated
#'
#' # Formula interface
#' rmst(
#' month ~ evntd + trt,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concerns regarding whether this formula makes statistical sense. Typically, the ~ symbol is used when fitting a linear model, as in y ~ x1 + x2. However, in the context of rmst(), where the object is a survival object, I am wondering if something like a Surv(month, event, trt) object would be more appropriate and meaningful.

Reference: https://www.rdocumentation.org/packages/survival/versions/2.11-4/topics/Surv

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That example is to demonstrate that only the order matters. month ~ evntd + trt is equivalent to Surv(month, event, trt). I'm open to updating the documentation in order to make this more clear to end users. I did my best to explain in the Details section that the formula syntax chosen is purely to indicate intent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I attempted to address your feedback in 3ccb493

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, John!

@nanxstats
Copy link
Collaborator

This is great start. Just a few design and style comments that can be easily addressed once the decision is made:

  1. I think data should be the first argument, Hadley also has a similar opinion.
  2. Would it make sense to name the formula argument as formula instead of x, to be obvious, and to be consistent with other model fitting packages?
  3. Make the argument only accept formula, instead of creating a dependency between arguments.
  4. If we enforce a single, strict formula syntax that is consistent with the conventions, I think it would encourage people to write correct code. For example, Surv(time, event) ~ treatment.
    Although we can still interpret it not literally, by only using the variable names, like your current implementation.
    tidymodels/censored also has thoughts on related issues.

@jdblischak
Copy link
Collaborator Author

jdblischak commented Mar 15, 2024

  1. I think data should be the first argument, Hadley also has a similar opinion.
  2. Would it make sense to name the formula argument as formula instead of x, to be obvious, and to be consistent with other model fitting packages?
  3. Make the argument only accept formula, instead of creating a dependency between arguments.

These are all objections to making rmst() an S3 generic. The alternative would be to have two separate functions. Your bullet points describe essentially what I had in the first commit of this PR, 0bbdaa4, prior to converting to an S3 generic

  1. If we enforce a single, strict formula syntax that is consistent with the conventions, I think it would encourage people to write correct code. For example, Surv(time, event) ~ treatment.
    Although we can still interpret it not literally, by only using the variable names, like your current implementation.
    tidymodels/censored also has thoughts on related issues.

I worry about the implementation. Dealing with formulas/language objects is difficult to do robustly. I think the end users need to know that Surv() is not executed. For example, what if a user used the named arguments and put them out of order, eg Surv(event = x, time2 = y, time = z). This would be expected to work, but would fail in our setup.

Advantages:

* Less complex
* Able to use more informative arg names like `data` and `formula`
* Still easily pipe `data` as first arg even when using formula arg
Copy link
Collaborator

@LittleBeannie LittleBeannie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me, thanks!

Copy link
Collaborator

@nanxstats nanxstats left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great! Thanks for the update that simplifies things 💯

@nanxstats nanxstats merged commit 2c015f6 into Merck:main Mar 19, 2024
7 checks passed
@jdblischak jdblischak deleted the rmst-formula branch March 19, 2024 02:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create formula interface for RMST test
3 participants