Fixed tidy method for Cumulative Incidence curves calculated via `survfit` #180

Merged 1 commit on Dec 23, 2016

MarcusWalz (Contributor) commented Dec 20, 2016

Execution stopped when running the cumulative incidence (CI) example in the doc:

> fitCI <- survfit(Surv(stop, status * as.numeric(event), type = "mstate") ~ 1,
+                  data = mgus1, subset = (start == 0))
> td_multi <- tidy(fitCI)
Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 237, 0, 711

Previously, the code reported `1 - CI estimate`. That is not correct, because survival is not the same as incidence. The "censorship" parts of the matrices in the `survfit` object (those where `state == ""`) are now omitted. Ideally, CI and KM would have their own functions in the survival library, because their behavior is quite different. The current code was run and tested by comparing the default plot method for a `survfit` object with a ggplot built from the tidied `survfit`, with and without strata. The example in the documentation now executes as desired:

> fitCI <- survfit(Surv(stop, status * as.numeric(event), type = "mstate") ~ 1,
+                  data = mgus1, subset = (start == 0))
> td_multi <- tidy(fitCI)
> head(td_multi)
  time n.risk n.event n.censor estimate std.error conf.high conf.low state
1    6      0       0        0        0         0         0        0     2
2    7      0       0        0        0         0         0        0     2
3   31      0       0        0        0         0         0        0     2
4   32      0       0        0        0         0         0        0     2
5   39      0       0        0        0         0         0        0     2
6   60      0       0        0        0         0         0        0     2
> tail(td_multi)
     time n.risk n.event n.censor  estimate  std.error conf.high  conf.low state
469 12689      0       0        1 0.6735070 0.03126710 0.7293816 0.6060959     3
470 12931      0       1        0 0.6856106 0.03183376 0.7422024 0.6165956     3
471 13019      0       0        1 0.6856106 0.03183376 0.7422024 0.6165956     3
472 13152      0       1        0 0.7017487 0.03241725 0.7589737 0.6309371     3
473 14111      0       1        0 0.7178868 0.03136419 0.7731231 0.6492025     3
474 14325      0       0        1 0.7178868 0.03136419 0.7731231 0.6492025     3
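
The reshaping described above can be sketched in base R. This is only an illustration under stated assumptions, not the code in this patch: it does not call the survival package, the names `times`, `est`, `est_kept`, and `long` are hypothetical, and the toy matrix merely mimics a per-state matrix with one column per state, where the `""` column stands in for the censoring state.

```R
# Toy stand-in for a per-state estimate matrix (hypothetical values):
# one row per time point, one column per state.
# The "" column plays the role of the censoring state.
times <- c(6, 7, 31)
est <- matrix(
  c(1.00, 0.00, 0.00,
    0.98, 0.01, 0.01,
    0.95, 0.02, 0.03),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("", "2", "3"))
)

# Drop the censoring column; drop = FALSE keeps the matrix shape
# even if only one state remains.
keep <- colnames(est) != ""
est_kept <- est[, keep, drop = FALSE]

# Stack the remaining columns into long format. as.vector() reads a
# matrix column by column, so time and state must repeat to match.
long <- data.frame(
  time = rep(times, times = ncol(est_kept)),
  estimate = as.vector(est_kept),
  state = rep(colnames(est_kept), each = nrow(est_kept)),
  stringsAsFactors = FALSE
)
long
```

With two retained states and three time points, `long` has six rows, stacked state by state in the same pattern as the `state` column in the output above.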

MarcusWalz (Contributor, Author) commented

It seems that many changes are being made to multi-state `survfit` in the survival package, and backwards compatibility is likely broken. The previous tidy code was probably broken by version 2.40 of survival, which was released in late October, and my patch probably won't work with earlier versions. Travis CI fails because it is using version 2.39.
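
One way to make such a version dependency explicit is a small guard. The sketch below is an assumption for illustration, not something this patch does: `survival_at_least_2_40` is a hypothetical helper, and the 2.40 cutoff is taken from the comment above rather than verified against the survival changelog.

```R
# Hypothetical helper: TRUE when a survival version is 2.40 or later.
# Accepts a version string or a package_version object.
survival_at_least_2_40 <- function(v) {
  as.package_version(v) >= as.package_version("2.40")
}

survival_at_least_2_40("2.39-5")
survival_at_least_2_40("2.40-1")

# With survival installed, the installed version could be checked via:
# survival_at_least_2_40(utils::packageVersion("survival"))
```

A test or code path that relies on the newer multi-state output could be skipped when the helper returns `FALSE`.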

dgrtwo merged commit 9be8dc8 into tidymodels:master on Dec 23, 2016

dgrtwo (Collaborator) commented Dec 23, 2016

thanks!

github-actions bot commented

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators on Mar 12, 2021