The cdnbcr package provides tools for modeling time-to-event data in
the presence of a cure fraction, considering correlated competing risks.
The Correlated Destructive Negative Binomial Cure Rate (CDNBCR) model
describes the time until an event occurs while accounting for a cure
fraction, assuming multiple initial competing causes with potential
correlation. The model is destructive in the sense that some initial
competing causes contributing to the occurrence of the event can be
eliminated. Thus, the model is particularly useful in applications where
interventions (e.g., treatments) may eliminate or inactivate some
competing causes. Among the package’s contributions, we implemented an
Expectation-Maximization (EM) algorithm for maximum likelihood
estimation of the model parameters and we provide diagnostic plots based
on Cox-Snell residuals.
You can install the current development version of cdnbcr from
GitHub with:
devtools::install_github("rdmatheus/cdnbcr")To run the above command, it is necessary that the devtools package is
previously installed on R. If not, install it using the following
command:
install.packages("devtools")After installing the devtools package, if you are using Windows, install
the most current
RTools program.
Finally, run the command devtools::install_github("rdmatheus/cdnbcr"),
and then the package will be installed on your computer.
The cdnbcr package provides functions for estimation and inference for
the parameters as well as plot tools for assessing the goodness-of-fit
of the model. Below is an example of some functions usage and available
methods.
## Loading packages
library(cdnbcr)
library(survival)The e1690 object contains observations of a well-known dataset from a
Phase III cutaneous melanoma clinical trial. The study was conducted by
the Eastern Cooperative Oncology Group (ECOG) and aimed to evaluate the
effectiveness of the Interferon alpha-2b chemotherapy in preventing
recurrence after surgery. The observations include 417 patients from
1991 to 1995, with follow-up until 1998.
The main function of the package is the cdnbcr() function, which
implements the EM algorithm for the estimation of the CDNBCR model
parameters.
## Correlated destructive fit
fit <- cdnbcr(formula = Surv(time, status) ~ nodeII + nodeIII + nodeIV - 1 |
sex + trt + thickness + age, data = e1690)
class(fit)
#> [1] "cdnbcr"It returns an object of the "cdnbcr" class, which has common usage
methods (e.g., print, summary, plot, among others). The methods
currently available for the "cdnbcr" class are presented below.
methods(class = "cdnbcr")
#> [1] AIC coef logLik model.frame model.matrix
#> [6] plot predict print residuals summary
#> [11] vcov
#> see '?methods' for accessing help and source code## print
fit
#> Call:
#> cdnbcr(formula = Surv(time, status) ~ nodeII + nodeIII + nodeIV -
#> 1 | sex + trt + thickness + age, data = e1690)
#>
#> Regression model for theta (log link):
#> nodeII nodeIII nodeIV
#> 1.100414 1.592870 2.418928
#>
#> Regression model for p (logit link):
#> (Intercept) sexFemale trtChemotherapy thickness age
#> -0.8283426 -1.9336639 0.5597650 0.4299035 0.0533924
#>
#> Dispersion, dependence, and baseline parameters:
#> phi alpha mu sigma
#> 2.598226 0.9781685 3.394487 2.287898
#>
#> ---
#> Log-likelihood: -502.7273
#> AIC: 1029.455 BIC: 1077.852
#> Number of EM iterations:
## summary
summary(fit)
#> Call:
#> cdnbcr(formula = Surv(time, status) ~ nodeII + nodeIII + nodeIV -
#> 1 | sex + trt + thickness + age, data = e1690)
#>
#> Summary for residuals:
#> Min 1Q Median 3Q Max
#> 0.0000000 0.0000000 0.3721069 0.9213613 1.2807131
#>
#> Regression model for theta (log link):
#> Estimate Std. error t value Pr(>|t|)
#> nodeII 1.10041 0.44635 2.4654 0.013687 *
#> nodeIII 1.59287 0.54042 2.9475 0.003204 **
#> nodeIV 2.41893 0.51421 4.7042 2.5e-06 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Regression model for p (logit link):
#> Estimate Std. error t value Pr(>|t|)
#> (Intercept) -0.828343 1.679746 -0.4931 0.6219
#> sexFemale -1.933664 1.299001 -1.4886 0.1366
#> trtChemotherapy 0.559765 0.788898 0.7096 0.4780
#> thickness 0.429904 0.363671 1.1821 0.2372
#> age 0.053392 0.033935 1.5734 0.1156
#>
#> Dispersion, dependence, and baseline parameters:
#> phi alpha mu sigma
#> Estimate 2.5982261 0.9781685 3.3944872 2.2878976
#> Std. error 0.7441392 0.0594861 0.3503527 0.2204368
#>
#> ---
#> Log-lik value: -502.7273
#> AIC: 1029.455 and BIC: 1077.852
#> EM iterations:
## plot
par(mfrow = c(1, 2))
plot(fit, ask = FALSE)par(mfrow = c(1, 1))By default, the predict() function returns the fitted survival time or
the fitted cure rate for the data that has been observed. If new
individuals are added to the sample, the survival function and cure rate
can also be estimated with the predict() function. See the example
below
## Suppose we are interested in obtaining predictions for the following individuals:
newdata <- data.frame(trt = factor(c("Control", "Control", "Chemotherapy", "Chemotherapy")),
age = median(e1690$age),
sex = factor(c("Male", "Female", "Male", "Female")),
thickness = median(e1690$thickness),
nodeII = c(0, 0, 0, 0),
nodeIII = c(0, 0, 0, 0),
nodeIV = c(1, 1, 1, 1))
newdata
#> trt age sex thickness nodeII nodeIII nodeIV
#> 1 Control 47.1075 Male 3.1 0 0 1
#> 2 Control 47.1075 Female 3.1 0 0 1
#> 3 Chemotherapy 47.1075 Male 3.1 0 0 1
#> 4 Chemotherapy 47.1075 Female 3.1 0 0 1
## Fitted survival curves
pred <- predict(fit, newdata)
plot(pred[1, ], type = "l", ylim = c(0, 1), xlab = "Time", ylab = "Survival")
lines(pred[2, ], col = 2, lty = 2)
lines(pred[3, ], col = 3, lty = 3)
lines(pred[4, ], col = 4, lty = 4)
legend("topright", legend = c("trt: Control, sex: Male",
"trt: Control, sex: Female",
"trt: Chemotherapy, sex: Male",
"trt: Chemotherapy, sex: Female"),
col = 1:4, lty = 1:4)## Predicted cure rates
predict(fit, newdata, type = "cure")
#> [1] 0.2957616 0.4190912 0.2847148 0.3633011
## Predicted expected number of initial competing causes
predict(fit, newdata, type = "theta")
#> [1] 11.23381 11.23381 11.23381 11.23381
## Predicted activation probability of an initial competing cause
predict(fit, newdata, type = "p")
#> [1] 0.9534491 0.7476045 0.9728620 0.8383012The use of the EM algorithm also makes it possible to extract latent
variables underlying the definition of the destructive model, such as
the number of initial competing causes for the occurrence of the event
(called “M”) and the remaining causes that were not destroyed (called
“D”). These variables are made available in the "cdnbcr" object. See
below.
M <- fit$latent$M
plot(M, ylab = "Initial competing causes", pch = 16, cex = 0.8)D <- fit$latent$D
plot(D, ylab = "Remaining competing causes", pch = 16, cex = 0.8)The parameter cdnbcr() function with the argument alpha = FALSE, which sets the
parameter
fit0 <- cdnbcr(formula = Surv(time, status) ~ nodeII + nodeIII + nodeIV - 1 |
sex + trt + thickness + age, alpha = FALSE, data = e1690)
summary(fit0)
#> Call:
#> cdnbcr(formula = Surv(time, status) ~ nodeII + nodeIII + nodeIV -
#> 1 | sex + trt + thickness + age, data = e1690, alpha = FALSE)
#>
#> Summary for residuals:
#> Min 1Q Median 3Q Max
#> 0.0000000 0.0000000 0.3738455 0.9595133 1.2372406
#>
#> Regression model for theta (log link):
#> Estimate Std. error t value Pr(>|t|)
#> nodeII 0.98674 0.43676 2.2592 0.023869 *
#> nodeIII 1.50469 0.54006 2.7861 0.005334 **
#> nodeIV 2.33377 0.54286 4.2990 1.72e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Regression model for p (logit link):
#> Estimate Std. error t value Pr(>|t|)
#> (Intercept) -1.482456 1.389650 -1.0668 0.2861
#> sexFemale -0.675326 0.856211 -0.7887 0.4303
#> trtChemotherapy -0.525835 1.004527 -0.5235 0.6007
#> thickness 0.187150 0.208348 0.8983 0.3690
#> age 0.045449 0.037213 1.2213 0.2220
#>
#> Dispersion, dependence, and baseline parameters:
#> phi mu sigma
#> Estimate 2.565274 3.1279706 2.1749280
#> Std. error 1.008614 0.3881323 0.2319883
#>
#> ---
#> Log-lik value: -505.0737
#> AIC: 1032.147 and BIC: 1076.511
#> EM iterations:


