---
title: Introduction to bootstrapping with (a)rcv.glmnet
output:
    rmarkdown::html_vignette:
        toc_float: true
vignette: >
    %\VignetteIndexEntry{Introduction to bootstrapping (a)rcv.glmnet}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
bibliography:
    - bibliography.bib
    - rpackages.bib
---

```{r setup, include = FALSE}
Sys.setenv(LANGUAGE = "en")
library("ameld")
knitr::write_bib("glmnet", "rpackages.bib")
```
**Authors**: `r packageDescription("ameld")[["Author"]] `<br />
**Last modified:** `r file.info("boot.Rmd")$mtime`<br />
**Compiled**: `r date()`

# Introduction

The `ameld` R package extends `glmnet::cv.glmnet` [@R-glmnet; @glmnet2010].
It provides repeated cross-validation (`rcv.glmnet`) and
repeated cross-validation that tunes *alpha* and *lambda* simultaneously
(`arcv.glmnet`). Additionally, it provides a `bootstrap` function that can
use either of these functions and supports survival data as described in @harrell1996.
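
To illustrate the general idea of repeated cross-validation, the following minimal sketch averages the cross-validation error of several `glmnet::cv.glmnet` runs over a shared lambda grid. It is not the implementation used by `rcv.glmnet`. The simulated data and the names `x`, `y`, `lambda`, and `nrep` are assumptions made for this example only.

```r
library("glmnet")

set.seed(1)
n <- 100
p <- 10
# simulated data (illustrative only): two informative predictors
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- x[, 1] - 0.5 * x[, 2] + rnorm(n)

# fixed, decreasing lambda grid so that all repetitions are comparable
lambda <- exp(seq(log(1), log(0.01), length.out = 30))

# repeat the cross-validation; each run draws new random folds
nrep <- 5
fits <- lapply(
    seq_len(nrep),
    function(i) cv.glmnet(x, y, lambda = lambda, nfolds = 5)
)

# one column of cross-validation errors per repetition
cvm <- sapply(fits, function(f) f$cvm)

# average over the repetitions and pick the lambda with the lowest mean error
avg_cvm <- rowMeans(cvm)
best_lambda <- fits[[1]]$lambda[which.min(avg_cvm)]
```

Averaging over repetitions reduces the dependence of the selected lambda on a single random fold assignment.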

# Dataset

We use the `eldd` dataset provided by `ameld` (see `?eldd` for details) and
standardize its laboratory values using the `zlog` method [@hoffmann2017].

```{r dataset}
library("ameld")
library("zlog")
data(eldd)
data(eldr)
# transform reference data.frame for zlog
r <- eldr[c("Code", "AgeDays", "Sex", "LowerLimit", "UpperLimit")]
names(r) <- c("param", "age", "sex", "lower", "upper")
r$age <- r$age / 365.25
r <- set_missing_limits(r)
## we just want to standardize laboratory values
cn <- colnames(eldd)
cnlabs <- cn[grepl("_[SCEFQ1]$", cn)]
zeldd <- eldd
zeldd[c("Age", "Sex", cnlabs)] <- zlog_df(eldd[, c("Age", "Sex", cnlabs)], r)
zeldd[c("Age", "Sex", cnlabs)] <- impute_df(zeldd[c("Age", "Sex", cnlabs)], r)
zeldd <- na.omit(zeldd)
```
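
For readers unfamiliar with the zlog transformation, the following sketch computes it by hand in base R, assuming the definition in @hoffmann2017: the log-transformed value is centred on the midpoint of the log reference limits and scaled so that the limits map to the 2.5% and 97.5% quantiles of the standard normal distribution. The helper `zlog_manual` and the reference interval are hypothetical and only serve as an illustration. The vignette itself relies on `zlog_df`.

```r
# hypothetical helper, not part of the zlog package
zlog_manual <- function(x, lower, upper) {
    center <- (log(lower) + log(upper)) / 2
    scale <- (log(upper) - log(lower)) / (2 * qnorm(0.975))
    (log(x) - center) / scale
}

# hypothetical analyte with a reference interval of 10 to 50 units
z <- zlog_manual(c(10, sqrt(10 * 50), 50), lower = 10, upper = 50)
round(z, 2)
# -1.96  0.00  1.96
```

Values inside the reference interval therefore fall roughly between -1.96 and 1.96, regardless of the original unit of the laboratory value.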

# Bootstrapping

Next we apply the bootstrap. In general, the number of bootstrap samples
`nboot` should be at least 100.
We use a much smaller number here to keep the runtime low.

```{r boot}
library("future")
srv <- Surv(zeldd$DaysAtRisk, zeldd$Deceased)
zeldd$DaysAtRisk <- zeldd$Deceased <- NULL
x <- data.matrix(zeldd)
bt <- bootstrap(
    x, srv,
    fun = rcv.glmnet,
    family = "cox",
    nboot = 3,
    nfolds = 3,
    nrep = 2
)
```

We can plot an optimism-corrected calibration curve.

```{r plotcal}
plot(bt, what = "calibration")
```
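
The optimism correction follows the bootstrap validation idea described in @harrell1996: a model is refitted on each bootstrap sample, its performance on that sample is compared with its performance on the original data, and the average difference (the optimism) is subtracted from the apparent performance. The following base R sketch shows this scheme for a linear model with $R^2$ as performance measure. It is a simplified illustration on simulated data, not the code used by `bootstrap`. All names (`d`, `rsq`, `optimism`) are assumptions made for this example.

```r
set.seed(42)
n <- 80
# simulated data (illustrative only): one informative, two noise predictors
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- d$x1 + rnorm(n)

# performance measure: squared correlation of prediction and outcome
rsq <- function(fit, data) cor(predict(fit, newdata = data), data$y)^2

# apparent performance: model fitted and evaluated on the same data
full <- lm(y ~ ., data = d)
apparent <- rsq(full, d)

# optimism per bootstrap sample:
# performance on the bootstrap sample minus performance on the original data
nboot <- 200
optimism <- vapply(seq_len(nboot), function(b) {
    boot <- d[sample.int(n, replace = TRUE), ]
    fit <- lm(y ~ ., data = boot)
    rsq(fit, boot) - rsq(fit, d)
}, numeric(1))

# optimism-corrected performance
corrected <- apparent - mean(optimism)
```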

Additionally, we can inspect which variables are selected in each bootstrap step.

```{r plotsel}
plot(bt, what = "selected")
```

## Automatically select the best alpha in each bootstrap step

It is possible to use `arcv.glmnet` to automatically select the best alpha in
each bootstrap step.

```{r eval = FALSE}
# wrapper that tunes alpha via arcv.glmnet and returns the selected model
selarcv <- function(...) {
    dots <- list(...)
    a <- arcv.glmnet(...)
    # index of the model (alpha) with the minimal error at `s`, given `maxnnzero`
    i <- which.min.error(a, s = dots$s, maxnnzero = dots$maxnnzero)
    a$models[[i]]
}
bt <- bootstrap(
    x, srv,
    fun = selarcv,
    family = "cox",
    alpha = seq(0, 1, length.out = 11)^3,
    s = "lambda.1se",
    maxnnzero = 9,
    nboot = 10L, nfolds = 3, nrep = 5,
    m = 50, times = 90
)
```
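
The `alpha` grid in the call above is the cube of an equally spaced sequence. As the following short example shows, this places most candidate values close to 0 (ridge-like penalties) and only a few close to 1 (lasso).

```r
alpha <- seq(0, 1, length.out = 11)^3
round(alpha, 3)
# 0.000 0.001 0.008 0.027 0.064 0.125 0.216 0.343 0.512 0.729 1.000

# number of candidate values below 0.5
sum(alpha < 0.5)
# 8
```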
# Acknowledgment
This work is part of the [AMPEL](https://ampel.care/en/)
(Analysis and Reporting System for the Improvement of Patient Safety through
Real-Time Integration of Laboratory Findings) project.
This measure is co-funded with tax revenues based on the budget adopted by
the members of the Saxon State Parliament.
# Session Information
```{r sessionInfo}
sessionInfo()
```
# References