---
title: "Continual Reassessment Method"
output:
  rmarkdown::html_vignette:
    df_print: tibble
bibliography: library.bib
vignette: >
  %\VignetteIndexEntry{Continual Reassessment Method}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
The `escalation` package by Kristian Brock.
Documentation is hosted at https://brockk.github.io/escalation/
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7, fig.height = 5
)
options(rmarkdown.html_vignette.check_title = FALSE)
```
# Introduction
The continual reassessment method (CRM) was introduced by @OQuigley1990.
It has proved to be a truly seminal dose-finding design, spurring many revisions, variants and imitations.
# Summary of the CRM Design
Pinning down what is _the_ CRM design is complicated because there have been so many versions over the years.
At its simplest, the CRM is a dose-escalation design that seeks a dose with probability of toxicity closest to some pre-specified target toxicity rate, $p_T$, in a homogeneous patient group.
The hallmark that unifies the CRM variants is the assumption that the probability of toxicity, $p_i$, at the $i$th dose, $x_i$, can be modelled using a smooth mathematical function:
$$ p_i = F(x_i, \theta), $$
where $\theta$ is a general vector of parameters.
Priors are specified on $\theta$, and the dose with posterior estimate of $p_i$ closest to $p_T$ is iteratively recommended to the next patient(s).
Different variants of the CRM use different forms for $F$.
We consider those briefly now.
### Hyperbolic tangent model
@OQuigley1990 first introduced the method using
$$ p_i = \left( \frac{\tanh{(x_i)} + 1}{2} \right)^\beta, $$
with an exponential prior placed on $\beta$.
### Empiric model (aka power model)
$$ p_i = x_i^{\exp(\beta)}, $$
for dose $x_i \in (0, 1)$.
### One-parameter logistic model
$$ \text{logit} p_i = a_0 + \exp{(\beta)} x_i, $$
where $a_0$ is a pre-specified constant, and $x_i \in \mathbb{R}$.
### Two-parameter logistic model
$$ \text{logit} p_i = \alpha + \exp{(\beta)} x_i, $$
for $x_i \in \mathbb{R}$.
Other model considerations include:
### Priors
In each of these models, different distributions may be used for the parameter priors.
### Toxicity skeletons and standardised doses
The $x_i$ dose variables in the models above do not reflect the raw dose quantities given to patients.
For example, if the dose is 10mg, we do not use `x=10`.
Instead, a _skeleton_ containing estimates of the probabilities of toxicity at the doses is identified.
This skeleton could reflect the investigators' prior expectations of toxicities at all of the doses; or it could reflect their expectations for some of the doses with the others interpolated in some plausible way.
The $x_i$ are then calculated so that the model-estimated probabilities of toxicity with the parameters taking their prior mean values match the skeleton.
This will be much clearer with an example.
In a five dose setting, let the skeleton be $\pi = (0.05, 0.1, 0.2, 0.4, 0.7)$.
That is, the investigators believe before the trial commences that the probability of toxicity at the second dose is 10%, and so on.
Let us assume that we are using a one-parameter logistic model with $a_0 = 3$ and $\beta \sim N(0, 1)$.
Then we require that
$$ \text{logit} \pi_i = 3 + e^0 x_i = 3 + x_i, $$
i.e.
$$ x_i = \text{logit} \pi_i - 3. $$
This yields the vector of standardised doses $x = (-5.94, -5.20, -4.39, -3.41, -2.15)$.
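We can verify this calculation with a few lines of base R, using `qlogis` for the logit function and evaluating the model at the prior mean of $\beta$ (zero):

```{r}
# Standardised doses for the one-parameter logistic example:
# x_i = logit(pi_i) - a0, with beta at its prior mean of zero.
skeleton <- c(0.05, 0.1, 0.2, 0.4, 0.7)
a0 <- 3
x <- qlogis(skeleton) - a0
round(x, 2)
```

This reproduces the vector given above.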
Equivalent transformations can be derived for the other model forms.
The $x_i$ are then used as covariates in the model-fitting.
CRM users specify their skeleton, $\pi$, and their parameter priors.
From these, the software calculates the $x_i$.
The actual doses given to patients in SI units do not feature in the model at all.
# Implementation in `escalation`
`escalation` simply aims to give a common interface to dose-selection models to facilitate a grammar for specifying dose-finding trial designs.
Where possible, it delegates the mathematical model-fitting to existing R packages.
There are several R packages that implement CRM models.
The two used in `escalation` are the `dfcrm` package [@Cheung2011; @dfcrm]; and the `trialr` package [@Brock2019; @trialr].
They have different strengths and weaknesses, so each is suited to different scenarios.
We discuss those now.
### dfcrm
[`dfcrm`](https://cran.r-project.org/package=dfcrm) offers:
* the **empiric** model with normal prior on $\beta$;
* the **one-parameter logistic** model with normal prior on $\beta$.
`dfcrm` models are fit in `escalation` using the `get_dfcrm` function.
Examples are given below.
### trialr
[`trialr`](https://cran.r-project.org/package=trialr) offers:
* the **empiric** model with normal prior on $\beta$;
* the **one-parameter logistic** model with normal prior on $\beta$;
* the **one-parameter logistic** model with gamma prior on $\beta$;
* the **two-parameter logistic** model with normal priors on $\alpha$ and $\beta$.
`trialr` models are fit in `escalation` using the `get_trialr_crm` function.
Let us commence by replicating an example from p.21 of @Cheung2011.
They choose the following parameters:
```{r}
skeleton <- c(0.05, 0.12, 0.25, 0.40, 0.55)
target <- 0.25
a0 <- 3
beta_sd <- sqrt(1.34)
```
Let us define a model fitter using the `dfcrm` package:
```{r, message=FALSE}
library(escalation)
model1 <- get_dfcrm(skeleton = skeleton, target = target, model = 'logistic',
                    intcpt = a0, scale = beta_sd)
```
and a fitter using the `trialr` package:
```{r}
model2 <- get_trialr_crm(skeleton = skeleton, target = target, model = 'logistic',
                         a0 = a0, beta_mean = 0, beta_sd = beta_sd)
```
Names for the function parameters `skeleton`, `target`, and `model` are standardised by `escalation` because they are fundamental.
Further parameters (i.e. those in the second lines of each of the above examples) are passed onwards to the model-fitting functions in `dfcrm` and `trialr`.
You can see that some of these parameter names vary between the approaches.
E.g., what `dfcrm` calls the `intcpt`, `trialr` calls `a0`.
Refer to the documentation of the `crm` function in `dfcrm` and `stan_crm` in `trialr` for further information.
We then fit those models to the notional outcomes described in the source text:
```{r}
outcomes <- '3N 5N 5T 3N 4N'
fit1 <- model1 %>% fit(outcomes)
fit2 <- model2 %>% fit(outcomes)
```
The dose recommended by each of the models for the next patient is:
```{r}
fit1 %>% recommended_dose()
fit2 %>% recommended_dose()
```
Thankfully, the models agree.
They advocate staying at dose 4, wary of the toxicity already seen at dose 5.
If we take a summary of each model fit:
```{r}
fit1 %>% summary()
```
```{r}
fit2 %>% summary()
```
We can see that they closely agree on model estimates of the probability of toxicity at each dose.
Note that the median perfectly matches the mean in the `dfcrm` fit because it assumes a normal posterior distribution on $\beta$.
In contrast, the `trialr` class uses `Stan` to fit the model using Hamiltonian Monte Carlo sampling.
The posterior distributions for the probabilities of toxicity are evidently non-normal and positively-skewed because the median estimates are less than the mean estimates.
Let us imagine instead that we want to fit the empiric model.
That simply requires we change the `model` variable and adjust the prior parameters:
```{r}
model3 <- get_dfcrm(skeleton = skeleton, target = target, model = 'empiric',
                    scale = beta_sd)
model4 <- get_trialr_crm(skeleton = skeleton, target = target, model = 'empiric',
                         beta_sd = beta_sd)
```
Fitting each to the same set of outcomes yields:
```{r}
fit3 <- model3 %>% fit(outcomes)
fit4 <- model4 %>% fit(outcomes)
```
```{r}
fit3 %>% summary()
```
```{r}
fit4 %>% summary()
```
In this example, the model estimates are broadly consistent across methodology and model type.
However, this is not the general case.
To illustrate this point, let us examine a two parameter logistic model fit using `trialr` (note: this model is not implemented in `dfcrm`):
```{r}
model5 <- get_trialr_crm(skeleton = skeleton, target = target, model = 'logistic2',
                         alpha_mean = 0, alpha_sd = 2, beta_mean = 0, beta_sd = 1)
fit5 <- model5 %>% fit(outcomes)
fit5 %>% summary()
```
Now the estimate of toxicity at the highest dose is high relative to the other models.
The extra free parameter in the two-parameter model offers more flexibility.
There has been a debate in the literature about one-parameter vs two-parameter models (and possibly more).
It is generally accepted that a single parameter model is too simplistic to accurately estimate $p_i$ over the entire dose range.
However, it will be sufficient to identify the dose closest to $p_T$, and if that is the primary objective of the trial, the simplicity of a one-parameter model may be entirely justified.
The interested reader is directed to @OQuigley1990 and @Neuenschwander2008.
Note that the CRM does not natively implement stopping rules, so these classes on their own will always advocate trial continuance:
```{r}
fit1 %>% continue()
fit5 %>% continue()
```
and identify each dose as admissible:
```{r}
fit1 %>% dose_admissible()
```
This behaviour can be altered by appending classes to advocate stopping for consensus:
```{r}
model6 <- get_trialr_crm(skeleton = skeleton, target = 0.3, model = 'empiric',
                         beta_sd = 1) %>%
  stop_when_n_at_dose(dose = 'recommended', n = 6)
fit6 <- model6 %>% fit('2NNN 3TTT 2NTN')
fit6 %>% continue()
fit6 %>% recommended_dose()
```
Or for stopping under excess toxicity:
```{r}
model7 <- get_trialr_crm(skeleton = skeleton, target = 0.3, model = 'empiric',
                         beta_sd = 1) %>%
  stop_when_too_toxic(dose = 1, tox_threshold = 0.3, confidence = 0.8)
fit7 <- model7 %>% fit('1NTT 1TTN')
fit7 %>% continue()
fit7 %>% recommended_dose()
fit7 %>% dose_admissible()
```
Or both:
```{r}
model8 <- get_trialr_crm(skeleton = skeleton, target = 0.3, model = 'empiric',
                         beta_sd = 1) %>%
  stop_when_n_at_dose(dose = 'recommended', n = 6) %>%
  stop_when_too_toxic(dose = 1, tox_threshold = 0.3, confidence = 0.8)
```
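To see the combined behaviour, we can fit `model8` to the same toxic outcomes used above. With dose 1 looking too toxic, we would expect the design to advocate stopping, though the exact recommendation depends on the fitted model:

```{r}
fit8 <- model8 %>% fit('1NTT 1TTN')
fit8 %>% continue()
fit8 %>% recommended_dose()
```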
For more information, check the package README and the other vignettes.
### dfcrm vs trialr
So which method should you use?
The answer depends on how you plan to use the models.
The `trialr` classes produce posterior samples:
```{r}
fit7 %>%
  prob_tox_samples(tall = TRUE) %>%
  head()
```
and these facilitate flexible visualisation:
```{r, message=FALSE}
library(ggplot2)
library(dplyr)
fit7 %>%
  prob_tox_samples(tall = TRUE) %>%
  mutate(.draw = .draw %>% as.integer()) %>%
  filter(.draw <= 200) %>%
  ggplot(aes(dose, prob_tox)) +
  geom_line(aes(group = .draw), alpha = 0.2)
```
However, MCMC sampling is an expensive computational procedure compared to the numerical integration used in `dfcrm`.
If you envisage fitting lots of models, perhaps in simulations or dose-paths (see below) and favour a model offered by `dfcrm`, we recommend using `get_dfcrm`.
However, if you favour a model only offered by `trialr`, or if you are willing for calculation to be slow in order to get posterior samples, then use `trialr`.
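As a rough illustration of the speed difference, we can time a single fit of the `dfcrm` and `trialr` models defined above (absolute times will vary by machine, but the MCMC-based fit is typically orders of magnitude slower):

```{r, eval=FALSE}
# Timing a single model fit with each package:
system.time(model1 %>% fit('3N 5N 5T 3N 4N'))
system.time(model2 %>% fit('3N 5N 5T 3N 4N'))
```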
## Dose paths
We can use the `get_dose_paths` function in `escalation` to calculate exhaustive model recommendations in response to every possible set of outcomes in future cohorts.
For instance, at the start of a trial using an empiric CRM, we can examine all possible paths a trial might take in the first two cohorts of three patients, starting at dose 2:
```{r, message=FALSE}
skeleton <- c(0.05, 0.12, 0.25, 0.40, 0.55)
target <- 0.25
beta_sd <- 1
model <- get_dfcrm(skeleton = skeleton, target = target, model = 'empiric',
                   scale = beta_sd)
paths <- model %>% get_dose_paths(cohort_sizes = c(3, 3), next_dose = 2)
graph_paths(paths)
```
We see that the design would willingly skip dose 3 if no tox is seen in the first cohort.
This might warrant suppressing dose-skipping by appending a `dont_skip_doses(when_escalating = TRUE)` selector.
Dose-paths can also be run for in-progress trials where some outcomes have been established.
For more information on working with dose-paths, refer to the dose-paths vignette.
## Simulation
We can use the `simulate_trials` function to calculate operating characteristics for a design.
Let us use the example above and tell the design to stop when the lowest dose is too toxic, when 9 patients have already been evaluated at the candidate dose, or when a sample size of $n=24$ is reached:
```{r}
model <- get_dfcrm(skeleton = skeleton, target = target, model = 'empiric',
                   scale = beta_sd) %>%
  stop_when_too_toxic(dose = 1, tox_threshold = target, confidence = 0.8) %>%
  stop_when_n_at_dose(dose = 'recommended', n = 9) %>%
  stop_at_n(n = 24)
```
For the sake of speed, we will run just fifty iterations:
```{r}
num_sims <- 50
```
In real life, however, we would naturally run many thousands of iterations.
Let us investigate under the following true probabilities of toxicity:
```{r}
sc1 <- c(0.25, 0.5, 0.6, 0.7, 0.8)
```
The simulated behaviour is:
```{r}
set.seed(123)
sims <- model %>%
  simulate_trials(num_sims = num_sims, true_prob_tox = sc1, next_dose = 1)
sims
```
We see that the chances of stopping for excess toxicity and recommending no dose are about 1-in-4.
Dose 1 is the clear favourite to be identified.
Interestingly, the `stop_when_n_at_dose` class reduces the expected sample size to 12-13 patients.
Without it:
```{r}
get_dfcrm(skeleton = skeleton, target = target, model = 'empiric',
          scale = beta_sd) %>%
  stop_when_too_toxic(dose = 1, tox_threshold = target, confidence = 0.8) %>%
  stop_at_n(n = 24) %>%
  simulate_trials(num_sims = num_sims, true_prob_tox = sc1, next_dose = 1)
```
the expected sample size is much higher and the chances of erroneously stopping early are also higher.
These phenomena would justify a wider simulation study in a real situation.
For more information on running dose-finding simulations, refer to the simulation vignette.
# References