/
usage_sprtt.Rmd
274 lines (192 loc) · 10 KB
/
usage_sprtt.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
---
title: "How to use the sprtt package"
author: "Meike Steinhilber"
date: "`r Sys.Date()`"
output:
rmarkdown::html_vignette:
toc: true
toc_depth: 4
description: >
This vignette describes the sequential t-test and the usage of the
`sprtt` package.
vignette: >
%\VignetteIndexEntry{How to use the sprtt package}
%\VignetteEncoding{UTF-8}{inputenc}
%\VignetteEngine{knitr::rmarkdown}
bibliography: references.bib
csl: "apa.csl"
---
```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## Note
- The data sets used below are included in the `sprtt` package. Thus, the data sets are available when the package is loaded.
- In the R code sections:
- `# comment`: is a comment
- `function()`: is R code
- `#> results of function()`: is console output
## Overview
The `sprtt` package is a **s**equential **p**robability **r**atio **t**ests **t**oolbox (**sprtt**). This vignette demonstrates the usage of the package using the `seq_ttest()` function and all arguments with short examples. The theoretical background of the sequential *t*-test will not be covered by this vignette.
The `seq_ttest()` function has arguments to specify the requested sequential *t*-test. The table below shows all possible combinations which can be performed with the package.
| | Two-sample test | One-sample test |
|----------------------------|-----------------|-----------------|
| Two-sided | x | x |
| One-sided | x | x |
| Paired (repeated measures) | x | |
Other recommended vignettes cover:
- the [theoretical background](https://meikesteinhilber.github.io/sprtt/articles/sequential_testing.html) and
- an extended [use case](https://meikesteinhilber.github.io/sprtt/articles/use-case.html).
## Installation
Prior to using the `sprtt` package it must be installed and loaded. The latest release version of the package can either be installed from CRAN or in the latest development version from GitHub. More information for the installation can be found [here](https://meikesteinhilber.github.io/sprtt/#installation).
```{r}
# load the package
library(sprtt)
```
## What does the package contain?
The `sprtt` package contains:
- `seq_ttest()` : a function which performs sequential *t*-tests
- `df_income`: a set of data to run the examples given in this vignette
- `df_stress`: a set of data to run the examples given in this vignette
- `df_cancer`: a set of data to run the examples given in this vignette
## How do I use the seq_ttest() function?
The `seq_ttest()` function works similarly to the `t.test()` function from the `stats` package if one is familiar with that already.
Sequential *t*-tests require some specification from the user:
- the variables, which contain the data,
- the error probability `alpha`,
- the `power` (1 - 𝛽),
- the effect size Cohen\`s `d`, which represents the expected effect size or the lowest effect size of interest, and
- optional arguments to further specify the test.
However, in some cases, it is not necessary to specify all arguments because some of them have default values. If these values are the ones required, they can be skipped.
| Argument | Default value | Input option |
|-------------|:--------------|:-------------------------------|
| x | \- | formula or numeric input |
| y | NULL | numeric vector |
| data | NULL | data frame |
| mu | 0 | numeric value |
| d | \- | numeric value |
| alpha | 0.05 | numeric value between 0 and 1 |
| power | 0.95 | numeric value between 0 and 1 |
| alternative | "two.sided" | "two.sided", "greater", "less" |
| paired | FALSE | TRUE or FALSE |
| na.rm | TRUE | TRUE or FALSE |
| verbose | TRUE | TRUE or FALSE |
There are two different ways how the data can be transferred into the function. The `x` argument takes either `formula` or `numeric` input. Which input option is recommended depends on the structure of the data.
### x argument: formula input
The `formula` input is used when both groups are merged in one variable and there is a second variable that indicates group membership. This input option uses the `x` argument and the `data` argument if the variables are stored in a data frame.
#### *Two-sample test*
```{r}
# show data frame --------------------------------------------------------------
head(df_income)
# sequential t-test: data argument ---------------------------------------------
seq_ttest(monthly_income ~ sex, # x argument
data = df_income,
d = 0.2)
```
#### *One-sample test*
To perform a one-sample test, the right side of the formula has to be 1. The `mu` argument is also required, which specifies the mean value that one wants to test against.
```{r}
# show data frame --------------------------------------------------------------
head(df_income)
# sequential t-test: data argument ---------------------------------------------
seq_ttest(monthly_income ~ 1, # x argument
mu = 3000,
d = 0.5,
data = df_income)
```
### x argument: numeric input
The `numeric` input is used when each group has its own variable. The variables can either be put in the global environment directly or be stored in a data frame.
#### *Two-sample test*
If one wants to perform a two-sample test, the `y` argument is required in addition to `x`. If the data are stored in a data frame, the `$` operator is essential to get access to the variables.
```{r}
# show data frame --------------------------------------------------------------
head(df_cancer)
# sequential t-test: $ operator ------------------------------------------------
seq_ttest(df_cancer$treatment_group, # x argument
df_cancer$control_group, # y argument
d = 0.3,
verbose = FALSE)
# sequential t-test: global variables ------------------------------------------
treatment <- df_cancer$treatment_group
control <- df_cancer$control_group
seq_ttest(treatment,
control,
d = 0.3,
verbose = FALSE)
```
#### *One-sample test*
If one wants to perform a one-sample test there is only one group and therefore only one variable. If the data are in a data frame, the `$` operator is essential to get access to the variables. The `mu` argument is additionally required, which specifies the mean which one wants to test against.
```{r}
# show data frame --------------------------------------------------------------
head(df_cancer)
# sequential t-test: $ operator ------------------------------------------------
seq_ttest(df_cancer$treatment_group, # x argument
mu = 3.5,
d = 0.2,
verbose = FALSE)
# sequential t-test: global variables ------------------------------------------
treatment <- df_cancer$treatment_group
seq_ttest(treatment, # x argument
mu = 3.5,
data = df,
d = 0.2,
verbose = FALSE)
```
### Further arguments
#### Paired
The `paired` argument states if the data are paired. To perform a paired sequential *t*-test, `paired` has to be set to `TRUE`.
```{r}
# show data frame --------------------------------------------------------------
head(df_stress)
# sequential t-test: $ operator ------------------------------------------------
seq_ttest(df_stress$baseline_stress, # x argument
df_stress$one_year_stress, # y argument
d = 0.3,
paired = TRUE,
data = df_stress)
```
#### Alternative
The `alternative` argument states in which way the test is to be performed:
- two-sided: `"two.sided"` or
- one-sided: `"less"` or `"greater"`.
```{r}
# show data frame --------------------------------------------------------------
head(df_income)
# sequential t-test: data argument ---------------------------------------------
seq_ttest(monthly_income ~ 1, # x argument
mu = 1000,
d = 0.3,
alternative = "two.sided",
data = df_income)
```
#### Na.rm
The `na.rm` argument defines the handling of missing values. If set to `TRUE` (default value), it will remove all missing values automatically. If this behavior is not wanted, the `na.rm` argument has to be set to `FALSE`. If missing values are discovered, an error is triggered.
#### Verbose
The `verbose` argument defines the extent of the output shown in the console. If set to `TRUE` (default value), the output will be elaborate, if set to `FALSE` the output will be short.
```{r}
# sequential t-test: verbose FALSE ---------------------------------------------
seq_ttest(df_cancer$treatment_group, # x argument
df_cancer$control_group, # y argument
d = 0.3,
verbose = FALSE)
# sequential t-test: verbose TRUE ----------------------------------------------
seq_ttest(df_cancer$treatment_group, # x argument
df_cancer$control_group, # y argument
d = 0.3,
verbose = TRUE)
```
## How do I get access to the results?
The simplest way to get access to the results is to run the `seq_ttest()` function. It will print the results automatically in the console. The verbose argument specifies how much information is wished to be shown.
However, the recommended way is to save the results in an object (e.g "results"). Doing so allows running further calculations with it afterward. It is important to keep in mind that the output object will be an S4 class. Therefore the access operator is the `@` sign or the `[]` brackets.
```{r}
# save the resuts in a object
results <- seq_ttest(df_cancer$treatment_group,
df_cancer$control_group,
d = 0.3)
# access the object with the @ operator
results@decision
# access the object with the [] brackets
results["decision"]
```