---
title: "procs"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{procs}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
The **procs** package contains functions that replicate procedures from
SAS® software. The intention of the package is to ease R
adoption by providing SAS® programmers a familiar conceptual framework and
functions that produce nearly identical output. Along the way, the
functions in the **procs** package also provide much nicer output
than many R statistical functions.
## Key Functions
The package includes the following functions:
* `proc_freq()`: A function to simulate the FREQ procedure.
* `proc_means()`: A function to simulate the MEANS or SUMMARY procedure.
* `proc_ttest()`: A function to perform T-Tests and calculate confidence limits.
* `proc_reg()`: A function to perform a regression.
* `proc_transpose()`: A function to pivot data similar in syntax to the
TRANSPOSE procedure.
* `proc_sort()`: A function to sort and dedupe datasets.
* `proc_print()`: A quick-print function to send procedure results to the viewer
or a report.
## How to Use
### Frequency Statistics
The `proc_freq()` function generates frequency statistics in a manner
similar to SAS® PROC FREQ. You can request one-way and two-way frequency
tables using the `tables` parameter. Frequencies may be weighted using
the `weight` parameter: pass the name of the variable that contains
the weights. Additional options control the output datasets and the
displayed report.
Note that the `options("procs.print" = FALSE)` global option
has been added to these examples to allow the **procs** package to pass
CRAN checks. When running sample code yourself,
remove this option or set it to TRUE.
```{r eval=FALSE, echo=TRUE}
library(procs)
# Turn off printing for CRAN checks
options("procs.print" = FALSE)
# Prepare sample data
dt <- as.data.frame(HairEyeColor, stringsAsFactors = FALSE)
# Assign labels
labels(dt) <- list(Hair = "Hair Color",
Eye = "Eye Color")
# Produce frequency statistics
res <- proc_freq(dt, tables = v(Hair, Eye, Hair * Eye),
weight = Freq,
output = report,
options = chisq,
titles = "Hair and Eye Frequency Statistics")
```
When printing is allowed, the `proc_freq()` function will display the
following report in the viewer:
<img src="../man/images/freq1.png" alt = "Proc freq example"/>
The function call requested output datasets with `output = report`. When
there is more than one output dataset, they are returned in a list.
The output dataset list can be seen below:
```{r eval=FALSE, echo = TRUE}
# View output datasets
res
# $Hair
# CAT N CNT PCT CUMSUM CUMPCT
# 1 Black 592 108 18.24324 108 18.24324
# 2 Blond 592 127 21.45270 235 39.69595
# 3 Brown 592 286 48.31081 521 88.00676
# 4 Red 592 71 11.99324 592 100.00000
#
# $Eye
# CAT N CNT PCT CUMSUM CUMPCT
# 1 Blue 592 215 36.31757 215 36.31757
# 2 Brown 592 220 37.16216 435 73.47973
# 3 Green 592 64 10.81081 499 84.29054
# 4 Hazel 592 93 15.70946 592 100.00000
#
# $`Hair * Eye`
# CAT Statistic Blue Brown Green Hazel Total
# 1 Black Frequency 20.000000 68.000000 5.0000000 15.000000 108.00000
# 2 Black Percent 3.378378 11.486486 0.8445946 2.533784 18.24324
# 3 Black Row Pct 18.518519 62.962963 4.6296296 13.888889 NA
# 4 Black Col Pct 9.302326 30.909091 7.8125000 16.129032 NA
# 5 Blond Frequency 94.000000 7.000000 16.0000000 10.000000 127.00000
# 6 Blond Percent 15.878378 1.182432 2.7027027 1.689189 21.45270
# 7 Blond Row Pct 74.015748 5.511811 12.5984252 7.874016 NA
# 8 Blond Col Pct 43.720930 3.181818 25.0000000 10.752688 NA
# 9 Brown Frequency 84.000000 119.000000 29.0000000 54.000000 286.00000
# 10 Brown Percent 14.189189 20.101351 4.8986486 9.121622 48.31081
# 11 Brown Row Pct 29.370629 41.608392 10.1398601 18.881119 NA
# 12 Brown Col Pct 39.069767 54.090909 45.3125000 58.064516 NA
# 13 Red Frequency 17.000000 26.000000 14.0000000 14.000000 71.00000
# 14 Red Percent 2.871622 4.391892 2.3648649 2.364865 11.99324
# 15 Red Row Pct 23.943662 36.619718 19.7183099 19.718310 NA
# 16 Red Col Pct 7.906977 11.818182 21.8750000 15.053763 NA
# 17 Total Frequency 215.000000 220.000000 64.0000000 93.000000 592.00000
# 18 Total Percent 36.317568 37.162162 10.8108108 15.709459 100.00000
#
# $Chisq
# STAT DF VAL PROB
# 1 Chi-Square 9 138.2898 2.325287e-25
# 2 Continuity Adj. Chi-Square 9 138.2898 2.325287e-25
```
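As a rough cross-check, the weighted counts, percentages, and chi-square statistic shown above can be reproduced with base R alone. This sketch is not how `proc_freq()` is implemented. It only assumes that `weight = Freq` means each row contributes `Freq` observations, which is what `xtabs()` computes. The object names (`hair_cnt`, `hair_eye`, and so on) are illustrative.

```{r eval=FALSE, echo=TRUE}
# Same sample data as in the proc_freq() example
dt <- as.data.frame(HairEyeColor, stringsAsFactors = FALSE)

# One-way weighted counts and percentages for Hair
hair_cnt <- xtabs(Freq ~ Hair, data = dt)
hair_pct <- 100 * prop.table(hair_cnt)

# Two-way weighted table with row and column percentages
hair_eye <- xtabs(Freq ~ Hair + Eye, data = dt)
row_pct <- 100 * prop.table(hair_eye, margin = 1)
col_pct <- 100 * prop.table(hair_eye, margin = 2)

# Pearson chi-square test of independence on the two-way table
chi <- chisq.test(hair_eye)

# Inspect the pieces that correspond to the CNT, PCT, Row Pct,
# and Chi-Square values in the output datasets
hair_cnt
round(hair_pct, 5)
round(row_pct["Black", ], 5)
chi$statistic
chi$parameter
```

The counts and percentages should line up with the `CNT`, `PCT`, `Row Pct`, and `Col Pct` values, and the test statistic and degrees of freedom with the `VAL` and `DF` columns of the `Chisq` dataset.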
### Summary Statistics
The `proc_means()` function calculates summary statistics, similar to
SAS® PROC MEANS. The variable or variables to generate
statistics for are passed on the `var` parameter. The `class` parameter
tells the function to group results by the indicated variable. The
`by` parameter subsets the data according to the distinct values of the
by variable. Note that any class groupings are nested within the by groups.
```{r eval=FALSE, echo=TRUE}
# Perform calculations
res2 <- proc_means(dt, var = Freq,
class = Hair,
by = Sex,
options = c(maxdec = 4),
titles = "Hair and Eye Summary Statistics by Sex")
```
The following is sent to the viewer:
<img src="../man/images/means1.png" alt="Proc means example"/>
And here is the output dataset. Observe that the output dataset
is not identical to the displayed report. The output dataset has
been optimized for data manipulation, while the displayed
report has been optimized for viewing:
```{r eval=FALSE, echo = TRUE}
# View the summary statistics
res2
# BY CLASS TYPE FREQ VAR N MEAN STD MIN MAX
# 1 Female <NA> 0 16 Freq 16 19.5625 20.713824 2 66
# 2 Female Black 1 4 Freq 4 13.0000 15.599145 2 36
# 3 Female Blond 1 4 Freq 4 20.2500 29.216149 4 64
# 4 Female Brown 1 4 Freq 4 35.7500 21.884165 14 66
# 5 Female Red 1 4 Freq 4 9.2500 4.500000 7 16
# 6 Male <NA> 0 16 Freq 16 17.4375 16.008201 3 53
# 7 Male Black 1 4 Freq 4 14.0000 12.516656 3 32
# 8 Male Blond 1 4 Freq 4 11.5000 12.503333 3 30
# 9 Male Brown 1 4 Freq 4 35.7500 18.679311 15 53
# 10 Male Red 1 4 Freq 4 8.5000 1.732051 7 10
```
Notice that the output data contains breakouts by class and summaries
by group. The summaries by group are identified by rows where TYPE = 0.
The breakouts by class are TYPE = 1. In other words, rows 1 and 6
provide summary statistics for the by groups Female and Male, while the other
rows provide statistics for each class category.
Also note that both `proc_freq()`
and `proc_means()` output datasets follow the convention of naming
columns according to the statistic or parameter they represent. This convention
is somewhat different from SAS®, but makes it easier to manipulate
the output data.
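To make the TYPE = 0 and TYPE = 1 rows more concrete, the same statistics can be approximated with base R `aggregate()`. This is only a sketch for comparison, not the package's implementation. The helper `stats()` and the object names `by_class` and `by_only` are made up for this example.

```{r eval=FALSE, echo=TRUE}
# Same sample data as in the proc_means() example
dt <- as.data.frame(HairEyeColor, stringsAsFactors = FALSE)

# Statistics to compute for each group (illustrative helper)
stats <- function(x) {
  c(N = length(x), MEAN = mean(x), STD = sd(x), MIN = min(x), MAX = max(x))
}

# Class-level rows (comparable to TYPE = 1): one row per Sex and Hair
by_class <- aggregate(Freq ~ Sex + Hair, data = dt, FUN = stats)

# By-group summary rows (comparable to TYPE = 0): one row per Sex
by_only <- aggregate(Freq ~ Sex, data = dt, FUN = stats)

# aggregate() stores the statistics in a matrix column; flatten it
# into ordinary columns named Freq.N, Freq.MEAN, and so on
by_class <- do.call(data.frame, by_class)
by_only <- do.call(data.frame, by_only)

# Order the class rows the same way as the output dataset
by_class[order(by_class$Sex, by_class$Hair), ]
by_only
```

The two data frames together cover the same rows as the `proc_means()` output dataset, which returns them in a single table and distinguishes them with the `TYPE` column.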
### Hypothesis Testing
Similar to PROC TTEST in SAS®, the `proc_ttest()` function performs
hypothesis testing and calculates confidence limits. The function can operate
on a single variable, two independent samples, or paired observations.
The paired observation test can be used in cases when you have samples on the
same measure at different time points, or before and after a procedure. This example
attempts to determine whether there is a significant change in systolic blood
pressure before and after a stimulus. It uses the following data:
```{r eval=FALSE, echo=TRUE}
# Create sample data
pressure <- read.table(header = TRUE, text = '
SBPbefore SBPafter
120 128
124 131
130 131
118 127
140 132
128 125
140 141
135 137
126 118
130 132
126 129
127 135
')
```
Now let's determine if there is a significant change before and after the stimulus.
For this type of T-Test, we will use the `paired` parameter. To better
understand the data, we will also turn on the default plots:
```{r eval=FALSE, echo=TRUE}
# Perform T-Test
res3 <- proc_ttest(pressure, paired = "SBPbefore * SBPafter",
plots = TRUE)
# View results
res3
# $Statistics
# VAR N MEAN STD STDERR MIN MAX
# 1 SBPbefore-SBPafter 12 -1.833333 5.828353 1.682501 -9 8
#
# $ConfLimits
# VAR MEAN LCLM UCLM STD
# 1 SBPbefore-SBPafter -1.833333 -5.536492 1.869825 5.828353
#
# $TTests
# VAR DF T PROBT
# 1 SBPbefore-SBPafter 11 -1.089648 0.2991635
```
If printing is enabled, the following is sent to the viewer:
<img src="../man/images/ttesta.png" alt="Proc ttest example chart 1"/>
<img src="../man/images/ttestb.png" alt="Proc ttest example chart 2"/>
<img src="../man/images/ttestc.png" alt="Proc ttest example chart 3"/>
<img src="../man/images/ttestd.png" alt="Proc ttest example chart 4"/>
The above test shows that although there was a change before and after the stimulus,
that change was not statistically significant (PROBT is approximately 0.30).
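The paired result can be cross-checked with base R `t.test()`. This sketch rebuilds the same `pressure` data as a data frame and compares `SBPbefore` against `SBPafter`. It is offered as an independent check, not as a description of how `proc_ttest()` works internally. The object name `tt` is illustrative.

```{r eval=FALSE, echo=TRUE}
# Same blood pressure data as in the proc_ttest() example
pressure <- data.frame(
  SBPbefore = c(120, 124, 130, 118, 140, 128, 140, 135, 126, 130, 126, 127),
  SBPafter  = c(128, 131, 131, 127, 132, 125, 141, 137, 118, 132, 129, 135)
)

# Paired t-test on the differences SBPbefore - SBPafter
tt <- t.test(pressure$SBPbefore, pressure$SBPafter, paired = TRUE)

tt$estimate   # mean of the differences (compare to MEAN)
tt$conf.int   # 95% confidence limits (compare to LCLM and UCLM)
tt$statistic  # t statistic (compare to T)
tt$parameter  # degrees of freedom (compare to DF)
tt$p.value    # two-sided p-value (compare to PROBT)
```

Because the p-value is well above 0.05, the base R test leads to the same conclusion as the `TTests` dataset.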
Note that the `proc_freq()` and `proc_reg()` functions also have the capability
to generate plots for the selected analysis. See the documentation of those
functions for more details.
### Other Functions
While the main focus of the **procs** package is on statistical
procedures, the output from these procedures is often manipulated to
produce a more desirable result. Some additional functions have been added
to the **procs** package that SAS® programmers will be familiar with:
`proc_transpose()`, `proc_sort()`, and `proc_print()`.
Continuing from the means example above, let's take some additional steps
to produce a little report:
```{r eval=FALSE, echo=TRUE}
library(fmtr)
# Filter and select using subset function
res4 <- subset(res2, TYPE != 0, c(BY, CLASS, N, MEAN, STD, MIN, MAX))
# Transpose statistics
res5 <- proc_transpose(res4, id = CLASS, by = BY, name = Statistic)
# View transformed data
res5
# BY Statistic Black Blond Brown Red
# 1 Female N 4.00000 4.00000 4.00000 4.000000
# 2 Female MEAN 13.00000 20.25000 35.75000 9.250000
# 3 Female STD 15.59915 29.21615 21.88416 4.500000
# 4 Female MIN 2.00000 4.00000 14.00000 7.000000
# 5 Female MAX 36.00000 64.00000 66.00000 16.000000
# 6 Male N 4.00000 4.00000 4.00000 4.000000
# 7 Male MEAN 14.00000 11.50000 35.75000 8.500000
# 8 Male STD 12.51666 12.50333 18.67931 1.732051
# 9 Male MIN 3.00000 3.00000 15.00000 7.000000
# 10 Male MAX 32.00000 30.00000 53.00000 10.000000
# Assign factor to BY so we can sort
res5$BY <- factor(res5$BY, levels = c("Male", "Female"))
# Sort male to top
res6 <- proc_sort(res5, by = BY)
# BY Statistic Black Blond Brown Red
# 6 Male N 4.00000 4.00000 4.00000 4.000000
# 7 Male MEAN 14.00000 11.50000 35.75000 8.500000
# 8 Male STD 12.51666 12.50333 18.67931 1.732051
# 9 Male MIN 3.00000 3.00000 15.00000 7.000000
# 10 Male MAX 32.00000 30.00000 53.00000 10.000000
# 1 Female N 4.00000 4.00000 4.00000 4.000000
# 2 Female MEAN 13.00000 20.25000 35.75000 9.250000
# 3 Female STD 15.59915 29.21615 21.88416 4.500000
# 4 Female MIN 2.00000 4.00000 14.00000 7.000000
# 5 Female MAX 36.00000 64.00000 66.00000 16.000000
# Create formatting list
fmt <- flist(STD = "%.3f", type = "row", lookup = res6$Statistic)
# Create vector lookup
vf <- c(MEAN = "Mean", STD = "Std", MEDIAN = "Median",
MIN = "Min", MAX = "Max")
# Assign formats
formats(res6) <- list(Statistic = vf,
Black = fmt,
Blond = fmt,
Brown = fmt,
Red = fmt)
# Reassign first column name
names(res6)[1] <- "stub"
# Assign labels
labels(res6) <- list(stub = "Sex")
# Create list for reporting
prnt <- list(res$`Hair * Eye`, res$Chisq, res6)
# Print result
proc_print(prnt,
titles = "Analysis of Hair and Eyes Dataset",
view = FALSE) # Set view = TRUE to see results
```
<img src="../man/images/print1.png" alt = "Proc print example"/>
If desired, we could print this report to the file system using
the `file_path` and `output_type` parameters on `proc_print()`.
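The sort step in the example above converts `BY` to a factor so that "Male" sorts ahead of "Female". The base R sketch below shows the underlying idea with a small made-up vector: character values sort alphabetically, while factors sort by the order of their levels. It uses `order()` only for illustration and assumes nothing about `proc_sort()` beyond what the example shows.

```{r eval=FALSE, echo=TRUE}
# Small illustrative vector, not taken from the package
sex <- c("Female", "Male", "Female", "Male")

# Character values sort alphabetically, so Female comes first
alpha_order <- order(sex)

# A factor sorts by its level order, so Male comes first here
sex_f <- factor(sex, levels = c("Male", "Female"))
level_order <- order(sex_f)

sex[alpha_order]
as.character(sex_f[level_order])
```

Declaring the levels explicitly is a simple way to impose a custom sort order without recoding the values.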
## Next Steps
For next steps, please review the examples provided in the vignette articles.
Those articles include:
* [The Frequency Function](procs-freq.html)
* [The Means Function](procs-means.html)
* [The T-Test Function](procs-ttest.html)
* [Data Manipulation Functions](procs-dm.html)
* [Example 1: Demographics](procs-example1.html)
* [Example 2: Adverse Events](procs-example2.html)
* [Example 3: Disposition](procs-example3.html)