/
i_helper_functions.Rmd
172 lines (134 loc) · 6.64 KB
/
i_helper_functions.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
---
title: "Helper and utility functions"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Healper and utility functions}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
% \VignetteDepends{survival}
% \VignetteDepends{dplyr}
% \VignetteDepends{magrittr}
---
## Introduction
## Metadata
_bllflow_ supports metadata from DDI and from the R packages hmisc and rjlabelled.
### DDI metadata
The [Data Documentation Intitiatve](https://www.ddialliance.org) (DDI) is is an
international standard for describing the data produced by surveys and other
observational methods in the social, behavioral, economic, and health sciences.
There is metadata for variables labels, but also important data provenance information.
Get the description of the study data from DDI and add the description to
_bllflow_ object
```{r Load libraries, warning=TRUE, include=FALSE}
library(survival) # for pbc data
data(pbc)
library(bllflow)
```
### Example 1: Read the DDI xml file for your data.
```{r}
pbc_DDI <- read_DDI(file.path(getwd(), "../inst/extdata"), "pbcDDI.xml")
```
Your metadata is now stored in the `pbcDDI` object in two formats:
1) Variable and format labels can be accessed in `pbcDDI$variableMetaData`.
All DDI metadata is brought into the `pbcDDI` object. For example, variable labels can
be accessed as follows:
```{r}
str(pbc_DDI$variableMetaData$varlab) # variable labels
str(pbc_DDI$variableMetaData$vallab) # value labels for categorial variables
```
`variableMetaData` is generated using the `DDIwR` package.
2) All metadata from the DDI document `pbcDDI$ddiObject`. Careful
Get the varialbe metadata for variableDetails sheet from DDI and add
that the bllFlow MSW-DDI object. Careful printing: there can be a lot
of information! And there can be a lot of nested lists.
Get the name of the data.
```{r}
cat("Dataset name: \n")
pbc_DDI$ddiObject$codeBook$docDscr$citation$titlStmt$titl
```
#### Example 2: Use DDI to add variable labels and other information for your model variables.
The MSW `variables` and `variable_details.csv` file contains the variables in your model.
Use the `BLLFlow()` function with the MSW files as attributes to add labels to the
BLLFlow object, (`variables` and `variable_details` and the DDI object (`pbc_DDI`).
```{r}
# read the MSW files
variables <- read.csv(file.path(getwd(), '../inst/extdata/PBC-variables.csv'))
variable_details <- read.csv(file.path(getwd(), '../inst/extdata/PBC-variableDetails.csv'))
# create a BLLFlow object and add labels.
pbc_model <- BLLFlow(pbc, variables, variableDetails, pbcDDI)
```
Metadata is added to `variables` and `variable_details`. If labels and other metadata
already were in the files, the DDI metadata is added to `start_label`, `start_type`,
`cat_start_value`, `cat_start_label`, `start_low`, `start_high` --- if that data is in
the DDI file.
```{r}
cat("Variable labels in the original MSW\n")
variables$label
cat("\nNo variable labels from DDI in variable details\n")
variable_details$label
cat("\nDDI variable lables added to variableDetailsWithDDI\n")
pbc_model$populated_variable_details$label
```
#### Example 3: Write MSW_DDI to MSW.csv
Use `write_DDI_populated_MSW()` to create a MSW `variable_details` CSV file.
(change to new name `variable_details_with_DDI()`) using `variable_details_with_DDI()`.
Then export using `write_DDI_populated_MSW()` as a CSV file to help further develop
your study protocol.
`write_DDI_populated_MSW()` is a handy function at the beginning of your study.
First, identify thes ariable you need in your study. Then create a 'bane bones'
MSW `variables` CSV sheet, import into R and `BLLFlow()` to add additional
information from the DDI reference file such as labels, variable types, categories.
Alternatively,
1) directly add metadata to variables in a CSV file (see general utility
functions in *Example X*); or,
2) export a CSV with metadata from an R variable list and a DDI file.
```{r}
write_DDI_populated_MSW(pbcModel, "../inst/extdata/", "new_MSW_variable_details.csv")
```
Creates a directory and file (if they don't already exists). `new_MSW_variable_details.csv`
will be overwritten if the file already exists. `new_MSW_variable_details.csv` is
created from `variable_details_with_DDI`. First create `variable_details_with_DDI`
with either `BLLFLow()` or `get_DDI_variables()`.
Create a new file (`new_name.csv`) directly from an existing variable_details.csv.
```{r}
# Writing does not work with pkgdown howver evaluation works on user machine
#WriteDDIPopulatedMSW(pbcDDI, "../inst/extdata/",
# "newMSWvariableDetails.csv",
# "nonBLLPopulated.csv")
```
#### Example 4: Update BLLFLow with new variables
Do we need or want this? Rusty, can you write in a vignette?
Replaces the models msw with all or either variables or variableDetails.
Passing variableDetails also updates the populatedVariableDetails
```{r}
pbc_model <- update_MSW(pbc_model, variables, variable_details)
```
#### Example 5. More general DDI utility functions
Described previously,`read_DDI()` returns a dataframe with all DDI metadata from
a DDI.xml file.
There two additional DDI utility functions that return the two main parts of the
DDI file.
`get_DDI_description()` returns a dataframe with DDI 'header' information from a DDI file.
The header information includes the dataset ID, name, creatation date and other
important information to preserve the provenance of your study data.
`get_DDI_variables()` returns a dataframe with variable DDI metadata for a list of
variables. DDI variable metadata is typical codebook information such as labels,
types, valid and invalid categories, category labels, and descriptive statistics,
such as number valid responses, missing responses, minimum, maximum and mean values.
The _bllflow_ workflow creates an ongoing, updated codebook as you perform your
study. The oringial data DDI metadata is the starting point for keeping the updated
codebook. The _bllflow_ then creates and modifies an new instance of the codebook
as you perform analyses. In addition, a log is created that describes the data
cleaning and data transformation steps.
get_DDI_description(DDI_path = ddi_path, DDI_file = ddi_path)
# same as calling BLLFlow(pbc, get_DDI = c(ddi_path, ddi_path, "description"), but
would not add the info to the bllFlow object.
```{r}
pbc_DDI <- read_DDI(file.path(getwd(), "../inst/extdata"),
"pbcDDI.xml")
# all the DDI metadata
vars_for_pbc <- get_DDI_variables(pbc_DDI, c("age", "sex"))
# all the variable metadata (but no study description infomation)
descr_for_pbc <- get_DDI_description(pbc_DDI)
# all the study description metadata (but no variable infomation)
```