Skip to content
Permalink
Branch: master
Find file Copy path
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
90 lines (61 sloc) 6.79 KB
---
title: "Recode with table"
author: "Doug Manuel"
date: "01/07/2019"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# RecWTable
Recode with Table
Recoding variables is a common task in research that is time consuming and error prone. RecWTable compliments sjmisc::rec() and other recoding methods with the following features:
- Rules (syntax) for recoding are contained in a data.table (default <- `variableDetails`), thereby allowing rules for recoding multiple variables.
- Each row in the data.table comprises one category in a newly transformed variable.
- The syntax for recoding is the same as sjmisc::rec() -- see below -- and contained in variableDetails$from
- Included in each row in the data.table is the variable label, value label and name of the new categorical variable.
Defaults:
+ variable labels <- variableDetails$label
+ value (factor) labels <- variableDetails$valueLabel
+ variable units <- variableDetils$units
- Recoding from a continuous or categorical variabe to new categorical variable.
- Recoding from a categorical variable to either a new continuous or catecategorical variable (todo).
- Recoding to a common target variable from different variables in different databases. See below 'recoding to a common target variable'.
- VariableDetails labels and other metadata can be imported from an accompanying metadata file (DDI supported).
- A log of transformations can be printed to console.
- `attr` for new variable set. `recodeFrom` to the original variable. (for discussion but I would like to include).
- Additional features are available if `variableDetails` is a bllflow class. That is, `variableDetails `is passed as an attribute when creating a _bllflow_ object using `bllflow()`-- see [Speciying your model](vignettes/b_model_specification.html#reading-the-model-specification-workbook)
+ 'variables' sheet is used to identify which variables are recoded. `variableDetails` provides the rules for how to recode.
+ the log of recoding is appended to the _bllflow_ object. see....
## Syntax for variable recoding
The syntax is the same syntax as sjmisc.
The recode-pattern, i.e. which new values should replace the old values, is defined using the `rec` variable in `variableDetails` data.frame. This argument has a specific "syntax":
the pairs are obtained from the RecFrom and RecTo columns
* **recode pairs**: Each recode pair is row. e.g. `rec = "1=1", "2=4", "3=2", "4=3"`
* **multiple values**: Multiple old values that should be recoded into a new single value may be separated with comma, e.g. "1,2=1", "3,4=2"`
* **value range**: A value range is indicated by a colon, e.g. `rec = "1:4=1", "5:8=2"` (recodes all values from 1 to 4 into 1, and from 5 to 8 into 2)
* **value range for doubles**: For double vectors (with fractional part), all values within the specified range are recoded; e.g. `rec = "1:2.5=1", "2.6:3=2"` recodes 1 to 2.5 into 1 and 2.6 to 3 into 2, "but 2.55 would not be recoded (since it's not included in any of the specified ranges).
Different from sjmisc::rec(), there is the ability to define intervals uising `interval`. The default interval, `[,)` which corresponds to the common math [notation](https://en.wikipedia.org/wiki/Interval_(mathematics)) where a *closed interval* is denoted with a closed bracket `[` or `]` and an *open interval* is denoted with an open bracket `(` or `)`. A closed interval is an interval which includes all it limit points. For example, `[0,1] means greater than or equal to 0 and less than or equal to 1. For example, `from "1:2.5=1"` recodes to the default interval, where any value greater than or equal to 1 and less than 2.5 to the new value 1.
* **"min" and "max"**: Minimum and maximum values are indicates by `min` (or `lo`) and `max` (or `hi`), e.g. `from = "min:4=1", "5:max=2"` (recodes all values from minimum values of x to 4 into 1, and from 5 to maximum values of x into 2) (for discussion....You can also use `min` or `max` to recode a value into the minimum or maximum value of a variable, e.g. `rec = "min:4=1" "5:7=max"` (recodes all values from minimum values of x to 4 into 1, and from 5 to 7 into the maximum value of x).
* **"else"**: All other values, which have not been specified yet, are indicated by else, e.g. `rec = "3=1", "1=2", "else=3"` (recodes 3 into 1, 1 into 2 and all other values into 3)
* **"copy"**: The `"else"`-token can be combined with `"copy"`, indicating that all remaining, not yet recoded values should stay the same (are copied from the original value), e.g. `rec = "3=1; 1=2; else=copy"` (recodes 3 into 1, 1 into 2 and all other values like 2, 4 or 5 etc. will not be recoded, but copied.
* **NA's**: `NA` values are allowed both as old and new value, e.g. `rec = "NA=1", "3:5=NA"` (recodes all `NA` into 1, and all values from 3 to 5 into NA in the new variable)
[note from Doug: these descriptors for rev and direct value labels will be modified in our final documentation. Indicated here to identify how bllflow differs from sjmisc::rec().
rev is available in sjmisc, but not available in bllflow. * **"rev"**: `"rev"` is a special token that reverses the value order.
Direct value label is avaiable in sjmis. In bllflow, value labelling is performed using 'valueLabel' for the corresponding row in the `variableDetails` data.table. * **direct value labelling**: Value labels for new values can be assigned inside the recode pattern by writing the value label in square brackets after defining the new value in a recode pair, e.g. `rec = "15:30=1 [young aged]; 31:55=2 [middle aged]; 56:max=3 [old aged]"`
* **non-captured values**: Non-matching values will be set to `NA` (default), unless captured by the `"else"`- or `"copy"`-token.
----------
————-
Notes:
1) If a startVariable is not present in the data, should we consider whether it is an intermediary variable, and then transform (recode) that variable? An intermediary variable is a variable that exists as a variable variableDetails (it is created by variables in `data` and then used in a transformation with other)
For initial version, give warnings and errors only.
If missing intermediary variable: “Error: recoding {variable} requires {startVariable} variable. Recode available in {variableDetails}. Suggest first recoding {startVariable} variable, then try again.”
If missing startVariable (altogether): “Error: missing required starting variable(s): {startingVariable}”
2) Check to make sure all possible values are recoded. What should we do if values cannot be recoded? See outOfRange = NA
3) Return error if any required fields are missing, including: startVarible, type, etc.
4) Log example:
“{variable} created from: startVariables.
Observations: {n}
type: {continuous, factor, etc.}
(if continuous:) min: {min}, max: {min}, NA: {n of missing}
(if factor:) {n} factors, NA: {n of missing}”
You can’t perform that action at this time.