-
Notifications
You must be signed in to change notification settings - Fork 2
/
README.Rmd
89 lines (62 loc) · 4.23 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
#muir
The **muir** package allows users to explore a data.frame using the tree data structure with minimal effort by simply providing the data.frame columns to be explored. In addition, the **muir** package allows for more targeted tree data structures to be created with specific column criteria as a method for documenting and communicating the data structure and relationships within a data set. These data tree structures can be viewed within the *RStudio* console, standard browers, or saved as HTML for sharing using the **htmlwidgets** package.
The package leverages the infrastructure provided by [**DiagrammeR**](http://rich-iannone.github.io/DiagrammeR/).
## Installation
Install the development version of **muir** from GitHub using the **devtools** package.
```r
devtools::install_github('alforj/muir')
```
Or, get the v0.1.0 release from **CRAN**.
```r
install.packages('muir')
```
## Using muir
### Basic example
A basic example using **muir** to explore the **mtcars** data set showing the 'cyl' and 'carb' columns
and the relationship between them (in that order).
Providing the "\*" qualifier (after the ":" separator) indicates that all distinct values in the column
should be returned as separate nodes. Obviously, depending on the data this may result in a LOT of nodes.
As a guard against this, especially during initial exploration, a *node.limit* parameter limits the
number of nodes returned at each level and is defaulted to three (3). If there are more than 3 distinct
values, then only the Top 3 occuring values will be returned. This global parameter can be adjusted for all node levels or specific limits can be provided individually for each *node.level* value (for example, by providing "cyl:4" instead of "cyl:*" to indicate a
node limit of 4 for the 'cyl' column only).
The resulting tree will be rendered starting with a level 0 node counting all rows in the data set. Each resulting level will be based on the columns provided by the user in *node.levels*. Each subsequent level will carry the filters from previous parent nodes forward. Percentages will be provided for each node (compared to the level 0 count) by default and can be turned off if not desired.
*tree.height* and *tree.width* values control how the tree is rendered and can be adjusted to best fit trees of various depths and widths.
```{r }
library(muir)
data(mtcars)
mtTree <- muir(data = mtcars, node.levels = c("cyl:*", "carb:*"),
tree.height = 1200, tree.width = 800)
mtTree
```
### More complicated example
Instead of (or in combination with) just returning top counts for columns provided in *node.levels*,
you can provide custom filter criteria and custom node titles in *level.criteria*
(*level.criteria* could also be read in from a stored file (e.g., a crtieria.csv) as a data.frame)
The **criteria** data.frame below includes the column names, operators, and associated values
(e.g., "cyl" <= 4), and a node title to accompany each node generated for that filter criteria.
Adding a "+" suffix after the column name in the *node.levels* parameter will add an extra
"Other" node that will aggregate all values not already provided in the *level.criteria* value
or for all distinct values below the *node.limit* provided.
Additional label values can be provided with the *label.vals* parameter using
[**dplyr**](https://github.com/hadley/dplyr) summary functions. Custom labels for each value
can be provided by adding custom text after the ":" separator.
Lastly, the direction the tree is drawn can be changed from the default left-to-right to a
top-to-bottom ("TB") rendering by providing a new value for *tree.dir*.
```{r}
criteria <- data.frame(col = c("cyl", "cyl", "carb"),
oper = c("<=", ">", "=="),
val = c(4, 4, 2),
title = c("Up to 4 Cylinders", "More than 4 Cylinders", "2 Carburetors"))
mtTree <- muir(data = mtcars, node.levels = c("cyl", "carb:+"),
level.criteria = criteria,
label.vals = c("n():n", "min(wt):Min Weight", "max(wt):Max Weight"),
tree.dir = "TB",
tree.height = 400, tree.width = 800)
mtTree
```
### More examples
More examples can be found in the *examples* in the **muir** function
```{r}
help(muir)
```