/
hello-foofactors.Rmd
executable file
·81 lines (63 loc) · 2.1 KB
/
hello-foofactors.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
---
title: "stringsAsFactors = HELLNO"
author: "Ziqiang Tang"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{stringsAsFactors = HELLNO}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
Factors are a very useful type of variable in R, but they can also drive you nuts. Especially the "stealth factor" that you think of as character.
Can we soften some of their sharp edges?
Binding two factors via `fbind()`:
```{r}
library(foofactors)
a <- factor(c("character", "hits", "your", "eyeballs"))
b <- factor(c("but", "integer", "where it", "counts"))
```
Simply catenating two factors leads to a result that most don't expect.
```{r}
c(a, b)
```
The `fbind()` function glues two factors together and returns factor.
```{r}
fbind(a, b)
```
Often we want a table of frequencies for the levels of a factor. The base `table()` function returns an object of class `table`, which can be inconvenient for downstream work. Processing with `as.data.frame()` can be helpful but it's a bit clunky.
```{r}
set.seed(1234)
x <- factor(sample(letters[1:5], size = 100, replace = TRUE))
table(x)
as.data.frame(table(x))
```
The `freq_out()` function returns a frequency table as a well-named `tbl_df`:
```{r}
freq_out(x)
```
### detect factors that should be character because # unique values = length
```{r}
f_detect(factor(c("a", "b", "c","a")))
f_detect(factor(c("a", "b", "c","d")))
```
### write a version of reorder() that uses desc() a la (d)plyr
```{r}
f_reorder(factor(c("B", "A", "D")))
```
### write a version of factor() that sets levels to the order in which they appear in the data, i.e. set the levels “as is”
```{r}
f_set(factor(c("B", "A", "D")))
```
### functions to write and read data frames to plain text delimited files while retaining factor levels; maybe by writing/reading a companion file?
```{r}
# De
set.seed(1234)
df <- data.frame(
kids = factor(c(1,0,1,0,0,0), levels = c(0, 1),
labels = c("boy", "girl"))
)
levels(df$kids)
x_write(df, "./df_x.csv", "./df_x.txt")
read_return <- x_read("./df_x.csv", "./df_x.txt")
levels(read_return$kids)
```