/
fs_pivotlonger.Rmd
155 lines (122 loc) · 4.95 KB
/
fs_pivotlonger.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
---
title: "Convert Table from Wide to Long with dplyr"
titleshort: "Convert Table from Wide to Long with dplyr"
description: |
Given a matrix of values with row and column labels, create a table where the unit of observation are the row and column categories, and the values in the matrix is stored in a single variable.
Reshape wide to long two sets of variables, two categorical variables added to wide table.
core:
- package: tidyr
code: |
pivot_longer(cols = starts_with('zi'), names_to = c('zi'), names_pattern = paste0("zi(.)"), values_to = "ev")
pivot_longer(cols = matches('a line b'), names_to = c('va', 'vb'), names_pattern = paste0("(.)_(.)"), values_to = "ev")
- package: dplyr
code: |
left_join()
date: 2023-07-18
date_start: 2020-05-14
output:
pdf_document:
pandoc_args: '../../_output_kniti_pdf.yaml'
includes:
in_header: '../../preamble.tex'
html_document:
pandoc_args: '../../_output_kniti_html.yaml'
includes:
in_header: "../../hdga.html"
always_allow_html: true
urlcolor: blue
---
### Wide to Long
```{r global_options, include = FALSE}
try(source("../../.Rprofile"))
```
`r text_shared_preamble_one`
`r text_shared_preamble_two`
`r text_shared_preamble_thr`
Using the [pivot_wider](https://tidyr.tidyverse.org/reference/pivot_wider.html) function in tidyr to reshape panel or other data structures
#### Wide to long panel, single variable
We have a matrix of values, the values are *ev*. Each row corresponds to a different value of the *a* variable, each column represents a different value of the *z* variable.
Based on this matrix, we create a table where each unit of observation is for a specific *a* and *z* variable combination. So the matrix is turned from wide to long.
The resulting long table has 5 variables
1. *a*: values of the *a* variable, the original matrix row labels
2. *ai*: an index from 1, indicating the original matrix row index
3. *z*: values of the *z* variable, the original matrix column lables
4. *zi*: an index from 1, indicating hte original matrix column index
First, we create the matrix.
```{r}
# Generate A Matrix
set.seed(123)
ar_a <- c(1.1,5.1)
ar_z <- seq(-2.5, 2.53, length.out=11)
mt_ev = matrix(rnorm(length(ar_a)*length(ar_z)),
nrow=length(ar_a), ncol=length(ar_z))
# Name Matrix
rownames(mt_ev) <- paste0('ai', seq(1:length(ar_a)))
colnames(mt_ev) <- paste0('zi', seq(1:length(ar_z)))
# to tibble
tb_ev <- as_tibble(mt_ev) %>% rowid_to_column(var = "ai")
# Print
print(mt_ev)
# Display
kable(tb_ev, caption = "Wide table") %>% kable_styling_fc()
```
Second, we convert the table wide to long.
```{r}
# longer
tb_ev_long <- tb_ev %>%
pivot_longer(cols = starts_with('zi'),
names_to = c('zi'),
names_pattern = paste0("zi(.*)"),
values_to = "ev") %>%
mutate(zi = as.numeric(zi))
# Merge with a and z values
tb_ev_long <- tb_ev_long %>%
left_join(as_tibble(ar_a) %>%
rowid_to_column(var = "ai") %>%
rename(a = value)
, by = 'ai') %>%
left_join(as_tibble(ar_z) %>%
rowid_to_column(var = "zi") %>%
rename(z = value),
by = 'zi') %>%
select(a,ai,z,zi,ev)
# Display
kable(tb_ev_long, caption = "Long table") %>% kable_styling_fc()
```
#### Wide to long panel, multiple variables
We have a dataset where each row contains data from a different year. We have four variables, observed wage, simulated wage, observed labor quantities, and simulated labor quantities.
We generate reshape this file to have four variables:
1. year
2. categorical for wage or quantity
3. categorical for observed or simulated
4. a numerical column with wage and quantity values
This is different then the situation prior, because we are need to convert to long two different numerical variables that will be in the same long variable, but differentiated by two categorical variables (rather than one).
First, we create the matrix.
```{r}
# Generate A Matrix
set.seed(123)
ar_year <- c(1995, 1997, 1999)
ar_vars <- c("wage_model", "quant_model", "wage_simu", "quant_simu")
mt_equi = matrix(rnorm(length(ar_year)*length(ar_vars)),
nrow=length(ar_year), ncol=length(ar_vars))
# Name Matrix
rownames(mt_equi) <- ar_year
colnames(mt_equi) <- ar_vars
# to tibble
tb_equi <- as_tibble(mt_equi, rownames = "year")
# Print
print(mt_equi)
# Display
kable(tb_equi, caption = "Wide table") %>% kable_styling_fc()
```
Second, we convert the table wide to long. We select columns that includes either wage or quant, see tidyselect [Select variables that match a pattern](https://tidyselect.r-lib.org/reference/starts_with.html) for additional verbs for how to select variables.
```{r}
# longer
tb_equi_long <- tb_equi %>%
pivot_longer(cols = matches('wage|quant'),
names_to = c('variable', 'source'),
names_pattern = paste0("(.*)_(.*)"),
values_to = "value")
# Display
kable(tb_equi_long, caption = "Long table, Two Variables") %>% kable_styling_fc()
```