---
title: "A matsindf_apply primer"
author: "Matthew Kuperus Heun"
date: "`r Sys.Date()`"
header-includes:
- \usepackage{amsmath}
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{A matsindf_apply primer}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
bibliography: References.bib
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(dplyr)
library(matsbyname)
library(matsindf)
library(tidyr)
```
## Introduction
`matsindf_apply()` is a powerful and versatile function
that enables analysis with lists and data frames by applying
`FUN` in helpful ways.
The function is called `matsindf_apply()`
because it can apply `FUN` to a `matsindf` data frame,
a data frame that contains matrices as individual entries.
(A `matsindf` data frame can be created by
calling `collapse_to_matrices()`, as demonstrated below.)
But `matsindf_apply()` can apply `FUN` across much more:
data frames of single numbers,
lists of matrices,
lists of single numbers, and
individual numbers.
This vignette demonstrates `matsindf_apply()`,
starting with simple examples and
proceeding toward sophisticated analyses.
## The basics
The basis of all analyses conducted with `matsindf_apply()`
is a function (`FUN`) to be applied across data
supplied in `.dat` or `...`.
`FUN` must return a named list of variables as
its result.
Here is an example function that both adds and subtracts its arguments,
`a` and `b`, and
returns a list containing its results, `c` and `d`.
```{r}
example_fun <- function(a, b) {
  return(list(c = matsbyname::sum_byname(a, b),
              d = matsbyname::difference_byname(a, b)))
}
```
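Because `FUN` must return a named list, it can help to check a candidate function before handing it to `matsindf_apply()`.
The following base-R sketch shows one way to do that.
`is_named_list()` and `sketch_fun()` are hypothetical names introduced only for this illustration; they are not part of `matsindf`.

```{r}
# Hypothetical helper (not part of matsindf): TRUE when `x` is a
# non-empty list whose elements all carry non-empty names.
is_named_list <- function(x) {
  nms <- names(x)
  is.list(x) && length(x) > 0 && !is.null(nms) &&
    all(!is.na(nms) & nms != "")
}

# A base-R stand-in for example_fun, used only for this check.
sketch_fun <- function(a, b) {
  list(c = a + b, d = a - b)
}

# A named list passes the check.
is_named_list(sketch_fun(a = 2, b = 1))
# An unnamed list does not.
is_named_list(list(3, 1))
```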
Similar to `lapply()` and its siblings,
additional argument(s) to `matsindf_apply()` include
the data over which `FUN` is to be applied.
These arguments can, in the first instance,
be supplied as named arguments to the `...` argument
of `matsindf_apply()`.
All arguments in `...` must be named.
The `...` arguments to `matsindf_apply()`
are passed to `FUN` according to their names.
In this case, the output of `matsindf_apply()`
is the named list returned by `FUN`.
```{r}
matsindf_apply(FUN = example_fun, a = 2, b = 1)
```
Passing an additional argument (`z = 2`)
causes an unused argument error,
because `example_fun` does not have a `z` argument.
```{r}
tryCatch(
matsindf_apply(FUN = example_fun, a = 2, b = 1, z = 2),
error = function(e){e}
)
```
Failing to pass a needed argument (`b`)
causes an error that indicates the missing argument.
```{r}
tryCatch(
matsindf_apply(FUN = example_fun, a = 2),
error = function(e){e}
)
```
Alternatively, arguments to `FUN` can be given
in a named list to `.dat`, the first argument of `matsindf_apply()`.
When a value is assigned to `.dat`,
the return value from `matsindf_apply()`
contains all named variables in `.dat`
(in this case both `a` and `b`)
in addition to the results provided by `FUN`
(in this case both `c` and `d`).
```{r}
matsindf_apply(list(a = 2, b = 1), FUN = example_fun)
```
Extra variables are tolerated in `.dat`,
because `.dat` is considered to be a store of data
from which variables can be drawn as needed.
```{r}
matsindf_apply(list(a = 2, b = 1, z = 42), FUN = example_fun)
```
In contrast, arguments to `...`
are named explicitly by the user,
so including an extra argument in `...` is considered an error,
as shown above.
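The difference between drawing from a store and naming arguments explicitly can be pictured in base R.
The sketch below is a conceptual model only, not the implementation of `matsindf_apply()`;
`sketch_add_sub()`, `sketch_store`, and `sketch_needed` are hypothetical names.

```{r}
# Conceptual sketch only; this is not how matsindf_apply() is written.
sketch_add_sub <- function(a, b) list(c = a + b, d = a - b)
sketch_store <- list(a = 2, b = 1, z = 42)

# Treating the list as a store: draw only the items
# that the function's signature asks for.
sketch_needed <- names(formals(sketch_add_sub))
do.call(sketch_add_sub, sketch_store[sketch_needed])

# Passing every item explicitly fails, because `z` is not an argument.
tryCatch(
  do.call(sketch_add_sub, sketch_store),
  error = function(e) conditionMessage(e)
)
```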
## Some details
If a named argument is supplied by both `.dat` and `...`,
the argument in `...` takes precedence,
overriding the argument in `.dat`.
```{r}
matsindf_apply(list(a = 2, b = 1), FUN = example_fun, a = 10)
```
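The precedence rule resembles what `utils::modifyList()` does for plain lists.
The sketch below models the rule in base R under that assumption;
the `sketch_*` names are hypothetical, and the code is not taken from `matsindf`.

```{r}
sketch_dat <- list(a = 2, b = 1)
sketch_dots <- list(a = 10)

# Items in sketch_dots replace same-named items in sketch_dat.
sketch_merged <- utils::modifyList(sketch_dat, sketch_dots)
sketch_merged

# With the overriding value of `a`, the sum is 11 and the difference is 9.
list(c = sketch_merged$a + sketch_merged$b,
     d = sketch_merged$a - sketch_merged$b)
```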
When supplying **both** `.dat` and `...`,
`...` can contain named strings of length `1`
which are interpreted as mappings
from named items in `.dat`
to arguments in the signature of `FUN`.
In the example below,
`a = "z"` indicates that argument `a` to `FUN`
should be supplied by item `z` in `.dat`.
```{r}
matsindf_apply(list(a = 2, b = 1, z = 42),
FUN = example_fun, a = "z")
```
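One way to picture the string mapping is as a lookup into the store.
The base-R sketch below illustrates the idea;
it is an assumption about the concept, not the code that `matsindf_apply()` runs,
and all `sketch_*` names are hypothetical.

```{r}
sketch_dat_map <- list(a = 2, b = 1, z = 42)
# A length-1 string names the store item that supplies argument `a`.
sketch_mapping <- list(a = "z")

# Start from the items that match the argument names directly ...
sketch_args <- sketch_dat_map[c("a", "b")]
# ... then replace each mapped argument with the item it points to.
for (arg in names(sketch_mapping)) {
  sketch_args[[arg]] <- sketch_dat_map[[sketch_mapping[[arg]]]]
}
sketch_args
```

After the lookup, argument `a` carries the value stored under `z`, and `b` is unchanged.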
If a named argument appears in both `.dat` and the output of `FUN`,
a name collision occurs in the output of `matsindf_apply()`, and
a warning is issued.
```{r}
tryCatch(
matsindf_apply(list(a = 2, b = 1, c = 42), FUN = example_fun),
warning = function(w){w}
)
```
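A collision of this kind can be anticipated by comparing names before the call.
The base-R sketch below assumes that the names `FUN` will return are known in advance;
`sketch_inputs`, `sketch_outputs`, and `sketch_clash` are hypothetical names.

```{r}
sketch_inputs <- list(a = 2, b = 1, c = 42)
# Names that the function is expected to return (assumed known here).
sketch_outputs <- c("c", "d")

# Any name present on both sides would collide in the combined result.
sketch_clash <- intersect(names(sketch_inputs), sketch_outputs)
sketch_clash
```

Renaming the offending item in `.dat` before the call is one way to avoid the warning.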
`FUN` can accept more than just numerics.
`example_fun_with_string()` accepts a character string and a numeric.
However, because a `...` argument that is a character string
of length `1` has special meaning
(namely mapping variables in `.dat` to arguments of `FUN`),
passing a character string of length `1` can cause an error.
To get around the problem, wrap the single string
in a list, as shown below.
```{r}
example_fun_with_string <- function(str_a, b) {
  a <- as.numeric(str_a)
  list(added = matsbyname::sum_byname(a, b),
       subtracted = matsbyname::difference_byname(a, b))
}
# Causes an error
tryCatch(
matsindf_apply(FUN = example_fun_with_string, str_a = "1", b = 2),
error = function(e){e}
)
# To solve the problem, wrap "1" in list().
matsindf_apply(FUN = example_fun_with_string, str_a = list("1"), b = 2)
matsindf_apply(FUN = example_fun_with_string, str_a = list("1"), b = list(2))
matsindf_apply(FUN = example_fun_with_string,
str_a = list("1", "3"),
b = list(2, 4))
matsindf_apply(.dat = list(str_a = list("1"), b = list(2)), FUN = example_fun_with_string)
matsindf_apply(.dat = list(m = list("1"), n = list(2)), FUN = example_fun_with_string,
str_a = "m", b = "n")
```
## `matsindf_apply()` and data frames
`.dat` can also be a data frame or a tibble,
both of which are fancy lists.
When `.dat` is a data frame or tibble,
the output of `matsindf_apply()` is a tibble, and
`FUN` acts like a specialized `dplyr::mutate()`,
adding new columns at the right of `.dat`.
```{r}
matsindf_apply(.dat = data.frame(str_a = c("1", "3"), b = c(2, 4)),
FUN = example_fun_with_string)
matsindf_apply(.dat = data.frame(str_a = c("1", "3"), b = c(2, 4)),
FUN = example_fun_with_string,
str_a = "str_a", b = "b")
matsindf_apply(.dat = data.frame(m = c("1", "3"), n = c(2, 4)),
FUN = example_fun_with_string,
str_a = "m", b = "n")
```
Additional niceties are available when `.dat` is a data frame or a tibble.
`matsindf_apply()` works when the data frame is filled with single numeric values,
as is typical.
```{r}
df <- data.frame(a = 2:4, b = 1:3)
matsindf_apply(df, FUN = example_fun)
```
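The row-by-row behavior can be imitated in base R with `Map()`.
The sketch below is only an analogy for how results become new columns;
it does not reproduce the tibble output of `matsindf_apply()`,
and the `sketch_*` names are hypothetical.

```{r}
sketch_numbers <- data.frame(a = c(2, 3, 4), b = c(1, 2, 3))
sketch_row_fun <- function(a, b) list(c = a + b, d = a - b)

# Apply the function once per row, pairing the columns element by element.
sketch_rows <- Map(sketch_row_fun, a = sketch_numbers$a, b = sketch_numbers$b)

# Collect each named result into a new column at the right.
sketch_numbers$c <- vapply(sketch_rows, function(r) r$c, numeric(1))
sketch_numbers$d <- vapply(sketch_rows, function(r) r$d, numeric(1))
sketch_numbers
```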
But `matsindf_apply()` also works with `matsindf` data frames,
data frames in which each cell of the data frame is filled with a single matrix.
To demonstrate use of `matsindf_apply()` with a `matsindf` data frame,
we'll construct a simple `matsindf` data frame (`midf`)
using functions in this package.
```{r}
# Create a tidy data frame containing data for matrices
tidy <- tibble::tibble(Year = rep(c(rep(2017, 4), rep(2018, 4)), 2),
                       matnames = c(rep("U", 8), rep("V", 8)),
                       matvals = c(1:4, 11:14, 21:24, 31:34),
                       rownames = c(rep(c(rep("p1", 2), rep("p2", 2)), 2),
                                    rep(c(rep("i1", 2), rep("i2", 2)), 2)),
                       colnames = c(rep(c("i1", "i2"), 4),
                                    rep(c("p1", "p2"), 4))) |>
  dplyr::mutate(
    rowtypes = case_when(
      matnames == "U" ~ "Product",
      matnames == "V" ~ "Industry",
      TRUE ~ NA_character_
    ),
    coltypes = case_when(
      matnames == "U" ~ "Industry",
      matnames == "V" ~ "Product",
      TRUE ~ NA_character_
    )
  )
tidy
# Convert to a matsindf data frame
midf <- tidy |>
dplyr::group_by(Year, matnames) |>
collapse_to_matrices(rowtypes = "rowtypes", coltypes = "coltypes") |>
tidyr::pivot_wider(names_from = "matnames", values_from = "matvals")
# Take a look at the midf data frame and some of the matrices it contains.
midf
midf$U[[1]]
midf$V[[1]]
```
With `midf` in hand, we can demonstrate use of
[`tidyverse`](https://www.tidyverse.org)-style
functional programming to perform
matrix algebra within a data frame.
The functions of the `matsbyname` package
(such as `difference_byname()` below)
can be used for this purpose.
```{r}
result <- midf |>
dplyr::mutate(
W = difference_byname(transpose_byname(V), U)
)
result
result$W[[1]]
result$W[[2]]
```
This way of performing matrix calculations works equally well
within a 2-row `matsindf` data frame
(as shown above) or
within a 1000-row `matsindf` data frame.
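For a single pair of matrices, the same calculation can be written in base R.
The sketch below rebuilds the 2017 matrices by hand and aligns rows and columns by name before subtracting.
It assumes that both matrices share the same set of row and column names;
the `matsbyname` functions do their own name handling, which this sketch does not reproduce.
The `sketch_*` names are hypothetical.

```{r}
# The 2017 U and V matrices, rebuilt by hand from the tidy data above.
sketch_U <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE,
                   dimnames = list(c("p1", "p2"), c("i1", "i2")))
sketch_V <- matrix(c(21, 22, 23, 24), nrow = 2, byrow = TRUE,
                   dimnames = list(c("i1", "i2"), c("p1", "p2")))

# Transpose V, put its rows and columns in the order of U, then subtract.
sketch_Vt <- t(sketch_V)
sketch_W <- sketch_Vt[rownames(sketch_U), colnames(sketch_U)] - sketch_U
sketch_W
```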
## Programming with `matsindf_apply()`
Users can write their own functions using `matsindf_apply()`.
A flexible `calc_W()` function can be written as follows.
```{r}
calc_W <- function(.DF = NULL, U = "U", V = "V", W = "W") {
  # The inner function does all the work.
  W_func <- function(U_mat, V_mat) {
    # When we get here, U_mat and V_mat will be single matrices or single numbers,
    # not a column in a data frame or an item in a list.
    if (length(U_mat) == 0 && length(V_mat) == 0) {
      # Tolerate zero-length arguments by returning a list
      # with the correct name that contains a zero-length numeric.
      return(list(numeric()) |> magrittr::set_names(W))
    }
    # Calculate W_mat from the inputs U_mat and V_mat.
    W_mat <- matsbyname::difference_byname(
      matsbyname::transpose_byname(V_mat),
      U_mat)
    # Return a named list.
    list(W_mat) |> magrittr::set_names(W)
  }
  # The body of the main function consists of a call to matsindf_apply
  # that specifies the inner function in the FUN argument.
  matsindf_apply(.DF, FUN = W_func, U_mat = U, V_mat = V)
}
```
This style of writing `matsindf_apply()` functions is incredibly versatile,
leveraging the capabilities of both the `matsindf` and `matsbyname` packages.
(Indeed, the `Recca` package
uses `matsindf_apply()` heavily and
is built upon the functions in the `matsindf` and `matsbyname` packages.)
Functions written like `calc_W()`
can operate in ways similar to `matsindf_apply()` itself.
To demonstrate, we'll use `calc_W()` in all the ways that `matsindf_apply()` can be used,
going in the reverse order to our demonstration of the capabilities of `matsindf_apply()` above.
`calc_W()` can be used as a specialized `mutate` function
that operates on `matsindf` data frames.
```{r}
midf |> calc_W()
```
The added column could be given a different name from the default ("`W`")
using the `W` argument.
```{r}
midf |> calc_W(W = "W_prime")
```
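The output column can be renamed because the inner function names its result from a string argument rather than hard-coding the name.
The base-R sketch below isolates that step, using `stats::setNames()` in place of `magrittr::set_names()`;
`sketch_named_result()` is a hypothetical helper for illustration.

```{r}
# Wrap a value in a list whose single name is supplied by the caller.
sketch_named_result <- function(value, out_name = "W") {
  stats::setNames(list(value), out_name)
}

names(sketch_named_result(1))
names(sketch_named_result(1, out_name = "W_prime"))
```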
As with `matsindf_apply()`,
column names in `midf` can be mapped to the arguments of `calc_W()`
by the arguments to `calc_W()`.
```{r}
midf |>
dplyr::rename(X = U, Y = V) |>
calc_W(U = "X", V = "Y")
```
`calc_W()` can operate on lists of single matrices, too.
This approach works, because the default values for the
`U` and `V` arguments to `calc_W()` are
"U" and "V", respectively.
The input list members (in this case `midf$U[[1]]` and `midf$V[[1]]`)
are returned with the output, because
`list(U = midf$U[[1]], V = midf$V[[1]])` is passed to the `.dat` argument
of `matsindf_apply()`.
```{r}
calc_W(list(U = midf$U[[1]], V = midf$V[[1]]))
```
It may be clearer to name the arguments as required by the `calc_W()` function
without wrapping in a list first,
as shown below.
But in this approach, the input matrices are not returned with the output,
because arguments `U` and `V` are passed to the `...` argument of `matsindf_apply()`,
not the `.dat` argument of `matsindf_apply()`.
```{r}
calc_W(U = midf$U[[1]], V = midf$V[[1]])
```
`calc_W()` can operate on data frames containing single numbers.
```{r}
data.frame(U = c(1, 2), V = c(3, 4)) |> calc_W()
```
Finally, `calc_W()` can be applied to single numbers,
and the result is a 1x1 matrix.
```{r}
calc_W(U = 2, V = 3)
```
It is good practice to write internal functions
that tolerate zero-length inputs, as `calc_W()` does.
Doing so enables results from different calculations to be bound together by rows.
```{r}
calc_W(U = numeric(), V = numeric())
calc_W(list(U = numeric(), V = numeric()))
res <- calc_W(list(U = c(2, 3, 4, 5), V = c(3, 4, 5, 6)))
res0 <- calc_W(list(U = numeric(), V = numeric()))
dplyr::bind_rows(res, res0)
```
## Conclusion
This vignette demonstrated use of
the versatile `matsindf_apply()` function.
Inputs to `matsindf_apply()` can be
* single numbers,
* matrices, or
* data frames with appropriately-named columns.
`matsindf_apply()` can be used for programming, and
functions constructed as demonstrated above
share characteristics with `matsindf_apply()`:
* they can be used as specialized `dplyr::mutate()` operators, and
* they can be applied to single numbers, matrices, or
data frames with appropriately-named columns.