vignettes/midf_apply_primer.Rmd

---
title: "A matsindf_apply primer"
author: "Matthew Kuperus Heun"
date: "`r Sys.Date()`"
header-includes:
   - \usepackage{amsmath}
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{A matsindf_apply primer}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
bibliography: References.bib
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(dplyr)
library(matsbyname)
library(matsindf)
library(tidyr)
```


## Introduction

`matsindf_apply()` is a versatile function
that enables analysis with lists and data frames by applying
a user-supplied function (`FUN`) to their contents.
The function is called `matsindf_apply()`
because it can be used to apply `FUN` to a `matsindf` data frame,
a data frame that contains matrices as individual entries.
(A `matsindf` data frame can be created by
calling `collapse_to_matrices()`, as demonstrated below.)

But `matsindf_apply()` can apply `FUN` across much more:
data frames of single numbers,
lists of matrices,
lists of single numbers, and
individual numbers.
This vignette demonstrates `matsindf_apply()`,
starting with simple examples and
proceeding toward sophisticated analyses.


## The basics

The basis of all analyses conducted with `matsindf_apply()`
is a function (`FUN`) to be applied across data
supplied in `.dat` or `...`.
`FUN` must return a named list of variables as
its result.
Here is an example function that both adds and subtracts its arguments,
`a` and `b`, and
returns a named list containing the results, `c` (the sum) and `d` (the difference).

```{r}
example_fun <- function(a, b){
  return(list(c = matsbyname::sum_byname(a, b), 
              d = matsbyname::difference_byname(a, b)))
}
```

Similar to `lapply()` and its siblings,
additional argument(s) to `matsindf_apply()` include
the data over which `FUN` is to be applied.
These arguments can, in the first instance,
be supplied as named arguments to the `...` argument
of `matsindf_apply()`.
All arguments in `...` must be named.
The `...` arguments to `matsindf_apply()`
are passed to `FUN` according to their names.
In this case, the output of `matsindf_apply()`
is the named list returned by `FUN`.

```{r}
matsindf_apply(FUN = example_fun, a = 2, b = 1)
```

Passing an additional argument (`z = 2`)
causes an unused argument error,
because `example_fun` does not have a `z` argument.

```{r}
tryCatch(
  matsindf_apply(FUN = example_fun, a = 2, b = 1, z = 2),
  error = function(e){e}
)
```

Failing to pass a needed argument (`b`)
causes an error that indicates the missing argument.

```{r}
tryCatch(
  matsindf_apply(FUN = example_fun, a = 2),
  error = function(e){e}
)
```

Alternatively, arguments to `FUN` can be given
in a named list to `.dat`, the first argument of `matsindf_apply()`.
When a value is assigned to `.dat`,
the return value from `matsindf_apply()`
contains all named variables in `.dat`
(in this case both `a` and `b`)
in addition to the results provided by `FUN`
(in this case both `c` and `d`).

```{r}
matsindf_apply(list(a = 2, b = 1), FUN = example_fun)
```

Extra variables are tolerated in `.dat`,
because `.dat` is considered to be a store of data
from which variables can be drawn as needed.

```{r}
matsindf_apply(list(a = 2, b = 1, z = 42), FUN = example_fun)
```

In contrast, arguments to `...`
are named explicitly by the user,
so including an extra argument in `...` is considered an error,
as shown above.


## Some details

If a named argument is supplied by both `.dat` and `...`,
the argument in `...` takes precedence,
overriding the argument in `.dat`.

```{r}
matsindf_apply(list(a = 2, b = 1), FUN = example_fun, a = 10)
```

When supplying **both** `.dat` and `...`,
`...` can contain named strings of length `1`
which are interpreted as mappings
from named items in `.dat`
to arguments in the signature of `FUN`.
In the example below,
`a = "z"` indicates that argument `a` to `FUN`
should be supplied by item `z` in `.dat`.

```{r}
matsindf_apply(list(a = 2, b = 1, z = 42),
               FUN = example_fun, a = "z")
```

If a name appears in both `.dat` and the output of `FUN`,
a name collision occurs in the output of `matsindf_apply()`, and
a warning is issued.

```{r}
tryCatch(
  matsindf_apply(list(a = 2, b = 1, c = 42), FUN = example_fun),
  warning = function(w){w}
)
```

`FUN` can accept more than just numerics. 
`example_fun_with_string()` accepts a character string and a numeric.
However, because a `...` argument that is a character string
of length `1` has special meaning
(namely mapping variables in `.dat` to arguments of `FUN`), 
passing a character string of length `1` can cause an error.
To get around the problem, wrap the single string
in a list, as shown below.

```{r}
example_fun_with_string <- function(str_a, b) {
  a <- as.numeric(str_a)
  list(added = matsbyname::sum_byname(a, b), 
       subtracted = matsbyname::difference_byname(a, b))
}

# Causes an error
tryCatch(
  matsindf_apply(FUN = example_fun_with_string, str_a = "1", b = 2),
  error = function(e){e}
)
# To solve the problem, wrap "1" in list().
matsindf_apply(FUN = example_fun_with_string, str_a = list("1"), b = 2)
matsindf_apply(FUN = example_fun_with_string, str_a = list("1"), b = list(2))
matsindf_apply(FUN = example_fun_with_string, 
               str_a = list("1", "3"), 
               b = list(2, 4))
matsindf_apply(.dat = list(str_a = list("1"), b = list(2)), FUN = example_fun_with_string)
matsindf_apply(.dat = list(m = list("1"), n = list(2)), FUN = example_fun_with_string, 
               str_a = "m", b = "n")
```


## `matsindf_apply()` and data frames

`.dat` can also be a data frame (or tibble), 
both of which are fancy lists. 
When `.dat` is a data frame or tibble, 
the output of `matsindf_apply()` is a tibble, and 
`FUN` acts like a specialized `dplyr::mutate()`, 
adding new columns at the right of `.dat`.

```{r}
matsindf_apply(.dat = data.frame(str_a = c("1", "3"), b = c(2, 4)), 
               FUN = example_fun_with_string)
matsindf_apply(.dat = data.frame(str_a = c("1", "3"), b = c(2, 4)), 
               FUN = example_fun_with_string, 
               str_a = "str_a", b = "b")
matsindf_apply(.dat = data.frame(m = c("1", "3"), n = c(2, 4)), 
               FUN = example_fun_with_string, 
               str_a = "m", b = "n")
```

Additional niceties are available when `.dat` is a data frame or a tibble.
`matsindf_apply()` works when the data frame is filled with single numeric values,
as is typical.

```{r}
df <- data.frame(a = 2:4, b = 1:3)
matsindf_apply(df, FUN = example_fun)
```
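
To make the comparison with `dplyr::mutate()` concrete, 
the sketch below computes the same two columns both ways. 
The names `sum_diff`, `nums`, `via_apply`, and `via_mutate` are hypothetical, 
introduced only for this illustration; 
`sum_diff()` repeats the logic of `example_fun()` so that the chunk runs on its own.

```{r}
# Same logic as example_fun(), repeated so this chunk is self-contained.
sum_diff <- function(a, b) {
  list(c = matsbyname::sum_byname(a, b),
       d = matsbyname::difference_byname(a, b))
}
nums <- data.frame(a = 2:4, b = 1:3)

# matsindf_apply() finds columns a and b by name and appends c and d.
via_apply <- matsindf::matsindf_apply(nums, FUN = sum_diff)

# A hand-written dplyr::mutate() call that performs the same arithmetic.
via_mutate <- dplyr::mutate(nums, c = a + b, d = a - b)

# Compare the numeric results; unlist() guards against list columns.
all(unlist(via_apply$c) == via_mutate$c)
all(unlist(via_apply$d) == via_mutate$d)
```

The difference is that `matsindf_apply()` needs only the function:
the column names are taken from the signature of `FUN`
rather than being written out in each call.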

But `matsindf_apply()` also works with `matsindf` data frames,
data frames in which each cell of the data frame is filled with a single matrix.
To demonstrate use of `matsindf_apply()` with a `matsindf` data frame, 
we'll construct a simple `matsindf` data frame (`midf`)
using functions in this package.

```{r}
# Create a tidy data frame containing data for matrices
tidy <- tibble::tibble(Year = rep(c(rep(2017, 4), rep(2018, 4)), 2),
                       matnames = c(rep("U", 8), rep("V", 8)),
                       matvals = c(1:4, 11:14, 21:24, 31:34),
                       rownames = c(rep(c(rep("p1", 2), rep("p2", 2)), 2), 
                                    rep(c(rep("i1", 2), rep("i2", 2)), 2)),
                       colnames = c(rep(c("i1", "i2"), 4), 
                                    rep(c("p1", "p2"), 4))) |>
  dplyr::mutate(
    rowtypes = case_when(
      matnames == "U" ~ "Product",
      matnames == "V" ~ "Industry", 
      TRUE ~ NA_character_
    ),
    coltypes = case_when(
      matnames == "U" ~ "Industry",
      matnames == "V" ~ "Product",
      TRUE ~ NA_character_
    )
  )

tidy

# Convert to a matsindf data frame
midf <- tidy |>  
  dplyr::group_by(Year, matnames) |> 
  collapse_to_matrices(rowtypes = "rowtypes", coltypes = "coltypes") |> 
  tidyr::pivot_wider(names_from = "matnames", values_from = "matvals")

# Take a look at the midf data frame and some of the matrices it contains.
midf
midf$U[[1]]
midf$V[[1]]
```

With `midf` in hand, we can demonstrate use of 
[`tidyverse`](https://www.tidyverse.org)-style
functional programming to perform
matrix algebra within a data frame.
The functions of the `matsbyname` package
(such as `difference_byname()` below)
can be used for this purpose.

```{r}
result <- midf |> 
  dplyr::mutate(
    W = difference_byname(transpose_byname(V), U)
  )
result
result$W[[1]]
result$W[[2]]
```

This way of performing matrix calculations works equally well 
within a 2-row `matsindf` data frame
(as shown above) or
within a 1000-row `matsindf` data frame.
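
As a rough check of that claim, 
the sketch below builds a hypothetical 1000-row data frame 
by repeating one pair of small matrices and 
then applies the same `mutate()` call. 
The objects `U1`, `V1`, `n_rows`, `big_midf`, and `big_result` are assumptions made for this illustration, and, 
unlike `midf`, the matrices here carry no row or column types.

```{r}
# Small U (product-by-industry) and V (industry-by-product) matrices.
U1 <- matrix(1:4, nrow = 2, byrow = TRUE,
             dimnames = list(c("p1", "p2"), c("i1", "i2")))
V1 <- matrix(21:24, nrow = 2, byrow = TRUE,
             dimnames = list(c("i1", "i2"), c("p1", "p2")))

# A hypothetical 1000-row tibble in which every row repeats U1 and V1.
n_rows <- 1000
big_midf <- tibble::tibble(Year = seq_len(n_rows),
                           U = rep(list(U1), n_rows),
                           V = rep(list(V1), n_rows))

# The mutate() call is the same as in the 2-row case.
big_result <- big_midf |>
  dplyr::mutate(
    W = matsbyname::difference_byname(matsbyname::transpose_byname(V), U)
  )

nrow(big_result)
big_result$W[[1000]]
```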


## Programming with `matsindf_apply()`

Users can write their own functions using `matsindf_apply()`. 
A flexible `calc_W()` function can be written as follows. 

```{r}
calc_W <- function(.DF = NULL, U = "U", V = "V", W = "W") {
  # The inner function does all the work.
  W_func <- function(U_mat, V_mat){
    # When we get here, U_mat and V_mat will be single matrices or single numbers, 
    # not a column in a data frame or an item in a list.
    if (length(U_mat) == 0 && length(V_mat) == 0) {
      # Tolerate zero-length arguments by returning a list
      # with the correct name that holds a zero-length numeric.
      return(list(numeric()) |> magrittr::set_names(W))
    }
    # Calculate W_mat from the inputs U_mat and V_mat.
    W_mat <- matsbyname::difference_byname(
      matsbyname::transpose_byname(V_mat), 
      U_mat)
    # Return a named list.
    list(W_mat) |> magrittr::set_names(W)
  }
  # The body of the main function consists of a call to matsindf_apply
  # that specifies the inner function in the FUN argument.
  matsindf_apply(.DF, FUN = W_func, U_mat = U, V_mat = V)
}
```

This style of writing `matsindf_apply()` functions is incredibly versatile,
leveraging the capabilities of both the `matsindf` and `matsbyname` packages. 
(Indeed, the `Recca` package 
uses `matsindf_apply()` heavily and
is built upon the functions in the `matsindf` and `matsbyname` packages.)
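
The same template carries over to other calculations. 
As a minimal sketch, the hypothetical function `calc_total()` below 
(not part of `matsindf` or `matsbyname`) 
follows the pattern of `calc_W()`: 
string arguments name the inputs and the output, 
an inner function does the arithmetic and returns a named list, and 
`matsindf_apply()` handles the dispatch. 
The argument names `x`, `y`, and `total` are assumptions chosen for the example.

```{r}
# Hypothetical function written in the same style as calc_W().
calc_total <- function(.DF = NULL, x = "x", y = "y", total = "total") {
  total_func <- function(x_val, y_val) {
    # Return a named list; the name comes from the `total` argument.
    stats::setNames(list(matsbyname::sum_byname(x_val, y_val)), total)
  }
  matsindf::matsindf_apply(.DF, FUN = total_func, x_val = x, y_val = y)
}

# Used as a specialized mutate on a data frame of single numbers.
res_default <- data.frame(x = c(1, 2), y = c(3, 4)) |> calc_total()
res_default

# Used with differently named input columns and a renamed output column.
res_mapped <- data.frame(p = c(1, 2), q = c(3, 4)) |>
  calc_total(x = "p", y = "q", total = "p_plus_q")
res_mapped
```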

Functions written like `calc_W()`
can operate in ways similar to `matsindf_apply()` itself.
To demonstrate, we'll use `calc_W()` in all the ways that `matsindf_apply()` can be used,
going in the reverse order to our demonstration of the capabilities of `matsindf_apply()` above.

`calc_W()` can be used as a specialized `mutate` function
that operates on `matsindf` data frames.

```{r}
midf |> calc_W()
```

The added column could be given a different name from the default ("`W`")
using the `W` argument.

```{r}
midf |> calc_W(W = "W_prime")
```

As with `matsindf_apply()`, 
column names in `midf` can be mapped to the arguments of `calc_W()`
by the arguments to `calc_W()`.

```{r}
midf |> 
  dplyr::rename(X = U, Y = V) |> 
  calc_W(U = "X", V = "Y")
```

`calc_W()` can operate on lists of single matrices, too.
This approach works, because the default values for the 
`U` and `V` arguments to `calc_W()` are 
"U" and "V", respectively.
The input list members (in this case `midf$U[[1]]` and `midf$V[[1]]`)
are returned with the output, because
`list(U = midf$U[[1]], V = midf$V[[1]])` is passed to the `.dat` argument
of `matsindf_apply()`.

```{r}
calc_W(list(U = midf$U[[1]], V = midf$V[[1]]))
```

It may be clearer to name the arguments as required by the `calc_W()` function
without wrapping in a list first,
as shown below.
But in this approach, the input matrices are not returned with the output,
because arguments `U` and `V` are passed to the `...` argument of `matsindf_apply()`,
not the `.dat` argument of `matsindf_apply()`.

```{r}
calc_W(U = midf$U[[1]], V = midf$V[[1]])
```

`calc_W()` can operate on data frames containing single numbers.

```{r}
data.frame(U = c(1, 2), V = c(3, 4)) |> calc_W()
```

Finally, `calc_W()` can be applied to single numbers,
and the result is a 1x1 matrix.

```{r}
calc_W(U = 2, V = 3)
```

It is good practice to write internal functions
that tolerate zero-length inputs, as `calc_W()` does.
Doing so enables results from different calculations to be row-bound together
(for example, with `dplyr::bind_rows()`).

```{r}
calc_W(U = numeric(), V = numeric())
calc_W(list(U = numeric(), V = numeric()))

res <- calc_W(list(U = c(2, 3, 4, 5), V = c(3, 4, 5, 6)))
res0 <- calc_W(list(U = numeric(), V = numeric()))
dplyr::bind_rows(res, res0)
```


## Conclusion

This vignette demonstrated use of
the versatile `matsindf_apply()` function.
Inputs to `matsindf_apply()` can be 

* single numbers,
* matrices, or
* data frames with appropriately-named columns.

`matsindf_apply()` can be used for programming, and 
functions constructed as demonstrated above
share characteristics with `matsindf_apply()`:

* they can be used as specialized `dplyr::mutate()` operators, and
* they can be applied to single numbers, matrices, or
  data frames with appropriately-named columns.