/
xmap.Rmd
135 lines (105 loc) · 4.98 KB
/
xmap.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
---
title: "Getting Started with Crossmaps"
output:
rmarkdown::html_vignette:
toc: yes
vignette: >
%\VignetteIndexEntry{Getting Started with Crossmaps}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
## Checking existing pipelines
`xmap` offers verification functions for mappings encoded in named vectors or lists as well as data frames. Many of these checks will seem trivial in short mappings, but can be useful checks for longer or more complex mappings.
For example, if you are using a named list to manually recode categorical variables (e.g. using `forcats::fct_recode`), then you might want to verify some properties of the level mappings -- e.g. each existing level is only mapped once:
```{r}
library(forcats)
library(xmap)
x <- factor(c("apple", "bear", "banana", "dear"))
levels <- ## new = "old" levels
c(fruit = "apple", fruit = "banana") |>
verify_named_all_values_unique()
forcats::fct_recode(x, !!!levels)
```
```{r error=TRUE}
mistake_levels <- c(fruit = "apple", fruit = "banana", veg = "banana") |>
verify_named_all_values_unique()
```
Similarly, if you are using a named list to encode group members, you could check the members against a reference list.
```{r error=TRUE}
## mistakenly assign kate to another group
student_group_mistake <- list(GRP1 = c("kate", "jane", "peter"),
GRP2 = c("terry", "ben", "grace"),
GRP2 = c("cindy", "lucy", "kate" ))
student_list <- c("kate", "jane", "peter", "terry", "ben",
"grace", "cindy", "lucy", "alex")
student_group_mistake |>
verify_named_matchset_values_exact(student_list)
```
If you are redistributing numeric values between categories, see `vignette("making-xmaps")` for details on how to check your transformation weights redistribute exactly 100% of your original data.
## Crossmap transformations
All crossmaps transformations can be decomposed into multiple "standard" data manipulation steps.
1. **Rename original categories into target categories**
2. **Mutate source node values by link weight.**
3. **Summarise mutated values by target node.**
To implement these steps use the following `dplyr` pipeline with verified links:
1. `dplyr::left_join` the target categories (`to`) to the original data via the source labels (`from`).
2. `dplyr::mutate` the source values by multiplying them with the link weights (`weights`) to transform the original values into redistributed values.
3. `dplyr::group_by` and `dplyr::summarise` values by target groups (`to`) to complete any many-to-1 mappings.
For example given some original data (`v19_data`) with source categories (`v19`), and a valid `xmap` with source categories (`from = version19`), target categories (`to = version18`) and link weights (`weights = w19to18`):
```{r message=FALSE}
library(xmap)
library(tibble)
library(dplyr)
```
```{r}
# original data
v19_data <- tibble::tribble(
~v19, ~count,
"1120", 300,
"1121", 400,
"1130", 200,
"1200", 600
)
# valid crossmap
xmap_19to18 <- tibble::tribble(
~version19, ~version18, ~w19to18,
# many-to-1 collapsing
"1120", "A2", 1,
"1121", "A2", 1,
# 1-to-1 recoding
"1130", "A3", 1,
# 1-to-many redistribution
"1200", "A4", 0.6,
"1200", "A5", 0.4
) |>
verify_links_as_xmap(from = version19, to = version18, weights = w19to18)
# transformed data
(v18_data <- dplyr::left_join(x = v19_data,
y = xmap_19to18,
by = c(v19 = "version19")) |>
dplyr::mutate(new_count = count * w19to18) |>
dplyr::group_by(version18) |>
dplyr::summarise(v20_count = sum(new_count))
)
```
Note that we expect multiple matches for 1-to-many relations so the `dplyr` warning can safely be ignored.
## Creating an `xmap_df` object (EXPERIMENTAL)
The `xmap_df` class aims to facilitates additional functionality such as graph property calculation and printing (i.e. relation types and crossmap direction) via a custom `print()` method, coercion to other useful classes (e.g. `xmap_to_matrix()`), visualisation (see `vignette("vis-xmaps")` for prototypes), and multi-step transformations (i.e. from nomenclature A to B to C).
Please note that the `xmap_df` class and related functions are still in active development and subject to breaking changes in future releases.
If you already have a data frame of candidate links, turn them into a valid `xmap` object by specifying the source (`from`) nodes, target (`to`) nodes, and weights (`weights`):
```{r}
uk_shares <- tibble::tribble(
~key1, ~key2, ~shares,
"UK, Channel Islands, Isle of Man", "Scotland", 0.1102047,
"UK, Channel Islands, Isle of Man", "Wales", 0.02720333,
"UK, Channel Islands, Isle of Man", "England", 0.862592
)
uk_shares |>
as_xmap_df(from = key1, to = key2, weights = shares, tol = 3e-08) |>
print()
```
Note that both verification and coercion will fail without adjusting the tolerance `tol`, which specifies differences to ignore (i.e. what counts as close enough to 1).
```{r error=TRUE}
uk_shares |>
verify_links_as_xmap(key1, key2, shares)
```