---
title: "Read and manipulate a tabular-data-resource"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Read and manipulate a tabular-data-resource}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r}
library(fr)
```
The {fr} package comes with an example frictionless tabular-data-resource (tdr) named `hamilton_poverty_2020`. On disk, a tdr is composed of a folder containing a data CSV file (both named based on the `name` of the tdr) *and* a `tabular-data-resource.yaml` file, which contains the metadata descriptors:
```{r}
fs::dir_tree(fs::path_package("fr", "hamilton_poverty_2020"), recurse = TRUE)
```
Read the `hamilton_poverty_2020` tdr into R by specifying the location of either the `tabular-data-resource.yaml` file *or* a folder containing a `tabular-data-resource.yaml` file:
```{r}
d_fr <- read_fr_tdr(fs::path_package("fr", "hamilton_poverty_2020"))
```
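Pointing `read_fr_tdr()` at the metadata file itself is the other option described above. The following is a minimal sketch, assuming the example folder contains a file named `tabular-data-resource.yaml` as shown in the directory listing; the names `tdr_dir`, `tdr_file`, `d_from_dir`, and `d_from_file` are introduced here only for illustration:
```{r}
# path to the example folder shipped with {fr}
tdr_dir <- fs::path_package("fr", "hamilton_poverty_2020")
# path to the metadata file inside that folder (file name assumed from the listing above)
tdr_file <- fs::path(tdr_dir, "tabular-data-resource.yaml")

# read the same tdr both ways
d_from_dir <- fr::read_fr_tdr(tdr_dir)
d_from_file <- fr::read_fr_tdr(tdr_file)

# compare the underlying data from the two reads
identical(as.data.frame(d_from_dir), as.data.frame(d_from_file))
```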
Print the returned `fr_tdr` (frictionless tabular-data-resource) object to view all of the table-specific metadata descriptors and the underlying data:
```{r}
d_fr
```
Print the `schema` property to view the table-specific metadata:
```{r}
S7::prop(d_fr, "schema")
```
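To find out which other properties can be passed to `S7::prop()`, one option is to list them. This is a small sketch that relies only on `S7::prop_names()` from the {S7} package and continues with the `d_fr` object read above; which properties appear depends on the installed version of {fr}:
```{r}
# list the names of all S7 properties defined on the fr_tdr object
S7::prop_names(d_fr)
```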
`fr_tdr` objects can be used almost anywhere the underlying data frame can, because functions typically coerce their inputs with `as.data.frame()`, which works with `fr_tdr` objects:
```{r}
lm(fraction_poverty ~ year, data = d_fr)
```
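The same coercion can be called directly when a plain data frame is needed. A minimal sketch, continuing with the `d_fr` object read above; `d_df` is a name introduced here for illustration:
```{r}
# coerce explicitly; the result no longer carries the tdr metadata
d_df <- as.data.frame(d_fr)
class(d_df)
head(d_df)
```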
Accessor functions (`[`, `[[`, `$`) work as they do with data frames and tibbles:
```{r}
head(d_fr$fraction_poverty)
```
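As a small illustration of the other accessors, the sketch below extracts the same column with `[[` and checks it against the `$` result. It continues with the `d_fr` object read above and assumes nothing beyond the accessor behavior just described:
```{r}
# extract a single column by name with `[[`
head(d_fr[["fraction_poverty"]])
# `$` and `[[` should return the same vector
identical(d_fr$fraction_poverty, d_fr[["fraction_poverty"]])
```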
In some cases, an `fr_tdr` object must be separated into data and metadata so that the data can be manipulated and the metadata rejoined afterward. For example, calling `dplyr::mutate()` directly on an `fr_tdr` object fails:
```{r}
#| error: true
d_fr |>
  dplyr::mutate(high_poverty = fraction_poverty > median(fraction_poverty))
```
In this case, explicitly convert the `fr_tdr` object to a tibble or data frame, which drops the metadata attributes, using `as_tibble()`, `as_data_frame()`, or `as.data.frame()`. Then call `as_fr_tdr()` with the original `fr_tdr` object as the `.template` to convert the result back to an `fr_tdr` object:
```{r}
d_fr |>
  tibble::as_tibble() |>
  dplyr::mutate(high_poverty = fraction_poverty > median(fraction_poverty)) |>
  as_fr_tdr(.template = d_fr)
```
Shortcuts are provided for some functions from {dplyr} (see `dplyr_methods()` for a full list):
```{r}
d_fr |>
  fr_mutate(high_poverty = fraction_poverty > median(fraction_poverty)) |>
  fr_select(-year) |>
  fr_arrange(desc(fraction_poverty))
```
More complicated dplyr functions (e.g., `group_by()` and friends) as well as functions from other packages that do not coerce their inputs to data.frame objects will need to use the pattern above. Below is a simple example for `dplyr::left_join()`:
```{r}
library(dplyr, warn.conflicts = FALSE)

d_fr <- update_field(d_fr, "fraction_poverty", description = "the poverty fraction")

d_extant <-
  d_fr |>
  fr_mutate(score = 1 + fraction_poverty) |>
  fr_select(-fraction_poverty, -year) |>
  as_tibble()

d_fr_new <-
  left_join(
    as_tibble(d_fr),
    d_extant,
    by = join_by(census_tract_id_2020 == census_tract_id_2020)
  ) |>
  as_fr_tdr(.template = d_fr) |>
  update_field("score", description = "the score")

d_fr_new

S7::prop(d_fr_new, "schema")
```
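The same convert, manipulate, and convert-back pattern should apply to grouped operations such as `group_by()`. The following is a sketch rather than a documented recipe: it continues with the `d_fr` object from above, and the names `above_year_median` and `d_fr_grouped` are introduced here only for illustration. The grouping is removed with `ungroup()` before converting back, on the assumption that `as_fr_tdr()` is best given an ungrouped tibble:
```{r}
d_fr_grouped <-
  d_fr |>
  # drop the metadata so that dplyr verbs work on a plain tibble
  tibble::as_tibble() |>
  dplyr::group_by(year) |>
  # compare each tract with the median of its own year
  dplyr::mutate(above_year_median = fraction_poverty > median(fraction_poverty)) |>
  dplyr::ungroup() |>
  # reattach the metadata from the original object
  as_fr_tdr(.template = d_fr)

d_fr_grouped
```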